Evidence for a High Ancestral GC Content in Drosophila

Francisco Rodríguez-Trelles*†, Rosa Tarrío* and Francisco J. Ayala*

*Department of Ecology and Evolutionary Biology, University of California at Irvine; and
†Instituto de Investigaciones Agrobiológicas de Galicia (CSIC), Santiago de Compostela, Spain


    Abstract
 
Study of the nucleotide composition in Drosophila, focusing on the saltans and willistoni groups, has revealed unanticipated differences in nucleotide composition among lineages. Compositional differences are associated with an accelerated rate of nucleotide substitution in functionally less constrained regions. These observations have been set forth against the widespread opinion that the pattern of point mutation has remained constant during the evolution of the genus. A crucial assumption has been that the most recent common ancestor of the subgenus Sophophora had an elevated GC content. Until now, this assumption has been supported by indirect arguments, consisting of extrapolations from closely related outgroups and limited by the robustness of the mathematical descriptions used to handle the extensive nucleotide composition differences among sequences. The present study seeks to test the assumption of a high ancestral GC content using realistic representations of the nucleotide substitution process to account for potential biases induced by the heterogeneous GC content of the taxa. The analysis of eight nuclear genes unambiguously corroborates that the common ancestor of Sophophora had an elevated GC content.


    Introduction
 
The notion that the pattern of point mutation has remained constant during the evolution of Drosophila has recently been called into question. A study of the nucleotide composition in the subgenus Sophophora, including representatives of the Drosophila saltans and Drosophila willistoni species groups, has revealed an unsuspected amount of variation (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a). We have found that (1) the GC content in the third codon positions, and to a lesser extent in the first positions and introns, is higher in the Drosophila melanogaster and Drosophila obscura groups than in the D. saltans and D. willistoni groups; (2) compositional differences are greater for the xanthine dehydrogenase (Xdh) region than for the alcohol dehydrogenase (Adh), superoxide dismutase (Sod), period (per), and 28S rRNA regions, which are functionally more constrained; (3) across genes, base composition differences among species are paralleled by changes in the pattern and extent of codon bias; and (4) the saltans and willistoni groups show an increased rate of amino acid substitution in Xdh, with the new replacements preferentially involving amino acids encoded by low-GC-content codons. Subsequently, we observed that the branch ancestral to the fast-evolving saltans-willistoni lineage, presumably the one where most of the change in GC content has occurred, exhibits an excess of synonymous substitutions in Adh and Xdh and that the extent of spatial structuring of the among-sites rate variation of Xdh is relatively reduced in this lineage (Rodríguez-Trelles, Tarrío, and Ayala 2000a). These observations are best accounted for by a shift in the pattern of point mutation that occurred in the lineage leading to the saltans-willistoni complex after its split from the lineage that gave rise to the melanogaster and obscura groups (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a).

This explanation hinges on the assumption that the most recent common ancestor of the subgenus Sophophora had an elevated GC content, closer to the composition presently observed in extant species of the melanogaster and obscura groups than to that of the extant representatives of the saltans and willistoni groups (see fig. 1). Yet, arguments offered to support this assumption are largely indirect, consisting of extrapolations from what is observed in closely related outgroups and limited by the robustness of the models used to describe the evolution of the sequences.



 
Fig. 1.—Three hypothetical models for the evolution of GC content in the Sophophora subgenus of Drosophila. The GC content of the ancestor (Anc) could have sharply decreased in the saltans-willistoni lineage (S-W) (model A), increased in the melanogaster-obscura lineage (M-O) (model C), or moderately increased in one and moderately decreased in the other (model B)

 
Recently, Galtier and Gouy (1998) devised a maximum-likelihood implementation of a new nonhomogeneous-nonstationary model of DNA sequence evolution. Given a phylogeny, Galtier and Gouy's (1998) approach allows extraction of reliable information about ancestral GC content from data sets of usual sizes. The method has proven useful in the study of the evolution of the isochore structure in mammals (Galtier and Mouchiroud 1998), and also in inferring nucleotide composition of ribosomal RNA in the cenancestor (i.e., the most recent common ancestor to all extant life forms) (Galtier, Tourasse, and Gouy 1999). Here, we use this method to test the assumption that the most recent common ancestor of the Sophophora subgenus had an elevated GC content. By focusing on eight loci in representatives of the major species groups, we establish the correctness of this assumption. Moreover, we readdress the inference of an accelerated rate of amino acid substitution along the saltans-willistoni lineage by using a multiple-species relative-rate test (Li and Bousquet 1992) based on the LogDet (Lockhart et al. 1994; Gu and Li 1996; Tourasse and Li 1999) distance model. Analytical approaches used in earlier assessments of this issue (Rodríguez-Trelles, Tarrío, and Ayala 1999a) have been shown to lead far too often to the false conclusion that substitution rates are unequal when there are large base composition differences among sequences (Tourasse and Li 1999), as in our study. The LogDet transformation is particularly suitable in these situations (Tourasse and Li 1999).


    Materials and Methods
 
Species and Sequences
We examined the 17 species listed in table 1. Species were chosen to represent the four major subgroups of the Sophophora subgenus (i.e., melanogaster, obscura, saltans, and willistoni). Sequences were not available for all eight genes for some species; in these cases, we used the closest relatives found in the database within their subgroup. Drosophila virilis (Drosophila hydei for the H2a-H2b region), of the subgenus Drosophila, and Scaptodrosophila lebanonensis were used as outgroups (except for the Amyrel gene, which has not been found outside Sophophora; Da Lage et al. 1998). The eight nuclear loci investigated and their physical map positions and corresponding linkage groups in D. melanogaster are alcohol dehydrogenase (Adh), 35B2, 2L; amylase-related (Amyrel), 53D1–3, 2R; dopa decarboxylase (Ddc), 37C1, 2L; α-glycerophosphate dehydrogenase (Gpdh), 26A1, 2L; histones H2a-H2b, 39D3-E1, 2L; period (per), 3B2, X; superoxide dismutase (Sod), 68A8–9, 3R; and xanthine dehydrogenase (Xdh), 87D11, 3L, respectively.


 
Table 1 GC Content (%) in Third Codon Positions of Eight Gene Regions

 
Models of DNA Sequence Evolution
Ancestral GC content was inferred using Galtier and Gouy's (1998) maximum-likelihood method. This method relies on a new model of DNA sequence evolution that relaxes the premises of homogeneity and stationarity to accommodate base composition differences among present-day sequences (Galtier and Gouy 1998). The model is based on Tamura's (1992) representation of the substitution process, which allows unequal transition and transversion rates, and GC ≠ AT (with G = C and A = T) content at equilibrium. The transition/transversion ratio is kept constant throughout the tree, but the nucleotide composition is allowed to change from branch to branch by assigning a different equilibrium GC content parameter to each branch. GC content in nodes is the expected value generated by the model, given the estimated GC content equilibria for branches. The model is neither homogeneous (i.e., uniformity of the substitution pattern over the tree) nor stationary (i.e., constancy of the base composition among lineages), since equilibrium GC content and expected base composition can vary among lineages. Because the model lacks reversibility, rooted trees are used, including one extra parameter to account for the position of the root.
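
To make the branch-wise equilibrium idea concrete, the following is a minimal sketch and not the NHML implementation. It assumes an unnormalized Tamura (1992) rate matrix in which the rate of change to a base equals its equilibrium frequency, multiplied by kappa for transitions; under that assumption the expected GC content relaxes exponentially toward the branch equilibrium theta at rate (kappa + 1)/2. The function names, the time scaling, and any numerical values used with them are illustrative assumptions.

```python
import math

def t92_rate_matrix(theta, kappa):
    """Unnormalized Tamura (1992) rate matrix, base order A, C, G, T.

    theta is the equilibrium GC content of the branch and kappa the
    transition/transversion rate ratio. The time scaling is an assumption
    of this sketch, not the parameterization used by NHML.
    """
    freqs = [(1 - theta) / 2, theta / 2, theta / 2, (1 - theta) / 2]
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G and C<->T
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = freqs[j] * (kappa if (i, j) in transitions else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so rows sum to zero
    return freqs, q

def expected_gc(gc_start, theta, kappa, t):
    """Expected GC content after time t on a branch with equilibrium theta,
    starting from GC content gc_start at the parent node."""
    return theta + (gc_start - theta) * math.exp(-(kappa + 1) / 2 * t)
```

For example, `expected_gc(0.75, 0.40, 2.0, t)` moves from 0.75 toward 0.40 as `t` grows, whereas a branch whose equilibrium equals the ancestral value leaves the expected GC content unchanged. Chaining the function from the root down a tree gives node values of the kind the text describes as "the expected value generated by the model".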

Estimates of the GC content for nodes generated with Galtier and Gouy's (1998) method were compared with those obtained using Yang's (1999) maximum-likelihood implementation of a homogeneous-stationary model based on the substitution process of Hasegawa, Kishino, and Yano (1985) (HKY85). The HKY85 model is a generalization of Tamura's (1992) model that allows unequal G and C (respectively, A and T) contents at equilibrium. Because the substitution process is assumed to be homogeneous and stationary, the HKY85 transition probability matrix is kept constant all over the tree. Estimated GC content in internal nodes is the percentage of GC in the corresponding marginally reconstructed ancestral sequence (Yang 1999).

GC content differences among species are largest in third codon positions. These differences likely reflect the mutational equilibrium of the genome better than does the GC content variation in first and second codon positions, which is affected by the functional constraints on the proteins. Therefore, estimation of ancestral GC content focuses on third codon positions.
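
As an illustration of the quantity analyzed throughout, the sketch below computes the GC percentage at third codon positions of an in-frame coding sequence. The function name and the handling of ambiguous characters and trailing partial codons are assumptions made for this example.

```python
def gc3_percent(coding_sequence):
    """Percentage of G or C at third codon positions of an in-frame sequence.

    Trailing bases that do not complete a codon are ignored, and third
    positions holding anything other than A, C, G, or T are skipped.
    """
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3
    third = [seq[i] for i in range(2, usable, 3) if seq[i] in "ACGT"]
    if not third:
        raise ValueError("no unambiguous third codon positions found")
    return 100.0 * sum(base in "GC" for base in third) / len(third)
```

For the toy sequence `"ATGGCCAAA"` the third positions are G, C, and A, so two of three are G or C.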

Maximum-likelihood methods assume a tree topology and a model of sequence change. Figure 2 shows the tree topologies used (for Amyrel, Ddc, Gpdh, H2a-H2b, and per, replacement of some species by their closest relatives does not change the basic topology). These hypotheses are supported by data of several sorts (see Powell 1997, pp. 267–298; Tatarenkov et al. 1999; for the species of the saltans and willistoni groups, see Tarrío, Rodríguez-Trelles, and Ayala 2000). The transition probability matrices of models and details on parameter estimation are given in Galtier and Gouy (1998) and in Yang (1999). Our analyses were conducted with the EVAL_NH program (NHML package; Galtier and Gouy 1998) and the BASEML program (PAML 2.0 package; Yang 1999).



 
Fig. 2.—GC content evolution in third codon positions of Drosophila genes as inferred by maximum likelihood using nonhomogeneous-nonstationary (Galtier and Gouy 1998) and homogeneous-stationary (Yang 1999) models. Ancestral GC content for nodes is the expected value based on estimated equilibrium GC content for branches for the nonhomogeneous-nonstationary model and on observed GC percentage in marginally reconstructed ancestral sequences (Yang 1999) for the homogeneous-stationary model. Underlined values are average GC percentages across present-day sequences for the melanogaster-obscura and saltans-willistoni lineages.

 

    Results
 
Locating Compositional Change in Drosophila
Table 1 shows GC content (%) in third codon positions for the eight gene regions considered in this study. Considering all species, the premise of stationarity is rejected for the eight genes by Rzhetsky and Nei's (1995) test (P < 10⁻⁶, except that P < 0.05 for H2a-H2b because of its short sequence length of 87 codons). The representatives of the saltans-willistoni lineage exhibit distinctively lower GC content in third codon positions (averages across species: 49.1%, 48.5%, 51.5%, 50.1%, 48.6%, 65.0%, 49.3%, and 44.5% for Adh, Amyrel, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively) than do the species of the melanogaster-obscura lineage (77.5%, 76.2%, 72.4%, 68.0%, 56.9%, 81.1%, 78.8%, and 71.3%); the average GC content across the outgroups is intermediate (60.8%, 64.7%, 58.1%, 53.3%, 68.4%, 60.1%, and 60.8% for the seven genes with outgroups, i.e., all but Amyrel).

Figure 2 represents GC content evolution in third codon positions of the Adh, Sod, and Xdh regions in Drosophila as inferred by maximum likelihood using the nonhomogeneous-nonstationary method of Galtier and Gouy (1998) and by the homogeneous-stationary HKY85 model as implemented by Yang (1999). Underlined values are GC content averages across present-day sequences for the melanogaster-obscura and saltans-willistoni lineages. Similar representations are obtained for the Amyrel, Ddc, Gpdh, H2a-H2b, and per regions (data not shown). Table 2 summarizes, for each gene, the GC content increase (positive values) or decrease (negative values) in the evolution of the melanogaster-obscura and saltans-willistoni lineages from the time when they split from their most recent common ancestor (values enclosed in gray boxes in fig. 2) to the present (underlined values). According to Galtier and Gouy's (1998) method, the eight gene regions clearly support model A in figure 1: the common ancestor of Sophophora had an elevated GC content that remained relatively unchanged in the evolution of the melanogaster-obscura lineage (average departure from the ancestor is +2.3%, +9.7%, -0.4%, +5.7%, +4.2%, -2.4%, +6.6%, and -1.7% for Adh, Amyrel, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively; Student's t-test; P = 0.69 and df = 14 for the difference between the ancestral and present GC content averages; see table 2) but dramatically decreased in the evolution of the saltans-willistoni lineage (-34.9%, -30.1%, -23.0%, -21.2%, -18.7%, -19.1%, -33.3%, and -38.6%; Student's t-test; P ≪ 0.001, df = 14). The estimated shift in base composition was largest for Adh, Sod, and Xdh, apparently because of the inclusion of saltans species in the analysis of these genes (saltans sequences are not available for the other genes); these species exhibited the lowest GC content in all three gene regions, thereby lowering the average value. In the case of willistoni, the shift in GC content occurred for the most part (~100%, 88%, and 75%) in the ancestor of the species group; in the case of saltans, however, it appears that a substantial amount of AT continued to accumulate (by ~14%, 34%, and 45%) after the emergence of the species group.


 
Table 2 Estimated GC Content Changes, Δ(GC), in the melanogaster-obscura (Mel-Obs) and saltans-willistoni (Sal-Wil) Lineages Since They Split from Their Common Ancestor, Under Nonhomogeneous-Nonstationary (Galtier and Gouy 1998) and Homogeneous-Stationary (Yang 1999) Models of Evolution

 
Galtier and Gouy's (1998) model seeks to extract information about ancestral GC content regardless of the subsequent evolution of the sequences and is largely insensitive to species sampling (Galtier and Mouchiroud 1998). Consequently, we did not expect estimated ancestral GC contents to be deflected because of the base composition of the outgroups included in our study. Yet, in order to control for such an effect, we repeated the analyses excluding the outgroups. The root-branch (i.e., the one connecting the melanogaster-obscura and the saltans-willistoni lineages) is long for all eight genes (see fig. 2). In circumstances like this, Galtier and Gouy's (1998) method might be unreliable owing to the incorrect location of the root (Galtier and Gouy 1998; Galtier and Mouchiroud 1998). The method is, however, reliable if the location of the root is provided (Galtier and Gouy 1998). Therefore, for each gene, we rooted the corresponding tree at the point indicated by the outgroups (see fig. 2). This rooting made the branch leading to the split of the saltans and willistoni groups longer than the corresponding branch for the melanogaster and obscura groups (see fig. 2), in accordance with the relationships inferred from other sources (Throckmorton 1975; Tarrío, Rodríguez-Trelles, and Ayala 2000). Estimated ancestral GC contents were very similar to those enclosed in gray boxes in figure 2 (70.6% vs. 75.4%, 72.1% vs. 72.7%, 59.6% vs. 63.6%, 60.3% vs. 60.0%, 80.2% vs. 80.3%, 73.2% vs. 73.9%, and 73.2% vs. 72.5% for Adh, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively).

In contrast to Galtier and Gouy's (1998) method, the homogeneous-stationary HKY85 model supports an elevated GC content in the most recent common ancestor of Sophophora (A in fig. 1) only for Ddc, H2a-H2b, per, and Xdh, while Gpdh and Sod favor model B, Amyrel favors model C, and Adh is consistent with A or B (although it is closer to model A; see fig. 2 and table 2). The variation among the eight genes occurs because, if one assumes that the substitution process is homogeneous and stationary, inferred ancestral GC contents represent averages across descendant nodes weighted in inverse proportion to the corresponding branch lengths. Thus, for example, in figure 2, the length of the branch leading to the melanogaster and obscura groups relative to the corresponding one in the saltans-willistoni stem is largest for Sod and shortest for Xdh, while for Adh it is intermediate.
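
The weighting argument can be caricatured with a few lines of code. This is only a rough sketch of an inverse-branch-length weighted average, not Yang's (1999) marginal reconstruction; the function name and the numbers in the usage note are hypothetical.

```python
def inverse_length_weighted_gc(descendants):
    """Average GC content of descendant nodes, each weighted by the
    reciprocal of the length of the branch leading to it.

    descendants is a list of (gc_percent, branch_length) pairs.
    """
    weights = [1.0 / length for _, length in descendants]
    total = sum(weights)
    return sum(w * gc for w, (gc, _) in zip(weights, descendants)) / total
```

With two descendants at 75% and 45% GC, equal branch lengths give 60%. Shortening the branch to the high-GC descendant, as in `[(75.0, 0.1), (45.0, 0.4)]`, pulls the inferred ancestor toward 75%, and shortening the other branch pulls it toward 45%. Under a stationary model the inferred ancestral value therefore tracks relative branch lengths, which differ from gene to gene.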

Xdh Rates of Substitution
Previous analyses of the Xdh region using the three-species relative-rate test of Wu and Li (1985), with S. lebanonensis as a reference, revealed an accelerated rate of nonsynonymous substitution in the saltans group species compared with the species of the melanogaster and obscura groups (Rodríguez-Trelles, Tarrío, and Ayala 1999a). We applied the Wu and Li (1985) test based on Kimura's (1980) two-parameter model, which assumes that the substitution process is homogeneous and stationary. Both premises are untenable for the data set at hand (see also Rodríguez-Trelles, Tarrío, and Ayala 1999a, 1999b). Tourasse and Li (1999) noted that when the process of substitution is not homogeneous and/or not stationary, a significant fraction of the differences observed between sequences can be due to changes in nucleotide composition rather than changes in substitution rate. In such cases, the relative-rate test based on Kimura's (1980) distance performs too liberally (Tourasse and Li 1999). There is, therefore, the possibility that the inferred accelerated nonsynonymous rates of Xdh in saltans were an artifact created by violation of the model's assumptions.

We explored this possibility by conducting additional relative-rate tests using the Kimura (1980) and the bias-corrected LogDet distance (Gu and Li 1996; Tourasse and Li 1999) models. The LogDet transformation is based on the most general representation of the substitution process (Lockhart et al. 1994; Gu and Li 1996) and performs adequately as a model for the relative-rate test under nonhomogeneous and/or nonstationary conditions (Tourasse and Li 1999). We used Li and Bousquet's (1992) method as implemented by Tourasse and Li (1999). This method is an extension of Wu and Li's (1985) method devised to compare the mean rates of two lineages, each consisting of several taxa (Li and Bousquet 1992). The lineages involved in the comparison were the melanogaster-obscura and the saltans-willistoni lineages (lineages 1 and 2 in table 3), each consisting of the four sequences shown in figure 2. The sequences of D. virilis and S. lebanonensis were used as outgroups. The values of D (D1.3 - D2.3) in table 3 represent the difference between the number of substitutions per site for lineages 1 and 2 after their divergence. Only first and second codon positions are considered, because most changes in these sites are nonsynonymous. In conformity with the trends already reported (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a), the results in table 3 indicate that Xdh has evolved faster in the saltans-willistoni lineage than in the melanogaster-obscura stem.
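
The statistic D can be sketched as follows under the Kimura (1980) two-parameter distance. This is a simplified illustration, not the program of Tourasse and Li (1999): the lineage means are plain unweighted averages over taxa, which is an assumption of this sketch, and no variance or significance test is computed. Function names are hypothetical.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq_a, seq_b):
    """Kimura (1980) two-parameter distance between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # Transitions stay within purines or within pyrimidines.
    transitions = sum(a != b and (a in PURINES) == (b in PURINES)
                      for a, b in pairs)
    # Transversions exchange a purine and a pyrimidine.
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in pairs)
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

def mean_rate_difference(lineage1, lineage2, outgroup, distance=k2p_distance):
    """D = D1.3 - D2.3, with each term the unweighted mean distance between
    the sequences of one lineage and the outgroup (lineage 3)."""
    d13 = sum(distance(s, outgroup) for s in lineage1) / len(lineage1)
    d23 = sum(distance(s, outgroup) for s in lineage2) / len(lineage2)
    return d13 - d23
```

A negative D indicates more substitutions per site along lineage 2 than along lineage 1. Passing a different `distance` function allows the same comparison under another distance model.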


 
Table 3 Relative-Rate Test Showing the Substitution Rate Differences in First and Second Codon Positions Between Lineages 1 (melanogaster-obscura) and 2 (saltans-willistoni) Relative to Drosophila virilis or Scaptodrosophila (3), Using the Kimura (1980) Two-Parameter (K2P) and LogDet Distance Models as Implemented by Tourasse and Li (1999)

 
Relative-rate tests similar to those above for Xdh were applied to the other seven gene regions, and they produced nonsignificant results. However, analysis of the amino acid composition variation of their corresponding proteins across the species of table 1 indicated that, similar to what had already been reported for Xdh and per (Rodríguez-Trelles, Tarrío, and Ayala 1999a), Amyrel exhibited an excess of AT-coded amino acids (N, I, K, M, F, and Y) in the willistoni group (25.3%) relative to what was found in the species of the melanogaster (24.2% average) and obscura (24.9% average) groups, a deficit of GC-coded amino acids (A, G, P, and W; 21.1% vs. 22.0% and 23.4%), and a medium proportion of intermediate amino acids (C, D, E, H, Q, S, T, and V; 40.4% vs. 41.7% and 40.2%). The same pattern was observed for Ddc (24.3% vs. 23.8% and 23.4% for AT-coded, 25.2% vs. 25.4% and 25.9% for GC-coded, and 36.0% vs. 35.9% and 36.9% for intermediate amino acids).
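
The three amino acid classes named above can be tallied with a short function. This is a minimal sketch; the function name is hypothetical, and counting leucine, arginine, and any other residue outside the three lists as "other" is an assumption inferred from their absence in the text.

```python
AT_CODED = set("NIKMFY")        # classes exactly as listed in the text
GC_CODED = set("AGPW")
INTERMEDIATE = set("CDEHQSTV")

def amino_acid_class_percentages(protein):
    """Percentage of residues in each compositional class of a protein
    given in one-letter code."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    counts = {"AT-coded": 0, "GC-coded": 0, "intermediate": 0, "other": 0}
    for aa in residues:
        if aa in AT_CODED:
            counts["AT-coded"] += 1
        elif aa in GC_CODED:
            counts["GC-coded"] += 1
        elif aa in INTERMEDIATE:
            counts["intermediate"] += 1
        else:
            counts["other"] += 1  # e.g., L and R, absent from the three lists
    return {name: 100.0 * n / len(residues) for name, n in counts.items()}
```

Because some residues fall outside the three lists, the three reported percentages for a protein need not sum to 100%.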


    Discussion
 
Previous inferences of directional mutation pressure toward increased AT content in the saltans and willistoni species groups of Sophophora (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a) rely on the assumption that the last common ancestor of the subgenus had an elevated GC content. Support for this assumption hitherto came from indirect evidence based on extrapolations from what was observed in closely related outgroups and depended on the robustness of the analytical approaches employed (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a). The present study assesses this question using more appropriate representations of the nucleotide substitution process in order to account for potential biases elicited by the large composition differences among sequences. The results unambiguously corroborate that the common ancestor of Sophophora had an elevated GC content.

The two premises of Galtier and Gouy's (1998) method are that (1) G% = C% and A% = T% at equilibrium (Tamura 1992), and (2) all sites in the sequence change at the same rate. In order to explore the robustness of the model against deviations from the first premise, Galtier and Mouchiroud (1998) devised a measure of the sequence departure from the G% = C% and A% = T% equalities as follows: GC-skewness = |(G% - C%)/(G% + C%)|, and AT-skewness = |(A% - T%)/(A% + T%)|. Both statistics range from 0 (i.e., no skew) to 1 (i.e., maximum skew). Galtier and Mouchiroud (1998) found that Galtier and Gouy's (1998) algorithm yielded biased estimates of the GC content when the GC (AT)-skewness was higher than 0.6. In the present work, however, the great majority of the GC (AT)-skewness values for the 63 analyzed sequences (123 out of the 126 values, i.e., ~98%) were below this threshold.
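
The two skewness statistics follow directly from the base counts of a sequence, as in this minimal sketch. The function name and the convention of returning 0 when a denominator is empty are assumptions made for the example; counts are used in place of percentages, which leaves the ratios unchanged.

```python
def skewness(sequence):
    """Return (GC-skewness, AT-skewness) for a nucleotide sequence.

    GC-skewness = |G - C| / (G + C) and AT-skewness = |A - T| / (A + T).
    """
    seq = sequence.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    gc_skew = abs(g - c) / (g + c) if g + c else 0.0
    at_skew = abs(a - t) / (a + t) if a + t else 0.0
    return gc_skew, at_skew
```

For the toy sequence `"GGGCAATT"` the GC-skewness is |3 - 1|/4 = 0.5 and the AT-skewness is 0. Each sequence contributes two values, which is why 63 sequences yield 126 values.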

With respect to the second premise, ignoring the existence of among-sites rate variation can yield biased phylogenetic estimators (Yang 1996). Nevertheless, using the homogeneous HKY85 model assuming discrete-gamma distributed rates, Galtier and Mouchiroud (1998) found that Galtier and Gouy's (1998) method was robust against gamma-shape parameter values (α) as low as 0.5 (i.e., high among-sites rate variation). The assumption of equal rates among sites is untenable for the gene regions under consideration due to large substitution rate differences (α < 0.5) between codon positions and within first and second codon positions (Rodríguez-Trelles, Tarrío, and Ayala 1999b, 2000a). However, substitution rate differences within third codon positions were small (α > 1) for the eight gene regions investigated (α values obtained with the discrete gamma HKY85 model with eight categories of rates assuming the trees in fig. 2 are 2.16 ± 0.79, 6.65 ± 5.78, 3.30 ± 1.22, 3.93 ± 1.98, 2.48 ± 2.52, 1.20 ± 0.36, 3.75 ± 2.21, and 3.34 ± 0.61 for Adh, Amyrel, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively). Therefore, we do not expect inferred ancestral GC contents in this study to be seriously deflected by either undue GC (AT)-skewness or extreme rate variation among sites.

Tourasse and Li (1999) have recently shown that the relative-rate test based on Kimura's (1980) distance leads far too often to the false conclusion that substitution rates are unequal when there are large base composition differences among sequences. When there is no rate difference among lineages but the outgroup is compositionally closer to one of the two lineages compared, the relative-rate test based on Kimura's (1980) model clusters the outgroup with that lineage to the exclusion of the other (Tourasse and Li 1999). The relative-rate test based on the LogDet model works properly in these situations, provided the sequences are ≥500 nt. In this respect, the substitution rate differences between lineages in Xdh detected in this study (see Rodríguez-Trelles, Tarrío, and Ayala 1999a, 1999b) are real because they remain significant after accounting for the base composition differences among sequences with the LogDet transformation.
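
For orientation, the sketch below computes a paralinear/LogDet-type distance in one commonly used normalized form, d = -(1/4) ln[det F / sqrt(det Πx det Πy)], where F is the 4 x 4 matrix of joint base frequencies at aligned sites and Πx, Πy are diagonal matrices of the base frequencies of each sequence. This is an assumption-laden illustration: the bias correction of Gu and Li (1996) applied in this study is not reproduced, and the function names are hypothetical.

```python
import math

BASES = "ACGT"

def _det(m):
    """Determinant by cofactor expansion (adequate for 4 x 4 matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * _det(minor)
    return total

def logdet_distance(seq_x, seq_y):
    """Uncorrected LogDet-type distance between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    f = [[0.0] * 4 for _ in range(4)]      # joint frequency matrix F
    for a, b in pairs:
        f[BASES.index(a)][BASES.index(b)] += 1.0 / n
    freq_x = [sum(row) for row in f]       # base frequencies of seq_x
    freq_y = [sum(f[i][j] for i in range(4)) for j in range(4)]
    det_f = _det(f)
    norm = math.sqrt(math.prod(freq_x) * math.prod(freq_y))
    if det_f <= 0 or norm <= 0:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * math.log(det_f / norm)
```

Because the normalization uses the base frequencies of both sequences, two identical sequences are at distance zero whatever their composition. The distance is undefined when a base is missing from either sequence or when det F is not positive, which is more likely for short alignments.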

The eight gene regions analyzed are scattered throughout the genome (see Materials and Methods). Therefore, the patterns observed in this study likely reflect genomewide processes. Extensive variation in GC content is not unique to Drosophila; large compositional differences have long been known among bacterial genomes (Lee, Wahl, and Barbu 1956) and between isochores of the mammalian genome (Bernardi et al. 1985).

The base composition of the genome reflects an interplay between functional constraints and mutational biases. In Drosophila, functional regions generally exhibit higher GC contents than unconstrained regions, which is attributed to natural selection for greater GC content in the former. Increased GC can enhance translation efficiency and/or accuracy if it better matches the tRNA pool (the "major codon preference model"; see Akashi, Kliman, and Eyre-Walker 1998). Under the assumption that mutation bias has remained constant in the evolution of Drosophila (Petrov and Hartl 1999), the reduced GC content in the saltans and willistoni groups can be accounted for by positive selection for increased AT, and/or a reduced efficiency of selection against the mutation bias caused by either (1) a reduced recombination rate or (2) diminished population sizes. However, it is unclear why selection should favor a greater AT content in the saltans-willistoni offshoot.

When recombination drops, the effect of natural selection at a given site accelerates genetic drift at linked sites. Kliman and Hey (1993) found reduced codon usage bias confined to regions with the very lowest levels of recombination in D. melanogaster (i.e., near centromeres and telomeres and on the fourth chromosome). However, it seems unlikely that eight genes, scattered throughout five Drosophila linkage groups, have all persistently occupied regions of extremely low recombination during the evolution of the saltans and willistoni groups. The case of the H2a-H2b histone region is noteworthy. In D. melanogaster, the histone family consists of a tandem repeat of over 100 units, each ~5 kb long, which evolve concertedly (see Baldo, Les, and Strausbaugh 1999). The histone genes are transcribed very actively; however, their codon usage is among the least biased for D. melanogaster genes (Fitch and Strausbaugh 1993). Under the major codon preference model, this is explained because the histones are placed in a centromeric position in this species (Sharp and Matassi 1994). It follows that one would not anticipate finding homologous histone genes in other species with substantially fewer C- and G-ending codons than there are in D. melanogaster. Yet, this is the case for the species of the willistoni group (see table 1). Also, the cytological map positions of the Adh and Sod genes determined for some willistoni species (Rohde et al. 1994, 1995) cannot be ascribed to low-recombination regions on an a priori basis. Reduced GC content in saltans and willistoni could be the reflection of a genomewide reduction in recombination rate. However, the map length of Drosophila prosaltans is similar to that of D. melanogaster (285.4 vs. 294.9 cM, respectively; Cáceres, Barbadilla, and Ruiz 1999).

Diminished efficiency of natural selection as the agent for the reduced GC content in saltans and willistoni has been recently challenged on other grounds (Rodríguez-Trelles, Tarrío, and Ayala 2000a). Focusing on the Adh and Xdh loci, this study found that the branch leading to the saltans and willistoni groups, where most of the change in GC content has occurred (see fig. 2), exhibits an excess of synonymous substitutions relative to the nonsynonymous replacements. This result can hardly be accounted for by most common scenarios of selection, which, rather, predict a relative increase in the rate of nonsynonymous substitutions (see Rodríguez-Trelles, Tarrío, and Ayala 2000a).

The arguments above rely on the notion that the mutation bias is equal for all Drosophila species. Relaxing this premise allows us to better account for the data. Theoretical results of Shields (1990) show that a shift in mutation bias can trigger a switch in codon preference. Unlike natural selection, mutation bias affects the less constrained parts of the genome more than functionally significant parts (Sueoka 1988). Accordingly, a shift in the pattern of point mutation in the saltans-willistoni stem would explain why replacements in Amyrel, Ddc, per, and Xdh occur preferentially by amino acids encoded by low-GC-content codons in this lineage, because these genes code for the fastest-evolving proteins in Sophophora out of the eight gene regions examined (Ka = 0.1249, 0.1035, and 0.0931 for Amyrel, per, and Xdh, respectively, for the average of the comparisons of D. willistoni against D. pseudoobscura and D. melanogaster by the method of Nei and Gojobori [1986]; Ddc changes at an intermediate rate, Ka = 0.0497). Gpdh and H2a-H2b evolve the slowest (Ka = 0.0091 and 0.0000, respectively), which suggests that their encoded proteins are too constrained to reflect the mutation bias. Similarly, a shift in mutation bias would account for the increased rates of synonymous substitution in the common ancestor of saltans and willistoni detected in previous studies (Rodríguez-Trelles, Tarrío, and Ayala 2000a).

Available information on noncoding, putatively unconstrained regions favors this hypothesis as well. The spacer region between the H2a and H2b histone genes (~250 nt long; Baldo, Les, and Strausbaugh 1999) exhibits lower GC content in the willistoni species (28.5%, average across D. paulistorum and D. insularis) than in the species of the melanogaster (39.6%, average across D. melanogaster and D. yakuba) and obscura groups (40.6%, average across D. pseudoobscura and D. persimilis), and similar patterns have been noticed for the introns of Adh (unpublished GenBank accession number AB026533 for D. saltans), Amyrel (Da Lage et al. 1998), and Xdh (Rodríguez-Trelles, Tarrío, and Ayala 1999a).

The idea of the constancy of the pattern of point mutation in Drosophila is largely based on the analysis of only two species, D. melanogaster, of the subgenus Sophophora, and D. virilis, of the subgenus Drosophila (Petrov and Hartl 1999). GC content differences between these two species are far smaller than the ones detected in this study (see Powell 1997). Rather, our results suggest that mutation bias in Drosophila can fluctuate within unexpectedly broad limits, enough to generate extensive nucleotide composition differences between relatively closely related species.


    Acknowledgements
 
We are indebted to Nicolas Galtier and Nicolas Tourasse for making their software available and to Howard Ochman for valuable suggestions. F.R.-T. received support from the Spanish Council for Scientific Research (Contrato Temporal de Investigación) and Grant XUGA 400001B98 to A. M. Vieitez. Research was supported by NIH grant GM42397 to F.J.A.


    Footnotes
 
Howard Ochman, Reviewing Editor

1 Keywords: GC content, mutation pressure, relative-rate test, Drosophila saltans group, Drosophila willistoni group

2 Address for correspondence and reprints: Francisco Rodríguez-Trelles, in care of Francisco J. Ayala, Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, California 92697-2525. E-mail: ftrelles@ds.cesga.es


    literature cited
 

    Akashi, H., R. M. Kliman, and A. Eyre-Walker. 1998. Mutation pressure, natural selection, and the evolution of base composition in Drosophila. Genetica 102/103:49–60.

    Baldo, A. M., D. H. Les, and L. D. Strausbaugh. 1999. Potentials and limitations of the histone repeat sequences for phylogenetic reconstruction of Sophophora. Mol. Biol. Evol. 16:1511–1520.

    Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.

    Cáceres, M., A. Barbadilla, and A. Ruiz. 1999. Recombination rate predicts inversion size in Diptera. Genetics 153:251–259.

    Da Lage, J.-L., E. Renard, F. Chartois, F. Lemeunier, and M.-L. Cariou. 1998. Amyrel, a paralogous gene of the amylase gene family in Drosophila melanogaster and the Sophophora subgenus. Proc. Natl. Acad. Sci. USA 95:6848–6853.

    Fitch, D. H. A., and L. D. Strausbaugh. 1993. Low codon bias and high rates of synonymous substitution in Drosophila hydei and D. melanogaster histone genes. Mol. Biol. Evol. 10:397–413.

    Galtier, N., and M. Gouy. 1998. Inferring the pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.

    Galtier, N., and D. Mouchiroud. 1998. Isochore evolution in mammals: a human-like ancestral structure. Genetics 150:1577–1584.

    Galtier, N., N. Tourasse, and M. Gouy. 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283:220–221.

    Gu, X., and W.-H. Li. 1996. Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Mol. Biol. Evol. 13:1375–1383.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

    Kliman, R. M., and J. Hey. 1993. Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol. Biol. Evol. 10:1239–1258.

    ———. 1994. The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049–1056.

    Lee, K. Y., R. Wahl, and E. Barbu. 1956. Contenu en bases puriques et pyrimidiques des acides désoxyribonucléiques des bactéries. Ann. Inst. Pasteur 91:212–224.

    Li, W.-H., and J. Bousquet. 1992. Relative-rate test for nucleotide substitutions between two lineages. Mol. Biol. Evol. 9:1185–1189.

    Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475–1479.

    Powell, J. R. 1997. Progress and prospects in evolutionary biology: the Drosophila model. Oxford University Press, New York.

    Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 1999a. Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 153:339–350.

    ———. 1999b. Molecular evolution and phylogeny of the Drosophila saltans species group inferred from the Xdh gene. Mol. Phylogenet. Evol. 13:110–121.

    ———. 2000a. Fluctuating mutation bias and the evolution of the base composition in Drosophila. J. Mol. Evol. 50:1–10.

    ———. 2000b. Disparate evolution of paralogous introns in the Xdh gene of Drosophila. J. Mol. Evol. 50:123–130.

    Rohde, C., E. Abdelhay, H. Pinto, A. Schrank, and V. L. S. Valente. 1995. Analysis and in situ mapping of the Adh locus in species of the willistoni group of Drosophila. Cytobios 81:37–47.

    Rohde, C., H. Pinto, V. H. Valiati, A. Schrank, and V. L. S. Valente. 1994. Localization of the Cu/Zn superoxide dismutase gene in the Drosophila willistoni species group by in situ hybridization. Cytobios 80:193–198.

    Rzhetsky, A., and M. Nei. 1995. Tests of the applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131–151.

    Sharp, P. M., and G. Matassi. 1994. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4:851–860.

    Shields, D. C. 1990. Switches in species specific codon preferences: the influence of mutation biases. J. Mol. Evol. 31:71–80.

    Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.

    Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol. Biol. Evol. 9:678–687.

    Tarrío, R., F. Rodríguez-Trelles, and F. J. Ayala. 1998. New Drosophila introns originate by duplication. Proc. Natl. Acad. Sci. USA 95:1652–1658.

    ———. 2000. Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a study case. Mol. Phylogenet. Evol. (in press).

    Tatarenkov, A., J. Kwiatowski, D. Skarecky, E. Barrio, and F. J. Ayala. 1999. On the evolution of Dopa decarboxylase (Ddc) and Drosophila systematics. J. Mol. Evol. 48:445–462.

    Throckmorton, L. H. 1975. The phylogeny, ecology, and geography of Drosophila. Pp. 421–436 in R. C. King, ed. Handbook of genetics. Vol. 3. Plenum Press, New York.

    Tourasse, N. J., and W.-H. Li. 1999. Performance of the relative-rate test under nonstationary models of nucleotide substitution. Mol. Biol. Evol. 16:1068–1078.

    Wu, C.-I., and W.-H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.

    Yang, Z. 1996. The among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367–372.

    ———. 1999. Phylogenetic analysis by maximum likelihood (PAML). Version 2.0. University College London.

Accepted for publication August 1, 2000.