*Department of Ecology and Evolutionary Biology, University of California at Irvine; and
Instituto de Investigaciones Agrobiológicas de Galicia (CSIC), Santiago de Compostela, Spain
Abstract
Introduction
This explanation hinges on the assumption that the most recent common ancestor of the subgenus Sophophora had an elevated GC content, closer to the composition presently observed in extant species of the melanogaster and obscura groups than to that of the extant representatives of the saltans and willistoni groups (see fig. 1). Yet the arguments offered to support this assumption are largely indirect: they consist of extrapolations from what is observed in closely related outgroups and are limited by the robustness of the models used to describe sequence evolution.
Materials and Methods
Estimates of the GC content of internal nodes generated with Galtier and Gouy's (1998) method were compared with those obtained using Yang's (1999) maximum-likelihood implementation of a homogeneous-stationary model based on the substitution process of Hasegawa, Kishino, and Yano (1985) (HKY85). The HKY85 model is a generalization of Tamura's (1992) model that allows unequal G and C (and, likewise, unequal A and T) contents at equilibrium. Because the substitution process is assumed to be homogeneous and stationary, the parameters of the HKY85 substitution process are held constant across the tree. The estimated GC content of an internal node is the percentage of GC in the corresponding marginally reconstructed ancestral sequence (Yang 1999).
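The distinction between homogeneous-stationary and nonhomogeneous models can be made concrete with a small sketch. The Python code below is a hypothetical illustration, not the BASEML code: it builds an HKY85 rate matrix for assumed base frequencies and an assumed transition/transversion ratio κ, then checks that those frequencies are stationary (πQ = 0). Because a homogeneous-stationary model applies this one matrix on every branch, its equilibrium GC content is πG + πC everywhere in the tree; Tamura's (1992) model corresponds to the special case πG = πC and πA = πT.

```python
BASES = "TCAG"
PURINES = {"A", "G"}


def hky85_rate_matrix(freqs, kappa):
    """HKY85 instantaneous rate matrix Q, rows/columns ordered T, C, A, G.

    freqs: dict mapping base -> equilibrium frequency (summing to 1).
    kappa: transition/transversion rate ratio.
    Q is left unscaled; rescaling time does not change the equilibrium.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                is_transition = (x in PURINES) == (y in PURINES)
                q[i][j] = (kappa if is_transition else 1.0) * freqs[y]
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def is_stationary(freqs, q, tol=1e-12):
    """True if pi Q = 0, i.e. the base composition is unchanged by Q."""
    pi = [freqs[b] for b in BASES]
    flux = [sum(pi[i] * q[i][j] for i in range(4)) for j in range(4)]
    return all(abs(f) < tol for f in flux)


# Hypothetical composition with G != C and A != T (allowed by HKY85).
freqs = {"T": 0.30, "C": 0.22, "A": 0.28, "G": 0.20}
q = hky85_rate_matrix(freqs, kappa=2.0)
print(is_stationary(freqs, q))            # True
print(round(freqs["G"] + freqs["C"], 2))  # 0.42, the equilibrium GC content
```

In a nonhomogeneous model such as Galtier and Gouy's (1998), by contrast, the equilibrium GC content is allowed to differ from branch to branch, so no single π of this kind describes the whole tree.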
GC content differences among species are largest at third codon positions. These differences likely reflect the mutational equilibrium of the genome better than GC content variation at first and second codon positions, which is affected by the functional constraints on the encoded proteins. We therefore focus the estimation of ancestral GC content on third codon positions.
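Per-position GC content can be computed directly from an in-frame coding sequence. The helper below is a minimal sketch that assumes the sequence starts at the first base of a codon, ignores a trailing partial codon, and skips gaps and ambiguous bases; the function name and the example sequence are hypothetical.

```python
def gc_at_codon_position(seq, pos):
    """GC content (%) at codon position pos (1, 2, or 3) of a coding sequence.

    Assumes seq is in frame (starts at a codon's first base); a trailing
    partial codon is ignored, and gaps or ambiguous bases are skipped.
    """
    if pos not in (1, 2, 3):
        raise ValueError("pos must be 1, 2, or 3")
    seq = seq.upper()
    bases = [seq[i + pos - 1] for i in range(0, len(seq) - 2, 3)]
    called = [b for b in bases if b in "ACGT"]
    if not called:
        raise ValueError("no unambiguous bases at this codon position")
    return 100.0 * sum(b in "GC" for b in called) / len(called)


cds = "ATGGCCAAGTTA"  # hypothetical: codons ATG GCC AAG TTA
print([gc_at_codon_position(cds, p) for p in (1, 2, 3)])  # [25.0, 25.0, 75.0]
```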
Maximum-likelihood methods assume a tree topology and a model of sequence change. Figure 2 shows the tree topologies used (for Amyrel, Ddc, Gpdh, H2a-H2b, and per, replacing some species with their closest relatives does not change the basic topology). These topologies are supported by several kinds of data (see Powell 1997, pp. 267–298; Tatarenkov et al. 1999; for the species of the saltans and willistoni groups, see Tarrío, Rodríguez-Trelles, and Ayala 2000). The transition probability matrices of the models and details of parameter estimation are given in Galtier and Gouy (1998) and in Yang (1999). Our analyses were conducted with the EVAL_NH program (NHML package; Galtier and Gouy 1998) and the BASEML program (PAML 2.0 package; Yang 1999).
Results
Figure 2 represents the evolution of GC content at third codon positions of the Adh, Sod, and Xdh regions in Drosophila, as inferred by maximum likelihood using the nonhomogeneous-nonstationary method of Galtier and Gouy (1998) and the homogeneous-stationary HKY85 model as implemented by Yang (1999). Underlined values are GC content averages across present-day sequences for the melanogaster-obscura and saltans-willistoni lineages. Similar representations are obtained for the Amyrel, Ddc, Gpdh, H2a-H2b, and per regions (data not shown). For each gene, table 2 summarizes the GC content increase (positive values) or decrease (negative values) in the melanogaster-obscura and saltans-willistoni lineages from the time they split from their most recent common ancestor (values enclosed in gray boxes in fig. 2) to the present (underlined values). According to Galtier and Gouy's (1998) method, the eight gene regions clearly support hypothesis A in figure 1: the common ancestor of Sophophora had an elevated GC content that remained relatively unchanged in the melanogaster-obscura lineage (average departures from the ancestor are +2.3%, +9.7%, -0.4%, +5.7%, +4.2%, -2.4%, +6.6%, and -1.7% for Adh, Amyrel, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively; Student's t-test for the difference between the ancestral and present GC content averages, P = 0.69, df = 14; see table 2) but decreased dramatically in the saltans-willistoni lineage (-34.9%, -30.1%, -23.0%, -21.2%, -18.7%, -19.1%, -33.3%, and -38.6%; Student's t-test, P << 0.001, df = 14). The estimated shift in base composition was largest for Adh, Sod, and Xdh, apparently because saltans species were included in the analysis of these genes (saltans sequences are not available for the other genes); these species exhibited the lowest GC content in all three gene regions, thereby lowering the average value. In willistoni, most of the shift in GC content (approximately 100%, 88%, and 75%) occurred in the ancestor of the species group; in saltans, however, a substantial amount of AT appears to have continued to accumulate (by approximately 14%, 34%, and 45%) after the emergence of the species group.
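The two lineage summaries above can be reproduced in outline. The sketch below averages, for each lineage, the eight per-gene departures listed in the text, and defines a pooled-variance Student's t statistic; for two samples of eight genes its degrees of freedom are 8 + 8 - 2 = 14, matching the reported df. The per-gene ancestral and present GC averages actually tested are not listed in this section, so the t statistic is exercised only on hypothetical numbers; a P value would then be read from a t distribution with df degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# Per-gene GC departures from the ancestor (% GC), as listed in the text.
MEL_OBS = [2.3, 9.7, -0.4, 5.7, 4.2, -2.4, 6.6, -1.7]
SAL_WIL = [-34.9, -30.1, -23.0, -21.2, -18.7, -19.1, -33.3, -38.6]


def pooled_t(x, y):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, df), where df = len(x) + len(y) - 2.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


print(round(mean(MEL_OBS), 2), round(mean(SAL_WIL), 2))  # 3.0 -27.36
# Hypothetical inputs, only to show the calculation.
print(pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The mean departures (about +3.0% versus -27.4%) restate in a single number per lineage the contrast between a nearly unchanged melanogaster-obscura lineage and a strongly AT-enriched saltans-willistoni lineage.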
In contrast with Galtier and Gouy's (1998) method, the homogeneous-stationary HKY85 model supports an elevated GC content in the most recent common ancestor of Sophophora (A in fig. 1) only for Ddc, H2a-H2b, per, and Xdh, whereas Gpdh and Sod favor model B, Amyrel favors model C, and Adh is consistent with A or B (although it is closer to model A; see fig. 2 and table 2). The results vary among the eight genes because, if the substitution process is assumed to be homogeneous and stationary, the inferred ancestral GC contents are averages across descendant nodes weighted in inverse proportion to the corresponding branch lengths. Thus, in figure 2, the length of the branch leading to the melanogaster and obscura groups, relative to the corresponding branch in the saltans-willistoni stem, is largest for Sod, shortest for Xdh, and intermediate for Adh.
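The inverse-branch-length weighting can be illustrated with a toy calculation. The function below is a heuristic sketch of the behavior described above, not the marginal reconstruction that BASEML actually performs, and the GC values and branch lengths are hypothetical. It shows that when the high-GC lineage sits at the end of the longer branch, the "ancestral" estimate is pulled toward the low-GC lineage.

```python
def inverse_length_weighted_gc(descendants):
    """Average descendant GC (%) weighted by 1 / branch length.

    descendants: list of (gc_percent, branch_length) pairs, lengths > 0.
    A heuristic stand-in for how a homogeneous-stationary model behaves,
    not a marginal ancestral reconstruction.
    """
    weights = [1.0 / length for _, length in descendants]
    weighted = sum(w * gc for w, (gc, _) in zip(weights, descendants))
    return weighted / sum(weights)


# Hypothetical: a 60% GC lineage and a 35% GC lineage.
print(inverse_length_weighted_gc([(60.0, 0.1), (35.0, 0.1)]))  # 47.5
print(inverse_length_weighted_gc([(60.0, 0.3), (35.0, 0.1)]))  # ~41.25
```

With equal branches the estimate falls midway; tripling the branch to the high-GC lineage shifts it toward the low-GC side, which is the direction in which a long melanogaster-obscura branch (as for Sod) would push the ancestral estimate.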
Xdh Rates of Substitution
Previous analyses of the Xdh region using the three-species relative-rate test of Wu and Li (1985), with S. lebanonensis as the reference, revealed an accelerated rate of nonsynonymous substitution in the saltans group species compared with the species of the melanogaster and obscura groups (Rodríguez-Trelles, Tarrío, and Ayala 1999a). In that study we applied the Wu and Li (1985) test based on Kimura's (1980) two-parameter model, which assumes that the substitution process is homogeneous and stationary. Both premises are untenable for the data set at hand (see also Rodríguez-Trelles, Tarrío, and Ayala 1999a, 1999b). Tourasse and Li (1999) noted that when the substitution process is not homogeneous and/or not stationary, a significant fraction of the differences observed between sequences can be due to changes in nucleotide composition rather than to changes in substitution rate. In such cases, the relative-rate test based on Kimura's (1980) distance is too liberal (Tourasse and Li 1999). It is therefore possible that the inferred acceleration of nonsynonymous substitution in the Xdh gene of saltans was an artifact of violated model assumptions.
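For reference, the Kimura (1980) two-parameter distance and the three-species comparison of Wu and Li (1985) can be sketched as follows. This minimal Python version computes only the point estimate K13 - K23, where sequences 1 and 2 are each compared against reference 3; the variance needed to judge significance is omitted, and the function names and example sequences are hypothetical.

```python
from math import log

PURINES = set("AG")


def k2p_distance(s1, s2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    n = transitions = transversions = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a != b:
            if (a in PURINES) == (b in PURINES):
                transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
            else:
                transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)


def wu_li_difference(seq1, seq2, reference):
    """Point estimate K13 - K23; positive means lineage 1 accumulated more
    substitutions than lineage 2 since they split (significance not tested)."""
    return k2p_distance(seq1, reference) - k2p_distance(seq2, reference)


ref = "AAAAAAAAAA"  # hypothetical reference (outgroup) sequence
print(wu_li_difference("GGAAAAAAAA", "GAAAAAAAAA", ref))  # positive
```

Nothing in this distance accounts for compositional differences among the three sequences, which is why the test can be misled when base composition is heterogeneous.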
We explored this possibility by conducting additional relative-rate tests using the Kimura (1980) distance and the bias-corrected LogDet distance (Gu and Li 1996; Tourasse and Li 1999). The LogDet transformation is based on the most general representation of the substitution process (Lockhart et al. 1994; Gu and Li 1996) and performs adequately as a model for the relative-rate test under nonhomogeneous and/or nonstationary conditions (Tourasse and Li 1999). We used Li and Bousquet's (1992) method as implemented by Tourasse and Li (1999). This method extends Wu and Li's (1985) method to compare the mean rates of two lineages, each consisting of several taxa (Li and Bousquet 1992). The lineages compared were the melanogaster-obscura and saltans-willistoni lineages (lineages 1 and 2 in table 3), each consisting of the four sequences shown in figure 2. The sequences of D. virilis and S. lebanonensis were used as outgroups. The values of D (D1.3 - D2.3) in table 3 represent the difference between the numbers of substitutions per site accumulated by lineages 1 and 2 since their divergence. Only first and second codon positions are considered, because most changes at these sites are nonsynonymous. Consistent with the trends already reported (Rodríguez-Trelles, Tarrío, and Ayala 1999a, 2000a), the results in table 3 indicate that Xdh has evolved faster in the saltans-willistoni lineage than in the melanogaster-obscura lineage.
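The LogDet idea and the lineage comparison can be sketched in the same spirit. The code below implements a basic, uncorrected LogDet distance in the commonly used form d = -1/4 [ln det F - 1/2 ln(det ΠX det ΠY)], where F is the matrix of paired base frequencies between two aligned sequences and ΠX, ΠY are diagonal matrices of each sequence's base frequencies. It is not the bias-corrected distance of Gu and Li (1996) used in the analyses, and the lineage difference uses plain unweighted averages over lineage-outgroup pairs, a simplification of Li and Bousquet's (1992) method. All names are hypothetical.

```python
from itertools import product
from math import log

BASES = "ACGT"


def det(m):
    """Determinant of a square matrix by Gaussian elimination with pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def logdet_distance(s1, s2):
    """Uncorrected LogDet distance: -1/4 [ln det F - 1/2 ln(det PiX det PiY)]."""
    index = {b: i for i, b in enumerate(BASES)}
    f = [[0.0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a in index and b in index:
            f[index[a]][index[b]] += 1.0
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    f = [[x / n for x in row] for row in f]
    pi_x = [sum(row) for row in f]                             # base freqs, seq 1
    pi_y = [sum(f[i][j] for i in range(4)) for j in range(4)]  # base freqs, seq 2
    det_f = det(f)
    if det_f <= 0.0 or min(pi_x) == 0.0 or min(pi_y) == 0.0:
        raise ValueError("LogDet undefined for this pair")
    log_det_pi = sum(map(log, pi_x)) + sum(map(log, pi_y))
    return -0.25 * (log(det_f) - 0.5 * log_det_pi)


def lineage_difference(lineage1, lineage2, outgroups, dist):
    """Simplified D = D1.3 - D2.3: mean distance from lineage-1 sequences to
    the outgroups minus the same mean for lineage 2 (unweighted averages)."""
    def mean_to_outgroups(lineage):
        pairs = list(product(lineage, outgroups))
        return sum(dist(x, y) for x, y in pairs) / len(pairs)
    return mean_to_outgroups(lineage1) - mean_to_outgroups(lineage2)


print(logdet_distance("ACGT" * 5, "ACGT" * 4 + "ACGA"))  # hypothetical pair
```

Because the base-frequency terms enter the LogDet formula explicitly, a pair of sequences that differ only in composition is not scored the same way as under the K2P distance; passing `logdet_distance` or `k2p_distance` as `dist` lets the two models be compared on the same lineages.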
Discussion
Galtier and Gouy's (1998) method rests on two premises: (1) G% = C% and A% = T% at equilibrium (Tamura 1992), and (2) all sites in the sequence change at the same rate. To explore the robustness of the method to deviations from the first premise, Galtier and Mouchiroud (1998) devised measures of a sequence's departure from the G% = C% and A% = T% equalities: GC-skewness = |(G% - C%)/(G% + C%)| and AT-skewness = |(A% - T%)/(A% + T%)|. Both statistics range from 0 (no skew) to 1 (maximum skew). Galtier and Mouchiroud (1998) found that Galtier and Gouy's (1998) algorithm yielded biased estimates of GC content when the GC (AT)-skewness was higher than 0.6. In the present work, however, nearly all of the GC (AT)-skewness values for the 63 analyzed sequences (123 of the 126 values, i.e., approximately 98%) were below this threshold.
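The two skewness statistics are straightforward to compute. This sketch works from base counts, which give the same ratios as percentages, reports a statistic as 0.0 when its denominator is zero (an assumption made here), and counts how many of the two values per sequence fall below the 0.6 threshold discussed above. The function names are hypothetical.

```python
def skewness(seq):
    """Return (GC-skewness, AT-skewness) = (|G - C| / (G + C), |A - T| / (A + T)).

    Computed from base counts, which gives the same ratios as percentages.
    A statistic is reported as 0.0 when its denominator is zero (an assumption).
    """
    s = seq.upper()
    g, c, a, t = (s.count(base) for base in "GCAT")
    gc_skew = abs(g - c) / (g + c) if g + c else 0.0
    at_skew = abs(a - t) / (a + t) if a + t else 0.0
    return gc_skew, at_skew


def fraction_below(seqs, threshold=0.6):
    """Fraction of all GC- and AT-skewness values (two per sequence) below threshold."""
    values = [v for seq in seqs for v in skewness(seq)]
    return sum(v < threshold for v in values) / len(values)


print(skewness("GGGCAATT"))  # (0.5, 0.0)
```

Applied to 63 sequences, `fraction_below` would pool 126 values, which is the denominator behind the 123-of-126 figure in the text.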
With respect to the second premise, ignoring among-site rate variation can bias phylogenetic estimators (Yang 1996). Nevertheless, using the homogeneous HKY85 model with discrete-gamma distributed rates, Galtier and Mouchiroud (1998) found that Galtier and Gouy's (1998) method was robust to gamma shape parameter values (α) as low as 0.5 (i.e., high among-site rate variation). The assumption of equal rates among sites is untenable for the gene regions under consideration, because of large substitution rate differences (α < 0.5) between codon positions and within first and second codon positions (Rodríguez-Trelles, Tarrío, and Ayala 1999b, 2000a). However, substitution rate differences within third codon positions were small (α > 1) for the eight gene regions investigated (α values obtained with the discrete-gamma HKY85 model with eight rate categories, assuming the trees in fig. 2, are 2.16 ± 0.79, 6.65 ± 5.78, 3.30 ± 1.22, 3.93 ± 1.98, 2.48 ± 2.52, 1.20 ± 0.36, 3.75 ± 2.21, and 3.34 ± 0.61 for Adh, Amyrel, Ddc, Gpdh, H2a-H2b, per, Sod, and Xdh, respectively). Therefore, we do not expect the inferred ancestral GC contents in this study to be seriously biased by either excessive GC (AT)-skewness or extreme rate variation among sites.
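To see what the α values imply, the sketch below computes the relative rates of an eight-category discrete-gamma distribution with mean 1, representing each equal-probability category by its mean rate (one common variant of the discrete-gamma approximation). It uses only the Python standard library: the regularized incomplete gamma function is evaluated by its power series and the category boundaries by bisection. It is an illustration, not PAML's routine.

```python
from math import exp, inf, lgamma, log


def lower_gamma_p(a, x, max_terms=2000):
    """Regularized lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-16:
            break
    return total * exp(a * log(x) - x - lgamma(a))


def gamma_quantile(p, alpha):
    """Quantile of a gamma distribution with shape alpha and mean 1, by bisection."""
    lo, hi = 0.0, 1.0
    while lower_gamma_p(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lower_gamma_p(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=8):
    """Mean relative rate of each of k equal-probability gamma categories.

    For a gamma with shape alpha and mean 1, the mean of x over (lo, hi) is
    P(alpha + 1, alpha * hi) - P(alpha + 1, alpha * lo); multiplying by k
    converts it to the mean within a category of probability 1/k.
    """
    cuts = [0.0] + [gamma_quantile(i / k, alpha) for i in range(1, k)] + [inf]

    def partial_mean(x):
        return 1.0 if x == inf else lower_gamma_p(alpha + 1.0, alpha * x)

    return [k * (partial_mean(hi) - partial_mean(lo)) for lo, hi in zip(cuts, cuts[1:])]


for shape in (0.5, 3.34):
    print(shape, [round(r, 3) for r in discrete_gamma_rates(shape)])
```

Comparing the two printed rows shows why α = 0.5 counts as high among-site rate variation: its slowest category is nearly invariant while its fastest is several times the mean, whereas at α = 3.34 the eight rates stay much closer to 1.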
Tourasse and Li (1999) have recently shown that the relative-rate test based on Kimura's (1980) distance leads far too often to the false conclusion that substitution rates are unequal when base composition differs greatly among sequences. When there is no rate difference between lineages but the outgroup is compositionally closer to one of the two lineages compared, the relative-rate test based on Kimura's (1980) model clusters the outgroup with that lineage to the exclusion of the other (Tourasse and Li 1999). The relative-rate test based on the LogDet model works properly in these situations, provided the sequences are 500 nt. In this respect, the substitution rate differences between lineages in Xdh detected in this study (see Rodríguez-Trelles, Tarrío, and Ayala 1999a, 1999b) reflect genuine rate differences rather than base composition effects, because they remain significant after the base composition differences among sequences are accounted for with the LogDet transformation.
The eight gene regions analyzed are scattered throughout the genome (see Materials and Methods). Therefore, the patterns observed in this study likely reflect genomewide processes. Extensive variation in GC content is not unique to Drosophila; large compositional differences have long been known among bacterial genomes (Lee, Wahl, and Barbu 1956) and between isochores of the mammalian genome (Bernardi et al. 1985).
The base composition of the genome reflects an interplay between functional constraints and mutational biases. In Drosophila, functional regions generally exhibit higher GC contents than unconstrained regions, which is attributed to natural selection for greater GC content in the former. Increased GC can enhance translation efficiency and/or accuracy if it better matches the tRNA pool (the "major codon preference model"; see Akashi, Kliman, and Eyre-Walker 1998). Under the assumption that mutation bias has remained constant during the evolution of Drosophila (Petrov and Hartl 1999), the reduced GC content in the saltans and willistoni groups can be accounted for by positive selection for increased AT and/or by a reduced efficiency of selection against the mutation bias, caused by either (1) a reduced recombination rate or (2) a reduced population size. However, it is unclear why selection should favor a greater AT content in the saltans-willistoni offshoot.
When recombination is reduced, natural selection at a given site increases the effect of genetic drift at linked sites. Kliman and Hey (1993) found reduced codon bias confined to the regions with the very lowest levels of recombination in D. melanogaster (i.e., near centromeres and telomeres and on the fourth chromosome). However, it seems unlikely that eight genes, scattered across five Drosophila linkage groups, all persistently occupied regions of extremely low recombination during the evolution of the saltans and willistoni groups. The case of the H2a-H2b histone region is noteworthy. In D. melanogaster, the histone gene family consists of a tandem repeat of over 100 units, each approximately 5 kb long, which evolve concertedly (see Baldo, Les, and Strausbaugh 1999). The histone genes are very actively transcribed; nevertheless, their codon usage is among the least biased of D. melanogaster genes (Fitch and Strausbaugh 1993). Under the major codon preference model, this is explained by the centromeric position of the histone genes in this species (Sharp and Matassi 1994). It follows that one would not expect to find, in other species, homologous histone genes with substantially fewer C- and G-ending codons than in D. melanogaster. Yet this is the case for the species of the willistoni group (see table 1). Also, the cytological map positions of the Adh and Sod genes determined for some willistoni species (Rohde et al. 1994, 1995) cannot be ascribed a priori to low-recombination regions. Reduced GC content in saltans and willistoni could reflect a genomewide reduction in recombination rate. However, the map length of Drosophila prosaltans is similar to that of D. melanogaster (285.4 vs. 294.9 cM, respectively; Cáceres, Barbadilla, and Ruiz 1999).
Diminished efficiency of natural selection as the cause of the reduced GC content in saltans and willistoni has recently been challenged on other grounds (Rodríguez-Trelles, Tarrío, and Ayala 2000a). Focusing on the Adh and Xdh loci, that study found that the branch leading to the saltans and willistoni groups, where most of the change in GC content has occurred (see fig. 2), exhibits an excess of synonymous substitutions relative to nonsynonymous ones. This result can hardly be accounted for by the most common scenarios of selection, which instead predict a relative increase in the rate of nonsynonymous substitution (see Rodríguez-Trelles, Tarrío, and Ayala 2000a).
The arguments above rely on the notion that mutation bias is equal for all Drosophila species. Relaxing this premise allows us to better account for the data. Theoretical results of Shields (1990) show that a shift in mutation bias can trigger a switch in codon preference. Unlike natural selection, mutation bias affects the less constrained parts of the genome more than functionally significant parts (Sueoka 1988). Accordingly, a shift in the pattern of point mutation in the saltans-willistoni stem would explain why amino acid replacements in Amyrel, Ddc, per, and Xdh in this lineage preferentially produce amino acids encoded by low-GC codons: of the eight gene regions examined, these genes code for the fastest-evolving proteins in Sophophora (Ka = 0.1249, 0.1035, and 0.0931 for Amyrel, per, and Xdh, respectively, averaged over the comparisons of D. willistoni with D. pseudoobscura and D. melanogaster and estimated by the method of Nei and Gojobori [1986]; Ddc changes at an intermediate rate, Ka = 0.0497). Gpdh and H2a-H2b evolve the slowest (Ka = 0.0091 and 0.0000, respectively), which suggests that their encoded proteins are too constrained to reflect the mutation bias. Similarly, a shift in mutation bias would account for the increased rates of synonymous substitution in the common ancestor of saltans and willistoni detected in previous studies (Rodríguez-Trelles, Tarrío, and Ayala 2000a).
Available information on noncoding, putatively unconstrained regions also favors this hypothesis. The spacer region between the H2a and H2b histone genes (250 nt long; Baldo, Les, and Strausbaugh 1999) exhibits lower GC content in species of the willistoni group (28.5%, average of D. paulistorum and D. insularis) than in those of the melanogaster group (39.6%, average of D. melanogaster and D. yakuba) and the obscura group (40.6%, average of D. pseudoobscura and D. persimilis), and similar patterns have been noticed for the introns of Adh (unpublished GenBank accession number AB026533 for D. saltans), Amyrel (Da Lage et al. 1998), and Xdh (Rodríguez-Trelles, Tarrío, and Ayala 1999a).
The idea that the pattern of point mutation is constant in Drosophila is largely based on the analysis of only two species: D. melanogaster, of the subgenus Sophophora, and D. virilis, of the subgenus Drosophila (Petrov and Hartl 1999). GC content differences between these two species are far smaller than those detected in this study (see Powell 1997). Rather, our results suggest that mutation bias in Drosophila can fluctuate within broad limits not previously suspected, enough to generate extensive differences in nucleotide composition between relatively closely related species.
Acknowledgements
Footnotes
1 Keywords: GC content, mutation pressure, relative-rate test, Drosophila saltans group, Drosophila willistoni group
2 Address for correspondence and reprints: Francisco Rodríguez-Trelles and Francisco J. Ayala, Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, California 92697-2525. E-mail: ftrelles@ds.cesga.es
Literature Cited
Akashi, H., R. M. Kliman, and A. Eyre-Walker. 1998. Mutation pressure, natural selection, and the evolution of base composition in Drosophila. Genetica 102/103:49–60.
Baldo, A. M., D. H. Les, and L. D. Strausbaugh. 1999. Potentials and limitations of the histone repeat sequences for phylogenetic reconstruction of Sophophora. Mol. Biol. Evol. 16:1511–1520.
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.
Cáceres, M., A. Barbadilla, and A. Ruiz. 1999. Recombination rate predicts inversion size in Diptera. Genetics 153:251–259.
Da Lage, J.-L., E. Renard, F. Chartois, F. Lemeunier, and M.-L. Cariou. 1998. Amyrel, a paralogous gene of the amylase gene family in Drosophila melanogaster and the Sophophora subgenus. Proc. Natl. Acad. Sci. USA 95:6848–6853.
Fitch, D. H. A., and L. D. Strausbaugh. 1993. Low codon bias and high rates of synonymous substitution in Drosophila hydei and D. melanogaster histone genes. Mol. Biol. Evol. 10:397–413.
Galtier, N., and M. Gouy. 1998. Inferring the pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.
Galtier, N., and D. Mouchiroud. 1998. Isochore evolution in mammals: a human-like ancestral structure. Genetics 150:1577–1584.
Galtier, N., N. Tourasse, and M. Gouy. 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283:220–221.
Gu, X., and W.-H. Li. 1996. Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Mol. Biol. Evol. 13:1375–1383.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kliman, R. M., and J. Hey. 1993. Reduced natural selection associated with low recombination in Drosophila melanogaster. Mol. Biol. Evol. 10:1239–1258.
Kliman, R. M., and J. Hey. 1994. The effects of mutation and natural selection on codon bias in the genes of Drosophila. Genetics 137:1049–1056.
Lee, K. Y., R. Wahl, and E. Barbu. 1956. Contenu en bases puriques et pyrimidiques des acids desoxyribonucleiques des bacteries. Ann. Inst. Pasteur 91:212–224.
Li, W.-H., and J. Bousquet. 1992. Relative-rate test for nucleotide substitutions between two lineages. Mol. Biol. Evol. 9:1185–1189.
Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the number of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475–1479.
Powell, J. R. 1997. Progress and prospects in evolutionary biology: the Drosophila model. Oxford University Press, New York.
Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 1999a. Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 153:339–350.
Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 1999b. Molecular evolution and phylogeny of the Drosophila saltans species group inferred from the Xdh gene. Mol. Phylogenet. Evol. 13:110–121.
Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 2000a. Fluctuating mutation bias and the evolution of the base composition in Drosophila. J. Mol. Evol. 50:1–10.
Rodríguez-Trelles, F., R. Tarrío, and F. J. Ayala. 2000b. Disparate evolution of paralogous introns in the Xdh gene of Drosophila. J. Mol. Evol. 50:123–130.
Rohde, C., E. Abdelhay, H. Pinto, A. Schrank, and V. L. S. Valente. 1995. Analysis and in situ mapping of the Adh locus in species of the willistoni group of Drosophila. Cytobios 81:37–47.
Rohde, C., H. Pinto, V. H. Valiati, A. Schrank, and V. L. S. Valente. 1994. Localization of the Cu/Zn superoxide dismutase gene in the Drosophila willistoni species group by in situ hybridization. Cytobios 80:193–198.
Rzhetsky, A., and M. Nei. 1995. Tests of the applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131–151.
Sharp, P. M., and G. Matassi. 1994. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4:851–860.
Shields, D. C. 1990. Switches in species specific codon preferences: the influence of mutation biases. J. Mol. Evol. 31:71–80.
Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.
Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol. Biol. Evol. 9:678–687.
Tarrío, R., F. Rodríguez-Trelles, and F. J. Ayala. 1998. New Drosophila introns originate by duplication. Proc. Natl. Acad. Sci. USA 95:1652–1658.
Tarrío, R., F. Rodríguez-Trelles, and F. J. Ayala. 2000. Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a study case. Mol. Phylogenet. Evol. (in press).
Tatarenkov, A., J. Kwiatowski, D. Skarecky, E. Barrio, and F. J. Ayala. 1999. On the evolution of Dopa decarboxylase (Ddc) and Drosophila systematics. J. Mol. Evol. 48:445–462.
Throckmorton, L. H. 1975. The phylogeny, ecology, and geography of Drosophila. Pp. 421–436 in R. C. King, ed. Handbook of genetics. Vol. 3. Plenum Press, New York.
Tourasse, N. J., and W.-H. Li. 1999. Performance of the relative-rate test under nonstationary models of nucleotide substitution. Mol. Biol. Evol. 16:1068–1078.
Wu, C.-I., and W.-H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.
Yang, Z. 1996. The among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372.
Yang, Z. 1999. Phylogenetic analysis by maximum likelihood (PAML). Version 2.0. University College London.