*Committee on Genetics
Department of Ecology and Evolution, University of Chicago
Abstract |
---|
Introduction |
---|
Methods |
---|
We also generated a second, restricted data set consisting only of gene families of size 2 with a maximum Ks of 0.5. This was done to avoid both the phylogenetic nonindependence that arises when analyzing multigene families and the possibility of serious errors in estimating Ka and Ks between highly diverged genes (Comeron 1995). For the data sets discussed here, we ignored genes on the small fourth chromosome and focused on chromosomes X, 2, and 3 because very few genes on the fourth chromosome are involved in duplication events.
Divergence Analysis
We aligned the peptide sequences for each paralogous pair using Clustal W 1.81 (Thompson et al. 1994) and then aligned the coding sequences (CDS) for each paralogous pair, using the amino acid alignments as guides. We then calculated the number of amino acid replacement substitutions per site (Ka) and the number of synonymous substitutions per site (Ks) for each pair of aligned CDS sequences using Li's (1993) method as implemented in the GCG 10.1 software package. We also calculated Ka and Ks for each pairwise comparison by the maximum likelihood method implemented in PAML 3.01d (Yang 2000). For the likelihood method, we used the F3x4 method to calculate equilibrium base compositions for each pairwise comparison, which corrects for codon usage bias (Dunn, Bielawski, and Yang 2001). Results obtained from the likelihood method and Li's method were very similar, so we report the results from Li's method here for simplicity.
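The step of aligning coding sequences with the amino acid alignment as a guide can be illustrated with a minimal Python sketch. This is not the authors' Perl code; the function name `thread_cds` and its handling of a trailing stop codon are assumptions made for illustration. It places each codon of an unaligned CDS under the corresponding residue of the aligned peptide and inserts three gap characters wherever the peptide alignment has a gap.

```python
def thread_cds(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned CDS onto its aligned peptide sequence.

    Hypothetical helper for illustration only: each residue in the
    aligned peptide receives its codon, and each gap becomes three gaps.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = aligned_protein.replace(gap, "")
    # Assumption: one extra codon at the end is the stop codon; drop it.
    if len(codons) == len(residues) + 1:
        codons = codons[:-1]
    if len(codons) != len(residues):
        raise ValueError("CDS does not match the peptide length")
    codon_iter = iter(codons)
    pieces = []
    for symbol in aligned_protein:
        # A gap in the peptide alignment spans one whole codon.
        pieces.append(gap * 3 if symbol == gap else next(codon_iter))
    return "".join(pieces)


if __name__ == "__main__":
    print(thread_cds("M-K", "ATGAAA"))
```

Applying the same function to both members of a paralogous pair yields two codon-aligned sequences of equal length, which is the form of input that pairwise Ka and Ks estimators expect.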
In general, assuming the strict neutrality of silent substitutions, a Ka/Ks < 1.0 indicates selective constraint on amino acids (but does not rule out positive selection), whereas Ka/Ks > 1.0 is often taken as evidence for strong positive Darwinian selection. A Ka/Ks = 1.0 is the expectation under a strictly neutral model of molecular evolution (Kimura 1983) and should be observed for unconstrained sequences such as pseudogenes. In the extreme case of a pair of duplicates where one gene maintains its original function and the other copy is a pseudogene, the pair's Ka/Ks ratio could be as low as 0.5. We therefore take 0.5 as a conservative criterion to test whether both copies of a duplicate pair are functional. We used a simple form of the sign test (Sokal and Rohlf 1995) to test the null hypothesis that Ka/Ks values are equally likely to be less than 0.5 or greater than 0.5. Under this null hypothesis, the number of pairs with Ka/Ks < 0.5 follows a binomial distribution, which can be approximated by the normal distribution using the statistic $C = (X - 0.5N)/(0.5\sqrt{N})$, where X is the number of duplicate pairs with Ka/Ks < 0.5, and N is the total number of duplicate pairs.
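The sign test statistic described above can be sketched in a few lines of Python. The function name `sign_test`, the decision to drop values exactly equal to the threshold, and the two-sided P-value are assumptions made for this illustration and are not taken from the original analysis.

```python
import math


def sign_test(ratios, threshold=0.5):
    """Normal approximation to the sign test.

    Returns (X, N, C, two_sided_p), where X counts values below the
    threshold and C = (X - 0.5 N) / (0.5 sqrt(N)).
    """
    # Assumption: values exactly equal to the threshold are uninformative.
    informative = [r for r in ratios if r != threshold]
    n = len(informative)
    if n == 0:
        raise ValueError("no informative observations")
    x = sum(1 for r in informative if r < threshold)
    c = (x - 0.5 * n) / (0.5 * math.sqrt(n))
    # Two-sided tail area of the standard normal distribution.
    p_two_sided = math.erfc(abs(c) / math.sqrt(2.0))
    return x, n, c, p_two_sided


if __name__ == "__main__":
    # Made-up ratios for demonstration; not data from the paper.
    demo = [0.1] * 75 + [0.9] * 25
    print(sign_test(demo))
```

A large positive C indicates that most pairs have Ka/Ks below 0.5, which is the pattern expected if both copies are typically under selective constraint.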
Statistical Procedures
To assess the statistical significance of the observed mean Ka/Ks value between X-linked duplicates, we employed a random resampling procedure similar to the standard bootstrap (Sokal and Rohlf 1995, pp. 823–825). If the entire data set consists of m values and a subset of size n has a mean Ka/Ks = $\bar{x}_{obs}$, we randomly chose n values from the set of m (with resampling) and calculated the mean, $\bar{x}$. The resampling was repeated $10^5$ times, and the corresponding one-tailed P-value is estimated from the fraction of runs where $\bar{x} \geq \bar{x}_{obs}$. In the case where we ask if the mean is significantly lower than expected, the P-value is the proportion of runs where $\bar{x} \leq \bar{x}_{obs}$. We applied the one-sided Kolmogorov-Smirnov two-sample test (Sokal and Rohlf 1995, p. 434) to test for differences between two empirical distributions, calculating P-values using the R software package (Ihaka and Gentleman 1996).
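The resampling procedure can be sketched as follows. The authors implemented theirs in C; this Python version is only an illustration, and the function name `resample_p_value`, the `seed` parameter, and the interpretation of "with resampling" as sampling with replacement are assumptions.

```python
import random


def resample_p_value(all_values, subset, n_reps=100_000, upper=True, seed=None):
    """One-tailed P-value for the mean of `subset` by random resampling.

    Draws len(subset) values from `all_values` (assumed: with replacement),
    n_reps times, and returns the fraction of resampled means that are
    >= the observed mean (upper=True) or <= it (upper=False).
    """
    rng = random.Random(seed)
    n = len(subset)
    observed = sum(subset) / n
    hits = 0
    for _ in range(n_reps):
        mean = sum(rng.choices(all_values, k=n)) / n
        if (mean >= observed) if upper else (mean <= observed):
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    # Made-up values for demonstration; not data from the paper.
    genome = [0.2] * 90 + [0.9] * 10
    x_linked = [0.9] * 10
    print(resample_p_value(genome, x_linked, n_reps=2000, seed=1))
```

With $10^5$ replicates the smallest nonzero P-value that can be estimated is $10^{-5}$, so an estimate of zero is best read as P < $10^{-5}$.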
Software and Data Availability
The programs that automate and parse the FASTA33 searches, alignments, and calculations of Ka/Ks were written in Perl. The resampling method was implemented in C. These programs, and a MySQL database of the computational results, are available from the authors upon request.
Results |
---|
Figure 1 shows the distributions of Ka/Ks for four different kinds of duplicate pairs: X-linked and autosomal-linked (fig. 1a) and unlinked pairs (X-autosome or chromosome 2/3; fig. 1b). Qualitatively, the distributions for linked autosomal (fig. 1a), unlinked X-autosome, and unlinked autosomal (fig. 1b) pairs look identical, with most of the mass centered near Ka/Ks ≈ 0.27. The mean values for these three distributions fall in the narrow range of 0.2581–0.2740 (table 1). However, the distribution of Ka/Ks for X-linked duplicates has less mass around Ka/Ks ≈ 0.27 than the other three distributions and more mass where Ka/Ks > 0.50 (fig. 1a). The Ka/Ks distribution for X-linked duplicates is significantly shifted to the right when compared with the distribution for linked, autosomal paralogs (P = $10^{-7}$). Further, the mean Ka/Ks between X-linked duplicates is 0.4701, nearly double that of all other duplicates in the Drosophila genome, regardless of linkage (table 1). The high mean Ka/Ks between X-linked duplicates and the shift in mass of the distribution of Ka/Ks imply that a subset of X-linked gene duplicate pairs have diverged much more rapidly from each other than have most gene duplicate pairs in Drosophila.
Table 1 shows that the mean Ks for X-linked duplicates is 0.8934, whereas the genomic mean is 1.3585, suggesting a large difference between X-linked and autosomal duplicates. The distributions of Ks between duplicate pairs are shown in figure 2. The histograms are plotted separately for the linkage relationships described earlier and in figure 1. The distribution of Ks for X-linked duplicates is significantly shifted toward lower values (fig. 2a) compared with the other distributions (P on the order of $10^{-11}$).
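What it means for one empirical distribution to be "shifted toward lower values" under the one-sided Kolmogorov-Smirnov two-sample test can be made concrete with a short sketch. The P-values in this study were calculated in R; the pure-Python function below, its name `ks_one_sided`, and its use of the large-sample approximation $P \approx \exp(-2D^2 nm/(n+m))$ are assumptions made for illustration, and the approximation can be inaccurate for small samples.

```python
import math


def ks_one_sided(x, y):
    """One-sided two-sample Kolmogorov-Smirnov statistic.

    D+ is the largest amount by which the empirical CDF of x exceeds
    that of y, so a large D+ means x is shifted toward lower values.
    Returns (D+, asymptotic one-sided P-value).
    """
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d_plus = 0.0
    for t in sorted(set(xs) | set(ys)):
        # Advance counters so that i/n and j/m are the ECDFs at t.
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d_plus = max(d_plus, i / n - j / m)
    # Large-sample approximation for the one-sided tail probability.
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value


if __name__ == "__main__":
    # Made-up values for demonstration; not data from the paper.
    print(ks_one_sided([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

Swapping the two arguments tests the opposite direction, which is why the one-sided form is suited to asking specifically whether X-linked Ks values are lower.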
Discussion |
---|
In order to explain the rapid divergence between X-linked paralogs, we need to consider the following possible hypotheses: First, it is possible that we have analyzed an unknown number of X-linked pseudogenes, inflating the Ka/Ks between X-linked duplicates. Secondly, Ka/Ks may be higher between X-linked duplicates either because of higher constraint at silent sites on the X, relative to autosomal loci, or because of a relative lack of constraint on X-linked replacement sites. Finally, Ka/Ks may be accelerated by fixation of amino acid changes under selection.
X-linked Pseudogenes
Pseudogenes are nonfunctional duplications of coding sequence and as such are expected to evolve with high Ka/Ks because of an absence of purifying selection. Thus, the acceleration of Ka/Ks on the X chromosome could result from the inclusion of pseudogenes in the analysis. Because we used sequence annotation obtained directly from the genome project web site, we would only have analyzed pseudogenes if they had been misdiagnosed as functional loci. We find this explanation unlikely for three reasons. First, it would imply that X-linked pseudogenes are disproportionately misdiagnosed as functional genes, which seems implausible. Secondly, the pseudogene argument would predict that the mean Ks between X-linked paralogs would be much smaller than the genome average because pseudogenes are rapidly eliminated from Drosophila genomes (Petrov, Lozovskaya, and Hartl 1996). To test this, we again restricted our analysis to independent pairs diverged by Ks ≤ 0.5. For this data set, the mean Ks between X-linked duplicates is 0.1957, which is actually higher than the genomic mean, although not significantly so (P = 0.2610). Finally, the Ka/Ks is less than 0.50 for most of the X-X pairs, suggesting selective constraint on amino acid substitutions.
Constraint on Silent Sites
In Drosophila, patterns of codon usage bias have been interpreted as evidence for weak selection on silent sites (Akashi 1995). Genes on the X chromosome of D. melanogaster show higher codon bias than autosomal loci (Comeron, Kreitman, and Aguade 1999), and Powell and Moriyama (1997) observed an inverse relationship between codon bias and Ks, as expected from the weak selection hypothesis. It is therefore possible that Ka/Ks is accelerated for X-linked duplications because of stronger constraint on silent sites at X-linked loci. Table 1 shows that the mean Ks for X-linked paralogs is 0.8934, substantially lower than the genomic mean of 1.3585, seemingly consistent with the codon bias hypothesis. However, we do not believe that codon bias is a good explanation for the results for two reasons. First, the excess codon bias on the D. melanogaster X chromosome is rather slight (Comeron, Kreitman, and Aguade 1999), and silent divergence between D. melanogaster and its close relative D. simulans is not reduced on the X chromosome relative to autosomes (Bauer and Aquadro 1997). More important, however, is the possibility that the relationship between Ks and codon bias observed by Powell and Moriyama (1997) is a consequence of using measures of divergence that do not properly account for the compositional bias of the sequences, as pointed out by Dunn, Bielawski, and Yang (2001). Using the maximum likelihood method of the PAML package (Yang 2000), Dunn, Bielawski, and Yang (2001) were only able to recover an inverse relationship between Ks and codon bias by not accounting for compositional bias (per coding position) in the estimates. Thus, if the acceleration of Ka/Ks that we observe on the X chromosome is simply the result of codon bias, then repeating the analysis using the likelihood method (Yang 2000) and correcting for codon bias should eliminate the evidence for acceleration between X-linked paralogs. However, repeating the analysis using PAML (Yang 2000) and correcting for compositional bias (see Methods) did not change any of the qualitative patterns shown in figures 1 or 2 and table 1, suggesting that codon bias alone cannot explain the increased Ka/Ks between X-linked duplicates.
As an alternative to codon bias, the distribution of Ks in figure 2a suggests that there is a higher proportion of young duplicate pairs on the X chromosome compared with the autosomes. We hypothesize that either the rate of gene duplication on the D. melanogaster X chromosome is higher than for the autosomes or that there has been a recent burst of duplication on the X. Both of these hypotheses provide an explanation for the excess of X-linked duplicate pairs with low Ks (fig. 2a).
Slightly Deleterious Substitutions
It is possible that gene duplicates experience less selective constraint against slightly deleterious mutations. Under this model, the fates of such mutations will be governed by drift rather than selection. The two critical parameters to consider are the effective population size of the X relative to autosomes and the dominance of the weakly deleterious mutations (Charlesworth, Coyne, and Barton 1987). This relaxed constraint model qualitatively provides an explanation for the distributions of Ka/Ks between duplicate pairs (fig. 1). However, this model has two limitations as a general explanation for the data. First, the relative difference in constraint between the X and the autosomes would have to be about 1.74-fold (0.4701/0.2697, table 1), whereas a maximum of a 1.2-fold difference is expected (this maximum occurs when weakly deleterious alleles are fully dominant) (Charlesworth, Coyne, and Barton 1987). Moreover, it is known that most deleterious mutations are recessive rather than dominant (Crow and Temin 1964; Mukai et al. 1972; Crow and Simmons 1983). The effect of slightly deleterious recessives is to slow the rate of substitution on the X relative to autosomes (Charlesworth, Coyne, and Barton 1987; McVean and Charlesworth 1999), a prediction incompatible with our observations. Secondly, relaxation of constraint should apply to all duplicate pairs, and so the mode of the distribution of Ka/Ks on the X should be near the mean, which is not what we observe (fig. 1). Rather, figure 1 shows a large mode centered near the genomic mean and a large tail of pairs with accelerated Ka/Ks.
Adaptive Models
One possible adaptive reason why X-linked duplicates should diverge rapidly is that selectively favorable mutations may generally be fully or partially recessive and fix via transmission in the heterogametic sex (males in Drosophila). Charlesworth, Coyne, and Barton (1987) have shown that X-linked genes will evolve more rapidly than autosomal loci, provided that most adaptive mutations are recessive or partially recessive. The relative increase in the evolutionary rate of X-linked loci occurs because the adaptive alleles are sheltered from positive selection on autosomes but are fully expressed in males when X-linked, resulting in a higher fixation probability for X-linked mutations. To date, the dominance of adaptive changes remains an open question (Charlesworth, Coyne, and Barton 1987). Although it is believed that most adaptive changes should be dominant (Haldane 1924, 1927), some adaptive phenotypes in inbreeding plant species have been shown to be recessive (Charlesworth 1992; Bradshaw et al. 1995, 1998), whereas insecticide resistance, an adaptation found in outcrossing insect species, is generally dominant (see Charlesworth, Coyne, and Barton 1987; Orr and Betancourt 2001 and references therein).
The theory of Charlesworth, Coyne, and Barton (1987) is a model in which selection occurs on new adaptive mutations, rather than on standing variation in a population at mutation-selection balance (Orr and Betancourt 2001). The classical model of gene duplications posits that purifying selection is relaxed for a short period after duplication, allowing the accumulation of fixed substitutions in the duplicated genes, and that these substitutions can be used for later adaptive evolution, i.e., when environmental conditions change (see, for example, Kimura 1983, pp. 104–113). Because the classical model is a neutral model, there should be a corresponding increase in amino acid polymorphism at the duplicate loci, resulting in an accumulation of amino acid variation early in the evolution of duplicate genes. Many of the amino acid changes that accumulate during the period of relaxed negative selection should be partially recessive, in the genetic sense that they create partial loss of function alleles. Positive selection may then later act on the standing variation accumulated at duplicate loci, selecting for these recessive changes. However, if selection acts on standing variation rather than on new mutations, the fixation probabilities of the adaptive mutants are essentially independent of dominance, and X-linked loci will evolve more slowly than autosomal loci, assuming equal degrees of selection (Orr and Betancourt 2001). Thus, the standing variation model would predict that X-linked duplicates diverge more slowly than autosomal duplicates, a pattern opposite to what we observe (table 1 and fig. 1). In the light of the above considerations, we conclude that selection on recessive adaptations is the most general model that accounts for the high average Ka/Ks observed between X-linked duplicates (table 1 and fig. 1).
The hypothesis that gene duplicates in D. melanogaster are subject to selection on recessive adaptations has two implications. First, it argues that adaptations at X-linked duplicate loci may be recessive on average, supporting the theory of Charlesworth, Coyne, and Barton (1987). Secondly, it provides new insights into the evolutionary fates of gene duplicates. We argued earlier that our data are more compatible with selection on new mutants, rather than on standing variation. Therefore, we suggest that, given that a duplicate gene does not degenerate into a pseudogene, several adaptive substitutions are required to guarantee its survival. This suggestion is more compatible with adaptive models of gene duplicate evolution (Clark 1994; Walsh 1995) than with the classical model (Kimura 1983) or other neutral models (Force et al. 1999; Lynch and Force 2000).
Recent analyses of polymorphism data in Drosophila have found significantly reduced amino acid (Andolfatto 2001) and silent (Begun and Whitley 2001) polymorphism on the X, relative to autosomes, in D. melanogaster and D. simulans, respectively. Both these patterns are consistent with the "faster X" observations seen here for duplicate genes and are also consistent with selection for adaptive recessive substitutions. One cannot, however, rule out a strong effect of deleterious recessives in reducing amino acid polymorphism on the D. melanogaster X chromosome (Andolfatto 2001). It is important to note that most of the genes studied by Andolfatto (2001) and Begun and Whitley (2001) are single-copy loci, and it is likely that the dynamics of single-copy genes and duplicates differ substantially. In general, single-copy genes should evolve under purifying selection to maintain protein function. Duplicate loci, on the other hand, may in general degenerate quickly into pseudogenes (Haldane 1933; Fisher 1935). The fate of those duplicates that survive degeneration can be resolved by positive selection for improved or new functions or by neutral processes such as subfunctionalization.
Conclusions |
---|
Acknowledgements |
---|
Footnotes |
---|
Keywords: gene duplication, Drosophila, sex chromosomes, adaptation, selection, nonsynonymous, synonymous
Address for correspondence and reprints: Manyuan Long, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, Illinois 60637. mlong{at}midway.uchicago.edu
References |
---|
Adams M. D., S. E. Celniker, R. A. Holt, et al. (195 co-authors) 2000 The genome sequence of Drosophila melanogaster Science 287:2185-2195
Akashi H., 1995 Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila DNA Genetics 139:1067-1076
Andolfatto P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans Mol. Biol. Evol 18:279-290
Bauer V. L., C. F. Aquadro, 1997 Rates of DNA sequence evolution are not sex-biased in Drosophila melanogaster and D. simulans Mol. Biol. Evol 14:1252-1257
Begun D. J., P. Whitley, 2001 Reduced X-linked nucleotide polymorphism in Drosophila simulans PNAS 97:5960-5965
Bradshaw H. D., K. G. Otto, B. E. Frewen, J. K. McKay, D. W. Schemske, 1998 Quantitative trait loci affecting differences in floral morphology between two species of monkeyflower (Mimulus) Genetics 149:367-382
Bradshaw H. D., S. M. Wilbert, K. G. Otto, D. W. Schemske, 1995 Genetic-mapping of floral traits associated with reproductive isolation in monkeyflowers (Mimulus) Nature 376:762-765
Charlesworth B., 1992 Evolutionary rates in partially self-fertilizing species Am. Nat 140:126-148
Charlesworth B., J. A. Coyne, N. H. Barton, 1987 The relative rates of evolution of sex-chromosomes and autosomes Am. Nat 130:113-146
Clark A. G., 1994 Invasion and maintenance of a gene duplication Proc. Natl. Acad. Sci. USA 91:2950-2954
Comeron J. M., 1995 A method for estimating the numbers of synonymous and nonsynonymous substitutions per site J. Mol. Evol 41:1152-1159
Comeron J. M., M. Kreitman, M. Aguade, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila Genetics 151:239-249
Crow J. F., M. J. Simmons, 1983 The mutation load in Drosophila Pp. 1–35 in M. Ashburner, H. L. Carson, and J. N. Thompson Jr., eds. The genetics and biology of Drosophila, Vol. 3c. Academic Press, London
Crow J. F., R. G. Temin, 1964 Evidence for the partial dominance of recessive lethal genes in natural populations of Drosophila Am. Nat 98:21-33
Dunn K. A., J. P. Bielawski, Z. H. Yang, 2001 Substitution rates in Drosophila nuclear genes: implications for translational selection Genetics 157:295-305
Fisher R. A., 1935 The sheltering of lethals Am. Nat 69:446-455
Force A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan, J. Postlethwait, 1999 Preservation of duplicate genes by complementary, degenerative mutations Genetics 151:1531-1545
Haldane J. B. S., 1924 A mathematical theory of natural and artificial selection, Part I Trans. Camb. Philos. Soc 28:19-41
Haldane J. B. S., 1927 A mathematical theory of natural and artificial selection, Part II Trans. Camb. Philos. Soc 28:838-844
Haldane J. B. S., 1933 The part played by recurrent mutation in evolution Am. Nat 67:5-19
Hughes A. L., M. Nei, 1992 Maintenance of MHC polymorphism Nature 355:402-403
Ihaka R., R. Gentleman, 1996 R: a language for data analysis and graphics J. Comput. graphical statistics 5:299-314
Kimura M., 1983 The neutral theory of molecular evolution Cambridge University Press, Cambridge
Li W. H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99
Long M., K. Thornton, 2001 Gene duplication and evolution Science 293:1551a.
Long M. Y., C. H. Langley, 1993 Natural-selection and the origin of jingwei, a chimeric processed functional gene in Drosophila Science 260:91-95
Lynch M., J. S. Conery, 2000 The evolutionary fate and consequences of duplicate genes Science 290:1151-1155
Lynch M., A. Force, 2000 The probability of duplicate gene preservation by subfunctionalization Genetics 154:459-473
McVean G. A. T., B. Charlesworth, 1999 A population genetic model for the evolution of synonymous codon usage: patterns and predictions Genet. Res 74:145-158
Mukai T., S. I. Chigusa, L. E. Mettler, J. F. Crow, 1972 Mutation rate and dominance of genes affecting viability in Drosophila melanogaster Genetics 72:335-355
Nurminsky D. I., M. V. Nurminskaya, D. De Aguiar, D. L. Hartl, 1998 Selective sweep of a newly evolved sperm-specific gene in Drosophila Nature 396:572-575
Orr H. A., A. J. Betancourt, 2001 Haldane's sieve and adaptation from the standing genetic variation Genetics 157:875-884
Pearson W. R., 1990 Rapid and sensitive sequence comparison with fastp and fasta Methods Enzymol 183:63-98
Petrov D. A., E. R. Lozovskaya, D. L. Hartl, 1996 High intrinsic rate of DNA loss in Drosophila Nature 384:346-349
Powell J. R., E. N. Moriyama, 1997 Evolution of codon usage bias in Drosophila Proc. Natl. Acad. Sci. USA 94:7784-7790
Rubin G. M., M. D. Yandell, J. R. Wortman, et al. (50 co-authors) 2000 Comparative genomics of the eukaryotes Science 287:2204-2215
Sokal R. R., F. J. Rohlf, 1995 Biometry. 3rd edition W. H. Freeman and Company
Stahl E. A., G. Dwyer, R. Mauricio, M. Kreitman, J. Bergelson, 1999 Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis Nature 400:667-671
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Walsh J. B., 1995 How often do duplicated genes evolve new functions Genetics 139:421-428
Wang W., J. M. Zhang, C. Alvarez, A. Llopart, M. Long, 2000 The origin of the Jingwei gene and the complex modular structure of its parental gene, yellow emperor, in Drosophila melanogaster Mol. Biol. Evol 17:1294-1301
Wyckoff G. J., W. Wang, C. I. Wu, 2000 Rapid evolution of male reproductive genes in the descent of man Nature 403:304-309
Yang Z., 2000 PAML: phylogenetic analysis by maximum likelihood. Version 3.0 University College London, London