Section of Evolution and Ecology, University of California at Davis
Introduction |
---|
Recent analyses of nucleotide polymorphism and divergence at eight genes from Drosophila simulans and its close relatives have led to three hypotheses regarding the frequency distribution of nucleotide polymorphisms (Akashi 1996, 1999; Akashi and Schaeffer 1997). The first hypothesis is that roughly equal numbers of preferred and unpreferred codons (mutations) have fixed along the D. simulans lineage. The observation is consistent with the notion that codon bias is not evolving in D. simulans (i.e., that D. simulans is at equilibrium for codon bias). The second hypothesis (Akashi and Schaeffer 1997) is that unpreferred polymorphisms segregate at significantly lower frequencies than preferred polymorphisms in D. simulans. The third hypothesis is that replacement polymorphisms are skewed toward rare alleles in D. simulans (Akashi 1996, 1999). According to this worldview, many unpreferred polymorphisms and replacement polymorphisms in D. simulans belong to a special category of "borderline" alleles. These alleles have selection coefficients such that Ns, the product of the effective population size and the selection coefficient, is close to 1. Selection on such alleles is sufficiently weak that they can reach appreciable frequencies, yet sufficiently strong that they are unlikely to reach high frequencies or fix (Kimura 1983; Ohta 1992).
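The "borderline" regime can be made concrete with a minimal sketch that is not part of the original analysis. It assumes Kimura's diffusion approximation for a new semidominant mutation in a diploid population whose effective size equals its census size, under which the fixation probability relative to a neutral mutation is approximately S / (1 - exp(-S)), with S = 4Ns. The scaling convention, the function name, and the example values of Ns are all assumptions made for illustration; other conventions rescale Ns by a constant factor.

```python
import math

def relative_fixation_probability(ns):
    """Fixation probability of a new mutation relative to a neutral one.

    Assumption (not from the source): Kimura's diffusion approximation for a
    semidominant mutation in a diploid population, scaled as S = 4 * N * s.
    """
    big_s = 4.0 * ns
    if abs(big_s) < 1e-12:
        return 1.0  # neutral limit of S / (1 - exp(-S))
    # -expm1(-S) equals 1 - exp(-S) and is accurate for small |S|
    return big_s / -math.expm1(-big_s)

# Illustrative values only: weakly deleterious, neutral, weakly beneficial
for ns in (-3.0, -1.0, -0.1, 0.0, 0.1, 1.0):
    ratio = relative_fixation_probability(ns)
    print(f"Ns = {ns:+.1f}  fixation probability relative to neutral = {ratio:.4f}")
```

Under these assumptions, a deleterious mutation with Ns near -1 fixes at well under one-tenth of the neutral rate, which is the sense in which such alleles can segregate but rarely fix.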
A weakness of the D. simulans data, as acknowledged by Akashi (1996, 1999), is that support for a significant skew toward rare amino acid polymorphisms is based on data from only a few genes. Only three of the eight genes analyzed by Akashi (1996) harbored amino acid polymorphism. Of the nine singleton amino acid polymorphisms, five were from the period locus. We would be unwise to draw general conclusions about the frequency distribution of amino acid polymorphisms from so few data. Given that greater numbers of silent polymorphisms were observed in D. simulans, conclusions on their frequency distribution would seem to be more sound. Nevertheless, period data account for about 30% of the derived singleton unpreferred polymorphisms in the D. simulans data analyzed by Akashi and Schaeffer (1997). If period data are excluded, there is no significant skew toward rare, unpreferred alleles in D. simulans (one-tailed Mann-Whitney U test; P = 0.11). This dependence of the statistical results on period data could be indicative of locus effects or could be attributable to reduced power associated with removal of a large amount of data from the analysis.
The conclusion of roughly equal numbers of preferred and unpreferred fixations is based on observation of only 27 mutations (Akashi 1999). The observation of 14 unpreferred and 13 preferred fixations (Akashi 1999) is compatible with an equilibrium model (i.e., 50% of the fixations preferred and 50% unpreferred). However, this observation is also compatible with an underlying model with highly asymmetric fixation rates of the two mutant types. For example, the observation of 14 unpreferred and 13 preferred fixations in D. simulans is compatible with an underlying model of 65% unpreferred and 35% preferred fixations (two-tailed binomial probability; P = 0.22). That is, although the equilibrium model was not rejected with the available D. simulans data, this should not be construed as strong support for the model.
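The binomial argument can be checked with a short sketch using only the Python standard library. The text does not say which two-tailed convention was used; the version below doubles the smaller tail, which is an assumption, and the function name is illustrative.

```python
from math import comb

def binomial_two_tailed(k, n, p):
    """Two-tailed binomial P value for k successes in n trials.

    Assumption: the two-tailed value is twice the smaller tail, capped at 1.
    Other conventions exist and give somewhat different values.
    """
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))

# 14 unpreferred fixations out of 27, tested against two underlying models
p_equilibrium = binomial_two_tailed(14, 27, 0.50)
p_asymmetric = binomial_two_tailed(14, 27, 0.65)
print(f"50% unpreferred model: P = {p_equilibrium:.2f}")
print(f"65% unpreferred model: P = {p_asymmetric:.2f}")
```

Neither model is rejected at conventional thresholds, which illustrates why 27 fixations cannot distinguish equilibrium from a substantially asymmetric fixation process.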
In general, the previously available data from D. simulans were insufficient to draw strong conclusions on the frequency distribution. Here, I reexamine the frequency distributions of different types of mutations from a larger sample of D. simulans codons (Begun and Whitley 2000b) and use the results to make inferences on the causes of variation in D. simulans populations.
Materials and Methods |
---|
The criteria of Sharp and Lloyd (1993) were used to assign codons to putative fitness classes, preferred and unpreferred. Following Akashi (1995, 1996), I analyzed the frequency distribution of polarized mutations. Polarized polymorphisms are those for which parsimony can be used to infer which of two alleles at a polymorphic codon is ancestral. Drosophila melanogaster and D. yakuba served as outgroups for the D. simulans data. I used a haphazardly selected allele from each of the two outgroup species for all inferences of the ancestral state in D. simulans. When each of the outgroup codons was identical to one of the segregating D. simulans codons, the outgroup codon was inferred to be the (monomorphic) codon in the hypothetical ancestral D. simulans population. Fixations along the D. simulans lineage were inferred when all D. simulans alleles had a particular base at a given site and both outgroups shared the same base, which was different from the base present in D. simulans. Changes from preferred alleles to unpreferred alleles are referred to as unpreferred mutations, while changes from unpreferred to preferred alleles are referred to as preferred (i.e., higher fitness) mutations. Codons harboring more than one mutation in the sample of three species were excluded from all analyses. Many of Akashi's analyses focused on silent mutations assigned to either of two fitness categories. Here, I also analyzed "no-change" mutations, defined as unpreferred-to-unpreferred changes, or preferred-to-preferred changes. These mutations are hypothesized to have lesser fitness effects than mutations between categories. Replacement mutations were polarized in the same way as silent mutations for the purposes of estimating frequencies, although they were not assigned to presumptive fitness categories. Polarized polymorphisms can have frequencies between 1/n and (n - 1)/n, where n is the number of sampled alleles.
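The polarization rules described above can be sketched as follows. This is a simplified illustration, not the code used for the paper: it works on whole codons rather than single sites, it requires the two outgroup codons to be identical to each other, and the set of preferred codons is a hypothetical placeholder rather than the Sharp and Lloyd (1993) table. All function names are invented for the example.

```python
from collections import Counter

def n_differences(codon_a, codon_b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(1 for x, y in zip(codon_a, codon_b) if x != y)

def polarize(sample, outgroup_1, outgroup_2):
    """Infer ancestral and derived codons by parsimony.

    Returns (kind, ancestral, derived, derived_frequency), or None when the
    codon is uninformative or excluded under the simplified rules used here.
    """
    if outgroup_1 != outgroup_2:
        return None  # simplification: outgroups must agree
    ancestral = outgroup_1
    counts = Counter(sample)
    if len(counts) == 1:  # monomorphic sample
        derived = next(iter(counts))
        if derived == ancestral or n_differences(ancestral, derived) != 1:
            return None  # no change, or more than one mutation
        return ("fixation", ancestral, derived, 1.0)
    if len(counts) == 2 and ancestral in counts:
        derived = next(c for c in counts if c != ancestral)
        if n_differences(ancestral, derived) != 1:
            return None  # more than one mutation in the codon
        return ("polymorphism", ancestral, derived, counts[derived] / len(sample))
    return None  # more than two alleles, or ancestral codon not segregating

def classify_silent(ancestral, derived, preferred_codons):
    """Label a silent change as unpreferred, preferred, or no-change."""
    was, now = ancestral in preferred_codons, derived in preferred_codons
    if was and not now:
        return "unpreferred"
    if now and not was:
        return "preferred"
    return "no-change"

# Hypothetical data: six sampled alleles, one carrying a derived codon
PREFERRED = {"CTG"}  # placeholder set, for illustration only
result = polarize(["CTG"] * 5 + ["CTC"], "CTG", "CTG")
print(result, classify_silent(result[1], result[2], PREFERRED))
```

In this example the derived codon is a singleton, so its frequency is 1/n with n = 6, the lower bound stated above.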
I also analyzed unpolarized mutations. This approach has at least two advantages. First, there is no inference regarding the ancestral state, and thus no potential uncertainty or bias introduced into the analysis. Second, many more codons are available for analysis. For the purposes of this paper, most unpolarized analyses are on the frequencies of unpreferred codons. Unpolarized unpreferred polymorphisms can have frequencies ranging from 1/n to (n - 1)/n. Codons for which there were more than two alleles were excluded from the analysis. For silent versus replacement polymorphism frequencies (no parsing of silent mutations into fitness classes), the frequency of a codon was taken as the frequency of the less common allele.
For some analyses, I assessed the effect of codon bias on frequency of mutations by dividing the D. simulans genes into "higher-bias" and "lower-bias" categories. Higher-bias genes were defined as having an effective number of codons (ENC; Wright 1990) below the median ENC (43.8) of D. simulans genes in the data (appendix A); lower-bias genes had ENCs above the median. The v and nos loci had the median ENC for the data; v was haphazardly assigned to the lower-bias category, while nos was assigned to the higher-bias category (none of the results are sensitive to this assignment). A more powerful approach for assessing the effect of bias on frequencies might result if we omit genes of intermediate codon bias. Therefore, in some analyses, I included only the genes having ENC values near the tails of the ENC distribution for all the data. For analyses of polarized mutations, the following genes were assigned to the high-bias category: Tpi, Hsc70, G6pd, mir, per, Pgd, and sn; the low-bias category included ry, hyd, dec-1, and Cp190. The mean ENCs for these high- and low-bias categories were 33.6 and 55.1, respectively, compared to a mean ENC of 44.2 for the 40 D. simulans genes from Begun and Whitley (2000b). For analyses of unpolarized polymorphisms, the high-bias genes included Yp3, Yp2, and sqh in addition to the above set. Low-bias genes for unpolarized analyses included those used in the polarized data, as well as fzo, mei-9, otu, Gld, mei-218, AATS, and ovo.
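The median split into higher-bias and lower-bias genes can be sketched as below. The gene names and ENC values are hypothetical placeholders, not the values in appendix A, and the function name is invented for the example. Genes whose ENC equals the median are returned separately so that their assignment can be made explicitly, as was done for v and nos.

```python
from statistics import median

def split_by_median_enc(enc_by_gene):
    """Split genes into higher-bias (ENC below the median) and lower-bias
    (ENC above the median); genes exactly at the median are reported apart."""
    cutoff = median(enc_by_gene.values())
    higher_bias = sorted(g for g, enc in enc_by_gene.items() if enc < cutoff)
    lower_bias = sorted(g for g, enc in enc_by_gene.items() if enc > cutoff)
    at_median = sorted(g for g, enc in enc_by_gene.items() if enc == cutoff)
    return cutoff, higher_bias, lower_bias, at_median

# Hypothetical ENC values; a lower ENC means stronger codon bias
example = {"geneA": 33.0, "geneB": 43.8, "geneC": 43.8, "geneD": 55.0}
cutoff, higher, lower, tied = split_by_median_enc(example)
print(cutoff, higher, lower, tied)
```

Because a lower ENC indicates stronger codon bias, the "higher-bias" class is the one below the cutoff.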
Results |
---|
Under the mutation-selection-drift model of silent-site evolution, genes under stronger selection for codon bias might be expected to show a greater skew toward rare alleles for unpreferred mutations (Akashi 1999; McVean and Charlesworth 1999). Figure 1, a scatterplot of codon bias in D. simulans genes (ENC) versus the average frequency of unpreferred polymorphisms (per gene), reveals no effect of codon bias on the average frequency of unpreferred polymorphisms. Although the mean frequency of unpreferred polymorphisms is lower in higher-bias genes (n = 114 polymorphisms, frequency = 0.307) than in lower-bias genes (n = 90 polymorphisms, frequency = 0.333), the difference is not significant (Mann-Whitney; P = 0.16). There is no difference in the average frequencies of unpreferred polymorphisms in high-bias genes (n = 59 polymorphisms) versus low-bias genes (n = 25 polymorphisms) (Mann-Whitney; P = 0.47). The ratios of unpreferred to preferred polymorphisms are not significantly different in the higher-bias (114:19) versus lower-bias (90:24) genes; neither are the ratios significantly different in the high-bias (59:8) versus low-bias (25:6) genes. Overall, there is little evidence that selection has heterogeneous effects on the mean frequencies of derived polymorphisms across categories of mutants.
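For readers who want to reproduce comparisons of this kind, the following is a minimal standard-library sketch of a two-sided Mann-Whitney U test. It uses the normal approximation with a correction for ties and no continuity correction, which is an assumption; the text does not describe the exact procedure or software settings used. The allele frequencies in the example are invented.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all values tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))

# Invented derived-allele frequencies for two classes of genes
higher_bias = [1 / 6, 1 / 6, 2 / 6, 1 / 6, 3 / 6]
lower_bias = [2 / 6, 1 / 6, 4 / 6, 3 / 6, 2 / 6]
u, p = mann_whitney_u(higher_bias, lower_bias)
print(f"U = {u:.1f}, two-sided P = {p:.3f}")
```

With samples as small as those in this example an exact test would be preferable; the approximation is better suited to sample sizes such as the 114 and 90 polymorphisms compared above.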
Tests of Polymorphism and Divergence
Tables 6 and 7 show the numbers of polarized polymorphisms and fixations in D. simulans. The 2 x 4 contingency tables are significantly heterogeneous with all the data (P < 0.001) and with the Relish and G6pd data excluded (P < 0.001). There is strong evidence that both G6pd and Relish have undergone adaptive protein evolution in the D. simulans lineage. Therefore, significant heterogeneity of the data in table 7 shows that large numbers of excess amino acid fixations in Relish and G6pd (Eanes et al. 1996; Begun and Whitley 2000a) do not account for the result. As one would suspect from inspection of table 7, the ratio of polymorphic to fixed mutations is not significantly heterogeneous for the unpreferred, no-change, and replacement mutations (P = 0.23). Thus, the main cause of the significant rejection of homogeneity in this table is that the ratio of preferred fixations to polymorphisms is significantly greater than the ratio observed for the other mutant classes.
Silent-site divergence was estimated by counting all silent mutants that fixed along the D. simulans lineage; polymorphism data from D. simulans, as well as outgroup data from D. melanogaster and D. yakuba, were used in the analysis. Figure 2 shows that there is no correlation between codon bias (ENC) and the silent-site divergence in the D. simulans lineage. An earlier study showing a similar result did not examine the D. simulans lineage separately (Powell and Moriyama 1997). Silent divergence for 10 X-linked genes (0.029) was slightly greater than the divergence for 3R genes (0.021); the difference was marginally significant (P = 0.04) by a Mann-Whitney test. However, there was no difference in the ratio of unpreferred to preferred fixations for X-linked (34:28) versus 3R (37:18) genes.
Frequency of Unpolarized Unpreferred Codons
There were 422 codons that were polymorphic for an unpreferred codon and a preferred codon (codons with allele frequencies of 0.5 were excluded). Of these, the rarer allele was unpreferred at 285 codons. Unpolarized data can be used directly in tests to determine if unpreferred codons are maintained at low frequency by natural selection. Under the mutation-selection-drift model, the degree of codon bias reflects the intensity of purifying selection at silent sites. The frequency of the unpreferred codon was calculated for each of 452 codons (n = 40 genes) in which one allele was unpreferred and one was preferred. Figure 3 shows the relationship between ENC and the frequency of unpreferred alleles per gene; the two variables are significantly correlated (Spearman correlation; P = 0.005). Furthermore, the average frequency of unpreferred codons is lower, with marginal significance (Mann-Whitney; P = 0.04), in the higher-bias genes (mean = 0.213) than in the lower-bias genes (mean = 0.252). The same is true for the 10 most biased (mean frequency of unpreferred alleles per gene = 0.219) versus the 10 least biased (mean frequency of unpreferred alleles per gene = 0.286) genes among the 40 D. simulans genes (Mann-Whitney; P = 0.03). These analyses support the notion that frequencies of unpreferred codons are depressed by purifying selection. Further support for this notion comes from categorization of unpreferred polymorphisms into two categories, singletons versus nonsingletons. There are 85 singletons and 111 nonsingletons for higher-bias genes; there are 61 singletons and 195 nonsingletons for lower-bias genes. The proportion of unpreferred polymorphisms that are singletons is significantly greater in higher-bias than in lower-bias genes (G-test; P < 0.0001), as one would expect if purifying selection depresses frequencies of unpreferred codons more effectively in higher-bias genes.
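The singleton comparison can be checked with a short sketch of a G-test of independence on the 2 x 2 table of counts given above. The sketch uses only the standard library, takes the P value from a chi-square distribution with 1 degree of freedom, and applies no continuity or Williams correction; whether the original test applied a correction is not stated, so that choice is an assumption.

```python
import math

def g_test_2x2(a, b, c, d):
    """G-test of independence for the table [[a, b], [c, d]].

    Returns (G, P), with P from a chi-square distribution with 1 df
    (no continuity or Williams correction).
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # cells with zero count contribute nothing
                expected = row_totals[i] * col_totals[j] / total
                g += 2.0 * o * math.log(o / expected)
    # For 1 df, the chi-square upper tail is erfc(sqrt(G / 2))
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value

# Singletons and nonsingletons: higher-bias genes (85, 111), lower-bias genes (61, 195)
g_stat, p_value = g_test_2x2(85, 111, 61, 195)
print(f"G = {g_stat:.2f}, P = {p_value:.2g}")
```

The four counts sum to 452, the number of unpolarized codons analyzed, and the resulting P value falls below the 0.0001 threshold reported above.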
Discussion |
---|
The results presented here are similar to Akashi's in that the ratio of polarized unpreferred to preferred polymorphisms is much greater than the ratio of unpreferred to preferred fixations. If one attributes this result to "too many" unpreferred polymorphisms, then a plausible explanation for such an excess is that unpreferred polymorphisms are borderline deleterious mutations (i.e., 1 < Ns < 3) (Akashi 1996). The contribution of such mutations to polymorphism is expected to be greater than their contribution to divergence (e.g., Ohta and Kimura 1971; Kimura 1983; Ohta 1992). Their frequencies in samples are expected to be lower than frequencies of neutral polymorphisms or borderline beneficial polymorphisms (Akashi 1999). Previous analyses of polymorphisms from D. simulans provided little support for skewed frequency distributions for mutants of various putative fitness classes. The results from polarized polymorphisms in D. simulans presented here also provide little support for heterogeneity of frequencies between unpreferred, preferred, or amino acid polymorphisms. On the other hand, analysis of unpolarized polymorphisms from higher-bias versus lower-bias genes provides the best evidence to date for a skew toward rare alleles for unpreferred polymorphisms. Excess numbers of unpreferred polymorphisms and the skew toward rare alleles for unpolarized unpreferred codons provide complementary support for the notion that borderline mutations make significant contributions to silent variation in D. simulans. The rather different results for the unpolarized versus polarized mutations are, however, a bit troubling. This is especially true given that one expects biases arising from analysis of polarized mutations to result in a greater likelihood of detecting skews toward rare alleles for unpreferred mutations. A possible explanation for the discrepancy is that analysis of unpolarized unpreferred mutations is more powerful because there are greater numbers of unpolarized mutations (452) than of polarized mutations (204). Given the results in table 4, it would not be surprising if larger samples of polarized unpreferred polymorphisms from higher-bias versus lower-bias genes supported a skew toward rare alleles in higher-bias genes.
The analyses presented here support the idea that selection at silent sites is stronger at twofold codons than at fourfold codons. A reasonable interpretation is that fourfold codons sometimes (or often) have more than two potential fitness classes. Assume the least fit allele at a fourfold codon is as deleterious as the less fit allele at a twofold codon. If this is true, then we expect the average unpreferred allele at a fourfold codon to be selected more weakly than the average unpreferred allele at a twofold codon. Kreitman and Antezana (2000) noted that the rank order of alternative codon frequency for most four-codon families was conserved between D. melanogaster and D. pseudoobscura. This suggests that there are more than two fitness classes and as many as four fitness classes for some codon families. Frequencies of polymorphisms for twofold and fourfold codons in D. simulans support this hypothesis. If true, the hypothesis predicts that many "no-change" mutations are very weakly deleterious (although some may also be weakly beneficial). The observation that the ratio of polymorphic to fixed no-change mutations is similar to the ratio for unpreferred mutations (tables 6 and 7) is consistent with the hypothesis that the two types of mutations have similar distributions of selection coefficients. The summary of codon use in high-bias D. melanogaster genes given in Kreitman and Antezana (2000) was used to assign a fitness ranking based on relative abundance. The number of fitness classes was equal to the size of the codon family (two-, three-, or fourfold). All D. simulans mutations previously assigned to the no-change category, along with those that had not been assigned any category based on the analysis of Sharp and Lloyd (1993), were reclassified as preferred or unpreferred based on these rankings. Among the reclassified polymorphic mutations, roughly twice as many are to "lower-fitness" codons (49) as to "higher-fitness" codons (23). Among the reclassified fixed mutations, 12 are to lower-fitness codons, while 9 are to higher-fitness codons. Although the 2 x 2 contingency table is not significantly heterogeneous, the configuration is in the same direction as for the unpreferred mutations. This, too, is consistent with the idea that categorization of silent mutations into two categories is overly simplistic.
Begun and Whitley (2000b) suggested that reduced X-linked versus autosomal polymorphism in D. simulans is best explained by stronger effects of positively selected mutants on the X chromosome. The result reported here, that conditioned on a site being polymorphic in a sample, X-linked polymorphisms occur at a higher frequency than those on 3R (table 2), is another distinguishing feature of X-linked versus autosomal variation in this species. Further theoretical research is required to determine which models of linked selection may be able to account for these data (e.g., Gillespie 1997; Fay and Wu 2000).
Sequencing surveys and microsatellite analyses of D. simulans are indicative of small but significant differentiation between populations and slightly higher levels of variation in African versus United States D. simulans (Irvin et al. 1998; Hamblin and Veuille 1999). Inferences on the dynamics of mutations in D. simulans populations reported here rely on comparisons of different mutant classes or comparison of mutations on different chromosomes. Because deviations from population equilibrium are expected to affect all weakly selected sites in the genome in a similar manner, such comparisons remain useful. Nevertheless, theoretical studies would be required to confirm that deviations from equilibrium have only minor effects on the behavior of the tests carried out for this paper.
Cargill et al. (1999) measured polymorphism in 106 human genes, with an average sample size of 114 alleles per gene. They found that replacement polymorphisms occurred at a significantly lower average frequency than silent polymorphisms, primarily because replacement polymorphisms were overrepresented among the class of very rare alleles. They attributed this observation to stronger purifying selection against replacement polymorphisms than against silent polymorphisms. Determining whether the frequency distributions of replacement polymorphism in Drosophila populations and human populations are similar would require sampling of larger numbers of D. simulans alleles.
Footnotes |
---|
1 Keywords: Drosophila, DNA variation, population genetics, molecular evolution, natural selection
2 Address for correspondence and reprints: David Begun, Section of Evolution and Ecology, University of California, Davis, California 95616. E-mail: djbegun{at}ucdavis.edu
References |
---|
Akashi H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy Genetics 136:927-935
Akashi H., 1995 Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila Genetics 139:1067-1076
Akashi H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: Reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297-1307
Akashi H., 1999 Inferring the fitness effects of DNA mutations from polymorphism and divergence data: statistical power to detect directional selection under stationarity and free recombination Genetics 151:221-238
Akashi H., S. W. Schaeffer, 1997 Natural selection and the frequency distribution of "silent" DNA polymorphism in Drosophila Genetics 146:295-307
Begun D. J., P. Whitley, 2000a. Adaptive evolution of RELISH, a Drosophila NF-κB/IκB protein Genetics 154:1231-1238
Begun D. J., P. Whitley, 2000b. Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960-5965
Bulmer M., 1991 The selection-mutation-drift theory of synonymous codon usage Genetics 129:897-907
Cargill M., D. Altshuler, J. Ireland, et al. (17 co-authors) 1999 Characterization of single-nucleotide polymorphisms in coding regions of human genes Nat. Genet 22:231-238
Charlesworth B., M. T. Morgan, D. Charlesworth, 1993 The effect of deleterious mutations on neutral molecular variation Genetics 134:1289-1303
Eanes W. F., M. Kirchner, J. Yoon, C. H. Biermann, I. N. Wang, M. A. McCartney, B. C. Verrelli, 1996 Historical selection, amino acid polymorphism and lineage-specific divergence at the G6pd locus in Drosophila melanogaster and D. simulans. Genetics 144:1027-1041
Fay J. C., C.-I. Wu, 2000 Hitchhiking under positive Darwinian selection Genetics 155:1405-1413
Gillespie J. H., 1997 Junk ain't what junk does: neutral alleles in a selected context Gene 205:291-299
Hamblin M., M. Veuille, 1999 Population structure among African and derived populations of Drosophila simulans: evidence for ancient subdivision and recent admixture Genetics 153:305-317
Irvin S. D., K. A. Wetterstrand, C. M. Hutter, C. F. Aquadro, 1998 Genetic variation and differentiation at microsatellite loci in Drosophila simulans: evidence for founder effects in new world populations Genetics 150:777-790
Kimura M., 1983 The neutral theory of molecular evolution Cambridge University Press, Cambridge, England
Kreitman M., M. Antezana, 2000 Population and evolutionary genetics of codon usage in Drosophila Pp. 82-101 in R. Singh and C. Krimbas, eds. Evolutionary genetics: from molecules to morphology. Cambridge University Press, Oxford, England
Li W.-H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons J. Mol. Evol 24:337-345
McVean G. A. T., B. Charlesworth, 1999 A population genetic model for the evolution of synonymous codon usage: patterns and predictions Genet. Res 74:145-158
McVean G. A. T., J. Vieira, 1999 The evolution of codon preference in Drosophila: a maximum-likelihood approach to parameter estimation and hypothesis testing J. Mol. Evol 49:63-75
Maruyama T., P. A. Fuerst, 1984 Population bottlenecks and nonequilibrium models in population genetics. I. Allele numbers when populations evolve from zero variability Genetics 108:745-763
Ohta T., 1992 The nearly neutral theory of molecular evolution Annu. Rev. Ecol. Syst 23:263-286
Ohta T., M. Kimura, 1971 On the constancy of the evolutionary rate of cistrons J. Mol. Evol 1:18-25
Powell J. R., E. N. Moriyama, 1997 Evolution of codon usage bias in Drosophila Proc. Natl. Acad. Sci. USA 94:7784-7790
Rozas J., R. Rozas, 1999 DnaSP 3: an integrated program for molecular population genetics and molecular evolution analysis Bioinformatics 15:174-175
Sharp P. M., A. T. Lloyd, 1993 Codon usage Pp. 378-397 in G. Maroni, ed. An atlas of Drosophila genes: sequences and molecular features. Oxford University Press, Oxford, England
Sturtevant A. H., 1929 Contributions to the genetics of Drosophila simulans and Drosophila melanogaster. Publ. Carnegie Inst 399:1-62
Tajima F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism Genetics 123:585-595
Takano-Shimizu T., 1999 Local recombination and mutation effects on molecular evolution in Drosophila Genetics 153:1285-1296
True J. R., J. M. Mercer, C. C. Laurie, 1996 Differences in crossover frequency and distribution among three sibling species of Drosophila Genetics 142:507-523
Wright S., 1938 The distribution of gene frequencies under irreversible mutation Proc. Natl. Acad. Sci. USA 24:253-259
Wright F., 1990 The "effective number of codons" used in a gene Gene 87:23-29