*Centre for the Study of Evolution & School of Biological Sciences, University of Sussex;
Institute of Cell, Animal and Population Biology, University of Edinburgh;
Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University
Abstract |
---|
Introduction |
---|
There are two lines of evidence which suggest that a significant proportion of mutations are slightly deleterious, that is, deleterious mutations with selective effects close to 1/Ne. The first comes from several studies showing that the level of selective constraint in protein-coding sequences is positively correlated to population size or to correlates of population size. Constraint is usually calculated as one minus the ratio of the rate of nonsynonymous (or amino acid) substitution to the rate of synonymous (or silent) substitution. Under a model in which synonymous mutations are neutral and nonsynonymous mutations are either neutral or deleterious, constraint is the proportion of amino acid mutations which are deleterious and removed by natural selection. A correlation between constraint and a correlate of generation time was first demonstrated by Ohta (1972a), who showed that the ratio of DNA sequence divergence to protein sequence divergence is negatively correlated to generation time across a broad range of animal taxa (mammals and Drosophilids). Because generation time and population size appear to be negatively correlated (Chao and Carr 1993), this study suggested that constraint is positively correlated to the population size of a species. Ohta's result was corroborated by those of Li, Tanimura, and Sharp (1987) and Ohta (1995), who showed that the ratio of nonsynonymous to synonymous substitution rates is greater in primates and artiodactyls than in rodents (rodents are thought to have larger population sizes than primates and artiodactyls), and by that of Keightley and Eyre-Walker (2000), who found a negative correlation between constraint and generation time over a broad range of animal taxa (mammals, birds, and Drosophilids). Studies of island species have also yielded evidence of slightly deleterious mutations; in both Hawaiian Drosophila (Ohta 1976, 1993) and species of birds restricted to islands (Johnson and Seger 2001), levels of constraint are lower than those in continental species.
The second line of evidence comes from studies of within-population variation. It has been observed that the ratio of polymorphism to substitution is greater for nonsynonymous than for synonymous changes in many mitochondrial DNA data sets (Rand and Kann 1996; Nachman 1998), in nuclear genes of Arabidopsis thaliana (Weinreich and Rand 2000), and in Escherichia coli (N. G. C. Smith and A. Eyre-Walker, unpublished data). This pattern is consistent with the segregation of slightly deleterious amino acid mutations, which contribute to polymorphism but rarely become fixed (Kimura 1983, p. 44). This conjecture gains support from the observation that nonsynonymous mutations tend to segregate at lower frequencies than synonymous mutations in several mitochondrial DNA data sets, which show an excess of nonsynonymous polymorphism (Nielsen and Weinreich 1999).
So far, however, there have been few attempts to quantify the fraction of amino acid mutations that are slightly deleterious and to estimate the strength of selection acting against them (Fay et al. 2001). In this study we attempt to estimate the fraction of mutations which are slightly deleterious by examining the level of constraint in nuclear protein-coding genes in primates, rodents, and Drosophilids, three groups of organisms for which we can also estimate the average recent effective population size. We also examine the nature of any changes in constraint by examining the ratio of radical to conservative amino acid substitutions.
Materials and Methods |
---|
We used either introns or intergenic regions to estimate Ne because for the most part these regions are thought to be free of selection. We calculated average estimates for Ne, weighting by sequence length. We used published estimates of nucleotide diversity and divergence where possible and otherwise estimated the values using data retrieved from GenBank or provided by the authors of the papers cited. To estimate divergence, we randomly chose one of the sequences used to estimate nucleotide diversity and a single out-group sequence.
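The calculation outlined above can be sketched as follows. The text does not spell out the formulas, so this sketch assumes the standard neutral expectation that nucleotide diversity equals 4Neμ (a factor of 3 is often used instead for X-linked loci) and that divergence K accumulates along both lineages, giving a per-generation mutation rate μ = Kg/(2T) for generation time g and divergence time T in years. The function names and the per-locus numbers are hypothetical and serve only to illustrate the length weighting.

```python
def mutation_rate_per_generation(divergence, divergence_time_years, generation_time_years):
    """Per-site mutation rate per generation from neutral divergence.

    Assumption: divergence K accumulates on both lineages, K = 2 * mu_year * T.
    """
    mu_per_year = divergence / (2.0 * divergence_time_years)
    return mu_per_year * generation_time_years


def effective_population_size(diversity, mu, scale=4.0):
    """Ne from nucleotide diversity, assuming diversity = scale * Ne * mu.

    scale = 4 is the usual autosomal value; 3 is often used for X-linked loci.
    """
    return diversity / (scale * mu)


def length_weighted_mean(values, lengths):
    """Average of per-locus estimates, weighted by sequence length."""
    return sum(v * n for v, n in zip(values, lengths)) / sum(lengths)


# Hypothetical loci: (nucleotide diversity, divergence to out-group, length in bp)
loci = [(0.0010, 0.012, 10000), (0.0008, 0.010, 5000)]

# Divergence time of 6 Myr and generation time of 25 years, as assumed for primates
ne_per_locus = [
    effective_population_size(pi, mutation_rate_per_generation(k, 6e6, 25))
    for pi, k, _ in loci
]
ne_mean = length_weighted_mean(ne_per_locus, [n for _, _, n in loci])
print(round(ne_mean))
```

Weighting by length gives longer sequences, whose diversity estimates are less noisy, more influence on the average.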
To estimate effective population sizes in primates, we used the 10-kb noncoding sequences from 1q24 (Yu et al. 2001) and 22q11 (Zhao et al. 2000) in humans and intron sequences from HoxB6 and ApoB in chimpanzees (Deinard and Kidd 2000). We ignored the data from Xq13.3 in humans and chimpanzees because these are from a low-recombination region, so the diversity may be unduly influenced by background selection and genetic hitchhiking (Kaessmann, Wiebe, and Pääbo 1999; Kaessmann et al. 1999). For humans we only considered diversity in African sequences because humans are thought to have expanded out of Africa; the African sequences are therefore likely to better reflect the effective population size of humans. For humans and chimpanzees we assumed a generation time of 25 years because studies of natural human and chimpanzee populations suggest generation times in excess of 27 and 23 years, respectively (see references in Eyre-Walker and Keightley [1999]). We assumed a divergence time of 6 Myr (Goodman et al. 1998).
For rodents, we used diversity data from intron (plus short adjoining lengths of exon) sequences from two X-linked genes surveyed in Mus domesticus, with M. caroli used as an out-group for the divergence data (Nachman 1997). We only included two of the four genes surveyed because there is a correlation between nucleotide diversity and recombination rate in mice (Nachman 1997), and two of the genes that were surveyed came from regions of low recombination. The two intron sequences came from the Glra2 and Amg genes. We inferred an evolutionary divergence date from a local molecular clock calibrated to the date of the Mus-Rattus divergence. But there is substantial uncertainty over the dates of rodent divergences, so we used two alternatives, the first from fossil evidence, which implies an age of 13 Myr for the Mus-Rattus divergence (Jaeger, Tong, and Denys 1986), and the second from a recent molecular analysis, which implies a date of 23 Myr (Adkins et al. 2001). We also assume that mice have 2 generations per year (see references in Keightley and Eyre-Walker [2000]).
For D. melanogaster and D. simulans we used a recent compilation of noncoding sequences from African flies (Andolfatto 2001). The noncoding sequences came from anon1A3, anon1E9, anon1G5, eve, per, vermilion, yp2, and zeste. We restricted our analysis to African lines of D. melanogaster and D. simulans because it is thought that non-African populations have gone through a recent population bottleneck; the African population is therefore likely to give a better estimate of the long-term Ne. We assume that there are 10 generations per year in Drosophila and an evolutionary divergence date of 2.5 Myr for D. melanogaster-D. simulans (Powell and DeSalle 1995). To estimate constraint we aligned D. melanogaster, D. simulans, and D. yakuba sequences using the D. yakuba sequence as an out-group.
Calculation of Constraint
We calculated levels of constraint using methods based on those described previously (Eyre-Walker and Keightley 1999; Keightley and Eyre-Walker 2000). We calculated rates for synonymous transitions (Kts4 and Kts2 for fourfold and twofold sites, respectively) and transversions (Ktv) by applying the methods of Bulmer, Wolfe, and Sharp (1991) for twofolds and of Tamura and Nei (1993) for fourfolds; both methods take into account the variation in GC content. We then calculated Kts, the average of Kts4 and Kts2 weighted by the numbers of sites. Under the assumption that Kts (Ktv) estimates the synonymous transition (transversion) mutation rate (i.e., the fixation probability of synonymous mutations is that of a truly neutral mutation), we obtained an estimate for the predicted rate of amino acid mutations in a gene from
[equations not preserved in this copy]
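The two steps whose definitions survive in the text can be sketched as below: the site-weighted average Kts, and constraint written as C = 1 - Ka/M, which follows from the relation Ka/M = 1 - C used in the Discussion. The expression for the predicted amino acid mutation rate M is not available here, so M is simply taken as an input. Function names and the example numbers are hypothetical.

```python
def weighted_synonymous_transition_rate(kts4, n4, kts2, n2):
    """Kts: average of Kts4 and Kts2 weighted by the numbers of sites."""
    if n4 + n2 <= 0:
        raise ValueError("total number of sites must be positive")
    return (kts4 * n4 + kts2 * n2) / (n4 + n2)


def constraint(ka, m):
    """Constraint C = 1 - Ka/M.

    ka: observed amino acid substitution rate
    m:  predicted amino acid mutation rate (supplied by the caller, since
        the expression for M is not reproduced in this sketch)
    """
    if m <= 0:
        raise ValueError("predicted rate M must be positive")
    return 1.0 - ka / m


# Hypothetical values, for illustration only
kts = weighted_synonymous_transition_rate(kts4=0.10, n4=200, kts2=0.08, n2=300)
c = constraint(ka=0.02, m=0.10)
print(kts, c)
```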
Gene Sequence Data for Estimation of Constraint
We estimated levels of evolutionary constraint in samples of protein-coding gene sequences extracted from GenBank using ENTREZ. In the comparisons involving mammals, we controlled for differences in the level of evolutionary constraint induced by the specific properties of the samples of genes by compiling "four-way" sets involving mammal-A, mammal-B, mouse, and rat. We computed estimates of constraint between mammal-A and mammal-B and between the rodents; the rodent estimate acts as a control for effects specific to the gene sample. The coding sequences of homologous genes were aligned using CLUSTALX (Thompson, Higgins, and Gibson 1994) and adjusted manually. Sequences corresponding to gaps or insertions were deleted to exclude nonhomologous gene segments from the analysis. For the comparison between human and chimpanzee, we included a set of genes compiled previously (Eyre-Walker and Keightley 1999; Keightley and Eyre-Walker 2000) for which the orthologous mouse and rat gene sequences were also available. These data were augmented by human-chimpanzee-mouse-rat homologues deposited in GenBank during 2000-2001.
Nature of Amino Acid Substitutions
To investigate the nature of changes in constraint, we used parsimony to count the number of conservative and radical amino acid replacements (NC and NR) in our interspecies comparisons. Amino acid changes were classified as conservative or radical according to a classification by polarity and volume (Zhang 2000), which divides the 20 amino acids into six groups (special: C; neutral and small: A, G, P, S, T; polar and relatively small: N, D, Q, E; polar and relatively large: R, H, K; nonpolar and relatively small: I, L, M, V; nonpolar and relatively large: F, W, Y). Changes within an amino acid group are termed conservative, whereas amino acid changes between groups are termed radical.
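The classification just described maps directly onto a lookup table. The sketch below encodes the six groups as listed and counts conservative and radical replacements from a list of amino acid pairs; the group labels, function names, and the example pairs are illustrative, and the parsimony step that produces the pairs is not shown.

```python
# Six groups by polarity and volume, as listed in the text
GROUPS = {
    "special": "C",
    "neutral_small": "AGPST",
    "polar_small": "NDQE",
    "polar_large": "RHK",
    "nonpolar_small": "ILMV",
    "nonpolar_large": "FWY",
}

# Reverse lookup: one-letter amino acid code -> group label
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_replacement(aa_from, aa_to):
    """Return 'conservative' for a within-group change, 'radical' otherwise."""
    if aa_from == aa_to:
        raise ValueError("not a replacement: both residues are identical")
    same_group = GROUP_OF[aa_from] == GROUP_OF[aa_to]
    return "conservative" if same_group else "radical"


def count_replacements(pairs):
    """Count conservative (NC) and radical (NR) replacements."""
    nc = sum(1 for a, b in pairs if classify_replacement(a, b) == "conservative")
    nr = len(pairs) - nc
    return nc, nr


# Hypothetical replacements inferred between two sequences
example_pairs = [("I", "V"), ("D", "K"), ("A", "S"), ("F", "C")]
nc, nr = count_replacements(example_pairs)
print(nc, nr)
```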
We also estimated the rates of substitution at conservative and radical amino acid sites (DC and DR), as well as at synonymous sites (DS), using Zhang's method (Zhang 2000), which accounts for biases in the relative rates of transition and transversion (the transition/transversion ratio was assumed to be 2 in Drosophila species and 3 in mammals), and in which multiple-hits correction is performed using the Jukes-Cantor formula. For the comparisons between Drosophila species, substitution rates down lineages were calculated using the least squares method (Zhang 2000). The estimation of conservative and radical substitution rates allows measures of conservative and radical constraint: CC = 1 - DC/DS and CR = 1 - DR/DS.
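As a rough illustration of the last two steps only, the sketch below applies the Jukes-Cantor correction to a proportion of differing sites and then forms CC and CR as defined above. It does not implement the site counting or the transition/transversion weighting of the full method; the function names and the input proportions are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor multiple-hits correction: d = -(3/4) ln(1 - 4p/3).

    p is the observed proportion of differing sites and must be below 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def constraint_from_rates(d_amino, d_syn):
    """Constraint for one class of amino acid sites: 1 - D/DS."""
    if d_syn <= 0:
        raise ValueError("synonymous rate DS must be positive")
    return 1.0 - d_amino / d_syn


# Hypothetical observed proportions of differences per site class
d_s = jukes_cantor(0.10)   # synonymous sites
d_c = jukes_cantor(0.03)   # conservative amino acid sites
d_r = jukes_cantor(0.01)   # radical amino acid sites

cc = constraint_from_rates(d_c, d_s)
cr = constraint_from_rates(d_r, d_s)
print(cc, cr)
```

With these made-up inputs radical sites come out as more constrained than conservative sites, which is the kind of contrast the two measures are designed to expose.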
Results |
---|
Discussion |
---|
We can use the two data sets that show a significant difference in constraint to make inferences about the shape of the distribution of fitness effects in these species (fig. 2). The corrected level of constraint in humans and chimpanzees is 69%, which implies that 69% of all amino acid mutations have selection coefficients more negative than -1/4Ne(p), where Ne(p) is the long-term effective population size of humans and chimpanzees (note that when s = -1/4Ne, the fixation probability of a deleterious mutation is approximately one-half of the fixation probability of a neutral mutation). The difference in constraint between mouse-rat and human-chimpanzee (table 2), therefore, implies that 15% of mutations have -1/4Ne(p) < s < -1/4Ne(r), where Ne(r) is the long-term effective population size of mouse-rat. If we take Ne(p) = 15,000 and Ne(r) = 220,000 (table 1), the distribution of fitness effects is as shown in figure 2a.
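The thresholds in this argument can be made concrete with a short sketch. It assumes Kimura's diffusion approximation for the fixation probability under genic selection, under which the fixation probability relative to a neutral mutation is S/(1 - exp(-S)) with S = 4Nes; the text does not state which formula was used, so this choice is an assumption. The 69% and 15% fractions and the Ne values are those quoted above, the remainder is obtained by subtraction, and all names are illustrative.

```python
import math


def relative_fixation_probability(scaled_s):
    """Fixation probability relative to a neutral mutation.

    scaled_s = 4 * Ne * s. Assumes Kimura's diffusion approximation with
    genic selection: ratio = S / (1 - exp(-S)).
    """
    if scaled_s == 0:
        return 1.0
    return scaled_s / -math.expm1(-scaled_s)


def threshold_selection_coefficient(ne):
    """Boundary s = -1/(4 Ne) below which selection is taken to be effective."""
    return -1.0 / (4.0 * ne)


ne_primate = 15_000
ne_rodent = 220_000

s_primate = threshold_selection_coefficient(ne_primate)
s_rodent = threshold_selection_coefficient(ne_rodent)

# Fractions of amino acid mutations in each class, from the text
fraction_below_primate_threshold = 0.69
fraction_between_thresholds = 0.15
fraction_above_rodent_threshold = 1.0 - 0.69 - 0.15

print(relative_fixation_probability(-1.0))  # ratio at s = -1/(4 Ne)
print(s_primate, s_rodent, fraction_above_rodent_threshold)
```

Under this approximation the ratio at s = -1/4Ne is 1/(e - 1), or about 0.58, which is in line with the "approximately one-half" stated in the text.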
Fay, Wyckoff, and Wu (2001) recently estimated that 20% of amino acid mutations in humans are slightly deleterious, which is similar to our conclusion. It does not seem appropriate to combine the data from the mammalian and Drosophila data sets because it seems unlikely that the distribution of fitness effects would be the same in the two groups; in fact, one might legitimately argue that the distribution of fitness effects is likely to be different in rodents and primates because these groups differ markedly in their level of social interaction. Using the D. simulans and D. melanogaster data, we estimate the distribution to be that shown in figure 2b.
We emphasize that these estimates are crude, relying as they do on several simplifying assumptions and the fact that we only have an estimate of the recent effective population size of our species and not their long-term Ne (see below). But they provide the first approximate estimate of how prevalent slightly deleterious mutations are.
Unfortunately, the situation is less clear in primates and rodents. These two groups differ in their level of synonymous codon bias (Mouchiroud, Gautier, and Bernardi 1988), but we do not fully understand the basis of this difference (Eyre-Walker and Hurst 2001). It is generally accepted that synonymous codon bias is declining in rodents (Mouchiroud, Gautier, and Bernardi 1988; Galtier and Mouchiroud 1998; Smith and Eyre-Walker 2002), and recent evidence suggests that this may be the case in primates (L. Duret, personal communication). If this is the case, then constraint is likely to have been overestimated in both rodents and primates, although this overestimation is probably small for most genes.
There are potentially two sources of advantageous mutations to consider. First, it seems likely that there are slightly advantageous mutations if there are slightly deleterious mutations, and second, there may be strongly selected advantageous mutations contributing to adaptation. First, let us consider a model where there is a balance between weakly advantageous and deleterious mutations. If a slightly deleterious mutation A2 occurs at a site that was fixed for allele A1, and the strength of selection against it is -s, then an A1 mutation will have an advantage of +s at a site which is fixed for A2. At equilibrium, this model has little effect on either the predictions of the slightly deleterious model used here (i.e., species with large effective population sizes should have high levels of constraint) or the estimation of the shape of the distribution of fitness effects (we can simply replace Nes in figure 1 by |Nes|). But the nonequilibrium situation can be complicated because an increase in Ne can lead to a temporary decrease in constraint. This arises because sites at which selection has previously been ineffective will often be fixed for a deleterious mutation; when Ne increases, advantageous mutations can become fixed, leading to a temporary increase in the rate of evolution which will manifest itself as a decrease in constraint. But it seems likely that in each of the comparisons we have studied, the prevailing trend has been toward a decline in effective population size, rather than an increase.
It has been estimated that 35% and 45% of all amino acid substitutions are adaptive in humans (Fay et al. 2001) and Drosophila (Bustamante et al. 2002; Fay et al. 2002; Smith and Eyre-Walker 2002), respectively. The fixation of strongly advantageous mutations will reduce the level of constraint and thus lead to overestimation of the proportion of mutations which are slightly deleterious: if a proportion α of the substitutions are advantageous, then the proportion of mutations which are effectively neutral is (1 - α) Ka/M = (1 - α) (1 - C). For example, if we accept that 45% of substitutions in Drosophila are advantageous, then we estimate that 89% of mutations are more deleterious than 1/4Ne(mel), 4.4% lie between 1/4Ne(mel) and 1/4Ne(sim), with the remainder being less deleterious than 1/4Ne(sim). Rather more mutations will lie between the two limits if the rate of adaptation is positively correlated to the effective population size. We might expect the rate of adaptation to be correlated to population size if the rate of adaptation is mutation limited because the rate of evolution under this model is equal to 2Nuxs, where u is the mutation rate, x is the proportion of mutations which are advantageous, and s is the average strength of selection in favor of the advantageous mutations (where s << 1 and Nes >> 1).
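The correction for adaptive substitutions and the mutation-limited rate can be written out in a few lines. In this sketch `alpha` stands for the proportion of substitutions that are advantageous. The constraint of 0.80 is a hypothetical value chosen only because it reproduces the 89% figure quoted above when 45% of substitutions are advantageous; it is not taken from the tables. The parameter values passed to the rate function are likewise made up.

```python
def effectively_neutral_fraction(constraint, alpha):
    """Proportion of effectively neutral mutations: (1 - alpha) * (1 - C)."""
    if not (0.0 <= alpha <= 1.0):
        raise ValueError("alpha must lie between 0 and 1")
    return (1.0 - alpha) * (1.0 - constraint)


def mutation_limited_adaptive_rate(n, u, x, s):
    """Rate of adaptive evolution under mutation limitation: 2 N u x s.

    n: population size, u: mutation rate, x: proportion of mutations that
    are advantageous, s: mean selective advantage (s << 1, Ne*s >> 1).
    """
    return 2.0 * n * u * x * s


# Hypothetical constraint; alpha = 0.45 as quoted for Drosophila
neutral = effectively_neutral_fraction(constraint=0.80, alpha=0.45)
deleterious = 1.0 - neutral
print(neutral, deleterious)

# Doubling N doubles the adaptive rate when all else is held equal
rate_small = mutation_limited_adaptive_rate(n=1e6, u=1e-9, x=0.01, s=0.001)
rate_large = mutation_limited_adaptive_rate(n=2e6, u=1e-9, x=0.01, s=0.001)
print(rate_large / rate_small)
```

The linear dependence on N in the second function is what produces the expected positive correlation between the rate of adaptation and population size.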
Summary |
---|
Acknowledgements |
---|
Footnotes |
---|
Keywords: slightly deleterious mutations; nearly neutral mutations; neutral theory; effective population size
Address for correspondence and reprints: Adam Eyre-Walker, School of Biological Sciences, University of Sussex, Brighton, BN1 9QG, United Kingdom. E-mail: a.c.eyre-walker{at}sussex.ac.uk.
References |
---|
Adkins R. M., E. L. Gelke, D. Rowe, R. L. Honeycutt, 2001 Molecular phylogeny and divergence time estimates for major rodent groups: evidence from multiple genes Mol. Biol. Evol 18:777-791
Akashi H., 1996 Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster Genetics 144:1297-1307
Akashi H., S. W. Schaeffer, 1997 Natural selection and the frequency distributions of "silent" DNA polymorphism in Drosophila Genetics 146:295-307
Andolfatto P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans Mol. Biol. Evol 18:279-290
Begun D., 2001 The frequency distribution of nucleotide variation in Drosophila simulans Mol. Biol. Evol 18:1343-1352
Bulmer M., K. H. Wolfe, P. M. Sharp, 1991 Synonymous substitution rates in mammalian genes: implications for the molecular clock and the relationships of mammalian orders Proc. Natl. Acad. Sci. USA 88:5974-5978
Bustamante C. D., R. Nielsen, S. A. Sawyer, K. M. Olsen, M. D. Purugganan, D. L. Hartl, 2002 The cost of inbreeding in Arabidopsis Nature 416:531-534
Chao L., D. E. Carr, 1993 The molecular clock and the relationship between population size and generation time Evolution 47:688-690
Deinard A. S., K. Kidd, 2000 Identifying conservation units within captive chimpanzee populations Am. J. Phys. Anthropol 111:25-44
Eyre-Walker A., L. D. Hurst, 2001 The evolution of isochores Nat. Rev. Genet 2:549-555
Eyre-Walker A., P. D. Keightley, 1999 High genomic deleterious mutation rates in hominids Nature 397:344-347
Fay J., G. J. Wyckoff, C.-I. Wu, 2001 Positive and negative selection on the human genome Genetics 158:1227-1234
Fay J., G. J. Wyckoff, C.-I. Wu, 2002 Testing the neutral theory of molecular evolution with genomic data from Drosophila Nature 415:1024-1026
Galtier N., D. Mouchiroud, 1998 Isochore evolution in mammals: a human-like ancestral structure Genetics 150:1577-1584
Goodman M., C. A. Porter, J. Czelusniak, S. L. Page, H. Schneider, J. Shoshani, G. Gunnell, C. P. Groves, 1998 Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence Mol. Phylogenet. Evol 9:585-598
Jaeger J. J., H. Tong, C. Denys, 1986 The age of the Mus-Rattus divergence: paleontological data compared with the molecular clock Cr. Acad. Sci 302:917-922
Johnson K. P., J. Seger, 2001 Elevated rates of nonsynonymous substitution in island birds Mol. Biol. Evol 18:874-881
Kaessmann H., F. Heissig, A. von Haeseler, S. Pääbo, 1999 DNA sequence variation in a non-coding region of low recombination on the human X chromosome Nat. Genet 22:78-81
Kaessmann H., V. Wiebe, S. Pääbo, 1999 Extensive nuclear DNA sequence diversity among chimpanzees Science 286:1159-1162
Keightley P. D., A. Eyre-Walker, 2000 Deleterious mutations and the evolution of sex Science 290:331-333
Kimura M., 1968 Evolutionary rate at the molecular level Nature 217:624-626
Kimura M., 1983 The neutral theory of molecular evolution Cambridge University Press, Cambridge, U.K.
Kliman R., 1999 Recent selection on synonymous codon usage in Drosophila J. Mol. Evol 49:343-351
Li W.-H., M. Tanimura, P. M. Sharp, 1987 An evaluation of the molecular clock hypothesis using mammalian DNA sequences J. Mol. Evol 25:330-342
Makalowski W., M. S. Boguski, 1998 Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences Proc. Natl. Acad. Sci. USA 95:9407-9412
McVean G., J. Vieira, 2001 Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila Genetics 157:245-257
Mouchiroud D., C. Gautier, G. Bernardi, 1988 The compositional distribution of coding sequences and DNA molecules in humans and murids J. Mol. Evol 27:311-320
Nachman M. W., 1997 Patterns of DNA variability at X-linked loci in Mus domesticus Genetics 147:1303-1316
Nachman M. W., 1998 Deleterious mutations in animal mitochondrial DNA Genetica 102:61-69
Nielsen R., D. M. Weinreich, 1999 The age of nonsynonymous and synonymous mutations in animal mtDNA and implications for the mildly deleterious theory Genetics 153:497-506
Ohta T., 1972a Evolutionary rate of cistrons and DNA divergence J. Mol. Evol 1:150-157
Ohta T., 1972b Population size and rate of evolution J. Mol. Evol 1:305-314
Ohta T., 1973 Slightly deleterious mutant substitutions in evolution Nature 246:96-98
Ohta T., 1976 Role of slightly deleterious mutations in molecular evolution and polymorphism Theor. Popul. Biol 10:254-275
Ohta T., 1977 Extension of the neutral mutation drift hypothesis Pp. 148-167 in M. Kimura, ed. Molecular evolution and polymorphism. National Institute of Genetics, Mishima, Japan
Ohta T., 1992 The nearly neutral theory of molecular evolution Annu. Rev. Ecol. Syst 23:263-286
Ohta T., 1993 Amino acid substitution at the Adh locus of Drosophila is facilitated by small population size Proc. Natl. Acad. Sci. USA 90:4548-4551
Ohta T., 1995 Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory J. Mol. Evol 40:56-63
Ohta T., M. Kimura, 1971 On the constancy of the evolutionary rate of cistrons J. Mol. Evol 1:18-25
Powell J. R., R. DeSalle, 1995 Drosophila molecular phylogenies and their uses Evol. Biol 28:87-138
Rand D. M., L. M. Kann, 1996 Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice and humans Mol. Biol. Evol 13:735-748
Shields D. C., P. M. Sharp, D. G. Higgins, F. Wright, 1988 "Silent" sites in Drosophila are not neutral: evidence of selection among synonymous codons Mol. Biol. Evol 5:704-716
Smith N. G. C., A. Eyre-Walker, 2002 Adaptive protein evolution in Drosophila Nature 415:1022-1024
Smith N. G. C., A. Eyre-Walker, 2002 The compositional evolution of the murid genome J. Mol. Evol 55:197-201
Tamura K., M. Nei, 1993 Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees Mol. Biol. Evol 10:512-526
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Weinreich D. M., D. M. Rand, 2000 Contrasting patterns of nonneutral evolution in proteins encoded in nuclear and mitochondrial genomes Genetics 156:385-399
Yu N., Z. Zhao, Y.-X. Fu, et al. (11 co-authors) 2001 Global patterns of human DNA sequence variation in a 10 kb region on chromosome 1 Mol. Biol. Evol 18:214-222
Zhang J., 2000 Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes J. Mol. Evol 50:56-68
Zhao Z., L. Jin, Y.-X. Fu, et al. (13 co-authors) 2000 Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22 Proc. Natl. Acad. Sci. USA 97:11354-11358