The Genealogy of a Sequence Subject to Purifying Selection at Multiple Sites
Scott Williamson and Maria E. Orive
Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence
Abstract
We investigate the effect of purifying selection at multiple sites on both the shape of the genealogy and the distribution of mutations on the tree. We find that the primary effect of purifying selection on a genealogy is to shift the distribution of mutations on the tree, whereas the shape of the tree remains largely unchanged. This result is relevant to the large number of coalescent estimation procedures, which generally assume neutrality for segregating polymorphisms; applying these estimators to evolutionarily constrained sequences could lead to a significant degree of bias. We also estimate the statistical power of several neutrality tests in detecting weak to moderate purifying selection and find that the power is quite good for some parameter combinations. This result contrasts with previous studies, which predicted low statistical power because of the minor effect that weak purifying selection has on the shape of a genealogy. Finally, we investigate the effect of Hill-Robertson interference among linked deleterious mutations on patterns of molecular variation. We find that dependence among selected loci can substantially reduce the efficacy of even fairly strong purifying selection.
Introduction
A gene genealogy represents the historical relationships among DNA sequences. As such, genealogies are closely related to many patterns of molecular variation, such as the sampling distribution of segregating sites (Watterson 1975; Fu 1995) and nucleotide diversity (Tajima 1983). Also, because of their historical representation, genealogies provide a link between population genetic patterns of variation and phylogenetic patterns of evolution. Coalescent theory, which describes the statistical properties of genealogies, has been very successful in providing estimators for important population parameters and in distinguishing various models of evolution (reviewed in Hudson 1990; Fu and Li 1999). Most of these successes are based on the assumption that all new mutations are selectively neutral. For sequences subject to selection, the validity of these methods is contingent on selection having an insignificant effect on the genealogy.
A number of studies have investigated the statistical properties of genealogies for sequences subject to selection. (A brief note on terminology: in keeping with related studies such as Fu and Li 1993, we use the term "branch" to refer to both internodes and leaves. This definition of branch is different from the formal graph theory definition. Further, "internal branch" refers to an internode, and "external branch" refers to a leaf, i.e., a branch that connects to a tip of the tree. Finally, we use "tree shape" as shorthand for the distribution of branch lengths among internal and external branches, without specific reference to tree topology.) For example, Charlesworth B, Morgan, and Charlesworth D (1993) and Hudson and Kaplan (1994, 1995) have shown that background selection (i.e., very strong purifying selection at linked loci) can appreciably reduce the overall tree length, analogous to a reduction in effective population size. In contrast, for the case of balancing selection, Kaplan, Darden, and Hudson (1988) have found that the total tree length can be considerably increased. Further, Kelly (1997) and Kelly and Wade (2000) found that balancing selection also affects the shape of the tree, giving rise to long internal branches. For weak purifying selection, Krone and Neuhauser (1997) derived a new representation of genealogies called the ancestral selection graph (ASG), which is the weak selection analog of Kingman's (1982) neutral coalescent. Using their ASG methodology, Neuhauser and Krone (1997) found that weak purifying selection has a negligible effect on the time back to the most recent common ancestor (MRCA) of all the sequences in a sample. Przeworski, Charlesworth, and Wall (1999) and Slade (2000) expanded this result to investigate the effect of weak purifying selection on tree shape, again finding virtually no effect of this type of selection. Further, using forward simulations and a correspondingly different set of assumptions, Golding (1997) also found that weak purifying selection has a negligible effect on tree shape.
All these studies share two dominant themes. First, they focus on "structural" changes in the genealogy; e.g., changes in tree shape, tree length, or MRCA time. But because structural changes to the true genealogy are generally unobservable, these studies all assume that observed variation is neutral, whereas the variation on which selection acts is unseen. Under this assumption, mutations are distributed randomly over the entire tree; i.e., the expected number of mutations on any branch is simply proportional to the length of the branch. Thus, the number of mutations on a branch is an unbiased estimate of relative branch length. However, if the selected variation is observed segregating in the sample, selection could alter both the shape of the genealogy and the distribution of mutations on the tree (Golding, Aquadro, and Langley 1986; Przeworski, Charlesworth, and Wall 1999). For example, in a sequence subject to weak purifying selection, observed mutations are expected to be recent because older deleterious mutations would have already been lost from the population. Hence, the expected number of mutations on a branch would depend on the age of the branch (fig. 1). A second theme of these earlier studies is that they generally consider selection at a single locus, which can be either completely or partially linked to the sequence sampled. These studies cannot be readily expanded to multiple selected sites because, with low levels of recombination, distinct selected sites within a sequence do not evolve independently. For example, the buildup of negative linkage disequilibrium in regions of low recombination reduces the efficacy of directional selection, a process known as Hill-Robertson interference (Hill and Robertson 1966; McVean and Charlesworth 2000). Przeworski, Charlesworth, and Wall (1999) attempted to deal with this problem by using the ASG to simulate an infinitely many sites mutation model. However, because of the limitations of the ASG methodology, they could only simulate very weak selection (2Ns = 0.2).

Fig. 1. The potential action of selection on a genealogy. (a) A neutral genealogy, in which the expected number of mutations on a branch is proportional to its length. (b) A genealogy in which selection has altered the shape of the tree, in this case, increasing the relative length of the external branches. (c) A genealogy with an apparently neutral shape but with a nonrandom distribution of mutations caused by selection. Observed mutations tend to be recent and occur on the external branches.
The study of genealogies with selection leaves two open questions: (1) How does selection at multiple sites affect the distribution of mutations on the tree? and (2) How does mutual dependence among selected sites affect the shape of a genealogy and other patterns of molecular variation? This study uses simulations to address these two questions for the specific case of weak and moderate purifying selection. Because mildly deleterious mutations are eliminated slowly from a population, they are likely to be observed segregating in a sample. Therefore, if a large proportion of new mutations are mildly deleterious, weak purifying selection could lead to an appreciable shift in the distribution of observed mutations on a tree. Also, the long sojourn times of mildly deleterious mutations allow multiple selected sites to segregate simultaneously, which can lead to Hill-Robertson interference. McVean and Charlesworth (2000) have investigated the effect of interference on sojourn times, fixation probabilities, and the sampling distribution of segregating sites. They use a reversible-mutation model (Bulmer 1991) with identical mutational fitness effects at each site, which is generally used as a basis for investigations of codon bias. However, this mutation model is probably inappropriate for nonsynonymous changes and mutations in noncoding DNA. Therefore, we investigate the important limiting case of unconditionally deleterious mutation with varying distributions of mutational fitness effects.
We also use our simulations to assess the power of several neutrality tests (Tajima's [1989] D test and Fu and Li's [1993] D, D*, F, and F* tests) for detecting weak and intermediate strengths of purifying selection. Under the assumption that selected variation is unobserved, Golding (1997) and Przeworski, Charlesworth, and Wall (1999) suggest that statistical power should be low because weak purifying selection at a single locus has only a small effect on the shape of a genealogy. However, if (1) selection at multiple sites has a synergistic effect on tree shape compared with selection at a single site or if (2) the primary effect of selection is to shift the distribution of mutations on the tree, then Tajima's and Fu and Li's tests might be able to detect purifying selection. McVean and Charlesworth (2000) and Tachida (2000) both use simulations of weak selection at multiple sites to estimate the power of Tajima's (1989) D statistic and some of Fu and Li's (1993) statistics. However, both studies allow adaptive, as well as deleterious, mutation, which could potentially override the effect of weak purifying selection on the statistics.
Przeworski, Charlesworth, and Wall (1999) pointed out the limitations of current retrospective simulation methods (e.g., Krone and Neuhauser 1997; Neuhauser and Krone 1997; Slade 2000) for modeling selection at multiple linked sites; the necessary assumptions regarding the genetic system are very restrictive, and expanding to multiple sites is labor-intensive. Therefore, following Golding (1997), we use forward simulations that track ancestry in each generation. But, rather than using Golding's single-locus, two-allele model with symmetrical mutation, we consider an infinitely many sites mutation model within a nonrecombining sequence. In addition, we track the mutational history of each individual in the population. This allows us to determine the distribution of mutations on the tree.
Materials and Methods
We ran stochastic simulations of a single nonrecombining sequence forward in time; in each generation, we kept track of the ancestry of each gene in all previous generations. A summary of all simulation parameters and output statistics is given in table 1. Selection and reproduction followed the Wright-Fisher model for diploids. Individual fitness was determined as the average fitness of the two parental genes, thus assuming no dominance. Mutation occurred at rate µ per sequence per generation. The deleterious fitness effect, s, of each new mutation was drawn from a gamma distribution of mutational effects with mean s̄ and shape parameter β. The gamma distribution is represented by the density function
$$g(s) = \frac{(\beta/\bar{s})^{\beta}}{\Gamma(\beta)}\, s^{\beta-1} e^{-\beta s/\bar{s}}, \qquad s > 0.$$
The gamma distribution was used because it can approximate a wide variety of distributions. For instance, when the shape parameter β = 1, the gamma reduces to an exponential distribution with mean s̄. For reasonably large β (>5), the gamma is an approximately symmetrical, bell-shaped curve. And as β tends to infinity, the gamma approximates an equal-effects model of mutation. Thus, as β increases, the variance (s̄²/β for a fixed mean) decreases. There was no epistasis; fitness effects of multiple mutations were combined multiplicatively. We assumed that each new mutation is unique, following an infinitely many sites model. At the end of each generation, the absolute mean fitness of the entire population was reset to unity; i.e., we used a soft selection model. Otherwise, with unconditionally deleterious mutation and no recombination, Muller's ratchet (Muller 1964) could have driven the mean fitness close to zero.
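The generation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C program: the function names (`draw_effect`, `poisson`, `gene_fitness`, `next_generation`) are hypothetical, each gene is represented as a tuple of the selection coefficients it carries, the number of new mutations per gene is taken to be Poisson with mean µ, and each effect is capped at 1 so that fitness cannot become negative. Ancestry tracking is omitted.

```python
import math
import random

def draw_effect(rng, s_mean, beta):
    """Draw one deleterious effect s from a gamma with mean s_mean and shape beta."""
    # random.gammavariate takes (shape, scale); scale = mean / shape.
    # Capping at 1 keeps fitness non-negative (an assumption of this sketch).
    return min(rng.gammavariate(beta, s_mean / beta), 1.0)

def poisson(rng, lam):
    """Knuth's method for a Poisson draw; adequate for small rates."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def gene_fitness(effects):
    """Multiplicative fitness of one gene: product of (1 - s) over its mutations."""
    w = 1.0
    for s in effects:
        w *= 1.0 - s
    return w

def next_generation(genes, mu, s_mean, beta, rng):
    """One Wright-Fisher generation for 2N nonrecombining genes (N diploids)."""
    n_ind = len(genes) // 2
    pairs = [(genes[2 * i], genes[2 * i + 1]) for i in range(n_ind)]
    # Individual fitness is the average of its two genes' fitnesses (no dominance).
    weights = [(gene_fitness(a) + gene_fitness(b)) / 2 for a, b in pairs]
    offspring = []
    for _ in range(2 * n_ind):
        # random.choices normalizes the weights, so only relative fitness
        # matters (soft selection).
        parent = rng.choices(pairs, weights=weights, k=1)[0]
        gene = parent[rng.randrange(2)]  # one of the parent's two genes, unrecombined
        new = tuple(draw_effect(rng, s_mean, beta)
                    for _ in range(poisson(rng, mu)))
        offspring.append(gene + new)
    return offspring
```

Because sampling uses relative weights, rescaling mean fitness to unity each generation is implicit here; a faithful implementation would also store each gene's parent so that the genealogy can be rebuilt later.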
At the beginning of each run, the population was allowed 8N generations to reach selective equilibrium; during this time, we did not follow ancestry. This established the founding generation. (We have done simulations with a variable number of generations [2N, 4N, 8N, and 16N] to establish mutation-selection-drift equilibrium; we find that the length of time the population is run before the founding generation makes no difference in output statistics.) Next, the population was run until coalescence occurred, that is, until all individual genes were descendants of a single gene in the founding generation. At this point, it would not have been appropriate to end the run; the distribution of all statistics would have been conditional on the fact that coalescence of the entire population had just occurred. Therefore, the population was run for an additional 7.5N generations to eliminate any conditionality. We do not know of any theory that predicts the necessary time to eliminate this sort of conditionality. We used 7.5N generations after coalescence because, at this level, simulations of the neutral case achieved reasonable accordance with the neutral expectations for all statistics (data not shown).
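The run schedule (burn-in, run to population-wide coalescence, then a further 7.5N generations) can be illustrated with a stripped-down neutral example. The sketch below is an assumption-laden toy rather than the authors' procedure: it ignores selection and mutation, labels each gene with its ancestor in the founding generation, and declares coalescence when only one founder label remains. The function names are hypothetical.

```python
import random

def generations_to_coalescence(n_genes, rng):
    """Neutral Wright-Fisher: generations until all genes share one founder."""
    founder = list(range(n_genes))  # each gene starts as its own founder lineage
    generations = 0
    while len(set(founder)) > 1:
        # Every offspring gene copies the founder label of a random parent gene.
        founder = [founder[rng.randrange(n_genes)] for _ in range(n_genes)]
        generations += 1
    return generations

def total_run_length(n_diploid, rng, burn_in=8.0, extra=7.5):
    """Burn-in of burn_in*N, run to coalescence, then extra*N more generations."""
    t_coal = generations_to_coalescence(2 * n_diploid, rng)
    return int(burn_in * n_diploid) + t_coal + int(extra * n_diploid)
```

For example, `total_run_length(250, random.Random(0))` returns the number of generations one such replicate would take with N = 250; the middle term varies from run to run, which is why sampling immediately at coalescence would condition the statistics on an unusual event.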
At the end of each run, n sequences were randomly sampled from the population. For this sample, a genealogy was constructed from the ancestral information stored with each sequence. Three statistics were recorded to summarize the shape of the simulated trees: (1) the sum of the external branch lengths, L_n; (2) the total tree length, J_n; and (3) the time to the most recent common ancestor (MRCA). Three mutation-based statistics were also calculated: (1) the total number of segregating sites, S; (2) nucleotide diversity, π; and (3) the number of external mutations, η_e. External mutations are defined as mutations occurring on branches that connect to the tips of the genealogy. For large sample sizes, the number of external mutations is generally equal to or very close to the number of singletons (i.e., the number of polymorphic sites for which the less frequent nucleotide is represented only once in the sample). We conducted power analyses for Tajima's (1989) D test and Fu and Li's (1993) D, D*, F, and F* tests. For Tajima's D test, we used the critical values of Simonsen, Churchill, and Aquadro (1995), which are contingent on the observed number of segregating sites. Fu and Li's (1993) D and F tests require a single out-group sequence to determine character polarity at segregating sites. However, because we kept track of the history of new mutations, character polarity was implicit to the information stored by the program. Thus, we did not need to simulate a sister population to generate an out-group sequence.
We ran 1,000 simulations for each combination of parameters: mutation rate µ, mean fitness effect of mutations s̄, and shape parameter β. Unless otherwise noted, simulations were run with a sample size of n = 50 and a diploid population size of N = 250. The simulations were written in the C programming language; copies of the source code are available upon request from the corresponding author. For a random number generator, we used the function ran1 from Press et al. (1992, p. 280).
To assess the realism of our simulations, we ran simulations with solely neutral mutations and compared these with the neutral expectations. The means of all the mutation-based and tree-based summary statistics we considered were within 2.5 standard errors of their respective neutral expectations (table 2); most were actually much closer than this. The most substantial difference between our simulations and the neutral coalescent was that the standard deviations of many of the statistics were somewhat lower than expected. This is possibly because we only ran our simulations 7.5N generations after coalescence, and therefore we might have missed a few very extreme events. To determine whether our results were scalable to larger populations with smaller mutation rates and mean selective effects, we also ran simulations with variable population sizes but with constant values of 4Nµ and 2Ns̄. Consistent with diffusion theory (Crow and Kimura 1970; Sawyer and Hartl 1992), the neutral coalescent (Kingman 1982; Hudson 1990), and similar simulations (e.g., McVean and Charlesworth 2000), we found that the actual population size is unimportant when the mutation rate, selection coefficient, and time are scaled by N (table 3). Over the range of diploid population sizes we considered (N = 125, 500), all the mutational statistics (S, η_e, and π) and their standard deviations were virtually constant. Also, when branch length is scaled by N, all the tree statistics were constant.
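For reference, the neutral expectations against which such simulations are usually checked can be computed directly. The sketch below uses standard neutral coalescent results for a sample of n sequences from a diploid population of size N with θ = 4Nµ (Watterson 1975; Tajima 1983; Fu and Li 1993); whether these match the exact quantities tabulated in table 2 is an assumption, and the function and key names are hypothetical.

```python
def neutral_expectations(n, theta, n_diploid):
    """Standard neutral expectations; branch lengths are in generations."""
    a_n = sum(1.0 / i for i in range(1, n))  # Watterson's harmonic sum
    return {
        "S": theta * a_n,                    # segregating sites
        "pi": theta,                         # nucleotide diversity
        "eta_e": theta,                      # external mutations
        "J_n": 4.0 * n_diploid * a_n,        # total tree length
        "L_n": 4.0 * n_diploid,              # sum of external branch lengths
        "T_mrca": 4.0 * n_diploid * (1.0 - 1.0 / n),
    }

expected = neutral_expectations(n=50, theta=10.0, n_diploid=250)
external_fraction = expected["L_n"] / expected["J_n"]  # about 0.22 for n = 50
```

Note that the ratio of expectations, about 0.22 for n = 50, need not equal the average of the per-tree ratio reported from simulations.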
Table 2. A Comparison of Average Mutation and Tree Statistics with Their Neutral Expectations for Simulations Without Selection
Table 3. Average Tree-Based and Mutation-Based Statistics for Simulations with Constant Values of 4Nµ and 2Ns̄ but with Variable Population Sizes
Results and Discussion
Tree Shape Versus Distribution of Mutations
The results for both tree and mutation statistics are presented in table 4. Among mutation statistics, selection had the largest effect on nucleotide diversity; for the larger values of β (i.e., symmetrical distributions of fitness effects with lower variance), the average nucleotide diversity was reduced by almost 10-fold for every 10-fold increase in the strength of selection. Selection also had a strong effect on the number of segregating sites. The number of external mutations was not substantially affected except in the case of strong selection. In contrast to the mutation statistics, selection only had a moderate effect on the tree statistics, which is consistent with the single-locus results of Golding (1997), Neuhauser and Krone (1997), Przeworski, Charlesworth, and Wall (1999), and Slade (2000). The total tree length and MRCA time were affected the most by intermediate strengths of selection. But this departure was minor relative to the stochasticity inherent to genealogies. No tree statistic strayed from its neutral expectation by much more than one neutral standard deviation.
Table 4. Average Tree-Based and Mutation-Based Statistics for Simulations with Variable Mean Selective Coefficients and Variable Distributions of Fitness Effects
A straightforward way to describe the shift in the distribution of mutations is by contrasting the average proportion of the tree length composed of external branches (L_n/J_n) with the average proportion of new mutations that are external (η_e/S). For neutral mutations, these two ratios are expected to be equal; η_e/S > L_n/J_n indicates that the distribution of mutations has shifted toward the tips of the external branches, and η_e/S < L_n/J_n indicates a shift toward the root of the tree. We find that L_n/J_n changes very little with increasing levels of selection for all distributions of mutational fitness effects. The largest, although still modest, departure from the neutral value occurs at intermediate strengths of selection with the higher values of β and 4Nµ (fig. 2). In sharp contrast, η_e/S changes markedly with increasing strengths of selection. With 4Nµ = 10 and 2Ns̄ = 100, more than 80% of all mutations are external, compared with just 24% in the neutral case (fig. 2). Clearly, purifying selection at multiple sites can cause a major shift in the distribution of mutations toward the tips of the tree.

Fig. 2. The average proportion of tree length composed of external branches (circles) and the average proportion of segregating mutations that are external (squares) for different mutation rates and distributions of mutational effects.
The shift in the distribution of mutations on the tree is particularly relevant to the large number of coalescent estimation procedures that have been developed over the last decade. All these methods assume that observed segregating mutations are selectively neutral, and applying these methods to constrained regions of the genome could lead to considerable bias. Methods have been developed to estimate migration rate (Slatkin and Maddison 1989; Nath and Griffiths 1996; Wakeley 1998; Beerli and Felsenstein 1999), effective population size (Orive 1993; Li and Fu 1994; Kuhner, Yamato, and Felsenstein 1995), population growth rate (Kuhner, Yamato, and Felsenstein 1998), divergence time for isolated populations (Wakeley and Hey 1997), ancestral population size (Wakeley and Hey 1997), admixture proportions (Bertorelle and Excoffier 1998), and the per-site recombination rate (Hey and Wakeley 1997; Kuhner, Yamato, and Felsenstein 2000). Wakeley and Hey's (1997) method for estimating divergence time provides a good example of how weak purifying selection could bias coalescent estimators. They divide the total number of segregating sites into (1) sites with polymorphisms shared by both populations, (2) sites polymorphic in only one population, and (3) fixed differences between populations. If one assumes no homoplasy, shared polymorphisms must have arisen in the ancestral population, whereas the other two classes of segregating sites could have arisen before or after divergence. Our results indicate that weak purifying selection can cause a substantial reduction in the number of "old" mutations, which, in this example, would lead to a lower than expected estimate for the number of shared polymorphisms and would have a lesser effect on other classes of polymorphism. Using Wakeley and Hey's (1997) method, weak purifying selection against slightly deleterious mutations could lead to a substantial overestimate of divergence time. This is just one example of how selection is an important source of bias for studies that use genealogical information from constrained regions of the genome.
Evolutionary studies of deleterious mutation generally focus on either very slightly deleterious mutation (s on the order of 1/2N; e.g., Ohta 1973) or strongly deleterious mutation (s > 1%; e.g., Charlesworth B, Morgan, and Charlesworth D 1993), while paying fairly little attention to the transition between the two. Our results indicate that some predictions for the effect of strongly deleterious mutations hold for even relatively weak selection. Studies of background selection (Charlesworth B, Morgan, and Charlesworth D 1993; Hudson and Kaplan 1994, 1995) predict that the effect of strong selection against recurrent deleterious mutation should be roughly equivalent to the effect of a reduction in effective population size. Specifically, for an equal-effects mutation model, Charlesworth B, Morgan, and Charlesworth D (1993, Eq. 4) predict that the expected length of each branch should be equal to the neutral branch length multiplied by a factor of
where h is the dominance coefficient (for our simulations, h = 1/2). For 2Ns̄ = 30 and 2Ns̄ = 100, our results agree reasonably well with the background selection predictions (fig. 3), especially for mutation models with low variance (the background selection predictions are based on an equal-effects mutation model). Although 2Ns̄ = 30 and 2Ns̄ = 100 are strong in the context of our simulations, these values represent unmeasurable strengths of selection for a reasonably large effective population size. For example, if N = 100,000, the selection coefficients are 0.0003 and 0.001, respectively.

Fig. 3. Tree length as a function of the strength of selection for different distributions of mutational effects. The hatched line is the neutral expectation, and the solid line is the background selection expectation.
The consequence of variation in mutational fitness effects is rather complex. The relative impact of different mutation models depends on the strength of selection. For example, tree length was affected the most under the equal-effects mutation model (β → ∞) for weak and intermediate selection (fig. 3). But for strong selection, tree length was most affected under an exponential distribution of mutational effects (β = 1). We suggest that, relative to the equal-effects case, increasing the variance in the distribution of mutational effects is analogous to a simultaneous reduction in selective coefficient and population size. For distributions with high variance (low β), a large class of mutations will be so strongly selected against that they can contribute little to polymorphism. This class of mutations will have an effect analogous to a reduction in population size (Charlesworth B, Morgan, and Charlesworth D 1993; Hudson and Kaplan 1994, 1995). In addition, among mutations that contribute to polymorphism, weakly selected mutations will be disproportionately represented in the population. Consequently, this class of mutations will disproportionately affect tree statistics, which is analogous to a reduction in mean fitness effect.
Interference Between Sites
Sawyer and Hartl (1992) and Hartl, Moriyama, and Sawyer (1994) develop predictions for the frequency distribution of segregating sites subject to selection under the assumption of free recombination. We can compare these predictions to our no-recombination results to explore the effect of interference. Sawyer and Hartl (1992) find that the limiting density function for population mutant frequencies is
They used this result to derive the expected number of segregating sites in a sample of size n
Also, Hartl, Moriyama, and Sawyer (1994) expanded this result to the entire frequency distribution of segregating sites, represented as the expected number of sites with new mutations shared by r sequences in a sample of size n
All these results assume that all mutations have equal fitness effects, and H(50), H(2), and M(1,50) are directly comparable to our equal-effects (β → ∞) simulation results for S, π, and η_e. Free-recombination expectations for the frequency distribution of segregating sites with variable mutational fitness effects can be achieved through a simple modification to Sawyer and Hartl's (1992) results. Namely, the limiting density function for population mutant frequencies is simply the equal-effects density function f of selective effect s multiplied by the distribution of mutational fitness effects (in our case, the gamma distribution g), then integrated over all possible s
$$f^{*}(x) = \int_{0}^{\infty} f(x; s)\, g(s)\, ds$$
The arguments of Sawyer and Hartl (1992) and Hartl, Moriyama, and Sawyer (1994) that lead from the population frequency density to the distribution of segregating sites in a sample apply similarly here. Therefore, the expected number of segregating sites is
and the frequency distribution of segregating sites is
H*(50), H*(2), and M*(1,50) can be compared with our results for S, π, and η_e with β = 1 and β = 10.
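To make the free-recombination expectations concrete, the sketch below evaluates them numerically for the equal-effects case. Everything about its form is an assumption rather than a reproduction of the article's equations: it uses the Poisson random field density as commonly written, θ(1 − e^(−2γ(1−x))) / ((1 − e^(−2γ)) x(1−x)), with θ the scaled mutation rate and γ a scaled selection coefficient that is negative for deleterious mutations, and it obtains sample expectations by binomial sampling from that density. How γ maps onto the article's 2Ns̄ is left unspecified, and the function names are hypothetical.

```python
import math

def density(x, theta, gamma):
    """Assumed equilibrium density of mutant frequency x (0 < x < 1)."""
    if gamma == 0.0:
        weight = 1.0 - x  # neutral limit, giving theta / x overall
    else:
        weight = math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)
    return theta * weight / (x * (1.0 - x))

def integrate(func, steps=20000):
    """Midpoint rule on (0, 1); never evaluates the endpoints."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

def expected_segregating(n, theta, gamma):
    """Expected number of sites polymorphic in a sample of size n."""
    return integrate(
        lambda x: (1.0 - x ** n - (1.0 - x) ** n) * density(x, theta, gamma))

def expected_count(r, n, theta, gamma):
    """Expected number of sites whose mutant appears r times in n sequences."""
    c = math.comb(n, r)
    return integrate(
        lambda x: c * x ** r * (1.0 - x) ** (n - r) * density(x, theta, gamma))
```

With γ = 0 these reduce to the familiar neutral values (θ times the harmonic sum for segregating sites, θ/r for sites of multiplicity r), which is a useful sanity check; with negative γ the expected singleton fraction rises. Averaging over a distribution of effects would add one more integration over s, in the manner described in the paragraph above. Very large |γ| will overflow `math.expm1` in this simple form.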
A comparison of our results with the free-recombination expectations is shown in figure 4. For all selection coefficients, interference increased the level of observed variation, reflected in S, π, and η_e. In contrast, under a reversible mutation, finite-sites model, McVean and Charlesworth (2000, Fig. 3) find that interference decreases standing variation for weak selection. Further, McVean and Charlesworth (2000) reasoned that interference should have the largest effect with weak and intermediate strengths of selection; strongly deleterious mutations should not be appreciably affected because they are maintained at such low frequencies, reducing the opportunity for negative linkage disequilibrium. For the situation we consider, this reasoning depends on whether the effect of interference is considered relatively or absolutely. When the effect of interference is represented as the ratio of the observed mean divided by the free-recombination expectation, as in McVean and Charlesworth (2000, Figs. 1-3), we found that the magnitude of interference increased monotonically with the strength of selection. It is unclear whether this trend would continue for stronger selection. However, when the effect of interference is represented as the absolute difference between the observed mean and the free-recombination expectation, interference had the maximum effect for weak to intermediate levels of selection.
Statistical Power
For most parameter combinations, the statistical power of all tests considered was quite low (fig. 5). But, for intermediate and strong selection with relatively high mutation rates, Fu and Li's (1993) and Tajima's (1989) tests achieved good power. For a two-tailed test at the 95% confidence level, power was as high as 80%. It should be noted, however, that this power estimation is based on the assumption that all mutations segregating in the sample are at least slightly deleterious. If neutral mutations were added, statistical power could be diluted. Because purifying selection has only a minor effect on the shape of the tree, neutral mutations would continue to "look" neutral even though they were linked to deleterious mutations; i.e., with close to the neutral expectation for nucleotide diversity, number of segregating sites, and number of external mutations. Also, because deleterious mutations are generally maintained at low frequencies, they would be underrepresented in a sample of sequences with both neutral and deleterious mutations. For example, in our simulations, statistical power was the highest for 4Nµ = 10 and 2Ns̄ = 30. For these parameters, if neutral mutations occurred at the same rate, then, extrapolating from tree statistics, one would expect roughly 77% of all mutations segregating in the sample to be neutral. Therefore, the evolutionary signal caused by purifying selection at multiple sites could be partially obscured by neutral mutation.
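Power in this sense is simply a rejection frequency across replicate simulations. A minimal sketch, assuming a list of test statistics from simulated samples and a single fixed pair of two-tailed critical values (hypothetical here; the critical values used for Tajima's D in this study depend on the observed number of segregating sites):

```python
def rejection_rate(statistics, lower, upper):
    """Fraction of simulated test statistics falling outside [lower, upper]."""
    rejected = sum(1 for d in statistics if d < lower or d > upper)
    return rejected / len(statistics)

# Four made-up D values against illustrative bounds of -2 and 2.
power = rejection_rate([-2.5, -1.0, 0.0, 2.3], lower=-2.0, upper=2.0)
```

With 1,000 replicates per parameter combination, an estimate near 0.8 has a binomial standard error of a little over 0.01.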

Fig. 5. Statistical power to detect purifying selection for Tajima's D test and Fu and Li's D, D*, F, and F* tests. Statistical power is measured as the fraction, out of 1,000 simulated samples, of two-tailed tests rejecting neutrality for P < 0.05. For all simulations, N = 250. The different graphs show results for different mutation models, β = 1, 10, and ∞, and for different scaled mutation rates, 4Nµ = 3 and 10.
Conclusions
Our most important result is that purifying selection at multiple sites has a much stronger effect on the distribution of mutations than on the shape of the genealogy. Although previous studies have presented heuristic arguments that weak purifying selection should shift the distribution of mutations toward the tips of the genealogy (e.g., Fu and Li 1993; Akashi 1999), to our knowledge, ours is the first study to quantify this shift. The shift in the distribution of mutations could considerably bias the many coalescent estimators when applied to constrained regions of the genome. This potential bias is particularly important in the light of studies that suggest that most nonsynonymous mutations (e.g., Fay, Wyckoff, and Wu 2001) and many synonymous mutations (reviewed in Sharp et al. 1995; Akashi and Eyre-Walker 1998) are subject to purifying selection.
Acknowledgements
We would like to thank John Kelly for many invaluable discussions and suggestions on the manuscript. We also thank Brian Golding, Peter Waddell, and one anonymous reviewer for helpful comments on the manuscript. This work was supported by a National Science Foundation Predoctoral Fellowship to S.W. and by a University of Kansas New Faculty General Research Fund Grant and National Science Foundation Grant DEB-0108242 to M.E.O.
Footnotes
Brian Golding, Reviewing Editor
Keywords: purifying selection, coalescence theory, molecular evolution
Address for correspondence and reprints: Scott Williamson, Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045. scottw{at}ku.edu 
 |
References
|
---|
Akashi H., 1999 Within- and between-species DNA sequence variation and the footprint of natural selection Gene 238:39-51
Akashi H., A. Eyre-Walker, 1998 Translational selection and molecular evolution Curr. Opin. Genet. Dev 8:688-693
Beerli P., J. Felsenstein, 1999 Maximum-likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach Genetics 152:763-773
Bertorelle G., L. Excoffier, 1998 Inferring admixture proportions from molecular data Mol. Biol. Evol 15:1298-1311
Bulmer M. G., 1991 The selection-mutation-drift theory of synonymous codon usage Genetics 129:897-907
Charlesworth B., M. T. Morgan, D. Charlesworth, 1993 The effect of deleterious mutation on neutral molecular variation Genetics 134:1289-1303
Crow J. F., M. Kimura, 1970 An introduction to population genetics theory Burgess, Minneapolis, Minn
Fay J. C., G. J. Wyckoff, C. I. Wu, 2001 Positive and negative selection on the human genome Genetics 158:1227-1234
Fu Y. X., 1995 Statistical properties of segregating sites Theor. Popul. Biol 48:172-197
Fu Y. X., W. H. Li, 1993 Statistical test of neutrality of mutations Genetics 133:693-709
Fu Y. X., W. H. Li, 1999 Coalescing into the 21st century: an overview and prospects of coalescent theory Theor. Popul. Biol 56:1-10
Golding G. B., 1997 The effect of purifying selection on genealogies Pp. 271-285 in P. Donnelly and S. Tavare, eds. Progress in population genetics and human evolution, Vol. 87. IMA volumes in mathematics and its applications. Springer-Verlag, New York
Golding G. B., C. F. Aquadro, C. H. Langley, 1986 Sequence evolution within populations under multiple types of mutation Proc. Natl. Acad. Sci. USA 83:427-431
Hartl D. L., E. N. Moriyama, S. A. Sawyer, 1994 Selection intensity for codon bias Genetics 138:227-234
Hey J., J. Wakeley, 1997 A coalescent estimator of the population recombination rate Genetics 145:833-846
Hill W. G., A. Robertson, 1966 The effect of linkage on limits to artificial selection Genet. Res 8:269-294
Hudson R. R., 1990 Gene genealogies and the coalescent process Oxf. Surv. Evol. Biol 1:1-14
Hudson R. R., N. L. Kaplan, 1994 Gene trees with background selection Pp. 140-153 in B. Golding, ed. Non-neutral evolution: theories and molecular data. Chapman and Hall, New York
Hudson R. R., N. L. Kaplan, 1995 The coalescent process and background selection Philos. Trans. R. Soc. Lond. B 349:19-23
Kaplan N. L., T. Darden, R. R. Hudson, 1988 The coalescent process in models with selection Genetics 120:819-829
Kelly J. K., 1997 A test of neutrality based on interlocus associations Genetics 146:1197-1206
Kelly J. K., M. J. Wade, 2000 Molecular evolution near a two-locus balanced polymorphism J. Theor. Biol 204:83-101
Kingman J. F. C., 1982 The coalescent Stochastic Process. Appl 13:235-248
Krone S. K., C. Neuhauser, 1997 Ancestral processes with selection Theor. Popul. Biol 51:210-237
Kuhner M. K., J. Yamato, J. Felsenstein, 1995 Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling Genetics 140:1421-1430
Kuhner M. K., J. Yamato, J. Felsenstein, 1998 Maximum likelihood estimation of population growth rates based on the coalescent Genetics 149:429-434
Kuhner M. K., J. Yamato, J. Felsenstein, 2000 Maximum likelihood estimation of recombination rates from population data Genetics 156:1393-1401
Li W.-H., Y.-X. Fu, 1994 Estimation of population parameters and detection of natural selection from DNA sequences Pp. 112-126 in B. Golding, ed. Non-neutral evolution: theories and molecular data. Chapman and Hall, New York
McVean G. A. T., B. Charlesworth, 2000 The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation Genetics 155:929-944
Muller H. J., 1964 The relation of recombination to mutational advance Mutat. Res 1:2-9
Nath H., R. Griffiths, 1996 Estimation in an island model using simulation Theor. Popul. Biol 50:227-253
Neuhauser C., S. K. Krone, 1997 The genealogy of samples in models with selection Genetics 145:519-534
Ohta T., 1973 Slightly deleterious mutant substitutions in evolution Nature 246:96-98
Orive M. E., 1993 Effective population size in organisms with complex life-histories Theor. Popul. Biol 44:316-340
Press W. H., B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, 1992 Numerical recipes in C: the art of scientific computing Cambridge University Press, Cambridge, U.K
Przeworski M., B. Charlesworth, J. D. Wall, 1999 Genealogies and weak purifying selection Mol. Biol. Evol 16:246-252
Sawyer S. A., D. L. Hartl, 1992 Population genetics of polymorphism and divergence Genetics 132:1161-1176
Sharp P. M., M. Averof, A. T. Lloyd, G. Matassi, J. F. Peden, 1995 DNA sequence evolution: the sounds of silence Philos. Trans. R. Soc. Lond. B 349:241-247
Simonsen K. L., G. A. Churchill, C. F. Aquadro, 1995 Properties of statistical tests of neutrality for DNA polymorphism data Genetics 141:413-429
Slade P. F., 2000 Simulation of selected genealogies Theor. Popul. Biol 57:35-49
Slatkin M., W. P. Maddison, 1989 A cladistic measure of gene flow inferred from the phylogenies of alleles Genetics 123:603-613
Tachida H., 2000 Molecular evolution in a multisite nearly neutral model J. Mol. Evol 50:69-81
Tajima F., 1983 Evolutionary relationship of DNA sequences in finite populations Genetics 105:437-460
Tajima F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism Genetics 123:585-595
Wakeley J., 1998 Segregating sites in Wright's island model Theor. Popul. Biol 53:166-174
Wakeley J., J. Hey, 1997 Estimating ancestral population parameters Genetics 145:847-855
Watterson G. A., 1975 On the number of segregating sites in genetical models without recombination Theor. Popul. Biol 7:256-276
Accepted for publication April 24, 2002.