Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland
Ecole Pratique des Hautes Études and Centre National de la Recherche Scientifique Unité Mixte de Recherche 7625, Université Pierre et Marie Curie, Paris, France
Recently, we proposed two neutrality tests (Depaulis and Veuille 1998) based on haplotype number (K) and haplotype diversity (H). They relied on coalescent simulations conditional on the observed number of segregating sites (S), following the coalescent simulation procedure proposed by Hudson (1993). In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) use an alternative approach, based on the joint distribution of K and S, and show that the corresponding tests are not independent of the population mutational parameter θ (θ = 4Neµ, where Ne is the effective population size and µ is the neutral mutation rate per generation). They use the classical procedure of coalescent simulations conditional on θ and restrict their distribution to the particular subset of genealogies consistent with a particular value of S. They show that if θ is extreme, the probability of rejection can be substantially different from 5%.
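As an illustration of what simulating conditional on S involves, the following is a minimal sketch, not the program used for the published tests. It assumes no recombination, an infinite-site model, and that H is computed as 1 − Σp², where p is the frequency of each haplotype in the sample; the function names (`simulate_tree`, `haplotype_stats`, `null_distribution`, `p_lower`) are hypothetical. Each replicate draws a neutral genealogy, places exactly S mutations on branches with probability proportional to branch length, and records K and H.

```python
import random
from collections import Counter

def simulate_tree(n, rng):
    """Draw a neutral coalescent genealogy for n sequences (no recombination).

    Returns a list of (leaves_below_branch, branch_length) pairs.
    """
    # Each active lineage is (set of sampled sequences below it, time it began).
    lineages = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence among k lineages.
        t += rng.expovariate(k * (k - 1) / 2.0)
        i, j = rng.sample(range(k), 2)
        a, b = lineages[i], lineages[j]
        branches.append((a[0], t - a[1]))
        branches.append((b[0], t - b[1]))
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append((a[0] | b[0], t))
    return branches

def haplotype_stats(n, s, rng):
    """Place exactly s mutations on a random genealogy; return (K, H)."""
    branches = simulate_tree(n, rng)
    weights = [length for _, length in branches]
    # Each mutation falls on a branch with probability proportional to its length.
    hits = rng.choices(branches, weights=weights, k=s)
    # A sequence carries a mutation if it descends from the mutated branch.
    haplotypes = Counter(
        tuple(int(i in leaves) for leaves, _ in hits) for i in range(n)
    )
    k_stat = len(haplotypes)
    # Assumed definition of haplotype diversity: 1 - sum of squared frequencies.
    h_stat = 1.0 - sum((c / n) ** 2 for c in haplotypes.values())
    return k_stat, h_stat

def null_distribution(n, s, replicates, seed=0):
    """Simulated neutral distribution of (K, H) conditional on S = s."""
    rng = random.Random(seed)
    return [haplotype_stats(n, s, rng) for _ in range(replicates)]

def p_lower(stats, k_obs):
    """Fraction of replicates with K less than or equal to the observed K."""
    return sum(1 for k, _ in stats if k <= k_obs) / len(stats)
```

For example, `p_lower(null_distribution(20, 15, 10000), 4)` would estimate the probability of observing four or fewer haplotypes in a sample of 20 sequences with 15 segregating sites; these input values are arbitrary and chosen only for illustration.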
In another companion letter, Wall and Hudson (2001) show that the test based on K is reasonably robust in its original form. They perform coalescent simulations conditional on θ for a wide range of values. In contrast to the previous approach, they consider all the outcomes of the neutral simulations with various S values and look at the corresponding (various) confidence intervals given by our (Depaulis and Veuille 1998) procedure regardless of the θ value in the input of their simulations. This latter approach may be a better representation of a neutral distribution of genealogies. They study various statistics, including K, and the resulting type I error given by the confidence interval of Depaulis and Veuille (1998) remains close to 5% once corrected for the discreteness of the statistics.
In practice, the exact value of θ is unknown, and we generally have no information on its value independent of a given data set. One should find a reliable procedure to account for uncertainty in θ. Replacing θ with an estimate would not be conservative. Rather than conditioning on this unknown parameter, we chose to condition directly on the observed value of S.
In the present letter, we question the relevance of considering extreme θ values in addressing the robustness of the tests. We first show that those values of θ that lead to nonrobust neutrality tests with our procedure are highly unlikely given S under a neutral model. Second, we show with a Bayesian approach that our tests conditional on S are reliable, thus confirming Wall and Hudson's (2001) simulation results by an alternative approach.
Not all values of θ are equally likely given an observed value of S under the neutral model. Using Hudson's (1990) recursion, we computed the probability of obtaining an S value equal to or more extreme than a given value (the parameter values used by Markovtsova, Marjoram, and Tavaré [2001, tables 1 and 2]). Only when θ = 10 is the S value not highly unexpected (table 1). For this θ value, the tests are conservative according to Markovtsova, Marjoram, and Tavaré (2001, table 1). We also computed Watterson's estimate of θ given S and its confidence interval following Kreitman and Hudson's (1991) method. The 95% confidence interval for θ always shows a much smaller range (1.3–27) than the 1–100 range used by Markovtsova, Marjoram, and Tavaré (2001). As pointed out by Wall and Hudson (2001), the fact that the observed S value is highly unexpected given θ is a sufficient reason to reject the null Wright-Fisher neutral model, and there is no need to use any other neutrality test.
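The kind of calculation described above can be sketched as follows. This is a minimal illustration, assuming the standard neutral model with infinite sites and no recombination, in which the number of mutations arising while k lineages remain is geometric with mutation probability θ/(θ + k − 1). It is meant to be in the spirit of the recursion cited, not a transcription of it, and the function names are hypothetical.

```python
from math import fsum

def s_distribution(n, theta, s_max):
    """Return [P(S = 0), ..., P(S = s_max)] for a sample of n sequences."""
    # With a single lineage no mutation can segregate: S = 0 with probability 1.
    probs = [1.0] + [0.0] * s_max
    for k in range(2, n + 1):
        # While k lineages remain, the next event is a mutation with
        # probability theta / (theta + k - 1), otherwise a coalescence.
        p_mut = theta / (theta + k - 1)
        p_coal = 1.0 - p_mut
        # Number of mutations before the coalescence: geometric distribution.
        q = [p_coal * p_mut ** i for i in range(s_max + 1)]
        # Convolve with the distribution accumulated for k - 1 lineages.
        probs = [
            fsum(probs[s - i] * q[i] for i in range(s + 1))
            for s in range(s_max + 1)
        ]
    return probs

def tail_probabilities(n, theta, s_obs):
    """Return (P(S <= s_obs), P(S >= s_obs)) given theta."""
    probs = s_distribution(n, theta, s_obs)
    lower = fsum(probs)
    upper = 1.0 - fsum(probs[:s_obs])
    return lower, upper

def watterson_estimate(n, s):
    """Watterson's estimate of theta: S divided by the sum of 1/i, i = 1..n-1."""
    return s / fsum(1.0 / i for i in range(1, n))
```

Calling `tail_probabilities(n, theta, s_obs)` for each θ under consideration indicates how unexpected the observed S is at that θ, and `watterson_estimate(n, s_obs)` gives the point estimate around which plausible θ values lie.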
Following Markovtsova, Marjoram, and Tavaré's (2001) approach, one sensible alternative procedure would be to weight the probability of rejection for different θ values by f(θ | Sn = s), the probability density of θ given S. In the notation of Markovtsova, Marjoram, and Tavaré (2001),
|
The density of θ given S was obtained using a Bayesian approach similar to that followed by Fu (1998). We used a uniform prior distribution between 0 and 100, encompassing the range of θ values considered by Markovtsova, Marjoram, and Tavaré (2001). For the posterior distribution, Bayes' theorem gives
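$$ f(\theta \mid S_n = s) = \frac{P(S_n = s \mid \theta)\, f(\theta)}{\int_0^{100} P(S_n = s \mid x)\, f(x)\, dx}, $$
where f(θ) is the uniform prior density on (0, 100).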
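A numerical version of this Bayesian weighting might look like the sketch below. It is an illustration under stated assumptions rather than the program used here: the likelihood P(Sn = s | θ) is computed with the geometric-convolution recursion for a nonrecombining infinite-site model, the posterior is evaluated on a midpoint grid over the uniform prior range, and `reject_prob` stands for any user-supplied function giving the probability of rejection at a given θ. All function names are hypothetical.

```python
from math import fsum

def prob_s(n, theta, s):
    """P(S = s | theta) for n sequences (infinite sites, no recombination)."""
    probs = [1.0] + [0.0] * s
    for k in range(2, n + 1):
        # Probability that the next event among k lineages is a mutation.
        p_mut = theta / (theta + k - 1)
        q = [(1.0 - p_mut) * p_mut ** i for i in range(s + 1)]
        probs = [
            fsum(probs[j - i] * q[i] for i in range(j + 1))
            for j in range(s + 1)
        ]
    return probs[s]

def posterior_on_grid(n, s, theta_max=100.0, steps=1000):
    """Posterior density of theta given S = s, uniform prior on (0, theta_max).

    Returns (grid, density, width) using the midpoint rule.
    """
    width = theta_max / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    likelihood = [prob_s(n, t, s) for t in grid]
    # The constant prior density cancels between numerator and denominator.
    norm = fsum(likelihood) * width
    density = [value / norm for value in likelihood]
    return grid, density, width

def weighted_rejection(grid, density, width, reject_prob):
    """Average reject_prob(theta) over the posterior density of theta."""
    return fsum(reject_prob(t) * d for t, d in zip(grid, density)) * width
```

The grid resolution (`steps`) trades accuracy for speed; a finer grid matters mostly when the posterior is sharply peaked at small θ.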
Acknowledgements
We thank N. Barton, M. Cobb, Y. X. Fu, I. Gordo, A. Navarro, and S. Otto for helpful discussions and comments on earlier versions of this manuscript, and S. Tavaré for providing Markovtsova, Marjoram, and Tavaré's (2001) manuscript via his website. A computer program that implements Markovtsova, Marjoram, and Tavaré's (2001) rejection algorithm and the H and K haplotype tests conditional on either S or θ and on a value of the population recombination parameter is available from smousset@snv.jussieu.fr. F.D. was supported by NERC, and S.M. and M.V. were supported by Groupe de Recherche GDR 1928 of the Centre National de la Recherche Scientifique.
Footnotes
1 Keywords: coalescent theory, simulations, neutrality tests, haplotype distribution
2 Address for correspondence and reprints: Frantz Depaulis, Institute of Cell, Animal and Population Biology, Ashworth Laboratory, King's Buildings, West Mains Road, Edinburgh EH9 3JT, United Kingdom. frantz.depaulis{at}ed.ac.uk
Literature Cited
Depaulis, F., L. Brazier, and M. Veuille. 1999. Selective sweep at the Drosophila melanogaster Suppressor of Hairless locus and its association with the In(2L)t inversion polymorphism. Genetics 152:1017–1024
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790
Fu, Y. X. 1998. Probability of a segregating pattern in a sample of DNA sequences. Theor. Popul. Biol. 54:1–10
Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, Oxford, England
Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Japan Scientific Societies Press, Tokyo
Kelly, J. K. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197–1206
Kreitman, M., and R. R. Hudson. 1991. Inferring the evolutionary histories of the Adh and Adh-dup loci in Drosophila melanogaster from patterns of polymorphism and divergence. Genetics 127:565–582
Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133
Nielsen, R. 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931–942
Wall, J. D., and R. R. Hudson. 2001. Coalescent simulations and statistical tests of neutrality. Mol. Biol. Evol. 18:1134–1135
Watterson, G. A. 1975. On the number of segregating sites. Theor. Popul. Biol. 7:256–276