Department of Ecology and Evolution, University of Chicago
Recently, Depaulis and Veuille (1998) proposed two new test statistics based on the number and frequency of different haplotypes. In a companion letter, Markovtsova, Marjoram, and Tavaré (2001) point out that Depaulis and Veuille (1998) do not use the standard implementation for their coalescent simulations. Standard coalescent simulations first produce random genealogies, then place mutations at constant rate θ/2 (θ = 4Nµ is the population mutation parameter, where N is the effective population size and µ is the per-locus mutation rate per generation) along each of the branches (Kingman 1982a, 1982b; Hudson 1990). Instead, Depaulis and Veuille (1998) generate distributions for their statistics by first constructing random genealogies, then placing S (the observed number of segregating sites) mutations on each tree. This "fixed S" method has been used before (e.g., Hudson 1993; Rozas and Rozas 1999), partly because it is easy to simulate, and partly because S is observed, while θ must be estimated from the data (see, e.g., Fu 1996). In fact, it is not clear how to estimate θ independent of polymorphism data. Although the fixed S scheme does not directly use θ, Markovtsova, Marjoram, and Tavaré (2001) highlight that the actual distributions of test statistics conditional on S are not independent of θ. In particular, knowing both θ and S changes the expected shape of a genealogy. For example, if S is unusually large given θ, we expect the genealogy to be longer than average. Thus, the critical values in Depaulis and Veuille (1998) might not be appropriate, since the actual rejection probabilities for their tests are functions of the unknown parameter θ.
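The difference between the two simulation schemes can be illustrated with a minimal Python sketch. It is not the code used for this letter. It assumes no recombination, an infinite-sites model, and time measured in units of 2N generations, so that mutations fall on each branch at rate θ/2. For the fixed S scheme it further assumes that each of the S mutations lands on a branch with probability proportional to branch length. The function names (`simulate_genealogy`, `standard_sample`, `fixed_s_sample`) are hypothetical.

```python
import numpy as np

def simulate_genealogy(n, rng):
    """Random coalescent genealogy for n sequences (no recombination).

    Returns the branch lengths (in units of 2N generations) and, for each
    branch, a boolean mask of the sampled sequences descended from it.
    """
    lineages = list(np.eye(n, dtype=bool))   # one lineage per sampled sequence
    start = [0.0] * n                        # time at which each lineage began
    t = 0.0
    lengths, members = [], []
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages the coalescence rate is k(k-1)/2.
        t += rng.exponential(2.0 / (k * (k - 1)))
        i, j = rng.choice(k, size=2, replace=False)
        for idx in (i, j):                   # close the two merging branches
            lengths.append(t - start[idx])
            members.append(lineages[idx])
        merged = lineages[i] | lineages[j]
        keep = [m for m in range(k) if m != i and m != j]
        lineages = [lineages[m] for m in keep] + [merged]
        start = [start[m] for m in keep] + [t]
    return np.array(lengths), np.array(members)

def place_mutations(members, counts):
    """Each mutation becomes a new site carried by the branch's descendants."""
    return np.repeat(members, counts, axis=0).T.astype(int)   # shape (n, S)

def standard_sample(n, theta, rng):
    """Standard scheme: Poisson mutations at rate theta/2 along each branch."""
    lengths, members = simulate_genealogy(n, rng)
    counts = rng.poisson(theta / 2.0 * lengths)
    return place_mutations(members, counts)

def fixed_s_sample(n, S, rng):
    """Fixed S scheme: exactly S mutations, placed in proportion to length."""
    lengths, members = simulate_genealogy(n, rng)
    counts = rng.multinomial(S, lengths / lengths.sum())
    return place_mutations(members, counts)
```

Both functions return an n × S matrix of 0/1 alleles. Under `standard_sample` the number of columns varies from replicate to replicate and depends on the total tree length, whereas `fixed_s_sample` always returns exactly S columns regardless of how long the genealogy happens to be.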
In this letter, we examined the type I errors of various statistical tests by simulation, and we determined what the actual rejection probability was for the fixed S method when data were simulated under standard coalescent assumptions. We considered three different test statistics: K (Strobeck 1987; Fu 1996; Depaulis and Veuille 1998), D (Tajima 1989), and D* (Fu and Li 1993). First, we ran 10^5 replicates under the fixed S method to generate critical values (at the 5% level) for each possible value of S. Then, we ran 10^5 standard coalescent simulations with fixed mutation parameter θ. For each trial, acceptance or rejection was determined from the fixed S critical values using the observed value of S. We tabulated what proportion of trials led to significantly low or significantly high values of each of the three test statistics. K is the number of distinct haplotypes in the sample, and D and D* are two commonly used test statistics that determine whether the frequencies of segregating mutations are consistent with the standard neutral model. An excess of low-frequency variants leads to negative D and D* values, while an excess of intermediate-frequency variants leads to positive D and D* values. To more accurately compare nominal and actual rejection probabilities, we used a randomized test (see, e.g., Lehmann 1986, p. 71). All simulations were run with no recombination.
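As an illustration of two of these statistics, the sketch below computes K and Tajima's D from a 0/1 haplotype matrix with one row per sequence and one column per segregating site. The D calculation follows the standard formulas of Tajima (1989); the function names and the matrix layout are assumptions made for this example, and D* is not covered.

```python
import math
import numpy as np

def num_haplotypes(haps):
    """K: the number of distinct haplotypes (distinct rows) in the sample."""
    return len(np.unique(haps, axis=0))

def tajimas_d(haps):
    """Tajima's D for an (n sequences) x (S segregating sites) 0/1 matrix."""
    n, S = haps.shape
    if S == 0:
        return math.nan                      # D is undefined without variation
    i = np.arange(1, n)
    a1 = (1.0 / i).sum()
    a2 = (1.0 / i**2).sum()
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Mean pairwise differences: a site with j derived copies separates
    # j * (n - j) of the n(n-1)/2 pairs of sequences.
    j = haps.sum(axis=0)
    pi = (j * (n - j)).sum() / (n * (n - 1) / 2.0)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

A sample in which every mutation is a singleton gives a negative D, and a sample in which every mutation is at intermediate frequency gives a positive D, in line with the description above.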
Table 1 shows results for a sample size of n = 50 and θ = 3.0, 5.0, 10.0, and 15.0. The actual rejection probabilities for K were near 5%, while those for D and D* were between 5% and 5.4%. In all cases, the fractions of trials rejected on each tail (i.e., the proportions that were significantly too high or significantly too low) were roughly equal. If a nonrandomized test was used (as would be done in practice), the actual rejection probabilities for K and D* were substantially lower (3.6% for K and 4.4% for D*), while those for D were about the same. To check for the effect of sample size, we reran all of our simulations with n = 20. The actual (randomized) rejection probabilities were about the same as those in table 1: for 3.0 ≤ θ ≤ 20.0, rejection probabilities were 5% for K, 5.6% for D*, and 5.4% for D (results not shown).
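The gap between randomized and nonrandomized tests arises because a discrete statistic such as K takes few distinct values, so no cutoff gives a tail probability of exactly the nominal level. The sketch below shows one way a randomized lower-tail test could be set up from a simulated null distribution. It is a one-tailed illustration only; the function name is hypothetical, and the exact construction used for the results above is not specified beyond the reference to Lehmann (1986).

```python
import numpy as np

def lower_tail_rejection_prob(null_values, observed, alpha):
    """Probability of rejecting for `observed` in a randomized lower-tail test.

    Values clearly inside the tail are always rejected, values outside it are
    never rejected, and the boundary value is rejected with the probability
    that makes the overall size equal to alpha under the simulated null.
    """
    null_values = np.asarray(null_values)
    p_less = np.mean(null_values < observed)
    p_equal = np.mean(null_values == observed)
    if p_less + p_equal <= alpha:
        return 1.0                           # whole atom fits inside the tail
    if p_less >= alpha:
        return 0.0                           # tail is already used up
    return (alpha - p_less) / p_equal        # boundary value: reject sometimes

# Toy null distribution (made-up numbers, for illustration only).
null = np.array([1] * 2 + [2] * 8 + [3] * 90)
probs = {k: lower_tail_rejection_prob(null, k, 0.05) for k in (1, 2, 3)}
```

In this toy example a nonrandomized test can reject only when the statistic equals 1, which happens with probability 0.02 under the null, so the test is conservative. The randomized version also rejects a fraction of the outcomes equal to 2, bringing the size up to the nominal 0.05. In a simulation, the decision for one trial would be drawn as `rng.random() < prob`.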
Acknowledgements
We thank Y.-X. Fu, P. Marjoram, M. Przeworski, and an anonymous reviewer for helpful comments and discussions. J.D.W. was supported by NIH grant 5 R01 H610847.
Footnotes
1 Present address: Department of Organismic and Evolutionary Biology, Harvard University.
2 Keywords: Coalescent theory, neutrality tests
3 Address for correspondence and reprints: Jeffrey D. Wall, Department of Organismic and Evolutionary Biology, Harvard University, 2102 Biological Laboratories, 16 Divinity Avenue, Cambridge, Massachusetts 02138. jwall{at}oeb.harvard.edu
Literature Cited
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788–1790.
Fu, Y.-X. 1996. New statistical tests of neutrality for DNA samples from a population. Genetics 143:557–570.
Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693–709.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Vol. 7. Oxford University Press, New York.
Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23–36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Sinauer, Sunderland, Mass.
Kingman, J. F. C. 1982a. On the genealogy of large populations. J. Appl. Prob. 19A:27–43.
Kingman, J. F. C. 1982b. The coalescent. Stochastic Processes Appl. 13:235–248.
Lehmann, E. L. 1986. Testing statistical hypotheses. Wiley, New York.
Markovtsova, L., P. Marjoram, and S. Tavaré. 2001. On a test of Depaulis and Veuille. Mol. Biol. Evol. 18:1132–1133.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.
Strobeck, C. 1987. Average number of nucleotide differences in a sample from a single subpopulation: a test for population subdivision. Genetics 117:149–153.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. Camb. 74:65–79.
Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276.