Department of Ecology and Genetics, University of Aarhus, Denmark
Much recent work focuses on the study of sequence evolution of fast-evolving RNA-based viruses such as HIV, influenza, and foot-and-mouth disease. From both an evolutionary and an epidemiological point of view, it is of interest to know whether sequences evolve according to a molecular clock or whether evolutionary rates vary among evolutionary lineages or over time. Present phylogenetic analyses based on the likelihood ratio test (Felsenstein 1981; Huelsenbeck and Rannala 1997) often reject a molecular clock (Elena, Gonzalezcandelas, and Moya 1992; Holmes, Pybus, and Harvey 1999; Kelsey, Crandall, and Voevodin 1999), which, in turn, is often taken as evidence for rate variation caused by varying selection pressures (Holmes, Pybus, and Harvey 1999). However, many viruses readily recombine (Robertson et al. 1995; Holmes, Worobey, and Rambaut 1999; Santti et al. 1999), which implies that no single phylogenetic tree describes the genealogy of the sampled sequences. We have found that very small levels of recombination invalidate the likelihood ratio test of the molecular clock.
Data sets were simulated under the coalescent model with recombination (Hudson 1983) with the scaled recombination rate ρ = 4Nr, where N is the effective population size and r is the recombination rate per gene per generation. For each value of ρ, 1,600–2,000 replicates were used. The simulation program was written in C and can be accessed through http://www.daimi.au.dk/compbio/. To mimic an average viral data set, we simulated 1,000-bp sequences evolving according to a Jukes-Cantor model of substitution with constant rate (i.e., a molecular clock) and an average distance between sequences of 20% divergence. From the simulated data sets, the maximum-likelihood values of the most likely phylogenies without and with the assumption of a molecular clock were compared using the DNAml (no clock) and DNAmlk (clock) programs of PHYLIP (Felsenstein 1995), assuming the Jukes-Cantor model under which the sequences were simulated. We restricted analysis to cases in which the two methods returned the same tree topology. This was done because use of the χ² distribution for likelihood ratio tests assumes that hypotheses are nested (Huelsenbeck and Rannala 1997; Whelan and Goldman 1999). In this case, the test statistic Λ = -2 Δln(likelihood) is approximately χ² distributed with n - 2 degrees of freedom, where n is the number of sequences (Felsenstein 1981).
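The test described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names `chi2_sf` and `clock_lrt` are hypothetical, and the two log likelihoods are assumed to have been read from DNAml and DNAmlk output for the same tree topology.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def clock_lrt(lnl_no_clock, lnl_clock, n_sequences):
    """Likelihood ratio test of the molecular clock on a fixed topology.

    Returns (statistic, degrees of freedom, p-value), where the statistic is
    twice the log-likelihood difference and df = n_sequences - 2.
    """
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_sequences - 2
    return stat, df, chi2_sf(stat, df)
```

With made-up log likelihoods, `clock_lrt(-5000.0, -5010.0, 10)` gives a statistic of 20.0 on 8 degrees of freedom, so the clock would be rejected at the 5% level but not at the 1% level.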
Table 1 shows, for 10 sequences, the percentage of cases in which the molecular clock is rejected using a χ²(8) distribution at the 0.1%, 1%, and 5% levels for different values of the population recombination rate ρ. Also shown is the mean of the test statistic Λ, which is expected to be 8 for a χ²(8) distribution. The three percentiles and the mean Λ value for ρ = 0 are in good agreement with the χ²(8) distribution. However, even low levels of recombination cause a large proportion of false rejections of the molecular clock, and when ρ > 8, the clock is rejected in almost all cases. When ρ approaches infinity, all sequences are expected to be equidistant and a molecular clock should reappear, but no sign of this is observed even for our largest value of ρ, 64. Conditioning on the observed number of recombinations in the sequences, we found that the probability of rejecting the molecular clock exceeds 50% when the total number of recombinations in the history of the 10 sequences exceeds 6. We emphasize that six recombination events in many cases would not be detectable in data sets. The last column of table 1 shows the percentage of cases in which the same topologies were found with DNAml and DNAmlk. This percentage decreases with increasing ρ values, because recombination affects the topology. We also analyzed the remaining cases in which different topologies were found by forcing DNAmlk to use the same topology as that found by DNAml. This led to an even higher percentage of rejections of the clock when recombination was present; thus, the results of table 1 are an underestimate of the effect of recombination.
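The per-ρ summary described above (rejection percentages at three levels and the mean of the test statistic) can be sketched as follows. This is an illustrative Python fragment under stated assumptions: `summarize_replicates` is a hypothetical helper, its input is assumed to be the list of likelihood ratio statistics from replicates in which both programs returned the same topology, and the cutoffs are the standard upper critical values of the χ²(8) distribution.

```python
# Upper critical values of the chi-square distribution with 8 degrees of freedom.
CHI2_8_CRITICAL = {"5%": 15.507, "1%": 20.090, "0.1%": 26.124}


def summarize_replicates(stats, critical=CHI2_8_CRITICAL):
    """Return the mean statistic and the percentage of replicates rejected per level."""
    if not stats:
        raise ValueError("need at least one replicate")
    n = len(stats)
    summary = {"mean": sum(stats) / n}
    for level, cutoff in critical.items():
        # A replicate rejects the clock when its statistic exceeds the cutoff.
        summary[level] = 100.0 * sum(s > cutoff for s in stats) / n
    return summary
```

If the χ²(8) approximation holds, as reported for ρ = 0, the mean should be close to 8 and the three percentages close to their nominal levels; an excess of rejections is what signals a miscalibrated test.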
More complex substitution models than the one used here, including models with rate variation over the sequence, are expected to increase the probability of rejecting the molecular clock in most cases; thus, our estimates are conservative. We conclude that methods of testing the molecular clock that incorporate recombination or are independent of recombination would be very desirable.
Footnotes
1 Keywords: virus, molecular clock, likelihood ratio test, recombination, mtDNA
2 Address for correspondence and reprints: Mikkel Heide Schierup, Department of Ecology and Genetics, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus C., Denmark. E-mail: mikkel.schierup{at}biology.au.dk
Literature Cited
Awadalla, P., A. Eyre-Walker, and J. M. Smith. 1999. Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286:2524–2525.
Elena, S. F., F. Gonzalezcandelas, and A. Moya. 1992. Does the Vp1 gene of foot-and-mouth-disease virus behave as a molecular clock? J. Mol. Evol. 35:223–229.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum-likelihood approach. J. Mol. Evol. 17:368–376.
Felsenstein, J. 1995. PHYLIP (phylogeny inference package). Version 3.572. Distributed by the author, Department of Genetics, University of Washington, Seattle.
Holmes, E. C., O. G. Pybus, and P. H. Harvey. 1999. The molecular population dynamics of HIV-1. Pp. 177–207 in K. A. Crandall, ed. The evolution of HIV. Johns Hopkins University Press, Baltimore, Md.
Holmes, E. C., M. Worobey, and A. Rambaut. 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16:405–409.
Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201.
Huelsenbeck, J. P., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227–232.
Kelsey, C. R., K. A. Crandall, and A. F. Voevodin. 1999. Different models, different trees: the geographic origin of PTLV-I. Mol. Phylogenet. Evol. 13:336–347.
Korber, B., J. Theiler, and S. Wolinsky. 1998. Limitations of a molecular clock applied to considerations of the origin of HIV-1. Science 280:1868–1871.
Robertson, D. L., P. M. Sharp, F. E. McCutchan, and B. H. Hahn. 1995. Recombination in HIV-1. Nature 374:124–126.
Santti, J., T. Hyypia, L. Kinnunen, and M. Salminen. 1999. Evidence of recombination among enteroviruses. J. Virol. 73:8741–8749.
Whelan, S., and N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenies. Mol. Biol. Evol. 16:1292–1299.
Zhu, T. F., B. T. Korber, A. J. Nahmias, E. Hooper, P. M. Sharp, and D. D. Ho. 1998. An African HIV-1 sequence from 1959 and implications for the origin of the epidemic. Nature 391:594–597.