Recombination and the Molecular Clock

Mikkel H. Schierup and Jotun Hein

Department of Ecology and Genetics, University of Aarhus, Denmark

Much recent work focuses on the study of sequence evolution of fast-evolving RNA-based viruses such as HIV, influenza, and foot-and-mouth disease. From both an evolutionary and an epidemiological point of view, it is of interest to know whether sequences evolve according to a molecular clock or whether evolutionary rates vary among evolutionary lineages or over time. Present phylogenetic analyses based on the likelihood ratio test (Felsenstein 1981; Huelsenbeck and Rannala 1997) often reject a molecular clock (Elena, Gonzalezcandelas, and Moya 1992; Holmes, Pybus, and Harvey 1999; Kelsey, Crandall, and Voevodin 1999), which, in turn, is often taken as evidence for rate variation caused by varying selection pressures (Holmes, Pybus, and Harvey 1999). However, many viruses readily recombine (Robertson et al. 1995; Holmes, Worobey, and Rambaut 1999; Santti et al. 1999), which implies that no single phylogenetic tree describes the genealogy of the sampled sequences. We have found that very small levels of recombination invalidate the likelihood ratio test of the molecular clock.

Data sets were simulated under the coalescent model with recombination (Hudson 1983) with the scaled recombination rate ρ = 4Nr, where N is the effective population size and r is the recombination rate per gene per generation. For each value of ρ, 1,600–2,000 replicates were used. The simulation program was written in C and can be accessed through http://www.daimi.au.dk/~compbio/. To mimic an average viral data set, we simulated 1,000-bp sequences evolving according to a Jukes-Cantor model of substitution with constant rate (i.e., a molecular clock) and an average distance between sequences of 20% divergence. From the simulated data sets, the maximum-likelihood values of the most likely phylogenies with and without the assumption of a molecular clock were compared using the DNAml and DNAmlk programs of PHYLIP (Felsenstein 1995), assuming the Jukes-Cantor model under which the sequences were simulated. We restricted analysis to cases in which the two methods returned the same tree topology. This was done because use of the χ² distribution for likelihood ratio tests assumes that hypotheses are nested (Huelsenbeck and Rannala 1997; Whelan and Goldman 1999). In this case, δ = -2Δln(likelihood), where Δln(likelihood) is the log likelihood with the clock minus the log likelihood without it, is approximately χ² distributed with n - 2 degrees of freedom, where n is the number of sequences (Felsenstein 1981).
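The test statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the function names `chi2_sf` and `clock_lrt` and the two log-likelihood values are hypothetical, and the χ² tail probability is computed with a simple series that is adequate for illustration but loses precision for extremely small p-values.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, y) = exp(-y) * y**a * sum_{k>=0} y**k / Gamma(a + k + 1)
    term = 1.0 / math.gamma(a + 1.0)
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    lower = math.exp(-y + a * math.log(y)) * total
    return max(0.0, 1.0 - lower)

def clock_lrt(lnl_no_clock, lnl_clock, n_sequences):
    """Return (delta, df, p_value) for the likelihood ratio test of the molecular clock."""
    # delta = -2 * (lnL with clock - lnL without clock); nonnegative when the models are nested
    delta = -2.0 * (lnl_clock - lnl_no_clock)
    df = n_sequences - 2
    return delta, df, chi2_sf(delta, df)

# Hypothetical log likelihoods for 10 sequences (illustrative values only)
delta, df, p_value = clock_lrt(lnl_no_clock=-2500.0, lnl_clock=-2510.0, n_sequences=10)
print(f"delta = {delta:.2f}, df = {df}, p = {p_value:.4f}")
```

The clock is rejected at a given level when the p-value falls below that level; the comparison is only meaningful when both likelihoods were obtained on the same tree topology, as required above.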

Table 1 shows, for 10 sequences, the percentage of cases in which the molecular clock is rejected using a χ² distribution with 8 degrees of freedom, χ²(8), at the 0.1%, 1%, and 5% levels for different values of the population recombination rate ρ. Also shown is the mean of δ, which is expected to be 8 for a χ²(8) distribution. The three rejection percentages and the mean δ value for ρ = 0 are in good agreement with the χ²(8) distribution. However, even low levels of recombination cause a large proportion of false rejections of the molecular clock, and when ρ > 8, the clock is rejected in almost all cases. When ρ approaches infinity, all sequences are expected to be equidistant and a molecular clock should reappear, but no sign of this is observed even for our largest value of ρ, 64. Conditioning on the observed number of recombinations in the sequences, we found that the probability of rejecting the molecular clock exceeds 50% when the total number of recombinations in the history of the 10 sequences exceeds 6. We emphasize that six recombination events would in many cases not be detectable in data sets. The last column of table 1 shows the percentage of cases in which the same topologies were found with DNAml and DNAmlk. This percentage decreases with increasing ρ values, because recombination affects the topology. We also analyzed the remaining cases in which different topologies were found by forcing DNAmlk to use the same topology as that found by DNAml. This led to an even higher percentage of rejections of the clock when recombination was present; thus, the results of table 1 are an underestimate of the effect of recombination.
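As a rough sketch of how the rejection percentages and the mean δ in a table of this kind could be tabulated from a set of replicate δ values, consider the following. The helper names `rejection_rates` and `mean_delta` are hypothetical, and the δ values are made up for illustration; they are not results from this study. The critical values are the standard upper percentage points of the χ² distribution with 8 degrees of freedom.

```python
# Upper critical values of the chi-square distribution with 8 degrees of freedom
CHI2_8_CRITICAL = {0.05: 15.507, 0.01: 20.090, 0.001: 26.125}

def rejection_rates(deltas, critical=CHI2_8_CRITICAL):
    """Percentage of replicates whose delta exceeds each critical value."""
    if not deltas:
        raise ValueError("need at least one replicate")
    n = len(deltas)
    return {level: 100.0 * sum(d > c for d in deltas) / n
            for level, c in critical.items()}

def mean_delta(deltas):
    """Average test statistic; close to 8 if delta really follows chi-square(8)."""
    return sum(deltas) / len(deltas)

# Made-up delta values for eight replicates (illustration only)
example = [5.2, 7.9, 9.4, 16.1, 21.3, 30.0, 6.6, 12.8]
rates = rejection_rates(example)
for level in sorted(rates, reverse=True):
    print(f"rejected at {level:.1%}: {rates[level]:.1f}% of replicates")
print(f"mean delta: {mean_delta(example):.2f}")
```

Under a true clock without recombination, the three percentages should be close to 5%, 1%, and 0.1%; markedly larger values indicate false rejections.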


Table 1 Recombination and Probability of Rejecting the Molecular Clock

 
We argue that recombination is the simplest explanation for the lack of a molecular clock in many data sets of viruses. For example, the recombination rate for HIV-1 is likely to be higher than even the largest value used here. Thus, dating the origin of the HIV-1 pandemic from an early (1959) sequence (Korber, Theiler, and Wolinsky 1998; Zhu et al. 1998) may yield misleading results. The implications may extend to the human mitochondrial data, where evidence for recombination was reported recently (Awadalla, Eyre-Walker, and Smith 1999).

Substitution models more complex than the one used here, including models with rate variation over the sequence, are expected to increase the likelihood of rejecting the molecular clock in most cases; thus, our estimates are conservative. We conclude that methods of testing the molecular clock that either incorporate recombination or are independent of it would be very desirable.


    Acknowledgements
We thank Thomas Christensen for programming assistance and Xavier Vekemans, Roald Forsberg, and two anonymous reviewers for comments on the manuscript. This study was supported by grant 9701412 from the Danish Natural Sciences Research Council and by BRICS, Center of the Danish National Research Foundation.


    Footnotes
 
Keith Crandall, Reviewing Editor

1 Keywords: virus, molecular clock, likelihood ratio test, recombination, mtDNA

2 Address for correspondence and reprints: Mikkel Heide Schierup, Department of Ecology and Genetics, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus C., Denmark. E-mail: mikkel.schierup@biology.au.dk


    Literature Cited

    Awadalla, P., A. Eyre-Walker, and J. M. Smith. 1999. Linkage disequilibrium and recombination in hominid mitochondrial DNA. Science 286:2524–2525.

    Elena, S. F., F. Gonzalezcandelas, and A. Moya. 1992. Does the Vp1 gene of foot-and-mouth-disease virus behave as a molecular clock? J. Mol. Evol. 35:223–229.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum-likelihood approach. J. Mol. Evol. 17:368–376.

    Felsenstein, J. 1995. PHYLIP (phylogeny inference package). Version 3.572. Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Holmes, E. C., O. G. Pybus, and P. H. Harvey. 1999. The molecular population dynamics of HIV-1. Pp. 177–207 in K. A. Crandall, ed. The evolution of HIV. Johns Hopkins University Press, Baltimore, Md.

    Holmes, E. C., M. Worobey, and A. Rambaut. 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16:405–409.

    Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183–201.

    Huelsenbeck, J. P., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227–232.

    Kelsey, C. R., K. A. Crandall, and A. F. Voevodin. 1999. Different models, different trees: the geographic origin of PTLV-I. Mol. Phylogenet. Evol. 13:336–347.

    Korber, B., J. Theiler, and S. Wolinsky. 1998. Limitations of a molecular clock applied to considerations of the origin of HIV-1. Science 280:1868–1871.

    Robertson, D. L., P. M. Sharp, F. E. McCutchan, and B. H. Hahn. 1995. Recombination in HIV-1. Nature 374:124–126.

    Santti, J., T. Hyypia, L. Kinnunen, and M. Salminen. 1999. Evidence of recombination among enteroviruses. J. Virol. 73:8741–8749.

    Whelan, S., and N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenies. Mol. Biol. Evol. 16:1292–1299.

    Zhu, T. F., B. T. Korber, A. J. Nahmias, E. Hooper, P. M. Sharp, and D. D. Ho. 1998. An African HIV-1 sequence from 1959 and implications for the origin of the epidemic. Nature 391:594–597.

Accepted for publication June 2, 2000.