Department of Zoology, University of Oxford, Oxford, England
Introduction
The most common approach to accounting for site-specific rate heterogeneity is to apply a maximum-likelihood (ML) method to the original sequence data, with rate variation modeled by the gamma distribution (reviewed in Yang 1996a). Likelihood modeling is appealing because, by using an explicit model of nucleotide substitution, it can simultaneously account not only for site-specific rate variation but also for transition/transversion rate bias and unequal base frequencies. The gamma distribution accommodates different degrees of rate heterogeneity by varying a single shape parameter, α. When α is small, the distribution conforms to cases in which most changes have occurred at a minority of sites (high rate heterogeneity); as α approaches infinity, the gamma model reduces to the special case of equal rates for all sites (rate homogeneity).
It is worth noting that this method for estimating α depends on a phylogenetic tree, which is itself an assumption about the evolutionary history of the sequences in question. Often, this tree assumption will be of little consequence when measuring rate heterogeneity. Many sets of gene sequences (e.g., interspecies data sets) have treelike histories that are uncomplicated by recombination, and simulation studies have shown that estimates of α are robust to uncertainty in the inference of the phylogenetic tree (Sullivan, Holsinger, and Simon 1996). In such cases, the standard interpretation of observed site-specific rate heterogeneity holds: point substitutions occur more readily at some sites than at others because of mutation rate bias or, perhaps more commonly, different selective constraints among sites.
However, there is an impressive body of evidence that recombination contributes to genetic diversity, notably within populations of viruses (Sharp, Robertson, and Hahn 1995; Worobey and Holmes 1999) and bacteria (Maynard Smith et al. 1993). If recombination has contributed to the diversity among the sequences under scrutiny, then a single tree cannot accurately model their history. This leads to an often overlooked bias when estimating site-specific rate heterogeneity (but see Schierup and Hein 2000).
If no single tree can accurately depict the evolutionary history of all the sites in a recombinogenic data set, even the best "compromise" tree will require extra changes at some sites to account for the homoplasies introduced by recombination. Such sites will appear to exhibit inflated substitution rates when shoehorned onto this inaccurate tree. Thus, even if all sites share an identical underlying rate of point substitution, recombination will create the appearance of site-specific rate heterogeneity, and higher levels of recombination will tend to generate greater apparent rate heterogeneity (ARH). Note that ARH here does not refer only to the artifactual component of rate heterogeneity generated by recombination; it is the observed rate heterogeneity (i.e., the estimate of α in a particular likelihood model), which may or may not include a component produced by recombination. In the face of recombination, the ARH is not an estimate of the "real" rate heterogeneity (RRH, i.e., the true, underlying variation among sites in their rate of point substitution); it is an estimate of the combined effects of both the RRH and recombination.
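The claim that a compromise tree "will require extra changes at some sites" can be illustrated with a small parsimony count. The sketch below is not part of the informative-sites test; it uses Fitch's small-parsimony algorithm (swapped in here for illustration) to count the minimum number of changes a single site needs on a fixed tree. The function name `fitch_steps`, the nested-tuple tree encoding, and the four-taxon examples are hypothetical.

```python
from typing import FrozenSet, Tuple, Union

# Rooted binary tree as nested tuples of leaf indices, e.g. ((0, 1), (2, 3)).
Tree = Union[int, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, site: str) -> int:
    """Minimum number of changes needed to explain one site on `tree`.

    `site[i]` is the nucleotide observed in taxon i (leaf index i).
    """

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, int):
            # Leaf: its state set is the observed nucleotide, at zero cost.
            return frozenset(site[node]), 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        shared = left_states & right_states
        if shared:
            # Children can agree: keep the intersection, no extra change.
            return shared, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


compromise_tree = ((0, 1), (2, 3))
other_region_tree = ((0, 2), (1, 3))
print(fitch_steps(compromise_tree, "AAGG"))    # 1
print(fitch_steps(compromise_tree, "AGAG"))    # 2
print(fitch_steps(other_region_tree, "AGAG"))  # 1
```

The site pattern AGAG needs only one change on ((0, 2), (1, 3)) but two on ((0, 1), (2, 3)). If one region of an alignment evolved on the first tree and the whole alignment is forced onto the second, such sites look as though they evolve faster than the rest, which is the apparent rate heterogeneity described above.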
Importantly, the ARH on its own is of little value in detecting recombination if its constituents are not known: a low value of α may be the result of recombination, RRH, or a combination of the two. Here, a new method, the informative-sites test, is proposed that exploits the relationship between recombination and apparent rate heterogeneity to detect and measure recombination from nucleotide sequence data and to test whether recombination has contributed to the ARH. The approach is first introduced with a simple example, then applied to both simulated and real examples.
The Method
The rationale for the method is simple. The branches of a true "star" phylogeny emanate from a single node. In the absence of recombination, sequence alignments that yield starlike trees will tend to exhibit relatively many parsimony-uninformative polymorphic sites (e.g., singletons) and few informative sites (i.e., polymorphic sites where a minority nucleotide is present in at least two taxa). However, as figure 1 demonstrates, starlike trees can also arise from the strong but conflicting phylogenetic signal generated by recombination. The 2,000-nt alignment in figure 1 contained four distinct regions, each with well-supported partitions of taxa, but the informative sites in each region defined different partitions. The result was a relatively starlike phylogenetic tree, deceptively similar to what might be expected if there were actually minimal phylogenetic signal throughout the entire alignment. In short, recombination produces trees that are more starlike than expected given the composition of their polymorphic sites.
To put it another way, recombination gives rise to phylogenetic trees that are unexpectedly rich in (conflicting) phylogenetic information given their shape. The informative-sites test uses a Monte Carlo approach to simulate nucleotide sequence evolution under the constraint of clonal descent (i.e., no recombination) and then to test whether the proportion of informative sites in the real data is higher than the clonal expectation. The procedure is as follows.
For instance, if the observed proportion of informative sites is greater than that of every one of 1,000 clonal simulations (P < 0.001), significant elongation of the terminal branches of the overall tree is inferred, and a significant pattern of recombination is concluded. This is equivalent to testing whether recombination is at least partially responsible for the ARH in the original data. If, however, the level of phylogenetic signal in the data, as measured by the proportion of informative sites, is typical of clonally evolved data sets (e.g., P = 0.511), then the hypothesis of clonality cannot be rejected.
In alignments of just four taxa, all of the parsimony-informative sites will include exactly two of the four possible nucleotides. With added taxa, informative sites exhibiting more than two states will sometimes arise, especially in saturated data sets. However, it is a matter of empirical observation that nonreciprocal recombination tends to inflate the proportion of two-state informative sites versus all other sorts of polymorphic sites, including three- and four-state parsimony-informative sites (data not shown). Hence, the measure of phylogenetic signal used for the informative-sites test, q, is defined as the proportion of two-state parsimony-informative sites among the polymorphic sites as a whole.
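As a minimal sketch of the statistic just defined, assuming an ungapped alignment of unambiguous nucleotides held as equal-length Python strings, q can be computed column by column. The function name `two_state_informative_proportion` is hypothetical and is not taken from the author's software.

```python
from collections import Counter
from typing import Sequence


def two_state_informative_proportion(alignment: Sequence[str]) -> float:
    """Return q: two-state parsimony-informative sites / polymorphic sites."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same length")

    polymorphic = 0
    two_state_informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) < 2:
            continue  # invariant site: not polymorphic
        polymorphic += 1
        # Exactly two nucleotides, each present in at least two taxa,
        # makes a two-state parsimony-informative site.
        if len(counts) == 2 and min(counts.values()) >= 2:
            two_state_informative += 1
    # Returning 0.0 when no site is polymorphic is a choice of this sketch.
    return two_state_informative / polymorphic if polymorphic else 0.0


# Columns: AAGG (two-state informative), AAAG (singleton),
# AAAA (invariant), AAGC (three states, uninformative).
example = ["AAAA", "AAAA", "GAAG", "GGAC"]
print(two_state_informative_proportion(example))  # 0.3333333333333333
```

Three- and four-state sites, informative or not, count toward the denominator but never the numerator, matching the definition of q above.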
The example in figure 1 illustrates the approach. The value of q is shown below the ML tree found for the first 500-nt region. Since this alignment was generated without recombination, q was not expected to be significantly greater than q̄c, the average proportion of two-state informative sites calculated from the clonal null distribution. Indeed, for this data set, the observed proportion of informative sites was identical to the clonal expectation, with q = q̄c = 0.36. Accordingly, there was no statistical evidence for recombination (P = 0.511; i.e., q was less than qc in 511 out of 1,000 clonally generated alignments).
On the other hand, when the informative-sites test was applied to the 2,000-nt recombinant alignment, it strongly rejected the clonal model. Although the value of q remained at 0.36, q̄c dropped to 0.21, reflecting the relatively starlike shape of the estimated tree for the overall alignment (fig. 1). In fact, for the 2,000-nt alignment, q was greater than every qc from 1,000 clonally evolved data sets (P < 0.001), strong evidence of its recombinant origin.
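Given an observed q and the values qc from the clonal replicates, the clonal expectation q̄c and the one-tailed Monte Carlo P value used in this example can be computed as sketched below. This is a hedged illustration: the replicate values would come from alignments simulated clonally on the ML tree, which is not shown, and counting only replicates with qc strictly greater than q follows the wording "q was less than qc in 511 out of 1,000"; the handling of ties is an assumption. The function name `clonal_test` is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def clonal_test(q: float, q_clonal: Sequence[float]) -> Tuple[float, float]:
    """Return (q_bar_c, P) for an observed q and clonal replicate values.

    P is the fraction of clonal replicates whose qc exceeds the observed q
    (one-tailed: only an excess of informative sites suggests recombination).
    """
    if len(q_clonal) == 0:
        raise ValueError("need at least one clonal replicate")
    q_bar_c = mean(q_clonal)
    exceeding = sum(1 for qc in q_clonal if qc > q)
    return q_bar_c, exceeding / len(q_clonal)


q_bar_c, p = clonal_test(0.36, [0.30, 0.35, 0.38, 0.41])
print(p)  # 0.5: two of the four replicates exceed the observed q
```

When q exceeds every one of 1,000 replicates, the function returns P = 0.0, which is reported as P < 0.001 because the resolution of a 1,000-replicate test is 1/1,000.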
The Informative-Sites Index
In addition to providing a means of detecting whether recombination has likely occurred, this method, like the homoplasy test (Maynard Smith and Smith 1998), can be extended to measure the degree to which recombination has shaped the data. The informative-sites index (ISI) can be found by applying the following formula:
Software to run the informative-sites test is available at http://evolve.zoo.ox.ac.uk/software.
Analysis of Simulated Data Sets
Table 1 summarizes the results of the informative-sites test for the clonal and recombinant alignments. The mean (or median) values for the various statistics were calculated from the results of the 100 replicates in each group (0, 20, 50, 200, or 500 recombination events).
A comparison of the mean values of q and q̄c for each group captures the essence of the method. For the clonal data sets (no recombination events), q and q̄c were nearly identical, as anticipated. No clonal alignment gave a statistically significant result, and the average ISI for these data sets, 0.02, was near 0, reflecting their clonal history. The pattern for the recombinant alignments was very different. Here, the disparity between q and q̄c grew ever larger with increasing recombination. The trend was clear even after 20 recombination events, although, with only 14 of the 100 alignments in this group proving significant, the test was fairly conservative at this level. The tendency for recombination to generate two-state informative sites was, moreover, plainly illustrated by the increase in q with every successive jump in recombination rate. The average value of q after 500 rounds of recombination, for instance, was 0.64, up from 0.52. Nevertheless, for this group, which predictably gave rise to the most starlike trees and the lowest estimate of α, the clonal expectation for the proportion of informative sites was the lowest of all, at just 0.27. With t close to r and an average ISI of 0.89, these alignments were evidently approaching complete linkage equilibrium. (See fig. 2 for representative results at various recombination levels.)
To investigate how robust the test was to uncertainty in the likelihood estimates of the model parameters used for generating the null data, 10 alignments from each recombination level (0 through 500 events) were reexamined. This time, approximate 95% confidence limits were obtained for each parameter (i.e., the transition/transversion ratio and α) using the likelihood ratio test. These confidence limits, instead of the ML estimates, were then specified as the model parameters when generating the clonal null data sets for the test, and all four combinations of the extreme values of the two parameters were tried. Comparison of the results obtained using the ML estimates of the parameters versus the 95% confidence limits revealed virtually no difference. Using the confidence limits, no false positives (i.e., type I errors) were generated from the data sets with 0 recombination events, and no false negatives (i.e., type II errors) were observed in the data sets with 200 and 500 recombination events. At the lower levels of recombination, all data sets that were significant using ML estimates were also significant under some or all of the combinations of 95% confidence limits. Data sets that were not significant using ML estimates were similarly not significant when the confidence limits were used instead. These findings indicate that the informative-sites test is very robust to error in the estimation of parameter values and that such error is unlikely to greatly bias the results of the method.
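The four parameter combinations described above can be enumerated mechanically. The sketch below only builds Seq-Gen command lines and does not run them. It assumes the HKY model (an assumption, since the model is not restated in this section) and the option spellings of Seq-Gen's documented usage (`-m` model, `-t` transition/transversion ratio, `-a` gamma shape, `-l` sequence length, `-n` number of data sets), which should be checked against the installed version. The function name and the example limits are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def seqgen_commands(
    tstv_limits: Sequence[float],
    alpha_limits: Sequence[float],
    length: int,
    replicates: int,
) -> List[List[str]]:
    """Build one Seq-Gen command per combination of CI endpoints.

    tstv_limits and alpha_limits are the (lower, upper) 95% confidence
    limits for the transition/transversion ratio and the gamma shape alpha.
    The HKY model choice is an assumption of this sketch.
    """
    commands = []
    for tstv, alpha in product(tstv_limits, alpha_limits):
        commands.append([
            "seq-gen",
            "-mHKY",
            f"-t{tstv}",        # transition/transversion ratio
            f"-a{alpha}",       # gamma shape parameter
            f"-l{length}",      # sites per simulated alignment
            f"-n{replicates}",  # number of clonal data sets
        ])
    return commands


for cmd in seqgen_commands((2.0, 4.0), (0.5, 0.9), length=2000, replicates=1000):
    print(" ".join(cmd))
```

Each command would read the ML tree on standard input; q would then be computed for every simulated alignment exactly as in the test run with the ML estimates.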
Comparisons with the Homoplasy Test
A subset of the alignments from each of the groups listed in table 1 was evaluated by both the informative-sites test and the homoplasy test in order to compare their performance in detecting and quantifying recombination. The homoplasy test uses the presence of excessive homoplasy as an indication of recombination and, like the informative-sites test, permits the calculation of an index, the "homoplasy ratio," that measures the extent of recombination (Maynard Smith and Smith 1998). Like the ISI, the homoplasy ratio is expected to be about 0 for clonal data and 1 for data at complete linkage equilibrium.
Briefly, 10 randomly chosen alignments from each recombination level listed in table 1 were subjected to both tests, and the numbers of statistically significant results (0.01 level) and the ranges of index values were compared. Next, a representative likelihood tree from each group served as the template in Seq-Gen to generate 10 new clonal alignments using the corresponding values of κ and α recorded for each group in table 1. Thus, for every original alignment, a parallel alignment was produced that mimicked its phylogenetic tree, κ, and α but was generated without recombination. This resulted in five new groups of 10 alignments each, characterized by their rate heterogeneity; the new clonal "α = 0.76" group, for example, corresponded to the original "200 recombination events" group.
The results of the comparisons are illustrated in figure 3. Notably, the tests gave very similar results for the original data sets (fig. 3a and b), which were all simulated without any RRH. Neither test returned any false positives in the first (clonal) group; both tests detected recombination in all alignments with 200 or more events and showed comparable sensitivity at lower levels. Furthermore, their respective index values traced very similar paths, from near 0 for the clonal data to near 1 at the highest level of recombination.
Although the homoplasy test includes techniques designed to account for rate heterogeneity and thus avoid false positives (Maynard Smith and Smith 1998), some important conclusions can be drawn from the comparisons here. First, the informative-sites test clearly benefits from accommodating any apparent rate heterogeneity as an integral part of the test itself. Since it does not rely on ad hoc methods to account for site-specific rate heterogeneity, the test does not appear to be prone to mistaking such heterogeneity for recombination. Second, because the homoplasy test can evidently give misleading results in the face of even mild unaccounted-for rate heterogeneity, extremely reliable methods must be used to measure the extent of that heterogeneity before the test is applied.
The results of two further comparisons of the informative-sites test and the homoplasy test are shown in figures 4 and 5. In the first of these, 10 clonal data sets were generated using the same starting tree and model of evolution as for the data in table 1, except that a transition/transversion ratio of 20.0 (κ = 40.0) was specified. These data sets were then subjected to increasing levels of nonreciprocal recombination using the procedure outlined previously. While the power of the homoplasy test was unaffected by extreme transition/transversion rate bias, the informative-sites test appeared to become more conservative under these circumstances (fig. 4). The results of the informative-sites test should therefore be interpreted with caution for data sets with unusually strong transition/transversion rate bias, but this finding also underlines that the method appears to be a "safe" test for recombination: it is unlikely to produce false-positive results. Indeed, the simulations in this study suggested no circumstances under which the method could be biased toward type I error.
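The pairing of a transition/transversion ratio of 20.0 with κ = 40.0 can be checked under the assumption, not stated in this section, that κ is the rate parameter of the HKY85 model (Hasegawa, Kishino, and Yano 1985) and that base frequencies are equal. Under HKY85, the expected ratio R of transitions to transversions depends on κ and the base frequencies π:

```latex
R = \frac{\kappa\,(\pi_A \pi_G + \pi_C \pi_T)}{(\pi_A + \pi_G)(\pi_C + \pi_T)},
\qquad
\pi_A = \pi_C = \pi_G = \pi_T = \tfrac{1}{4}
\;\Longrightarrow\;
R = \frac{\kappa\,(\tfrac{1}{16} + \tfrac{1}{16})}{\tfrac{1}{2}\cdot\tfrac{1}{2}} = \frac{\kappa}{2},
\qquad
R = 20.0 \iff \kappa = 40.0 .
```

With unequal base frequencies, the same ratio R would correspond to a different value of κ.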
In addition to the techniques already described, recombination was also simulated using the program Treevolve (N. Grassly and A. Rambaut, http://evolve.zoo.ox.ac.uk/software), which implements a coalescent model that can incorporate recombination as well as exponential population growth, a more widely recognized cause of starlike phylogenies (Slatkin and Hudson 1991). The informative-sites test reliably identified recombination in this context too (data not shown). This was not surprising, since this approach to simulating recombination is essentially the same as that used in figure 1, in that different regions of an alignment are allowed to evolve on different trees. Importantly, the coalescent simulations showed that the test was able to distinguish between the effects of recombination and those of exponential population growth. Because population growth had no influence on ARH, its effects were not mistaken for evidence of recombination by the informative-sites test.
Analysis of Real Data
The ICV alignment, 642 third sites in length, included 16 sequences from the haemagglutinin-esterase gene with the GenBank accession numbers D63467–D63470, D63472, D28967, D28969–D28971, M11637, M11639–M11643, and M17868. The intergenotype HCV alignment consisted of six sequences from the complete coding region (2,971 third sites) with the accession numbers D50409, D00944, D63821, D28917, D17763, and Y13184. The DEN-1 virus data set (seven taxa, 774 third sites from three genes) is described in Worobey, Rambaut, and Holmes (1999). The H. pylori alignment (144 synonymous third sites of the flaA gene from 33 Canadian isolates) is described in Suerbaum et al. (1998). The GBV-C type 2 alignment (nine taxa, 2,841 third sites from the entire coding region) and the GBV-C type 3 alignment (16 taxa, 2,836 third sites from the entire coding region) are both described in Worobey and Holmes (2001). Finally, the mtDNA alignment (40 taxa, 3,561 synonymous third sites from the entire coding region) was modified from the data set described in Eyre-Walker, Smith, and Maynard Smith (1999b) by removing identical sequences, eliminating one incomplete sequence, and then removing sites with gaps. All seven alignments are available from the author on request. The heuristic search procedure that was applied to the simulated data sets listed in table 1 was also followed with these alignments, except for H. pylori. Unusually, in this case, the likelihood topology required substantially more steps than the MPT. Since the ISI is calculated using the value of t from the MPT, that topology was chosen for the subsequent analysis.
The results were largely as expected, except for the mtDNA data set, which exhibited a slightly smaller value of q than the null expectation, a pattern not suggestive of recombination but consistent with a clonal history for this population (table 2). This contrasts with the results of the homoplasy test, which rejected the clonal model when applied to the same sequences (Eyre-Walker, Smith, and Maynard Smith 1999a, 1999b). The two viral examples that were assumed to be clonal did indeed appear to be so on the basis of the informative-sites test. For both ICV and HCV, the observed proportion of informative sites was almost exactly that expected under clonality; their ISI values were close to 0, and the null hypothesis of clonality could not be rejected. Helicobacter pylori, DEN-1 virus, and the two GBV-C data sets, on the other hand, all exhibited values of q substantially larger than q̄c, along with ISI values suggestive of a large role for recombination, supported by highly significant P values (table 2). Interestingly, the high ISI value for H. pylori, 0.85, was very similar to the homoplasy ratio of 0.8 calculated using the homoplasy test on the same data (Suerbaum et al. 1998). The DEN-1 data, with ISI = 0.49, appeared to be somewhat less affected by recombination.
Discussion
If recombination has significantly influenced current genetic diversity, the test should be appropriate whether the events have been ancient, recent, rare, or frequent and whether or not clear mosaic sequences are evident. Thus, it is particularly relevant for those populations in which recombination may be so common, or sequences so similar, that methods relying on mosaic detection (reviewed in Maynard Smith 1999) will be inadequate. Although it may be convenient, it is probably unwise to treat as clonal any data set that passes through the relatively coarse filter imposed by such tests.
While the informative-sites test gave results very similar to those of the homoplasy test for the H. pylori data set, the two methods differed when applied to the mtDNA data. One possible explanation is that the informative-sites test suffered a type II error (a false negative) in this case. In light of figure 4, and given that these data were marked by considerable transition/transversion rate bias as well as high base composition bias (κ = 45.3; table 2), it is difficult to rule this possibility out. However, it is interesting, although not necessarily indicative of clonality, that the value of q in the mtDNA example did not just fall short of significance but was slightly lower than the clonal expectation (table 2).
Another possibility is that the homoplasy test generated a type I error, or false positive. Given the results presented in figure 3, it is worth noting that when the homoplasy test was applied to the mtDNA, the data were assumed to be free of site-specific rate heterogeneity (Eyre-Walker, Smith, and Maynard Smith 1999a, 1999b). Hypervariable sites due to selective constraints were ruled out by comparing the observed divergence of mtDNA sequences between different primate species with that expected, at saturation, in the absence of selective constraints (Eyre-Walker, Smith, and Maynard Smith 1999a). However, this method is suitable for detecting site-specific rate heterogeneity only in the biologically unlikely form of "constrained" versus "hypervariable" sites, where one class of sites cannot change and the other changes at a single rate. If rates actually vary among sites over a range of values, and if changes between nucleotides at a given site are symmetric, such a method will not be capable of detecting among-site rate heterogeneity, since any site with a nonzero rate will eventually reach saturation.
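The saturation argument can be made explicit under a simple model. Assuming, for illustration only, the Jukes-Cantor model (the cited analysis may have used a different model), let d be the mean number of substitutions per site separating two sequences. The expected proportion of differing sites is shown below for a site with relative rate r > 0, for rates drawn from a gamma distribution with shape α and mean 1, and for a model in which a fraction f of sites is invariant and the rest share one rate:

```latex
p_r(d) = \frac{3}{4}\left(1 - e^{-4rd/3}\right) \;\xrightarrow{\,d \to \infty\,}\; \frac{3}{4},
\qquad
\bar{p}_{\Gamma}(d) = \frac{3}{4}\left[1 - \left(1 + \frac{4d}{3\alpha}\right)^{-\alpha}\right] \;\xrightarrow{\,d \to \infty\,}\; \frac{3}{4},
\qquad
\bar{p}_{\mathrm{inv}}(d) = \frac{3}{4}\,(1 - f)\left(1 - e^{-4d/(3(1-f))}\right) \;\xrightarrow{\,d \to \infty\,}\; \frac{3}{4}\,(1 - f).
```

Only a class of sites that can never change lowers the plateau; gamma-distributed rates slow the approach to 3/4 but leave the plateau unchanged, so a comparison made at saturation cannot reveal them.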
In addition, though, Eyre-Walker, Smith, and Maynard Smith (1999a) examined the number of variable third sites shared between human and other primate mtDNA and found no evidence for an excess. Since elevated substitution rates at some third sites might cause sites that are hypervariable in humans to be variable in other primates as well, this was taken as evidence against site-specific rate heterogeneity. Therefore, if the homoplasy test has produced a false positive in this case due to undetected rate heterogeneity, and if the high degree of ARH in these mtDNA data (table 2) actually reflects RRH in a clonal population (as the informative-sites test suggests), then the constraints producing rate heterogeneity at third sites in mtDNA may be inconsistent across species.
In other cases, the evidence for recombination is overwhelming, so its implications need to be very carefully considered (see Schierup and Hein [2000] and Worobey and Holmes [2001] for a discussion of many of these implications). For example, the notion that phylogenetic trees reconstructed from recombinant data will systematically underestimate divergence times appears to be a misconception. The example in figure 1 is sufficient to show that this is not always the case. In this instance, the branch lengths of the tree for the 2,000-nt alignment, once corrected for the considerable apparent rate heterogeneity caused by recombination, implied a deceptively long genetic distance/time to the common ancestor of the four taxa. In fact, recombinant data analyzed by ML models that include rate heterogeneity will give rise to two competing effects: a tree-shortening tendency due to the homogenizing effects of recombination, and a tree-lengthening tendency due to the inflated ARH generated by recombination. Figure 1 shows that this tree-lengthening effect can result in overestimation of the time to most recent common ancestor (TMRCA) when ML models incorporating rate heterogeneity are naïvely used on data sets that have a recombinant history. Interestingly, Schierup and Hein (2000) recently concluded that recombination could give rise to underestimation of the TMRCA when using distance methods but to unbiased estimates when using ML methods. However, an important point to consider in this context is that data sets with higher levels of recombination will also show higher levels of ARH. If this recombination-generated ARH had been accounted for during tree construction in Schierup and Hein's (2000) simulation study, the ML approach may well have indicated a bias toward overestimation of the TMRCA, as suggested by figure 1 here. While further work will be required to understand the relative strengths of the conflicting effects that might bias dating, it is clear from these studies that phylogenetic inference in the face of recombination is much more complicated than is currently appreciated.
For any recombining population, a key question is the following: If the assumption of clonality is not valid, at what level of recombination is the convenient inference of a single phylogenetic tree no longer useful? Limited recombination may sometimes have insignificant effects and be ignored without consequence. Obvious recombinants can be detected and removed in other instances. However, in cases like that of the GBV-C subtypes analyzed here, the most appropriate use of a phylogenetic tree may be to show that a phylogenetic tree is not of much use. In such circumstances, it might be worth searching for small genomic regions that are less likely to be profoundly affected by recombination but which may contain sufficient phylogenetic signal to address the question at hand.
Footnotes
1 Abbreviations: ARH, apparent rate heterogeneity; DEN-1, dengue virus type 1; GBV-C, GB virus C; HCV, hepatitis C virus; ICV, influenza C virus; ISI, informative-sites index; ML, maximum likelihood; MPT, maximum-parsimony tree; mtDNA, mitochondrial DNA; nt, nucleotide; RRH, real rate heterogeneity; TMRCA, time to most recent common ancestor.
2 Address for correspondence and reprints: Michael Worobey, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom. michael.worobey{at}zoo.ox.ac.uk
3 Keywords: recombination, GB virus C, mitochondria, maximum likelihood, rate heterogeneity, clonal
References
Eyre-Walker A., N. H. Smith, J. Maynard Smith. 1999a. How clonal are human mitochondria? Proc. R. Soc. Lond. B Biol. Sci. 266:477-483.
Eyre-Walker A., N. H. Smith, J. Maynard Smith. 1999b. Reply to Macauley et al. (1999): mitochondrial DNA recombination, reasons to panic. Proc. R. Soc. Lond. B Biol. Sci. 266:2041-2042.
Hasegawa M., H. Kishino, T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Holmes E. C., M. Worobey, A. Rambaut. 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16:405-409.
Manzin A., L. Solforosi, M. Debiaggi, F. Zara, E. Tanzi, L. Romano, A. R. Zanetti, M. Clementi. 2000. Dominant role of host selective pressure in driving hepatitis C virus evolution in perinatal infection. J. Virol. 74:4327-4334.
Maynard Smith J. 1999. The detection and measurement of recombination from sequence data. Genetics 153:1021-1027.
Maynard Smith J., N. H. Smith. 1998. Detecting recombination from gene trees. Mol. Biol. Evol. 15:590-599.
Maynard Smith J., N. H. Smith, M. O'Rourke, B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384-4388.
Muerhoff A. S., D. B. Smith, T. P. Leary, J. C. Erker, S. M. Desai, I. K. Mushahwar. 1997. Identification of GB virus C variants by phylogenetic analysis of 5'-untranslated and coding region sequences. J. Virol. 71:6501-6508.
Rambaut A., N. C. Grassly. 1997. Seq-Gen: an application for the Monte Carlo simulation of sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235-238.
Schierup M. H., J. Hein. 2000. Consequences of recombination on traditional phylogenetic analysis. Genetics 156:879-891.
Sharp P. M., D. L. Robertson, B. H. Hahn. 1995. Cross-species transmission and recombination of "AIDS" viruses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:41-47.
Slatkin M., R. R. Hudson. 1991. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555-562.
Suerbaum S., J. Maynard Smith, K. Bapumia, G. Morelli, N. H. Smith, E. Kunstmann, I. Dyrek, M. Achtman. 1998. Free recombination within Helicobacter pylori. Proc. Natl. Acad. Sci. USA 95:12619-12624.
Sullivan J., K. E. Holsinger, C. Simon. 1996. The effect of topology on estimation of among-site rate variation. J. Mol. Evol. 42:308-312.
Swofford D. L. 2000. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.
Worobey M., E. C. Holmes. 1999. Evolutionary aspects of recombination in RNA viruses. J. Gen. Virol. 80:2535-2543.
Worobey M., E. C. Holmes. 2001. Homologous recombination in GB virus C/hepatitis G virus. Mol. Biol. Evol. 18:254-261.
Worobey M., A. Rambaut, E. C. Holmes. 1999. Widespread intra-serotype recombination in natural populations of dengue virus. Proc. Natl. Acad. Sci. USA 96:7352-7357.
Yang Z. 1996a. Among-site rate variation and its impact on phylogenetic analysis. Trends Ecol. Evol. 11:367-372.
Yang Z. 1996b. Maximum likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42:587-596.