Department of Zoology, University of Oxford, Oxford, England
Abstract |
---|
Introduction |
---|
Phylogenetic analyses, based mostly on 5' UTR sequences but more recently on full-genome sequences of isolates from around the world, have identified four genetic subtypes of GBV-C (types 1-4) with highly regional distributions (Muerhoff et al. 1997; Saito et al. 1999; Naito, Win, and Abe 1999; Naito, Hayashi, and Abe 2000). Type 1 is common in Africa, types 3 and 4 are common in Asia, and type 2 is common in Europe and the United States. Moreover, a virus closely related to GBV-C circulates in chimpanzees (Adams et al. 1998), while more distant homologs are present in New World monkeys (Bukh and Apgar 1997). This phylogeographic pattern has led to suggestions that GB viruses have cospeciated with their primate hosts (Charrel, De Micco, and de Lamballerie 1999) and has been interpreted by some as evidence for an African origin of GBV-C that predates human migration out of that continent around 100,000 years ago (e.g., Tanaka et al. 1998; Suzuki et al. 1999). Under this hypothesis, current phylogenetic relationships among GBV-C isolates may reflect ancient human population movements.
In addition to providing a potentially important viral marker for human population history and a model system for the study of the evolution of nonpathogenic infectious agents, GBV-C is puzzling in a variety of ways. For instance, the apparently long timescale of GBV-C evolution in humans seems difficult to reconcile with the high evolutionary rates expected of RNA viruses, which would predict only residual sequence similarities after 100,000 years of evolution (Simmonds and Smith [1999], but see Nakao et al. [1997] and Suzuki et al. [1999] for dramatically different substitution rate estimates for GBV-C inferred from the same data). An apparent conflict between high rates of nucleotide substitution and clear sequence similarities between viruses which may have cospeciated with their hosts has also been raised with respect to the primate lentiviruses (Sharp et al. 2000). Furthermore, phylogenetic groupings among GBV-C isolates are often inconsistent across different regions of the genome, unlike those of related viruses such as hepatitis C virus (HCV) (Muerhoff et al. 1997; Smith et al. 2000). This has led to extensive efforts to find small regions of the GBV-C genome that will faithfully replicate the phylogenetic trees produced using longer sequences and hence will aid typing schemes (e.g., Muerhoff et al. 1997; Smith et al. 2000).
One fundamental evolutionary process that may have been overlooked in previous studies of GBV-C, and which could play a key role in determining its distinctive patterns of genetic diversity, is recombination. Several observations, including that (1) GBV-C infection is common, (2) infection is often persistent, and (3) homologous recombination is known to be a property of Flaviviridae viruses (Holmes, Worobey, and Rambaut 1999; Worobey, Rambaut, and Holmes 1999), led us to wonder what role, if any, recombination plays in the evolution of GBV-C. Although many RNA viruses are assumed to be clonal, their population structure could, in fact, range from truly clonal to effectively panmictic, depending on the extent of recombination (Worobey and Holmes 1999). Establishing where a given virus falls along this continuum is not a trivial task, but it is important if we are to trust the results of phylogenetic analyses. If recombination has been sufficiently common to make a phylogenetic tree an inappropriate representation of evolutionary history, then inferences based on it must be treated with caution.
In this study, we collected published gene sequences of full and nearly full genomes of GBV-C and employed methods for detecting putative recombinants, for locating possible breakpoints, and for evaluating these results for statistical significance to test the hypothesis that GBV-C represents a clonal group of viruses. Our results indicate that GBV-C is not clonal.
Materials and Methods |
---|
Initially, we used exploratory tree analysis to look for evidence of conflicting phylogenetic signal within particular sequences (see Gao et al. [1998] for a similar approach). First, we reconstructed neighbor-joining (NJ) trees on 600-nt windows of the original alignment shifted in 300-nt increments, as well as for the full-length data set, with distances estimated using the HKY85 (Hasegawa, Kishino, and Yano 1985) model of DNA substitution. Methods correcting for site-specific rate heterogeneity were not employed at this point, since they would have greatly increased the run time without improving the quality of this analysis, which was intended only to identify sequences that dramatically changed phylogenetic position over the length of their genomes.
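The windowing step described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the function names `sliding_windows` and `slice_alignment` are hypothetical, and truncating the final window at the end of the alignment is an assumption. Only the 600-nt window size and 300-nt increment come from the text.

```python
def sliding_windows(aln_len, size=600, step=300):
    """Return (start, end) intervals covering an alignment of length aln_len.

    Intervals are half-open, 0-based. The last window is truncated at the
    end of the alignment (an assumption; the text does not say how the
    final partial window was handled).
    """
    windows = []
    start = 0
    while start < aln_len:
        end = min(start + size, aln_len)
        windows.append((start, end))
        if end == aln_len:
            break
        start += step
    return windows


def slice_alignment(alignment, start, end):
    """Cut every aligned sequence (dict of name -> string) to [start, end)."""
    return {name: seq[start:end] for name, seq in alignment.items()}
```

Each sub-alignment would then be handed to a tree-building program (PAUP* in this study) to obtain one NJ tree per window, and the trees compared for sequences that change position.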
Such possible mosaic strains, which clustered with distinct groups in different regions of the full alignment, were then analyzed using diversity plots, which show the similarity of a particular sequence across its whole length compared with different putative "parents," that is, the related sequences it most closely resembles in certain regions (e.g., Gao et al. 1998; Worobey, Rambaut, and Holmes 1999).
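The data behind such a diversity plot can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `percent_identity` and `similarity_profile` are hypothetical, the default window and step sizes are arbitrary (the text does not give the values used), and gapped positions are simply skipped.

```python
def percent_identity(a, b):
    """Fraction of ungapped aligned positions at which a and b match."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)


def similarity_profile(query, parent, size=200, step=50):
    """Return (window midpoint, identity) pairs along the alignment.

    size and step are illustrative defaults, not values from the study.
    """
    profile = []
    for start in range(0, len(query) - size + 1, step):
        end = start + size
        profile.append((start + size // 2,
                        percent_identity(query[start:end], parent[start:end])))
    return profile
```

Plotting one profile per putative parent on the same axes shows where the query switches from resembling one parent to resembling another, which is the visual signature of a mosaic genome.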
The putative recombinants recovered using these methods were then subjected to maximum-likelihood (ML) breakpoint estimation using the program LARD (Holmes, Worobey, and Rambaut 1999; http://evolve.zoo.ox.ac.uk). LARD finds optimal recombination breakpoints, given a sequence alignment of a putative recombinant and its two parents, by breaking the alignment into two parts and reconstructing a separate ML tree for each. For every possible partition of the alignment, the process is repeated, and the ML scores are combined to produce a "recombination model" score for that set of breakpoints. If recombination has occurred among the sequences, the highest score is expected when the alignment is broken at the actual recombination breakpoint, since the two trees reconstructed in this case best reflect the true phylogenetic history of the separate recombinant regions. Estimating the breakpoints in this manner leads to a natural test of their statistical significance: the recombination model likelihood score can be compared with the likelihood score for the unbroken alignment (i.e., the "no-recombination model") by a likelihood ratio test and a Monte Carlo approach using sequences simulated without recombination but with the observed levels of site-specific rate heterogeneity (see Holmes, Worobey, and Rambaut 1999; Worobey, Rambaut, and Holmes 1999).
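The logic of the breakpoint scan can be illustrated with a deliberately simplified Python sketch. It is not LARD and does not reconstruct ML trees: as a stand-in for the tree likelihoods, each segment is scored with a binomial likelihood of the mismatches between the putative recombinant and each parent. That substitution, and the names `binom_loglik`, `segment_loglik`, `scan_breakpoints`, and `min_len`, are assumptions made for illustration. What the sketch does share with the method described above is the structure: score every partition, keep the best, and compare it with the unbroken alignment.

```python
import math


def binom_loglik(k, n):
    """Maximized log-likelihood of k mismatches in n sites (one Bernoulli rate)."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def segment_loglik(query, parent_a, parent_b, start, end):
    """Score sites [start, end): query-vs-A plus query-vs-B mismatch likelihoods."""
    n = end - start
    mis_a = sum(query[i] != parent_a[i] for i in range(start, end))
    mis_b = sum(query[i] != parent_b[i] for i in range(start, end))
    return binom_loglik(mis_a, n) + binom_loglik(mis_b, n)


def scan_breakpoints(query, parent_a, parent_b, min_len=5):
    """Try every single breakpoint; return (best breakpoint, likelihood ratio).

    The ratio is 2 * (lnL of best two-segment model - lnL of unbroken model).
    """
    n = len(query)
    null = segment_loglik(query, parent_a, parent_b, 0, n)
    best_bp, best = None, -math.inf
    for bp in range(min_len, n - min_len + 1):
        score = (segment_loglik(query, parent_a, parent_b, 0, bp)
                 + segment_loglik(query, parent_a, parent_b, bp, n))
        if score > best:
            best_bp, best = bp, score
    return best_bp, 2.0 * (best - null)
```

Because the two-segment model always fits at least as well as the unbroken one, the raw ratio cannot be read off a standard table; its significance has to be judged against values obtained from data simulated without recombination, as the text describes.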
Because more than two putative parents were inferred in some cases, and this method becomes computationally intensive with longer sequences, the alignment was broken into nine overlapping fragments during LARD analysis. Except at the 5' and 3' ends, these fragments were 1,200 nt in length and overlapped neighboring fragments by 200 nt at each end. For each region analyzed, 200 simulated sequence alignments (used to generate null distributions for significance testing) were produced using ML parameter values estimated from the observed data (under the HKY85 model of nucleotide substitution, with transition : transversion ratios and shape parameters of the gamma distribution of among-sites rate heterogeneity estimated during tree reconstruction). All trees were produced using PAUP*, version 4 (Swofford 1998). The conflicting phylogenetic signal contained within the genomes of putative recombinants (i.e., those cases that showed a much greater improvement than expected by chance even when site-specific rate heterogeneity was accounted for) was then confirmed by bootstrap phylogenetic trees (10,000 replicates using NJ trees with distances estimated using the HKY85 model of DNA substitution).
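The Monte Carlo significance step reduces to comparing the observed likelihood ratio with the values obtained from the simulated, recombination-free alignments. A minimal Python sketch follows; the function name `monte_carlo_p` is hypothetical, and the plain proportion used here is one common convention (some analyses add 1 to both numerator and denominator), so it should be read as an assumption rather than the exact calculation used in the study.

```python
def monte_carlo_p(observed, simulated):
    """Proportion of simulated statistics at least as large as the observed one.

    simulated holds the likelihood ratio statistics computed on alignments
    generated without recombination (the null distribution).
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(s >= observed for s in simulated)
    return exceed / len(simulated)
```

With 200 simulated alignments, the smallest nonzero proportion this can return is 1/200 = 0.005, so the resolution of the P value is set by the number of replicates.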
Finally, to complement our phylogenetic approach to analyzing recombination, we applied Sawyer's (1989) runs test. This method, implemented in the package GENECONV (S. Sawyer, Department of Mathematics, Washington University, St. Louis, Mo.), searches for unusually long fragments within an alignment over which a pair of sequences are identical or nearly identical, then assesses the significance of the hypothesis that the similar fragments arose by recombination by using randomly permuted data sets derived from the real alignment. For each fragment, the test calculates two P values: "pairwise" permutation P values represent the proportion of the permuted data having a score for a certain sequence pair greater than or equal to the real score for that pair; "global" permutation P values reflect the proportion of permuted alignments for which some fragment for some pair of sequences has a higher score than a particular fragment. We chose this test because it seemed better suited than the phylogenetic method described above to detecting large amounts of recombination among relatively similar viruses, such as those within a GBV-C subtype. In all cases, the default parameters were used with the following exceptions: indels were skipped, mismatches were tolerated (Gscale = 1), and the thresholds used for global and pairwise P values were set at 0.05 and 0.01, respectively.
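The idea behind a pairwise permutation P value can be sketched in Python as follows. This is not GENECONV and does not reproduce its scoring: the score here is simply the longest run of exactly matching sites (no mismatches tolerated), and the permutation shuffles all alignment columns for the pair. Both simplifications, and the names `longest_match_run` and `pairwise_permutation_p`, are assumptions made for illustration.

```python
import random


def longest_match_run(a, b):
    """Length of the longest stretch of consecutive identical sites."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def pairwise_permutation_p(a, b, n_perm=200, seed=0):
    """Return (observed score, proportion of permutations scoring >= observed)."""
    cols = list(zip(a, b))
    if not cols:
        raise ValueError("sequences must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = longest_match_run(a, b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # shuffle site order, keeping each column intact
        perm_a, perm_b = zip(*cols)
        if longest_match_run(perm_a, perm_b) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Shuffling columns preserves the overall number of differences between the pair while destroying their spatial clustering, so an observed run that is rarely matched in the shuffled data points to a fragment that is more similar than the rest of the genome would predict.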
Results |
---|
In addition to the intrasubtype recombination evident in BL230-BOL, two type 3 intrasubtype recombinants, K1916-JAP and K1789-JAP, were also characterized by the above methods (fig. 2d and e), with P < 0.005 in both cases and results confirmed with bootstrap phylogenetic trees (data not shown). However, because these sequences were almost identical in their recombinant regions to otherwise divergent viruses, they were simply the most obvious of many apparent intrasubtype recombinants.
The results of the runs test on the full alignment of 33 sequences provided strong support for recombination in GBV-C: 407 fragments had pairwise P values <0.01, and 78 fragments had global P values <0.05. In the absence of recombination, we would expect no global P values <0.05 and very few pairwise P values <0.01 (randomized data yielded only zero global and five pairwise significant fragments). Significantly, the runs test results corresponded closely to those of the phylogenetic approach, in many cases identifying virtually identical sequence fragments as highly significant even though the two methods weighted mismatches slightly differently. Table 2 lists all the fragments with global P values <0.005 for the analysis of all 33 GBV-C sequences.
Discussion |
---|
Taken together, these results demonstrate for the first time that GBV-C is a recombinogenic virus. Moreover, they suggest that recombination may play a large role in shaping its genetic diversity. Several lines of evidence support this conclusion. First, the high number of confirmed mosaics in the data set shows that recombination can occur both within and between subtypes. Next, the results of Sawyer's runs test reflect a great deal of possible recombination, especially within subtypes. Finally, the existence of complex recombinants like C964-CHI and BL230-BOL, which evidently contain sequence originating from several sources, strongly implies a series of recombination events that would be unlikely unless this process were rather commonplace.
Failure to take recombination into account in GBV-C data sets may have seriously affected previous studies of the evolutionary history of this virus. For example, Simmonds and Smith (1999) found an excess of invariant synonymous sites in GBV-C coding regions compared with the frequency expected by chance, which they estimated through simulations of sequence evolution. They suggested that this difference reflected large constraints imposed by RNA secondary structure on GBV-C genomic change and concluded that such constraints invalidated the use of the molecular clock for this and perhaps other RNA viruses. However, their simulated data sets, while reflecting observed transition : transversion ratios, base frequencies, and relative frequencies of synonymous and nonsynonymous substitutions, did not incorporate recombination or site-specific rate heterogeneity. Since recombination can dramatically inflate the apparent degree of rate heterogeneity, it is not surprising that this is essentially what their analysis of this recombinant virus revealed. It therefore seems premature to conclude that such results indicate important evolutionary constraints imposed by RNA secondary structure, as opposed to the effects of overlooked recombination.
The influence of recombination on patterns of genetic diversity also provides a plausible explanation for why phylogenetic trees reconstructed using small segments of the GBV-C genome do not always mirror those produced using longer sequences. This discrepancy stands in marked contrast to what is seen for hepatitis C virus, an apparently clonal relative, where analysis of a variety of subgenomic regions reproduces the phylogenetic relationships of full genomes (Smith et al. 2000). One explanation proposed for this difference is that many GBV-C sites may have become saturated, thereby obscuring evolutionary relationships (Muerhoff et al. 1997). However, since sequence divergence in GBV-C is less than that in HCV, it seems likely that recombination has played at least as important a role here in obscuring phylogenetic relationships. Hence, the desirability of finding small segments of the GBV-C genome that faithfully reproduce the potentially deceptive phylogenetic relationships inferred from complete genomes should be reconsidered.
Although it remains a matter of future research to precisely establish the ways in which recombination might bias estimates of population divergence times, it is likely that the process seriously affects the reliability of this virus as a marker for human population history. Previous approaches (e.g., Suzuki et al. 1999) that implicitly assumed a clonal history for GBV-C may have underestimated its age by ignoring the homogenizing effects of recombination. However, while correctly accounting for recombination may push back the perceived time depth of the GBV-C phylogeny, it is worth noting that its very ancient pedigree is still only an assumption. Despite the regional distributions of GBV-C genotypes, taken by some as evidence of an "out of Africa" diversification, most subtypes are found in diverse locations, and some areas exhibit the "wrong" subtype for their region. For example, a study of GBV-C in Vietnam and Myanmar (Naito, Win, and Abe 1999) revealed type 2 to be the most prevalent in these Southeast Asian countries, while the supposedly "Asian" types were less common.
Nevertheless, the fact that some identifiable subtypes of GBV-C have apparently arisen and persisted in the face of recombination is illuminating. Even if some of the recognized subtypes had a recombinant origin, and even if the gaps in the GBV-C phylogeny were eventually filled by strains that fell outside these subtypes, there is clearly some geographically associated population structure within worldwide GBV-C. This implies that recombination has not been pervasive enough to obscure all genealogical information within GBV-C. In other words, some subpopulations of the virus, although they may generally recombine freely within themselves, evidently have been isolated enough from one another in the past to accumulate characteristic differences. Whether increased mixing of human populations in the future may lead to sufficient opportunity for recombination to break down the current population structure is an open question.
Many phylogeny-based methodologies can fall afoul of unrecognized recombination, and the implications of this study apply generally to viruses and other organisms that are often analyzed as though they were clonal. For example, Sharp et al. (2000) demonstrated how incorporating estimates of site-specific rate heterogeneity can push back the estimated divergence times on the primate lentivirus tree, but they did not investigate the possibility that recombination could also affect these estimates. Likewise, inferences about demographic history drawn from the shapes of phylogenetic trees (e.g., Pybus, Holmes, and Harvey 1999) could also be compromised by recombination. Starlike gene genealogies, often interpreted as signatures of epidemic expansion, may actually reflect populations subject to high levels of recombination. Indeed, recombination, and not demographic history, may underlie the remarkably starlike tree of GBV-C (fig. 1). Although more work needs to be done investigating such concerns, the possibility that recombination has biased our reconstruction of population history should not be overlooked.
Acknowledgements |
---|
Footnotes |
---|
1 Keywords: recombination, GB virus C, hepatitis G virus, phylogeny, maximum likelihood, mosaic
2 Address for correspondence and reprints: Michael Worobey, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom. E-mail: michael.worobey{at}zoo.ox.ac.uk
Literature Cited |
---|
Adams, N. J., L. E. Prescott, L. M. Jarvis, J. C. M. Lewis, M. O. McClure, D. B. Smith, and P. Simmonds. 1998. Detection in chimpanzees of a novel flavivirus related to GB virus/C hepatitis G virus. J. Gen. Virol. 79:1871-1877.
An, P., L. Wei, X. Y. Wu, N. Yuhki, S. J. O'Brien, and C. Winkler. 1997. Evolutionary analysis of the 5'-terminal region of hepatitis G virus isolated from different regions in China. J. Gen. Virol. 78:2477-2482.
Bukh, J., and C. L. Apgar. 1997. Five new or recently discovered (GBV-A) virus species are indigenous to New World monkeys and may constitute a separate genus of the Flaviviridae. Virology 229:429-436.
Charrel, R. N., P. De Micco, and X. de Lamballerie. 1999. Phylogenetic analysis of GB viruses A and C: evidence for cospeciation between virus isolates and their primate hosts. J. Gen. Virol. 80:2329-2335.
Erker, J. C., J. N. Simons, A. S. Muerhoff, T. P. Leary, M. L. Chalmers, S. M. Desai, and I. K. Mushahwar. 1996. Molecular cloning and characterization of a GB virus C isolate from a patient with non-A-E hepatitis. J. Gen. Virol. 77:2713-2720.
Fan, X. F., Y. J. Xu, H. Solomon, S. Ramrakhiani, B. A. Neuschwander-Tetri, and A. M. Di Bisceglie. 1999. Is hepatitis G/GB virus-C virus hepatotropic? Detection of hepatitis G/GB virus-C viral RNA in liver and serum. J. Med. Virol. 58:160-164.
Gao, F., D. L. Robertson, C. D. Carruthers et al. (13 co-authors). 1998. A comprehensive panel of near-full-length clones and reference sequences for non-subtype B isolates of human immunodeficiency virus type 1. J. Virol. 72:5680-5698.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Holmes, E. C., M. Worobey, and A. Rambaut. 1999. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 16:405-409.
Kaneko, T., S. Hayashi, Y. Arakawa, and K. Abe. 1998. Molecular cloning of full-length sequence of hepatitis G virus genome isolated from a Japanese patient with liver disease. Hepatol. Res. 12:207-216.
Konomi, N., C. Miyoshi, C. L. Zerain, T. C. Li, Y. Arakawa, and K. Abe. 1999. Epidemiology of hepatitis B, C, E, and G virus infections and molecular analysis of hepatitis G virus isolates in Bolivia. J. Clin. Microbiol. 37:3291-3295.
Linnen, J., J. Wages, Z. Y. Zhang-Keck et al. (30 co-authors). 1996. Molecular cloning and disease association of hepatitis G virus: a transfusion-transmissible agent. Science 271:505-508.
Muerhoff, A. S., D. B. Smith, T. P. Leary, J. C. Erker, S. M. Desai, and I. K. Mushahwar. 1997. Identification of GB virus C variants by phylogenetic analysis of 5'-untranslated and coding region sequences. J. Virol. 71:6501-6508.
Naito, H., S. Hayashi, and K. Abe. 2000. The entire nucleotide sequence of two hepatitis G virus isolates belonging to a novel genotype: Isolation in Myanmar and Vietnam. J. Gen. Virol. 81:189-194.
Naito, H., K. M. Win, and K. Abe. 1999. Identification of a novel genotype of hepatitis G virus in Southeast Asia. J. Clin. Microbiol. 37:1217-1220.
Nakao, H., H. Okamoto, M. Fukuda, F. Tsuda, T. Mitsui, K. Masuko, H. Lizuka, Y. Miyakawa, and M. Mayumi. 1997. Mutation rate of GB virus C hepatitis G virus over the entire genome and in subgenomic regions. Virology 233:43-50.
Pybus, O. G., E. C. Holmes, and P. H. Harvey. 1999. The mid-depth method and HIV-1: a practical approach for testing hypotheses of viral epidemic history. Mol. Biol. Evol. 16:953-959.
Saito, T., K. Ishikawa, M. Osei-Kwasi et al. (13 co-authors). 1999. Prevalence of hepatitis G virus and characterization of viral genome in Ghana. Hepatol. Res. 13:221-231.
Sawyer, S. 1989. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6:526-538.
Sharp, P. M., E. Bailes, F. Gao, B. E. Beer, V. M. Hirsch, and B. H. Hahn. 2000. Origins and evolution of AIDS viruses: estimating the time-scale. Biochem. Soc. Trans. 28:275-282.
Simmonds, P., and D. B. Smith. 1999. Structural constraints on RNA virus evolution. J. Virol. 73:5787-5794.
Simons, J. N., T. P. Leary, G. J. Dawson, T. J. Pilot-Matias, A. S. Muerhoff, G. G. Schlauder, S. M. Desai, and I. K. Mushahwar. 1995. Isolation of novel virus-like sequences associated with human hepatitis. Nat. Med. 1:564-569.
Smith, D. B., M. Basaras, S. Frost, D. Haydon, N. Cuceanu, L. Prescott, C. Kamenka, D. Millband, M. A. Sathar, and P. Simmonds. 2000. Phylogenetic analysis of GBV-C/hepatitis G virus. J. Gen. Virol. 81:769-780.
Suzuki, Y., K. Katayama, S. Fukushi, T. Kageyama, A. Oya, H. Okamura, Y. Tanaka, M. Mizokami, and T. Gojobori. 1999. Slow evolutionary rate of GB virus C/hepatitis G virus. J. Mol. Evol. 48:383-389.
Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.
Tanaka, Y., M. Mizokami, E. Orito et al. (15 co-authors). 1998. African origin of GB virus C/hepatitis G virus. FEBS Lett. 423:143-148.
Worobey, M., and E. C. Holmes. 1999. Evolutionary aspects of recombination in RNA viruses. J. Gen. Virol. 80:2535-2543.
Worobey, M., A. Rambaut, and E. C. Holmes. 1999. Widespread intra-serotype recombination in natural populations of dengue virus. Proc. Natl. Acad. Sci. USA 96:7352-7357.
Zhou, Y. S., W. Chen, Q. M. Zhao, H. L. Zhao, J. S. Zhang, J. Xu, and H. T. Wang. 1996. cDNA cloning and sequencing of HGV genome from Chinese. Bull. Acad. Military Med. Sci. 20:249-253.