Phylogenetic analysis of GBV-C/hepatitis G virus

Donald B. Smith¹, Miren Basaras², Simon Frost³, Dan Haydon⁴, Narcisa Cuceanu¹, Linda Prescott¹, Cara Kamenka¹, David Millband¹, Mahomed A. Sathar⁵ and Peter Simmonds¹

Department of Medical Microbiology, University of Edinburgh, Summerhall, Edinburgh EH9 1QH, UK¹
Department of Microbiology, School of Medicine, Universidad del Pais Vasco, 48080 Bilbao, Spain²
Centre for HIV Research, University of Edinburgh, Waddington Building, King’s Buildings, West Mains Road, Edinburgh EH9 3JN, UK³
Centre for Tropical Veterinary Medicine, University of Edinburgh, Easter Bush, Roslin EH25 9RG, UK⁴
Department of Medicine, University of Natal, Congella, South Africa 4013⁵

Author for correspondence: Donald Smith. Present address: Garden Cottage, Clerkington, Haddington, East Lothian EH41 4NJ, UK. e-mail Donald.B.Smith{at}gardencottage.screaming.net


   Abstract
Comparison of 33 epidemiologically distinct GBV-C/hepatitis G virus complete genome sequences suggests the existence of four major phylogenetic groupings that are equally divergent from the chimpanzee isolate GBV-Ctro and have distinct geographical distributions. These four groupings are not consistently reproduced by analysis of the virus 5'-noncoding region (5'-NCR), or of individual genes or subgenomic fragments with the exception of the E2 gene as a whole or of 200–600 nucleotide fragments from its 3' half. This region is upstream of a proposed anti-sense reading frame and contains conserved potential RNA secondary structures that may be capable of directing the internal initiation of translation. Phylogenetic analysis of this region from certain South African isolates is consistent with previous analysis of the 5'-NCR suggesting that these belong to a fifth group. The geographical distribution of virus variants is consistent with a long evolutionary history that may parallel that of pre-historic human migrations, implying that the long-term evolution of this RNA virus is extremely slow.


   Introduction
The similarity in genome organization between GB virus-C/hepatitis G virus (GBV-C/HGV) and hepatitis C virus (HCV) has led to the naïve expectation that variation of these closely related and persistent flaviviruses might also be similar. However, our limited understanding of the causal reasons for virus variability is underscored by the increasing evidence that these viruses vary in quite different ways. Although both viruses have a similar rate of nucleotide substitution during persistent infection [0·4–1·9×10⁻³ for HCV (Major et al., 1999; Smith et al., 1997a; Okamoto et al., 1992; Ogata et al., 1991), 0·4–2·4×10⁻³ for HGV (Khudyakov et al., 1997; Nakao et al., 1997)], GBV-C/HGV lacks a hypervariable region comparable to that present at the NH2 terminus of the HCV E2 protein (Takahashi et al., 1997a; Nakao et al., 1997), and observed ratios of synonymous to nonsynonymous substitution are higher for GBV-C/HGV (30:1) than within HCV subtypes (9:1), although the latter are less divergent (Muerhoff et al., 1997; Smith et al., 1997b). In addition, while different genotypes of HCV differ by more than 30%, the most extreme GBV-C/HGV variants differ by only 14%. Previous studies have identified three (Suzuki et al., 1999; Okamoto et al., 1997), four (Charrel et al., 1999) or five (Takahashi et al., 1997b) phylogenetic groupings of GBV-C/HGV, although some of these groupings are weak and inconsistent between different studies. However, whereas HCV genotypes can be distinguished by phylogenetic analysis of a variety of subgenomic regions as small as 222 nt, variants of GBV-C/HGV cannot be reliably identified in this way. Systematic analysis of six complete GBV-C/HGV genome sequences revealed that congruent phylogenetic relationships were obtained for only a minority of 300, 600 and 1200 nt fragments, and that the optimal region was all or part of the 5'-noncoding region (5'-NCR) (Muerhoff et al., 1997; Smith et al., 1997b).

At present 33 epidemiologically unrelated GBV-C/HGV complete virus genome sequences are available, including three of group 1 and several isolates of uncertain relationship to previously defined groupings. In addition, the recent discovery of closely related chimpanzee viruses (Adams et al., 1998) and the availability of a complete genome sequence (GBV-Ctro) (Birkenmeyer et al., 1998) allow phylogenetic trees to be constructed using a more appropriate outgroup. We have therefore undertaken a re-analysis of the phylogenetic relationships of GBV-C/HGV complete genome sequences, the extent to which these can be reproduced by analysis of subgenomic regions and the implications of virus geographical variation for theories about its evolutionary history.


   Methods
■ Nucleotide sequences.
Nucleotide sequences were obtained from GenBank and manipulated and aligned using Simmonic 2000 software (P. Simmonds, unpublished). Nucleotide positions of aligned sequences are numbered relative to the AUG codon at the beginning of the E1 gene of the prototype isolate GBV-C (U36380; Leary et al., 1996b). The 32 other epidemiologically unrelated complete genome sequences were AB003288–AB003293 (Takahashi et al., 1997b), AB013500 (Saito et al., 1999), AF104403 (Charrel et al., 1999), AB013501 (Konomi et al., 1999), AF031829 (Bukh et al., 1998), D87255 (Shao et al., 1996), D90600, D90601 (Okamoto et al., 1997), U44402, U45966 (Linnen et al., 1996), U63715 (Erker et al., 1996), AB008342 (Kaneko et al., 1998), AF006500 (L. Lu et al., unpublished), D87263 (Nakao et al., 1997), D87708–D87715 (Katayama et al., 1998), U75356 (Y. S. Zhou & H. T. Wang, unpublished), U94695 (Wang et al., 1997), AB018667 and AB021287 (H. Naito & K. Abe, unpublished), together with the outgroup AF070476 (GBV-Ctro; Birkenmeyer et al., 1998). The partial sequence (positions -396 to 6118) of an additional isolate from Thailand (K-10) was made available by Sirirrug Songsivilai (personal communication). The boundaries between genes were as given in Charrel et al. (1999).

■ Serum samples.
Sera were collected from individuals from Papua New Guinea, Sudan and The Gambia as part of population-based hookworm or malaria surveys, from pregnant women in the Democratic Republic of Congo (Mokili et al., 1999 ) and from blood donors from Saudi Arabia. Sera from South Africa were as described previously (Sathar et al., 1999 ).

■ PCR amplification of GBV-C/HGV genomes.
RNA was extracted from plasma using a proteinase K–SDS lysis buffer (Jarvis et al., 1994 ). The 5'-NCR was detected by nested RT–PCR amplification using the outer primers 5' TGCCACCCGCCCTCACCCGAA 3' (positions -21 to -41) and 5' AGGTGGTGGATGGGTGAT 3' (-443 to -426), and the inner primers 5' GGRGCTGGGTGGCCYCATGCWT 3' (-76 to -97) and 5' TGGTAGGTCGTAAATCCCGGT 3' (-415 to -397). Amplification reactions were 30 cycles for each round and consisted of 94 °C for 36 s, 55 °C for 42 s and 72 °C for 90 s.
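Several of the primers quoted here contain IUPAC ambiguity codes (for example R, Y and W), so each such primer is in practice a mixture of oligonucleotides. The following is a minimal Python sketch, assuming the standard IUPAC nucleotide code table, that counts and lists the sequences encoded by a degenerate primer. The function and variable names are illustrative only and are not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

# Inner 5'-NCR primers as quoted in the text.
inner_a = "GGRGCTGGGTGGCCYCATGCWT"
inner_b = "TGGTAGGTCGTAAATCCCGGT"

print(degeneracy(inner_a))  # 2 (R) x 2 (Y) x 2 (W) = 8
print(degeneracy(inner_b))  # 1, no ambiguity codes
```

The degeneracy gives a rough idea of how dilute each individual oligonucleotide is within the primer pool.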

For samples that were PCR positive for the 5'-NCR [7/74 (9·4%) Papua New Guinea, 2/66 (3%) Sudan, 3/74 (4%) The Gambia and 1/48 (2%) Saudi Arabia], the E2 region was reverse transcribed and amplified from purified RNA in a single combined reaction using a standard buffer system (Access PCR, Promega) according to the manufacturer’s instructions. Two sets of primers were used to amplify adjoining regions of the E2 gene. Set 1 consisted of outer primers 5' GCCTCHGCCAGCTTCATCAGRTA 3' (1682–1660) and 5' GGYAAYCCGGTGCGGTCVCCCYTGC 3' (1255–1279), and inner primers 5' AAAYACAAARTCCARVAGCARCCA 3' (1650–1627) and 5' TCCTACRCCATGACCAARATCCG 3' (1288–1310). Amplification conditions were 30 cycles of 94 °C for 18 s, 55 °C for 21 s and 72 °C for 90 s. Set 2 consisted of outer primers 5' ARCTYYGAACACCRSCGVACCAG 3' (1499–1477) and 5' GCCASYTGYACCATAGCYGC 3' (979–998), and inner primers 5' ACCCRAACGTYCCRGTBGGAGGC 3' (1375–1353) and 5' GTNGYBGAGCTSTYCGAGTGGGG 3' (1027–1046).
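The primer coordinates quoted above determine the expected span of each nested product and the extent to which the two E2 fragments overlap. A small Python sketch of that arithmetic follows, assuming that the coordinates are inclusive and that the span includes the primer sequences themselves. The helper name `amplicon` is hypothetical.

```python
def amplicon(primer_a, primer_b):
    """Return (first, last, length) of the region spanned by two primers.

    Each primer is a (from, to) pair of inclusive genome coordinates;
    antisense primers are simply written high-to-low, as in the text.
    """
    coords = primer_a + primer_b
    first, last = min(coords), max(coords)
    return first, last, last - first + 1

# Inner primer coordinates for the two E2 primer sets.
set1_inner = amplicon((1650, 1627), (1288, 1310))
set2_inner = amplicon((1375, 1353), (1027, 1046))

# Region covered by both inner products.
shared = (max(set1_inner[0], set2_inner[0]), min(set1_inner[1], set2_inner[1]))

print(set1_inner)  # (1288, 1650, 363)
print(set2_inner)  # (1027, 1375, 349)
print(shared)      # (1288, 1375)
```

Taken together, the two inner products span positions 1027–1650 on this numbering.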

The region of NS5A where duplications are observed in some isolates (Tanaka et al., 1998) was amplified for 5'-NCR-PCR positive samples from the Democratic Republic of Congo (n=7), The Gambia (n=4), Sudan (n=2) and Papua New Guinea (n=8) using outer primers 5' CACAATAGGCTGTATGGTTCTGG 3' (positions 6736–6714) and 5' CCATCGCCWGCACTWATCTCGG 3' (positions 6409–6430), and inner primers 5' TACRGARAGRGCCACRTTGAAGAC 3' (positions 6573–6550) and 5' ACNGAGAGCAGCTCAGATGAG 3' (positions 6433–6453). Amplifications were started at 80 °C followed by 30 cycles of 94 °C for 18 s, 52 °C for 21 s and 72 °C for 90 s. The size of amplified DNA fragments was assessed by electrophoresis through 4% Metaphor agarose gels, with expected sizes of 140 bp for fragments without a duplication and 180 bp for fragments with one.

■ Nucleotide sequencing and phylogenetic analysis.
Nucleotide sequences of amplified fragments were obtained by direct sequencing of amplified genome regions using Thermosequenase (Amersham) in reactions containing ³³P-labelled dideoxynucleotides. Potential RNA secondary structures were investigated using RNADraw version 1.0 (Matzura, 1995).

Complete coding region sequences were analysed by three different methods. (1) Distances were estimated using an F84 model of substitution (Felsenstein, 1984 ), which allows for unequal transition/transversion rates and unequal base frequencies. 100 bootstrapped datasets, distance matrices, neighbour joining trees and a consensus tree were produced using SEQBOOT, DNADIST, NEIGHBOR and CONSENSE, all part of the PHYLIP package (Felsenstein, 1993 ). (2) Synonymous and nonsynonymous distances were estimated using Method I of Nei & Gojobori (1986) for 200 datasets produced by bootstrapping codons (D. Haydon, unpublished). Neighbour joining trees and a consensus tree were produced as above. (3) Distances were estimated using a Tamura & Nei (1993) model of substitution, a more general form of the F84 model that also allows for unequal transition rates between purines and pyrimidines, together with rate heterogeneity modelled as a proportion of invariable sites plus eight rates taken from a discrete gamma distribution using parameters estimated from the data. 100 bootstrapped datasets were produced using SEQBOOT, and distance matrices calculated with the programs PUZZLE version 4.0.2 (Strimmer & von Haeseler, 1996 ) and PUZZLEBOOT version 1.01 (Holder & Roger, 1999 ) using parameters estimated from the dataset. Trees with the lowest least-squares deviation were produced using the program FITCH applying global search, and a consensus tree produced using CONSENSE. Phylogenetic trees of subgenomic regions were produced with MEGA (Kumar et al., 1993 ) using the Kimura 2-parameter model of substitution on datasets of 100 bootstrap replicates.
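Two ingredients of the analyses described above can be illustrated briefly: a pairwise distance under the Kimura 2-parameter model, and the resampling of alignment columns that underlies bootstrap replicates. The Python sketch below is not the PHYLIP or MEGA implementation. It uses the standard Kimura formula d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitions and transversions, and the toy sequences are invented for illustration.

```python
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / sites
    q = transversions / sites
    # The logarithms are undefined for very divergent (saturated) pairs.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

def bootstrap_columns(alignment, rng):
    """One bootstrap pseudo-replicate: resample columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, not real GBV-C/HGV data.
toy = {"seq1": "ACGTACGTAC", "seq2": "ACGTACGTAT", "seq3": "ACATACGTGT"}
replicate = bootstrap_columns(toy, random.Random(1))
print(k2p_distance(toy["seq1"], toy["seq2"]))
print(k2p_distance(replicate["seq1"], replicate["seq2"]))
```

In a full analysis, a distance matrix and tree would be computed for every replicate and the trees summarised as a consensus. The same columns are drawn for all sequences within a replicate so that the alignment stays intact.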

In order to determine which GBV-C/HGV group was closest to the outgroup GBV-Ctro (AF070476), we used a reduced dataset consisting of three sequences from each group (Group 1, HGU36380, AB003291, AB013500; Group 2, AB013501, U44402, U63715; Group 3, AB003288, D90601, U94695; Group 4, AB003292, AB018667, AB021287). We estimated the likelihood of the outgroup clustering with each of the four groups under four different models of nucleotide substitution: (i) an F84 model; (ii) an F84 model allowing for rate variation modelled using a discrete gamma distribution with eight categories; (iii) as for (ii), but allowing for different transition/transversion ratios, different base frequencies, different rates and different levels of rate heterogeneity for each gene; and (iv) as for (ii), but allowing parameters to differ for each codon position. These models were fitted and the bootstrap support for each cluster compared using the fast approximate bootstrap procedure of Kishino & Hasegawa (1989) as implemented in the PAML version 2.0 package (Yang, 1999). Similar analyses of amino acid sequences were carried out using Phylip 3.57c (Kimura substitution matrix), or RELL [Jones et al. (1992) substitution matrix] assuming gamma rate heterogeneity.
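Approximate bootstrap procedures of this kind avoid re-estimating the likelihood for each replicate by resampling per-site log-likelihoods that have already been computed for each candidate tree. The Python sketch below illustrates the general idea in the style of RELL resampling. It is an assumption that this matches the PAML procedure in detail, and the per-site values are invented for illustration.

```python
import random

def rell_support(site_lnl, replicates=1000, seed=0):
    """Approximate bootstrap support from per-site log-likelihoods.

    site_lnl maps each hypothesis (e.g. "outgroup clusters with group 2")
    to a list of per-site log-likelihoods over the same alignment sites.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample sites with replacement, reusing the fitted likelihoods.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {h: sum(site_lnl[h][i] for i in sample) for h in names}
        wins[max(totals, key=totals.get)] += 1
    return {h: wins[h] / replicates for h in names}

# Hypothetical per-site log-likelihoods for two competing placements.
toy = {
    "with_group1": [-3.2, -2.9, -3.5, -3.1, -2.7, -3.3],
    "with_group2": [-3.0, -2.8, -3.6, -2.9, -2.8, -3.1],
}
print(rell_support(toy))
```

The returned proportions sum to one and play the role of the approximate bootstrap support reported for each clustering.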


   Results
Complete genome sequences
Phylogenetic analysis of the coding region of 33 epidemiologically unrelated complete GBV-C/HGV genome sequences yielded evidence for four distinct phylogenetic groupings, each of which had a high level of bootstrap support (Fig. 1). These groupings correspond to the three groupings previously identified from the analysis of complete genome sequences (Okamoto et al., 1997) together with an additional group consisting of an unclassified Japanese sequence (Takahashi et al., 1997b) and three sequences from Thailand (K-10; Sirirrug Songsivilai, unpublished results), Vietnam (AB018667) and Myanmar (AB021287). Similar groupings and levels of bootstrap support were obtained when the analysis was confined to either synonymous or nonsynonymous sites, or to amino acid substitutions, when comparisons were extended to include the entire virus genome, or when the chimpanzee isolate GBV-Ctro was used as an outgroup (data not shown).



Fig. 1. Phylogenetic tree of GBV-C/HGV complete coding region sequences. Maximum likelihood distances between sequences were calculated with Phylip DNADIST (ts:tv=2, assuming no rate variation between sites), and used to produce a neighbour joining tree with Phylip NEIGHBOR. Bootstrap values (100 replicates) were obtained with the SEQBOOT and CONSENSE options of Phylip. The tree was produced using Treeview version 1.5.

 
Although subgroupings of groups 1, 2 and 3 have been distinguished by phylogenetic analysis of subgenomic regions (Muerhoff et al., 1996 ; Takahashi et al., 1997a ; Muerhoff et al., 1997 ) there was only limited evidence from the analysis of complete genome sequences for subgroupings. Three group 3 isolates from China (U75356, U94695 and AF006500) grouped together with high bootstrap support, while the isolates AB013501 and U63715 grouped separately from the other group 2 isolates (group 2a) and isolate AB003292 from Japan was relatively divergent from other group 4 isolates. However, analysis of pairwise distances between GBV-C/HGV sequences revealed a distribution with only two overlapping peaks (Fig. 2) corresponding to comparisons between sequences belonging to different groups (distances of 0·131–0·171) or within one of the four phylogenetic groups (0·076–0·137). The overlap between these two distributions comprised 18/528 pairwise comparisons (3·5%), more than half of which involved a divergent group 2 isolate from South America (AB013501).
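The bookkeeping behind this comparison can be sketched in a few lines of Python: 33 sequences give 528 distinct pairs, each pair is classed as within-group or between-group, and the overlap is the interval in which the two distributions meet. The group labels and distances below are hypothetical toy values, not the data of Fig. 2.

```python
def pair_count(n):
    """Number of distinct pairs among n sequences."""
    return n * (n - 1) // 2

def classify_pairs(distances, groups):
    """Split pairwise distances into within-group and between-group lists."""
    within, between = [], []
    for (a, b), d in distances.items():
        (within if groups[a] == groups[b] else between).append(d)
    return within, between

def overlap_interval(within, between):
    """Interval shared by the two distributions, or None if they are disjoint."""
    low, high = min(between), max(within)
    return (low, high) if low <= high else None

print(pair_count(33))  # 528

# Hypothetical example with two groups of two isolates each.
groups = {"a": 1, "b": 1, "c": 2, "d": 2}
distances = {
    ("a", "b"): 0.090, ("c", "d"): 0.135,
    ("a", "c"): 0.133, ("a", "d"): 0.150,
    ("b", "c"): 0.160, ("b", "d"): 0.170,
}
within, between = classify_pairs(distances, groups)
print(overlap_interval(within, between))  # (0.133, 0.135)
```

Applied to the ranges quoted in the text, the same calculation gives an overlap between distances of 0·131 and 0·137.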



Fig. 2. Frequency distribution of pairwise evolutionary distances between GBV-C/HGV complete coding region sequences. Distances were calculated as in Fig. 1.

 
A previous study suggested that certain group 1 isolates containing a duplicated region within NS5A (e.g. isolate AB003291) were ancestral to other GBV-C/HGV isolates relative to GBV-A (Tanaka et al., 1998). However, this was not confirmed when the analysis was extended to complete genome sequences (Suzuki et al., 1999), or when we investigated the relative likelihood with which the more closely related GBV-Ctro isolate clustered with three representatives from each group (Table 1). Under four different models of substitution, clustering with group 1 could be rejected at the 5% level. Although clustering with group 2 gave the best fit to the data for the models which included rate heterogeneity, differences between genes in transition/transversion ratios, rate heterogeneity and base frequencies were relatively small (results not shown), and the bootstrap support for this clustering was not high enough to be conclusive. The best fit to the data was obtained with the model in which parameters could vary according to position within codons, reflecting the strong bias against nonsynonymous substitution. Bootstrap resampling analysis using simpler models of substitution at synonymous or nonsynonymous sites supported group 4 as being ancestral, while analysis of amino acid sequences supported either group 1 or group 3 as being ancestral depending on the substitution matrix used. Hence, our analysis does not consistently place any one of the four human GBV-C/HGV groups as being ancestral.


Table 1. Log likelihood and approximate bootstrap support for the clustering of the outgroup GBV-Ctro with groups 1–4 under different models of substitution

 
Furthermore, the duplicated region may be derivative rather than ancestral since this requires fewer evolutionary steps (Suzuki et al., 1999 ). Most isolates that lack the duplication have instead an 8 nt direct repeat (ACCCCGTC, positions 6457–6464 and 6487–6494) flanking a sequence with the potential to form a structure with an 11 nt stem and a 3 nt loop. The formation of this stem–loop during RNA synthesis could lead to slippage and mispairing between the direct repeats resulting in the duplication of the hairpin loop and the direct repeat. We did not detect the duplication in NS5A amongst other isolates from Africa (n=13) or Papua New Guinea (n=8) or other NS5A sequences available in GenBank (n=94).
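Direct repeats of the kind described here can be located with a simple scan for a short motif that recurs within a limited distance downstream. The Python sketch below is only an illustration: apart from the ACCCCGTC motif, the 22 nt spacer implied by the quoted coordinates and the coordinate 6457, the sequence is invented and the function name is hypothetical.

```python
def find_direct_repeats(seq, k=8, max_gap=40):
    """Return (first_start, second_start, motif) for every k-mer that recurs
    downstream with at most max_gap nucleotides between the two copies."""
    hits = []
    for i in range(len(seq) - k + 1):
        motif = seq[i:i + k]
        # The second copy must start after the first ends and lie close by.
        j = seq.find(motif, i + k, i + 2 * k + max_gap)
        if j != -1:
            hits.append((i, j, motif))
    return hits

# Toy sequence: two copies of the repeat around an invented 22 nt spacer.
repeat = "ACCCCGTC"
spacer = "GTACGATTGCAGTCAAGGCTTA"
toy = repeat + spacer + repeat

for first, second, motif in find_direct_repeats(toy):
    gap = second - (first + len(motif))
    # Convert 0-based offsets to the numbering used in the text, where the
    # first repeat starts at position 6457.
    print(motif, 6457 + first, 6457 + second, gap)
```

With the first copy placed at position 6457, a spacer of 22 nt puts the second copy at position 6487, in agreement with the coordinates given above.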

Subgenomic regions
Next, we investigated the extent to which subgenomic regions of GBV-C/HGV bear the same phylogenetic relationships as do entire virus genomes. Analysis of individual virus genes failed to produce congruent phylogenetic trees with the sole exception of the E2 gene (Fig. 3). Similar analysis of subgenomic fragments of 2000, 1000 or 500 nt produced congruent trees only for fragments including the COOH-terminal region of the E2 gene. Analysis of the entire 3'-terminal half of the virus genome or any of its subfragments failed to produce congruent trees.



Fig. 3. Congruence between phylogenetic analysis of complete genomes and subgenomic fragments. The level of bootstrap support for phylogenetic groupings 1–4 is indicated by the degree of shading for different genes and subgenomic fragments.

 
In most cases, sequences that grouped aberrantly were not consistently associated with a particular group. The only exception was the group 4 sequence AB021287, which grouped with group 2 sequences for subgenomic fragments encompassing the NH2 terminus of the NS2 gene or the COOH-terminal half of the NS5B gene. Even if the assumption is made that this isolate is a recombinant between isolates of groups 4 and 2 [the major GBV-C/HGV groups found in this geographical region (Naito et al., 1999)], the only additional subgenomic region that then produces a congruent phylogenetic tree is a 1000 nt fragment encompassing the COOH terminus of the E2 gene and most of the NS2 gene.

Since a 500 nt fragment of the E2 gene could reproduce the phylogenetic relationships of the complete genome sequences, we next analysed this region in more detail (Figs 4 and 5). Congruent phylogenetic trees were produced using a region as small as 200 nt (positions 1344–1543). The shortest region that gave a congruent tree with more than 98% bootstrap support for each group was the 600 nt region from positions 994–1594. This region also produced a congruent tree when analysed at synonymous sites (>85% support) but not at nonsynonymous sites.
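A scan of this kind amounts to cutting the alignment into fragments of a given size and repeating the tree-building and bootstrap steps on each one. The Python sketch below shows one way to enumerate and extract such fragments. The 50 nt step and the helper names are assumptions made for illustration, since the text does not state how fragment boundaries were chosen.

```python
def sliding_windows(start, end, size, step):
    """Yield inclusive (first, last) coordinates of fragments of `size` nt
    lying entirely between `start` and `end`."""
    first = start
    while first + size - 1 <= end:
        yield first, first + size - 1
        first += step

def extract(alignment, first, last, offset):
    """Slice each aligned sequence; `offset` is the coordinate of column 0."""
    return {name: seq[first - offset:last - offset + 1]
            for name, seq in alignment.items()}

# 200 nt fragments across the 994-1594 region, moving 50 nt at a time.
windows = list(sliding_windows(994, 1594, size=200, step=50))
print(len(windows))
print((1344, 1543) in windows)

# Hypothetical 20-column alignment whose first column is position 994.
toy = {"iso1": "ACGTACGTACGTACGTACGT", "iso2": "ACGTTCGTACGAACGTACGT"}
print(extract(toy, 996, 1000, offset=994))
```

Each extracted fragment would then be passed to the same distance and bootstrap pipeline used for the complete sequences, and its support for the four groups recorded.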



Fig. 4. Congruence of phylogenetic relationships produced using fragments of the E2 gene. The extent to which phylogenetic groupings observed from analysis of complete genome sequences are supported by comparison of fragments of the E2 gene is indicated by the level of shading. The number of groups supported by more than 70% of bootstrap replicates is indicated. The minimum degree of bootstrap support is given for selected examples. Also indicated are the position of an anti-genome reading frame (Kondo et al., 1998 ), a region where synonymous substitutions are suppressed (Simmonds & Smith, 1999 ) and regions where the presence of covariant nucleotide substitutions suggests the presence of RNA secondary structures (Simmonds & Smith, 1999 ).

 


Fig. 5. South African isolates form an additional E2 phylogenetic grouping. A phylogenetic tree was constructed by neighbour joining using Jukes–Cantor distances for the region 1146–1496 for selected South African isolates and for the corresponding region from complete genome sequences. Bootstrap values (500 replicates) for branches are indicated where they were greater than 70%.

 
Previous studies have suggested that the phylogenetic relationships of GBV-C/HGV complete genome sequences can be reproduced by analysis of the whole or part of the 5'-NCR (Muerhoff et al., 1996; Smith et al., 1997b; Muerhoff et al., 1997). The extreme 5'-terminal sequence is available for only a single group 1 and none of the group 4 isolates, so an adequate analysis can only be carried out from positions -388 to -1. Analysis of this region and of various subfragments failed to produce congruent phylogenetic trees (Fig. 6), with two sequences responsible for the majority of aberrant groupings. AB013501, an extreme group 2 isolate from Bolivia, groups with group 3 isolates for the 5'-NCR and some subgenomic fragments in the 3' half of the genome, although the 5'-NCR is atypical. Similarly, U75356, an unpublished group 3 isolate from China, has a 5'-NCR sequence similar to but distinct from those of group 2 isolates. This sequence also has a frameshift within NS5A and an unusual sequence at the 3' terminus (Smith et al., 1997b). Hence, it is uncertain whether these isolates represent recombinants (An et al., 1997) or divergent variants with unique 5'-NCR sequences. If these two sequences are excluded from the dataset, a congruent tree is obtained when the region -388 to -1 is analysed. However, subfragments of this region, including those previously identified as reproducing the phylogenetic relationships of group 1 and 2 isolates (Smith et al., 1997b), provide <70% bootstrap support for either group 3 or group 4.



Fig. 6. Congruence of phylogenetic relationships produced using fragments of the 5'-NCR. The extent to which phylogenetic groupings observed from analysis of complete genome sequences are supported by comparison of fragments of the 5'-NCR is indicated by the level of shading. Groupings were considered to be supported if they were observed in >70% of bootstrap replicates.

 
Evidence for additional groupings from the analysis of subgenomic regions
Having identified a region within the E2 gene that consistently reproduces the phylogenetic relationships of complete genome sequences, we compared published sequences of this region for evidence of additional GBV-C/HGV groups. Of 16 additional sequences for the region 994–1594 (accession nos U87653–U87664, AF063827, AF063828, AF063830, AF017533), 14 (from USA, Jamaica, Greece, UK and Egypt) grouped with group 2, and one each with groups 1 (USA) and 3 (Hong Kong) (>99% support for all groups; data not shown). Similarly, nine additional sequences of the region 1414–1693 from Italian patients (AF015842–AF015862) clustered with group 2, although as expected from the analysis of complete genome sequences (Fig. 4), the bootstrap support was low (44%). In contrast, E2 sequences for the 350 nt region 1146–1495 of five GBV-C variants from South Africa with unusual 5'-NCR sequences (Sathar et al., 1999) formed an additional group (Fig. 5), while other South African isolates with 5'-NCR sequences that clustered with groups 1 (n=3) or 2 (n=2) clustered with groups 1 or 2 respectively. Isolates from Papua New Guinea had E2 sequences that clustered with group 2 (n=2) or group 4 (n=2), while an isolate from Sudan (n=1) clustered with group 2.

Similar analysis of 5'-NCR sequences is complicated by the large number of sequences available within this region (1409 accessions) and the inconsistent phylogenetic relationships of all but the largest fragments (Fig. 6). Visual examination for motifs distinct from those typical of groups 1, 2, 3 and 4 revealed a small number of unusual sequences, but of these only the variants from South Africa (Sathar et al., 1999) described above, and isolates from Spain (Lopez-Alcorocho et al., 1999), grouped separately from the 5'-NCR sequences of complete genome sequences for the region -388 to -1. These latter isolates could represent recombinants since they have motifs typical of group 3 sequences between positions -489 and -459 but the remainder of the 5'-NCR is similar to, although distinct from, that of group 2 isolates.


   Discussion
Identification of phylogenetic groups
This analysis of complete genome sequences of GBV-C/HGV identifies four main phylogenetic groupings (Fig. 1). The existence of the fourth group had been suggested from the analysis of 5'-NCR sequences of Southeast Asian isolates (Naito et al., 1999 ). A previous phylogenetic analysis of complete genome sequences also proposed four groups in which groups 1 and 4 were combined and group 3 was split into two (Charrel et al., 1999 ), but the published bootstrap analysis did not support these groupings. Similarly, an analysis based upon genetic distances identified five groups with group 1 split into two (Takahashi et al., 1997b ), but our analysis suggests that phylogenetic groups cannot be defined using pairwise distances (Fig. 2).

An unexpected finding of this study is that the phylogenetic relationships of complete GBV-C/HGV sequences (but not subgroupings) can be reproduced by short COOH-terminal fragments of the E2 gene (Fig. 4). Previous studies of smaller numbers of sequences have shown that phylogenetic relationships are inconsistent when comparisons are made for individual genes (Takahashi et al., 1997b ) or subgenomic fragments (Smith et al., 1997b ). Although three (Cong et al., 1999 ) or four (Lim et al., 1997 ) phylogenetic groupings are observed when fragments larger than 1500 nucleotides including the E2 gene are compared, interpretation of these studies is complicated by idiosyncratic labelling, the absence of group 4 isolates and the failure to assess the robustness of groupings. We show here that analysis of a 200 nt fragment from the centre of the E2 gene (positions 1344–1543) provides >75% bootstrap support for all four phylogenetic groupings, while a 600 nt region (positions 994–1594) provided >98% support. This is in stark contrast to other coding regions: all subgenomic fragments of 2000 nt or less that did not contain this region of E2 failed to produce congruent trees with the sole exception of a 2000 nt fragment encompassing the remainder of the E2 gene, NS2 and the NH2-terminal half of NS3 (Fig. 3). An important unresolved question is the extent to which some complete genome sequences represent recombinants between different groups. Inconsistent relationships were observed for some sequences between the 5'-NCR and coding regions or between different coding regions. However, the aberrant groupings were typically weak and inconsistent suggesting that these do not represent simple recombinants.

Our analysis helps to clarify previous conflicting studies on the extent to which subgenomic regions can be used to identify GBV-C/HGV phylogenetic groupings. Most early studies of virus diversity concentrated on a 118–135 nt fragment within NS3 that was the first part of the genome to be sequenced (Simons et al., 1995 ; Masuko et al., 1996 ; Heringlake et al., 1996 ; Kao et al., 1996 ; Berg et al., 1996 ; Schreier et al., 1996 ; Tsuda et al., 1996 ; Schmidt et al., 1996 ; Muerhoff et al., 1997 ; Pickering et al., 1997 ; Ibanez et al., 1998 ). However, the phylogenetic conclusions of these studies appear to be unreliable since analysis of even the entire NS3 gene fails to produce congruent groupings. Inconsistent phylogenetic groupings have also been observed for the NH2 terminus of E2 (Lim et al., 1997 ; Muerhoff et al., 1997 ), 354 nt of NS5A/NS5B (Viazov et al., 1997 ) or 279 nt of NS5B (Muerhoff et al., 1997 ). Finally, although analysis of a 2600 nt fragment containing both NS5A and NS5B produced a congruent tree for 12 isolates from groups 1, 2 and 3 (Khudyakov et al., 1997 ), comparison of this region from the 33 complete genome sequences gave aberrant groupings of some isolates with only marginal support for groups 1 and 4 (data not shown).

The inconsistent phylogenetic relationships of GBV-C/HGV subgenomic coding regions contrast with those observed for HCV, where analysis of a variety of subgenomic regions reproduces the phylogenetic relationships of complete genomes (Simmonds et al., 1994; Tokita et al., 1998). This difference could arise if different regions of the GBV-C/HGV genome were subject to different evolutionary processes, in which case combining these regions could produce a less reliable reconstruction than the analysis of individual regions (Bull et al., 1993). However, the validity of the analysis based on complete genome sequences is supported by the correlation between their phylogenetic relationships and their geographical origin. An alternative explanation is that the amino acid sequence of the GBV-C/HGV polyprotein is well conserved (dN:dS 0·033, divergence <11%) relative to HCV subtypes (dN:dS 0·094, divergence <10%). Phylogenetic groupings of GBV-C/HGV therefore rely more on variation at synonymous sites, many of which are invariant or saturated, possibly because of the presence of extensive RNA secondary structures within the GBV-C/HGV genome (Simmonds & Smith, 1999). Consequently, GBV-C/HGV isolates with a common evolutionary origin may share only a small number of polymorphisms and analysis of subgenomic regions could often fail to produce congruent phylogenetic trees.

There are several potential explanations for our observation that analysis of the E2 gene or specific subfragments produces phylogenetic trees congruent with those observed for complete genome sequences. First, since E2 is the most variable part of the GBV-C/HGV polyprotein (Katayama et al., 1998; Erker et al., 1996), the phylogenetic relationships of complete genome sequences could depend entirely on substitutions in E2. However, identical groupings are observed if E2 is excluded and most amino acid substitutions occur at the NH2 terminus of E2 (Lim et al., 1997; Katayama et al., 1998; Cong et al., 1999), whereas it is only the central and 3' regions that produce congruent phylogenetic trees, and then only for synonymous rather than nonsynonymous substitutions. A second potential explanation is that this region of the genome encodes an open reading frame on the anti-sense strand (nucleotides 870–1226, Fig. 4), possibly encoding a nucleocapsid protein (Kondo et al., 1998). This would constrain the accumulation of substitutions and so retain evidence of phylogenetic relationships. However, although synonymous substitutions are suppressed in this part of the genome (Muerhoff et al., 1997; Simmonds & Smith, 1999), phylogenetic analysis of the anti-sense reading frame fails to produce a congruent tree (Fig. 4). Another potential source of constraint in this region of the genome would be if translation of the anti-sense reading frame were dependent on an upstream internal ribosome entry site. Indeed, substitutions in this region of the genome are frequently covariant and associated with potential RNA secondary structures (Simmonds & Smith, 1999) (Fig. 7), while RNA folding predictions for representatives of groups 1–3 identify structures with free energies one standard deviation below those of random sequences of the same base composition. However, these RNA structures were not observed in sequences from group 4 isolates, and the anti-sense reading frame is 10 residues longer in some group 3 isolates and lacks the initial AUG codon in GBV-Ctro.
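The comparison with random sequences of the same base composition can be illustrated with a shuffling control. The sketch below is an assumption-laden stand-in, not the folding procedure used here: it scores a sequence by the maximum number of nested base pairs (the Nussinov algorithm) rather than by a thermodynamic free energy, and expresses the result as a z-score against shuffled copies. The function names, the minimum loop size of 3 and the toy hairpin are hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov dynamic programming)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Either end left unpaired.
            best = max(dp[i + 1][j], dp[i][j - 1])
            # Ends i and j paired with each other.
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)
            # Two independent substructures side by side.
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_z_score(seq, n_shuffles=200, seed=0):
    """Z-score of the pair count against shuffles with identical composition."""
    rng = random.Random(seed)
    observed = max_pairs(seq)
    bases = list(seq)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # preserves base composition exactly
        scores.append(max_pairs("".join(bases)))
    sd = stdev(scores)
    return (observed - mean(scores)) / sd if sd > 0 else 0.0


# Toy hairpin: a G-rich arm, an A loop and a complementary C-rich arm.
print(shuffle_z_score("GGGGGGGGAAAACCCCCCCC"))
```

In this proxy a positive z-score means more pairing than expected by chance, which corresponds loosely to a free energy below that of the shuffled controls. A thermodynamic folding program would be needed to reproduce the free-energy comparison itself.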



Fig. 7. Potential RNA secondary structures upstream of a potential anti-sense reading frame. Potential hairpin loops formed by the nucleotide sequence of the antisense strand corresponding to positions 1279–1213 of the sense strand are indicated. Covariant nucleotide substitutions observed amongst sequences from phylogenetic groups 1, 2 and 3 are indicated in upper case, while substitutions that disrupt the proposed structures are indicated in lower case. The conserved AUG codon at the beginning of the anti-sense reading frame is indicated by bold lettering.

 
Geographical distribution and origin of GBV-C/HGV isolates
The distinct geographical distribution of GBV-C/HGV variants is consistent with their co-evolution with humans during pre-historic migrations. Group 1 and 5 isolates are African and have relatively diverse 5'-NCR sequences (Muerhoff et al., 1997; Smith et al., 1997b; Naito et al., 1999; Sathar et al., 1999), while the remaining groups correspond to the three main waves of human migration from Africa to Europe (group 2), northern Asia (group 3) and southern Asia (group 4). In addition, the presence of group 3 variants amongst native populations in South America (Pujol et al., 1998; Gonzalez-Perez et al., 1997) is consistent with the first colonization of this continent from northern Asia via the Bering Strait. Although Japanese isolates have been found belonging to groups 1–4 (Takahashi et al., 1997b; Okamoto et al., 1997), most are group 3, suggesting that the presence of other groups represents recent introductions.

Finally, since a virus related to GBV-C/HGV is present in chimpanzees (Adams et al., 1998; Birkenmeyer et al., 1998), while New World monkeys harbour more distantly related but species-specific variants (Bukh & Apgar, 1997; Leary et al., 1996a), it is possible that GBV-C/HGV has been continuously present in human populations since speciation. In this case the virus appears to have evolved at an overall rate of less than 10⁻⁵ per site per year (Suzuki et al., 1999; Simmonds & Smith, 1999), although its rate of sequence evolution measured in longitudinal studies is similar to that of other RNA viruses. Further sequence analysis of the E2 region from GBV-C/HGV variants isolated from different geographical regions may help to clarify the origins of this unusual virus.
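A rate bound of this kind follows from a standard molecular-clock relation. As a sketch only (the symbols d and T are introduced here for illustration and are not defined elsewhere in the text): if two virus lineages have diverged together with their hosts for T years and now differ by d substitutions per site after correction for multiple substitutions, the average rate r per site per year and the condition for it to fall below the quoted bound are

```latex
r = \frac{d}{2T},
\qquad
r < 10^{-5} \iff d < 2T \times 10^{-5}
```

The factor of 2 reflects the accumulation of substitutions along both lineages since their common ancestor.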


   Acknowledgments
 
We are grateful to John Mokili, David Pritchard, David Arnot, Eleanor Riley and Brian Greenwood for providing serum samples, and to David Leach for advice on the mechanisms of RNA rearrangements. M.B. is supported by a grant from the Sociedad Española de Quimioterapia. D.B.S. was supported by a grant from the Wellcome Trust. P.S. is a Darwin Trust Fellow.


   Footnotes
 
The GenBank accession numbers of the sequences reported here are AF181977–AF181981.


   References
 
Adams, N. J., Prescott, L. E., Jarvis, L. M., Lewis, J. C. M., McClure, M. O., Smith, D. B. & Simmonds, P. (1998). Detection of a novel flavivirus related to hepatitis G virus/GB virus C in chimpanzees. Journal of General Virology 79, 1871-1877.

An, P., Wei, L., Wu, X. Y., Yuhki, N., O'Brien, S. J. & Winkler, C. (1997). Evolutionary analysis of the 5'-terminal region of hepatitis G virus isolated from different regions in China. Journal of General Virology 78, 2477-2482.

Berg, T., Dirla, U., Naumann, U., Heuft, H. G., Kuther, S., Lobeck, H., Schreier, E. & Hopf, U. (1996). Responsiveness to interferon alpha treatment in patients with chronic hepatitis C coinfected with hepatitis G virus. Journal of Hepatology 25, 763-768.

Birkenmeyer, L. G., Desai, S. M., Muerhoff, A. S., Leary, T. P., Simons, J. N., Montes, C. C. & Mushahwar, I. K. (1998). Isolation of a GB virus-related genome from a chimpanzee. Journal of Medical Virology 56, 44-51.

Bukh, J. & Apgar, C. L. (1997). Five new or recently discovered (GBV-A) virus species are indigenous to New World monkeys and may constitute a separate genus of the Flaviviridae. Virology 229, 429-436.

Bukh, J., Kim, J. P., Govindarajan, S., Apgar, C. L., Foung, S. K. H., Wages, J., Yun, A. J., Shapiro, M., Emerson, S. U. & Purcell, R. H. (1998). Experimental infection of chimpanzees with hepatitis G virus and genetic analysis of the virus. Journal of Infectious Diseases 177, 855-862.

Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L. & Waddell, P. J. (1993). Partitioning and combining data in phylogenetic analysis. Systematic Biology 42, 384-397.

Charrel, R. N., Attoui, H., Demicco, P. & de Lamballerie, X. (1999). The complete coding sequence of a European isolate of GB-C/hepatitis G virus. Biochemical and Biophysical Research Communications 255, 432-437.

Cong, M. E., Fried, M. W., Lambert, S., Lopareva, E. N., Zhan, M. Y., Pujol, F. H., Thyagarajan, S. P., Byun, K. S., Fields, H. A. & Khudyakov, Y. E. (1999). Sequence heterogeneity within three different regions of the hepatitis G virus genome. Virology 255, 250-259.

Erker, J. C., Simons, J. N., Muerhoff, A. S., Leary, T. P., Chalmers, M. L., Desai, S. M. & Mushahwar, I. K. (1996). Molecular cloning and characterization of a GB virus C isolate from a patient with non-A–E hepatitis. Journal of General Virology 77, 2713-2720.

Felsenstein, J. (1984). Distance methods for inferring phylogenies: a justification. Evolution 38, 16-24.

Felsenstein, J. (1993). PHYLIP Inference Package version 3.5, Department of Genetics, University of Washington, Seattle, USA.

Gonzalez-Perez, M. A., Norder, H., Bergstrom, A., Lopez, E., Visona, K. A. & Magnius, L. O. (1997). High prevalence of GB virus C strains genetically related to strains with Asian origin in Nicaraguan hemophiliacs. Journal of Medical Virology 52, 149-155.

Heringlake, S., Osterkamp, S., Trautwein, C., Tillmann, H. L., Boker, K., Muerhoff, S., Mushahwar, I. K., Hunsmann, G. & Manns, M. P. (1996). Association between fulminant hepatic failure and a strain of GBV virus C. Lancet 348, 1626-1629.

Holder, A. & Roger, M. (1999). PUZZLEBOOT, Marine Biological Laboratory, Woods Hole, MA, USA.

Ibanez, A., Gimenez-Barcons, M., Tajahuerce, A., Tural, C., Sirera, G., Clotet, B., Sanchez-Tapias, J. M., Rodes, J., Martinez, M. A. & Saiz, J. C. (1998). Prevalence and genotypes of GB virus C/hepatitis G virus (GBV-C/HGV) and hepatitis C virus among patients infected with human immunodeficiency virus: evidence of GBV-C/HGV sexual transmission. Journal of Medical Virology 55, 293-299.

Jarvis, L. M., Watson, H. G., McOmish, F., Peutherer, J. F., Ludlam, C. A. & Simmonds, P. (1994). Frequent reinfection and reactivation of hepatitis C virus genotypes in multitransfused hemophiliacs. Journal of Infectious Diseases 170, 1018-1022.

Jones, D. T., Taylor, W. R. & Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences 8, 275-282.

Kaneko, T., Hayashi, S., Arakawa, Y. & Abe, K. (1998). Molecular cloning of full-length sequence of hepatitis G virus genome isolated from a Japanese patient with liver disease. Hepatology Research 12, 207-216.

Kao, J. H., Chen, P. J., Hsiang, S. C., Chen, W. & Chen, D. S. (1996). Phylogenetic analysis of GB virus C: comparison of isolates from Africa, North America, and Taiwan. Journal of Infectious Diseases 174, 410-413.

Katayama, K., Kageyama, T., Fukushi, S., Hoshino, F. B., Kurihara, C., Ishiyama, N., Okamura, H. & Oya, A. (1998). Full-length GBV-C/HGV genomes from nine Japanese isolates: characterization by comparative analyses. Archives of Virology 143, 1063-1075.

Khudyakov, Y. E., Cong, M. E., Bonafonte, M. T., Abdulmalek, S., Nichols, B. L., Lambert, S., Alter, M. J. & Fields, H. A. (1997). Sequence variation within a nonstructural region of hepatitis G virus genome. Journal of Virology 71, 6875-6880.

Kishino, H. & Hasegawa, M. (1989). Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data and the branching order in Hominoidea. Journal of Molecular Evolution 29, 170-179.

Kondo, Y., Mizokami, M., Nakano, T., Kato, T., Tanaka, Y., Hirashima, N., Ueda, R., Kunimatsu, M., Sasaki, M., Yasuda, K. & Iino, S. (1998). Analysis of conserved ambisense sequences within GB virus C. Journal of Infectious Diseases 178, 1185-1188.

Konomi, N., Miyoshi, C., Zerain, C. L., Li, T. C., Arakawa, Y. & Abe, K. (1999). Epidemiology of hepatitis B, C, E and G virus infections and molecular analysis of hepatitis G virus isolates in Bolivia. Journal of Clinical Microbiology 37, 3291-3295.

Kumar, S., Tamura, K. & Nei, M. (1993). MEGA: Molecular Evolutionary Genetics Analysis, version 1.0, Pennsylvania State University, PA, USA.

Leary, T. P., Desai, S. M., Yamaguchi, J., Chalmers, M. L., Schlauder, G. G., Dawson, G. J. & Mushahwar, I. K. (1996a). Species-specific variants of GB virus A in captive monkeys. Journal of Virology 70, 9028-9030.

Leary, T. P., Muerhoff, A. S., Simons, J. N., Pilot-Matias, T. J., Erker, J. C., Chalmers, M. L., Schlauder, G. G., Dawson, G. J., Desai, S. M. & Mushahwar, I. K. (1996b). Sequence and genomic organization of GBV-C: a novel member of the Flaviviridae associated with human non-A–E hepatitis. Journal of Medical Virology 48, 60-67.

Lim, M. Y., Fry, K., Yun, A., Chong, S., Linnen, J., Fung, K. & Kim, J. P. (1997). Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus. Journal of General Virology 78, 2771-2777.

Linnen, J., Wages, J., Zhangkeck, Z. Y., Fry, K. E., Krawczynski, K. Z., Alter, H., Koonin, E., Gallagher, M., Alter, M., Hadziyannis, S., Karayiannis, P., Fung, K., Nakatsuji, Y., Shih, J. W. K., Young, L., Piatak, M., Hoover, C., Fernandez, J., Chen, S., Zou, J. C., Morris, T., Hyams, K. C., Ismay, S., Lifson, J. D., Hess, G., Foung, S. K. H., Thomas, H., Bradley, D., Margolis, H. & Kim, J. P. (1996). Molecular cloning and disease association of hepatitis G virus: a transfusion transmissible agent. Science 271, 505-508.

Lopez-Alcorocho, J. M., Castillo, I., Tomas, J. F. & Carreno, V. (1999). Identification of a novel GB type C virus/hepatitis G virus subtype in patients with hematologic malignancies. Journal of Medical Virology 57, 80-84.

Major, M. E., Mihalik, K., Fernandez, J., Seidman, J., Kleiner, D., Kolykhalov, A. A., Rice, C. M. & Feinstone, S. M. (1999). Long-term follow-up of chimpanzees inoculated with the first infectious clone for hepatitis C virus. Journal of Virology 73, 3317-3325.

Masuko, K., Mitsui, T., Iwano, K., Yamazaki, C., Okuda, K., Meguro, T., Murayama, N., Inoue, T., Tsuda, F., Okamoto, H., Miyakawa, Y. & Mayumi, M. (1996). Infection with hepatitis GB virus C in patients on maintenance hemodialysis. New England Journal of Medicine 334, 1485-1490.

Matzura, O. (1995). RNADraw v1.0, Karolinska Institute, Stockholm, Sweden.

Mokili, J. L. K., Wade, C. M., Burns, S. M., Cutting, W. A. M., Bopopi, J. M., Green, S. D. R., Peutherer, J. F. & Simmonds, P. (1999). Genetic heterogeneity of HIV type 1 subtypes in Kimpese, rural Democratic Republic of Congo. AIDS Research and Human Retroviruses 15, 655-664.

Muerhoff, A. S., Simons, J. N., Leary, T. P., Erker, J. C., Chalmers, M. L., Pilot-Matias, T. J., Dawson, G. J., Desai, S. M. & Mushahwar, I. K. (1996). Sequence heterogeneity within the 5'-terminal region of the hepatitis GB virus C genome and evidence for genotypes. Journal of Hepatology 25, 379-384.

Muerhoff, A. S., Smith, D. B., Leary, T. P., Erker, J. C., Desai, S. M. & Mushahwar, I. K. (1997). Identification of GB virus C variants by phylogenetic analysis of 5'-untranslated and coding region sequences. Journal of Virology 71, 6501-6508.

Naito, H., Win, K. M. & Abe, K. (1999). Identification of a novel genotype of hepatitis G virus in Southeast Asia. Journal of Clinical Microbiology 37, 1217-1220.

Nakao, H., Okamoto, H., Fukuda, M., Tsuda, F., Mitsui, T., Masuko, K., Iizuka, H., Miyakawa, Y. & Mayumi, M. (1997). Mutation rate of GB virus C/hepatitis G virus over the entire genome and in subgenomic regions. Virology 233, 43-50.

Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous substitutions. Molecular Biology and Evolution 3, 418-426.

Ogata, N., Alter, H. J., Miller, R. H. & Purcell, R. H. (1991). Nucleotide sequence and mutation rate of the H strain of hepatitis C virus. Proceedings of the National Academy of Sciences, USA 88, 3392-3396.

Okamoto, H., Kojima, M., Okada, S.-I., Yoshizawa, H., Iizuka, H., Tanaka, T., Muchmore, E. E., Ito, Y. & Mishiro, S. (1992). Genetic drift of hepatitis C virus during an 8·2 year infection in a chimpanzee: variability and stability. Virology 190, 894-899.

Okamoto, H., Nakao, H., Inoue, T., Fukuda, M., Kishimoto, J., Iizuka, H., Tsuda, F., Miyakawa, Y. & Mayumi, M. (1997). The entire nucleotide sequences of two GB virus C/hepatitis G virus isolates of distinct genotypes from Japan. Journal of General Virology 78, 737-745.

Pickering, J. M., Thomas, H. C. & Karayiannis, P. (1997). Genetic diversity between hepatitis G virus isolates: analysis of nucleotide variation in the NS-3 and putative ‘core’ peptide genes. Journal of General Virology 78, 53-60.

Pujol, F. H., Khudyakov, Y. E., Devesa, M., Cong, M. E., Loureiro, C. L., Blitz, L., Capriles, F., Beker, S., Liprandi, F. & Fields, H. A. (1998). Hepatitis G virus infection in Amerindians and other Venezuelan high-risk groups. Journal of Clinical Microbiology 36, 470-474.

Saito, T., Ishikawa, K., Oseikwasi, M., Kaneko, T., Brandful, J. A. M., Nuvor, V., Aidoo, S., Ampofo, W., Apeagyei, F. A., Ansah, J. E., Adusarkodie, Y., Nkrumah, F. K. & Abe, K. (1999). Prevalence of hepatitis G virus and characterization of viral genome in Ghana. Hepatology Research 13, 221-231.

Sathar, M. A., Soni, P. N., Pegoraro, R., Simmonds, P., Smith, D. B., Dhillon, A. P. & Dusheiko, G. M. (1999). A new variant of GB virus C/hepatitis G virus (GBV-C/HGV) from South Africa. Virus Research 64, 151-160.

Schmidt, B., Korn, K. & Fleckenstein, B. (1996). Molecular evidence for transmission of hepatitis G virus by blood transfusion. Lancet 347, 909.

Schreier, E., Hohne, M., Kunkel, U., Berg, T. & Hopf, U. (1996). Hepatitis GBV-C sequences in patients infected with HCV contaminated anti-D immunoglobulin and among iv drug users in Germany. Journal of Hepatology 25, 385-389.

Shao, L., Shinzawa, H., Ishikawa, K., Zhang, X. H., Ishibashi, M., Misawa, H., Yamada, N., Togashi, H. & Takahashi, T. (1996). Sequence of hepatitis G virus genome isolated from a Japanese patient with non-A–E hepatitis: amplification and cloning by long reverse transcription-PCR. Biochemical and Biophysical Research Communications 228, 785-791.

Simmonds, P. & Smith, D. B. (1999). Structural constraints on RNA virus evolution. Journal of Virology 73, 5787-5794.

Simmonds, P., Smith, D. B., McOmish, F., Yap, P. L., Kolberg, J., Urdea, M. S. & Holmes, E. C. (1994). Identification of genotypes of hepatitis C virus by sequence comparisons in the core, E1 and NS-5 regions. Journal of General Virology 75, 1053-1061.

Simons, J. N., Leary, T. P., Dawson, G. J., Pilot-Matias, T. J., Muerhoff, A. S., Schlauder, G. G., Desai, S. M. & Mushahwar, I. K. (1995). Isolation of novel virus-like sequences associated with human hepatitis. Nature Medicine 1, 564-569.

Smith, D. B., Pathirana, S., Davidson, F., Lawlor, E., Power, J., Yap, P. L. & Simmonds, P. (1997a). The origin of hepatitis C virus genotypes. Journal of General Virology 78, 321-328.

Smith, D. B., Cuceanu, N., Davidson, F., Jarvis, L. M., Mokili, J. L. K., Hamid, S., Ludlam, C. A. & Simmonds, P. (1997b). Discrimination of hepatitis G virus/GBV-C geographical variants by analysis of the 5' non-coding region. Journal of General Virology 78, 1533-1542.

Strimmer, K. & Von Haeseler, A. (1996). Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Molecular Biology and Evolution 13, 964-969.

Suzuki, Y., Katayama, K., Fukushi, S., Kageyama, T., Oya, A., Okamura, H., Tanaka, Y., Mizokami, M. & Gojobori, T. (1999). Slow evolutionary rate of GB virus C/hepatitis G virus. Journal of Molecular Evolution 48, 383-389.

Takahashi, K., Hijikata, M., Aoyama, K., Hoshino, H., Hino, K. & Mishiro, S. (1997a). Characterization of GBV-C/HGV viral genome: comparison among different isolates for a ~2 kb-sequence that covers entire E1 and most of 5'UTR and E2. International Hepatology Communications 6, 253-263.

Takahashi, K., Hijikata, M., Hino, K. & Mishiro, S. (1997b). Entire polyprotein-ORF sequences of Japanese GBV-C/HGV isolates: implications for new genotypes. Hepatology Research 8, 139-148.

Tamura, K. & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in human and chimpanzee. Molecular Biology and Evolution 10, 512-526.

Tanaka, Y., Mizokami, M., Orito, E., Ohba, K., Kato, T., Kondo, Y., Mboudjeka, I., Zekeng, L., Kaptue, L., Bikandou, B., Mpele, P., Takehisa, J., Hayami, M., Suzuki, Y. & Gojobori, T. (1998). African origin of GB virus C/hepatitis G virus. FEBS Letters 423, 143-148.

Tokita, H., Okamoto, H., Iizuka, H., Kishimoto, J., Tsuda, F., Miyakawa, Y. & Mayumi, M. (1998). The entire nucleotide sequences of three hepatitis C virus isolates in genetic groups 7–9 and comparison with those in the other eight genetic groups. Journal of General Virology 79, 1847-1857.

Tsuda, F., Hadiwandowo, S., Sawada, N., Fukuda, M., Tanaka, T., Okamoto, H., Miyakawa, Y. & Mayumi, M. (1996). Infection with GB virus C (GBV-C) in patients with chronic liver disease or on maintenance hemodialysis in Indonesia. Journal of Medical Virology 49, 248-252.

Viazov, S., Riffelmann, M., Khoudyakov, Y., Fields, H., Varenholz, C. & Roggendorf, M. (1997). Genetic heterogeneity of hepatitis G virus isolates from different parts of the world. Journal of General Virology 78, 577-581.

Wang, H. L., Hou, Y. D. & Jin, D. Y. (1997). Identification of a single genotype of hepatitis G virus by comparison of one complete genome from a healthy carrier with eight from patients with hepatitis. Journal of General Virology 78, 3247-3253.

Yang, Z. (1999). Phylogenetic analysis by maximum likelihood (PAML) version 2.0, University College, London, UK.

Received 17 August 1999; accepted 30 November 1999.