2000 Fleming Lecture. The origin and evolution of hepatitis viruses in humans

Peter Simmonds1

Laboratory for Clinical and Molecular Virology, University of Edinburgh, Summerhall, Edinburgh EH9 1QH, UK1

Author for correspondence: Peter Simmonds. Fax +44 131 650 7965. e-mail Peter.Simmonds@ed.ac.uk

Abstract

The spread and origins of hepatitis C virus (HCV) in human populations have been the subject of extensive investigations, not least because of the importance this information would provide in predicting clinical outcomes and controlling spread of HCV in the future. However, in the absence of historical and archaeological records of infection, the evolution of HCV and other human hepatitis viruses can only be inferred indirectly from their epidemiology and by genetic analysis of contemporary virus populations. Some information on the history of the latter may be obtained by dating the time of divergence of various genotypes of HCV, hepatitis B virus (HBV) and the non-pathogenic hepatitis G virus (HGV)/GB virus-C (GBV-C). However, the relatively recent times predicted for the origin of these viruses fit poorly with their epidemiological distributions and the recent evidence for species-associated variants of HBV and HGV/GBV-C in a wide range of non-human primates. The apparent conservatism of viruses over long periods implied by these latter observations may be the result of constraints on sequence change peculiar to viruses with single-stranded genomes, or with overlapping reading frames. Large population sizes and intense selection pressures that optimize fitness may be the factors that set virus evolution apart from that of their hosts.

2000 Fleming Lecture Delivered at the 146th Meeting of the Society for General Microbiology, 13 April 2000

Introduction

Smallpox was always present, filling the churchyard with corpses, tormenting with constant fear all whom it had not yet stricken, leaving on those whose lives it spared the hideous traces of its power, turning the babe into a changeling at which the mother shuddered, and making the eyes and cheeks of the betrothed maiden objects of horror to the lover.

T. B. Macaulay, The History of England from the Accession of James II, vol. IV (1855)

O’er ladies’ lips, who straight on kisses dream

Which oft the angry Mab with blisters plague

William Shakespeare, Romeo and Juliet, act I, scene IV; discovered by Peter Wildy (Wildy et al., 1982 )

These descriptions of smallpox and herpes simplex virus provide a glimpse into the past history of virus infections in humans. Alas, our knowledge of virus history is poor, none more so than for hepatitis viruses for which historical descriptions such as the above do not exist. In this Lecture, I will summarize what is currently known about the origins and evolution of a number of hepatitis viruses, and try to illustrate how our understanding of the transmission of hepatitis C virus (HCV) and its likely future clinical impact would be enhanced if we were able to discover some basic facts about its past.

Chronic infection with HCV has become established as the major infectious cause of chronic liver disease in Western countries. Its spread in these populations is poorly understood, although it is known to be transmitted by blood contact, and has particularly targeted risk groups such as injecting drug users (IDUs), and in the past, recipients of blood transfusion and blood products. HCV infection is frequently persistent, and sets in train an inexorable course of slowly progressive liver disease. Part of the difficulty in understanding the epidemiology of HCV is the lack of symptoms both at initial infection and for prolonged periods of chronic infection (Seeff et al., 2000; Kenny-Walsh, 1999; Wiese et al., 2000). Most HCV-infected individuals are unaware of their clinical status and there is thus considerable under-diagnosis and under-reporting of infection. Indeed, in many individuals, HCV infection may first become apparent only on the development of liver failure or liver cancer several decades after initial infection. We urgently need to assess the likely scale of HCV infection, the factors leading to its current epidemic spread and the interventions that could be made to avert future problems associated with disease progression.

Our basic knowledge of the epidemiology of HCV would be helped if there was some understanding of its origins, and in particular information about the nature of HCV infection in populations from whom HCV spread during the current epidemic. Investigation in this area is complex for a number of reasons, although there are now accounts of the recent spread of other viruses, such as human immunodeficiency virus (HIV), on which the spread of HCV could be modelled. In this Lecture I will review the progress that has been made in reconstructing the evolution and spread of a number of hepatitis viruses, and the analogies there might be with the spread of HCV.

HCV infection

HCV has a positive-sense RNA genome and is classified as a flavivirus. Its inclusion in the family Flaviviridae is based upon a number of similarities in its genome organization, structure and replication to a large group of vector-borne viruses causing diseases such as dengue fever and yellow fever. Important virological similarities include the existence in both of a single, large gene that encodes a polyprotein which is cleaved after synthesis into functional proteins. HCV and flaviviruses are enveloped; amongst the structural proteins encoded by HCV, two large glycoproteins are incorporated into its viral envelope, and are likely to be homologous to E and NS1 of flaviviruses. There are other similarities in the position and amino acid sequences of evolutionarily conserved elements involved in virus replication, such as the RNA polymerase (NS5) and helicase (NS3). However, there are also differences, particularly in the mechanism of translation of the HCV and flavivirus polyproteins. Flaviviruses have a relatively short leader sequence upstream of the coding region, and the host-cell ribosomes are thought to scan this region to identify the first methionine codon to initiate translation. In contrast, the corresponding 5' untranslated region (5'UTR) of HCV folds into a complex RNA secondary structure which acts as an internal ribosomal entry site. This directs ribosomal binding and initiation of translation to an internal methionine codon. Unfortunately, extensive functional analysis of HCV has proven difficult because the virus is hard to culture in vitro. Its host range is confined to humans and close primate relatives, and this has also hindered the development of animal models for HCV infection. At present, we still lack much basic information on its entry into cells, replication and assembly; a summary of what is currently understood about the functions of the different HCV-encoded proteins is shown in Fig. 1.



Fig. 1. (A) Organization of the HCV genome, showing the 5' and 3' untranslated regions, the single open reading frame and its cleavage sites. (B) The relative sizes of the resulting proteins and what is currently understood of their functions. (C) Genome organization of HGV/GBV-C, with genes homologous to those of HCV indicated; note the lack of a protein corresponding to the capsid protein of HCV.

 
The long asymptomatic stage of HCV infection and its slow disease progression make estimation of the future clinical impact of HCV difficult to assess, particularly in the UK and other Western countries where there is epidemiological evidence for its very recent spread in new risk groups for parenteral infection, such as IDUs. Currently, it is estimated that approximately 0·5% of the UK population is infected with HCV, and the likely relatively short duration of many of these infections means that we have yet to witness the full extent of liver disease that is likely to arise. Understanding the future outcome of HCV is vital for health planning; for example, if all of those infected with HCV were to progress to chronic liver failure or liver cancer, this would become an intolerable burden on health resources of the UK and other countries with similar or higher prevalences of infection. Understanding the natural history of HCV infection is also of major importance in management of currently asymptomatic individuals. For example, combination treatment with interferon and ribavirin leads to complete and permanent clearance of viraemia in approximately 50% of individuals, while much lower frequencies of response are observed in individuals with more advanced disease, such as those with cirrhosis. Effective treatment of HCV at an early stage of infection may therefore be able to avert much of the end-stage liver disease associated with untreated hepatitis C.

In trying to plan and prioritize HCV management and treatment, we need accurate information on the course of HCV disease progression and the factors that influence it. Available evidence indicates that long-term asymptomatic carriage of HCV may occur in a large proportion of persistently infected individuals (Seeff et al., 2000; Kenny-Walsh, 1999; Wiese et al., 2000). For example, a relatively benign course of HCV infection has been observed after 22 years in recipients of anti-D immunoglobulin in Ireland (Kenny-Walsh, 1999), and amongst blood transfusion recipients after 17 years (Seeff et al., 1992). However, these findings should be tempered by the observation in Japan and Italy of a high and rapidly increasing incidence of severe liver disease and hepatocellular carcinoma associated with HCV. The implication of these and other experiences is that disease complications of infection may take an extremely long time to develop (Koretz et al., 1993; Alter et al., 1997; Seeff, 1997). It is therefore vital to understand the long-term clinical consequences in the large number of clinically silent individuals infected relatively recently through drug misuse.

HCV genetic variability

Much of the evidence for the previous spread of HCV derives indirectly from descriptions of current genotype frequencies of HCV in different risk groups and populations. HCV can be classified into a number of distinct genotypes, whose distribution varies both geographically and between risk groups (reviewed in Simmonds, 1998 ). In the currently widely used classification for HCV, known variants of HCV collected from different parts of the world can be divided into six main ‘genotypes’, many of which contain more closely related variants (Fig. 2). For nomenclature, it has been proposed that HCV is classified into genotypes, corresponding to the main branches in the phylogenetic tree, and subtypes corresponding to the more closely related sequences within some of the major groups (Simmonds et al., 1994 ; Enomoto et al., 1990 ). The types have been numbered 1 to 6, and the subtypes a, b and c, in both cases in order of discovery. Therefore, the sequence cloned by Chiron is assigned type 1a, HCV-J and -BK are 1b, HC-J6 is type 2a and HC-J8 is 2b. Each of the six main genotypes of HCV is equally divergent from the others, differing at 31 to 34% of nucleotide positions on pairwise comparison of complete genomic sequences, and leading to approximately 30% amino acid sequence divergence between the encoded polyproteins. Although there is no neutralization assay available for HCV, HCV genotypes most likely correspond to the serotypes of other viruses, such as dengue virus and poliovirus.



Fig. 2. Sequence relationships of currently available complete genomic sequences of HCV (listed in Chamberlain et al., 1997 ), and their classification into six genotypes (first tier) and subtypes a, b etc. (second tier). The nomenclature of the HCV genotypes follows the consensus proposal for classification of HCV (Simmonds et al., 1994 ), i.e. the prototype sequence cloned by Chiron is assigned type 1a, HCV-J and -BK are 1b, HC-J6 is type 2a and HC-J8 is 2b, although note the anomalous genotype assignations of types 7b, 8b, 9a and 11a which cluster with types 6a and 6b, and type 10a, which clusters within genotype 3 sequences. Proposals to reclassify these as subtypes in the genotype 6 and 3 clades have been made (Robertson et al., 1998 ; Mizokami et al., 1996 ; de Lamballerie et al., 1997 ; Simmonds et al., 1996 b ). The unrooted tree was generated using the MEGA package (Kumar et al., 1993 ).

 
Different regions of the genome show various levels of sequence diversity. The ends of the genome that contain elements involved in virus replication and in guiding protein translation are highly conserved between genotypes (Kolykhalov et al., 1996 ; Bukh et al., 1992 ), as is the initial coding region (the core gene). Other regions, such as the E1 and E2 genes coding for the envelope glycoproteins, are highly variable, typically differing at over 50% of sites between genotypes.

In Europe, types 1b and type 2 are widely distributed, particularly in older age groups, while those infected through drug use are more likely to be infected with genotypes 3a and 1a (Simmonds et al., 1996a ; Tisminetzky et al., 1994 ; Pawlotsky et al., 1995 ; Goeser et al., 1995 ; McOmish et al., 1993 ). The observation of genotypes associated with drug use in Europe that are distinct from those found in individuals infected through other routes suggests that infection of IDUs originated through a geographically large transmission network largely distinct from other HCV-infected individuals, an observation which has also been documented for HIV (Brown et al., 1997 ; Holmes et al., 1995b ). In the remainder of this Lecture, I will describe the approaches that have been taken to discover the origins of human hepatitis viruses, and the light that this information sheds on their current distribution and epidemiology. Understanding the origins of HCV in humans will help put the current epidemic spread in certain risk groups into a broader context that may be of value in predicting its future impact on human health.

Virus archaeology and evolution

Investigation of the origin of viruses is by definition a speculative venture. The exercise is limited by the ephemeral existence of many viruses, and the lack of any historical depth in epidemiological studies. Furthermore, clinical specimen archives, suitable for recovery of viruses by isolation or by PCR, that are older than 30 years are rare and restricted (Davis et al., 2000 ; Seeff et al., 2000 ), so it is difficult to obtain any direct evidence for the type of viruses that may have existed in the past. In contrast to the rich fossil record of animals and plants, and more recently, the ability to sequence DNA recovered from some archaeological and fossil remains, almost all viruses are essentially invisible in the geological and indeed the historical record.

Probably the earliest solid archaeological evidence for virus infection is contained in skeletons from the Neolithic and Bronze age periods that are deformed similarly to those with poliomyelitis from the present day (Wells, 1964 ), although it remains to be formally demonstrated that the virus we now refer to as poliovirus was the cause. As discussed later in this Lecture, our current uncertainty about the lifespan of recognizable virus species means that we cannot at this stage discount the possibility that other, now vanished, viruses may have caused poliomyelitis then, only to be subsequently replaced. The written record from the ancient world is similarly difficult to interpret. There are few virus infections that cause disease unambiguously different from other causes. Probably the best example of a virus whose existence in ancient civilizations can be confidently determined is smallpox, which seems to have afflicted civilization throughout the period for which records exist. Smallpox is believed by many to have originated after the development of farming in the Middle East around 10000 BC. Smallpox appears to have spread widely over the following centuries. For example, it has been traditionally thought that skin lesions of smallpox are preserved in mummified remains from the 18th and 20th Egyptian Dynasties (1570–1085 BC), including Ramses V who died in 1157 BC, although this has been disputed more recently (Hopkins, 1980 ). Written descriptions of smallpox survive in Ancient Greek writings, such as those of Thucydides, who describes a particularly severe epidemic in Athens in 430 BC. Overall, however, this situation is unusual; among the hundreds of viruses known to currently infect humans, the historical, archaeological and palaeontological record is blank for all but a few.

Reconstruction of virus histories by analysis of their current distributions and genetic relatedness is also fraught with problems. Virus histories are inextricably linked to that of their hosts, which itself may be uncertain in detail or impossible to meaningfully reconstruct for viruses that can frequently cross species barriers. Viruses recombine with one another, with other viruses and with the genomes of the cells they infect and they may undergo major genome rearrangements and changes in replication strategy.

Rates of sequence change in viruses, particularly those with RNA genomes, are invariably much greater than those of their hosts, and this presents a number of problems in evolutionary reconstruction. To persist within an infected individual or a host population, most RNA viruses must replicate continuously. Each infection cycle in a cell typically takes 1–3 days, over which period several copyings of the virus genome occur. For a virus such as HCV, there may therefore be 100000 genome replications over a period corresponding to a human generation; over this time, only 15–20 cellular divisions separate a human egg from its gametes. Compounding this difference in replication frequency, RNA viruses generally encode their own nucleic acid replicating enzymes, which typically lack proof-reading activity and therefore produce far greater numbers of mutations per replication cycle than those of their hosts. Combined, these two factors likely underlie a rate of sequence change more than a million times greater than that of other organisms. Even over relatively short evolutionary periods, this rapid rate of virus sequence change potentially obliterates any trace of genetic relatedness between descendants.

Viruses may be subjected to intense selection pressures to evade the host’s immune response and antiviral treatment, and to adapt to new hosts on crossing species barriers. Rapid, adaptive changes are favoured in viruses by the frequently large population sizes in an infected organism and the high mutant frequency generated during replication. For example, HIV-1-infected individuals have been estimated to harbour 10⁷–10⁸ infected cells (Haase et al., 1996). As each genome replication generates 0·5–5 nucleotide changes (Mansky & Temin, 1995), the population in principle contains every possible substitution and combination of paired substitutions. It is therefore not surprising that resistance to antivirals, such as zidovudine, that depends on single or double amino acid changes in the pol gene, appears so quickly in treated individuals. The mutants were already present and have the potential to replace the original population over a period of weeks under this new selection pressure.
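The statement that such a population in principle contains every single and paired substitution can be explored with some rough arithmetic. The sketch below counts the distinct single and paired nucleotide substitutions available to a genome and sets them beside the quoted range of infected cells. The genome length of 10000 nucleotides is an assumed round figure that is not taken from the text, and the function names are purely illustrative.

```python
from math import comb


def single_substitutions(genome_length: int) -> int:
    """Number of distinct single-site variants: each site can change to 3 other bases."""
    return 3 * genome_length


def paired_substitutions(genome_length: int) -> int:
    """Number of distinct variants with changes at exactly two different sites."""
    return comb(genome_length, 2) * 3 * 3


GENOME_LENGTH = 10_000            # ASSUMPTION: round figure, not from the text
INFECTED_CELLS = (10**7, 10**8)   # range quoted from Haase et al. (1996)

singles = single_substitutions(GENOME_LENGTH)
pairs = paired_substitutions(GENOME_LENGTH)

print(f"Distinct single substitutions: {singles:.1e}")
print(f"Distinct paired substitutions: {pairs:.1e}")
print(f"Infected cells (quoted range): {INFECTED_CELLS[0]:.0e} to {INFECTED_CELLS[1]:.0e}")
```

Under these assumptions the single substitutions number in the tens of thousands, far fewer than the infected cells, while the paired combinations number in the hundreds of millions, which is comparable in order of magnitude to the upper estimate of infected cells.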

Finally, as will be presented in this Lecture, it appears that viruses just do not evolve in a manner comparable to that of animals and plants. Constraints on the evolution of certain viruses, imposed perhaps by possessing single-stranded genomes, extensively overlapping reading frames or regulatory elements embedded in coding sequences, may lead to constraints on sequence change considerably different from those operating on animal and plant genomes. As a result, many of the models and assumptions that underlie phylogenetic reconstruction of animal and plant evolution may arguably not apply to at least some viruses. One potential casualty is the ‘molecular clock’ (Kumar & Hedges, 1998), which is based on the repeatable observation that the degree of both nucleotide and encoded amino acid sequence divergence (calculated using relatively simple corrections for multiple substitutions) between homologous genes in different species remains proportional to their time of divergence. For example, sequence change in the α-chain of haemoglobin is relatively constant over extremely long periods of vertebrate evolution, and between sequences that have become extremely divergent (Fig. 3). Over sequences differing from each other by up to 40%, the rate of sequence change (0·5–1·7×10⁻⁹ per site per year) was little different from that observed over the period of ape speciation (0·8–1·5×10⁻⁹ per site per year). The implication is that whatever functional constraints there may be on the encoded protein (in this case, in enzymatic activity), these are insignificant compared to the flexibility with which amino acid substitutions can be introduced into the protein sequence, without evident change in fitness of the organism.
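The molecular clock calculation described here can be sketched in a few lines. The example below applies the Jukes–Cantor correction for multiple substitutions to an observed proportion of differing nucleotide sites, then converts the corrected distance into a rate using a known divergence time. The factor of two (both lineages accumulating change since their common ancestor) is a common convention assumed here rather than something stated in the text, and the input values are hypothetical placeholders, not data from the Lecture.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Correct an observed proportion of differing nucleotide sites for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def clock_rate(distance: float, divergence_time_years: float) -> float:
    """Substitutions per site per year along one lineage.

    ASSUMPTION: the distance is shared equally between the two lineages,
    hence the division by 2 * time.
    """
    return distance / (2.0 * divergence_time_years)


# Hypothetical example: two species differing at 30% of sites,
# with a palaeontological divergence date of 100 million years.
observed_p = 0.30
time_years = 100e6

d = jukes_cantor_distance(observed_p)
rate = clock_rate(d, time_years)
print(f"Corrected distance: {d:.3f} substitutions per site")
print(f"Rate: {rate:.2e} substitutions per site per year")
```

If the clock holds, repeating the calculation for species pairs with very different divergence times should return similar rates; markedly different rates for recent and ancient comparisons are the signature of a violated clock.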



Fig. 3. Relationship between rates of amino acid sequence change and sequence divergence between α-globin genes of mammals, placentals and birds (plotted using the ‘♦’ symbol). Rates were calculated by combining the degree of sequence divergence with times of speciation known from palaeontological records (Kumar & Hedges, 1998). Over a wide range of divergence times, rates of sequence change were remarkably constant (range 0·5–1·7×10⁻⁹ nucleotide substitutions per site per year). By contrast, rates of sequence change of HGV/GBV-C (•) vary considerably for different degrees of divergence: data points from left to right represent the following divergence times: 8·4 years, time-course in HGV/GBV-C-infected individual (Nakao et al., 1997); 100000 years, divergence of modern humans; 1·6 Myr, divergence of troglodytes and verus subspecies of chimpanzees (Pan troglodytes; Morin et al., 1994); 7 Myr, divergence of humans and chimpanzees (Jones et al., 1992; Morin et al., 1994); 35 Myr, divergence of Old (human and chimpanzees) and New World primates (Saguinus mystax and S. labiatus, Aotus trivirgatus; Jones et al., 1992). Sequences compared were from the NS5 region of the genome [amino acid positions 2498–2561 in sequence PNF2161 (U44402)], with divergences and rates based on Jukes–Cantor (J-C) distances.

 
The importance of the molecular clock in evolutionary reconstruction lies in its ability to predict from the rate of sequence change of a gene (even over a short observation period), the time of divergence of other, more distantly related species. Given the lack of other evidence to construct virus histories, the molecular clock has been enthusiastically adopted as the method to calculate times of divergence of genetic variants within contemporary virus populations (Suzuki & Gojobori, 1997 ; Zanotto et al., 1996 ; Bollyky & Holmes, 1999 ; Zhang et al., 1999 ; Korber et al., 2000 ). However, as we shall see in the next section, there is now evidence that the evolution of some viruses violates the behaviour predicted by the molecular clock, through constraints over and above the coding potential of a gene, and/or what appear to be greater restrictions on the amino acid sequence itself. This appears to have led in some cases to extreme conservatism of virus sequences, defying attempts to reconstruct their origins based on sequence comparisons alone. Viruses, for all their mutability and extreme population dynamics may be far more conservative, and older, than has so far been recognized.

In the following section, I will describe the genetic variability and virus–host relationships shown by hepatitis G virus (HGV) or GB virus-C (GBV-C) in their primate hosts, as an example of an extremely successful, co-adapted virus whose evolution appears to be constrained in ways untypical of that observed in their hosts. How broadly this apparent ‘slow-down’ in sequence change applies to other viruses and how this influences our perception of their evolutionary histories will be the topic of the remainder of this Lecture.

Co-evolution of HGV/GBV-C in primates

The name HGV/GBV-C remains as an ugly acronym for the virus independently but simultaneously discovered in 1995 (Linnen et al., 1996; Leary et al., 1996). The description of the virus as hepatitis G virus is doubly unfortunate as there is no evidence that it causes hepatitis in its natural host (humans) either during initial infection or after long-term carriage. HGV/GBV-C is distantly related to HCV and other flaviviruses in the Hepacivirus genus (Fig. 1C), although sequence similarities are generally limited to specific enzymatic motifs in the NS3 and NS5B genes. There is little or no similarity in the number and arrangement of genes encoding the structural capsid and envelope proteins. Strange, and still unexplained, is the lack of any obvious homologue of the nucleocapsid gene (Fig. 1); the first protein translated has the characteristics of an envelope glycoprotein, possibly homologous to the E1 protein of HCV.

HGV/GBV-C infection is found widely in human populations, with frequencies of active or past infection ranging from 5 to 15%. This distribution extends even to highly isolated populations, such as indigenous tribespeople in Papua New Guinea, sub-Saharan Africa and Central/South America (Smith et al., 2000; Tanaka et al., 1998a, b; Mison et al., 2000). Infection is frequently persistent and associated with high levels of circulating viraemia, although no evidence links HGV/GBV-C to any identifiable hepatic or non-hepatic disease. Indeed, exactly what cells are infected with HGV/GBV-C still remains unclear, although the suspicion must fall on cells of the haemopoietic or lymphoid lineage such as CD4 lymphocytes, recently shown to be susceptible to infection in vitro (Xiang et al., 2000).

Variants of HGV/GBV-C show quite limited sequence variability, with nucleotide sequences differing from each other by a maximum of 13%. HGV/GBV-C has been tentatively classified into four or five genotypes based on these sequence relationships (Smith et al., 2000 ; Mison et al., 2000 ; Sathar et al., 1999 ; Muerhoff et al., 1996 ; Mukaide et al., 1997 ), although the variants lack the clear phylogenetic groupings that underpin the genotype classification of other viruses such as HCV (Fig. 2). There is no great sequence variability of the genes encoding putative envelope glycoproteins (unlike other viruses such as HCV and HIV-1 where such variability has been linked to persistence); indeed there is extreme conservation of the encoded amino acid sequence throughout the genome. Expressed numerically, sequence divergence at non-synonymous sites (dN; i.e. sites where nucleotide substitutions alter the encoded amino acid) is at least 50 times less than the variability found at synonymous (i.e. silent) sites (dS) (Simmonds & Smith, 1999b ). Most coding sequences show biases towards synonymous variability (i.e. show a ratio of dN/dS significantly less than 1), but there are few known coding sequences with ratios approaching 0·02 (or less) as are found in HGV/GBV-C. What the constraints are on sequence change in the coding region of HGV/GBV-C remains quite mysterious, and is one of many aspects of the unusual sequence variability of the virus.

The broad distribution in human populations, and its apparent non-pathogenicity, is consistent with the long-term presence and close host association of HGV/GBV-C with humans. Further evidence for this hypothesis is provided by the geographical distribution of HGV/GBV-C genotypes amongst indigenous populations in different parts of the world (Fig. 4B). In all cases, these are congruent with the distributions expected if HGV/GBV-C was already present in modern human populations as they migrated out of Africa 100000–150000 years ago (Tucker et al., 1999 ; Konomi et al., 1999 ; Liu et al., 2000 ; Tanaka et al., 1998a , b ; Mison et al., 2000 ; Katayama et al., 1997 ; Gonzalez-Perez et al., 1997 ). For example, sequences from the populations in the Far-East are almost invariably genotype 3, and this genotype is otherwise only found in native inhabitants of North and South America (Konomi et al., 1999 ; Tanaka et al., 1998b ; Gonzalez-Perez et al., 1997 ). In contrast, Caucasian and other populations from India westwards including Northern Africa are infected with genotype 2. Genotype 1 is confined to sub-Saharan Africa, and shows the greatest overall sequence diversity (Liu et al., 2000 ; Smith et al., 1997a ; Muerhoff et al., 1997 ); particularly divergent variants have been recovered from Pygmy and other African populations (Sathar et al., 1999 ; Tanaka et al., 1998a ).



Fig. 4. (A) Phylogenetic analysis of complete genomic sequences of HGV/GBV-C (Smith et al., 2000 ), and their provisional classification into five genotypes. In marked contrast to HCV (Fig. 2), note the shallowness of the branching between sequence clusters representing the HGV/GBV-C genotypes. The unrooted tree was generated using the programs DNADIST and NEIGHBOR in the PHYLIP package (Felsenstein, 1993 ). (B) Approximate geographical distribution of HGV/GBV-C genotypes in indigenous populations.

 
The long association of HGV/GBV-C in humans is possibly also mirrored in other primates (Charrel et al., 1999 ; Adams et al., 1998 ; Leary et al., 1997 ; Bukh & Apgar, 1997 ; Birkenmeyer et al., 1998 ) (Fig. 5). HGV/GBV-C variants more divergent than those found in humans have been found in wild-caught chimpanzees from Central and West Africa (Birkenmeyer et al., 1998 ; Adams et al., 1998 ). Furthermore, distinct variants of HGV/GBV-CCPZ were recovered from the two different subspecies of chimpanzees, troglodytes and verus. These showed greater divergence from each other than found between human genotypes, an observation consistent with the likely much greater population diversity of surviving chimpanzee populations (Morin et al., 1994 ). Finally, even more divergent homologues of HGV/GBV-C, described as GBV-A, have been recovered from several species of New World primates (Bukh & Apgar, 1997 ; Leary et al., 1997 ). Again mirroring host relationships, genetic variants of GBV-A differing from each other by around 25% are closely associated with different New World primate species (Bukh & Apgar, 1997 ; Erker et al., 1998 ; Leary et al., 1997 ).



Fig. 5. Phylogenetic analysis of sequences from the NS5 region of HGV/GBV-C, HGV/GBV-CCPZ and GBV-A sequences recovered from different New World primate species (Adams et al., 1998). The branching order (but not scale) of GB virus sequences from different primates is congruent with the genetic relatedness of their host species. Note the greater sequence divergence of HGV/GBV-CCPZ recovered from troglodytes and verus subspecies of chimpanzees than the sequence diversity found between human HGV/GBV-C genotypes. Numbers on branches indicate the number of bootstrap re-samplings, out of 100, supporting the observed phylogeny, restricted to values of 75% or greater; p distances are indicated on the scale bar.

 
The molecular clock in HGV/GBV-C evolution

These repeated examples of phylogenetic congruency between GB viruses and their primate host species are clearly consistent with the hypothesis of virus and host co-evolution. If this principle is accepted, however, then the concept of the molecular clock will have to be abandoned as a principle underlying the evolution of HGV/GBV-C. Indeed, the concept of an RNA virus evolving over the time-scales involved in primate evolution sits very oddly with the current paradigm of RNA viruses, in which properties for extremely rapid sequence change, ability to adapt and ephemeral nature have been emphasized (Holland et al., 1982).

Evidence that HGV/GBV-C does not change at a constant rate is provided by the discrepancy between the rate of sequence change over short observation intervals, and that implied by the divergence shown between variants infecting human populations and different primate species. Sequence comparison of HGV/GBV-C in samples collected 8 years apart from an individual acutely infected with HGV/GBV-C indicated a rate of 3·9×10⁻⁴ nucleotide substitutions per site per year over the whole genome (Nakao et al., 1997) (Fig. 3). While this is comparable to that of other RNA viruses [e.g. 4×10⁻⁴ for HCV in NS5 (Smith et al., 1997b); 1·4×10⁻⁴ for HIV-1 (Zhu et al., 1998)], and caused no great surprise amongst virologists when it was first published, it is quite incompatible with the lack of sequence divergence found between different human populations, if the observed genotype differences originated through migration more than 100000 years ago (Fig. 3). The rate of sequence change over the interval in which Old and New World primate species evolved is even more discrepant from this short-term rate (approximately 10000-fold lower).

If one wished to defend the molecular clock, and extrapolate the rate of sequence change measured over 8 years to longer intervals, then the current divergence found in human HGV/GBV-C would have originated from a common ancestor as recently as 300 years ago, while those infecting chimpanzees and New World primates would have diverged from human genotypes 600–1000 years ago. The occurrence of evolutionarily related viruses in chimpanzees in the wild in Africa and in marmosets and other primates in South America requires a transmission chain of infection operative over the last millennium in which humans would have been the intermediates. Apart from its evident absurdity, this hypothesis ignores the species specificity of the GB viruses; GB viruses obtained from New World primates are non-infectious in chimpanzees, nor can human HGV/GBV-C infect New World primates (J. Bukh and others, personal communication). A transmission chain linking these divergent primate species in recent evolutionary history is therefore unlikely indeed.

Constraints of virus sequence change
A more radical explanation of the data is that there are major differences between GB viruses and higher organisms in the constraints operating on sequence change. Constraints unanticipated by conventional methods of sequence comparison would lead to underestimation of the frequency of multiple substitutions when more divergent sequences are compared. This would reproduce the apparent ‘slowing’ of sequence change inferred from the sequence relationships between HGV/GBV-C genotypes, and between GB viruses infecting different primates.

It is well established that different constraints on sequence change operate at synonymous and non-synonymous sites in coding sequences of almost all gene sequences. Indeed, there are several methods to compare sequences at synonymous and non-synonymous sites separately to allow for independent correction for multiple substitutions (Nei & Gojobori, 1986 ; Li et al., 1985 ). Similarly, transitions occur more frequently than transversions, and again can be corrected for separately. Many complex methods have been developed to calculate evolutionary distances that allow for differences in rate of different types of sequence change at different sites, and have been successfully applied to evolutionary reconstructions of mammals and other eukaryotes. However, such methods fail dismally to reconcile the short-term rate of sequence change in HGV/GBV-C with rates implied from human population and primate species distributions (Simmonds & Smith, 1999a ).

To investigate the existence of more esoteric constraints on sequence change in HGV/GBV-C, we compared the distribution of sequence variability in its coding sequences with those of mammalian and other virus coding sequences (Simmonds & Smith, 1999b). Quite apart from the extreme conservation of the encoded amino acid sequence of the HGV/GBV-C polyprotein (remarked upon above), we obtained evidence that a large proportion of synonymous sites in the coding part of the HGV/GBV-C genome is also unexpectedly invariant (Fig. 6); comparison of complete coding sequences (8600 bases) of different genotypes of HGV/GBV-C showed an excess of invariant synonymous sites (at 23% of all codons) compared with the frequency expected by chance (10%). This carries the necessary implication that there are fitness constraints on sequence change of HGV/GBV-C over and above the coding function of the genome. As described in detail by Simmonds & Smith (1999b) and in the paper by Cuceanu et al. (2001) that follows, it now appears likely that the RNA genome of HGV/GBV-C forms a complex and extensive secondary structure through internal base pairing. The high free energy on folding, the existence of multiple covariant sites and the conservation of specific stem–loops between quite divergent GB virus sequences (such as HGV/GBV-C, HGV/GBV-CCPZ and GBV-A) all point to (an) evolutionarily conserved function(s) for the predicted secondary structure. What this actually is, and the extent to which it may be found in other viruses with single-stranded genomes, remains to be determined.



Fig. 6. Frequency histogram of variability at synonymous sites in (A) 17 HGV/GBV-C sequences of genotypes 1, 2 and 3; (B) expected distribution of synonymous variability arising by chance (Simmonds & Smith, 1999b). HGV/GBV-C sequences show a greater frequency of invariant codons (23%) compared with control sequences (10%), with evidence for a second distribution of variability values (mean 0·75) greater than that of controls (0·5).

 
Nonetheless, the restrictions imposed by secondary structure provide an important clue towards understanding how the co-evolution hypothesis for HGV/GBV-C can be reconciled with the observation of its rapid rate of sequence change over short observation periods. For example, there may be a class of synonymous sites which are in non-base-paired parts of the genome, and where sequence change may be relatively unconstrained. These substitutions may therefore be fixed at the frequency predicted from the measured short-term rate of sequence change (Nakao et al., 1997). A different class of nucleotide sites which participate in internal base-pairing could be under greater constraint if the resulting secondary structure influenced the fitness of the virus. Substitutions may therefore only occur if simultaneous compensatory changes occur to retain base-pairing in the stem–loop. Indeed, in our analysis of HGV/GBV-C sequences, covariant sites in predicted stem–loops were found throughout the coding part of the genome (Simmonds & Smith, 1999; Cuceanu et al., 2001) and rivalled the 5'UTRs of HCV and HGV/GBV-C in frequency and complexity. As the third base positions (normally synonymous) are usually opposite each other in the predicted stem–loops of HGV/GBV-C, the frequency at which covariant substitutions may occur is the substitution frequency squared (i.e. approximately 10⁻⁷–10⁻⁸ substitutions per site per year). This is indeed quite similar to the rate at which sequence divergence accumulates over the longer periods of human dispersal and primate speciation (Fig. 3).
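The squared-rate argument can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the two compensatory substitutions arise independently, each at the short-term rate reported by Nakao et al. (1997), and the function name `covariant_rate` is hypothetical.

```python
# Short-term rate for HGV/GBV-C, substitutions per site per year (Nakao et al., 1997).
SHORT_TERM_RATE = 3.9e-4


def covariant_rate(rate: float) -> float:
    """Rate of a paired change, assuming both substitutions must arise
    together and independently (an assumption made for this sketch)."""
    return rate * rate


paired = covariant_rate(SHORT_TERM_RATE)
print(f"paired-site rate: {paired:.2e} per site per year")
```

Under these assumptions the paired-site rate comes out at roughly 1·5×10⁻⁷, within the range quoted in the text.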

It seems therefore as if after the rapid accumulation of substitutions at unpaired sites, further diversification can only occur at the much slower rate required by paired changes that retain secondary structure. Indeed, the extreme conservation of the amino acid sequence may reflect the even greater difficulty of simultaneous sequence change at opposite, non-synonymous base-paired sites; both amino acid changes would have to be neutral or beneficial to HGV/GBV-C for the covariant change to be fixed in the virus population. The extreme dN/dS ratio mentioned above may therefore result more from the peculiar constraints imposed by the requirement to maintain RNA secondary structure, rather than unusual functional or structural conservatism of the encoded proteins.

A more exact prediction of the expected short- and long-term rates of sequence change of HGV/GBV-C requires a more complete mapping and functional investigation of its RNA secondary structure, so that the sites that are likely to be constrained and unconstrained can be identified. We also require further information on the distribution of HGV/GBV-C homologues in other primate species; currently, sequences have only been obtained from humans, chimpanzees and some New World primate species. A more conclusive demonstration of congruency would be obtained if sequences from other Old World primates could be obtained, as GB viruses recovered from them should occupy a phylogenetic position intermediate between the human/chimpanzee common ancestor and the branch leading to the New World primate species.

Further analysis of free energy on folding, frequencies of covariant sites and other manifestations of internal base-pairing should also be done for other viruses with single-stranded genomes, to investigate whether the constraints imposed by secondary structure formation represent a more general principle of virus evolution.

Evolution and origins of hepatitis B virus (HBV)

HBV is another virus where it is difficult to reconcile its likely rapid rate of sequence change with its distribution and close host associations with humans and other primates. Although many of the issues to be discussed remain unresolved, and highly controversial, it seems to provide another example of virus evolution operating in a different way from that assumed for higher organisms.

HBV chronically infects approximately 5% of the human population. The toll of approximately 1 million deaths from chronic liver disease and hepatocellular carcinoma per year (Thomas & Jacyna, 1993 ) demonstrates the scale of the global health problem it poses. HBV is transmitted by sexual contact and by parenteral exposure, although it is thought that mother-to-child perinatal transmission and the establishment of a life-long highly infectious carrier state are responsible for the observed high rates of endemicity in high prevalence regions such as South and East Asia, sub-Saharan Africa and amongst indigenous peoples in Central and South America.

HBV is classified in the Hepadnaviridae, and contains a partly double-stranded DNA genome of approximately 3200 bases. HBV replicates via an RNA intermediate anti-genome sequence, encoding a potentially error-prone polymerase enzyme with both reverse transcriptase and DNA polymerase activities. An unusual feature of the HBV genome is the presence of multiple overlapping reading frames for the genes encoding the core, polymerase and surface antigen proteins; 67% of the genome is multiply coding, and therefore lacks what would be conventionally regarded as synonymous and non-synonymous sites. It is additionally probable that regions of the genome that are non-coding may be involved in a variety of secondary structure interactions necessary for circularization and transcription.

HBV infecting humans in different geographical regions are currently classified into six genotypes differing from each other by nucleotide sequence distances of approximately 10–13% (Fig. 7). Genotypes A and D have global distributions, genotypes B and C are found predominantly in East and South East Asia, genotype E is predominant in West Africa, and the most divergent genotype F is found exclusively amongst indigenous peoples in Central and South America (Arauz-Ruiz et al., 1997 ; Norder et al., 1994 ).



Fig. 7. Phylogenetic analysis of HBsAg gene sequences of representative sequences of human genotypes A–F and HBV sequences recovered from primates (chimpanzees, gibbons, orang-utans, woolly monkey). Note the intermixing and approximately equal sequence divergence between human genotypes A–E with sequences recovered from different primate species. Bootstrap values of >70% are indicated on the branches.

 
The rate of sequence change of HBV is uncertain. HBV contains a polymerase enzyme without proof-reading activity, and error frequencies on RNA or DNA copying are likely to be of the order measured for the related retroviruses, and for other RNA viruses. Measurement of the rate of sequence change of HBV is complicated by the existence of overlapping reading frames and the lack of synonymous sites in most of the coding sequence. Compounding this difficulty is the evidence that many amino acid changes, particularly in the pre-core region, have a positive selective value, and may occur as an immune evasion strategy. In a recent study (Hannoun et al., 2000), individuals with hepatitis who had cleared HBeAg (interpreted as evidence of a vigorous immune response to HBV) showed a mean 12-fold greater nucleotide substitution rate than individuals who were apparently immunotolerized (mild hepatitis, HBeAg positive). Taking the latter group as representing the evolution of HBV in the absence of immune pressure, HBV shows a substitution rate of 2·1×10⁻⁵ (range 0–13×10⁻⁵) substitutions per site per year over a mean observation period of 22 years (range 20–35 years). As it is generally HBeAg-positive carriers who transmit HBV infection between human generations, this rate is likely to be the most appropriate for extrapolating substitution rates over longer periods. On this basis, the human genotypes of HBV would have originated from a common ancestor approximately 2300–3100 years ago. How this predicted time of divergence fits with the various theories for the origin of HBV is explored in the remainder of this section.
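The extrapolation from a short-term rate to a time of common ancestry can be sketched as follows. This is an illustration rather than the calculation actually used: it assumes a strict molecular clock, no correction for multiple substitutions, and that a pairwise distance accumulates along two lineages (hence the factor of 2). The function name is hypothetical; the 10–13% inter-genotype distances and the rate of 2·1×10⁻⁵ are taken from the text.

```python
def divergence_time_years(pairwise_distance: float, rate: float) -> float:
    """Time since the common ancestor under a strict clock.

    Both lineages accumulate change, so distance = 2 * rate * time.
    The factor of 2 is an assumption of this sketch.
    """
    return pairwise_distance / (2.0 * rate)


HBV_RATE = 2.1e-5  # substitutions per site per year (Hannoun et al., 2000)

for distance in (0.10, 0.13):  # 10-13% divergence between human genotypes
    years = divergence_time_years(distance, HBV_RATE)
    print(f"{distance:.0%} divergence -> about {years:.0f} years")
```

With these inputs the sketch gives roughly 2400 and 3100 years, close to the 2300–3100 year range quoted above.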

A bold account of the geographical distribution of HBV genotypes proposed that HBV originated from the Americas, and spread into the Old World over the last 400 years after contact with Europeans during colonization (Bollyky et al., 1997). The diversity of HBV sequences of 11–13% between HBV genotypes implies a substitution rate of around 1·5×10⁻⁴ per site per year, which is more rapid than the rate observed in HBeAg-positive carriers (2·1×10⁻⁵ substitutions per site per year; see above). However, the main problem for this hypothesis is the observation of the widespread distribution of HBV in Old World primate species. At the time this hypothesis was proposed, evidence for infection of other primates, such as chimpanzees and gibbons, by HBV was controversial, and could be dismissed as the result of accidental transmission from humans to captive animals (Lanford et al., 1998; Norder et al., 1996; Vaudin et al., 1988; Zuckerman et al., 1978). Since then, it has become more firmly established that primates are infected with HBV in the wild. In a remarkable example of multiple publication, three groups recently published evidence for the existence of a shared genotype of HBV infecting West African chimpanzees (Fig. 7) (Hu et al., 2000; Takahashi et al., 2000; MacDonald et al., 2000). This chimpanzee-specific variant of HBV showed approximately 11% divergence from the human genotypes A–E. Similar findings were reported from gibbons and orang-utans in South East Asia, both of which harbour species-specific genotypes of HBV equidistant from each other, and from chimpanzee and human genotypes (Grethe et al., 2000; Warren et al., 1999). There are a few exceptions to these species associations; one of the sequences in the chimpanzee clade originated from a gorilla (AJ131567); a chimpanzee sequence (HBV131567) groups with gibbons; another chimpanzee sequence (AB032431) groups in human genotype E. Finally, sequence AF213008 from a gibbon groups separately from all other sequences. While it is tempting to explain away these exceptions as either laboratory error or contamination, or inadvertent transmission of HBV between species in captivity, more data are clearly needed to substantiate the claimed species/genotype associations.
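The rate implied by the New World origin hypothesis can be recovered by running the clock calculation in reverse. The sketch below is illustrative and rests on assumptions not stated in the source: a strict clock, divergence shared equally between two lineages, and 12% taken as the midpoint of the 11–13% range. The function name `implied_rate` is hypothetical.

```python
def implied_rate(pairwise_distance: float, years_since_split: float) -> float:
    """Substitution rate needed to reach a given pairwise distance,
    assuming distance = 2 * rate * time (an assumption of this sketch)."""
    return pairwise_distance / (2.0 * years_since_split)


MEASURED_RATE = 2.1e-5  # HBeAg-positive carriers (Hannoun et al., 2000)

rate = implied_rate(0.12, 400)  # midpoint of 11-13% over 400 years
fold = rate / MEASURED_RATE

print(f"implied rate: {rate:.1e} per site per year")
print(f"about {fold:.0f}-fold faster than the measured rate")
```

On these assumptions the hypothesis needs a rate of about 1·5×10⁻⁴, some sevenfold faster than that measured in HBeAg-positive carriers.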

Since the evidence for primate infections in the wild became widely known, a number of alternative, often futile and invariably highly speculative attempts have been made to reconcile these observations into a coherent account of HBV evolution. The main difficulty arises from the observation of equivalent sequence relationships between human genotypes A–E and G to each other and to the primate-species-associated genotypes of HBV. It is also difficult to rationally fit into any scheme the outlier human HBV genotype F and the even more divergent HBV variant obtained from a captive woolly monkey, a New World primate (Lanford et al., 1998 ).

Amongst the main competing theories, the hypothesis for a New World origin for HBV (Bollyky et al., 1997 ), discussed above, appears incompatible with the now firmly established observations for the widespread distribution of HBV in a wide range of Old World primate species in the wild. The proposal that contact with humans established HBV infection in several primate species in the wild over the last 300 years is epidemiologically highly improbable.

An alternative theory proposes that, as with HGV/GBV-C (see above), HBV co-evolved with anatomically modern humans as they migrated from Africa approximately 100000 years ago (Norder et al., 1994; Magnius & Norder, 1995). This would imply a sustained rate of sequence change of HBV over the past 100000 years of approximately 5×10⁻⁷ nucleotide changes per site per year, remarkably similar to that hypothesized for HGV/GBV-C (Fig. 3). However, unlike HGV/GBV-C, the phylogeny of HBV genotypes in no way corresponds to genetic relationships between human (or primate) population groups. For example, the presence of genotype F in Native American populations is inconsistent with the presence of genotypes B and C in Mongoloid North East Asians, who are genetically their nearest relatives. Secondly, there is little or no genetic evidence that South American and Polynesian populations are significantly intermixed, and there is therefore no explanation for the presence of genotype F in Pacific Island populations. Indeed, there is little relationship between HBV genotype distributions and any of the other human population groups (South East Asians, Caucasians and African populations). As indicated above, the other incongruence is the existence of species-specific genotypes of HBV in chimpanzees, gibbons and orang-utans, and the way in which they are intermixed with human genotypes. This is quite unexpected, as the primate viruses should be much more divergent from human variants and from each other given the much longer period of co-speciation of primate species (10–15 million years).

A third, highly speculative hypothesis for HBV origins, which we recently discussed (MacDonald et al., 2000), proposes that variants found in chimpanzees, gibbons, orang-utans and in the New World primate woolly monkey species co-speciated over 10–35 million years. In this case, the outlying position of the woolly monkey HBV sequence and the equal divergence of HBV variants from Old World primate species are (approximately) consistent with host phylogeny and fossil-based estimates for their relative times of divergence. If co-speciation occurred, then the long-term rate of sequence change of HBV would range from 3×10⁻⁹ to 5×10⁻⁹ nucleotide changes per site per year, lower than recorded for any other virus.

Interestingly, a range of much more genetically divergent hepadnaviruses infects rodents in North and South America, such as the woodchuck (Marmota monax), ground squirrel (Spermophilus beecheyi) and arctic ground squirrel (S. parryii). These viruses may be a manifestation of an equivalent process of co-evolution over even longer periods. Remarkably, the genetic distance between primate and rodent HBV variants after their divergence approximately 110 million years ago indicates a rate of sequence change of 6·8×10⁻⁹ changes per site per year, bizarrely similar to the rate operating over primate co-evolution.
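To see how far apart the competing scenarios lie, the rates quoted in this section can be set against the rate measured in HBeAg-positive carriers. The sketch below only tabulates figures already given in the text; the labels and the helper `fold_slower` are hypothetical conveniences.

```python
MEASURED_RATE = 2.1e-5  # HBeAg-positive carriers (Hannoun et al., 2000)

# Rates (substitutions per site per year) implied by each scenario in the text.
hypothesised_rates = {
    "New World origin (400 years)": 1.5e-4,
    "human migration (100000 years)": 5e-7,
    "primate co-speciation, upper bound": 5e-9,
    "primate co-speciation, lower bound": 3e-9,
    "primate/rodent divergence (110 million years)": 6.8e-9,
}


def fold_slower(rate: float, reference: float = MEASURED_RATE) -> float:
    """How many times slower a rate is than the reference.
    Values below 1 mean the rate is faster than the reference."""
    return reference / rate


for label, rate in hypothesised_rates.items():
    print(f"{label}: {fold_slower(rate):.3g}-fold slower than measured")
```

Only the New World scenario requires a rate faster than the measured one; the migration and co-speciation scenarios require rates tens to thousands of times slower.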

If HBV co-evolved in primates then the existence of numerous equally distinct human genotypes of HBV would require a different explanation to fit in with this theory. It is possible that human HBV infection arose many times through contact with different primates infected with species-specific genotypes (equivalent to those found in chimpanzees, gibbons and orang-utans). In some ways, this scenario corresponds to that believed to underlie the origins of HIV infection in humans. Infection with HIV-1 is likely to have originated through at least two separate cross-species transmissions from chimpanzees (Gao et al., 1999 ), while human infection with HIV-2 in West Africa arose independently several times through contact with sooty mangabeys (Feng et al., 1992 ). A primate origin for human HBV infection is indeed supported by the observation that the areas of high HBV prevalence in humans are those in which contact and cross-species transmission from primates are most likely (South America, sub-Saharan Africa and South East Asia). Indeed, certain HBV genotypes are specific to these three areas (F, E and B/C respectively). As another parallel with HIV-1, the mixture of HBV genotypes found outside these areas, such as in Europe and North America, may result from much more recent epidemic spread, in newly exposed susceptible groups such as IDUs and male homosexuals. The problem with the theory for a primate origin is that, to date, no HBV genotypes are shared between primates and humans, apart from the finding of a genotype E variant in a chimpanzee (Takahashi et al., 2000 ); if HBV genotypes A–F originated in primates, then the actual species involved in transmission to humans remain unidentified.

At this stage, the problems associated with each of the three hypotheses for the origin of HBV prevent any undisputed conclusions being drawn. However, the fact that such different hypotheses are being argued about highlights the current lack of understanding surrounding the origins of HBV. If the slow rates of sequence change, implied by both the human migration and the primate co-evolution hypotheses, can be verified, HBV would represent another example of a virus evolving in a markedly different way from higher organisms, in this case perhaps the result of the complex constraints imposed by the existence of overlapping reading frames (Mizokami et al., 1997).

Origins of HCV

The final part of this Lecture returns to the hypotheses surrounding the origins of HCV, and discusses the application of the principles advanced for the origins of HGV/GBV-C and HBV to the reconstruction of its evolutionary history. The discussion includes the evidence we have for the rate of sequence change of HCV, an analysis of the geographical distribution of HCV genotypes and its genetic diversity, and finally the evidence for the existence of primate homologues of HCV, or of primates as immediate sources of infection in humans.

We believe we have relatively accurate values for the rate of sequence change of HCV, and these provide the starting point for investigating the dynamics of at least its recent evolution. A fortuitous opportunity to establish the rate for HCV arose from the availability of samples from individuals infected with HCV after exposure to a homogeneous, common source outbreak in the 1970s. The culprit was a batch of anti-rhesus D immunoglobulin (anti-D) used in 1977 in Ireland, which contained a highly viraemic component plasma donation from a recently infected individual (Power et al., 1994, 1995). Sequence divergence in the NS5 and E1 regions between sequences recovered from the recipients 17–20 years later indicated rates of sequence change of 4·1×10⁻⁴ and 7·1×10⁻⁴ per site per year respectively (Smith et al., 1997b), with no evidence for variation in rate between individuals.

Assuming this rate of sequence change is maintained over longer periods, the diversity of variants within each of the genotypes associated with the risk groups for HCV infection was assessed; these included types 1a, 1b and 3a in Western countries. For type 1b, 40 NS5 sequences from epidemiologically unrelated individuals in Europe, USA, Asia and Japan showed a distribution of pairwise distances approximately four times greater than those between anti-D recipients, indicating a time of divergence approximately 60–70 years ago (Smith et al., 1997b ). The absence of any country or region-specific phylogenetic groupings further implies that the initial spread of type 1b occurred relatively rapidly, and that it became disseminated throughout many of the world’s populations over a short period. These results are consistent with the prediction of recent, epidemic spread of HCV derived from mid-depth analysis (Holmes et al., 1995a ).

The diversity of sequences amongst type 3a variants was more restricted than for type 1b, suggesting a more recent dissemination (40 years based upon distances in NS5). In contrast, types 2a, 2b and particularly 2c were more diverse, with a predicted time of origin of 90–150 years ago. The genetic evidence of relatively recent spread of genotypes such as 3a into IDUs is consistent with the epidemiological evidence for the widespread increase in needle-sharing drug abuse since the 1960s, while the greater diversity of types 1b and subtypes of type 2 implies earlier, different modes of transmission, consistent with a range of other risk factors identifiable in older HCV-infected individuals (Simmonds et al., 1996a ; Lau et al., 1996; Tisminetzky et al., 1994 ; Pawlotsky et al., 1995 ; Goeser et al., 1995 ; McOmish et al., 1993 ). Genotype 4a infection is found at high frequency in the Middle East, particularly in Egypt, where there is evidence for the inadvertent, large-scale spread of HCV infection by unsterilized needles used for bilharzia treatment in the 1950s and 1960s (Frank et al., 2000 ).

Origin of HCV genotypes
Although the above-cited and other ongoing molecular epidemiology studies of HCV sequence diversity appear successful in documenting the relatively recent spread of HCV, the problems and controversy associated with the analysis of HGV/GBV-C and HBV sequences indicate the need for caution with reconstruction of the earlier history of HCV. In particular, for all the reasons discussed above, it would be rash to assume that the rates of sequence change measured over 20 years can be extrapolated to calculate the time of origin of much more divergent HCV genotypes, in view of the problems that this approach has caused for other viruses. The problem with HCV is that, apart from studies of genotype distributions, there is little other information currently available that would help towards identifying the source of the current epidemic HCV infection in the West.

HCV genotype distributions in non-Western countries are poorly documented, particularly in sub-Saharan Africa. What information is available, however, indicates a quite different pattern of sequence variability of the virus. For example, in Western African countries, such as Gambia, Ghana, Burkina-Faso, Benin and Guinea (Ruggieri et al., 1996; Wansbrough-Jones et al., 1998; Mellor et al., 1995; Jeannel et al., 1998), small scale surveys have indicated a predominance of infection with genotype 2. In contrast to Western countries, these type 2 infections are characterized by considerable sequence diversity, with different individuals each being infected with different subtypes, each in turn distinct from the 2a, 2b and 2c subtypes found in the West. Similarly, genotype 4 infection in Central Africa (Congo, Gabon, Central African Republic) is also characterized by extreme subtype diversity (Xu et al., 1994; Menendez et al., 1999; Stuyver et al., 1993; Fretz et al., 1995; Bukh et al., 1993), quite different from the epidemic pattern of type 4a infection in Egypt and elsewhere in the Middle East (see above). When a large number of geographical surveys are combined, five different regions in Africa and South East Asia emerge as areas of great HCV subtype diversity (Fig. 8A).



Fig. 8. (A) Phylogenetic analysis of nucleotide sequences from part of the HCV NS5B region amplified from HCV-infected individuals, including those from sub-Saharan Africa and South East Asia. Note the much greater diversity and number of subtypes in genotypes 1, 2, 3, 4 and 6 compared with those found in Western countries (Fig. 2). (B) Approximate geographical distribution of regions where increased subtype diversity is found.

 
For another virus, HIV-1, the great sequence diversity in Central Africa provides genetic evidence for that area being the original source of the subsequent worldwide epidemic. By analogy, it could be imagined that the areas identified in Fig. 8(B) represent areas of origin for each of the genotypes that subsequently spread into Western countries. As with HIV-1, the sheer range of HCV sequences in sub-Saharan Africa and South East Asia argues for the long-term presence of HCV in human populations from these areas. Extending the analogy, the demographic and epidemiological factors that led to the global spread of HIV-1 in the last 30–40 years could be the same ones underlying the epidemic spread of HCV over the same period, although it is clear that spread of certain genotypes of HCV, such as types 1b and subtypes of genotype 2, occurred earlier. This hypothesis of HCV origins differs in being multifocal; Southern Asia appears to harbour the greatest diversity of genotypes 3 and 6, while types 1, 2, 4 and probably 5 would be of African origin. To substantiate this speculative hypothesis, we clearly need much more information on the epidemiology of HCV infection in these areas. In particular, it would be important to identify the transmission routes between individuals that are able to maintain HCV infection in a human population for the long periods implied by its genetic variability. There are few, if any, other examples of viruses infecting populations where the principal route of transmission is parenteral. Amongst possibilities for alternative routes, tribal scarification practices, sexual transmission between individuals with untreated sexually transmitted diseases, and mosquito or tick vectors possibly contribute to HCV transmission in these areas, but there is no evidence so far (McCarthy et al., 1994 ; Bellini et al., 1997 ; Silverman et al., 1996 ).

Intriguingly, the areas of greatest genetic diversity of HCV (Fig. 8B) are also those of greatest HBV prevalence. Perhaps if we understood more about the origins of HBV we might be able to explain this coincidental distribution of HCV diversity and the existence of shared factors that may maintain infections with both viruses in these communities. Pursuing the analogy with the origins of HIV-1 (and possibly HBV), there would also be a place for more intensive investigation of infection with HCV or related viruses in Old World primates. Indeed, a virus referred to as GBV-B, similar to HCV in genome organization and secondary structure of the 5'UTR, although highly divergent in sequence, was recently found in a captive tamarin, a New World primate (Simons et al., 1995). It may therefore show the same evolutionary relationship to HCV as GBV-A does with HGV/GBV-C. Old World primate species could conceivably carry a range of HCV-like viruses that recapitulate host inter-relationships in the same way as hypothesized for HGV/GBV-C, primate lentiviruses and perhaps HBV.

Concluding remarks

Even without definitive conclusions, this Lecture has at least documented the difficulties in reconstructing virus origins from contemporary, indirect genetic and epidemiological evidence. Furthermore, the incompatibility between predictions of relatively recent times of divergence of HBV and HGV/GBV-C with their global distributions in humans and primates in the wild further suggests that the molecular clock cannot be applied in the simple way that it has been to reconstruct evolutionary histories of other organisms.

Despite its shortcomings, much of the indirect evidence for virus origins that derives from studies of genotype distributions points to very long-term virus–host interrelationships. As discussed above, current evidence suggests that relatively closely related variants of HGV/GBV-C have evolved with their hosts over the period of human dispersal and primate speciation. Our own investigations of this discrepancy have concentrated on the constraints on sequence change, such as RNA secondary structure formation in HGV/GBV-C, that may lead to significant underestimation of evolutionary distances between more distantly related variants. Other authors have documented the unusual complications imposed by the use of multiple reading frames by HBV, and the lack of conventional synonymous sites in most of the genome (Mizokami et al., 1997).
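
One way in which constraints on sequence change could lead to underestimated distances can be sketched numerically. The sketch below assumes, purely for illustration, that only a fixed fraction of sites is free to change and that these sites evolve under the Jukes-Cantor model, while the distance is estimated with the standard Jukes-Cantor correction applied to all sites. The function names, the free fraction of 0.2 and the distances are hypothetical and are not taken from the analyses cited above.

```python
import math

def observed_difference(true_distance, free_fraction):
    """Expected proportion of differing sites between two sequences.

    true_distance: substitutions per site, averaged over the whole sequence.
    free_fraction: hypothetical fraction of sites able to change; the rest are
    assumed invariant (e.g. held by RNA structure or overlapping frames).
    """
    per_free_site = true_distance / free_fraction
    return free_fraction * 0.75 * (1.0 - math.exp(-4.0 * per_free_site / 3.0))

def jukes_cantor_estimate(p):
    """Standard Jukes-Cantor distance, which assumes every site can change."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Compare the true distance with the estimate when only 20% of sites are free.
for true_distance in (0.05, 0.2, 1.0):
    p = observed_difference(true_distance, free_fraction=0.2)
    print(true_distance, round(jukes_cantor_estimate(p), 3))
```

Under these assumptions the estimate stays close to the true value for closely related sequences but falls increasingly short as the true distance grows, because the few variable sites saturate.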

Aside from these specific issues, a more general principle that shapes the evolution of viruses is their large population size. To paraphrase work in other fields, the numbers of multicellular, large organisms such as mammals are much more restricted, and this limits the population pool in which fitness selection occurs. Evidence for a lack of meaningful selection at the genome level in such organisms includes their unnecessarily large genome sizes, for the most part packed with repetitive, mobile elements and other junk DNA, introns and often nonsensical redundancy in gene function. Taking this argument further, the lack of selection would extend to coding sequences, where fixation of mutations that have relatively minor effects on organism fitness could occur at similar frequency to genuinely neutral substitutions. Gene sequences may therefore diverge more or less randomly during speciation, and reproduce the linear relationship between time and degree of sequence divergence that underlies the molecular clock.
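
The linear relationship that underlies the molecular clock can be written out as a short sketch. It assumes a strict clock with a single constant substitution rate; the function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def divergence_time(distance, rate):
    """Years since two lineages split, under a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate: substitutions per site per year along each lineage.
    Both lineages accumulate change, so distance = 2 * rate * time.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)

# Hypothetical values: doubling the distance doubles the inferred time.
for distance in (0.05, 0.10, 0.20):
    print(distance, divergence_time(distance, rate=1e-3))
```

Any process that slows the accumulation of observable differences, without a matching correction to the rate, would make the inferred time too recent.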

Bacteria and viruses are unlikely to evolve under such relaxed constraints. Indeed, their small genome sizes, an almost universal lack of introns or gene reduplications and, in the extreme case, such economy in coding sequences that most of the genome is read in multiple reading frames suggest a degree of fitness optimization absent from larger organisms. This process would clearly be facilitated by the large population sizes of both viruses and bacteria in their natural environments.

Population size and high mutation rates have generally been seen as factors enhancing the adaptive ability of viruses to cope with new pressures (such as antiviral treatment, immune recognition). However, the same factors can have precisely the opposite effect in stable environments. Large population sizes and the ability of more fit mutants to rapidly replace entire virus populations in the infected individual inevitably produce populations highly optimized for the environment in which they replicate. Mutants with sequence changes that had even a marginal harmful effect on virus fitness, such as a conservative amino acid substitution (or in the case of HGV/GBV-C, a synonymous substitution that disrupted secondary structure) would be quickly driven out by the 10⁹–10¹⁰ other members of the population pool competing for cells to infect. In contrast, comparable substitutions, for example in the haemoglobin gene of an elephant living in a small breeding group, would have no significant impact on its reproductive fitness, and would be as likely to become fixed in the elephant population as neutral or even beneficial mutations.
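
The contrast between the two cases can be sketched with the diffusion approximation for the fixation probability of a single new mutant in a haploid Wright-Fisher population. Several simplifying assumptions are made here purely for illustration: the effective population size is taken to equal the number of individuals (a billion for the virus, a hypothetical 50 for the breeding group), the haploid formula is used for both cases, and the selection coefficient is an arbitrary small value.

```python
import math

def fixation_probability(pop_size, s):
    """Probability that one new mutant eventually replaces the population.

    pop_size: number of haploid individuals (assumed equal to the effective size).
    s: selection coefficient (negative = harmful, 0 = neutral, positive = beneficial).
    Uses (1 - exp(-2s)) / (1 - exp(-2*N*s)), the haploid diffusion approximation.
    """
    if s == 0:
        return 1.0 / pop_size
    exponent = -2.0 * pop_size * s
    if exponent > 700.0:
        # Harmful mutation in a very large population: the denominator would
        # overflow, and the probability is zero to machine precision.
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(exponent)

s = -1e-4  # hypothetical, marginally harmful substitution
for label, pop_size in (("small breeding group", 50), ("virus population", 10**9)):
    neutral = fixation_probability(pop_size, 0.0)
    harmful = fixation_probability(pop_size, s)
    print(label, harmful / neutral)
```

Under these assumptions the marginally harmful mutation is fixed almost as readily as a neutral one in the small population, but effectively never in the large one.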

The selection conditions operating during persistent virus infections can therefore drive out much of the phenotypic variability that characterizes the evolution of larger organisms, and may produce the extraordinary conservatism and evolutionary stasis observed in hepatitis viruses. Effectively, some viruses may have found their fitness peak for a particular host. Neither transmission bottlenecks nor any of the other processes associated with population drift are going to drive them from that peak, particularly where the high mutation rate of RNA viruses provides the means for rapid re-establishment of the original, fitness-optimized population. Large population sizes, the intense selection pressures that operate within them, and high mutation rates that promote convergence to fitness peaks may be the factors that set virus evolution apart from that of their hosts.

References

Adams, N. J., Prescott, L. E., Jarvis, L. M., Lewis, J. C. M., McClure, M. O., Smith, D. B. & Simmonds, P. (1998). Detection in chimpanzees of a novel flavivirus related to GB virus-C/hepatitis G virus. Journal of General Virology 79, 1871-1877.[Abstract]

Alter, H. J., Conry-Cantilena, C., Melpolder, J., Tan, D., Van Raden, M., Herion, D., Lau, D. & Hoofnagle, J. H. (1997). Hepatitis C in asymptomatic blood donors. Hepatology 26, 29-33.

Arauz-Ruiz, P., Norder, H., Visona, K. A. & Magnius, L. O. (1997). Genotype F prevails in HBV infected patients of Hispanic origin in Central America and may carry the precore stop mutant. Journal of Medical Virology 51, 305-312.[Medline]

Bellini, R., Casali, B., Carrieri, M., Zambonelli, C., Rivasi, P. & Rivasi, F. (1997). Aedes albopictus (Diptera: Culicidae) is incompetent as a vector of hepatitis C virus. APMIS 105, 299-302.[Medline]

Birkenmeyer, L. G., Desai, S. M., Muerhoff, A. S., Leary, T. P., Simons, J. N., Montes, C. C. & Mushahwar, I. K. (1998). Isolation of a GB virus-related genome from a chimpanzee. Journal of Medical Virology 56, 44-51.[Medline]

Bollyky, P. L. & Holmes, E. C. (1999). Reconstructing the complex evolutionary history of hepatitis B virus. Journal of Molecular Evolution 49, 130-141.[Medline]

Bollyky, P. L., Rambaut, A., Grassly, N., Carman, W. F. & Holmes, E. C. (1997). Hepatitis B virus has a New World evolutionary origin. Hepatology 26, 765.

Brown, A. J. L., Lobidel, D., Wade, C. M., Rebus, S., Phillips, A. N., Brettle, R. P., France, A. J., Leen, C. S., McMenamin, J., McMillan, A., Maw, R. D., Mulcahy, F., Robertson, J. R., Sankar, K. N., Scott, G., Wyld, R. & Peutherer, J. F. (1997). The molecular epidemiology of human immunodeficiency virus type 1 in six cities in Britain and Ireland. Virology 235, 166-177.[Medline]

Bukh, J. & Apgar, C. L. (1997). Five new or recently discovered (GBV-A) virus species are indigenous to New World monkeys and may constitute a separate genus of the Flaviviridae. Virology 229, 429-436.[Medline]

Bukh, J., Purcell, R. H. & Miller, R. H. (1992). Sequence analysis of the 5' noncoding region of hepatitis C virus. Proceedings of the National Academy of Sciences, USA 89, 4942-4946.[Abstract]

Bukh, J., Purcell, R. H. & Miller, R. H. (1993). At least 12 genotypes of hepatitis C virus predicted by sequence analysis of the putative E1 gene of isolates collected worldwide. Proceedings of the National Academy of Sciences, USA 90, 8234-8238.[Abstract/Free Full Text]

Chamberlain, R. W., Adams, N. J., Taylor, L. A., Simmonds, P. & Elliott, R. M. (1997). The complete coding sequence of hepatitis C virus genotype 5a, the predominant genotype in South Africa. Biochemical and Biophysical Research Communications 236, 44-49.[Medline]

Charrel, R. N., de Micco, P. & de Lamballerie, X. (1999). Phylogenetic analysis of GB viruses A and C: evidence for cospeciation between virus isolates and their primate hosts. Journal of General Virology 80, 2329-2335.[Abstract/Free Full Text]

Cuceanu, N. M., Tuplin, A. & Simmonds, P. (2001). Evolutionarily conserved RNA secondary structures in coding and non-coding sequences at the 3' end of the hepatitis G virus/GB-virus C genome. Journal of General Virology 82, 713-722.[Abstract/Free Full Text]

Davis, J. L., Heginbottom, J. A., Annan, A. P., Daniels, R. S., Berdal, B. P., Bergan, T., Duncan, K. E., Lewin, P. K., Oxford, J. S., Roberts, N., Skehel, J. J. & Smith, C. R. (2000). Ground penetrating radar surveys to locate 1918 Spanish Flu victims in permafrost. Journal of Forensic Sciences 45, 68-76.

de Lamballerie, X., Charrel, R. N., Attoui, H. & de Micco, P. (1997). Classification of hepatitis C virus variants in six major types based on analysis of the envelope 1 and nonstructural 5B genome regions and complete polyprotein sequences. Journal of General Virology 78, 45-51.[Abstract]

Enomoto, N., Takada, A., Nakao, T. & Date, T. (1990). There are two major types of hepatitis C virus in Japan. Biochemical and Biophysical Research Communications 170, 1021-1025.[Medline]

Erker, J. C., Desai, S. M., Leary, T. P., Chalmers, M. L., Montes, C. C. & Mushahwar, I. K. (1998). Genomic analysis of two GB virus A variants isolated from captive monkeys. Journal of General Virology 79, 41-45.[Abstract]

Felsenstein, J. (1993). PHYLIP Inference Package version 3.5. Department of Genetics, University of Washington, Seattle, WA, USA.

Feng, G., Yue, L., White, A. T., Pappas, P. G., Barchue, J., Greene, B. M., Sharp, P. M., Shaw, G. M. & Hahn, B. H. (1992). Human infection by genetically diverse SIVsm-related HIV-2 in West Africa. Nature 358, 495-499.[Medline]

Frank, C., Mohamed, M. K., Strickland, G. T., Lavanchy, D., Arthur, R. R., Magder, L. S., El Khoby, T., Abdel-Wahab, Y., Aly Ohn, E. S., Anwar, W. & Sallam, I. (2000). The role of parenteral antischistosomal therapy in the spread of hepatitis C virus in Egypt. Lancet 355, 887-891.[Medline]

Fretz, C., Jeannel, D., Stuyver, L., Herve, V., Lunel, F., Boudifa, A., Mathiot, C., Dethe, G. & Fournel, J. J. (1995). HCV infection in a rural population of the Central African Republic (CAR): evidence for three additional subtypes of genotype 4. Journal of Medical Virology 47, 435-437.[Medline]

Gao, F., Bailes, E., Robertson, D. L., Chen, Y., Rodenburg, C. M., Michael, S. F., Cummins, L. B., Arthur, L. O., Peeters, M., Shaw, G. M., Sharp, P. M. & Hahn, B. H. (1999). Origins of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397, 436-441.[Medline]

Goeser, T., Tox, U., Muller, H. M., Arnold, J. C. & Theilmann, L. (1995). Genotypes in chronic hepatitis C virus (HCV) infection and in liver cirrhosis caused by HCV in Germany. Deutsche Medizinische Wochenschrift 120, 1070-1073.[Medline]

Gonzalez-Perez, M. A., Norder, H., Bergstrom, A., Lopez, E., Visona, K. A. & Magnius, L. O. (1997). High prevalence of GB virus C strains genetically related to strains with Asian origin in Nicaraguan hemophiliacs. Journal of Medical Virology 52, 149-155.[Medline]

Grethe, S., Heckel, J. O., Rietschel, W. & Hufert, F. T. (2000). Molecular epidemiology of hepatitis B virus variants in nonhuman primates. Journal of Virology 74, 5377-5381.[Abstract/Free Full Text]

Haase, A. T., Henry, K., Zupancic, M., Sedgewick, G., Faust, R. A., Melroe, H., Cavert, W., Gebhard, K., Staskus, K., Zhang, Z. Q., Dailey, P. J., Balfour, H. H., Erice, A. & Perelson, A. S. (1996). Quantitative image analysis of HIV-1 infection in lymphoid tissue. Science 274, 985-989.[Abstract/Free Full Text]

Hannoun, C., Horal, P. & Lindh, M. (2000). Long-term mutation rates in the hepatitis B virus genome. Journal of General Virology 81, 75-83.[Abstract/Free Full Text]

Holland, J., Spindler, K., Horodyski, F., Grabau, E., Nichol, S. & Vandepol, S. (1982). Rapid evolution of RNA genomes. Science 215, 1577-1585.[Medline]

Holmes, E. C., Nee, S., Rambaut, A., Garnett, G. P. & Harvey, P. H. (1995a). Revealing the history of infectious disease epidemics using phylogenetic trees. Philosophical Transactions of the Royal Society of London B Biological Sciences 349, 33-40.

Holmes, E. C., Zhang, L. Q., Robertson, P., Cleland, A., Harvey, E., Simmonds, P. & Brown, A. J. L. (1995b). The molecular epidemiology of human immunodeficiency virus type 1 in Edinburgh. Journal of Infectious Diseases 171, 45-53.[Medline]

Hopkins, D. (1980). News from the field. Paleopathology Association Newsletter 31, 6.

Hu, X., Margolis, H. S., Purcell, R. H., Ebert, J. & Robertson, B. H. (2000). Identification of hepatitis B virus indigenous to chimpanzees. Proceedings of the National Academy of Sciences, USA 97, 1661-1664.[Abstract/Free Full Text]

Jeannel, D., Fretz, C., Traore, Y., Kohdjo, N., Bigot, A., Gamy, E. P., Jourdan, G., Kourouma, K., Maertens, G., Fumoux, F., Fournel, J. J. & Stuyver, L. (1998). Evidence for high genetic diversity and long-term endemicity of hepatitis C virus genotypes 1 and 2 in West Africa. Journal of Medical Virology 55, 92-97.[Medline]

Jones, S., Martin, R. & Pilbeam, D. (1992). Human Evolution. Cambridge: Cambridge University Press.

Katayama, Y., Apichartpiyakul, C., Handajani, R., Ishido, S. & Hotta, H. (1997). GB virus C/hepatitis G virus (GBV-C/HGV) infection in Chiang Mai, Thailand, and identification of variants on the basis of 5'-untranslated region sequences. Archives of Virology 142, 2433-2445.[Medline]

Kenny-Walsh, E. (1999). Clinical outcomes after hepatitis C infection from contaminated anti-D immune globulin. New England Journal of Medicine 340, 1228-1233.[Abstract/Free Full Text]

Kolykhalov, A. A., Feinstone, S. M. & Rice, C. M. (1996). Identification of a highly conserved sequence element at the 3' terminus of hepatitis C virus genome RNA. Journal of Virology 70, 3363-3371.[Abstract]

Konomi, N., Miyoshi, C., La Fuente Zerain, C., Li, T. C., Arakawa, Y. & Abe, K. (1999). Epidemiology of hepatitis B, C, E, and G virus infections and molecular analysis of hepatitis G virus isolates in Bolivia. Journal of Clinical Microbiology 37, 3291-3295.[Abstract/Free Full Text]

Korber, B., Muldoon, M., Theiler, J., Gao, F., Gupta, R., Lapedes, A., Hahn, B. H., Wolinsky, S. & Bhattacharya, T. (2000). Timing the ancestor of the HIV-1 pandemic strains. Science 288, 1789-1796.[Abstract/Free Full Text]

Koretz, R. L., Abbey, H., Coleman, E. & Gitnick, G. (1993). Non-A, non-B posttransfusion hepatitis: looking back into the second decade. Annals of Internal Medicine 119, 110-115.[Abstract/Free Full Text]

Kumar, S. & Hedges, S. B. (1998). A molecular timescale for vertebrate evolution. Nature 392, 917-920.[Medline]

Kumar, S., Tamura, K. & Nei, M. (1993). MEGA: Molecular Evolutionary Genetics Analysis, version 1.0. Pennsylvania State University, University Park, PA, USA.

Lanford, R. E., Chavez, D., Brasky, K. M., Burns, R. B., III & Rico-Hesse, R. (1998). Isolation of a hepadnavirus from the woolly monkey, a New World primate. Proceedings of the National Academy of Sciences, USA 95, 5757-5761.[Abstract/Free Full Text]

Leary, T. P., Muerhoff, A. S., Simons, J. N., Pilot-Matias, T. J., Erker, J. C., Chalmers, M. L., Schlauder, G. G., Dawson, G. J., Desai, S. M. & Mushahwar, I. K. (1996). Sequence and genomic organization of GBV-C: a novel member of the Flaviviridae associated with human non-A–E hepatitis. Journal of Medical Virology 48, 60-67.[Medline]

Leary, T. P., Desai, S. M., Erker, J. C. & Mushahwar, I. K. (1997). The sequence and genomic organization of a GB virus A variant isolated from captive tamarins. Journal of General Virology 78, 2307-2313.[Abstract]

Li, W.-H., Wu, C.-I. & Luo, C.-C. (1985). A new method for estimating synonymous and non-synonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Molecular Biology and Evolution 2, 150-174.[Abstract]

Linnen, J., Wages, J., Zhang-Keck, Z. Y., Fry, K. E., Krawczynski, K. Z., Alter, H., Koonin, E., Gallagher, M., Alter, M., Hadziyannis, S., Karayiannis, P., Fung, K., Nakatsuji, Y., Shih, J. W. K., Young, L., Piatak, M., Hoover, C., Fernandez, J., Chen, S., Zou, J. C., Morris, T., Hyams, K. C., Ismay, S., Lifson, J. D., Hess, G., Foung, S. K. H., Thomas, H., Bradley, D., Margolis, H. & Kim, J. P. (1996). Molecular cloning and disease association of hepatitis G virus: a transfusion-transmissible agent. Science 271, 505-508.[Abstract]

Liu, H. F., Muyembe-Tamfum, J. J., Dahan, K., Desmyter, J. & Goubau, P. (2000). High prevalence of GB virus C/hepatitis G virus in Kinshasa, Democratic Republic of Congo: a phylogenetic analysis. Journal of Medical Virology 60, 159-165.[Medline]

McCarthy, M. C., El-Tigani, A., Khalid, I. O. & Hyams, K. C. (1994). Hepatitis B and C in Juba, southern Sudan: results of a serosurvey. Transactions of the Royal Society of Tropical Medicine and Hygiene 88, 534-536.[Medline]

MacDonald, D. M., Holmes, E. C., Lewis, J. C. & Simmonds, P. (2000). Detection of hepatitis B virus infection in wild-born chimpanzees (Pan troglodytes verus): phylogenetic relationships with human and other primate genotypes. Journal of Virology 74, 4253-4257.[Abstract/Free Full Text]

McOmish, F., Chan, S.-W., Dow, B. C., Gillon, J., Frame, W. D., Crawford, R. J., Yap, P. L., Follett, E. A. C. & Simmonds, P. (1993). Detection of three types of hepatitis C virus in blood donors: investigation of type-specific differences in serological reactivity and rate of alanine aminotransferase abnormalities. Transfusion 33, 7-13.[Medline]

Magnius, L. O. & Norder, H. (1995). Subtypes, genotypes and molecular epidemiology of the hepatitis B virus as reflected by sequence variability of the S-gene. Intervirology 38, 24-34.[Medline]

Mansky, L. M. & Temin, H. M. (1995). Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. Journal of Virology 69, 5087-5094.[Abstract]

Mellor, J., Holmes, E. C., Jarvis, L. M., Yap, P. L., Simmonds, P. & International Collaborators (1995). Investigation of the pattern of hepatitis C virus sequence diversity in different geographical regions: implications for virus classification. Journal of General Virology 76, 2493-2507.[Abstract]

Menendez, C., Sanchez-Tapias, J. M., Alonso, P. L., Gimenez-Barcons, M., Kahigwa, E., Aponte, J. J., Mshinda, H., Navia, M. M., Deanta, M. T. J., Rodes, J. & Saiz, J. C. (1999). Molecular evidence of mother-to-infant transmission of hepatitis G virus among women without known risk factors for parenteral infections. Journal of Clinical Microbiology 37, 2333-2336.[Abstract/Free Full Text]

Mison, L., Hyland, C., Poidinger, M., Borthwick, I., Faoagali, J., Aeno, U. & Gowans, E. (2000). Hepatitis G virus genotypes in Australia, Papua New Guinea and the Solomon Islands: a possible New Pacific type identified. Journal of Gastroenterology and Hepatology 15, 952-956.[Medline]

Mizokami, M., Gojobori, T., Ohba, K. I., Ikeo, K., Ge, X. M., Ohno, T., Orito, E. & Lau, J. Y. N. (1996). Hepatitis C virus types 7, 8 and 9 should be classified as type 6 subtypes. Journal of Hepatology 24, 622-624.[Medline]

Mizokami, M., Orito, E., Ohba, K., Ikeo, K., Lau, J. Y. & Gojobori, T. (1997). Constrained evolution with respect to gene overlap of hepatitis B virus. Journal of Molecular Evolution 44, 83-90.

Morin, P. A., Moore, J. J., Chakraborty, R., Jin, L., Goodall, J. & Woodruff, D. S. (1994). Kin selection, social structure, gene flow, and the evolution of chimpanzees. Science 265, 1193-1201.[Medline]

Muerhoff, A. S., Simons, J. N., Leary, T. P., Erker, J. C., Chalmers, M. L., Pilot-Matias, T. J., Dawson, G. J., Desai, S. M. & Mushahwar, I. K. (1996). Sequence heterogeneity within the 5'-terminal region of the hepatitis GB virus C genome and evidence for genotypes. Journal of Hepatology 25, 379-384.[Medline]

Muerhoff, A. S., Smith, D. B., Leary, T. P., Erker, J. C., Desai, S. M. & Mushahwar, I. K. (1997). Identification of GB virus C variants by phylogenetic analysis of 5'-untranslated and coding region sequences. Journal of Virology 71, 6501-6508.[Abstract]

Mukaide, M., Mizokami, M., Orito, E., Ohba, K., Nakano, T., Ueda, R., Hikiji, K., Iino, S., Shapiro, S., Lahat, N., Park, Y. M., Kim, B. S., Oyunsuren, T., Rezieg, M., Alahdal, M. N. & Lau, J. Y. N. (1997). Three different GB virus C/hepatitis G virus genotypes – phylogenetic analysis and a genotyping assay based on restriction fragment length polymorphism. FEBS Letters 407, 51-58.[Medline]

Nakao, H., Okamoto, H., Fukuda, M., Tsuda, F., Mitsui, T., Masuko, K., Iizuka, H., Miyakawa, Y. & Mayumi, M. (1997). Mutation rate of GB virus C/hepatitis G virus over the entire genome and in subgenomic regions. Virology 233, 43-50.[Medline]

Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and non-synonymous substitutions. Molecular Biology and Evolution 3, 418-426.[Abstract]

Norder, H., Courouce, A. M. & Magnius, L. O. (1994). Complete genomes, phylogenetic relatedness, and structural proteins of six strains of the hepatitis B virus, four of which represent two new genotypes. Virology 198, 489-503.[Medline]

Norder, H., Ebert, J. W., Fields, H. A., Mushahwar, I. K. & Magnius, L. O. (1996). Complete sequencing of a gibbon hepatitis B virus genome reveals a unique genotype distantly related to the chimpanzee hepatitis B virus. Virology 218, 214-223.[Medline]

Pawlotsky, J. M., Tsakiris, L., Roudot-Thoraval, F., Pellet, C., Stuyver, L., Duval, J. & Dhumeaux, D. (1995). Relationship between hepatitis C virus genotypes and sources of infection in patients with chronic hepatitis C. Journal of Infectious Diseases 171, 1607-1610.[Medline]

Power, J. P., Lawlor, E., Davidson, F., Yap, P. L., Kenny-Walsh, E., Whelton, M. J. & Walsh, T. J. (1994). Hepatitis C viraemia in recipients of Irish intravenous anti-D immunoglobulin. Lancet 344, 1166-1167.

Power, J. P., Lawlor, E., Davidson, F., Holmes, E. C., Yap, P. L. & Simmonds, P. (1995). Molecular epidemiology of an outbreak of infection with hepatitis C virus in recipients of anti-D immunoglobulin. Lancet 345, 1211-1213.[Medline]

Robertson, B., Myers, G., Howard, C., Brettin, T., Bukh, J., Gaschen, B., Gojobori, T., Maertens, G., Mizokami, M., Nainan, O., Netesov, S., Nishioka, K., Shini, T., Simmonds, P., Smith, D., Stuyver, L. & Weiner, A. (1998). Classification, nomenclature, and database development for hepatitis C virus (HCV) and related viruses: proposals for standardization. Archives of Virology 143, 2493-2503.[Medline]

Ruggieri, A., Argentini, C., Kouruma, F., Chionne, P., Dugo, E., Spada, E., Dettori, S., Sabbatani, S. & Rapicetta, M. (1996). Heterogeneity of hepatitis C virus genotype 2 variants in West Central Africa (Guinea Conakry). Journal of General Virology 77, 2073-2076.[Abstract]

Sathar, M. A., Soni, P. N., Pegoraro, R., Simmonds, P., Smith, D. B., Dhillon, A. P. & Dusheiko, G. M. (1999). A new variant of GB virus C/hepatitis G virus (GBV-C/HGV) from South Africa. Virus Research 64, 151-160.[Medline]

Seeff, L. B. (1997). Natural history of hepatitis C. Hepatology 26, 21-28.

Seeff, L. B., Buskell-Bales, Z., Wright, E. C., Durako, S. J., Alter, H. J., Iber, F. L., Hollinger, F. B., Gitnick, G., Knodell, R. G., Perrillo, R. P., Stevens, C. E. & Hollingsworth, C. G. (1992). Long-term mortality after transfusion-associated non-A, non-B hepatitis. New England Journal of Medicine 327, 1906-1911.[Abstract]

Seeff, L. B., Miller, R. N., Rabkin, C. S., Buskell-Bales, Z., Straley-Eason, K. D., Smoak, B. L., Johnson, L. D., Lee, S. R. & Kaplan, E. L. (2000). 45-year follow-up of hepatitis C virus infection in healthy young adults. Annals of Internal Medicine 132, 105-111.[Abstract/Free Full Text]

Silverman, A. L., McCray, D. G., Gordon, S. C., Morgan, W. T. & Walker, E. D. (1996). Experimental evidence against replication or dissemination of hepatitis C virus in mosquitoes (Diptera: Culicidae) using detection by reverse transcriptase polymerase chain reaction. Journal of Medical Entomology 33, 398-401.[Medline]

Simmonds, P. (1998). Variability of the hepatitis C virus genome. In Hepatitis C Virus, pp. 38-63. Edited by H. W. Reesink. Basel: Karger.

Simmonds, P. & Smith, D. B. (1999a). Hepatitis C and G viruses – old or new? In HIV and the New Viruses, pp. 459-480. Edited by A. G. Dalgleish & R. A. Weiss. San Diego: Academic Press.

Simmonds, P. & Smith, D. B. (1999b). Structural constraints on RNA virus evolution. Journal of Virology 73, 5787-5794.[Abstract/Free Full Text]

Simmonds, P., Alberti, A., Alter, H. J., Bonino, F., Bradley, D. W., Brechot, C., Brouwer, J. T., Chan, S. W., Chayama, K., Chen, D. S., Choo, Q. L., Colombo, M., Cuypers, H. T. M., Date, T., Dusheiko, G. M., Esteban, J. I., Fay, O., Hadziyannis, S. J., Han, J., Hatzakis, A., Holmes, E. C., Hotta, H., Houghton, M., Irvine, B., Kohara, M., Kolberg, J. A., Kuo, G., Lau, J. Y. N., Lelie, P. N., Maertens, G., McOmish, F., Miyamura, T., Mizokami, M., Nomoto, A., Prince, A. M., Reesink, H. W., Rice, C., Roggendorf, M., Schalm, S. W., Shikata, T., Shimotohno, K., Stuyver, L., Trepo, C., Weiner, A., Yap, P. L. & Urdea, M. S. (1994). A proposed system for the nomenclature of hepatitis C viral genotypes. Hepatology 19, 1321-1324.[Medline]

Simmonds, P., Mellor, J., Craxi, A., Sanchez-Tapias, J. M., Alberti, A., Prieto, J., Colombo, M., Rumi, M. G., Loiacano, O., Ampurdanes-Mingall, S., Forns-Bernhardt, X., Chemello, L., Civeira, M. P., Frost, C. & Dusheiko, G. (1996a). Epidemiological, clinical and therapeutic associations of hepatitis C types in western European patients. Journal of Hepatology 24, 517-524.[Medline]

Simmonds, P., Mellor, J., Sakuldamrongpanich, T., Nuchaprayoon, C., Tanprasert, S., Holmes, E. C. & Smith, D. B. (1996b). Evolutionary analysis of variants of hepatitis C virus found in South-East Asia: comparison with classifications based upon sequence similarity. Journal of General Virology 77, 3013-3024.[Abstract]

Simons, J. N., Pilot-Matias, T. J., Leary, T. P., Dawson, G. J., Desai, S. M., Schlauder, G. G., Muerhoff, A. S., Erker, J. C., Buijk, S. L., Chalmers, M. L., Vansant, C. L. & Mushahwar, I. K. (1995). Identification of two flavivirus-like genomes in the GB hepatitis agent. Proceedings of the National Academy of Sciences, USA 92, 3401-3405.[Abstract]

Smith, D. B., Cuceanu, N., Davidson, F., Jarvis, L. M., Mokili, J. L. K., Hamid, S., Ludlam, C. A. & Simmonds, P. (1997a). Discrimination of hepatitis G virus/GBV-C geographical variants by analysis of the 5' non-coding region. Journal of General Virology 78, 1533-1542.[Abstract]

Smith, D. B., Pathirana, S., Davidson, F., Lawlor, E., Power, J., Yap, P. L. & Simmonds, P. (1997b). The origin of hepatitis C virus genotypes. Journal of General Virology 78, 321-328.[Abstract]

Smith, D. B., Basaras, M., Frost, S., Haydon, D., Cuceanu, N., Prescott, L., Kamenka, C., Millband, D., Sathar, M. A. & Simmonds, P. (2000). Phylogenetic analysis of GBV-C/hepatitis G virus. Journal of General Virology 81, 769-780.[Abstract/Free Full Text]

Stuyver, L., Rossau, R., Wyseur, A., Duhamel, M., Vanderborght, B., Van Heuverswyn, H. & Maertens, G. (1993). Typing of hepatitis C virus isolates and characterization of new subtypes using a line probe assay. Journal of General Virology 74, 1093-1102.[Abstract]

Suzuki, Y. & Gojobori, T. (1997). The origin and evolution of Ebola and Marburg viruses. Molecular Biology and Evolution 14, 800-806.[Abstract]

Takahashi, K., Brotman, B., Usuda, S., Mishiro, S. & Prince, A. M. (2000). Full-genome sequence analyses of hepatitis B virus (HBV) strains recovered from chimpanzees infected in the wild: implications for an origin of HBV. Virology 267, 58-64.[Medline]

Tanaka, Y., Mizokami, M., Orito, E., Ohba, K., Kato, T., Kondo, Y., Mboudjeka, I., Zekeng, L., Kaptue, L., Bikandou, B., Mpele, P., Takehisa, J., Hayami, M., Suzuki, Y. & Gojobori, T. (1998a). African origin of GB virus C/hepatitis G virus. FEBS Letters 423, 143-148.[Medline]

Tanaka, Y., Mizokami, M., Orito, E., Ohba, K. I., Nakano, T., Kato, T., Kondo, Y., Ding, X., Ueda, R., Sonoda, S., Tajima, K., Miura, T. & Hayami, M. (1998b). GB virus C/hepatitis G virus infection among Colombian native Indians. American Journal of Tropical Medicine and Hygiene 59, 462-467.[Abstract/Free Full Text]

Thomas, H. C. & Jacyna, M. R. (1993). Hepatitis B virus: pathogenesis and treatment of chronic infection. In Viral Hepatitis, pp. 185-207. Edited by A. J. Zuckerman & H. C. Thomas. Edinburgh: Churchill Livingstone.

Tisminetzky, S. G., Gerotto, M., Pontisso, P., Chemello, L., Ruvoletto, M. G., Baralle, F. & Alberti, A. (1994). Genotypes of hepatitis C virus in Italian patients with chronic hepatitis C. International Hepatological Communications 2, 105-112.

Tucker, T. J., Smuts, H., Eickhaus, P., Robson, S. C. & Kirsch, R. E. (1999). Molecular characterization of the 5' non-coding region of South African GBV-C/HGV isolates: major deletion and evidence for a fourth genotype. Journal of Medical Virology 59, 52-59.[Medline]

Vaudin, M., Wolstenholme, A. J., Tsiquaye, K. N., Zuckerman, A. J. & Harrison, T. J. (1988). The complete nucleotide sequence of the genome of a hepatitis B virus isolated from a naturally infected chimpanzee. Journal of General Virology 69, 1383-1389.[Abstract]

Wansbrough-Jones, M. H., Frimpong, E., Cant, B., Harris, K., Evans, M. R. W. & Teo, C. G. (1998). Prevalence and genotype of hepatitis C virus infection in pregnant women and blood donors in Ghana. Transactions of the Royal Society of Tropical Medicine and Hygiene 92, 496-499.[Medline]

Warren, K. S., Heeney, J. L., Swan, R. A., Heriyanto & Verschoor, E. J. (1999). A new group of hepadnaviruses naturally infecting orangutans (Pongo pygmaeus). Journal of Virology 73, 7860-7865.[Abstract/Free Full Text]

Wells, C. (1964). Bones, Bodies and Disease. London: Thames and Hudson.

Wiese, M., Berr, F., Lafrenz, M., Porst, H. & Oesen, U. (2000). Low frequency of cirrhosis in a hepatitis C (genotype 1b) single-source outbreak in Germany: a 20-year multicenter study. Hepatology 32, 91-96.[Medline]

Wildy, P., Field, H. J. & Nash, A. A. (1982). Classical herpes latency revisited. In Virus Persistence, pp. 133-167. Edited by B. W. J. Mahy, A. C. Minson & G. K. Darby. Cambridge: Cambridge University Press.

Xiang, J., Wunschmann, S., Schmidt, W., Shao, J. & Stapleton, J. T. (2000). Full-length GB virus C (hepatitis G virus) RNA transcripts are infectious in primary CD4-positive T cells. Journal of Virology 74, 9125-9133.[Abstract/Free Full Text]

Xu, L. Z., Larzul, D., Delaporte, E., Brechot, C. & Kremsdorf, D. (1994). Hepatitis C virus genotype 4 is highly prevalent in Central Africa (Gabon). Journal of General Virology 75, 2393-2398.[Abstract]

Zanotto, P. M. D., Gould, E. A., Gao, G. F., Harvey, P. H. & Holmes, E. C. (1996). Population dynamics of flaviviruses revealed by molecular phylogenies. Proceedings of the National Academy of Sciences, USA 93, 548-553.[Abstract/Free Full Text]

Zhang, G., Haydon, D. T., Knowles, N. J. & McCauley, J. W. (1999). Molecular evolution of swine vesicular disease virus. Journal of General Virology 80, 639-651.[Abstract]

Zhu, T. F., Korber, B. T., Nahmias, A. J., Hooper, E., Sharp, P. M. & Ho, D. D. (1998). An African HIV-1 sequence from 1959 and implications for the origin of the epidemic. Nature 391, 594-597.[Medline]

Zuckerman, A. J., Thornton, A., Howard, C. R., Tsiquaye, K. N., Jones, D. M. & Brambell, M. R. (1978). Hepatitis B outbreak among chimpanzees at the London Zoo. Lancet ii, 652-654.