On phylogenetic relationships among major lineages of the Gammaherpesvirinae

Duncan J. McGeoch, Derek Gatherer and Aidan Dolan

Medical Research Council Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK

Correspondence
Duncan J. McGeoch
d.mcgeoch{at}vir.gla.ac.uk


   ABSTRACT
Phylogenetic relationships within the subfamily Gammaherpesvirinae of the family Herpesviridae were investigated for three species in the genus Lymphocryptovirus (or γ1 group) and nine in the genus Rhadinovirus (or γ2 group). Alignments of amino acid sequences from up to 28 genes were used to derive trees by maximum-likelihood and Bayesian Monte Carlo Markov chain methods. Two problem areas were identified: an unresolvable multifurcation for a clade within the γ2 group, and a high divergence for Murid herpesvirus 4 (MHV4). A robust final tree was obtained, which was valid for genes from across the virus genomes and was rooted by reference to previous analyses of the whole family Herpesviridae. This tree comprised four major lineages: the γ1 group of primate viruses; a clade of artiodactyl γ2 viruses; a clade of perissodactyl γ2 viruses; and a clade of γ2 viruses with a multifurcation at its base, containing Old World and New World primate viruses, Bovine herpesvirus 4 and MHV4. Developing previous work, and on the basis of similarities between the gammaherpesvirus tree and the tree of the corresponding mammalian hosts, it was proposed that the first three of these major viral lineages arose in a coevolutionary manner with host lineages, while the fourth originated in an ancient interspecies transfer. Transfer of dates from mammalian palaeontology then allowed estimation of dates for nodes in the gammaherpesvirus tree.


   INTRODUCTION
Understanding of evolutionary relationships among members of the family Herpesviridae has improved greatly over the last decade, with the application of computer-based methods of phylogenetic analysis to the increasing amount of DNA sequence available for herpesvirus genomes (McGeoch & Cook, 1994; McGeoch et al., 1995, 2000). Mammalian herpesviruses are now clearly seen to share a common evolutionary origin, from which they have diverged widely to form three subfamilies, the Alpha-, Beta- and Gammaherpesvirinae (Minson et al., 2000). The herpesviruses with avian and reptilian hosts characterized so far belong to the Alphaherpesvirinae (Minson et al., 2000; McGeoch & Gatherer, 2005), but herpesviruses of amphibians and fish have only a very distant relationship to the three subfamilies, and the one characterized invertebrate herpesvirus (of oysters) forms a third distinct grouping (Davison, 2002). Within subfamilies of the Herpesviridae, a substantial subset of the branching features of the phylogenetic tree reflects patterns in the tree for lineages of mammalian hosts, suggesting coevolution of host and virus lineages, and this correspondence has enabled a timescale to be constructed for herpesvirus phylogeny (McGeoch & Cook, 1994; McGeoch et al., 1995; McGeoch & Gatherer, 2005).

The subfamily Gammaherpesvirinae is presently divided into two genera, namely Lymphocryptovirus (or γ1 group) and Rhadinovirus (or γ2 group). The γ1 group contains Epstein–Barr virus (EBV) (see Table 1 for abbreviations of virus names) and its primate relatives, while the γ2 group contains herpesviruses with hosts of many mammalian taxa; those treated in this paper have primate, rodent, and artiodactyl and perissodactyl ungulate hosts. Human herpesvirus 8 (HHV8) is the only known human virus in the γ2 group. Recent years have seen much activity in the discovery and characterization of gammaherpesviruses. Until recently only Old World primate (OWP) viruses were known in the γ1 group, but a γ1 virus of a New World primate (NWP) has now been described (Callitrichine herpesvirus 3, CHV3) (Rivailler et al., 2002a). In the γ2 group, many new viruses have been detected recently, with hosts that include primate, ungulate and carnivore species (for instance: Rovnak et al., 1998; Greensill et al., 2000; Lacoste et al., 2000; Schultz et al., 2000; Banks et al., 2002; Kleiboeker et al., 2002; Ehlers & Lowden, 2004).


Table 1. Species of the Gammaherpesvirinae included in phylogenetic analyses

 
Previous work on the phylogeny of the Gammaherpesvirinae fell short of achieving a tree that was completely resolved and satisfactorily interpreted in terms of possible virus–host coevolution (McGeoch et al., 2000; McGeoch, 2001); complications included aberrant behaviour of one virus (Murid herpesvirus 4, MHV4), apparently associated with a particularly high level of sequence change, and an inability to resolve completely the branching order within the γ2 group. Our aim in this paper was to reanalyse gammaherpesvirus phylogeny with increased discrimination, by treating only gammaherpesvirus sequences and thereby enabling use of a larger set of genes than was possible in our earlier analyses, which encompassed all three subfamilies. We succeeded in constructing a robust tree (although one still containing a multifurcation) and were able to propose a global interpretation of the relationship of gammaherpesvirus evolution to that of host lineages.


   METHODS
General computational handling of molecular sequences.
Genomic sequences were obtained from the EMBL library, as listed in Table 1. Sequence handling used the GCG (Accelrys) and EMBOSS packages. Alignment of amino acid sequences was carried out by combined use of three alignment programs, as described previously (McGeoch et al., 2000). Positions in alignments with gaps introduced by the alignment programs were removed before use for phylogenetic inference, as were any regions too diverged to align.

Computational analyses of phylogenetic trees.
The PHYLIP package (version 3.6; Felsenstein, 1989) was used to construct trees by the neighbour-joining method and to display and manipulate trees. PROTML in the MOLPHY package (Adachi & Hasegawa, 1994) was used in preliminary maximum-likelihood (ML) evaluations of alignments of amino acid sequences and to generate lists of tree topologies. TREEADDER (McGeoch & Gatherer, 2005) was used to comprehensively interpolate additional species into tree topologies.

For inference of phylogenetic trees, measures of divergence between amino acid sequences were in all cases made using the matrix of Jones et al. (1992). In-depth phylogenetic analyses were undertaken by two approaches. In the first, CODEML in the PAML package (version 3.14; Yang, 1997) was used to make ML evaluations of aligned sets of amino acid sequences for sets of candidate trees, mostly with a discrete gamma distribution of eight classes of substitution rate across sites. Files output by CODEML containing log likelihoods for alignment sites were then processed by programs in the CONSEL package (version 0.1f; Shimodaira & Hasegawa, 2001) to score trees by the approximately unbiased (AU) test and by multi-scaled bootstrap proportions (BP) (Shimodaira, 2002).

In the second approach, Bayesian analysis using Monte Carlo Markov chains (BMCMC) was carried out on amino acid sequence alignments with MrBayes (version 3; Ronquist & Huelsenbeck, 2003), to generate posterior probability (PP) distributions of trees. Starting trees were randomly chosen and multiple runs of the program were generally made with different starting trees, to check convergence of the process. The program's defaults for prior probability settings were used. BMCMC processes incorporated a discrete gamma distribution of four classes of substitution rate across sites, included one ‘cold’ and three ‘heated’ chains, and were run for 250 000 or 500 000 generations. Output trees were sampled every 50 or 100 generations and typically the first 1000 trees collected were discarded to allow the process to reach stationarity.
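
The PP values reported in this paper are, in effect, the frequencies of topologies among the trees retained after burn-in. The following minimal Python sketch illustrates that bookkeeping; it is not MrBayes code, and it assumes each sampled tree has already been reduced to a canonical topology label (the labels in the example are hypothetical). With sampling every 50 generations over 250 000 generations, 5000 trees are collected, of which 4000 remain after the first 1000 are discarded.

```python
from collections import Counter


def posterior_probabilities(sampled_topologies, burn_in=1000):
    """Estimate PPs as topology frequencies among post-burn-in samples.

    sampled_topologies: canonical topology labels in sampling order
    (assumed input format); burn_in: number of initial samples discarded.
    """
    kept = sampled_topologies[burn_in:]
    if not kept:
        raise ValueError("burn-in discards every sampled tree")
    counts = Counter(kept)
    return {topology: n / len(kept) for topology, n in counts.items()}


def pp_confidence_set(pp, threshold=0.01):
    """Topologies with PP above the threshold, highest PP first."""
    return sorted((t for t, p in pp.items() if p > threshold),
                  key=lambda t: pp[t], reverse=True)


# 250 000 generations sampled every 50 generations -> 5000 trees.
n_sampled = 250_000 // 50
# Hypothetical chain output; the first 1000 samples (burn-in) are discarded.
samples = ["T0"] * 1000 + ["T1"] * 2000 + ["T2"] * 1990 + ["T3"] * 10
assert len(samples) == n_sampled
pp = posterior_probabilities(samples)
print(pp_confidence_set(pp))  # ['T1', 'T2']
```

In practice a MrBayes tree file would first have to be parsed and each tree reduced to a canonical form before counting; that step is omitted here.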

Estimations of dates for phylogenetic events.
Dates for nodes in herpesvirus trees were derived on the basis of equating particular nodes to palaeontological dates in host lineages. Two methods were then employed. The first was to calculate by CODEML a molecular clock version of the rooted ML tree. This enforces a single rate of change across all branches, and the rate was then converted to calendar time by regression of the divergences for calibrating nodes against the calibration dates (using MINITAB). The second method used the program r8s (version 1.5; Sanderson, 2003), which aims to smooth differences in rates for branches of previously computed trees, without imposing the globally uniform rate of a molecular clock. The penalized likelihood and quasi-Newtonian optimization options of the program were used.
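
The first dating method can be sketched as follows, under stated assumptions. The molecular clock tree is taken to supply, for each calibrating node, its height (substitutions per site from the node to the tips), and the rate is taken as the least-squares slope of height against calibration date, constrained through the origin because a node dated to the present has zero height. The through-origin constraint is our assumption, since the text states only that regression was used, and the calculation is shown in Python rather than MINITAB. The calibration dates are those given in the Results; the node heights are hypothetical values chosen for illustration.

```python
def clock_rate(node_heights, dates_ma):
    """Least-squares slope through the origin: height ~ rate * date.

    Returns substitutions per site per million years (assumed units).
    """
    sxy = sum(h * t for h, t in zip(node_heights, dates_ma))
    sxx = sum(t * t for t in dates_ma)
    return sxy / sxx


def node_date(node_height, rate):
    """Convert a clock-tree node height into a date (Ma)."""
    return node_height / rate


calibration_dates = [63.8, 82.1, 94.0]   # Ma, from Springer et al. (2003)
heights = [0.319, 0.4105, 0.47]          # hypothetical node heights
rate = clock_rate(heights, calibration_dates)
print(round(rate, 6))                    # 0.005
print(round(node_date(0.2, rate), 1))    # 40.0
```

The r8s method differs in that it allows rates to vary among branches, so no single rate of this kind is estimated.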


   RESULTS
Identification of gammaherpesvirus gene sets for phylogenetic analysis
Complete or largely complete genome sequences were available for 12 species of the Gammaherpesvirinae, as listed in Table 1. Three species are in the γ1 group, namely CHV3, EBV and Rhesus lymphocryptovirus (RLV), and the other nine are in the γ2 group. The amino acid sequences from 50 orthologous sets of genes were evaluated for use in phylogenetic analysis. For each gene set an alignment of amino acid sequences was constructed, as described previously (McGeoch et al., 2000). Twenty-eight of the alignments were judged to be of potentially usable quality for inferring phylogenetic trees, by criteria of showing a substantive level of identical residue types at aligned loci and a low incidence of gapping introduced by the alignment procedure, while the remainder were discarded as too diverged to yield reliable trees. Regions of low quality alignment and loci with a gap character in any sequence were then removed from each of the 28 alignment sets. Characteristics of the 28 sets are listed in Table 2. Twenty-one sets had all 12 species represented and seven had 11 species but lacked a Porcine lymphotropic herpesvirus 1 (PLHV1) member. Phylogenetic analyses were carried out on concatenated subsets of these alignments as described in the following sections.


Table 2. Gammaherpesvirus genes used for phylogenetic analyses

 
Phylogenetic evaluation of an eight-gene alignment of amino acid sequences
In previous work we identified eight genes that had orthologues in all sequenced genomes of mammalian and avian herpesviruses of the Alpha-, Beta- and Gammaherpesvirinae and whose encoded amino acid sequences were sufficiently conserved to allow alignments of quality suitable for phylogenetic analysis across the three subfamilies (McGeoch et al., 1995, 2000). In the present study we first examined this primary set for the 12 gammaherpesviruses, comprising genes 06, 07, 08, 09, 25, 29, 44 and 46 in the HVS nomenclature used in Table 2. The concatenated alignment for these eight sets of amino acid sequences, referred to as the 8x12 set, was 5862 residues long and so is a large dataset by usual standards. Our initial investigation employed the strategy of using increasingly compute-intensive programs to evaluate successively smaller sets of candidate trees, finally employing ML evaluation with allowance for variation of rates across sites, as described previously (McGeoch et al., 2000). From this analysis (results not shown) it became evident that no single tree was clearly best, and that we should instead consider sets of top-scoring trees. We therefore revised our approach to encompass tree scoring methods that represented best current practice (although at high computational cost). The results shown in this paper are primarily based on two types of analysis: first, extensive ML evaluation with a limited number of operational taxonomic units (OTUs) and application of a sophisticated scoring scheme; and second, the BMCMC method. The results obtained initially and by the two subsequent methods were all closely equivalent.

From considerations of tree scoring and analytical thoroughness we wished to examine sets of possible trees that were as complete as practicable. However, for feasibility in ML computations we could process exhaustively tree sets based on at most seven distinct OTUs. Virus species were therefore lumped into OTUs in cases that we regarded as uncontentious, based on previous analyses. The OTUs defined are illustrated by Fig. 1(a), which shows a tree for the 8x12 set derived as a starting point by the simple clustering method of neighbour-joining with bootstrap analysis. For ML analyses, the three γ1 viruses (CHV3, EBV and RLV) were assigned to a single OTU with the NWP virus CHV3 as sister group to the two OWP viruses; the γ2 OWP viruses HHV8 and Rhesus rhadinovirus (RRV) were assigned to one OTU, as were the NWP viruses herpesvirus ateles (HVA) and herpesvirus saimiri (HVS), and the artiodactyl viruses Alcelaphine herpesvirus 1 (AHV1) and PLHV1. This gave a total of seven OTUs, for which 945 bifurcating unrooted trees are possible. We carried out a retrospective check on the validity of using these OTUs, as described below.
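
The figure of 945 follows from the standard count of bifurcating unrooted trees for n labelled OTUs, (2n - 5)!! = 3 × 5 × ... × (2n - 5). The short Python sketch below is our illustration, not part of the software used; it reproduces the counts for the seven-OTU and six-OTU analyses and shows why lumping was needed, since the 12 species left ungrouped would give 654 729 075 candidate trees.

```python
def n_unrooted_trees(n_taxa):
    """Number of bifurcating unrooted trees for n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are required")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (6, 7, 12):
    print(n, n_unrooted_trees(n))
# 6 105
# 7 945
# 12 654729075
```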



Fig. 1. Phylogenetic trees for the 8x12 alignment set. (a) Neighbour-joining tree. Bootstrap percentages for internal branches and divergence scale (amino acid substitutions per site) are shown. This is an unrooted tree, but with the γ1 group (EBV, RLV and CHV3) drawn to show it as outgroup for the γ2 group and thus root the γ2 subtree. The branches for species assigned as single OTUs in subsequent ML analyses are shown as heavy lines. (b–g) Top-scoring ML trees. The six top trees by the AU and BP tests are shown, with AU, BP and ΔlnL scores for each; the highest likelihood tree is (b). Trees (b) and (c) had Bayesian PPs of 0·464 and 0·534, respectively, and all other trees had PP ≤ 0·001. Trees (b–g) are unrooted and their divergence scale is at lower left.

 
ML evaluation of all 945 trees was carried out with the CODEML program of the PAML package, with a discrete gamma distribution of rates across sites. Output from CODEML was then processed with the CONSEL software to compute for each tree the value for the AU test and the multi-scaled bootstrap proportion. Confidence sets were formed from the trees for which a test measure (AU or BP) was greater than 0·05. The AU-derived set contained 16 trees and the BP set contained six trees that were a subset of the AU set; Fig. 1(b)–(g) depict the trees from the BP set. The four trees with highest likelihood values from CODEML also had the highest AU and BP scores. Separately, BMCMC analysis of the 8x12 alignment with the MrBayes program gave a confidence set (PP>0·01) which contained only two trees, corresponding to the top-scoring two trees in the AU and BP sets (Fig. 1b and c). Thus, the AU, BP and PP measures identified consistent sets of trees, but with the AU set the largest and the PP set the smallest.
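
Forming the confidence sets amounts to a simple filter on the per-tree values reported by CONSEL. The Python sketch below assumes those values have already been parsed into a dictionary (the tree identifiers and scores are hypothetical) and also checks the relationship observed here, that the BP set is contained within the AU set.

```python
def confidence_sets(scores, alpha=0.05):
    """scores: {tree_id: (au_p, bp)}; returns (AU set, BP set) of trees above alpha."""
    au_set = {tree for tree, (au, _) in scores.items() if au > alpha}
    bp_set = {tree for tree, (_, bp) in scores.items() if bp > alpha}
    return au_set, bp_set


# Hypothetical CONSEL values for four candidate trees.
scores = {"t1": (0.62, 0.41), "t2": (0.55, 0.33), "t3": (0.21, 0.04), "t4": (0.03, 0.01)}
au_set, bp_set = confidence_sets(scores)
print(sorted(au_set), sorted(bp_set))   # ['t1', 't2', 't3'] ['t1', 't2']
print(bp_set <= au_set)                 # True
```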

Inspection of the trees in Fig. 1 (b–g) showed that they can all be regarded as based on a constant 10-species tree (with the topology of that shown in Fig. 2a) but with Bovine herpesvirus 4 (BHV4) and MHV4 each appearing at a variety of loci on this tree. The 10 trees in the AU set that are not included in Fig. 1 also conformed to this description. We therefore adopted, as a working hypothesis to facilitate further analysis, the view that inability to identify a single best tree reflected some property of the BHV4 and MHV4 sequences. BHV4 and MHV4 entries were then removed from the 8x12 alignment and the resulting 8x10 dataset evaluated with CODEML/CONSEL and MrBayes. This analysis gave confidence sets containing only the single tree shown in Fig. 2(a), and we now refer to this as the ‘standard’ 10-species tree. At this point we carried out a check on the validity of our use of OTUs containing more than one species in the analysis described above: BHV4 and MHV4 were added back to the standard tree topology independently at every possible locus, to give a set of 323 trees for 12 species (so that trees with BHV4 and MHV4 within loci corresponding to the previous multiple-species OTUs were now included). This set of trees was then evaluated with CODEML/CONSEL, but no novel trees emerged in the resulting confidence sets, indicating that the use of multiple-species OTUs had been in order.
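
The figure of 323 trees can be checked by counting attachment points. A bifurcating unrooted tree with n leaves has 2n - 3 branches, so BHV4 can be attached to any of the 17 branches of the 10-species standard tree, after which MHV4 can be attached to any of the 19 branches of each resulting 11-species tree, giving 17 × 19 = 323. The Python sketch below performs only this count; it is not TREEADDER and does not generate the topologies themselves.

```python
def n_branches(n_leaves):
    """Branches in a bifurcating unrooted tree with n leaves."""
    return 2 * n_leaves - 3


def n_trees_after_adding(base_leaves, added):
    """Trees obtained by attaching `added` taxa, one at a time, on every branch."""
    total, leaves = 1, base_leaves
    for _ in range(added):
        total *= n_branches(leaves)
        leaves += 1
    return total


print(n_trees_after_adding(10, 1))   # 17  (one added taxon)
print(n_trees_after_adding(10, 2))   # 323 (BHV4 and MHV4 together)
```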



Fig. 2. Phylogenetic trees for 8x10 and 8x11 alignment sets. (a) Shows the single top ML tree for the 8x10 set, as for Fig. 1. (b) Summarizes investigations of a set of trees with BHV4 added at every possible locus to the 8x10 tree topology. The figure shows tree topology for the γ2 subtree only. For each tree in the confidence set AU>0·05, the top BP scores are listed at the branch corresponding to the BHV4 node. The figure above each branch is for CODEML evaluation with allowance for variation in rates across sites, and the figure below each branch for CODEML evaluation with a uniform rate imposed. (c) Shows the equivalent analysis for MHV4.

 
We next carried out analyses that excluded singly either MHV4 or BHV4. Each such 8x11 set contained six OTUs (defined as before) and thus 105 possible bifurcating unrooted trees. ML and BMCMC analyses were carried out as before. In addition, for comparative purposes analyses were run that were equivalent except that a uniform rate was imposed for all sites. Results showing BP values are summarized in Fig. 2(b) and (c). For BHV4 (Fig. 2b), the BP values indicate a weak leaning to BHV4 being a sister group to HHV8/RRV; conservatively, they are to be taken as showing that the branching order for BHV4, HHV8/RRV and HVA/HVS is not resolved. BMCMC modelling (both uniform and variable rate), however, gave strong support for BHV4 as sister to HHV8/RRV (PP=1·000). For MHV4 (Fig. 2c), there is support for possible MHV4 nodes deeper into the tree than with BHV4. The situation is more complicated with MHV4, since the distribution of BP values differs between the analyses with uniform and variable substitution rate across sites: the more elaborate modelling with rate variation gives higher BP values toward the HHV8/RRV clade. BMCMC gave a similar trend, with PP values for MHV4 as a sister of HHV8/RRV of 0·023 and 0·680 in the uniform and variable rate analyses, respectively.

We interpret these findings as follows. First, the variability in the BHV4 locus is now seen to represent a straightforward case of very closely spaced nodes for BHV4, HHV8/RRV and HVA/HVS, rather than indicating some property truly specific to BHV4. Second, the results for the MHV4 locus vary markedly with the analytical method. Referring back to Fig. 1(a), it can be seen that the simplest method employed, the neighbour-joining algorithm, gives the deepest node for MHV4, while the two CODEML analyses (Fig. 2c) push the MHV4 node toward the HHV8/RRV clade, the more elaborate version more so. We take this correlation with analytical refinement to indicate that the MHV4 sequences possess some characteristic, relative to those of the other species, that requires a superior modelling process for an optimal outcome, and we presume that the problem lies in an atypically high rate of substitution in MHV4 sequences, as evidenced by the long terminal MHV4 branch seen in all trees. Finally, these two factors (closely spaced nodes and idiosyncratic data) have overlapping ranges of action, so that in the analyses of Fig. 1 they together effectively obscured phylogenetic inference.

The ML trees shown for the 8x12 set in Fig. 1 are, except for the low-scoring tree in part (g), compatible with treating the HHV8/RRV, HVA/HVS, BHV4 and MHV4 lineages as branching in indistinguishable order from a single multifurcation. The ML tree with this multifurcation is shown in Fig. 4(a) (in rooted form, as described below). We regard this as a conservative, well-justified representation of the relationships among γ2 lineages. For further discussion we refer to the clade containing HHV8/RRV, HVA/HVS, BHV4 and MHV4 as the MF (for ‘multifurcated’) clade. We defer until after the analysis of other alignment sets the question of whether the BMCMC analysis, with its apparently more focused confidence sets, might yield a better resolved tree.



Fig. 4. Multifurcated trees. (a) Presents the 8x12 ML tree with a quadrifurcated clade and with root locus indicated, as discussed in the text. (b) Shows a comparison of corresponding branch lengths between branches of the 8x12 multifurcated tree (x axis) and those in the gammaherpesvirus portion of a tree derived for all three subfamilies (y axis) (data from McGeoch et al., 2000); the line represents the best fit constrained to pass through the origin. (c) Presents the ‘problem’ region from the consensus tree for the 8x12 set derived by BMCMC; cumulative PP=0·998 (two component trees). (d) Gives the equivalent portion of the consensus tree for the 21x12 set derived by BMCMC; PP=0·978 (single tree). The divergence scale is for all three trees.

 
Characteristics of gammaherpesvirus trees based on specific genomic regions
We were interested to ascertain whether the phylogenetic view derived with the 8-gene set was valid across the virus genomes. It might be, for instance, that the phenomenon of apparent accelerated change in MHV4 sequences is specific to certain genes or that recombination between lineages has occurred. For studying relationships across the virus genomes (which are collinear in their order of common genes), the HVS sequence was chosen as the reference for genomic loci. Alignments of 27 of the 28 sets of potentially usable genes were assigned to eight groups, designated 10K–80K, that were defined by the locations of genes in successive 10 kbp segments of the HVS genome; gene 29, which has two widely separated exons, was omitted. The genes in each group are listed in Table 2. The concatenated alignments for groups 10K through 80K contained, successively, 2174, 909, 1318, 1722, 1585, 1665, 1251 and 1212 aa residues. The 70K and 80K groups lacked data for PLHV1. This sampling process represented sequences ranging from nt 12584 to nt 86491 within the 112930 nt unique portion of the HVS genome.
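
The grouping rule can be expressed compactly. The Python sketch below assumes that each gene is assigned by a single HVS genome coordinate (here its start position, an assumption, since the text does not say which coordinate was used) and that each group is named by the lower bound, in kbp, of its 10 kbp window; under that reading, the first and last sampled positions (nt 12584 and 86491) fall in the 10K and 80K groups. The gene names and coordinates in the example are hypothetical.

```python
def region_label(position_nt, window_nt=10_000):
    """Name of the 10-kbp window containing a genome position, e.g. 12584 -> '10K'."""
    if position_nt < 1:
        raise ValueError("genome positions are 1-based")
    lower_kbp = (position_nt // window_nt) * (window_nt // 1000)
    return f"{lower_kbp}K"


def group_genes(gene_starts):
    """Group genes by the window containing their (assumed) start coordinate."""
    groups = {}
    for gene, start in sorted(gene_starts.items(), key=lambda item: item[1]):
        groups.setdefault(region_label(start), []).append(gene)
    return groups


# Hypothetical gene start coordinates on the HVS genome.
example = {"geneA": 12584, "geneB": 17020, "geneC": 86491}
print(group_genes(example))   # {'10K': ['geneA', 'geneB'], '80K': ['geneC']}
```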

Each alignment was analysed by CODEML/CONSEL (with test sets of 945 trees as above) and by MrBayes. These exercises are summarized in Fig. 3 in the condensed form of majority-rule consensus trees from the BMCMC analyses. With one exception, the trees in Fig. 3 show a similar condition to those in Fig. 1 in that they have the topology of the standard 10-species tree (as defined above) plus a variety of loci for BHV4 and MHV4. The exception, Fig. 3(f), is for the 60K group, and here the locations of nodes for Equid herpesvirus 2 (EHV2) and HVA/HVS/BHV4 are reversed relative to the standard tree; we note, however, that this atypical configuration is associated with relatively low support in the PP distribution, and accordingly consider that the result for the 60K group can be discounted. In addition, all the trees in Fig. 3 show a long terminal branch for MHV4. We interpret these results as indicating that, despite considerable noise, the smaller datasets show general consistency with the 8x12 set both in tree topology and in uniquely high level of substitution for the MHV4 terminal branch, so that these features should be taken as conserved across the genomes.
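
Majority-rule consensus trees of the kind summarized in Fig. 3 record, for each internal branch, the proportion of sampled trees containing the corresponding bipartition of species, retaining only bipartitions present in more than half of the samples. A minimal Python sketch of that rule follows, assuming each sampled tree has already been reduced to its set of internal-branch bipartitions; it illustrates the rule and is not the MrBayes implementation, and the five-taxon input is hypothetical.

```python
from collections import Counter


def canonical_split(side, all_taxa, ref):
    """Represent a bipartition by whichever side excludes the reference taxon."""
    side = frozenset(side)
    return frozenset(all_taxa) - side if ref in side else side


def majority_rule_splits(trees, all_taxa, ref):
    """trees: list of sampled trees, each an iterable of splits (one side per
    internal branch). Returns {split: frequency} for splits in > 50 % of trees."""
    counts = Counter()
    for tree in trees:
        counts.update({canonical_split(s, all_taxa, ref) for s in tree})
    n = len(trees)
    return {split: c / n for split, c in counts.items() if c / n > 0.5}


taxa = {"A", "B", "C", "D", "E"}
samples = [
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"C", "E"}],
    [{"A", "B"}, {"D", "E"}],
]
result = majority_rule_splits(samples, taxa, ref="A")
for split, freq in sorted(result.items(), key=lambda item: -len(item[0])):
    print(sorted(split), round(100 * freq))
# ['C', 'D', 'E'] 100
# ['D', 'E'] 67
```

Splits retained by the majority rule are always mutually compatible, which is why they can be assembled into a single consensus tree; percentages below 100 correspond to the values annotated in Fig. 3.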



Fig. 3. Gammaherpesvirus consensus trees based on specific genomic regions. Majority-rule consensus trees from BMCMC analyses are presented for the eight regional datasets 10K to 80K (as listed in Table 2) in (a–h) consecutively. Only the γ2 region is shown for each tree, rooted by the branch to the γ1 part. The percentage representation in the PP distribution of trees for the two groups of species defined by each internal branch is indicated in those cases where the value is less than 100 %. Divergence scale for all trees is at lower right.

 
Optimal representation of phylogenetic relationships
The overall consistency in trees obtained from sampling across the genomes gave confidence in the validity of using datasets incorporating alignments from many genes to examine phylogeny. In addition to the 8x12 set described above, we examined sets based on 21 genes for all 12 virus species (the 21x12 set, of 9920 aa residues from all the genes in Table 2 except genes 54–62) and on 28 genes for 11 species (the 28x11 set, lacking PLHV1, of 12383 aa residues from all the genes in Table 2). These were evaluated by ML and BMCMC methods as for the 8x12 set, with closely comparable outcomes (not shown): in each case trees in confidence sets were composed of the standard tree (or its 9-species counterpart for the 28x11 set) with variable loci for BHV4 and MHV4, similar to the results for the 8x12 tree in Fig. 1(b–g). For both the 21x12 and 28x11 sets, the ML result likewise reduced to a tree containing the MF clade, as for the 8x12 set. The Bayesian PP confidence sets typically have fewer members than those based on AU or BP figures from ML, and with both the 8x12 and 21x12 sets (but not the 28x11 set) BMCMC gave consensus trees with higher resolution than the quadrifurcated version; Fig. 4(c) and (d) show the relevant parts of the trees, for the species of the MF clade. However, the additional details are not compatible between the two trees, and while the 8x12 set is the smaller it also contains the genes best conserved across the virus family, so we concluded that these BMCMC analyses were displaying ‘overconfidence’ and should be discounted. We thus regard the quadrifurcated tree as presenting the best attainable depiction of topology.

CODEML was used to compute ML branch lengths for the quadrifurcated topology, for all three datasets. The lengths of corresponding branches in these 21x12 and 28x11 trees were compared to those of the 8x12 tree. In both cases proportions were found to be closely maintained across the trees; relative to the 8x12 tree branch lengths in the 21x12 and 28x11 trees are expanded by overall factors of 1·21 and 1·25, respectively (estimated by regression). Thus the 8x12 set, versions of which have previously been used to examine phylogeny across the Herpesviridae, gives a representation of relationships that is consistent within the Gammaherpesvirinae with that from the largest accessible gene sets.
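
The expansion factors can be estimated as the slope of a least-squares line through the origin relating corresponding branch lengths, the same form of fit used for Fig. 4(b). The Python sketch below assumes this through-origin form (the text states only that regression was used) and adds an uncentred R² as a simple check on proportionality, which is our addition rather than a statistic reported here; the branch lengths in the example are hypothetical.

```python
def through_origin_fit(x, y):
    """Least-squares slope b for y ~ b * x, plus uncentred R^2 = 1 - SSE / sum(y^2)."""
    if len(x) != len(y) or not x:
        raise ValueError("need paired, non-empty branch-length lists")
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / sum(yi * yi for yi in y)
    return b, r2


# Hypothetical corresponding branch lengths (substitutions per site).
branches_8x12 = [0.05, 0.12, 0.30, 0.44]
branches_21x12 = [0.061, 0.145, 0.362, 0.534]
factor, r2 = through_origin_fit(branches_8x12, branches_21x12)
print(round(factor, 2), round(r2, 3))
```

Constraining the line through the origin reflects the expectation that a branch of zero length in one tree should also have zero length in the other, so a single multiplicative factor summarizes the comparison.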

The trees discussed so far are unrooted. From previous phylogenetic analyses that included data from the Alpha-, Beta- and Gammaherpesvirinae we know that the root for the gammaherpesvirus tree lies between the γ1 and γ2 groups (McGeoch et al., 1995, 2000). When we compared the 8x12 ML quadrifurcated tree (Fig. 4a) with the rooted gammaherpesvirus portion of a ML tree for all three subfamilies based on sequences from seven genes and one partial gene of 20 species (data from McGeoch et al., 2000), we observed a close proportionality between pairs of corresponding branch lengths, as illustrated in Fig. 4(b). This overall conservation in proportions then allowed transfer of the root position to the 8x12 tree with good precision, as indicated in Fig. 4(a). With the addition of the root locus, Fig. 4(a) gives our current optimal representation of gammaherpesvirus phylogeny.

Coevolution of host and gammaherpesvirus lineages
As mentioned, mammalian herpesvirus lineages show extensive signs of apparent coevolution with host lineages. Interpretation on this basis has been least satisfactory for the Gammaherpesvirinae, in part because of uncertainties in the gammaherpesvirus tree (McGeoch et al., 2000; McGeoch, 2001). We revisited this topic in light of our now robust assignment of major lineages in the tree, to examine what correlations of branching pattern and branch proportions could be made between the host and virus trees, guided by criteria of parsimony and generality. We have identified a scheme that provides an economical account of major features of the gammaherpesvirus tree in terms of host–virus coevolution, and is also consistent with features in the trees for alpha- and betaherpesviruses.

Fig. 5(a) depicts a tree for the higher taxa of relevant host groups, with timescale [as millions of years before the present (Ma)] based primarily on Springer et al. (2003), and Fig. 5(b) presents the gammaherpesvirus tree in a format intended to emphasize proposed correspondences with the host tree. We interpret the following as being of coevolutionary origin: the OWP and NWP γ1 lineage, the ruminant AHV1 and suid PLHV1 lineages, and the perissodactyl EHV2 lineage. Each of these is known to be populated, beyond the few species represented in our analyses, with other viruses that have hosts in the same group (Rovnak et al., 1998; Ehlers et al., 1999, 2003; Kleiboeker et al., 2002). The only portion of the tree in Fig. 4(a) not thus accounted for is the MF clade, and we interpret this as having a non-coevolutionary origin, involving transfer between host species. The OWP and NWP γ2 virus lineages could, however, have arisen in a coevolutionary mode during subsequent development of the MF clade. Treating the MF clade as non-coevolutionary in origin efficiently rationalizes a number of features, namely: the position of its origin in the tree; the existence of two groupings of primate gammaherpesviruses; the occurrence of BHV4 in a separate location from other ruminant viruses; and the non-correspondence between the location of MHV4 and the location in the host tree of the rodent lineage (shown in Fig. 5a as a grey line).



Fig. 5. Comparison of host and gammaherpesvirus trees. (a) Shows a tree for major lineages of the Mammalia that appear as hosts in the gammaherpesvirus analyses, with timescale based on Springer et al. (2003) and Kumar & Hedges (1998). The branches for Rodentia and Carnivora, shown as light lines, are secondary in the text discussion. (b) ML molecular clock tree for the 21x12 set, drawn with its branch order and proportions on the page chosen to emphasize correspondences with the host tree in (a). Each branch shown in the virus tree is proposed to have a coevolutionary relationship with the matching branch in the host tree, except for the lightly drawn branch of the MF clade (HHV8, etc.).

 
We assigned a timescale to this interpretation of the gammaherpesvirus tree by comparison with palaeontological dates for nodes in the mammalian tree. The following dates were taken from Springer et al. (2003): ruminant/pig divergence, 63·8 Ma (for AHV1/PLHV1 node); artiodactyl/perissodactyl divergence, 82·1 Ma (for divergence of AHV1/PLHV1 from EHV2); and primate/ungulate divergence, 94·0 Ma (for divergence of primate γ1 viruses from AHV1/PLHV1/EHV2). Two methods were used with these figures to obtain divergence dates for other nodes in the quadrifurcated tree, for both the 8x12 and 21x12 datasets. In the first, a ML molecular clock version of the tree was computed with CODEML, then a substitution rate for the whole tree was calculated by regression using the host dates and applied to dating other nodes. In the second method, the program r8s was used, which takes as input a rooted tree with branch lengths and smooths differences among rates on different branches. Results for both approaches are summarized in Table 3.


Table 3. Date estimates for nodes in the gammaherpesvirus tree

 

   DISCUSSION
We employed two approaches for inferring phylogeny from alignments of amino acid sequences; both are powerful and state of the art, but both have limitations. For ML evaluation of trees, it is not feasible to assess exhaustively the set of possible trees for more than a modest number of OTUs. Modern tests for evaluating ML results have supplanted the earlier practice of ranking trees by likelihood alone, and we found that for inferring confidence sets of trees the multi-scaled bootstrap was a more useful criterion than the AU test. With BMCMC there are two classes of reservation. First, the method has complex theoretical underpinnings and is still new to phylogenetic inference, so best practice may take time to stabilize; see discussion by Felsenstein (2004). Second, there is a developing literature on whether the posterior probability (PP) distributions that the method produces may present an overconfident view (Suzuki et al., 2002; Alfaro et al., 2003; Douady et al., 2003). It was notable in our analyses that the Bayesian PP confidence sets were typically more focused than those from ML. Thus, apparently high-confidence BMCMC results should be treated with caution, as illustrated by our comparison of results for the 8x12 and 21x12 datasets. Nonetheless, applying two distinct methodologies gave valuable added assurance to our analyses.
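Two points in the paragraph above can be made concrete with a short sketch. First, the number of distinct unrooted, fully bifurcating topologies for n OTUs is (2n − 5)!!, which is why exhaustive ML evaluation becomes impractical beyond modest n. Second, one common way to form a Bayesian confidence (credible) set is to rank sampled topologies by their posterior frequency and accumulate them until a chosen level, such as 95%, is reached. The topology sample below is invented for illustration, and this is not necessarily how MrBayes or the authors constructed their sets.

```python
from collections import Counter
from math import prod


def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies for n_taxa >= 3: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # Product of the odd numbers 1, 3, ..., 2n - 5.
    return prod(range(1, 2 * n_taxa - 4, 2))


def credible_set(sampled_topologies, level=0.95):
    """Topologies taken in order of decreasing posterior frequency until their
    cumulative frequency reaches `level`. Returns (topology, frequency) pairs."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    chosen, cumulative = [], 0.0
    for topology, count in counts.most_common():
        frequency = count / total
        chosen.append((topology, frequency))
        cumulative += frequency
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    # 12 OTUs, as in the 3 gamma-1 + 9 gamma-2 species analysed here.
    print(num_unrooted_binary_trees(12))  # 654729075
    # Hypothetical post-burn-in MCMC sample, topologies as placeholder labels.
    sample = ["T1"] * 70 + ["T2"] * 20 + ["T3"] * 6 + ["T4"] * 4
    for topology, freq in credible_set(sample):
        print(topology, round(freq, 2))
```

With 12 OTUs there are already about 6.5 × 10^8 unrooted topologies, so ML comparisons in practice are restricted to a chosen candidate set.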

The gammaherpesvirus tree obtained from these exhaustive analyses is fully compatible with that from our previous work (McGeoch et al., 2000; McGeoch, 2001); it adds four virus species, offers assurance of robustness and generality, and gives improved insight into underlying complications. Disentangling the phylogeny of the γ2 group presented difficulties; recognizing that the nodes for HHV8/RRV, HVA/HVS and BHV4 are effectively coincident, and that the position of MHV4 poses a special problem, were significant steps toward understanding relationships. The overall phylogeny appears to be constant across the genomes, but trees based on small sets of genes were generally noisy and the larger datasets were required for optimal resolution. Given the overall constancy of tree topology and proportions seen with the large datasets, we are confident that the optimal tree presented in Fig. 4(a) is well founded. We regard the tree as comprising four major clades: the γ1 group, the AHV1/PLHV1 lineage, the EHV2 lineage and the MF clade. Limited sequence data are available for many other gammaherpesviruses, but we know of none that demonstrably falls outside these clades.

It is clear that MHV4 sequences have been accumulating changes atypically fast. This effect is not seen with another rodent herpesvirus, the betaherpesvirus murine cytomegalovirus (McGeoch et al., 2000). The observed relative rate of change for MHV4 is, of course, an average over the time since the MHV4 lineage diverged from its sister groups. We cannot study substitution characteristics of MHV4 DNA directly with the available data: two genome sequences for MHV4 are available (Table 1), but they are of the same strain and almost identical (Nash et al., 2001), so comparing them is not informative. The effect has probably operated across the genome, rather than reflecting selected mutation at one or a few loci, as seen with the K1 gene of HHV8 (McGeoch, 2001). It could result from relaxed stringency in genome replication, from more frequent cycles of replication than in other viruses of the group, or from larger virus populations (with ongoing recombination). Whatever its mechanism, this phenomenon may reflect some significant underlying difference between the biology of MHV4 and that of other γ2 herpesviruses.
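The point that an observed rate is an average over the whole lineage can be illustrated with a toy calculation. Because two sister lineages descend from the same node, the elapsed time cancels, and the ratio of their branch lengths gives only the ratio of their mean rates; quite different rate histories can produce identical branch lengths. The rate histories below are hypothetical and are not estimates for MHV4.

```python
"""Toy illustration: branch length reflects only the time-averaged rate."""


def branch_length(history):
    """Expected substitutions per site along a lineage whose history is given as
    (duration_myr, rate_per_myr) segments."""
    return sum(duration * rate for duration, rate in history)


def mean_rate(history):
    """Time-averaged rate over the whole lineage."""
    total_time = sum(duration for duration, _ in history)
    return branch_length(history) / total_time


def relative_rate(focal_length, sister_length):
    """Ratio of mean rates for sister lineages; elapsed time cancels because
    both lineages span the same interval since their common node."""
    return focal_length / sister_length


if __name__ == "__main__":
    # Hypothetical 60 Myr histories for a fast-evolving focal lineage.
    steady = [(60.0, 0.02)]                # constant elevated rate
    burst = [(40.0, 0.01), (20.0, 0.04)]   # typical rate, then a late acceleration
    sister = [(60.0, 0.01)]                # typical rate in a sister lineage
    for name, history in (("steady", steady), ("burst", burst)):
        ratio = relative_rate(branch_length(history), branch_length(sister))
        print(name, round(branch_length(history), 3), round(ratio, 2))
```

Both hypothetical histories give the same branch length and the same twofold relative rate, so branch lengths alone cannot distinguish a steadily faster lineage from one that accelerated at some point.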

Our date estimates for the gammaherpesvirus tree based on host–virus coevolution (Table 3) are likely to be more precise for γ2 than for γ1 nodes with the calibration system applied. In the γ1 clade, it has previously been apparent that EBV and RLV may have diverged more recently than the 23 Ma expected for cospeciation (Ehlers et al., 2003; Gerner et al., 2004). Conversely, in the γ2 group, HHV8 and RRV are more distantly related than expected for cospeciating counterparts (Greensill et al., 2000; Schultz et al., 2000). These examples emphasize that the concept of host–herpesvirus coevolution should not be pursued down to the species level. If the coevolution hypothesis is valid for higher taxa and longer timescales, then the estimates indicate considerable antiquity for the development of the MF clade, close to the end of the Cretaceous (65 Ma). We presume that the MF clade originated by transfer of a virus between host species. The clade's immediate ancestor could have been a virus with a perissodactyl or carnivore host (see the location of the carnivore lineage in the host tree, shown in Fig. 5a as a grey line). Involvement of a carnivore gammaherpesvirus ancestor would be consistent with our estimates for the development of the MF clade (Table 3) relative to the date for divergence of the perissodactyl and carnivore lines (80·4 Ma; Springer et al., 2003).


   ACKNOWLEDGEMENTS
 
We thank M. Goltz and F. Wang for early sight of data and A. Davison for a critical reading of the manuscript. This work was funded by the UK Medical Research Council.


   REFERENCES
Adachi, J. & Hasegawa, M. (1994). The MOLPHY 2.2 package. Institute of Statistical Mathematics, Tokyo.

Albrecht, J.-C. (2000). Primary structure of the Herpesvirus ateles genome. J Virol 74, 1033–1037.

Albrecht, J.-C., Nicholas, J., Biller, D. & 8 other authors (1992). Primary structure of the herpesvirus saimiri genome. J Virol 66, 5047–5058.

Alexander, L., Denekamp, L., Knapp, A., Auerbach, M. R., Damania, B. & Desrosiers, R. C. (2000). The primary sequence of rhesus monkey rhadinovirus isolate 26-95: sequence similarities to Kaposi's sarcoma-associated herpesvirus and rhesus monkey rhadinovirus isolate 17577. J Virol 74, 3388–3398.

Alfaro, M. E., Zoller, S. & Lutzoni, F. (2003). Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence. Mol Biol Evol 20, 255–266.

Baer, R., Bankier, A. T., Biggin, M. D. & 9 other authors (1984). DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, 207–211.

Banks, M., King, D. P., Daniells, C., Stagg, D. A. & Gavier-Widen, D. (2002). Partial characterization of a novel gammaherpesvirus isolated from a European badger (Meles meles). J Gen Virol 83, 1325–1330.

Davison, A. J. (2002). Evolution of the herpesviruses. Vet Microbiol 86, 69–88.

de Jesus, O., Smith, P. R., Spender, L. C., Karstegl, C. E., Niller, H. H., Huang, D. & Farrell, P. J. (2003). Updated Epstein–Barr virus (EBV) DNA sequence and analysis of a promoter for the BART (CST, BARF0) RNAs of EBV. J Gen Virol 84, 1443–1450.

Douady, C. J., Delsuc, F., Boucher, Y., Doolittle, W. F. & Douzery, E. J. P. (2003). Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol 20, 248–254.

Ehlers, B. & Lowden, S. (2004). Novel herpesviruses of Suidae: indicators for a second genogroup of artiodactyl gammaherpesviruses. J Gen Virol 85, 857–862.

Ehlers, B., Ulrich, S. & Goltz, M. (1999). Detection of two novel porcine herpesviruses with high similarity to gammaherpesviruses. J Gen Virol 80, 971–978.

Ehlers, B., Ochs, A., Leendertz, F., Goltz, M., Boesch, C. & Matz-Rensing, K. (2003). Novel simian homologues of Epstein-Barr virus. J Virol 77, 10695–10699.

Ensser, A., Pflanz, R. & Fleckenstein, B. (1997). Primary structure of the alcelaphine herpesvirus 1 genome. J Virol 71, 6517–6525.

Ensser, A., Thurau, M., Wittmann, S. & Fickenscher, H. (2003). The genome of herpesvirus saimiri C488 which is capable of transforming human T cells. Virology 314, 471–487.

Felsenstein, J. (1989). PHYLIP – phylogeny inference package (version 3.2). Cladistics 5, 164–166.

Felsenstein, J. (2004). Inferring Phylogenies. Sunderland, Massachusetts: Sinauer Associates.

Gerner, C. S., Dolan, A. & McGeoch, D. J. (2004). Phylogenetic relationships in the Lymphocryptovirus genus of the Gammaherpesvirinae. Virus Res 99, 187–192.

Goltz, M., Ericsson, T., Patience, C., Huang, C. A., Noack, S., Sachs, D. H. & Ehlers, B. (2002). Sequence analysis of the genome of porcine lymphotropic herpesvirus 1 and gene expression during posttransplant lymphoproliferative disease of pigs. Virology 294, 383–393.

Greensill, J., Sheldon, J. A., Renwick, N. M., Beer, B. E., Norley, S., Goudsmit, J. & Schulz, T. F. (2000). Two distinct gamma-2 herpesviruses in African green monkeys: a second gamma-2 herpesvirus lineage among old world primates? J Virol 74, 1572–1577.

Jones, D. T., Taylor, W. R. & Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. CABIOS 8, 275–282.

Kleiboeker, S. B., Schommer, S. K., Johnson, P. J., Ehlers, B., Turnquist, S. E., Boucher, M. & Kreeger, J. M. (2002). Association of two newly recognized herpesviruses with interstitial pneumonia in donkeys (Equus asinus). J Vet Diagn Invest 14, 273–280.

Kumar, S. & Hedges, S. B. (1998). A molecular timescale for vertebrate evolution. Nature 392, 917–920.

Lacoste, V., Mauclere, P., Dubreuil, G., Lewis, J., Georges-Courbot, M. C. & Gessain, A. (2000). KSHV-like herpesviruses in chimps and gorillas. Nature 407, 151–152.

McGeoch, D. J. (2001). Molecular evolution of the γ-Herpesvirinae. Philos Trans R Soc Lond B Biol Sci 356, 421–435.

McGeoch, D. J. & Cook, S. (1994). Molecular phylogeny of the Alphaherpesvirinae subfamily and a proposed evolutionary timescale. J Mol Biol 238, 9–22.

McGeoch, D. J. & Gatherer, D. (2005). Integrating reptilian herpesviruses into the family Herpesviridae. J Virol in press, January 2005.

McGeoch, D. J., Cook, S., Dolan, A., Jamieson, F. E. & Telford, E. A. R. (1995). Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses. J Mol Biol 247, 443–458.

McGeoch, D. J., Dolan, A. & Ralph, A. C. (2000). Toward a comprehensive phylogeny for mammalian and avian herpesviruses. J Virol 74, 10401–10406.

Minson, A. C., Davison, A., Eberle, R., Desrosiers, R. C., Fleckenstein, B., McGeoch, D. J., Pellett, P. E., Roizman, B. & Studdert, M. J. (2000). Family Herpesviridae. In Virus Taxonomy, pp. 203–225. Edited by M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, E. B. Carstens, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle & R. B. Wickner. New York, San Diego: Academic Press.

Nash, A. A., Dutia, B. M., Stewart, J. P. & Davison, A. J. (2001). Natural history of murine γ-herpesvirus infection. Philos Trans R Soc Lond B Biol Sci 356, 569–579.

Neipel, F., Albrecht, J. C. & Fleckenstein, B. (1997). Cell-homologous genes in the Kaposi's sarcoma-associated rhadinovirus human herpesvirus 8: determinants of its pathogenicity? J Virol 71, 4187–4192.

Rivailler, P., Cho, Y. G. & Wang, F. (2002a). Complete genomic sequence of an Epstein-Barr virus-related herpesvirus naturally infecting a new world primate: a defining point in the evolution of oncogenic lymphocryptoviruses. J Virol 76, 12055–12068.

Rivailler, P., Jiang, H., Cho, Y. G., Quink, C. & Wang, F. (2002b). Complete nucleotide sequence of the rhesus lymphocryptovirus: genetic validation for an Epstein-Barr virus animal model. J Virol 76, 421–426.

Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.

Rovnak, J., Quackenbush, S. L., Reyes, R. A., Baines, J. D., Parrish, C. R. & Casey, J. W. (1998). Detection of a novel bovine lymphotropic herpesvirus. J Virol 72, 4237–4242.

Russo, J. J., Bohenzky, R. A., Chien, M.-C. & 8 other authors (1996). Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8). Proc Natl Acad Sci U S A 93, 14862–14867.

Sanderson, M. J. (2003). r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19, 301–302.

Schultz, E. R., Rankin, G. W., Jr, Blanc, M. P., Raden, C. C., Tsai, C. C. & Rose, T. M. (2000). Characterization of two divergent lineages of macaque rhadinoviruses related to Kaposi's sarcoma-associated herpesvirus. J Virol 74, 4919–4928.

Searles, R. P., Bergquam, E. P., Axthelm, M. K. & Wong, S. W. (1999). Sequence and genomic analysis of a rhesus macaque rhadinovirus with similarity to Kaposi's sarcoma-associated herpesvirus/human herpesvirus 8. J Virol 73, 3040–3053.

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Syst Biol 51, 492–508.

Shimodaira, H. & Hasegawa, M. (2001). CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.

Springer, M. S., Murphy, W. J., Eizirik, E. & O'Brien, S. J. (2003). Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc Natl Acad Sci U S A 100, 1056–1061.

Suzuki, Y., Glazko, G. V. & Nei, M. (2002). Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci U S A 99, 16138–16143.

Telford, E. A. R., Watson, M. S., Aird, H. C., Perry, J. & Davison, A. J. (1995). The DNA sequence of equine herpesvirus 2. J Mol Biol 249, 520–528.

Virgin, H. W. IV, Latreille, P., Wamsley, P., Hallsworth, K., Weck, K. E., Dal Canto, A. J. & Speck, S. H. (1997). Complete sequence and genomic analysis of murine gammaherpesvirus 68. J Virol 71, 5894–5904.

Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13, 555–556.

Zimmermann, W., Broll, H., Ehlers, B., Buhk, H. J., Rosenthal, A. & Goltz, M. (2001). Genome sequence of bovine herpesvirus 4, a bovine Rhadinovirus, and identification of an origin of DNA replication. J Virol 75, 1186–1194.

Received 6 September 2004; accepted 1 November 2004.