Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, UK
Author for correspondence: Paul Sharp. Fax +44 115 919 4424. e-mail paul{at}evol.nott.ac.uk
Abstract |
---|
Introduction |
---|
JCV has a circular double-stranded DNA genome just over 5 kb long. Until recently, the only full-length genome sequences available were those for the prototypic (Type 1) strain Mad-1 (Frisque et al., 1984), and GS/B and GS/K (Loeber & Dörries, 1988), the latter being two very similar Type 2 strains from a single individual. Most diversity studies have focused on a particular short region of the genome, about 610 bp in length, encompassing the convergent 3' ends of the VP1 and large T antigen genes, and 69 bp of intergenic sequence. However, recently H. T. Agostini, G. L. Stoner and colleagues have determined full-length genome sequences from a variety of strains (Agostini et al., 1996, 1997a, 1998a, b), and used these to investigate the phylogeny of JCV. Their most recent analysis, of 22 complete JCV genomes, resulted in some very surprising conclusions (Jobes et al., 1998). Three different phylogenetic algorithms were used: the unweighted pair group method with arithmetic means (UPGMA), the neighbour-joining method (NJ) and the maximum parsimony method (MP). Apparently very different results were obtained from the three methods. For example, a clade comprising JCV Type 1 and Type 4 strains was found (i) to be the earliest diverging lineage using UPGMA, (ii) to be the most recently derived lineage using NJ, and (iii) to span the root of the tree, so that all other strains were derived from within Type 1, using MP. While it is well known that alternative methods can produce different evolutionary trees (Li, 1997), this degree of discrepancy among results seems unprecedented, and potentially undermines any conclusions that might be drawn from phylogenetic studies of JCV.
Another apparent inconsistency among the results reported by Jobes et al. (1998) concerned the phylogenetic position of strain 402 (representing Type 4). It was stated that the MP method separated 402 from the Type 1 clade, whereas NJ and UPGMA did not. Since it had previously been suggested that strain 402 is most likely a recombinant of Types 1 and 3 (Agostini et al., 1996), Jobes et al. (1998) concluded that the MP method was effective in detecting this mosaicism, whereas the UPGMA and NJ methods were not. However, the phylogenetic analyses were performed on a single complete genome sequence alignment, and none of these three methods is expected to be able to distinguish recombinant sequences because they take no account of the relative positions of variable sites within the alignment. Indeed, if it is suspected that a dataset contains mosaic sequences, phylogenetic analyses should be performed using nonmosaic subregions from the alignment.
In the light of these curious results, we have analysed the diversity and evolutionary relationships among full-length JCV sequences. A variety of methods was used to search for any recombinant sequences. These revealed one, previously unidentified, strain (X01) that is clearly a mosaic of sequences from different major Types, but we found no substantive evidence suggesting that strain 402 is a recombinant. Importantly, our results on the phylogeny of JCV strains directly contradict those of the earlier study (Jobes et al., 1998) in many respects.
Methods |
---|
Comparative analyses.
DNA sequences were aligned using CLUSTAL W (Thompson et al., 1994). The extent of variation among sequences was examined in diversity plots. The observed sequence difference, between one sequence and a variety of others, was calculated for windows of 500 sites, moved in steps of 50. Numbers of nonsynonymous and synonymous nucleotide substitutions per site, with correction for multiple hits, were estimated by the method of Li (1993).
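The sliding-window calculation behind a diversity plot can be sketched as follows. This is a minimal illustration rather than the program used in the study: the function name `window_differences` is hypothetical, and skipping gapped columns is an assumption, since the text does not say how alignment gaps were treated.

```python
def window_differences(query, reference, window=500, step=50):
    """Proportion of differing sites between two aligned sequences, per window.

    Returns a list of (window_start, proportion_different) tuples.
    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        # Keep only columns where both sequences have a base.
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        diff = sum(a != b for a, b in pairs)
        results.append((start, diff / len(pairs) if pairs else float("nan")))
    return results
```

Plotting the second element of each tuple against the window start, for one sequence compared with several others, gives the kind of curve described above.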
Phylogenetic relationships among nucleotide sequences were estimated by the unweighted pair group method with arithmetic means (UPGMA), and the neighbour-joining (NJ), maximum parsimony (MP) and maximum likelihood (ML) methods (see Li, 1997). The UPGMA and NJ methods were applied to distances between pairs of sequences estimated by Kimura's 2-parameter method (Kimura, 1980). The NJ method (Saitou & Nei, 1987) was implemented using CLUSTAL W. To assess the reliability of branching orders within the phylogenetic trees obtained, 10000 bootstrap replicates (Felsenstein, 1985) were performed. The UPGMA, MP and ML methods were implemented using the NEIGHBOR, DNAPARS and DNAML programs from the PHYLIP package (Felsenstein, 1992).
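For readers unfamiliar with Kimura's 2-parameter correction, the distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the observed proportions of transitional and transversional differences. The sketch below illustrates that formula only; the function name `k2p_distance` is hypothetical, ignoring columns with gaps or ambiguity codes is an assumption, and it is not the CLUSTAL W or PHYLIP implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura (1980) 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: columns with gaps or ambiguity codes are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

The corrected distance is always at least as large as the raw proportion of differences, because it allows for multiple substitutions at the same site.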
Recombinant analyses.
A number of approaches were used to identify any putative recombinant sequences, and localize breakpoints within them. First, NJ phylogenetic analyses were performed for nine segments of 1000 sites, moved in steps of 500 along the alignment. Sequences with significantly discordant positions in different phylogenies are potential recombinants. Second, a phylogenetic profile was calculated and plotted using the algorithm of Weiller (1998). In this method, the distances from one sequence to each of the others are calculated for windows along the alignment, and a correlation coefficient is calculated between the vectors of distances for two adjacent windows. This is performed for numerous pairs of adjacent windows, and the values plotted against position in the alignment. Here a window size of 100 bp was used, moved in steps of 1 bp. Similar profiles are calculated for each sequence, and can all be plotted on a single diagram. Unusually low correlation coefficients identify potential recombinant sequences.
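The phylogenetic profile idea can be sketched in a few lines. This is an illustration of the description above, not Weiller's program: the function names are hypothetical, and the use of uncorrected proportional distances and a Pearson correlation are assumptions made for simplicity.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites, skipping gapped columns (an assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0

def pearson(u, v):
    """Pearson correlation of two equal-length vectors; NaN if either is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    if su == 0 or sv == 0:
        return float("nan")
    return cov / (su * sv)

def phylogenetic_profile(alignment, query, window=100, step=1):
    """alignment: dict mapping strain name -> aligned sequence.

    For each position, correlate the query's distances to all other
    sequences in the window to the left with those in the window to the right.
    """
    others = [name for name in alignment if name != query]
    q = alignment[query]
    profile = []
    for pos in range(window, len(q) - window + 1, step):
        left = [p_distance(q[pos - window:pos], alignment[o][pos - window:pos])
                for o in others]
        right = [p_distance(q[pos:pos + window], alignment[o][pos:pos + window])
                 for o in others]
        profile.append((pos, pearson(left, right)))
    return profile
```

A sequence whose nearest relatives change across a position gives a low or negative correlation there, which is the signature sought in the profile.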
Third, after putative recombinant sequences had been identified, a compatibility matrix (Jakobsen & Easteal, 1996) was used to examine the distribution of phylogenetically informative sites supporting the placement of a mosaic sequence within alternative clades. Informative sites are plotted along the two axes of the matrix, and the cells within the matrix are coloured white if the two sites concur with respect to the phylogenetic partition among the sequences, or black if they do not. Blocks of sequence with discordant phylogenetic histories, implying recombination, are then readily identified by eye. Finally, four-sequence informative-site breakpoint analysis was performed using the maximum chi-square approach (Robertson et al., 1995). The heterogeneity between the distribution of informative sites supporting two alternative phylogenies, on either side of a breakpoint within the alignment, was assessed by a 2×2 chi-square test. This was calculated for all possible breakpoints along the alignment, and the breakpoint yielding the highest chi-square value was retained. The significance of the chi-square values was assessed by permutation tests in which the same set of informative sites was randomly shuffled.
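The maximum chi-square search and its permutation test can be sketched as below. This is a minimal illustration under stated assumptions, not the program used in the study: each informative site is reduced to a label, "A" or "B", for the topology it supports, the labels are listed in alignment order, and all function names, the number of permutations and the seed are hypothetical choices.

```python
import random

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def max_chi_square(labels):
    """Return (best_chi2, index); the breakpoint lies between labels[index-1] and labels[index]."""
    total_a = labels.count("A")
    total_b = len(labels) - total_a
    best, best_idx = 0.0, None
    left_a = left_b = 0
    for i in range(1, len(labels)):
        if labels[i - 1] == "A":
            left_a += 1
        else:
            left_b += 1
        chi2 = chi_square_2x2(left_a, left_b, total_a - left_a, total_b - left_b)
        if chi2 > best:
            best, best_idx = chi2, i
    return best, best_idx

def permutation_p_value(labels, n_perm=1000, seed=0):
    """Fraction of shuffled label orders whose maximum chi-square reaches the observed value."""
    rng = random.Random(seed)
    observed, _ = max_chi_square(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi_square(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the maximum is taken over all candidate breakpoints, the ordinary chi-square distribution does not apply to it, which is why significance is judged against shuffled orderings of the same sites.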
Results |
---|
Analysis of recombinant sequences
It has been suggested that at least one JCV sequence (strain 402) may be a recombinant (Agostini et al., 1996; Jobes et al., 1998). Therefore, before examining the phylogenetic relationships among JCV strains based on full-length sequences, we looked for the presence of mosaic sequences within the alignment. Exploratory analyses were performed using the phylogenetic profile method (Weiller, 1998), in which the distance from one sequence to each of the others is calculated for numerous windows through the alignment, and the correlation of these distances between adjacent windows is plotted; a mosaic sequence is expected to show a low correlation between windows spanning the breakpoint between sequences of divergent origins. The profile revealed high correlation values across most of the alignment for all but one of the sequences (Fig. 2). The exceptional regions were the ends of the alignment, where (as expected; Weiller, 1998) the values for all sequences were low. The region with low correlations at the left end was quite extended, due to low sequence divergence in that area (Fig. 1). The exceptional sequence was strain X01, where the plot dipped to very low values in the region around site 2100 (Fig. 2). Plots for other sequences did not dip in this region, strongly suggesting that the X01 sequence has a recombinant origin.
The breakpoint for this recombination was mapped by examining the distribution of phylogenetically informative sites in a four-sequence alignment containing the putative recombinant, representatives of the two parental lineages (i.e. GS/B and 601) and an outgroup (Robertson et al., 1995); here we used Mad-1 (Type 1). A compatibility matrix (Jakobsen & Easteal, 1996) clearly illustrated the nonrandom distribution of sites placing X01 with GS/B or with 601 (Fig. 3). Among the 40 informative sites, 16 adjacent sites in one half of the genome (from 670 to 2087) were all mutually compatible, as were 20 adjacent sites in the other half (from 2249 to 4765); however, these two blocks were mutually incompatible. Maximum chi-square analysis (Robertson et al., 1995) placed the most likely breakpoint between sites 2087 (where X01 and 601 are both G, while all other sequences are T) and 2111 (where X01 and all Type 2 sequences are C, while 601 and all Type 1 sequences are T); the distribution of informative sites around this breakpoint was extremely nonrandom (P<10⁻⁵). This location leaves one contradictory informative site (at 2246) within the 3' region, whereas if the breakpoint were located after site 2246 there would be two contradictory informative sites within the immediate 5' region (Fig. 3). There is also a single contradictory informative site (182) at the extreme 5' end of the alignment. Recalling that the genome is circular, there must be two breakpoints in X01; the second could be between sites 182 and 670, or within the (unavailable) noncoding region linking the two ends of the alignment.
The phylogeny obtained by NJ analysis is shown in Fig. 6. Within this tree the major Types represented by multiple strains (i.e. Types 1, 2 and 3) each formed monophyletic groups supported by high bootstrap values. ML analysis gave an identical topology. MP analysis produced four equally parsimonious trees, differing from each other and from the NJ tree only with respect to the branching order within the Type 2A clade. Thus, these three approaches yielded consistent results, since the branching order within Type 2A was not clearly resolved by any method. In contrast, the UPGMA tree differed more substantially, in clustering strain 230 with the Type 2B strains rather than the Type 2A strains. The UPGMA method assumes a constant molecular clock, and this can lead to errors in phylogenetic reconstruction when evolutionary rates vary among lineages. Therefore, the UPGMA tree is probably the least reliable, although it should be noted that this difference between the UPGMA results and those from the other three methods was minor compared to the apparent variation reported by Jobes et al. (1998).
Apart from the different locations of the root, our NJ tree differed from that of Jobes et al. (1998) with respect to the position of the Type 2B clade. Whereas all Type 2 strains formed a clear clade (supported in 92% of bootstraps) in Fig. 6, the previous analysis (after re-positioning the root) placed Type 2B strains outside a clade comprising the other Type 2 strains, Type 3 strains and Tai-3. We repeated the NJ analysis including X01, and found a similar result to that of Jobes et al. (1998), indicating that their anomalous position for the Type 2B lineage was a consequence of including the mosaic X01 sequence in the analysis. Thus, the inclusion of a recombinant sequence not only gave a false impression of the phylogenetic position of that sequence, but also distorted the positions of other sequences.
Nucleotide substitution rates among JCV strains
To characterize the rate and pattern of divergence among JCV strains in more detail, we estimated separately the numbers of synonymous (KS) and nonsynonymous (KA) substitutions per site for each of the six genes, compared among various strains (Table 1). Normally, synonymous substitutions are expected to be effectively neutral, and so reflect the underlying rate of mutation. Then KS values are expected to be similar among different genes, and the ratio of KA/KS reflects the action of natural selection on nonsynonymous mutations. For JCV the KA/KS ratio for the agnoprotein gene was unusually high, but this was due to low KS values rather than high KA values. This may indicate that silent sites within the agnoprotein gene are under additional constraint. It has been suggested that the homologous region in the polyomavirus simian virus (SV)40 contains attenuator sequences (Goldring et al., 1992). However, there may also be a sampling effect here, as the agnogene is very short; similar sized regions with no synonymous variations among strains were seen within other genes. The very low KA/KS values for the genes encoding VP1 and the T antigens imply strong constraint on the sequences of those proteins.
Discussion |
---|
Jobes et al. (1998) suggested that one reason for the differences among the results of different analyses was that one sequence (strain 402) was a recombinant, and that one method (MP) was more useful than others in detecting that mosaicism. However, we found no substantial evidence that strain 402 is recombinant. Again, the apparent differences in the position of strain 402 were entirely due to the different rootings of the trees from different methods. Furthermore, none of the methods previously employed would be expected to distinguish recombinant sequences.
We used a variety of methods to detect recombinant viruses, and found that one strain, X01, has a clearly mosaic sequence. Jobes et al. (1998) renamed that strain as 501, because they considered it to be the prototype of a new genotype, Type 5. Our results indicate that X01 is a mosaic of sequences from Types 2 and 6, and that therefore there is no basis for (this) Type 5, and in light of this it seems more appropriate to retain the label X01 for this strain. Since this was the only recombinant strain detected, the question arises as to where and how the mosaic sequence of X01 arose. Almost half of the X01 genome sequence is identical to a sequence (GS/B) from Germany (Loeber & Dörries, 1988), while the other half is most similar to strain 601 from an African American (Jobes et al., 1998); strain X01 was obtained from an American with European parents (Agostini et al., 1998b). The identity between part of the X01 sequence and GS/B, compared to a difference (albeit small) between two sequences (GS/B and GS/K) obtained from a single PML patient, might point to X01 being a mosaic sequence arising from some laboratory artefact. However, there are other instances of extremely closely related sequences from distinct origins, such as strains 228 and 229, from a native American in New Mexico, and a European American in California, respectively (Agostini et al., 1998b). Also, the relatively slow rate of evolution of JCV (see below) implies that identical sequences might be found among viruses that have been separated for thousands of years. However the X01 sequence was generated, it was important to identify it as mosaic because including it in the phylogenetic analysis distorted the phylogenetic positions of other sequences.
Our analysis of full-length genome sequences, excluding the recombinant sequence, should provide the best basis on which to classify strains into Types and subtypes. Three Types (1–3) represented by multiple strains were clearly distinguished, although Types 2 and 3 are much closer to each other than to Type 1 (Fig. 6). The single representative of Type 4, strain 402, is no more divergent from Type 1 strains than strains within Type 2 are from each other; for consistency, it seems that strain 402 is better classified within Type 1. The single representative of Type 6, strain 601, is highly divergent from other subtypes, and so warrants being placed in a separate genotype. Strain Tai-3 has been tentatively designated as Type 7 (Jobes et al., 1998). Here Tai-3 was found to lie outside the clade currently defined as Type 2, but even closer to Type 2 than is Type 3 (Fig. 6); it could be designated as a distinct Type, but (in the absence of Types 4 and 5) the numbering is moot. A number of Type 2 subtypes have been proposed (Agostini et al., 1998b; Jobes et al., 1998). There seems no good basis for separating strains 228 and 229 (previously termed Type 2C) from other Type 2A strains, but strain 230 (previously termed Type 2D) is phylogenetically distinct from Types 2A and 2B (Fig. 6).
It has been suggested that the various clades of JCV have co-evolved with human populations (Sugimoto et al., 1997; Agostini et al., 1997b, 1998a; Guo et al., 1998). If it can be assumed that the major genotypes of JCV diverged when human populations migrated, then the rate of synonymous substitution in JCV has been around 4×10⁻⁷ substitutions per site per year. This is around ten times faster than previously suggested for primate polyomaviruses (Yasunaga & Miyata, 1982), possibly reflecting saturation of substitutions in the earlier analysis which compared much more divergent viruses (BKV and SV40). Our rate estimate is about four orders of magnitude lower than in viruses using RNA-dependent polymerases for replication, such as influenza viruses (Fitch et al., 1991) or the retrovirus HIV-1 (Li et al., 1988), but about two orders of magnitude higher than synonymous substitution rates in host (primate) nuclear genes (Li et al., 1987). This rate estimate for JCV also appears to be higher than that for another group of DNA viruses, the alphaherpesviruses. McGeoch et al. (1995) estimated the common ancestor of herpes simplex viruses 1 and 2 (HSV-1 and HSV-2) at 8·5 Myr ago, while Dolan et al. (1998) calculated the average KS between HSV-1 and HSV-2 as 0·47 substitutions per site: that would imply a rate of 2·8×10⁻⁸ substitutions per site per year. Thus DNA viruses appear to have comparatively slow evolutionary rates. Presumably the precise rate depends on the numbers of rounds of replication the viruses undergo, per unit time.
Acknowledgments |
---|
Footnotes |
---|
References |
---|
Agostini, H. T., Ryschkewitsch, C. F., Brubaker, G. R., Shao, J. & Stoner, G. L. (1997a). Five complete genomes of JC virus Type 3 from Africans and African Americans. Archives of Virology 142, 637-655.
Agostini, H. T., Yanagihara, R., Davis, V., Ryschkewitsch, C. F. & Stoner, G. L. (1997b). Asian genotypes of JC virus in native Americans and in a Pacific island population: markers of viral evolution and human migration. Proceedings of the National Academy of Sciences, USA 94, 14542-14546.
Agostini, H. T., Ryschkewitsch, C. F. & Stoner, G. L. (1998a). JC virus Type 1 has multiple subtypes: three new complete genomes. Journal of General Virology 79, 801-805.
Agostini, H. T., Shishido-Hara, Y., Baumhefner, R. W., Singer, E. J., Ryschkewitsch, C. F. & Stoner, G. L. (1998b). JC virus type 2: definition of subtypes based on DNA sequence analysis of ten complete genomes. Journal of General Virology 79, 1143-1151.
Dolan, A., Jamieson, F. E., Cunningham, C., Barnett, B. C. & McGeoch, D. J. (1998). The genome sequence of herpes simplex virus type 2. Journal of Virology 72, 2010-2021.
Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783-791.
Felsenstein, J. (1992). PHYLIP (Phylogeny Inference Package), version 3.5c. Department of Genetics, University of Washington, Seattle, WA, USA.
Fitch, W. M., Leiter, J. M. E., Li, X. & Palese, P. (1991). Positive Darwinian evolution in human influenza A viruses. Proceedings of the National Academy of Sciences, USA 88, 4270-4274.
Frisque, R. J., Bream, G. L. & Canella, M. T. (1984). Human polyomavirus JC virus genome. Journal of Virology 51, 458-469.
Goldring, N. B., Kessler, M. & Aloni, Y. (1992). Parameters affecting the elongation block by RNA polymerase II at the SV40 attenuator-1 in vitro. Biochemistry 31, 8369-8376.
Gouy, M., Gautier, C., Attimonelli, M., Lanave, C. & Di Paola, G. (1985). ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. CABIOS 1, 167-172.
Guo, J., Kitamura, T., Ebihara, H., Sugimoto, C., Kunitake, T., Takehisa, J., Na, Y. Q., Al-Ahdal, M. N., Hallin, A., Kawabe, K., Taguchi, F. & Yogo, Y. (1996). Geographical distribution of the human polyomavirus JC virus types A and B and isolation of a new type from Ghana. Journal of General Virology 77, 919-927.
Guo, J., Sugimoto, C., Kitamura, T., Ebihara, H., Kato, A., Guo, Z., Liu, J., Zheng, S. P., Wang, Y. L., Na, Y. Q., Suzuki, M., Taguchi, F. & Yogo, Y. (1998). Four geographically distinct genotypes of JC virus are prevalent in China and Mongolia: implications for the racial composition of modern China. Journal of General Virology 79, 2499-2505.
Jakobsen, I. B. & Easteal, S. (1996). A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. CABIOS 12, 291-295.
Jobes, D. V., Chima, S. C., Ryschkewitsch, C. F. & Stoner, G. L. (1998). Phylogenetic analysis of 22 complete genomes of the human polyomavirus JC virus. Journal of General Virology 79, 2491-2498.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16, 111-120.
Li, W.-H. (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of Molecular Evolution 36, 96-99.
Li, W.-H. (1997). Molecular Evolution. Sunderland, MA: Sinauer Associates.
Li, W.-H., Tanimura, M. & Sharp, P. M. (1987). An evaluation of the molecular clock hypothesis using mammalian DNA sequences. Journal of Molecular Evolution 25, 330-342.
Li, W.-H., Tanimura, M. & Sharp, P. M. (1988). Rates and dates of divergence between AIDS virus nucleotide sequences. Molecular Biology and Evolution 5, 313-330.
Loeber, G. & Dörries, K. (1988). DNA rearrangements in organ-specific variants of polyomavirus JC strain GS. Journal of Virology 62, 1730-1735.
McGeoch, D. J., Cook, S., Dolan, A., Jamieson, F. E. & Telford, E. A. R. (1995). Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses. Journal of Molecular Biology 247, 443-458.
Padgett, B. L., Walker, D. L., Zurhein, G. M., Eckroad, R. J. & Dessel, B. H. (1971). Cultivation of papova-like virus from human brain with progressive multifocal leukoencephalopathy. Lancet i, 1257-1260.
Robertson, D. L., Hahn, B. H. & Sharp, P. M. (1995). Recombination in AIDS viruses. Journal of Molecular Evolution 40, 249-259.
Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing evolutionary trees. Molecular Biology and Evolution 4, 406-425.
Shah, K. V. (1990). Polyomaviruses. In Virology, pp. 1609-1623. Edited by B. N. Fields & D. M. Knipe. New York: Raven Press.
Soeda, E., Maruyama, T., Arrand, J. R. & Griffin, B. E. (1980). Host-dependent evolution of three papova viruses. Nature 285, 165-167.
Sugimoto, C., Kitamura, T., Guo, J., Al-Ahdal, M. N., Shchelkunov, S. N., Otova, B., Ondrejka, P., Chollet, J.-Y., El-Safi, S., Ettayebi, M., Gresenguet, G., Kocagoz, T., Chaiyarasamee, S., Thant, K. Z., Thein, S., Moe, K., Kobayashi, N., Taguchi, F. & Yogo, Y. (1997). Typing of urinary JC virus DNA offers a novel means of tracing human migrations. Proceedings of the National Academy of Sciences, USA 94, 9191-9196.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673-4680.
von Haeseler, A., Sajantila, A. & Pääbo, S. (1996). The genetical archaeology of the human genome. Nature Genetics 14, 135-140.
Weiller, G. F. (1998). Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Molecular Biology and Evolution 15, 326-335.
Yasunaga, T. & Miyata, T. (1982). Evolutionary changes of nucleotide sequences of papova viruses BKV and SV40: they are possibly hybrids. Journal of Molecular Evolution 19, 72-79.
Received 16 November 1999; accepted 31 January 2000.