Tempo and Mode of Human and Simian T-Lymphotropic Virus (HTLV/STLV) Evolution Revealed by Analyses of Full-Genome Sequences

M. Salemi, J. Desmyter, and A.-M. Vandamme

Rega Institute for Medical Research, KULeuven, Leuven, Belgium


    Abstract
 
We investigated the tempo and mode of evolution of the primate T-lymphotropic viruses (PTLVs). Several different models of nucleotide substitution were tested on a general phylogenetic tree obtained using the 20 full-genome HTLV/STLV sequences available. The likelihood ratio test showed that the Tamura and Nei model with discrete γ-distributed rates among sites is the best-fitting substitution model. The heterogeneity of nucleotide substitution rates along the PTLV genome was further investigated for different genes and at different codon positions (cdp's). Tests of rate constancy showed that different PTLV lineages evolve at different rates when first and second cdp's are considered, but the molecular-clock hypothesis holds for some PTLV lineages when the third cdp is used. Negative selection was evident throughout the genome. However, in the gp46 region, a small fragment subjected to positive selection was identified using a Monte Carlo simulation based on a likelihood method. Employing correlations of the virus divergence times with anthropologically documented migrations of their host, a possible timescale was estimated for each important node of the PTLV tree. The obtained results on these slow-evolving viruses could be used to fill gaps in the historical records of some of the host species. In particular, the HTLV-I/STLV-I history might suggest a simian migration from Asia to Africa not much earlier than 19,500–60,000 years ago.


    Introduction
 
Human T-lymphotropic virus type 1 (HTLV-I) and type 2 (HTLV-II) were the first two human retroviruses discovered, in 1980 and 1982, respectively (Poiesz et al. 1980; Kalyanaraman et al. 1982). HTLV-I is associated with adult T-cell leukemia (Yoshida, Miyoshi, and Hinuma 1982) and tropical spastic paraparesis/HTLV-I–associated myelopathy (Gessain et al. 1985; Osame et al. 1987). HTLV-II's seem to be linked to neurological disorders (Hall et al. 1994). They are transforming retroviruses and classified in a separate genus together with bovine leukemia virus (BLV). In endemic populations, HTLVs are transmitted mainly from mother to child via breast-feeding and from husband to wife via sexual intercourse. Blood transfusion and intravenous drug abuse have contributed to a more epidemic spread during the last century. Different HTLV-I subtypes have been described: HTLV-Ia, also known as the cosmopolitan subtype, joins strains from different geographic regions (Miura, Fukunaga, and Igarashi 1994); HTLV-Ib is the so-called Central African subtype (Hahn et al. 1984; Vandamme et al. 1994); HTLV-Ic is a very divergent subtype, isolated from aboriginals living in Papua New Guinea and Australia, and is called the Melanesian subtype (Gessain et al. 1991; Saksena et al. 1992); HTLV-Id and HTLV-Ie were recently described as distinct molecular subtypes isolated from Cameroonian pygmies and from a Mbuti Efe pygmy in Congo, respectively (Mahieux et al. 1997; Salemi et al. 1998b); HTLV-If is a new African subtype isolated from a Gabonese infected individual (Salemi et al. 1998b). Two main subtypes of HTLV-II, HTLV-IIa and HTLV-IIb, were described (Hall et al. 1992). Both were found in a number of native Amerindians (Biggar et al. 1996; Ferrer et al. 1996; Maloney et al. 1992; Hjelle et al. 1993) and also among injecting drug users (IDUs) in the United States, Europe, and Vietnam (Lee et al. 1989; Zella et al. 1990; Fukushima et al. 1998). HTLV-II infection was also reported in African pygmies (Goubau et al. 1992), and a highly divergent new subtype, HTLV-IId, was recently isolated from an Efe pygmy individual (Vandamme et al. 1998).

In contrast with the human immunodeficiency virus (HIV), the third human retrovirus known, HTLV-I and HTLV-II are poorly replicative, with a remarkably stable genome. It has been estimated that the evolutionary rate of HTLV-II in IDUs is around 10⁻⁴ to 10⁻⁵ nucleotide substitutions per site per year (Salemi et al. 1998a), which is one of the lowest evolutionary rates reported for a retrovirus so far (rates generally range between 10⁻² and 10⁻⁴ nucleotide substitutions per site per year). The molecular mechanisms of this slow evolution could be explained in part by the observation that in HTLV-infected individuals with high proviral loads, a large part of the provirus is produced by clonal expansion of infected cells and not by viral replication (Wattel et al. 1995; Cimarelli et al. 1996).

HTLV-I–related simian viruses, called STLV-I's, have been discovered in several nonhuman primates in Africa and in Asia (Watanabe et al. 1986; Song et al. 1994; Ibrahim, De Thé, and Gessain 1995), and they are also characterized by high genomic stability. African HTLV-I's and STLV-I's cannot be separated into distinct phylogenetic lineages according to their species of origin, but, rather, seem to be related according to the geographic origin of their host (Vandamme, Salemi, and Desmyter 1998). The term "primate T-lymphotropic virus type I" (PTLV-I) was introduced to describe this group of viruses. Liu et al. (1996) showed that the three human subtypes Ia, Ib, and Ic arose from three geographically distinct simian reservoirs in West and Central Africa and in Indonesia, respectively. The new HTLV-Id, HTLV-Ie, and HTLV-If subtypes also seem to have arisen from recent simian-to-human transmissions in Africa (Mahieux et al. 1998; Salemi et al. 1998b). The closest simian relative of HTLV-II, called STLV-II in analogy with HTLV-I/STLV-I, was isolated from African bonobos (Pan paniscus) (Liu et al. 1994; Giri et al. 1994). In contrast to HTLV-I/STLV-I, STLV-II clearly lies in a distinct phylogenetic lineage with respect to HTLV-II, suggesting either an ancient interspecies transmission in Africa or a coevolution of STLV-II/HTLV-II with their host species (Vandamme et al. 1996). Finally, a new simian T-lymphotropic virus, STLV-L, equidistantly related to HTLV-I/STLV-I and HTLV-II/STLV-II, was isolated from an African baboon (Papio hamadryas) (Goubau et al. 1994; Van Brussel et al. 1997). A human counterpart of STLV-L is not known. For simplicity, in this paper we will refer to HTLV and STLV strains in general as primate T-lymphotropic viruses (PTLVs).

The evolutionary history of PTLVs is interesting for both theoretical and practical reasons. HTLV-I is an important human pathogen. Phylogenetic analyses of the DNA sequences, using a good model of nucleotide substitution, can be used to answer different epidemiological questions, such as those regarding the global spread of the different genetic subtypes. Moreover, recent reports seem to suggest that STLV infections are more widespread than previously thought; in light of the growing interest in xenotransplantation, the investigation of their origin, evolution, and capacity for interspecies transmission should be considered with great attention. Other questions on the evolution of human and simian retroviruses have a theoretical interest. Which model of nucleotide substitution can better describe their evolution? Is the rate of nucleotide substitution constant among sites? Are the different lineages evolving more or less at a constant evolutionary rate, following the molecular-clock hypothesis? Did HTLVs and STLVs coevolve with their hosts? Could we use phylogenetic relationships among PTLV strains to fill gaps in the historical records of some of the host species?


    Materials and Methods
 
PTLV Nucleotide Sequences
The following 20 complete nucleotide sequences, available from the EMBL and GenBank databases, were used: seven HTLV-Ia strains (ATK1, J02029; ATL-YS, U19949; BOI, L36905; MT2, L03561; TSP-1, M86840; RKI3, AF042071; HS35, D00294/D13784), one HTLV-Ib strain (EL, S74562/M67514), one HTLV-Ic strain (MEL5, L02534), one STLV-I strain from an Indonesian Macaca tonkeana (TE4, Z46900), two STLV-II strains from African P. paniscus (PanP, U90557; PP1664, Y14570), two HTLV-IIa strains (Mo, M10060; SPWV, Lewis et al., unpublished results), four HTLV-IIb strains (NRA, L20734; G12, L11456; Gu, X89270; GAB, Y13051), one HTLV-IId strain (Efe2, Y14365), and one STLV-L strain from an African P. hamadryas (PH969, Y07616). BLV was used as outgroup for the rooted trees (BLV-CG, K02120).

Sequence Alignments
The 20 PTLV strains were aligned using the GeneWorks software (Intelligenetics, Oxford, England), followed by minimal manual editing. The LTR and proximal pX regions were excluded from the alignment, since DNA dot matrices showed that between HTLV-I, HTLV-II, and STLV-L the similarity in these regions is too low to allow unambiguous alignment. Another alignment, including the bovine leukemia virus (BLV), was obtained using amino acid sequences of the entire gag, pol, and env regions, excluding only the overlapping sequences between gag and pol and between pol and env.

Analysis of the Phylogenetic Signal
The presence of saturation at different codon positions (cdp's) of the PTLV full-genome alignment was tested by comparing the saturation index expected when assuming full saturation with the observed saturation index. Statistical significance was assessed employing a t-test with infinite degrees of freedom. Expected and observed saturation indices were calculated with the program DAMBE (Xia 1999). The presence of phylogenetic signal in the data set was also investigated with the Hillis and Huelsenbeck (1992) method based on the skewness of the tree length distribution. Given a data set, the tree length under a maximum-parsimony criterion for all possible topologies (or a random sample of them) is computed. If there is no phylogenetic signal in the data, the distribution tends to be symmetric. If there is phylogenetic signal, the distribution tends to be (left) skewed (Hillis and Huelsenbeck 1992). Since about 10²⁰ possible unrooted trees exist for 20 taxa, we estimated the tree length distribution of 1,000,000 random trees for the 20 PTLV strains using the option "Evaluate Random Trees ..." of PAUP*, version 4.0d65, written by David Swofford. Four separate tree length distributions were obtained, employing the first, second, first + second, and third cdp's, respectively, and their skewness was statistically evaluated.
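The two quantities used in this test can be sketched in a few lines. The following is a minimal illustration, not the PAUP* implementation: the function names are hypothetical, the tree count uses the standard (2n − 5)!! formula for unrooted bifurcating trees, and g1 is computed as the third central moment divided by the 1.5 power of the second central moment.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees for n taxa: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def g1_skewness(tree_lengths):
    """Skewness g1 = m3 / m2**1.5 of a sample of tree lengths."""
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n
    return m3 / m2 ** 1.5


# 20 taxa give on the order of 10**20 unrooted topologies
n_trees = unrooted_tree_count(20)

# Hypothetical tree lengths: a few very short trees give a left (negative) skew
example_g1 = g1_skewness([100, 140, 141, 142, 143, 144])
```

A negative g1 corresponds to the left-skewed distribution expected when the data carry phylogenetic signal; significance still has to be judged against tabulated critical values.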

Another way to visualize the presence of phylogenetic noise in a particular data set of aligned sequences is to perform a likelihood mapping analysis investigating groups of four randomly chosen sequences, called quartets (Strimmer and Von Haeseler 1997). For a quartet, just three unrooted tree topologies are possible. The likelihood of each topology can be estimated with the maximum-likelihood (ML) method, and the three likelihoods can be reported as a dot in an equilateral triangle (see fig. 2). For N sequences, N!/[4!(N − 4)!] possible quartets exist, and the distribution of the dots in the triangle can give an overall impression of the treelikeness of the data. When the N sequences are not clustered, the order of the sequences is not relevant, and the question of which of the possible tree topologies is supported by any cluster is meaningless. Thus, we can distinguish three main different areas in the equilateral triangle (Strimmer and Von Haeseler 1997; Nieselt-Struve 1998): (1) the three corners representing fully resolved tree topologies, i.e., the presence of treelike phylogenetic signal in the data; (2) the center, which is the area of starlike phylogeny, representing phylogenetic noise; and (3) the three areas on the sides, where it is not possible to decide between two different tree topologies, representing netlike phylogeny. The percentage of dots belonging to each area can give an idea about the mode of evolution in the data set under investigation. Likelihood mapping analyses were performed with the program PUZZLE (Strimmer and Von Haeseler 1997) employing the first, second, and third cdp's, respectively, of the PTLV genes. For each analysis, all 4,845 possible quartets for the 20 PTLV strains were evaluated.
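As a rough sketch of the bookkeeping behind likelihood mapping (not the PUZZLE code), the quartet count and the position of one dot in the triangle can be computed as follows. The function names, the corner coordinates, and the example log-likelihoods are assumptions made for illustration; the dot is placed using the three likelihoods normalized to sum to 1.

```python
import math


def quartet_count(n_sequences: int) -> int:
    """Number of distinct quartets: N! / (4! (N - 4)!)."""
    return math.comb(n_sequences, 4)


def posterior_weights(log_likelihoods):
    """Normalize the three topology likelihoods so that they sum to 1."""
    best = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow
    relative = [math.exp(lnl - best) for lnl in log_likelihoods]
    total = sum(relative)
    return [r / total for r in relative]


def triangle_point(weights):
    """Map three weights to (x, y) inside an equilateral triangle.

    Corners (assumed layout): (0, 0), (1, 0), (0.5, sqrt(3) / 2).
    """
    p1, p2, p3 = weights
    x = p2 + 0.5 * p3
    y = p3 * math.sqrt(3) / 2
    return x, y


n_quartets = quartet_count(20)  # all quartets for 20 strains

# Hypothetical log-likelihoods of the three topologies of one quartet
weights = posterior_weights([-1234.5, -1240.2, -1241.0])
dot = triangle_point(weights)
```

A dot near a corner means one topology dominates; three nearly equal weights place the dot in the center of the triangle.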



 
Fig. 1.—Unrooted neighbor-joining phylogenetic tree based on the first and second codon positions of the 20 full-genome PTLV strains. Types and subtypes of the viruses are given in bold type. Bootstrap analysis was applied using 1,000 bootstrap samples. Percentages of bootstrap values are reported. In the top right square, bootstrap values within the HTLV-Ia subtype are shown. In the bottom left square, bootstrap values within the HTLV-IIa and HTLV-IIb subtypes are shown

 
Phylogenetic Relationships Among HTLV/STLV Strains
For a first evaluation of the general phylogenetic relationship among the 20 full-genome PTLV sequences, different phylogenetic methods were used and compared: the neighbor-joining (NJ) method, the Fitch and Margoliash (Fitch) method, and the maximum likelihood (ML) method, all implemented in the PHYLIP software package (Felsenstein 1993). Distances for the NJ and Fitch methods were calculated in accordance with the Kimura two-parameter model (DNADIST program implemented in PHYLIP) with a transition/transversion bias of 2.0. A transition/transversion bias of 2.0 was also used for the ML tree (DNAML program implemented in PHYLIP) and for the weighted parsimony (wpars) method, obtained using PAUP*, version 4.0d65. NJ and wpars trees were also obtained employing only the first and second cdp's or only the third cdp's of the 20 full-genome PTLV aligned sequences with the program PAUP*, version 4.0d65, with the same substitution model and transition/transversion ratio. The NJ, Fitch, and wpars trees were statistically evaluated using 1,000 bootstrap samples, whereas P values were calculated for the ML tree (Felsenstein 1993).

Root of the PTLV Tree
Phylogenetic trees were constructed from BLV-PTLV amino acid alignments, assuming BLV as outgroup, with the NJ, Fitch, and wpars methods. The protein distance matrix for the amino acid alignment was estimated in PUZZLE using the Blosum62 model of amino acid substitution. For each method, 1,000 bootstrap replicates were performed to assess statistical significance. The likelihood mapping method was also used to locate the root of the tree. The 21 strains were divided into four groups: BLV (the CG strain), PTLV-I (including all of the HTLV-I and STLV-I strains), PTLV-II (including all of the HTLV-II and STLV-II strains), and STLV-L (the PH969 strain). The likelihood mapping analysis, implemented in PUZZLE, was performed to evaluate the likelihood of the three possible topologies of a tree joining four different groups of taxa (Strimmer and Von Haeseler 1997).

Models of Nucleotide Substitution and Likelihood Ratio Test
Employing the tree obtained for the 20 PTLV strains, different parametric models assuming eight discrete categories of γ-distributed rates among sites were evaluated according to the likelihood ratio test (Huelsenbeck and Rannala 1997): JC69 (Jukes and Cantor 1969), K80 (Kimura 1980), F81 (Felsenstein 1981), HKY85 (Hasegawa, Kishino, and Yano 1985), TN93 (Tamura and Nei 1993), and REV (Yang 1993). The program baseml implemented in the PAML, version 1.4, software package (Yang 1997) was used for the calculations. Subsequently, more sophisticated models were used to investigate in detail the substitution rate heterogeneity among sites in coding regions of the PTLV genome, as described in the Results section.

The molecular-clock hypothesis was tested on the PTLV tree with the likelihood ratio test for the clock hypothesis implemented in PUZZLE (Strimmer and Von Haeseler 1997) and the best-fitting nucleotide substitution model. The clock was tested employing only the first and second cdp's or only the third cdp's of each nonoverlapping gene of the PTLV genome.

Test for Positive or Purifying Selection
The numbers of nonsynonymous and synonymous substitutions (indicated KA and KS, respectively) were computed for PTLV-I and PTLV-II strains separately. A KA/KS ratio significantly lower than 1 indicates the presence of purifying selection, whereas ratios greater than 1 indicate positive selection. KA and KS values were estimated for the gag, pro, pol, env, and tax regions separately by comparing each of the 10 PTLV-I strains with one another using the method of Nei and Gojobori (1986) implemented in the program MEGA, and their averages were used to compute KA/KS ratios for each coding region. The same was done for the nine PTLV-II strains. The presence of selection across the PTLV genome (using only first and second cdp's) was also tested with the program PLATO (Grassly and Rambaut 1998). A sliding window of 5 nt looks for regions of the alignment which do not fit with a global (null) hypothesis of neutral evolution, represented by the ML tree calculated assuming constant nucleotide substitution rates across sites. When a substitution model and a phylogeny are specified for an alignment, PLATO calculates the likelihood of this null hypothesis for each site along the alignment. Those regions of the alignment which have the lowest average likelihoods are then tested for significant departure from the null hypothesis using Monte Carlo simulation. Significance indicates failure of the null model to explain the observed data and, depending on the null model, can indicate the importance of recombination or selection (Grassly and Rambaut 1998). A distribution of 1,000 simulated DNA data sets (option -r1000 in PLATO) for the Monte Carlo procedure was used.
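The first stage of the sliding-window scan described above can be sketched as follows. This is a simplified illustration under stated assumptions, not PLATO's algorithm: the per-site log-likelihoods are taken as given (in practice they come from the ML tree under the null model), the function name is hypothetical, and the Monte Carlo significance step is omitted.

```python
def lowest_likelihood_window(site_lnl, window):
    """Return (start index, mean lnL) of the window with the lowest mean
    per-site log-likelihood, scanning all windows of a fixed size."""
    if window < 1 or window > len(site_lnl):
        raise ValueError("window must be between 1 and the alignment length")
    current = sum(site_lnl[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(site_lnl) - window + 1):
        # Slide by one site: add the new site, drop the one left behind
        current += site_lnl[start + window - 1] - site_lnl[start - 1]
        if current < best_sum:
            best_sum, best_start = current, start
    return best_start, best_sum / window


# Hypothetical per-site log-likelihoods; sites 2-3 fit the null model poorly
example_lnl = [-1.0, -1.0, -5.0, -6.0, -1.0, -1.0]
start, mean_lnl = lowest_likelihood_window(example_lnl, window=2)
```

A region flagged in this way is only a candidate; whether it departs significantly from the null model has to be assessed against data sets simulated under that model.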

Timescale of PTLV Evolution
To estimate divergence times in cases of rejection of the molecular clock, we used the method of Li et al. (1987). Consider three homologous sequences with unequal evolutionary rates: a and b, which diverged T1 years ago, and c, which diverged from a and b T2 years ago (T2 > T1), having nucleotide distances kab, kac, and kbc, respectively, and the nucleotide distances kaR, kbR, and kcR between an outgroup R and each of the three sequences. Under a "perfect" molecular clock, we should have:

Since the clock does not hold, equation (1) is not satisfied, and Li et al. (1987) suggested estimating T1 knowing T2, or vice versa, using

when kaR - kcR < kbR - kcR and

when kbR - kcR < kaR - kcR.
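As a sketch of what a strict clock implies for the quantities defined in this section, assume a single rate r (substitutions per site per year) on every lineage; r is a symbol introduced here for illustration. The relationships below follow directly from the definitions of T1, T2, and the pairwise distances, and are not a reproduction of the estimators of Li et al. (1987).

```latex
\begin{align}
k_{ab} &= 2 r T_1, & k_{ac} &= k_{bc} = 2 r T_2, \\
\frac{T_1}{T_2} &= \frac{k_{ab}}{k_{ac}} = \frac{k_{ab}}{k_{bc}}, & k_{aR} &= k_{bR} = k_{cR}.
\end{align}
```

Under this assumption the differences kaR - kcR and kbR - kcR are both zero, so their observed values indicate how far lineages a and b depart from the rate of lineage c.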


    Results
 
Analysis of Phylogenetic Signal
The left part of table 1 shows the expected saturation indices assuming full saturation and the observed ones at different cdp's of the PTLV genome. While first and second cdp's do not show evidence of saturation, saturation does occur at third cdp's when the 20 PTLV strains are considered. The presence of phylogenetic signal at different cdp's was also investigated. The right part of table 1 shows the g1 skewness values for the tree length distribution of each individual cdp. We used tabulated 95% and 99% values of the skewness test statistic g1 from Hillis and Huelsenbeck (1992) to assess statistical significance. The tree length distributions of first, second, and third cdp's all demonstrate strong left skew, implying that not only first and second, but also third cdp's in the data set are phylogenetically informative. The fact that third cdp's, as well as first and second positions, retain treelike phylogenetic information is confirmed by the likelihood mapping analysis (see table 2). Employing the first, second, and third cdp's in separate analyses, only 4.8%, 9.5%, and 1.3% of the dots, respectively, were found in the starlike region of the likelihood mapping triangle (see table 2). This probably reflects that saturation at third cdp's only occurs at the deepest branch, which is the origin of PTLV, leaving the clustering at the PTLV-I part or at the PTLV-II/STLV-L part unaffected.


 
Table 1 Saturation Indices and g1 Skewness Values of the Tree Length Distributions for Different Codon Positions of the 20 Full-Genome PTLV Strains

 

 
Table 2 Likelihood Mapping Analysis of the 4,845 Possible Quartets (Cluster of Four Sequences) of the 20 Full-Genome PTLV Strains

 
Topology and Root of the HTLV/STLV Tree
In figure 1, the NJ tree based on first and second cdp's joining the 20 PTLV strains is shown. As expected, the tree can be divided into two parts: a primate T-lymphotropic virus type I (PTLV-I) part, joining all the HTLV-I strains together with the STLV-I strain isolated from a macaque in Indonesia, and a primate T-lymphotropic virus type II (PTLV-II) part, joining the HTLV-II strains together with the STLV-II strains from African pygmy chimpanzees. Among HTLV-I's and HTLV-II's, the different subtypes can be further distinguished. STLV-L lies between PTLV-I and PTLV-II. All of the other tree-building methods used, either employing full-genome sequences or third-cdp or amino acid sequences (see Materials and Methods), gave exactly the same topology, still supported by high bootstrap values (data not shown). We assumed the tree to be a reliable representation of the relationships among PTLV strains, and we used it to test different models of nucleotide substitutions.

We used BLV as outgroup in an attempt to determine the root of the general PTLV tree reported in figure 1 . In figure 2 , the likelihood mapping method using the amino acid alignment of the BLV/PTLV sequences (see Materials and Methods) is shown. 85.6% of quartets support the clustering between STLV-L and the PTLV-II strains. Except for the low bootstrap support with the parsimony method for protein (Protpars) implemented in PAUP* (see fig. 2 ), the STLV-L clustering with PTLV-II was also present in the NJ, Fitch, and ML trees (data not shown) supported by robust bootstrap values (see fig. 2 ). In conclusion, the root of the PTLV tree can be placed on the branch leading to PTLV-I.



 
Fig. 2.—Likelihood mapping analysis of the 20 full-genome PTLV strains and a BLV strain to determine the root of the tree, using the entire gag-pol-env amino acid sequences (see Materials and Methods). The BLV-PTLV strains were assigned to four clusters according to their origins: the PTLV-I cluster, the PTLV-II cluster, a cluster containing STLV-L, and a cluster containing the BLV strain. Each corner in the triangle of the likelihood mapping represents one of the three possible tree topologies for the four clusters. In total, 90 sets of four sequences can be generated by drawing one sequence from each of the four clusters. For each set, the likelihoods of the three possible tree topologies were calculated and represented as dots inside the triangle. The closer the dot to one of the corners, the higher the likelihood of the tree represented by that corner, while dots in the center represent phylogenetic noise, possibly due to starlike evolution.

 
Nucleotide Substitution Model
Analyses of real data suggest that while different models of nucleotide substitutions have often led to drastic changes in the likelihood of a tree or in the distance matrix of the taxa, they have only minor effects on the tree topology (Yang 1994a, 1994b). Table 3 shows likelihoods and numbers of parameters estimated on the PTLV tree under each nucleotide substitution model described in Materials and Methods. Since all three cdp's are phylogenetically informative (see previous section), we used the full-genome alignment including all cdp's for the analysis, but assumed a discrete γ distribution of rates (eight categories) across sites to take into account that different nucleotide substitution rates can occur at different cdp's and in different genes. The most complex REV model shows the highest likelihood, but the difference between the likelihoods for TN93 and REV is not significant in a χ² test with three degrees of freedom. On the other hand, the likelihood of the TN93 model is significantly better than the likelihoods of the JC69, K80, F81, and HKY85 models (see table 3). When a difference is not significant, the simpler model (TN93) is preferred. The estimate of the α shape parameter for the γ distribution of rates is always <1, indicating a strong rate heterogeneity across sites, even though simpler models tend to overestimate α (see table 3).
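The likelihood ratio test used to compare nested models can be sketched as below. This is a minimal illustration using only the Python standard library, not the baseml procedure: the function names are hypothetical, the log-likelihoods in the example are made-up values (the actual ones are in table 3), and the χ² tail probability is computed for integer degrees of freedom through the incomplete-gamma recurrence.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    if df < 1 or stat < 0:
        raise ValueError("need df >= 1 and stat >= 0")
    x = stat / 2.0
    # Starting values: Q(1/2, x) = erfc(sqrt(x)) for odd df,
    # Q(1, x) = exp(-x) for even df
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, df: int):
    """Return (2 * delta lnL, P value) for two nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(max(stat, 0.0), df)


# Hypothetical log-likelihoods for a simpler and a more complex model
# that differ by three free parameters
stat, p_value = likelihood_ratio_test(-1000.0, -998.5, df=3)
simpler_model_preferred = p_value > 0.05
```

When the P value exceeds the chosen threshold, the extra parameters of the more complex model are not justified and the simpler model is retained.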


 
Table 3 Likelihood Ratio Tests for Different Models of Nucleotide Substitution for the PTLV Tree

 
Relative Nucleotide Substitution Rates Across Sites
It is known that substitution rates are usually different at different cdp's and in different genes (Li 1997). A γ distribution of rates across sites of sequences coding for different genes can be considered just a rough description of this phenomenon. To investigate rate heterogeneity across sites in further detail, we grouped sites in the PTLV genome into 12 classes, each one representing a different cdp in the gag, pro, pol, and env genes, respectively. In addition, the coding region of the regulatory protein Rex overlaps with the 5' end of the Tax coding region, so that the first cdp of Tax corresponds to the second cdp of Rex, and so on. We thus defined six additional classes of sites, three representing the overlapping cdp's of Tax and Rex and three representing the first, second, and third nonoverlapping cdp's of the 3' end of Tax. Since Tax and Rex proteins have different lengths in different PTLV lineages, only the shortest common fragment of both was used in the comparison. In the previous section, TN93 was chosen as the best-fitting nucleotide substitution model for the PTLV genome. We thus investigated the likelihood of the PTLV tree under a TN93 model with 18 different classes of sites allowing different substitution rates, different nucleotide frequencies, different transition/transversion ratios, and different pyrimidine transition/purine transition ratios at each class. We called this model TN93C. In total, 90 parameters need to be estimated with ML under the TN93C model (37 branch lengths and 53 rate parameters), whereas the mean of nucleotide frequencies for each class of sites is estimated from the observed frequencies in the alignment. Heterogeneity within the different classes of sites was ignored, since an even more complex model, with more unknown parameters to be estimated, would increase the standard error of the estimates without substantially improving the results (data not shown). The log likelihood of the PTLV tree under the TN93C model was -34,301.01, which is significantly better than the one under the simpler TN93 model with γ-distributed rates across sites (-36,393.83; see table 3).
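The parameter totals quoted for TN93C can be checked with a short calculation. The decomposition used here is an assumption that happens to reproduce the stated totals: 2n − 3 branch lengths for an unrooted tree of n taxa, one relative rate per site class except the reference class, and two ratio parameters (transition/transversion and pyrimidine/purine transition) per class. The function name is hypothetical.

```python
def tn93c_parameter_count(n_taxa: int, n_classes: int):
    """Return (branch lengths, rate parameters) under the assumed layout."""
    branch_lengths = 2 * n_taxa - 3      # unrooted bifurcating tree
    relative_rates = n_classes - 1       # one class serves as reference
    ratio_parameters = 2 * n_classes     # two ratios per site class
    return branch_lengths, relative_rates + ratio_parameters


branches, rates = tn93c_parameter_count(n_taxa=20, n_classes=18)
total_parameters = branches + rates

# Likelihood ratio statistic from the two log likelihoods given in the text
lnl_tn93c = -34301.01
lnl_tn93_gamma = -36393.83
lrt_statistic = 2.0 * (lnl_tn93c - lnl_tn93_gamma)
```

The resulting statistic is in the thousands, far beyond any conventional χ² critical value for a difference of a few dozen parameters, which is consistent with the significantly better fit reported for TN93C.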

The rate parameters estimated for each class of sites are reported in table 4. The first cdp of gag was chosen as a reference (substitution rate = 1); relative rates for the other classes of sites are shown in the table. In each gene, the first cdp tends to change about two times as fast as the second cdp, whereas the third cdp changes about eight times as fast as the first one. On average, among the nonoverlapping genes, pro shows the highest nucleotide substitution rate and tax shows the lowest (see table 4). In the tax/rex overlapping region, sites relative to different tax and rex cdp's tend to change more slowly with respect to cdp's in other genes. One exception is the second cdp of rex, which is also the third cdp of tax, changing around five times as fast as the second cdp's of other genes (see table 4). Finally, different PTLV genes generally show a strong transition/transversion bias at third cdp's, whereas purine transitions occur about two times more often than pyrimidine transitions at second cdp's (data not shown).


 
Table 4 Relative Nucleotide Substitution Rates at Different Codon Positions (cdp's) for Different Genes of the PTLV Genome

 
Molecular Clock
The molecular-clock hypothesis for the PTLV lineages was tested for synonymous and nonsynonymous sites considering the first plus second cdp's and the third cdp's separately. Again, the TN93 model was chosen as the best-fitting substitution model describing the evolutionary patterns of PTLVs. Branch lengths of the tree in figure 1, with the root placed on the PTLV-I branch according to the analysis given in figure 2, were then reestimated by this model, assuming and not assuming the constancy of the evolutionary rate. The same was done pruning STLV-L from the tree or using only PTLV-I (root on STLV-I-TE4), STLV-L/PTLV-II (root on STLV-L), PTLV-II (root on STLV-II), HTLV-I (root on HTLV-Ic-MEL5), or HTLV-II (root on HTLV-IId-Efe2) strains. In table 5, the likelihood ratio test for the clock hypothesis for each group of strains is reported. The clock is rejected for all groups when considering first plus second cdp's: the likelihood of the tree assuming a nonconstant evolutionary rate is always significantly increased. On the other hand, when third cdp's are considered, the clock hypothesis holds for STLV-L, PTLV-I, and PTLV-II strains, provided that HTLV-II strains isolated from IDUs are excluded (see table 5).


 
Table 5 Likelihood Ratio Test for the Molecular-Clock Hypothesis of PTLVs

 
Nonsynonymous/Synonymous Substitution Analysis
Average KA and KS values among the 10 PTLV-I strains and among the 9 PTLV-II strains are reported in table 6 . KA and KS averages within PTLV-I, PTLV-II, HTLV-I, STLV-II, or HTLV-II strains and within or between different subtypes, when compared, gave similar ratios (data not shown). KA/KS within PTLV-I or PTLV-II is always significantly less than 1, indicating purifying selection. However, employing the ML technique for the detection of selection along aligned nucleotide sequences described in Materials and Methods, 51 nt coding for amino acids 4–20 of the gp46 surface protein of the env gene showed a significantly lower likelihood value, compared with other regions of the PTLV genome, with respect to the hypothesis of neutral evolution (see Materials and Methods). Indeed, in this small region, KA/KS > 1 (1.41; P < 0.01). This region was detected using sliding windows of both 5 and 10 nt with the program PLATO (see Materials and Methods).


 
Table 6 Average Nonsynonymous (Ka) and Synonymous (Ks) Nucleotide Substitutions in Different Genes of the PTLV-I and PTLV-II Genome

 
HTLV/STLV Divergence Times
To set the timescale for the PTLV evolution, we employed correlations of the virus divergence times with anthropologically documented migrations of their host. The highly divergent HTLV-Ic, of which MEL5 is a representative strain, shown in the PTLV-I part of the tree in figure 1, was isolated from members of the non–Austronesian language highlander people in Papua New Guinea and from inhabitants of the Solomon Islands (Gessain et al. 1991) and from Australian aboriginals (Bastian et al. 1993). This population migrated from Asia to Melanesia 40,000–60,000 years ago according to genetic and archaeological evidence (Roberts, Jones, and Smith 1990; Cavalli-Sforza, Menozzi, and Piazza 1994). However, the earlier dating seems more in agreement with genetic data (Cavalli-Sforza, Menozzi, and Piazza 1994; Hagelberg, personal communication), possibly reflecting the genetic divergence present in the migrant population, an argument that also holds for the viruses they carry. Since no simians have ever been detected on these islands, this population possibly acquired the virus from Indonesian simians on their migratory pathways (Ibrahim, De Thé, and Gessain 1995). Indeed, several divergent Asian simian virus clades are known, of which TE4 is the only full-length genome available, and in the LTR and env trees, all of these Asian simian virus clades join the PTLV-I tree close to the branch toward TE4 (Vandamme et al. 1998). Considering that humans are the only primates in Melanesia and that the HTLV-Ic found in isolated aboriginal tribes is the most divergent HTLV-I subtype, 60,000 years ago could be reasonably used as a lower estimate of the divergence time at the node separating the HTLV-Ic Melanesian subtype, MEL5, from the other PTLV-I strains. Since sequence divergence within the HTLV-Ic subtype could already have been present at the time the host migrated to Melanesia (reflecting the difference between a gene tree and a population tree), the divergence time of the HTLV-Ic could have preceded the early human migrations to Melanesia.

The third cdp's of STLV-L, PTLV-I, and PTLV-II coding regions evolve at constant rates (see table 5). A rooted PTLV tree with clocklike branch lengths was constructed based on the third cdp's, excluding HTLV-II IDU strains, for which the clock does not hold (see table 4) (Salemi et al. 1999), with the TN93 model with γ-distributed rates across sites (estimated α = 4.34). On this tree, employing the host migration time described above as a lower limit of the virus divergence time, we estimated an evolutionary rate for the third cdp of not more than (1.67 ± 0.17) × 10⁻⁶ nucleotide substitutions per site per year, which was then further used to date all other nodes of the tree, on both the PTLV-I and the PTLV-II parts, except for the separation between PTLV-I and STLV-L/PTLV-II. In fact, although the third cdp's of the PTLV coding regions appear to be saturated when all the strains are included, they do not show evidence of saturation when only PTLV-I or only PTLV-II and STLV-L strains are considered (see table 1). Thus, while we can use the clock to date closely related lineages, estimation of divergence times between more divergent lineages, such as PTLV-I and PTLV-II, cannot be reliably based on the clock at the third cdp. We used the BLV/PTLV protein alignment and the method of Li et al. (1987) (see Materials and Methods) to get an estimate of the separation between PTLV-I and PTLV-II/STLV-L corresponding to the deepest branch, which is the origin of PTLV. Amino acid distances were calculated with the Blosum62 substitution model taking into account rate heterogeneity across sites (estimated α = 0.9). Finally, the separation between HTLV-IIa and HTLV-IIb, for which the clock does not hold, was estimated on the general PTLV tree, also including the HTLV-II IDU strains, and using third cdp's and the TN93 model with γ-distributed rates across sites (estimated α = 5.2) with the Li et al. (1987) method. As a starting date, we employed the estimated separation of the HTLV-IId Efe2 pygmy strain from the other HTLV-II strains, which was calculated using the evolutionary rate at third cdp's for the general PTLV tree (see above). The timescale of PTLV evolution is summarized in figure 3.
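
The calibration arithmetic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the calibration-node height is back-calculated from the reported rate and date rather than read from the original tree, and the 80,000-year alternative is an invented figure used only to show the direction of the bound.

```python
# Minimal sketch of strict-clock dating on a clocklike (ultrametric) tree.
# Names are illustrative; node heights here are NOT read from the paper's tree.

def clock_rate(calibration_height, calibration_age_years):
    """Rate (substitutions/site/year) implied by one calibrated node."""
    return calibration_height / calibration_age_years

def node_age(node_height, rate):
    """Age in years of a node whose height is in substitutions/site."""
    return node_height / rate

REPORTED_RATE = 1.67e-6      # third-cdp rate reported in the text
CALIBRATION_AGE = 60_000     # years; migration from Asia to Melanesia

# Calibration-node height implied by the two reported numbers (an assumption
# made only so that the example is self-consistent).
calibration_height = REPORTED_RATE * CALIBRATION_AGE

rate = clock_rate(calibration_height, CALIBRATION_AGE)

# Hypothetical node sitting at one third of the calibration-node height.
other_height = calibration_height / 3
age_at_60k = node_age(other_height, rate)

# If the viral split actually predates the migration (say 80,000 years, a
# made-up figure), the implied rate drops and every other date gets older.
older_rate = clock_rate(calibration_height, 80_000)
age_at_80k = node_age(other_height, older_rate)

print(f"rate at 60,000 y calibration: {rate:.3g} subs/site/year")
print(f"rate at 80,000 y calibration: {older_rate:.3g} subs/site/year")
print(f"node age: {age_at_60k:.0f} y versus {age_at_80k:.0f} y")
```

Because every node age scales with the calibration age, treating 60,000 years as a lower limit for the calibration node makes the rate an upper limit and every clock-derived date a lower limit.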



Fig. 3. Possible evolution times with 95% confidence intervals of PTLVs. The timescale was calibrated on the third codon position of the PTLV tree employing the migration from East Asia to Melanesia of the non–Austronesian language population 60,000 years ago (given in bold type) as the separation between the Melanesian HTLV-Ic strain and the other PTLV-I strains. Since the clock hypothesis at third codon positions does not hold for HTLV-II when the HTLV-IIa and HTLV-IIb injecting-drug-user strains are included, the method of Li et al. (1987) was used to estimate the divergence time between HTLV-IIa and HTLV-IIb, and 95% confidence intervals (CIs) are given in brackets. The PTLV time of origin was estimated on amino acid sequences with BLV as outgroup with the method of Li et al. (1987). Dashed arrows show alternative estimates of the origin of PTLV based on the STLV-L/PTLV-II separation (right arrow) or based on the PTLV-I Asian/African split (left arrow); 95% CIs are given in brackets. Branch lengths in the tree are not proportional to the genetic distances.

 
The tree topology shown in figure 3 is the same as that of the tree shown in figure 1, but the tree is now rooted according to the rooted tree topology supported by the statistical evaluation of different phylogenetic methods (see fig. 2). The presumed simian ancestors of the African HTLV-Ib and cosmopolitan HTLV-Ia subtypes diverged over 19,500 years ago in Africa, whereas HTLV-Ia arose over 12,700 years ago (see fig. 3). The Indonesian STLV-I TE4 strain diverged from the other PTLV-I strains over 93,000 years ago somewhere in Asia (see fig. 3). On the other side of the tree, the African HTLV-IId separated from the Amerindian HTLV-IIa and HTLV-IIb strains over 58,000 years ago, whereas HTLV-IIa and HTLV-IIb separated over 22,000 years ago. HTLV-II separated from STLV-II over 400,000 years ago, and STLV-L diverged from PTLV-II over 1,000,000 years ago (see fig. 3). Finally, the origin of PTLV was estimated to have occurred at least 1,300,000 years ago (see fig. 3).

As discussed above, all the dates estimated here are lower limits, since the date from which calculations were started (60,000 years ago) is also a lower limit, considering the possible difference between a gene tree and a population tree, with the separations in a gene tree generally preceding those in a population tree (Li 1997).


    Discussion
 
In this study, we used all 20 PTLV full-genome sequences available (from which we removed the LTR and the proximal pX regions because they cannot be aligned without ambiguity) to investigate the tempo and mode of PTLV evolution based on the most complete phylogenetic information possible. A general problem using such a data set is that randomization of synonymous changes may obscure the phylogenetic signal. It was shown in the Results section that this is not the case for PTLVs, since first, second, and third cdp's analyzed separately all exhibit significant levels of treelike phylogenetic signal.

Phylogenetic relationships among STLV-L, PTLV-I, and PTLV-II were already largely known, but some ambiguity still persisted on the root of the PTLV tree. We clearly demonstrated that the root of the PTLV tree lies on the branch leading to PTLV-I, with STLV-L and PTLV-II clustering together. The TN93 model with γ-distributed rates among sites appears to be the best-fitting model for our PTLV strain data set. As a consequence, we have to assume that during PTLV evolution, not only transitions and transversions, but also pyrimidine transitions and purine transitions occurred with different biases. Thus, we suggest the use of this model, implemented in several programs like PUZZLE, PAML, and MEGA, for future investigations of the phylogeny of primate T-lymphotropic viruses in coding regions.
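
To make the structure of the TN93 model concrete, the sketch below builds its instantaneous rate matrix: one rate for purine transitions (A↔G), one for pyrimidine transitions (C↔T), and one for transversions, each scaled by the equilibrium frequency of the target base. The function name and all parameter values are placeholders chosen for illustration, not estimates from the PTLV data, and rate heterogeneity across sites is left out.

```python
# Sketch of the TN93 instantaneous rate matrix (Tamura and Nei 1993).
# Parameter values used below are placeholders, not PTLV estimates.

BASES = "ACGT"
PURINES = {"A", "G"}

def tn93_rate_matrix(freqs, alpha_r, alpha_y, beta):
    """Return Q as a dict of dicts whose rows sum to zero.

    freqs   : equilibrium base frequencies, e.g. {"A": 0.3, ...}
    alpha_r : purine transition rate (A<->G)
    alpha_y : pyrimidine transition rate (C<->T)
    beta    : transversion rate
    """
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i == j:
                continue
            i_pur, j_pur = i in PURINES, j in PURINES
            if i_pur and j_pur:
                rate = alpha_r          # purine transition
            elif not i_pur and not j_pur:
                rate = alpha_y          # pyrimidine transition
            else:
                rate = beta             # transversion
            q[i][j] = rate * freqs[j]
        # Diagonal entry makes the row sum to zero.
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q

# Placeholder inputs for illustration only.
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
example_q = tn93_rate_matrix(example_freqs, alpha_r=2.0, alpha_y=3.0, beta=1.0)
for base in BASES:
    print(base, [round(example_q[base][other], 3) for other in BASES])
```

Setting `alpha_r` equal to `alpha_y` collapses the two transition classes into one, which is the simpler situation that the text says does not fit the PTLV data as well.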

The observation of nucleotide substitution rate heterogeneity across sites of the PTLV full-genome sequences was expected, since it is known that synonymous and nonsynonymous sites can evolve at different rates. However, a more detailed analysis of relative nucleotide substitution rates suggests that third cdp's change at more or less the same rate among different genes and about eight times as fast as first cdp's. On the other hand, substitution rates at first and second cdp's show greater heterogeneity. Biologically, this can be explained by considering that different genes can have different selective pressures. In this regard, the gag gene appears to be the one with the strongest purifying selection, since it has the lowest substitution rate at the first and second cdp's, whereas the pro gene is the fastest-evolving one. A special case is presented by the overlapping tax/rex region. In this region, the first, second, and third cdp's of Tax and the first and third cdp's of Rex appear to change particularly slowly, whereas the second cdp of the 5' end of Rex changes faster than second cdp's of other genes. Since the Rex second cdp is also the Tax third cdp, it can be suggested that the higher neutral mutation rate of Tax speeds up the nonsynonymous mutation rate in Rex. At the 3' end of Tax, which does not overlap with another reading frame, first, second, and third cdp's have the usual low value. In spite of the presence of purifying selection across the PTLV-I and PTLV-II genomes, as attested to by KA/KS values significantly lower than 1 in each gene region, we detected a small fragment in the surface gp46 protein subjected to positive selection. The fragment does not correspond to any immunodominant epitope, and since the three-dimensional structure of the protein is not known, it is difficult to find a biological explanation. It might be an interesting goal in the future to look for links of the amino acid sequences in this region with immunological or structural properties of the surface protein. KA and KS values between PTLV-I and PTLV-II are not shown, since the observed saturation at third cdp's could lead to artifactual KS values when the deepest branches of the PTLV trees are compared.
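
The conventional reading of a KA/KS ratio, which underlies the statement about purifying selection above, can be sketched as follows. The function names and the input values are hypothetical and are not taken from table 6. A point estimate alone does not show that a ratio is significantly different from 1, and the gp46 fragment under positive selection was identified with a likelihood-based Monte Carlo approach, not with this ratio.

```python
# Sketch: interpreting a Ka/Ks point estimate. Inputs are placeholders.

def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks

def selection_regime(ratio):
    """Conventional interpretation of a Ka/Ks point estimate."""
    if ratio < 1:
        return "purifying (negative) selection"
    if ratio > 1:
        return "positive selection"
    return "neutral evolution"

example = ka_ks_ratio(ka=0.02, ks=0.25)   # hypothetical values
print(f"{example:.2f}", selection_regime(example))
```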

The assumption of a constant evolutionary clock for HTLVs and STLVs always significantly decreases the likelihood of the tree when we consider the first and second cdp's. It is already known that evolutionary rates calculated for some mammalian genes are not constant (Wu and Li 1985) and rates slow down in higher primates (Li and Tanimura 1987). Interspecies transmissions among simians and between simians and humans challenge the same virus lineage with different hosts and different immune systems. This phenomenon most probably contributes to the changing molecular clocks of these viruses. On the other hand, when we exclude HTLV-II IDU strains, PTLVs evolve following a molecular clock at the third cdp's. The overall evolutionary rate of the PTLV genes at third cdp's was estimated to be not higher than (1.67 ± 0.17) × 10⁻⁶ nucleotide substitutions per site per year. The clock was calibrated employing the earliest human migration from Asia to Melanesia, 60,000 years ago, as the lower limit for the node separating HTLV-Ic from the other HTLV-I subtypes. We are aware of the fact that the divergence times in the PTLV gene tree could have preceded those of the population tree of the host species, implying that the evolutionary rate was overestimated. However, we believe that the real rate of PTLV evolution cannot be much slower than our estimate for two reasons. First, evolutionary rates of retroviruses usually range between 10⁻³ and 10⁻⁴ nucleotide substitutions per site per year due to their fast replication rates and their error-prone reverse transcriptases (Domingo and Holland 1994). At present, there are no experimental data on the fidelity of HTLV reverse transcriptase, but comparison of structural models with HIV-1 reverse transcriptase does not suggest the presence of any particular feature, such as proofreading activity, in the HTLV enzyme that could be responsible for an exceptionally high fidelity (unpublished data). The PTLV evolutionary rate proposed in this paper is already 100 times slower than the rate of the slowest-evolving retrovirus known to date. It is unlikely for the real rate to be still 10–100 times slower than 1.67 × 10⁻⁶, since we would then have to assume that PTLVs are evolving at a rate comparable with those of cellular genes. Second, in a previous paper published by our group (Liu et al. 1994), HTLV-I sequences were isolated from seven family members infected by parental transmission. We sequenced 1,031 nt in the LTR and gp21 env regions. All sequences from each member but one, with a single–base pair substitution in the LTR, were identical. An estimate of the evolutionary rate of the virus in this family gives ≤3.3 × 10⁻⁶ nucleotide substitutions per site per year (Van Dooren et al., personal communication). This is still an upper limit, but the fact that it is so close to the rate estimated in the present paper strengthens our confidence in the results. However, we are aware that a slightly lower evolutionary rate than our estimate is still possible, and thus dates given in figure 3 are reported as lower limits.

In any case, the PTLV evolutionary rate estimated here at third cdp's is much slower than the one calculated in the LTR region of HTLV-II strains infecting IDUs (Salemi et al. 1998a), but it is of the same order as the one calculated for HTLV-II LTR in endemically infected populations (Salemi et al. 1999). It could be suggested that the evolutionary rates of HTLVs depend on the way of transmission, which is different in different populations: mainly mother-to-child transmission via breast-feeding in endemically infected tribes, or transmission via needle sharing among IDUs (Salemi et al. 2000). Indeed, in HTLV-I– or HTLV-II–infected individuals with high proviral loads, a large part of the provirus is produced by clonal expansion of infected cells and not by viral replication via reverse transcriptase (Wattel et al. 1995; Cimarelli et al. 1996). Because of the clonal expansion of the provirus, the reverse transcription step is less necessary for the virus to maintain its population in the host during its lifetime, while it might be considered necessary for the infection of a new host. Consequently, the possibility exists that the evolutionary rate of HTLVs increases with the transmission rate, which is much higher in IDUs than in endemically infected tribes (Salemi et al. 1999).

The lower limit for the PTLV time of origin was estimated to be about 1,300,000 years ago when starting from the PTLV-II part of the tree in figure 3, or about 800,000 years ago when starting from the PTLV-I part of the tree. The fact that the two dates, with overlapping confidence intervals, roughly agree increases our confidence in these divergence time estimates. HTLV-II separated from STLV-II not much earlier than 400,000 years ago, which is much later than the separation between bonobos and humans, dated at least 5,000,000 years ago based on both paleontological and molecular biology data (Sarich and Wilson 1967; Pilbeam 1984). Thus, human and simian lineages of the primate T-lymphotropic viruses did not separate following the speciation of their hosts, but probably arose from ancient interspecies transmissions. Very shortly after the origin of PTLV, the virus was present in Africa, since the African STLV-L and PTLV-II diverged from each other in Africa not much earlier than 1,000,000 years ago. The recent isolation of the divergent STLV-I marcI in an Asian Macaca arctoides suggests that the PTLV-I origin could be placed in Asia (Mahieux, Pecon-Slattery, and Gessain 1997). Because the two lineages, starting from the root node of the PTLV tree, evolved on different continents, the place of the origin of the PTLV common ancestor remains open; it is either Asia or Africa. Assuming an Asian origin for PTLV, a simian migration from Asia to Africa after the PTLV origin has to be postulated in order to spread the virus on the second continent. No such migration has been reported to date. Assuming an African origin, a simian migration from Africa to Asia has to be postulated. As a consequence, the African origin of PTLV could be supported by the documented migration of macaques from Africa to Asia around 2,000,000 years ago (Fa 1989) if we consider that the upper limit of the 95% confidence interval for the PTLV origin is 1,995,000 or 3,000,000 years ago, depending on which part of the tree in figure 3 is used for dating.

The separation between the Indonesian STLV-I and the other PTLV-I strains occurred much later than the STLV-II/HTLV-II one, not much earlier than 93,000 years ago (see fig. 3). The earliest node in the PTLV tree leading to the African STLV-I and HTLV-I strains is dated 19,500 years ago, while the HTLV-I cosmopolitan subtype, which today is spread all over the world, arose just over 12,700 years ago. The presence of these viruses on the African continent should be seen as a later introduction due to simian and/or human movements during prehistoric times between 19,500 (the latest date for the radiation in Africa) and not much earlier than 60,000 (the earliest date for a common Asian ancestor) years ago. The migration from Indonesia to Madagascar around 1,200 years ago (Kent 1962) came too late to explain the origin of HTLV-I infection on the African continent, as previously suggested (Saksena et al. 1992). Moreover, several human-to-simian transmissions would then have to be assumed, which does not seem very likely. Thus, the date of the STLV-I introduction in Africa might correspond to an ancient movement of simians (possibly as pets of humans) "back to Africa," for which evidence could be investigated in the archaeological records.

Finally, it is interesting to note that according to our calculations, the separation between HTLV-IIa and HTLV-IIb occurred not much earlier than 22,000 years ago, with a confidence interval from 12,000 to 38,000 years ago. This range is consistent with the time frame proposed for the settling of the Americas through the Bering land bridge migrations 15,000 to 35,000 years ago (Greenberg, Turner, and Zegura 1986; Cavalli-Sforza, Menozzi, and Piazza 1994). Thus, our data might support the suggestion of several investigators, based on epidemiological data, that HTLV-II among Amerindian tribes was originally brought from Asia into the Americas along with the migration of the HTLV-II–infected Asian populations over the Bering land bridge (Neel, Biggar, and Suzernik 1994; Biggar et al. 1996; Suzuki and Gojobori 1998).


    Acknowledgements
 
We thank Ria Swinnen for fine editorial help and Dr. Erika Hagelberg for helpful discussions. We are also grateful to Korbinian Strimmer, Xia Xuhua, the editor of Molecular Biology and Evolution, and anonymous referees for suggestions and criticisms that led to improvements in the manuscript. This study was supported in part by grant 3009894N of the Belgian Foundation for Scientific Research. M.S. is the recipient of a TMR Marie Curie fellowship from the European Commission.


    Footnotes
 
Pekka Pamilo, Reviewing Editor

1 Abbreviations: BLV, bovine leukemia virus; cdp, codon position; Fitch, Fitch and Margoliash method; HTLV, human T-cell lymphotropic virus; IDUs, injecting drug users; ML, maximum likelihood; NJ, neighbor joining; Protpars, parsimony method for proteins; PTLV, primate T-cell lymphotropic virus; STLV, simian T-cell lymphotropic virus; TN93, Tamura and Nei model assuming discrete γ-distributed rates across sites; wpars, weighted parsimony method.

2 Keywords: HTLV-I, HTLV-II, STLV-I, STLV-II, molecular clock, evolutionary models

3 Address for correspondence and reprints: M. Salemi, Rega Institute for Medical Research, Minderbroedersstraat 10, B-3000 Leuven, Belgium. E-mail: marco.salemi@uz.kuleuven.ac.be


    literature cited
 

    Bastian, I., J. Gardner, D. Webb, and I. Gardner. 1993. Isolation of a human T-lymphotropic virus type I strain from Australian aboriginals. J. Virol. 67:843–851.

    Biggar, R., M. E. Taylor, J. V. Neel, B. Hjelle, P. H. Levine, F. Black, G. M. Shaw, P. M. Sharp, and B. Hahn. 1996. Genetic variants of human T-cell lymphotropic virus type II in American Indian groups. Virology 216:165–173.

    Cavalli-Sforza, L. L., P. Menozzi, and A. Piazza. 1994. The history and geography of human genes. Princeton University Press, Princeton, N.J.

    Cimarelli, A., C. A. Duclos, A. Gessain, C. Casoli, and U. Bertazzoni. 1996. Clonal expansion of human T-cell leukemia virus type II in patients with high proviral load. Virology 223:362–364.

    Domingo, E., and J. J. Holland. 1994. Mutation rates and rapid evolution of RNA viruses. Pp. 161–184 in S. S. Morse, ed. The evolutionary biology of viruses. Raven Press, New York.

    Fa, J. A. 1989. The genus Macaca: a review of taxonomy and evolution. Mamm. Rev. 19:45–81.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.

    ———. 1993. PHYLIP: phylogenetic inference package. Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Ferrer, J. F., E. Esteban, S. Dube et al. (11 co-authors). 1996. Endemic infection with human T-cell leukemia/lymphoma virus type IIB in Argentinean and Paraguayan Indians: epidemiology and molecular characterization. J. Infect. Dis. 174:944–953.

    Fukushima, Y., M. J. Lewis, C. Monken et al. (13 co-authors). 1998. Identification and molecular characterization of human T-lymphotropic virus type II infections in intravenous drug abusers in the former South Vietnam. AIDS Res. Hum. Retroviruses 14:537–540.

    Gessain, A., F. Barin, J. C. Vernant, O. Gout, L. Maurs, A. Calender, and G. de Thé. 1985. Antibodies to human T lymphotropic virus type-I in patients with tropical spastic paraparesis. Lancet 2:407–410.

    Gessain, A., R. Yanagihara, G. Franchini, R. M. Garruto, C. L. Jenkins, A. B. Ajdukiewicz, R. C. Gallo, and D. C. Gajdusek. 1991. Highly divergent molecular variants of human T-lymphotropic virus type I from isolated populations in Papua New Guinea and the Solomon Islands. Proc. Natl. Acad. Sci. USA 88:7694–7698.

    Giri, A., P. Markham, L. Digilio, G. Hurteau, R. C. Gallo, and G. Franchini. 1994. Isolation of a novel simian T-cell lymphotropic virus from Pan paniscus that is distantly related to the human T-cell leukemia/lymphotropic virus types I and II. J. Virol. 68:8392–8395.

    Goubau, P., J. Desmyter, J. Ghesquiere, and B. Kasereka. 1992. HTLV-II among pygmies. Nature 359:201 [letter].

    Goubau, P., M. Van Brussel, A.-M. Vandamme, H. F. Liu, and J. Desmyter. 1994. A primate T-lymphotropic virus, PTLV-L, different from human T-lymphotropic viruses types I and II, in a wild-caught baboon (Papio hamadryas). Proc. Natl. Acad. Sci. USA 91:2848–2852.

    Grassly, N. C., and A. Rambaut. 1998. PLATO—partial likelihood assessed through optimisation. User manual. PLATO is available via anonymous ftp from evolve.zoo.ox.ac.uk in directory packages/Plato.

    Greenberg, J., C. Turner, and S. Zegura. 1986. The settlement of the Americas: a comparison of the linguistic, dental, and genetic evidence. Curr. Anthropol. 27:477–497.

    Hahn, B. H., G. M. Shaw, M. Popovic, A. Lo-Monico, R. C. Gallo, and F. Staal-Wong. 1984. Molecular cloning and analysis of a new variant of human T-cell leukemia virus (HTLV-Ib) from an African patient with adult T-cell leukemia-lymphoma. Int. J. Cancer 34:613–618.

    Hall, W. W., K. Takayuki, S. Ijichi, H. Takahashi, and S. W. Zhu. 1994. Human T cell leukemia/lymphoma virus type II (HTLV-II): emergence of an important newly recognized pathogen. Semin. Virol. 5:165–178.

    Hall, W. W., H. Takahashi, C. Liu, M. H. Kaplan, O. Scheewind, S. Ijichi, K. Nagashima, and R. C. Gallo. 1992. Multiple isolates and characteristics of human T-cell leukemia virus type II. J. Virol. 66:2456–2463.

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Hillis, D. M., and J. Huelsenbeck. 1992. Signal, noise, and reliability in molecular phylogenetic analysis. J. Hered. 83:189–195.

    Hjelle, B., S. W. Zhu, H. Takahashi, S. Ijichi, and W. W. Hall. 1993. Endemic human T cell leukemia virus type II infection in southwestern US Indians involves two prototype variants of virus. J. Infect. Dis. 168:737–740.

    Huelsenbeck, J., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227–232.

    Ibrahim, F., G. De Thé, and A. Gessain. 1995. Isolation and characterization of a new simian T-cell leukemia virus type I from naturally infected Celebes macaques (Macaca tonkeana): complete nucleotide sequence and phylogenetic relationship with the Australo-Melanesian human T-cell leukemia virus type I. J. Virol. 69:6980–6993.

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Kalyanaraman, V. S., M. G. Sarngadharan, M. Robert-Guroff, I. Miyoshi, D. Golde, and R. C. Gallo. 1982. A new subtype of human T-cell leukemia virus (HTLV-II) associated with a T-cell variant of hairy cell leukemia. Science 218:571–573.

    Kent, R. K. 1962. From Madagascar to the Malagasy Republic. Brittanica Encyclopedia, London.

    Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetic analysis. Version 1.0. Pennsylvania State University, University Park.

    Lee, H., P. Swanson, V. S. Shorty, J. A. Zack, J. D. Rosenblatt, and I. S. Y. Chen. 1989. High rate of HTLV-II infection in seropositive IV drug abusers in New Orleans. Science 244:471–475.

    Li, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

    Li, W.-H., and M. Tanimura. 1987. The molecular clock runs more slowly in man than in apes and monkeys. Nature 326:93–96.

    Li, W.-H., K. H. Wolfe, J. Sourdis, and P. Sharp. 1987. Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harb. Symp. Quant. Biol. 52:847–856.

    Liu, H. F., A.-M. Vandamme, M. Van Brussel, J. Desmyter, and P. Goubau. 1994. New retroviruses in human and simian T-lymphotropic viruses. Lancet 344:265–266.

    Liu, H. F., P. Goubau, M. Van Brussel, K. Van Laethem, Y. C. Chen, J. Desmyter, and A.-M. Vandamme. 1996. The three human T-lymphotropic virus type I subtypes arose from three geographically distinct simian reservoirs. J. Gen. Virol. 77:359–368.

    Mahieux, R., C. Chappey, M. C. Georges-Coubot, G. DeBreuil, P. Mauclere, A. Georges, and A. Gessain. 1998. Simian T-lymphotropic virus type I from Mandrillus sphinx as a simian counterpart of human T-cell lymphotropic type I subtype D. J. Virol. 72:10316–10322.

    Mahieux, R., F. Ibrahim, P. Mauclere et al. (14 co-authors). 1997. Molecular epidemiology of 58 new African human T-cell leukemia virus type I (HTLV-1) strains: identification of a new and distinct HTLV-1 molecular subtype in Central Africa and in pygmies. J. Virol. 71:1317–1333.

    Mahieux, R., J. Pecon-Slattery, and A. Gessain. 1997. Molecular characterization and phylogenetic analyses of a new, highly divergent simian T-cell lymphotropic virus (STLV-1marc1) in Macaca arctoides. J. Virol. 71:6253–6258.

    Maloney, E. M., R. J. Biggar, J. V. Neel, M. Taylor, B. H. Hahn, G. M. Shaw, and W. A. Blattner. 1992. Endemic human T cell lymphotropic virus type II infection among isolated Brazilian Amerindians. J. Infect. Dis. 166:100–107.

    Miura, T., T. Fukunaga, and T. Igarashi. 1994. Phylogenetic subtypes of human T-lymphotropic virus type I and their relations to the anthropological background. Proc. Natl. Acad. Sci. USA 91:1124–1127.

    Neel, J. V., R. J. Biggar, and R. I. Suzernik. 1994. Virologic and genetic studies relate Amerind origins to the indigenous people of the Mongolia/Manchuria/southeastern Siberia region. Proc. Natl. Acad. Sci. USA 91:10737–10741.

    Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.

    Nieselt-Struve, K. 1998. Combining likelihood-mapping and statistical geometry to a new sequence analysis tool. Pp. 13–22 in M. K. Uyenoyama and A. von Haeseler, eds. Proceedings of the Trinational Workshop on Molecular Evolution. University of Munich, Munich, Germany.

    Osame, M., M. Matsumoto, K. Usuku, S. Izumo, N. Ijichi, H. Amitani, M. Tara, and A. Igata. 1987. Chronic progressive myelopathy associated with elevated antibodies to human T-lymphotropic virus type I and adult T-cell leukemia like cells. Ann. Neurol. 21:117–122.

    Poiesz, B. J., F. W. Ruscetti, A. F. Gazdar, P. A. Bunn, J. A. Minna, and R. C. Gallo. 1980. Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma. Proc. Natl. Acad. Sci. USA 77:7415–7419.

    Roberts, D., G. R. Jones, and M. A. Smith. 1990. Report of thermoluminescence dates supporting arrival of people between 50 and 60 k. y. in southern Australia. Nature 345:153.

    Saksena, N. K., M. P. Sherman, R. Yanagihara, D. K. Dube, and B. J. Poiesz. 1992. LTR sequence and phylogenetic analyses of a newly discovered variant of HTLV-I isolated from the Hagahai of Papua New Guinea. Virology 189:1–9.

    Salemi, M., M. Lewis, J. F. Egan, W. W. Hall, J. Desmyter, and A.-M. Vandamme. 1999. Different population dynamics and evolutionary rates of human T-cell lymphotropic virus type II (HTLV-II) in injecting drug users compared to in endemically infected Amerindian and Pygmy tribes. Proc. Natl. Acad. Sci. USA 96:13253–13258.

    Salemi, M., A.-M. Vandamme, C. Gradozzi, K. Van Laethem, E. Cattaneo, G. Taylor, C. Casoli, P. Goubau, J. Desmyter, and U. Bertazzoni. 1998a. Evolutionary rate and genetic heterogeneity of human T-cell lymphotropic virus type II (HTLV-II) using new isolates from European injecting drug users. J. Mol. Evol. 46:602–611.

    Salemi, M., S. Van Dooren, E. Audenaert, E. Delaporte, P. Goubau, J. Desmyter, and A.-M. Vandamme. 1998b. Two new human T-lymphotropic virus type I phylogenetic subtypes in seroindeterminates, a Mbuti pygmy and a Gabonese, have closest relatives among African STLV-I strains. Virology 246:277–287.

    Sarich, V. M., and A. C. Wilson. 1967. Immunological time scale for hominid evolution. Science 158:1200–1203.

    Song, K. J., V. R. Nerurkar, N. Saitou, A. Lazo, J. R. Blakeslee, I. Miyoshi, and R. Yanagihara. 1994. Genetic analysis and molecular phylogeny of simian T-cell lymphotropic virus type I: evidence for independent virus evolution in Asia and Africa. Virology 199:56–66.

    Strimmer, K., and A. von Haeseler. 1997. Likelihood mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. USA 94:6815–6819.

    Suzuki, Y., and T. Gojobori. 1998. The origin and evolution of the human T-cell lymphotropic virus type I and II. Virus Genes 16:69–84.

    Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512–526.

    Van Brussel, M., P. Goubau, R. Rousseau, J. Desmyter, and A.-M. Vandamme. 1997. Complete nucleotide sequence of a new simian T-lymphotropic virus, STLV-PH969 from hamadryas baboon, and unusual features of its long terminal repeats. J. Virol. 71:5464–5472.

    Vandamme, A.-M., H.-F. Liu, P. Goubau, and J. Desmyter. 1994. Primate T-lymphotropic virus type I LTR sequence variation and its phylogenetic analysis: compatibility with an African origin of PTLV-I. Virology 202:212–223.

    Vandamme, A.-M., H.-F. Liu, M. Van Brussel, W. De Meurichy, J. Desmyter, and P. Goubau. 1996. The presence of a divergent T-lymphotropic virus in a wild-caught pygmy chimpanzee (Pan paniscus) supports an African origin for the human T-lymphotropic/simian T-lymphotropic group of viruses. J. Gen. Virol. 77:1089–1099.

    Vandamme, A.-M., M. Salemi, and J. Desmyter. 1998. The simian origins of the pathogenic human T-cell lymphotropic virus type I. Trends Microbiol. 6:477–483.

    Vandamme, A.-M., M. Van Brussel, H.-F. Liu, M. Salemi, K. Van Laethem, M. Van Ranst, L. Michels, J. Desmyter, and P. Goubau. 1998. African origin of human T-lymphotropic virus type II (HTLV-II) supported by a new subtype HTLV-IId in Zairean Bambuti Efe pygmies. J. Virol. 72:4327–4340.

    Watanabe, T., M. Seiki, Y. Hirayama, and M. Yoshida. 1986. Human T-cell leukemia virus type I is a member of the African subtype of simian viruses (STLV). Virology 148:385–388.

    Wattel, E., J.-P. Vartanian, C. Pannetier, and S. Wain-Hobson. 1995. Clonal expansion of human T-cell leukemia virus type I–infected cells in asymptomatic and symptomatic carriers without malignancy. J. Virol. 69:2863–2868.

    Wu, C. I., and W.-H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.

    Xia, X. 1999. DAMBE 3.0 (software package for data analysis in molecular biology and evolution). Department of Ecology and Biodiversity, University of Hong Kong, Hong Kong.

    Yang, Z. 1993. Maximum likelihood estimation of the phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.

    ———. 1994a. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.

    ———. 1994b. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites. J. Mol. Evol. 39:306–314.

    ———. 1997. Phylogenetic analysis by maximum-likelihood (PAML). Version 1.3b. Pennsylvania State University, Institute of Molecular Evolutionary Genetics, University Park.

    Yoshida, M., I. Miyoshi, and Y. Hinuma. 1982. Isolation and characterization of retrovirus from cell lines of human adult T-cell leukemia and its implication in the disease. Proc. Natl. Acad. Sci. USA 79:2031–2035.

    Zella, D., L. Mori, M. Sala, P. Ferrante, C. Casoli, G. Magnani, G. Achilli, E. Cattaneo, F. Lori, and U. Bertazzoni. 1990. HTLV-II infection in Italian drug abusers. Lancet 2:575–576.

Accepted for publication November 8, 1999.