Phylogenetic reconstruction of intrapatient evolution of human immunodeficiency virus type 1: predominance of drift and purifying selection

Laurens Kils-Hütten¹, Rémi Cheynier², Simon Wain-Hobson² and Andreas Meyerhans¹

¹ Abteilung Virologie, Universität des Saarlandes, Institut für Medizinische Mikrobiologie und Hygiene, Klinikum Homburg, Haus 47, D-66421 Homburg, Germany
² Unité de Rétrovirologie Moléculaire, Institut Pasteur, F-75724 Paris cedex 15, France

Author for correspondence: Andreas Meyerhans. Fax +49 6841 16 3980. e-mail Andreas.Meyerhans{at}med-rz.uni-sb.de


   Abstract
The intra-host evolution of 73 human immunodeficiency virus type 1 quasispecies was analysed by split decomposition. Non-synonymous and synonymous nucleotide substitutions were counted along the shortest path connecting all sequences and compared with the numbers expected under a random model of mutation. For the majority of substitutions, drift and negative selection seemed to prevail.


   Introduction
The human (HIV) and simian (SIV) immunodeficiency viruses exhibit tremendous genetic variability within their hosts. This is due mainly to two factors: the error-prone nature of the viral replication machinery and the continuous, rapid turnover of virus. Consequently, a complex mixture of genetically related viral genomes, or quasispecies, is present within each infected individual. These quasispecies fluctuate in both time and space, and the overall divergence between individual sequences generally increases with time after primary infection.

The mechanisms that may contribute to the fixation of mutations include positive and negative selection as well as stochastic effects, such as bottlenecks and the massive destruction of virus and virus-infected cells by the intense antiviral immune response. A widely used means of distinguishing between these processes is the analysis of non-synonymous (ns) and synonymous (s) nucleotide substitutions. When the numbers of substitutions are normalized to the numbers of non-synonymous and synonymous sites, giving dns and ds, a ratio dns/ds ≫ 1 would indicate positive selection and dns/ds ≪ 1 would indicate negative selection, while dns/ds ≈ 1 is compatible with drift (Nei & Gojobori, 1986).
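As an illustration of this normalization, the Python sketch below converts counts of observed differences into dns and ds in the style of Nei & Gojobori (1986). The proportions of differing non-synonymous and synonymous sites are corrected for multiple hits with the Jukes–Cantor formula and then divided. It assumes the numbers of non-synonymous and synonymous sites have already been counted; the function names and example values are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dns_ds(ns_diffs: float, s_diffs: float, ns_sites: float, s_sites: float) -> float:
    """Ratio dns/ds from difference counts and site counts (Nei-Gojobori style)."""
    dn = jukes_cantor(ns_diffs / ns_sites)  # distance per non-synonymous site
    ds = jukes_cantor(s_diffs / s_sites)    # distance per synonymous site
    if ds == 0.0:
        raise ZeroDivisionError("no synonymous differences: dns/ds is undefined")
    return dn / ds


# Hypothetical counts: 6 ns differences over 225 ns sites, 2 s differences over 75 s sites.
print(round(dns_ds(6, 2, 225, 75), 3))
```

Equal proportions of differing sites, as in this hypothetical call, give a ratio of 1, the value the text associates with drift.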

Counting ns and s substitutions within an HIV sequence dataset relies on a proper phylogenetic reconstruction of virus evolution. Without such a reconstruction, standard pairwise methods for estimating dns and ds tend to overestimate the number of substitutions (Zanotto et al., 1999), because a substitution inherited by several sequences is counted again in every comparison between a sequence that carries it and one that does not. Split decomposition is a mathematical clustering technique that has been applied successfully to the analysis of virus evolution (Dopazo et al., 1993; Plikat et al., 1997). It is a non-approximative method by which a set of sequences, represented as a distance matrix, is decomposed into a number of binary splits. The splits can then be presented as a network in which the nodes and the tips of the branches correspond to individual sequences.
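Split decomposition can be sketched in a few lines for a small distance matrix. The Python below follows the Bandelt & Dress (1992) definition of the isolation index, keeps every bipartition with a positive index as a d-split, and reports the fit as the fraction of the summed pairwise distances that the weighted splits reproduce (assumed here to correspond to the 'fit' reported by SplitsTree). It enumerates all bipartitions by brute force, so it is a didactic sketch with hypothetical function names, not the SplitsTree implementation.

```python
from itertools import combinations


def isolation_index(d, A, B):
    """Isolation index of the split A|B (Bandelt & Dress, 1992).

    Half the minimum, over a, a2 in A and b, b2 in B (repeats allowed), of
    max{d(a,b)+d(a2,b2), d(a,b2)+d(a2,b), d(a,a2)+d(b,b2)} - d(a,a2) - d(b,b2).
    """
    best = float("inf")
    for a in A:
        for a2 in A:
            for b in B:
                for b2 in B:
                    largest = max(d[a][b] + d[a2][b2],
                                  d[a][b2] + d[a2][b],
                                  d[a][a2] + d[b][b2])
                    best = min(best, largest - d[a][a2] - d[b][b2])
    return best / 2.0


def split_decomposition(d, tol=1e-12):
    """Return ({(A, B): weight} for all d-splits, fit) for a symmetric distance matrix d.

    Brute force over every bipartition, so only usable for a handful of taxa.
    """
    n = len(d)
    taxa = tuple(range(n))
    splits = {}
    for size in range(1, n // 2 + 1):
        for A in combinations(taxa, size):
            if 2 * size == n and 0 not in A:
                continue  # balanced split already visited from the other side
            B = tuple(t for t in taxa if t not in A)
            weight = isolation_index(d, A, B)
            if weight > tol:
                splits[(A, B)] = weight
    # Each split contributes its weight to every one of the len(A) * len(B) pairs it separates.
    represented = sum(w * len(A) * len(B) for (A, B), w in splits.items())
    total = sum(d[x][y] for x, y in combinations(taxa, 2))
    fit = represented / total if total > 0 else 1.0
    return splits, fit


# Hypothetical tree-like distances for four taxa grouped as ((0,1),(2,3)).
tree = [[0, 2, 4, 4],
        [2, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0]]
splits, fit = split_decomposition(tree)
print(splits, fit)
```

For tree-like distances such as these, the four terminal splits and the internal split {0,1}|{2,3} are recovered with their branch lengths and the fit is 100%; distances that cannot be fully expressed as a sum of weighted splits leave a residue and give a fit below 100%.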

Given that an HIV infection has a clonal origin, represented by the node with the most branches at an early time-point after primary infection, it was possible to estimate reliably the numbers of ns and s substitutions along the evolutionary path of an HIV-1 nef quasispecies in a single individual (Plikat et al., 1997). By comparison with the ns and s values expected under a neutral model, it was concluded that drift plays a prominent role in HIV evolution. Here, the approach has been applied to 73 HIV-1 datasets covering the nef, env and gag genes and comprising approximately 1000 sequences. As judged by the proportions of ns and s substitutions, negative selection and random processes predominate in shaping the evolution of virus quasispecies in vivo.


   Methods
■ Sequence data.
The 73 HIV-1 sequence datasets were obtained from databases or were kindly provided by Klaus Cichutek and Heike Merget-Millitzer (Paul-Ehrlich Institut, Langen, Germany). The GenBank accession numbers are: M58193–M58283, L35950–L36017, U79785–U79869, U79870–U79957, U68496–U68521, U69282–U69481, U56146–U56235, U79034–U79113, L26408–L26445, L34422–L34541, U58393–U58465, M84240–M84317, M74591–M74684, U29433–U29437, U29956, U29957, U29959–U30074, U30077–U30145, U31573–U31582, U43035–U43054, U35894–U36185 and M77541–M77636.

■ Analysis of the sequence data.
Nucleic acid sequences were aligned with the multiple sequence alignment algorithm implemented in Clustal W (Thompson et al., 1994). The gap opening penalty was set to 3·0 and the gap extension penalty to 0·05, and the output format was set to MSF. The alignments were converted to NEXUS format with a modified version of READSEQ and used as input for SplitsTree 2.4 (Bandelt & Dress, 1992; Huson, 1998; Thompson et al., 1994). For each dataset, the most parsimonious path, that is, the minimum path length connecting all sequences, was mapped out on the phylogram, and the substitutions along it were scored as non-synonymous or synonymous and as transitions or transversions.
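Scoring each substitution on the path requires deciding whether it changes the encoded amino acid and whether it is a transition or a transversion. A minimal Python sketch, assuming an uppercase, ungapped, in-frame DNA sequence and the standard genetic code; the helper names are hypothetical and not part of SplitsTree or READSEQ:

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TCAG order: TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}
PURINES = {"A", "G"}


def classify_substitution(seq: str, pos: int, new_base: str) -> tuple[str, str]:
    """Classify a point substitution at 0-based position `pos` of an in-frame coding sequence.

    Returns ("ns" or "s", "Ts" or "Tv"). A change to a stop codon counts as "ns".
    """
    old_base = seq[pos]
    if old_base == new_base:
        raise ValueError("not a substitution")
    start, offset = pos - pos % 3, pos % 3
    codon = seq[start:start + 3]
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    effect = "s" if CODON_TABLE[codon] == CODON_TABLE[mutant] else "ns"
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    kind = "Ts" if (old_base in PURINES) == (new_base in PURINES) else "Tv"
    return effect, kind


print(classify_substitution("ATGCTT", 5, "C"))  # ('s', 'Ts'): CTT -> CTC, Leu -> Leu
print(classify_substitution("ATGCTT", 3, "A"))  # ('ns', 'Tv'): CTT -> ATT, Leu -> Ile
```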

The expected numbers of ns and s mutations were calculated for one sequence in each dataset, assuming all substitutions to be equally probable. Alternatively, they were calculated by imposing a transition/transversion ratio derived from each dataset, as described previously (Plikat et al., 1997). Deviations of the observed from the expected values were tested for significance by the χ² test. Datasets for which χ² ≥ 3·841 (P < 0·05) were interpreted as being under selection: if ns substitutions were significantly more or less frequent than expected, the genomes were considered to be under positive or negative (purifying) selection, respectively. Because scoring transitions and transversions does not require knowledge of the genome that initiated the infection, this method is particularly useful for analysing unrooted datasets.
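The random-model expectation and the χ² decision rule can be sketched in Python as follows. This is one plausible reading of 'all substitutions equally probable' (each site mutates to each of the other three bases with equal probability), assuming an uppercase, in-frame reference sequence and the standard genetic code; the function names are hypothetical and the Ts/Tv-weighted alternative is not included.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}
CHI2_CRITICAL_1DF = 3.841  # chi-square critical value for 1 degree of freedom at P = 0.05


def expected_ns_fraction(ref: str) -> float:
    """Fraction of all possible single-base changes in `ref` that are non-synonymous."""
    ns = total = 0
    for start in range(0, len(ref) - len(ref) % 3, 3):
        codon = ref[start:start + 3]
        for offset in range(3):
            for base in BASES:
                if base == codon[offset]:
                    continue  # only changes to one of the three other bases
                mutant = codon[:offset] + base + codon[offset + 1:]
                total += 1
                if CODON_TABLE[mutant] != CODON_TABLE[codon]:
                    ns += 1
    return ns / total


def chi_square_ns_s(obs_ns: int, obs_s: int, ns_fraction: float) -> float:
    """Chi-square statistic comparing observed ns/s counts with a random-model expectation."""
    n = obs_ns + obs_s
    exp_ns = n * ns_fraction
    exp_s = n - exp_ns
    return (obs_ns - exp_ns) ** 2 / exp_ns + (obs_s - exp_s) ** 2 / exp_s


def interpret(obs_ns: int, obs_s: int, ns_fraction: float) -> str:
    """Label a dataset following the decision rule described in the text."""
    if chi_square_ns_s(obs_ns, obs_s, ns_fraction) < CHI2_CRITICAL_1DF:
        return "no significant deviation"
    if obs_ns > (obs_ns + obs_s) * ns_fraction:
        return "positive selection"
    return "negative selection"


fraction = expected_ns_fraction("ATGGCTCGTAAA")  # hypothetical reference fragment
print(round(fraction, 3))
print(interpret(13, 4, 13.4 / 17))  # no significant deviation
```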


   Results
As an example of the split decomposition-based analysis, Fig. 1 shows the results for HIV-1 V3 intrapatient evolution (patient B; McDonald et al., 1997). When all taxa in a dataset were used, the ‘fit’ was frequently less than 100%. This means that the information from a certain number of informative sites was discarded in order not to violate the constraints inherent to the program. In the case of Fig. 1, it could be shown that only the variation at site 94 was responsible for the low fit: when this site was excluded, all other informative sites for all taxa were incorporated with a fit of 100%. Details of the sequences used from each dataset can be found at http://www.med-rz.uni-saarland.de/med_fak/virologie/hiv-splits-2000.



Fig. 1. Phylogenetic reconstruction of an intrapatient set of HIV-1 env V3 sequences by split decomposition analysis. Position 94 was excluded to obtain a fit of 100%. Complex splits are represented by parallelograms. Bold lines indicate the minimum path length connecting all sequences, and the nucleotide substitutions along this path are marked. Observed numbers of ns and s substitutions were counted along this path; expected values were computed assuming a random distribution of ns and s substitutions, with sequence 91-9 as the reference. The difference between the observed and expected ns/s ratios was tested for statistical significance by the χ² test.

 
The minimum pathway connecting all sequences was established first, after which substitutions were scored. There were 13 ns and 4 s substitutions (Fig. 1), whereas the expected values for ns and s were 13·4 and 3·6. The deviation of the observed from the expected values was not statistically significant (χ² = 0·07, P > 0·05). This approach was then applied to 72 further HIV-1 intrapatient datasets, most of which have been published. They comprise gag, env V1–V2, V3, V4 and V5, and nef datasets. Most datasets showed ns and s values indistinguishable from those expected for random accumulation (55/73, or 75%); the remainder showed negative or purifying selection (18/73, or 25%) (Table 1). These results are summarized in Fig. 2, which gives the mean χ² value and the range for each gene.


Table 1. Summary of the results of split decomposition-based phylogenetic analysis of 73 intrapatient HIV-1 datasets

 


Fig. 2. Results of the phylogenetic analysis of 73 datasets covering gag, nef and the hypervariable parts of the env region of HIV-1. The phylogenetic relationships in each set were established by split decomposition analysis. Nucleotide substitutions were counted along the most parsimonious path and compared with expected values by the χ² test. The range and mean χ² value are shown for each gene. The y-axis shows χ² values: those for datasets with an excess of ns substitutions relative to expected values are shown in the upper part, whereas those in the lower part correspond to datasets with a dearth of ns substitutions relative to expected values. Horizontal lines mark P = 0·05.

 
Transitions (Ts) were more frequent than transversions (Tv). Because of the degeneracy of the genetic code, transversions lead to ns changes more frequently than do transitions. Consequently, calculating expected ns and s values with a random model that ignores a Ts/Tv bias could be too simplistic and lead to error. However, when the expected values were calculated using a Ts/Tv ratio derived from the particular dataset, the results changed little: there were only slightly more cases compatible with noise as opposed to negative selection. No example of positive selection was found (data not shown).
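The effect of a Ts/Tv bias on the expectation can be illustrated in Python. The sketch below, assuming an uppercase, in-frame reference sequence and the standard genetic code, tabulates which possible transitions and transversions would be non-synonymous, then combines them using a hypothetical transition/transversion rate ratio `kappa` (each possible transition weighted `kappa` times each possible transversion). This weighting is an illustrative assumption, not necessarily the procedure of Plikat et al. (1997).

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}
PURINES = {"A", "G"}


def ns_fractions(ref: str, kappa: float = 1.0) -> tuple[float, float, float]:
    """Non-synonymous fraction among possible transitions, among possible transversions,
    and overall when each possible transition is weighted kappa times a transversion.
    """
    ns = {"Ts": 0, "Tv": 0}
    total = {"Ts": 0, "Tv": 0}
    for start in range(0, len(ref) - len(ref) % 3, 3):
        codon = ref[start:start + 3]
        for offset in range(3):
            old = codon[offset]
            for base in BASES:
                if base == old:
                    continue
                kind = "Ts" if (old in PURINES) == (base in PURINES) else "Tv"
                mutant = codon[:offset] + base + codon[offset + 1:]
                total[kind] += 1
                if CODON_TABLE[mutant] != CODON_TABLE[codon]:
                    ns[kind] += 1
    weighted = (kappa * ns["Ts"] + ns["Tv"]) / (kappa * total["Ts"] + total["Tv"])
    return ns["Ts"] / total["Ts"], ns["Tv"] / total["Tv"], weighted


print([round(x, 3) for x in ns_fractions("TTT", kappa=2.0)])  # [0.667, 1.0, 0.833]
```

With `kappa` equal to 1 the weighted fraction reduces to the equal-probability model; a `kappa` above 1 lowers the expected ns fraction whenever transitions are less often non-synonymous than transversions, as for the example codon.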

For the sequences published by McDonald et al. (1997), CD4+ cell counts were reported for the date of final sequence sampling. Plotting the CD4+ cell counts against the χ² values determined for each of these sequence sets revealed essentially no difference between the modes of evolution in patients with CD4+ counts around 400 per µl and those with counts in the range defining AIDS (data not shown).

A number of studies have shown that HIV-1 genetic diversity accumulates essentially linearly over time (Gojobori et al., 1990; Plikat et al., 1997). Table 1 includes a number of datasets from several individuals sampled over time. Although the founding genomes were not known, it was of interest to investigate how cross-sectional diversity increased over time in these patients. The mean numbers of ns and s substitutions per base are shown as a function of time after seroconversion in Fig. 3.



Fig. 3. Accumulation of nucleotide substitutions over time in three representative HIV-1 V3 sequence sets. The accumulation of ns (open squares) and s (open diamonds) substitutions is marked, revealing an overall substitution rate of 1–2 substitutions per V3 region per year. Correlation coefficients (r) for ns substitutions are 0·93 for (A), 0·91 for (B) and 0·96 for (C). Correlation coefficients for s substitutions are 0·97 for (A), 0·98 for (B) and 0·97 for (C), supporting the idea that accumulation of ns and s substitutions is a steady process. Examples A, B and C in this figure correspond to V3 datasets Wolinsky P1, Wolfs Pat.1 and Holmes p82 from Table 1.

 
For patient Wolinsky P1, substitutions accumulated steadily over time (Fig. 3A). The regression lines passed through the origin, consistent with a single source of infection. For patient Wolfs Pat.1 (Fig. 3B), genetic diversity was apparently present at the time of seroconversion, consistent with multiple variants initiating infection; thereafter, diversity accumulated linearly over time. Similarly, diversity accumulated steadily over time for patient Holmes p82 (Fig. 3C). However, here the regression lines crossed the abscissa more than 2 years after infection. One explanation could be that a severe bottleneck had occurred, wiping out the majority of ancestral sequences. Given that sampling rarely involves more than 20 clones, earlier sequences are probably still present at low frequency but are rarely sampled. For all three datasets, cross-sectional diversity increased linearly for both ns and s substitutions, at a pace of about 1–2 substitutions per V3 sequence per year.


   Discussion
Split decomposition is a versatile tool for the phylogenetic reconstruction of retrovirus evolution, particularly for intrapatient datasets. The procedure described here was based on locating individual substitutions on a phylogram with a fit of 100%. Only in this situation are branch lengths strictly proportional to the number of substitutions. Across the 73 datasets analysed, drift and negative selection dominated.

In the original studies, a number of these datasets revealed evidence of positive selection (Holmes et al., 1992; Liu et al., 1997; Simmonds et al., 1991; Wolinsky et al., 1996; Zhang et al., 1997), whereas the present analysis did not. In these previous studies, ns and s values were invariably derived from pairwise (2×2) comparisons of sequences, without taking into account the phylogenetic relationships between them. There is no obvious reason why 2×2 analyses should have a preferential effect on non-synonymous as opposed to synonymous substitutions. However, because phylogenetic reconstruction identifies fewer mutations in a dataset than a 2×2 analysis does, the statistical significance of the difference between observed and expected ns or s values becomes a major issue. For example, under a random model of mutation approximately 80% of substitutions are non-synonymous. As the number of observed mutations decreases, it becomes harder to distinguish the observed distribution of ns and s substitutions from the expected one. Conversely, any method that increases the absolute values of ns and s might tend to establish statistical significance more often than is warranted.
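The inflation produced by pairwise counting is easiest to see in a toy case. In the Python sketch below (hypothetical data and function names), a single mutation shared by five of ten sequences is counted 25 times when all pairs are compared, whereas a tree needs only one substitution to explain it. The per-column count of distinct states minus one is used as the minimum number of substitutions; this is the standard parsimony lower bound, not the split decomposition procedure used here.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Sum of differing sites over all pairs of sequences (what 2x2 comparisons count)."""
    return sum(
        sum(x != y for x, y in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )


def minimum_substitutions(seqs):
    """Parsimony lower bound: for each alignment column, number of distinct states minus one."""
    return sum(len(set(column)) - 1 for column in zip(*seqs))


# Hypothetical quasispecies: ten clones, one mutation (A->G at the first site) shared by five.
clones = ["AAAAAA"] * 5 + ["GAAAAA"] * 5
print(pairwise_differences(clones), minimum_substitutions(clones))  # 25 1
```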

However, it is not just a question of statistical significance, because many of the datasets involve large numbers of observed ns and s substitutions (Table 1). The question then becomes: which method more accurately describes the biological process? Given that diversification is a case of descent with modification, phylogenetic reconstruction would appear warranted. Indeed, it is increasingly being used together with codon-based methods to assess the significance of ns and s mutations (Nielsen & Yang, 1998; Yamaguchi-Kabata & Gojobori, 2000; Zanotto et al., 1999). Such analyses have shown some evidence of positive selection. The difference is that these studies are of much higher resolution, addressing individual codons, whereas analyses of whole regions are of lower resolution. Taken together, this suggests that the majority of substitutions are not positively selected, while a small fraction might well be (Sala & Wain-Hobson, 2000).

Among these intrapatient datasets, there was clearly a great deal of genetic noise; in other words, there was little evidence of fixation of mutations in the virus quasispecies. There may be many reasons for this, one of which could be the half-life of HIV-infected resting T cells in the peripheral blood, the preferred source of material for all of the studies cited above. Estimates of these half-lives range from months to years (Michie et al., 1992; Perelson et al., 1996, 1997). Hence, observing fixation may require several half-lives. Another variable could be the rapid expansion of some populations of variants that spill over into the periphery. The most striking observation concerns the dynamics of defective proviruses: at a given time-point, as many as 40% of genomes can occasionally be defective at a single site, whereas a few months earlier or later the proportion may be 5% or less (Martins et al., 1991).
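The dependence on cell half-life can be made explicit with a rough calculation. Assuming, purely for illustration, that a population of infected cells decays exponentially with half-life t½, the fraction f of cells from an earlier time-point that is still present after time t is:

```latex
f(t) = 2^{-t/t_{1/2}}, \qquad
f(3\,t_{1/2}) = 2^{-3} = 0.125, \qquad
f(5\,t_{1/2}) = 2^{-5} \approx 0.031
```

Under this assumption, one-eighth of the older infected cells would still be sampled after three half-lives, so older variants would continue to appear in the peripheral blood for a period of several half-lives.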

Another unknown in the assessment of mutation frequencies and mutation matrices from patient-derived datasets is the effect of recombination. In the accompanying study of longitudinal SIV sequence variation, Cheynier et al. (2001) show that the frequency of deletions in an intrapatient dataset is comparable to that of transversions. Jetzt et al. (2000) determined the ex vivo recombination rate to be about 2–3 recombination events per genome per replication cycle. This is approximately tenfold higher than the point substitution rate (Mansky & Temin, 1995). A priori, recombination should not favour ns or s substitutions per se. However, where the recombination rate is high compared with the substitution rate, it might be argued that the majority of homoplasies result from recombination.
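The tenfold comparison can be checked with rough numbers. The values below are assumptions supplied for illustration rather than figures given in the text: an in vivo point mutation rate of about 3·4 × 10⁻⁵ per base pair per replication cycle (Mansky & Temin, 1995) and an HIV-1 genome length of about 9·7 kb.

```latex
\begin{aligned}
\mu_{\mathrm{genome}} &\approx (3.4\times10^{-5}) \times (9.7\times10^{3}) \approx 0.33 \ \text{point substitutions per genome per cycle},\\
\frac{r_{\mathrm{rec}}}{\mu_{\mathrm{genome}}} &\approx \frac{2\text{--}3}{0.33} \approx 6\text{--}9 .
\end{aligned}
```

On these assumptions, recombination events outnumber point substitutions per replication cycle by close to an order of magnitude.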

In conclusion, a strong element of genetic noise is in evidence among intrapatient HIV quasispecies. Although more precise methods do yield evidence of positive selection, it would seem that the majority of substitutions observed in a dataset are unselected or are under negative selection.


   Acknowledgments
 
This study was supported by grants from the Deutsche Forschungsgemeinschaft, the Institut Pasteur and the ANRS.


   References
Ball, J. K., Holmes, E. C., Whitwell, H. & Desselberger, U. (1994). Genomic variation of human immunodeficiency virus type 1 (HIV-1): molecular analyses of HIV-1 in sequential blood samples and various organs obtained at autopsy. Journal of General Virology 75, 867-879.

Bandelt, H. J. & Dress, A. W. (1992). Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Molecular Phylogenetics and Evolution 1, 242-252.

Brown, A. J. & Cleland, A. (1996). Independent evolution of the env and pol genes of HIV-1 during zidovudine therapy. AIDS 10, 1067-1073.

Brown, A. J., Lobidel, D., Wade, C. M., Rebus, S., Phillips, A. N., Brettle, R. P., France, A. J., Leen, C. S., McMenamin, J., McMillan, A., Maw, R. D., Mulcahy, F., Robertson, J. R., Sankar, K. N., Scott, G., Wyld, R. & Peutherer, J. F. (1997). The molecular epidemiology of human immunodeficiency virus type 1 in six cities in Britain and Ireland. Virology 235, 166-177.

Cheynier, R., Kils-Hütten, L., Meyerhans, A. & Wain-Hobson, S. (2001). Insertion/deletion frequencies match those of point mutations in the hypervariable regions of the simian immunodeficiency virus surface envelope gene. Journal of General Virology 82, 1613-1619.

Delassus, S., Cheynier, R. & Wain-Hobson, S. (1991). Evolution of human immunodeficiency virus type 1 nef and long terminal repeat sequences over 4 years in vivo and in vitro. Journal of Virology 65, 225-231.

Donaldson, Y. K., Bell, J. E., Holmes, E. C., Hughes, E. S., Brown, H. K. & Simmonds, P. (1994). In vivo distribution and cytopathology of variants of human immunodeficiency virus type 1 showing restricted sequence variability in the V3 loop. Journal of Virology 68, 5991-6005.

Dopazo, J., Dress, A. W. M. & von Haeseler, A. (1993). Split decomposition: a technique to analyze viral evolution. Proceedings of the National Academy of Sciences, USA 90, 10320-10324.

Gojobori, T., Moriyama, E. N. & Kimura, M. (1990). Molecular clock of viral evolution, and the neutral theory. Proceedings of the National Academy of Sciences, USA 87, 10015-10018.

Holmes, E. C., Zhang, L. Q., Simmonds, P., Ludlam, C. A. & Brown, A. J. (1992). Convergent and divergent sequence evolution in the surface envelope glycoprotein of human immunodeficiency virus type 1 within a single infected patient. Proceedings of the National Academy of Sciences, USA 89, 4835-4839.

Holmes, E. C., Zhang, L. Q., Robertson, P., Cleland, A., Harvey, E., Simmonds, P. & Leigh Brown, A. J. (1995). The molecular epidemiology of human immunodeficiency virus type 1 in Edinburgh. Journal of Infectious Diseases 171, 45-53.

Hughes, E. S., Bell, J. E. & Simmonds, P. (1997). Investigation of the dynamics of the spread of human immunodeficiency virus to brain and other tissues by evolutionary analysis of sequences from the p17gag and env genes. Journal of Virology 71, 1272-1280.

Huson, D. H. (1998). SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68-73.

Jetzt, A. E., Yu, H., Klarmann, G. J., Ron, Y., Preston, B. D. & Dougherty, J. P. (2000). High rate of recombination throughout the human immunodeficiency virus type 1 genome. Journal of Virology 74, 1234-1240.

Leitner, T., Kumar, S. & Albert, J. (1997). Tempo and mode of nucleotide substitutions in gag and env gene fragments in human immunodeficiency virus type 1 populations with a known transmission history. Journal of Virology 71, 4761-4770.

Liu, S. L., Schacker, T., Musey, L., Shriner, D., McElrath, M. J., Corey, L. & Mullins, J. I. (1997). Divergent patterns of progression to AIDS after infection from the same source: human immunodeficiency virus type 1 evolution and antiviral responses. Journal of Virology 71, 4284-4295.

McDonald, R. A., Mayers, D. L., Chung, R. C., Wagner, K. F., Ratto-Kim, S., Birx, D. L. & Michael, N. L. (1997). Evolution of human immunodeficiency virus type 1 env sequence variation in patients with diverse rates of disease progression and T-cell function. Journal of Virology 71, 1871-1879.

Mansky, L. M. & Temin, H. M. (1995). Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. Journal of Virology 69, 5087-5094.

Martins, L. P., Chenciner, N., Asjo, B., Meyerhans, A. & Wain-Hobson, S. (1991). Independent fluctuation of human immunodeficiency virus type 1 rev and gp41 quasispecies in vivo. Journal of Virology 65, 4502-4507.

Michie, C. A., McLean, A., Alcock, C. & Beverley, P. C. (1992). Lifespan of human lymphocyte subsets defined by CD45 isoforms. Nature 360, 264-265.

Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution 3, 418-426.

Nielsen, R. & Yang, Z. (1998). Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148, 929-936.

Perelson, A. S., Neumann, A. U., Markowitz, M., Leonard, J. M. & Ho, D. D. (1996). HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271, 1582-1586.

Perelson, A. S., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M. & Ho, D. D. (1997). Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387, 188-191.

Plikat, U., Nieselt-Struwe, K. & Meyerhans, A. (1997). Genetic drift can dominate short-term human immunodeficiency virus type 1 nef quasispecies evolution in vivo. Journal of Virology 71, 4233-4240.

Sala, M. & Wain-Hobson, S. (2000). Are RNA viruses adapting or merely changing? Journal of Molecular Evolution 51, 12-20.

Simmonds, P., Zhang, L. Q., McOmish, F., Balfe, P., Ludlam, C. A. & Brown, A. J. (1991). Discontinuous sequence change of human immunodeficiency virus (HIV) type 1 env sequences in plasma viral and lymphocyte-associated proviral populations in vivo: implications for models of HIV pathogenesis. Journal of Virology 65, 6266-6276.

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673-4680.

Wolfs, T. F., Zwart, G., Bakker, M., Valk, M., Kuiken, C. L. & Goudsmit, J. (1991). Naturally occurring mutations within HIV-1 V3 genomic RNA lead to antigenic variation dependent on a single amino acid substitution. Virology 185, 195-205.

Wolinsky, S. M., Korber, B. T., Neumann, A. U., Daniels, M., Kunstman, K. J., Whetsell, A. J., Furtado, M. R., Cao, Y., Ho, D. D. & Safrit, J. T. (1996). Adaptive evolution of human immunodeficiency virus-type 1 during the natural course of infection. Science 272, 537-542.

Yamaguchi-Kabata, Y. & Gojobori, T. (2000). Reevaluation of amino acid variability of the human immunodeficiency virus type 1 gp120 envelope glycoprotein and prediction of new discontinuous epitopes. Journal of Virology 74, 4335-4350.

Zanotto, P. M., Kallas, E. G., de Souza, R. F. & Holmes, E. C. (1999). Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153, 1077-1089.

Zhang, L., Diaz, R. S., Ho, D. D., Mosley, J. W., Busch, M. P. & Mayer, A. (1997). Host-specific driving force in human immunodeficiency virus type 1 evolution in vivo. Journal of Virology 71, 2555-2561.

Received 20 October 2000; accepted 8 March 2001.