Department of Ecology and Evolutionary Biology, University of Kansas
Correspondence: E-mail: scottw{at}ku.edu.
Abstract |
---|
Key Words: HIV, evolution, positive selection, adaptation rate
Introduction |
---|
Several major hypotheses of disease progression posit that HIV pathogenicity is a direct result of virus adaptation to the host environment (Tersmette et al. 1989; Nowak et al. 1991; Wodarz, Klenerman, and Nowak 1998; Wolinsky and Learn 1999). Specifically, the evolution of new viral phenotypes that evade the host's immune responses is thought to play a central role in disease progression. Prompted by these hypotheses, a number of empirical studies have investigated the role of viral evolution in disease progression by following genetic divergence and diversity in vivo over the course of infection (Wolfs et al. 1990; Holmes et al. 1992; Strunnikova et al. 1995; Wolinsky et al. 1996; Ganeshan et al. 1997; Markham et al. 1998; Strunnikova et al. 1998; Shankarappa et al. 1999; Viscidi 1999). Unfortunately, the results of these studies have been somewhat contradictory. For example, some studies have found a positive relationship between the accumulation rate of genetic diversity and disease progression rate (Strunnikova et al. 1995; Markham et al. 1998; Strunnikova et al. 1998), whereas others have found the opposite pattern (Wolinsky et al. 1996; Ganeshan et al. 1997). I suggest that there are two reasons for such inconsistencies. First, in analyzing these longitudinal studies of sequence evolution, investigators have not explicitly differentiated between adaptive and selectively neutral changes (with the notable exception of Zanotto et al. [1999] and Ross and Rodrigo [2002]). Characterizing the rate and pattern of adaptation is essential to determining the clinical significance of sequence evolution in vivo. Thus, it is necessary to filter out the "evolutionary noise" of neutral mutations. Also, the number of patients sampled in each of these longitudinal studies is simply too small (generally between five and 10) to make broad generalizations regarding disease progression. 
A combined analysis of all the available longitudinal sequence data is urgently needed to evaluate the role of viral evolution in disease progression. In this paper, I present such a combined analysis. In addition, I adapt a method that explicitly differentiates between adaptive and neutral changes (Smith and Eyre-Walker 2002).
The env Gene of HIV-1
The env gene codes for the envelope glycoprotein gp160, which is a precursor to two glycoproteins: gp41 and gp120. I will focus on the region of env that ultimately gives rise to gp120. This protein is embedded in and extends exterior to the viral lipid membrane and is primarily responsible for host cell receptor binding and host cell tropism. Additionally, due partly to its physical location in the virion, gp120 contains a number of recognition sites for various adaptive immune responses, including neutralizing antibodies (e.g., Goudsmit et al. 1988), helper T lymphocytes (Fenoglio et al. 2000), and cytotoxic T lymphocytes (Walker et al. 1986; Tsubota et al. 1989). Therefore, two potentially important positive selective forces acting on the env gene are changes in optimal host cell receptor affinity and evasion of host immune responses. The gp120 portion of env has been broadly categorized into five hypervariable regions (V1 to V5) with conserved regions interspersed (Modrow et al. 1987).
The action of natural selection on the env gene is evident from patterns of synonymous and nonsynonymous substitutions in env sequences. The rate of nonsynonymous substitution is greater than the rate of synonymous substitution in some regions of env (Bonhoeffer, Holmes, and Nowak 1995; Yamaguchi-Kabata and Gojobori 2000), which is a clear indication of positive selection. Estimates have been obtained for the relative frequency of adaptive mutation, the strength of positive selection, and the exact location of positively selected sites (Yamaguchi-Kabata and Gojobori 2000; Ross and Rodrigo 2002). Further, within infected individuals, the strength of selection and the relative frequency of adaptive mutation are positively associated with the time to disease progression (Ross and Rodrigo 2002). If the primary selective force acting on the env gene is evasion of immune responses, then this result implies that those patients who mount a broad and strong immune response to HIV are able to control the virus for a longer period of time.
Methods |
---|
In the env gene of HIV, positive selection may contribute to high-frequency polymorphisms as well as substitutions. Frequency-dependent selection for rare mutations may be prevalent in genes that code for the targets of immune response (Nielsen 1999). Also, latently infected cells (e.g., CD4+ memory cells) may serve as a reservoir for the virus population; that is, archaic virus genomes may circulate at low frequencies among the contemporary virus population (Pierson, McArthur, and Siliciano 2000; Müller, Vigueras-Gómez, and Bonhoeffer 2002; Kelly et al. in press). Therefore, even though a mutation is (at least initially) positively selected, it may never reach complete fixation. To estimate the number of adaptive mutations that have reached high frequency (greater than 50% but less than 100%) since initial infection, I use the same approach that is used for adaptive substitutions:
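$$a_p = C_n - \frac{C_s R_n}{R_s}$$

where Cn and Cs are the numbers of nonsynonymous and synonymous common polymorphisms, and Rn and Rs are the numbers of nonsynonymous and synonymous rare polymorphisms.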
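The estimators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the form ad = Dn - (Ds Rn)/Rs is inferred from the expression E((DsRn)/Rs) discussed in the Appendix, the common-polymorphism estimator is assumed to have the same form, and treating adaptive events (at) as the sum of ad and ap is an assumption. The function name `adaptive_counts` and the example counts are hypothetical.

```python
def adaptive_counts(dn, ds, cn, cs, rn, rs):
    """Return (ad, ap, at) from counts of substitutions (D), common
    polymorphisms (C), and rare polymorphisms (R) at nonsynonymous (n)
    and synonymous (s) sites."""
    if rs == 0:
        raise ValueError("Rs must be positive to form the ratio Rn/Rs")
    # Nonsynonymous:synonymous ratio among rare polymorphisms,
    # which are assumed to be selectively neutral.
    neutral_ratio = rn / rs
    ad = dn - ds * neutral_ratio  # adaptive substitutions
    ap = cn - cs * neutral_ratio  # adaptive common polymorphisms
    at = ad + ap                  # assumption: adaptive events are the sum
    return ad, ap, at


# Hypothetical counts, for illustration only (not taken from any patient).
ad, ap, at = adaptive_counts(dn=12, ds=4, cn=6, cs=2, rn=5, rs=10)
print(ad, ap, at)
```

The neutral expectation is subtracted rather than modeled, so a patient with few rare synonymous polymorphisms (small Rs) will give a noisy estimate.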
Analysis of Longitudinal Sequence Samples
I obtained env sequence data from several longitudinal studies of HIV-1 infection (Wolfs et al. 1990; Holmes et al. 1992; Strunnikova et al. 1995; Wolinsky et al. 1996; Ganeshan et al. 1997; Markham et al. 1998; Strunnikova et al. 1998; Shankarappa et al. 1999). Data sets are available from three nonoverlapping regions in the env gene that are approximately the same length: V1V2 (seven patients, 290 bp, average 22 clones screened/time point [Strunnikova et al. 1998]), C2V3 (43 patients, 300 bp, average 10 sequences/time point [Wolfs et al. 1990; Holmes et al. 1992; Strunnikova et al. 1995; Wolinsky et al. 1996; Ganeshan et al. 1997; Markham et al. 1998; Shankarappa et al. 1999]), and V4V5 (20 patients, 340 bp, average 12 sequences/time point [Wolinsky et al. 1996; Ganeshan et al. 1997; Shankarappa et al. 1999]).
It should be noted that the sequences from the V1V2 region were originally sampled by using a heteroduplex mobility assay to identify similarity groups, and then one clone from each similarity group was sequenced (Strunnikova et al. 1998). Therefore, the samples were not truly random. Also, the frequencies of the similarity groups are no longer available (R. Viscidi, personal communication), so I treated each sequence as equally frequent. Based on the sample sizes reported in the original study (Strunnikova et al. 1998), I repeated the adaptation rate analysis assuming a completely skewed frequency distribution; that is, one common similarity group in each sample, with the remaining groups represented only once. The results of these analyses differed little from those of the analysis based on equal frequencies (data not shown).
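The two frequency assumptions can be written as sequence weights. The sketch below is illustrative only: the function names are hypothetical, and because the text does not say which similarity group was treated as the common one, placing it first is an arbitrary choice.

```python
def equal_weights(n_groups):
    """Every sequenced similarity group counts once."""
    return [1] * n_groups


def skewed_weights(n_groups, n_clones):
    """Completely skewed case: one group absorbs all clones not needed
    to give each remaining group a single copy."""
    if n_groups < 1 or n_clones < n_groups:
        raise ValueError("need at least one clone per similarity group")
    # Assumption: the first group is the common one.
    return [n_clones - (n_groups - 1)] + [1] * (n_groups - 1)


# Hypothetical sample: 22 clones screened, 5 similarity groups sequenced.
print(equal_weights(5))       # [1, 1, 1, 1, 1]
print(skewed_weights(5, 22))  # [18, 1, 1, 1, 1]
```

Running the polymorphism counts once with each weighting brackets the range of frequencies that the missing data could have produced.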
For each infected individual, data sets were selected on the basis that the first sample was taken less than 3 years after seroconversion, and the last sample was taken at least 1 year after the first sample. The DNA sequences used were isolated from either plasma RNA or peripheral blood mononuclear cells. These data were combined. Shankarappa et al. (1999) isolated sequences from both sources at the same times in the same patients. They found that patterns of polymorphism and divergence were virtually identical for the two types of data.
Because sequence data are available over time, and because the env gene is genetically homogeneous early in infection, divergence (Dn and Ds) was measured relative to a "founding" ancestral sequence. This ancestral sequence was reconstructed as the consensus sequence of the first time point sampled in each patient. Also, at polymorphic sites, the ancestral sequence was used to determine which nucleotides are derived and which are ancestral. Samples from the last time point available in each patient were aligned with the ancestral sequence using the default parameters of ClustalX (Thompson et al. 1997), and then hand corrected. Nonsynonymous and synonymous changes were classified and counted using the SITES program (Hey and Wakeley 1997). Substitutions were not corrected for saturation. Using samples from intermediate time points, I screened for multiple substitutions in a few of the faster-evolving virus populations and found little evidence for saturation.
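A minimal sketch of the ancestral-sequence step follows. It assumes the sequences are already aligned to equal length, takes the majority nucleotide at each site of the first sample as ancestral, and then classifies each site in a later sample by the frequency of non-ancestral nucleotides. The thresholds follow the definition of high-frequency polymorphism given above (greater than 50% but less than 100%); treating a site that is fixed for a derived state in the sample as a substitution is an assumption, as are all function names. Sites with more than one derived nucleotide are pooled here for simplicity.

```python
from collections import Counter


def consensus(seqs):
    """Majority-rule consensus of aligned, equal-length sequences.
    Ties are broken by first occurrence in the sample."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def derived_counts(ancestral, later_seqs):
    """Per site, the number of later sequences with a non-ancestral base."""
    return [sum(s[i] != a for s in later_seqs) for i, a in enumerate(ancestral)]


def classify(count, n):
    """Label a site from the count of derived bases among n sequences."""
    if count == 0:
        return "ancestral"
    if count == n:
        return "substitution"
    return "common" if count / n > 0.5 else "rare"


# Toy alignment, for illustration only.
first = ["ATGCA", "ATGCA", "ATGTA"]
last = ["ATGCA", "ACGCA", "ACGTA", "ACGCA"]
anc = consensus(first)
counts = derived_counts(anc, last)
print(anc, counts, [classify(c, len(last)) for c in counts])
```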
For each patient in the basic analyses, adaptation rates were estimated as the numbers of adaptive substitutions (ad), adaptive common polymorphisms (ap), or adaptive events (at) in the last time point, divided by the time between the first and last samples. For each patient in the detailed analyses, ad, ap, and at were first estimated at each sampled time point except the first one, and then regressed on time since the first sample to estimate adaptation rates.
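The two rate calculations can be sketched as follows. The basic analysis is a simple quotient; for the detailed analysis the sketch uses an ordinary least-squares slope with an intercept, which is an assumption because the text does not state whether the regression was forced through the origin. Function names and example values are hypothetical.

```python
def endpoint_rate(count_last, t_first, t_last):
    """Basic analysis: adaptive count at the last time point divided by
    the time elapsed between the first and last samples."""
    if t_last <= t_first:
        raise ValueError("last sample must be later than the first")
    return count_last / (t_last - t_first)


def regression_rate(times, counts):
    """Detailed analysis: least-squares slope of adaptive counts on time
    since the first sample (intercept fitted; an assumption)."""
    n = len(times)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    return sxy / sxx


# Hypothetical patient: years since first sample and estimated at values.
years = [1.0, 2.0, 3.0, 4.0]
at_values = [2.0, 4.0, 6.0, 8.0]
print(endpoint_rate(at_values[-1], 0.0, years[-1]))  # events per year
print(regression_rate(years, at_values))             # events per year
```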
Results and Discussion |
---|
Adaptation Rates and Disease Progression
I investigated the relationship between disease progression rates and adaptation rates through a detailed analysis of a subset of the available longitudinal sequence data (nine patients [Shankarappa et al. 1999]). The longitudinal data sets from these nine patients were selected because samples were taken very frequently over the entire course of infection, sample sizes were relatively large (typically 10 to 20 sequences), and a large portion of the env gene was sequenced, including both the C2V3 and the V4V5 regions (i.e., the C2V5 region). Progression time was measured as the time between initial infection and the onset of clinical AIDS (CD4+ T-cell count below 200 cells/µl). For comparisons, patients were categorized into two groups based on their progression times: moderate progressors (progression time between 5 and 7 years; patients 1, 5, 6, 7, and 8) and slow progressors (progression time greater than 7 years; patients 2, 3, 9, and 11).
The numbers of adaptive substitutions (ad), common polymorphisms (ap), and adaptive events (at) as a function of time in each patient are shown in figure 1. The most striking pattern evident from this analysis is that slow progressors tend to have higher adaptation rates than the moderate progressors. This holds true for adaptive substitutions, common polymorphisms, and overall adaptive events. To investigate this pattern statistically, I estimated the different adaptation rates as the slope of a best-fit line for each class of adaptation in each patient. A comparison of the adaptation rates between slow and moderate progressors is shown in table 2. The rate of adaptive events is significantly higher in the slow progressors (Mann-Whitney U test, P < 0.05). The rate of adaptive substitutions and the accumulation rate of adaptive polymorphisms were also higher in slow progressors, but these differences were not statistically significant.
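With groups of four and five patients, a Mann-Whitney comparison can be computed exactly by enumerating every relabeling of the pooled rates. The sketch below shows one way to do this with the standard library. The rates are made-up placeholders, not the values in table 2, and the function names are hypothetical.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U for sample x: pairs (x_i, y_j) with x_i > y_j; ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Fraction of relabelings whose U lies at least as far from the
    null mean as the observed U."""
    pooled = list(x) + list(y)
    n1, n = len(x), len(pooled)
    mean_u = n1 * (n - n1) / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical adaptation rates (events per year), for illustration only.
slow = [2.1, 1.8, 2.5, 1.9]
moderate = [1.0, 1.2, 0.8, 1.5, 1.1]
print(mann_whitney_u(slow, moderate), exact_two_sided_p(slow, moderate))
```

With four and five patients there are only 126 possible relabelings, so the smallest attainable two-sided P value is 2/126, which is reached only when the two groups do not overlap at all.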
The Distribution of Adaptations over the env Gene
Because a large proportion of nonsynonymous substitutions and common polymorphisms are adaptive, we can make some generalizations about the spatial distribution of adaptations over the env gene. Of particular interest is whether most nonsynonymous changes are limited to the five hypervariable regions and whether common polymorphisms share the same distribution as substitutions. To determine the distributions of nonsynonymous substitutions and common polymorphisms, each sample was aligned with the NL4-3 genome (GenBank accession number M19921) as a reference, and the locations of the relevant nonsynonymous changes were recorded.
The distributions of nonsynonymous substitutions and common polymorphisms are shown in figure 2. The most striking pattern evident from these distributions is the remarkable lack of selective constraint in the V3 loop flanking regions and in the V3 loop itself. In the 100 codons of the C2V3 region, there are apparently no highly conserved regions. This result is especially noteworthy, considering that this region is largely responsible for initiating viral entry into the host cell. Further, the abundance of fixations and common polymorphisms in the regions flanking the V3 corroborates earlier results that suggested that adaptive changes may not be limited to the hypervariable regions (Nielsen and Yang 1998; Yamaguchi-Kabata and Gojobori 2000; Ross and Rodrigo 2002). Another interesting pattern is that, in the V1V2 region, the distribution of nonsynonymous substitutions is significantly different from the distribution of nonsynonymous common polymorphisms (G-test for goodness-of-fit, G = 10.96, df = 4, P < 0.05; observed changes were binned into five groups, each 50 base pairs long), with more substitutions occurring in the V1 loop and more common polymorphisms occurring in the V2 loop and flanking regions. The reasons for this difference are unclear. One possibility is that, assuming that immune-mediated, frequency-dependent selection is operating, the relationship between frequency and fitness may be variable across the V1V2 region.
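As a check on the reported test, the P value implied by G = 10.96 with 4 degrees of freedom can be computed directly, because the chi-square upper tail has a closed form when the degrees of freedom are even. The sketch below also includes a generic G statistic for binned counts. How the expected counts were scaled in the original analysis is not stated, so `g_statistic` is only a textbook form, and both function names are hypothetical.

```python
import math


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); bins with O == 0 contribute zero."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of
    freedom: exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Reported statistic for the V1V2 comparison (five 50-bp bins, df = 4).
print(chi2_sf_df4(10.96))
```

The result is about 0.027, consistent with the reported P < 0.05.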
Conclusions |
---|
One of the major unsolved problems in the study of HIV and related viruses is determining the relative importance of neutralizing antibodies, helper T lymphocytes, and cytotoxic T lymphocytes in controlling viral replication. The methods presented here, in combination with detailed epitope analyses of longitudinally sampled patients, could help elucidate the roles played by each of these three types of immune response. This would provide new insight into the mechanisms by which HIV populations overwhelm the immune system and lead to AIDS.
Appendix |
---|
To explore the bias inherent to this method, I evaluated the expectation of ad, ap, and at using a propagation of error analysis (Rice 1987). Let µdn = E(Dn), µds = E(Ds), etc. The expectation of ad is
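$$E(a_d) = \mu_{dn} - E\!\left(\frac{D_s R_n}{R_s}\right),$$

where, to second order,

$$E\!\left(\frac{D_s R_n}{R_s}\right) \approx \frac{\mu_{ds}\,\mu_{rn}}{\mu_{rs}} + \frac{\mathrm{Cov}(D_s, R_n)}{\mu_{rs}} - \frac{\mu_{rn}\,\mathrm{Cov}(D_s, R_s) + \mu_{ds}\,\mathrm{Cov}(R_n, R_s)}{\mu_{rs}^{2}} + \frac{\mu_{ds}\,\mu_{rn}\,\mathrm{Var}(R_s)}{\mu_{rs}^{3}}.$$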
Violations of the assumptions of the evolutionary model should make the estimates more conservative. I assumed that all nonsynonymous, rare polymorphisms are neutral. If in fact some of these polymorphisms are either beneficial or slightly deleterious, then Rn would overestimate the number of neutral, nonsynonymous, rare polymorphisms, and ad, ap, and at would be underestimates. Furthermore, I did not correct for saturation, so Dn and Cn will underestimate the actual numbers of nonsynonymous substitutions and common polymorphisms, respectively. This would also cause ad, ap, and at to be biased downward. Considering these two factors, ad, ap, and at are probably conservative estimators.
The above analyses also assume free recombination. A violation of this assumption should not strongly bias the estimates, but if it does, it may make the estimates more conservative. First, note that the estimates will not be strongly affected by genetic hitchhiking, that is, the fixation of neutral variants tightly linked to and in association with beneficial mutations (Maynard Smith and Haigh 1974). This is because positive selection does not affect the expected substitution rate at linked neutral loci; it only affects the variance (Kelly 1994; Gillespie 2000). Hitchhiking will reduce the expected levels of polymorphism at linked neutral sites, but the effect should be proportional for nonsynonymous and synonymous sites. Thus, the effects on Rn and Rs should roughly cancel out. In summary, hitchhiking does not affect E(Dn), E(Ds), or the ratio E(Rn)/E(Rs), so it should not affect the adaptation rate estimates.
Furthermore, considering the dynamics of polymorphism and divergence in the case of limited recombination, the covariances in the above propagation of error analysis may cause ad (and ap and at) to be even more conservative. In the expression for E((DsRn)/Rs), the most prominent covariance is Cov(Ds, Rn) because it is only divided by µrs, rather than by µrs² or µrs³. This covariance should be negative, which would cause the free-recombination estimate of ad to be more conservative. To illustrate why this covariance should be negative, consider two independent realizations of the evolutionary process, starting with initial infection. In one realization, due to chance, the time back to the most recent common ancestor (MRCA) of all the sequences in the population is short. Therefore, levels of polymorphism will be relatively low. However, the time between initial infection and the MRCA will be long, so the number of substitutions will be relatively high. In the other realization, also due to chance, the time back to the MRCA might be longer, which would lead to higher levels of polymorphism but a shorter time between initial infection and the MRCA and correspondingly lower numbers of substitutions. Therefore, covariances between polymorphism (i.e., Rn) and divergence (i.e., Ds) statistics should be negative.
Literature Cited |
---|
Bonhoeffer, S., E. C. Holmes, and M. A. Nowak. 1995. Causes of HIV diversity. Nature 376:125.
Bustamante, C. D., J. Wakeley, S. A. Sawyer, and D. L. Hartl. 2001. Directional selection and the site-frequency spectrum. Genetics 159:1779-1788.
Clark, S. J., M. S. Saag, W. D. Decker, S. Campbell-Hill, J. L. Roberson, P. J. Veldkamp, J. C. Kappes, B. H. Hahn, and G. M. Shaw. 1991. High titers of cytopathic virus in plasma of patients with symptomatic primary HIV-1 infection. N. Engl. J. Med. 324:954-960.
Fenoglio, D., G. Li Pira, and L. Lozzi, et al. (11 co-authors). 2000. Natural analogue peptides of an HIV-1 gp120 T-helper epitope antagonize response of gp120-specific human CD4 T-cell clones. J. Acquir. Immune Defic. Syndr. 23:1-7.
Fu, Y.-X. 2001. Estimating mutation rate and generation time from longitudinal samples of DNA sequences. Mol. Biol. Evol. 18:620-626.
Ganeshan, S., R. E. Dickover, B. T. M. Korber, Y. J. Bryson, and S. M. Wolinsky. 1997. Human immunodeficiency virus type 1 genetic evolution in children with different rates of development of disease. J. Virol. 71:663-677.
Gillespie, J. H. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909-919.
Goudsmit, J., C. Debouck, R. H. Meloen, L. Smit, M. Bakker, D. M. Asher, A. Wolff, C. J. Gibbs, and D. C. Gajdusek. 1988. Human immunodeficiency virus type 1 neutralization epitope with conserved architecture elicits early type-specific antibodies in experimentally infected chimpanzees. Proc. Natl. Acad. Sci. USA 85:4478-4482.
Hey, J., and J. Wakeley. 1997. A coalescent estimator of the population recombination rate. Genetics 145:833-846.
Ho, D. D., A. U. Neumann, A. S. Perelson, W. Chen, J. M. Leonard, and M. Markowitz. 1995. Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373:123-126.
Holmes, E. C., L. Q. Zhang, P. Simmonds, C. A. Ludlam, and A. J. Brown. 1992. Convergent and divergent sequence evolution in the surface envelope glycoprotein of human immunodeficiency virus type 1 within a single infected patient. Proc. Natl. Acad. Sci. USA 89:4835-4839.
Kelly, J. K. 1994. An application of population genetic theory to synonymous gene sequence evolution in the human immunodeficiency virus (HIV). Genet. Res. 64:1-9.
Kelly, J. K., S. Williamson, M. E. Orive, M. S. Smith, and R. D. Holt. 2003. Linking ecological and genetic models of intra-host viral dynamics. I. Infection of multiple cell types. Am. Nat. (in press).
Markham, R. B., W.-C. Wang, and A. E. Weisstein, et al. (11 co-authors). 1998. Patterns of HIV-1 evolution in individuals with differing rates of CD4 T cell decline. Proc. Natl. Acad. Sci. USA 95:12568-12573.
Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.
Modrow, S., B. H. Hahn, G. M. Shaw, R. C. Gallo, F. Wong-Staal, and H. Wolf. 1987. Computer-assisted analysis of envelope protein sequences of seven human immunodeficiency virus isolates: prediction of antigenic epitopes in conserved and variable regions. J. Virol. 61:570-578.
Müller, V., J. F. Vigueras-Gómez, and S. Bonhoeffer. 2002. Decelerating decay of latently infected cells during prolonged therapy for human immunodeficiency virus type 1 infection. J. Virol. 76:8963-8965.
Nielsen, R. 1999. Changes in ds/dn in the HIV-1 env gene. Mol. Biol. Evol. 16:711-714.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.
Nowak, M. A., R. M. Anderson, A. R. McLean, T. F. Wolfs, J. Goudsmit, and R. M. May. 1991. Antigenic diversity thresholds and the development of AIDS. Science 254:963-969.
Pantaleo, G., C. Graziosi, and A. S. Fauci. 1993. New concepts in the immunopathogenesis of human immunodeficiency virus infection. N. Engl. J. Med. 328:327-335.
Perelson, A. S., A. U. Neumann, M. Markowitz, J. M. Leonard, and D. D. Ho. 1996. HIV-1 dynamics in vivo: virion clearance rate, infected cell life-span, and viral generation time. Science 271:1582-1586.
Pierson, T., J. McArthur, and R. F. Siliciano. 2000. Reservoirs for HIV-1: mechanisms for viral persistence in the presence of antiviral immune responses and antiretroviral therapy. Annu. Rev. Immunol. 18:665-708.
Rice, J. A. 1987. Mathematical statistics and data analysis. Wadsworth, Pacific Grove, Calif.
Rodrigo, A. G., E. G. Shaper, E. L. Delwart, A. K. Iversen, M. V. Gallo, J. Brojatsch, M. S. Hirsch, B. D. Walker, and J. I. Mullins. 1999. Coalescent estimates of HIV-1 generation time in vivo. Proc. Natl. Acad. Sci. USA 96:2187-2191.
Ross, H. A., and A. G. Rodrigo. 2002. Immune-mediated positive selection drives human immunodeficiency virus type 1 molecular variation and predicts disease duration. J. Virol. 76:11715-11720.
Sawyer, S. A., and D. L. Hartl. 1992. Population genetics of polymorphism and divergence. Genetics 132:1161-1176.
Seo, T.-K, J. L. Thorne, M. Hasegawa, and H. Kishino. 2002. Estimation of effective population size of HIV-1 within a host: a pseudomaximum-likelihood approach. Genetics 160:1283-1293.
Shankarappa, R., J. B. Margolick, and S. J. Gange, et al. (12 co-authors). 1999. Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 73:10489-10502.
Smith, N. G. C., and A. Eyre-Walker. 2002. Adaptive protein evolution in Drosophila. Nature 415:1022-1024.
Strunnikova, N., S. C. Ray, C. Lancioni, M. Nguyen, and R. P. Viscidi. 1998. Evolution of human immunodeficiency virus type 1 in relation to disease progression in children. J. Hum. Virol. 1:224-239.
Strunnikova, N., S. C. Ray, R. A. Livingston, E. Rubalcaba, and R. P. Viscidi. 1995. Convergent evolution within the V3 loop domain of human immunodeficiency virus type 1 in association with disease progression. J. Virol. 69:7548-7558.
Tersmette, M., R. A. Gruters, F. de Wolf, R. E. de Goede, J. M. Lange, P. T. Schellekens, J. Goudsmit, H. G. Huisman, and F. Miedema. 1989. Evidence for a role of virulent human immunodeficiency virus (HIV) variants in the pathogenesis of acquired immunodeficiency syndrome: studies on sequential HIV isolates. J. Virol. 63:2118-2125.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.
Tsubota, H., C. I. Lord, D. I. Watkins, C. Morimoto, and N. L. Letvin. 1989. A cytotoxic T lymphocyte inhibits acquired immunodeficiency syndrome virus replication in peripheral blood lymphocytes. J. Exp. Med. 169:1421-1434.
Viscidi, R. P. 1999. HIV evolution and disease progression via longitudinal studies. Pp. 346-389 in K. A. Crandall, ed. The evolution of HIV. Johns Hopkins, Baltimore.
Walker, C. M., D. J. Moody, D. P. Stites, and J. A. Levy. 1986. CD8+ lymphocytes can control HIV infection in vitro by suppressing virus replication. Science 234:1563-1566.
Wodarz, D., P. Klenerman, and M. A. Nowak. 1998. Dynamics of cytotoxic T-lymphocyte exhaustion. Proc. R. Soc. Lond. B Biol. Sci. 265:191-203.
Wolfs, T. F., J. J. de Jong, H. Van den Berg, J. M. Tijnagel, W. J. Krone, and J. Goudsmit. 1990. Evolution of sequences encoding the principal neutralization epitope of human immunodeficiency virus 1 is host-dependent, rapid, and continuous. Proc. Natl. Acad. Sci. USA 87:9938-9942.
Wolinsky, S. M., B. T. M. Korber, and A. U. Neumann, et al. (11 co-authors). 1996. Adaptive evolution of human immunodeficiency virus-type 1 during the natural course of infection. Science 272:537-542.
Wolinsky, S. M., and G. H. Learn. 1999. Levels of diversity within and among host individuals. Pp. 275-314 in K. A. Crandall, ed. The evolution of HIV. Johns Hopkins, Baltimore.
Yamaguchi-Kabata, Y., and T. Gojobori. 2000. Reevaluation of amino acid variability of the human immunodeficiency virus type 1 gp120 envelope glycoprotein and prediction of new discontinuous epitopes. J. Virol. 74:4335-4350.
Zanotto, P. M. A., E. G. Kallas, R. F. de Souza, and E. C. Holmes. 1999. Genealogical evidence for positive selection in the nef gene of HIV-1. Genetics 153:1077-1089.