In vivo characteristics of human immunodeficiency virus type 1 intersubtype recombination: determination of hot spots and correlation with sequence similarity

Gkikas Magiorkinis¹, Dimitrios Paraskevis¹, Anne-Mieke Vandamme², Emmanouil Magiorkinis¹, Vana Sypsa¹ and Angelos Hatzakis¹

¹ National Retrovirus Reference Center, Department of Hygiene and Epidemiology, Athens University Medical School, Mikras Asias 75, 11527 Athens, Greece
² Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium

Correspondence
A. Hatzakis
ahatzak{at}cc.uoa.gr


   ABSTRACT
 
Recombination plays a pivotal role in the evolutionary process of many different virus species, including retroviruses. Analysis of all human immunodeficiency virus type 1 (HIV-1) intersubtype recombinants revealed that they are more complex than described initially. Recombination frequency is higher within certain genomic regions, such as partial reverse transcriptase (RT), vif/vpr, the first exons of tat/rev, vpu and gp41. A direct correlation was observed between recombination frequency and sequence similarity across the HIV-1 genome, indicating that sufficient sequence similarity is required upstream of the recombination breakpoint. This finding suggests that recombination in vivo may occur preferentially during reverse transcription through the strand displacement-assimilation model rather than the copy-choice model.


   INTRODUCTION
 
Human immunodeficiency virus type 1 (HIV-1) belongs to the family Retroviridae and ranks among the most variable human pathogens (Wain-Hobson, 1993). The extensive genetic diversity of the virus is caused by several factors, such as (1) the high misincorporation rate of reverse transcriptase (RT) in the absence of any proof-reading mechanism (Dougherty & Temin, 1988; Preston et al., 1988; Roberts et al., 1988), (2) the high virus production rate (Ho et al., 1995) and (3) homologous recombination (Robertson et al., 1995a, b).

Retrovirus recombination was first observed in avian tumour viruses (Vogt, 1971) and subsequently in other retroviruses (Clavel et al., 1989; Wong & McCarter, 1973). Recombination is thought to occur during reverse transcription of the genomic RNA into double-stranded DNA (see Fig. 5A), providing the main strategy by which retroviruses exchange genetic information between two heterogeneous RNAs and, thus, undergo extensive genetic alteration of the viral genome. A prerequisite for recombination between two genetically diverse RNA molecules is their co-packaging in the same virion (Hu & Temin, 1990a, b).



 
Fig. 5. Reverse transcription and possible models for homologous retrovirus recombination. The RNA and the growing DNA chain are shown in thick and thin lines, respectively. Single arrows indicate molecular jumps of the RT, double horizontal arrows indicate the progression of DNA synthesis and triple arrows indicate sequential steps in time. Degradation of RNA by RNase-H activity is shown in dashed lines. (A) Description of the reverse transcription process. (B) Description of the copy-choice model for HIV-1 recombination. Donor and acceptor RNA molecules are shown in grey and black, respectively. The single vertical arrow indicates template switches during minus-strand DNA synthesis. (C) Description of the strand displacement-assimilation model for HIV-1 recombination. The two different minus-strand DNA molecules synthesized as well as the growing plus-strand DNA chains are shown in grey and black.

 
Two different models have been proposed for the mechanism of retrovirus recombination (see Fig. 5B, C): the forced copy-choice model (Coffin, 1979; Vogt, 1971) and the strand displacement-assimilation model (Boone & Skalka, 1981a, b; Junghans et al., 1982). The forced copy-choice model proposes that template switching is driven mainly by breaks in the viral RNA, which force the RT to switch to the other copy of the genomic RNA. According to this model, recombination occurs during the synthesis of minus-strand DNA. More generally, the various models proposed for strand transfer during minus-strand synthesis, whether or not the transfer is caused by RNA breaks, are known collectively as ‘copy-choice’ (Vogt, 1971). In contrast, the strand displacement-assimilation model proposes that recombination occurs during the synthesis of plus-strand DNA: in one of the two ongoing second-strand cDNA synthesis complexes, an internally initiated DNA fragment is displaced by an upstream, growing DNA fragment and becomes free to hybridize to, and continue second-strand synthesis on, the other cDNA synthesis complex.

The impact of recombination on the evolution of HIV-1 was documented recently, showing that at least 10 % of circulating HIV-1 strains are intersubtype recombinants (Kuiken et al., 2000; Robertson et al., 1995a, b). Furthermore, a significant proportion of the HIV-1 sequences that were characterized initially by partial sequencing as non-recombinants were found to be intersubtype recombinants after analysis of their complete sequences (Kuiken et al., 2000). Three different classes of homologous recombination have been documented in HIV-1: between strains of the same subtype (intrasubtype), between strains of different subtypes (intersubtype) and between strains of different groups (intergroup). In this study, 34 available full-length HIV-1 intersubtype mosaics were re-analysed to determine the in vivo properties of HIV-1 intersubtype recombination.


   METHODS
 
HIV-1 isolates.
A total of 34 full-length HIV-1 intersubtype recombinants, including all known HIV-1 ‘pure’ subtypes (A–D, F–H, J and K) (Kuiken et al., 2000), available at the HIV database (http://hiv-web.lanl.hiv.gov) were re-analysed. Circulating recombinant form isolates were analysed only once. The accession numbers for the reference strains used were: M62320 (subtype A); M12508 (subtype B); U46016 (subtype C); K03454 (subtype D); AF005494 (subtype F1); U88826 (subtype G); AF005496 (subtype H); AF082395 (subtype J) and AJ249239 (subtype K).

Determination of recombination patterns.
DNA sequence alignments were performed using CLUSTAL_W, version 1.81 (Thompson et al., 1994). The recombination pattern of each sequence was resolved by bootscanning plots, as implemented in the SIMPLOT software, version 2.5 (Ray, 1998), and confirmed further by phylogenetic analysis using the neighbour-joining method (NJ) (Saitou & Nei, 1987) with Kimura's two-parameter correction (Kimura, 1980), as implemented in PHYLIP, version 3.5c (Felsenstein, 1993) (see supplementary Fig. 1, available at http://vir.sgmjournals.org). Bootstrap analysis (100 replicates) was used to estimate the reliability of the constructed trees. Phylogenetic analysis was also performed using the maximum-likelihood (ML) method with the Tamura–Nei evolutionary model (TrN), including γ-distributed rate heterogeneity among sites, as implemented in TREEPUZZLE, version 5.0.pl6 (Schmidt et al., 2002). TrN was chosen as the best-fitting nucleotide substitution model for several regions of the HIV-1 genome, according to the likelihood ratio test using the MODELTEST (Posada & Crandall, 1998) and PAUP*, version 4.0b10, programs (Swofford, 1998). To search for any potential relationships between the unclassified regions of the recombinant sequences and any HIV-1 sequences characterized previously, a BLAST search was performed using the default settings (http://www.ncbi.nlm.nih.gov/BLAST/). Phylogenetic analysis was then used to confirm the similarities obtained by BLAST.



 
Fig. 1. Frequency of recombination breakpoints (black) and extent of sequence similarity (grey) across the HIV-1 genome for a sliding window of 1000 bp.

 
Frequency of recombination breakpoints.
According to the recombination pattern of each individual HIV-1 sequence, a database was created, including the co-ordinates of the breakpoints of all of the mosaic sequences. Based on this database, the frequency of recombination breakpoints adjusted per 1000 nucleotides was calculated along the HIV-1 genome. To determine thoroughly the recombination breakpoint frequency across the HIV-1 genome, a program termed BREAKSCAN was developed (available from the authors upon request). BREAKSCAN was developed to plot the frequency of recombination breakpoints in a sliding window along the genome. This was done by counting the total number of breakpoints within the window over all of the 34 recombinant strains and dividing by the total number of the recombinant sequences (n=34). Running BREAKSCAN, the frequency of recombination breakpoints was plotted in a sliding window of 1000 bp moving in steps of 50 bp (Fig. 1).
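The sliding-window count described above can be sketched as follows. BREAKSCAN itself is not reproduced here; the function name, the half-open window convention and the toy breakpoint coordinates are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def breakpoint_frequency(
    breakpoints: Dict[str, List[int]],
    genome_length: int,
    window: int = 1000,
    step: int = 50,
) -> List[Tuple[int, float]]:
    """Return (window_start, breakpoints per sequence) for every window."""
    n_sequences = len(breakpoints)
    # Pool the breakpoint coordinates of all recombinant sequences.
    pooled = [pos for positions in breakpoints.values() for pos in positions]
    profile = []
    for start in range(0, genome_length - window + 1, step):
        end = start + window  # half-open window [start, end): an assumption
        count = sum(1 for pos in pooled if start <= pos < end)
        # Divide by the number of sequences, as described in the text.
        profile.append((start, count / n_sequences))
    return profile


# Toy input: three hypothetical mosaics, not real isolates.
toy = {"seqA": [120, 980], "seqB": [1010], "seqC": [150, 2400]}
profile = breakpoint_frequency(toy, genome_length=3000)
```

With 34 sequences and the real breakpoint database, the same loop would yield the kind of curve plotted in Fig. 1, provided that the window and step match those stated above.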

Independency of breakpoints.
A small number of recombination breakpoints may have been inherited from a common ancestor and, consequently, may not represent independent events. To examine whether recombination events share an immediate common ancestor, we identified all breakpoints localized within a similar region of the alignment (±100 nt from the breakpoint position) that shared a common evolutionary history (HIV-1 subtype) in the adjacent regions. If recombination events shared an immediate common ancestor, they should cluster together in the region containing the breakpoint. Thus, to identify any potentially linked recombination events, we performed phylogenetic analysis using ML for all pairs of fragments (400 nt in length) that included a breakpoint and shared a common HIV-1 subtype classification in the adjacent regions. Any recombination events found to share a common evolutionary history were excluded from all subsequent analyses.
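The pre-screening step described above (breakpoints within ±100 nt that share the same flanking subtypes) can be sketched as a simple pairwise filter. This is only an illustration: the record layout, the names and the toy data are hypothetical, and the sketch does not perform the subsequent ML phylogenetic analysis that decides whether a candidate pair is actually linked.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Breakpoint(NamedTuple):
    sequence: str  # name of the mosaic sequence (hypothetical labels)
    position: int  # alignment coordinate of the breakpoint
    left: str      # subtype assigned upstream of the breakpoint
    right: str     # subtype assigned downstream of the breakpoint


def candidate_linked_pairs(
    bps: List[Breakpoint], tolerance: int = 100
) -> List[Tuple[Breakpoint, Breakpoint]]:
    """Pairs from different sequences, close in position, same flanking subtypes."""
    pairs = []
    for a, b in combinations(bps, 2):
        if a.sequence == b.sequence:
            continue  # only compare breakpoints from different mosaics
        close = abs(a.position - b.position) <= tolerance
        same_flanks = (a.left, a.right) == (b.left, b.right)
        if close and same_flanks:
            pairs.append((a, b))
    return pairs


# Toy records, invented for illustration.
toy = [
    Breakpoint("s1", 2500, "A", "G"),
    Breakpoint("s2", 2560, "A", "G"),
    Breakpoint("s3", 2540, "A", "B"),
    Breakpoint("s4", 4000, "A", "G"),
]
candidates = candidate_linked_pairs(toy)
```

Each candidate pair would then be examined on a 400 nt fragment spanning the breakpoint, as described in the text.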

Determination of sequence similarity across the HIV-1 genome.
Sequence similarity across the HIV-1 genome was calculated by a newly developed program termed SIMSCAN (available from the authors upon request). SIMSCAN was designed to calculate the mean similarity (measured as the proportion of sites that are identical between two sequences) between different sequences in a sliding window moving along the genome. The mean intersubtype sequence similarity was determined by averaging all pairwise intersubtype similarities of nine sequences belonging to all HIV-1 ‘pure’ subtypes characterized previously (subtypes A–D, F–H, J and K). Sequence similarity was plotted for a moving window of 1000 bp in steps of 50 bp using the SIMSCAN program and compared with the recombination breakpoint frequency in the same figure. In a similar way, we compared the recombination breakpoint frequency with the mean intersubtype evolutionary distances corrected by TrN with γ-distributed rate heterogeneity among sites, as estimated in TREEPUZZLE.
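A minimal sketch of the windowed mean pairwise similarity is given below. It is not the SIMSCAN program: the function names are hypothetical, the input is assumed to be an existing alignment of equal-length strings, and gap characters are simply compared like any other symbol, which may differ from the treatment used in the original analysis.

```python
from itertools import combinations
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Proportion of identical sites between two aligned segments."""
    if len(a) != len(b):
        raise ValueError("segments must have the same aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_similarity_profile(
    aligned: List[str], window: int = 1000, step: int = 50
) -> List[Tuple[int, float]]:
    """Mean of all pairwise identities in each sliding window."""
    length = len(aligned[0])
    pairs = list(combinations(range(len(aligned)), 2))
    profile = []
    for start in range(0, length - window + 1, step):
        end = start + window
        scores = [identity(aligned[i][start:end], aligned[j][start:end])
                  for i, j in pairs]
        profile.append((start, sum(scores) / len(scores)))
    return profile


# Toy alignment (three short invented sequences) with a small window.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
toy_profile = mean_similarity_profile(toy_alignment, window=4, step=2)
```

For the nine reference strains, the inner list would hold 36 pairwise values per window.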

Statistics.
The association between sequence similarity and recombination breakpoint frequency was modelled using a linear regression model and the coefficient of determination (r²) was estimated. To detect a possible phase difference between the two series, this procedure was repeated by moving the similarity plot either upstream or downstream in steps of 50 bp over a range of 1000 bp relative to the recombination frequency plot and the maximum r² value was obtained. To examine whether the recombination breakpoints are more clustered than would be expected by chance, we compared the observed and expected numbers of breakpoints in the major gene products using the chi-squared test. The expected number of breakpoints per gene region was estimated by multiplying the length of the region by the average number of breakpoints per unit sequence length in the alignment.
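The phase-difference scan can be sketched as below, assuming that both profiles are sampled on the same 50 bp grid so that a shift of one index corresponds to one step. The function names and the sign convention (a negative shift pairs each frequency window with a similarity window lying upstream of it) are assumptions of this sketch; for a simple linear regression, r² equals the squared Pearson correlation, which is what is computed here.

```python
from typing import List, Sequence, Tuple


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear association
    return (sxy * sxy) / (sxx * syy)


def phase_scan(
    frequency: Sequence[float], similarity: Sequence[float], max_shift: int = 20
) -> List[Tuple[int, float]]:
    """r² between frequency[i] and similarity[i + k] for each shift k."""
    n = len(frequency)
    results = []
    for k in range(-max_shift, max_shift + 1):
        xs, ys = [], []
        for i in range(n):
            j = i + k
            if 0 <= j < n:  # keep only the overlapping part of the series
                xs.append(similarity[j])
                ys.append(frequency[i])
        if len(xs) >= 3:
            results.append((k, r_squared(xs, ys)))
    return results


# Toy series: the frequency curve is the similarity curve delayed by 2 steps.
similarity = [0.1, 0.5, 0.9, 0.4, 0.2, 0.7, 0.3, 0.8, 0.6, 0.1, 0.5, 0.2]
frequency = [0.0, 0.0] + similarity[:-2]
best_shift, best_r2 = max(phase_scan(frequency, similarity, max_shift=4),
                          key=lambda item: item[1])
```

With a 50 bp step, `max_shift=20` covers the 1000 bp range mentioned above; multiplying the best shift by the step converts it to a phase difference in bp.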

Simulations.
To test the statistical significance of the maximum r² value for the association between sequence similarity and recombination breakpoint frequency, and of the difference between the observed and the randomly derived frequency of recombination breakpoints, recombination was simulated using a uniform distribution of breakpoints. The distributions of the maximum r² value and of the chi-squared values obtained from the simulated datasets were used as the null distributions. More specifically, the positions of the recombination breakpoints were simulated using a Monte Carlo-based algorithm under the assumption that breakpoints are distributed uniformly along the genome, within positions 400–7800 of the alignment. For simplicity, recombination breakpoints were simulated at positions that are multiples of 50 nt, since breakpoints cannot be located more precisely than 50 nt.
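The uniform null model can be sketched as follows. The text fixes the interval (400–7800) and the 50 nt grid; how many breakpoints each simulated sequence receives, and whether two breakpoints may coincide within one sequence, are not stated, so this sketch assumes that each sequence keeps a given breakpoint count and that positions are drawn without replacement. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def simulate_breakpoints(
    counts_per_sequence: Sequence[int],
    lo: int = 400,
    hi: int = 7800,
    grid: int = 50,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Draw breakpoint positions uniformly from the grid lo, lo+grid, ..., hi."""
    rng = rng if rng is not None else random.Random()
    sites = list(range(lo, hi + 1, grid))
    # Sampling without replacement inside a sequence is an assumption.
    return [sorted(rng.sample(sites, k)) for k in counts_per_sequence]


def simulate_replicates(
    counts_per_sequence: Sequence[int],
    n_replicates: int = 1000,
    seed: Optional[int] = None,
) -> List[List[List[int]]]:
    """Generate independent simulated datasets under the uniform null."""
    rng = random.Random(seed)
    return [simulate_breakpoints(counts_per_sequence, rng=rng)
            for _ in range(n_replicates)]


# Toy run: three sequences with invented breakpoint counts, ten replicates.
replicates = simulate_replicates([7, 2, 15], n_replicates=10, seed=1)
```

Each replicate can then be passed through the same windowing and r² or chi-squared calculations as the observed data to build the null distributions.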


   RESULTS
 
Our re-analysis revealed that approximately half of the sequences are more complex recombinants than described initially. The overall number of breakpoints was found to be 247 with a median (range) of 7 (2–15). An intersubtype recombination breakpoint was defined as a region of approximately 100–200 nt in length in which the HIV-1 subtype classification was significantly different (supported by >75 % NJ bootstrapping and >70 % ML puzzling steps) in the adjacent pieces. Moreover, we identified 14 breakpoints within regions that shared an immediate common ancestor, thus corresponding to non-independent recombination events excluded from all subsequent analyses. The results of this re-analysis provided an updated dataset of the 233 recombination breakpoint positions based on all intersubtype HIV-1 recombinants available to date.

First, the frequency of recombination breakpoints, adjusted per 1000 nt, was plotted along the HIV-1 genome (Fig. 2). According to Fig. 2, there is only a twofold fluctuation of the breakpoint frequency throughout the HIV-1 genome, e.g. within the pol gene, the frequency of breakpoints was twofold higher than that in gag or env. In contrast, when a more detailed approach was used that mapped the recombination breakpoints within a sliding window of 1000 bp (BREAKSCAN), the frequency of intersubtype recombination was found to differ considerably (fivefold) within certain genomic regions (Jetzt et al., 2000). Recombination frequency peaks were found in RT, in the accessory genes vif and vpr, in the first exons of tat/rev, in vpu and in gp41, indicating that these regions are intersubtype recombination hot spots. Conversely, in gp120 and in p17/p24 (gag), the frequency of recombination was the lowest throughout the complete HIV-1 genome. To examine whether the recombination breakpoints are more clustered than expected by chance, we compared the observed versus the expected number of breakpoints in the major gene products. This analysis revealed that the observed distribution of breakpoints was significantly different from the expected one (P<0·002), suggesting that intersubtype recombination does not occur randomly across the HIV-1 genome.
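The observed-versus-expected comparison can be illustrated with a short sketch in which the expected count of each region is proportional to its length. The region names, counts and lengths below are invented for the example and are not the values of this study; the function name is likewise hypothetical. Only the test statistic is computed, since the text indicates that significance was assessed against simulated datasets.

```python
from typing import Dict


def chi_squared_statistic(
    observed: Dict[str, int], lengths: Dict[str, int]
) -> float:
    """Chi-squared statistic with expected counts proportional to region length."""
    total_breakpoints = sum(observed.values())
    total_length = sum(lengths.values())
    statistic = 0.0
    for region, obs in observed.items():
        # Expected count if breakpoints were spread evenly along the alignment.
        expected = total_breakpoints * lengths[region] / total_length
        statistic += (obs - expected) ** 2 / expected
    return statistic


# Invented toy numbers: two regions of equal length with unequal counts.
toy_observed = {"region1": 30, "region2": 10}
toy_lengths = {"region1": 1000, "region2": 1000}
toy_statistic = chi_squared_statistic(toy_observed, toy_lengths)
```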



 
Fig. 2. Distribution of recombination breakpoints in different HIV-1 genes: gag, pol, accessory genes 1 (vif/vpr, vpu and the first exons of tat and rev), env, accessory genes 2 (second exons of tat and rev) and nef. The average recombination frequency in different genes was calculated by adding the total number of recombination breakpoints for each genomic region and dividing by the total number of sequences analysed (n=34). The adjusted recombination frequency was calculated by dividing the average frequency by the total length of each genomic region and multiplying by 1000 bp in steps of 50 bp.

 
The shape of the recombination breakpoint frequency plot prompted us to examine any potential association with sequence similarity within the HIV-1 genome. Comparison of results obtained from the BREAKSCAN and SIMSCAN programs in a single plot (Fig. 1) indicated that there is a correlation between the outlines of the two curves, suggesting that recombination occurs more frequently adjacent to genomic regions with higher conservation. Furthermore, Fig. 1 suggests that there is a ‘phase difference’ between the two plots and, more specifically, that the similarity peaks appear upstream of the recombination hot spots. To resolve the phase difference between sequence similarity and recombination breakpoint frequency, the coefficient of determination (r²) between the two was determined by moving the similarity plot upstream or downstream in steps of 50 bp over a range of 1000 bp relative to the recombination frequency plot (Fig. 3A). The coefficient of determination was highest (r²max=0·69) for a phase difference of 700 bp upstream to the recombination frequency plot (Fig. 3A). The coefficient of determination was similar (r²max=0·67) when the evolutionary distances using TrN with γ-distributed rate heterogeneity among sites were used instead of the uncorrected sequence similarity.



 
Fig. 3. (A) Phase difference between recombination breakpoint frequency and sequence similarity. The coefficient of determination (r²) between the recombination breakpoint frequency and sequence similarity is plotted against the phase difference between the two, calculated in steps of 50 bp with a maximum phase difference of 1000 bp. The window sizes for either the recombination frequency or the similarity plot were 1000 bp. (B) Coefficient of determination (r²) between the recombination breakpoint frequency and the similarity plot when the window size varied from 800 to 2000 bp in steps of 100 bp for the recombination frequency. (C) Coefficient of determination (r²) between the recombination breakpoint frequency and the similarity plot when the window size varied from 400 to 1200 bp in steps of 100 bp for the similarity plot.

 
To examine whether this phase difference was affected by the window size of the similarity or the recombination frequency plots, the coefficient of determination (r²) between them was determined for different window sizes (Fig. 3B, C). In particular, similar results for the phase difference were obtained when the window size varied from 800 to 2000 and from 400 to 1200 bp in steps of 100 bp for recombination frequency and similarity plots, respectively, in all possible combinations (Fig. 3). The minimum window size of the recombination frequency plot used in this analysis was set to 800 bp, estimated according to Sturges' rule (Sturges, 1926). Furthermore, the minimum window size for the similarity plot was set to 400 bp, considering that a fragment of this length contains enough phylogenetic signal throughout the HIV-1 genome. According to this analysis, the correlation between the recombination frequency and similarity plots was maximized only for a phase difference of approximately 650 bp upstream to the recombination frequency and it was not affected by the window size.
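As a rough illustration of how Sturges' rule turns a number of observations into a class width, the sketch below uses k = ⌈1 + log₂ n⌉ classes. The text does not state which n or which genomic span entered the calculation; plugging in the 233 independent breakpoints and the 400–7800 alignment interval used in the simulations is an assumption made here only to show the arithmetic.

```python
import math


def sturges_bins(n_observations: int) -> int:
    """Number of classes suggested by Sturges' rule: ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n_observations))


def sturges_width(n_observations: int, span: float) -> float:
    """Class width obtained by dividing the span into Sturges' classes."""
    return span / sturges_bins(n_observations)


# Assumed inputs (not stated in the text): 233 breakpoints, span 400-7800.
bins = sturges_bins(233)
width = sturges_width(233, span=7800 - 400)
```

Under these assumed inputs the rule gives nine classes and a width slightly above 800 bp, which is of the same order as the minimum window chosen above.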

To test for the statistical significance of the estimated r²max value (r²max=0·69), we estimated the distribution of the coefficient of determination (r²) between the similarity and the frequency of recombination under the assumption that the positions of the breakpoints are distributed uniformly across the genome. More specifically, the location of the breakpoints was simulated in 1000 replicates of 34 sequences and then the correlation between recombination frequency and similarity was calculated for each dataset using a sliding window of 1000 nt, in the same way as in the original data. The value obtained for the real dataset (r²=0·69) was found to lie at the extreme 5 % of the distribution of the 1000 maximized coefficients of determination, indicating that there is a statistically significant association between the frequency of recombination breakpoints and sequence similarity (Fig. 4).
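Locating an observed statistic within a simulated null distribution can be sketched as an empirical one-sided P value. The function name is hypothetical, and the add-one correction in numerator and denominator is a common convention adopted here as an assumption, not something stated in the text; the toy values are invented.

```python
from typing import Sequence


def empirical_p_value(observed: float, simulated: Sequence[float]) -> float:
    """Share of simulated statistics at least as large as the observed one."""
    at_least_as_large = sum(1 for value in simulated if value >= observed)
    # Add-one correction (an assumption) so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (len(simulated) + 1)


# Invented toy null distribution: the integers 0..99.
toy_null = list(range(100))
toy_p = empirical_p_value(95, toy_null)
```

An observed value lying in the extreme 5 % of 1000 simulated maxima corresponds to an empirical P value of roughly 0·05 or less under this definition.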



 
Fig. 4. Distribution of 1000 simulated coefficients of determination between the frequency of recombination breakpoints and sequence similarity obtained under the assumption that breakpoints are distributed uniformly along the genome. Black bars indicate the extreme 5 % of the distribution.

 

   DISCUSSION
 
Our re-analysis revealed that intersubtype recombination in HIV-1 is not a random process but occurs more or less frequently in certain genomic regions. The identification of intersubtype recombination hot spots was based on the analysis of all full-length HIV-1 intersubtype recombinants using a sensitive method, such as the sliding window approach, which allows monitoring for any potential differences in the recombination frequency in every single partial genomic region of the HIV-1 genome.

The identification of recombination hot spots may be important for the development of retrovirus constructs used as reagents for HIV-1 vaccine development (Kuiken et al., 2000). Additionally, hot spots are the most appropriate regions for detecting HIV-1 intersubtype recombinants and could thus facilitate HIV-1 molecular epidemiological studies based on the analysis of partial genomic regions. Furthermore, in genomic regions where recombination occurs less frequently, the assumption that virus divergence is dominated by point mutations is not violated, which renders these regions most appropriate for phylogenetic analyses and molecular clock calculations. The existence of hot spots within different genomic regions could possibly be explained by structural features of the RT that cause template jumping, or by functional constraints between different genomic regions belonging to diverse HIV-1 subtypes (Gao et al., 1996; Paraskevis et al., 2000; Worobey & Holmes, 1999).

On the other hand, based on our analysis, sequence similarity is of vital importance for intersubtype recombination events, as has been considered previously (Worobey & Holmes, 1999). It is of particular interest that in the C2–V3 region of env, which is the most variable region of the HIV-1 genome (Kuiken et al., 2000), no recombination breakpoints were observed. Furthermore, the importance of sequence similarity for the efficiency of recombination has been demonstrated previously in vitro, where the rate of recombination between two non-homologous RNAs was 100 to 1000 times lower than that between two entirely homologous sequences (Zhang & Temin, 1993). Thus, in accordance with previous findings about the correlation between sequence similarity and recombination, we provide direct evidence based on data observed in vivo, first, that sequence similarity drives template jumping in retrovirus homologous recombination and, second, that sequence dissimilarity may be the predominant constraint in recombination events between quite divergent strains. According to our analysis, there might be a similarity threshold for template switching and recombination. In the case of HIV-1 intersubtype recombination, where there is considerable divergence between different lineages, recombination occurs more frequently in the most well-conserved regions, such as pol. These findings are in accordance with recent findings from in vitro studies (Iglesias-Sanchez & Lopez-Galindez, 2002).

Although different parameters such as selection, protein functionality and epidemiological reasons may have played a role in the successful spread of the HIV-1 intersubtype recombinants detected until now, we provide evidence that sequence similarity provides a major constraint in the process of the generation of HIV-1 recombinants. This means that this particular association between sequence similarity and recombination may affect the formation of mosaic strains before other forces such as selection or virulence play their role in the spread of newly generated recombinants.

On the other hand, the expected outline of recombination frequency should be quite different if evolutionary selection were the main driving force: recombination should be more frequent adjacent to divergent regions, such as env, and less frequent at conserved regions, such as the accessory genes and pol/RT (see supplementary Fig. 2, available online at http://vir.sgmjournals.org), resulting in a negative correlation between sequence similarity and recombination frequency. While point mutations do happen randomly because RT errors occur by chance, they are strongly selected by evolutionary forces, resulting in more conserved regions, such as pol, and less conserved regions, such as env. Surprisingly, recombination does not seem to be affected by evolution in the same way and the observed pattern must be affected by another, less obvious, underlying mechanism.

Moreover, our analysis revealed a newly identified phase difference of approximately 650 nt between sequence similarity and recombination frequency. According to the models proposed for retrovirus recombination, hybridization between the nascent DNA and the acceptor molecule is required for priming an efficient continuation of DNA synthesis (Hu & Temin, 1990b; Negroni & Buc, 2001). Thus, if sequence similarity is important in this process, it will appear as a phase shift with respect to the recombination breakpoint: downstream if recombination occurs during first-strand cDNA synthesis and upstream if it occurs during second (plus)-strand synthesis (Fig. 5). Our observations thus lend more support to the strand displacement-assimilation model operating in vivo.

Previous in vitro experiments documented that retrovirus recombination may occur during both plus- and minus-strand DNA synthesis (Hu & Temin, 1990a, b; Junghans et al., 1982), although it is widely believed to occur mainly during minus-strand DNA synthesis through the copy-choice model (Jetzt et al., 2000; Negroni & Buc, 2001). It is interesting that, since the first observation of DNA H structures through electron microscopy and the formulation of the strand displacement-assimilation model, no other evidence has ever supported this model as predominant in any other retrovirus (Negroni & Buc, 2001). In contrast, our analysis provides evidence that strand displacement-assimilation might play a significant role in intersubtype recombination, although further study is required to assess the potential of the strand displacement-assimilation model in the generation of HIV-1 intersubtype recombinants.

Recombination is influential for retrovirus evolution and it has been estimated that, for HIV-1, recombination events occur, on average, more frequently than mutations accumulate (Jetzt et al., 2000; Negroni & Buc, 2001). The evolutionary advantages of homologous recombination are well established (Burke, 1997). In-depth analysis reveals that recombination arises as a primitive form of sexual reproduction, a type of amphigony (Temin, 1991). Therefore, it may serve as an emergency exit from the accumulation of deleterious mutations known as ‘Muller's ratchet’ (Chao, 1990). Furthermore, it offers the potential for ‘evolutionary broad jumping’ between peaks of fitness; it allows more efficient exploration of the sequence space and increases genetic diversity.

In brief, our findings indicated that there are certain intersubtype recombination hot spots across the HIV-1 genome and that sequence similarity plays a pivotal role in recombination events between different subtypes, indicating that sufficient sequence similarity is required upstream of the recombination breakpoint and suggesting that the strand displacement-assimilation model might provide the dominant model for HIV-1 intersubtype recombination.


   REFERENCES
 
Boone, L. R. & Skalka, A. M. (1981a). Viral DNA synthesized in vitro by avian retrovirus particles permeabilized with melittin. I. Kinetics of synthesis and size of minus- and plus-strand transcripts. J Virol 37, 109–116.

Boone, L. R. & Skalka, A. M. (1981b). Viral DNA synthesized in vitro by avian retrovirus particles permeabilized with melittin. II. Evidence for a strand displacement mechanism in plus-strand synthesis. J Virol 37, 117–126.

Burke, D. S. (1997). Recombination in HIV: an important viral evolutionary strategy. Emerg Infect Dis 3, 253–259.

Chao, L. (1990). Fitness of RNA virus decreased by Muller's ratchet. Nature 348, 454–455.

Clavel, F., Hoggan, M. D., Willey, R. L., Strebel, K., Martin, M. A. & Repaske, R. (1989). Genetic recombination of human immunodeficiency virus. J Virol 63, 1455–1459.

Coffin, J. M. (1979). Structure, replication, and recombination of retrovirus genomes: some unifying hypotheses. J Gen Virol 42, 1–26.

Dougherty, J. P. & Temin, H. M. (1988). Determination of the rate of base-pair substitution and insertion mutations in retrovirus replication. J Virol 62, 2817–2822.

Felsenstein, J. (1993). PHYLIP: Phylogeny Inference Package. Department of Genetics, University of Washington, Seattle, WA, USA.

Gao, F., Robertson, D. L., Morrison, S. G. & 8 other authors (1996). The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. J Virol 70, 7013–7029.

Ho, D. D., Neumann, A. U., Perelson, A. S., Chen, W., Leonard, J. M. & Markowitz, M. (1995). Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373, 123–126.

Hu, W. S. & Temin, H. M. (1990a). Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination. Proc Natl Acad Sci U S A 87, 1556–1560.

Hu, W. S. & Temin, H. M. (1990b). Retroviral recombination and reverse transcription. Science 250, 1227–1233.

Iglesias-Sanchez, M. J. & Lopez-Galindez, C. (2002). Analysis, quantification, and evolutionary consequences of HIV-1 in vitro recombination. Virology 304, 392–402.

Jetzt, A. E., Yu, H., Klarmann, G. J., Ron, Y., Preston, B. D. & Dougherty, J. P. (2000). High rate of recombination throughout the human immunodeficiency virus type 1 genome. J Virol 74, 1234–1240.

Junghans, R. P., Boone, L. R. & Skalka, A. M. (1982). Retroviral DNA H structures: displacement-assimilation model of recombination. Cell 30, 53–62.

Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16, 111–120.

Kuiken, C., Foley, B., Hahn, B., Marx, P., McCutchan, F., Mellors, J. W., Mullins, J., Wolinsky, S. & Korber, B. (2000). HIV Sequence Compendium 2000. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA.

Negroni, M. & Buc, H. (2001). Mechanisms of retroviral recombination. Annu Rev Genet 35, 275–302.

Paraskevis, D., Magiorkinis, M., Paparizos, V., Pavlakis, G. N. & Hatzakis, A. (2000). Molecular characterization of a recombinant HIV type 1 isolate (A/G/E/?): unidentified regions may be derived from parental subtype E sequences. AIDS Res Hum Retroviruses 16, 845–855.

Posada, D. & Crandall, K. A. (1998). MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818.

Preston, B. D., Poiesz, B. J. & Loeb, L. A. (1988). Fidelity of HIV-1 reverse transcriptase. Science 242, 1168–1171.

Ray, S. C. (1998). SIMPLOT. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, USA.

Roberts, J. D., Bebenek, K. & Kunkel, T. A. (1988). The accuracy of reverse transcriptase from HIV-1. Science 242, 1171–1173.

Robertson, D. L., Hahn, B. H. & Sharp, P. M. (1995a). Recombination in AIDS viruses. J Mol Evol 40, 249–259.

Robertson, D. L., Sharp, P. M., McCutchan, F. E. & Hahn, B. H. (1995b). Recombination in HIV-1. Nature 374, 124–126.

Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–425.

Schmidt, H. A., Strimmer, K., Vingron, M. & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.

Sturges, H. (1926). The choice of a class-interval. J Am Stat Assoc 21, 65–66.

Swofford, D. L. (1998). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), 4 edn. Sinauer Associates, Sunderland, MA, USA.

Temin, H. M. (1991). Sex and recombination in retroviruses. Trends Genet 7, 71–74.

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

Vogt, P. K. (1971). Genetically stable reassortment of markers during mixed infection with avian tumor viruses. Virology 46, 947–952.

Wain-Hobson, S. (1993). The fastest genome evolution ever described: HIV variation in situ. Curr Opin Genet Dev 3, 878–883.

Wong, P. K. & McCarter, J. A. (1973). Genetic studies of temperature-sensitive mutants of Moloney-murine leukemia virus. Virology 53, 319–326.

Worobey, M. & Holmes, E. C. (1999). Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80, 2535–2543.

Zhang, J. & Temin, H. M. (1993). Rate and mechanism of nonhomologous recombination during a single cycle of retroviral replication. Science 259, 234–238.

Received 20 February 2003; accepted 23 June 2003.