1 National Retrovirus Reference Center, Department of Hygiene and Epidemiology, Athens University Medical School, Mikras Asias 75, 11527 Athens, Greece
2 Rega Institute for Medical Research, Katholieke Universiteit Leuven, Leuven, Belgium
Correspondence
A. Hatzakis
ahatzak{at}cc.uoa.gr
INTRODUCTION
Retrovirus recombination was first observed in avian tumour viruses (Vogt, 1971) and subsequently in other retroviruses (Clavel et al., 1989; Wong & McCarter, 1973). Recombination is thought to occur during reverse transcription of the genomic RNA into double-stranded DNA (see Fig. 5A). It is the main mechanism by which retroviruses exchange genetic information between two heterogeneous RNAs, and it therefore allows extensive genetic change within the viral genome. A prerequisite for recombination between two genetically diverse RNA molecules is their co-packaging in the same virion (Hu & Temin, 1990a, b).
The impact of recombination on the evolution of HIV-1 has been documented recently: at least 10 % of circulating HIV-1 strains are intersubtype recombinants (Kuiken et al., 2000; Robertson et al., 1995a, b). Furthermore, a significant proportion of the HIV-1 sequences initially characterized as non-recombinant by partial sequencing were found to be intersubtype recombinants once their complete sequences were analysed (Kuiken et al., 2000). Three classes of homologous recombination have been documented in HIV-1: between strains of the same subtype (intrasubtype), between strains of different subtypes (intersubtype) and between strains of different groups (intergroup). In this study, 34 available full-length HIV-1 intersubtype mosaics were re-analysed to determine the in vivo properties of HIV-1 intersubtype recombination.
METHODS
Determination of recombination patterns.
DNA sequence alignments were performed using CLUSTAL_W, version 1.81 (Thompson et al., 1994). The recombination pattern of each sequence was resolved by bootscanning plots, as implemented in the SIMPLOT software, version 2.5 (Ray, 1998), and confirmed further by phylogenetic analysis using the neighbour-joining (NJ) method (Saitou & Nei, 1987) with Kimura's two-parameter correction (Kimura, 1980), as implemented in PHYLIP, version 3.5c (Felsenstein, 1993) (see supplementary Fig. 1, available at http://vir.sgmjournals.org). Bootstrap analysis (100 replicates) was used to estimate the reliability of the constructed trees. Phylogenetic analysis was also performed by maximum likelihood (ML) under the Tamura–Nei evolutionary model (TrN), including gamma-distributed rate heterogeneity among sites, as implemented in TREE-PUZZLE, version 5.0.pl6 (Schmidt et al., 2002). TrN was chosen as the best-fitting nucleotide substitution model for several segments of the HIV-1 genome, according to the likelihood ratio test, using the MODELTEST (Posada & Crandall, 1998) and PAUP*, version 4.0b10 (Swofford, 1998), programs. To search for any potential relationships between the unclassified regions of the recombinant sequences and any HIV-1 sequences characterized previously, a BLAST search was performed using the default settings (http://www.ncbi.nlm.nih.gov/BLAST/). Phylogenetic analysis was then used to confirm the similarities obtained by BLAST.
Independency of breakpoints.
Some recombination breakpoints may have been inherited from the same recombinant ancestor and would therefore not be independent. To examine whether recombination events shared an immediate common ancestor, we identified all breakpoints localized within a similar region of the alignment (±100 nt from the breakpoint position) that had the same HIV-1 subtype assignment in the adjacent regions. If recombination events shared an immediate common ancestor, the sequences should cluster together in the region containing the breakpoint. Thus, to identify any potentially linked recombination events, we performed phylogenetic analysis using ML for all pairs of fragments (400 nt in length) that included a breakpoint and shared a common HIV-1 subtype classification in the adjacent regions. Any recombination events found to share a common evolutionary history were excluded from all subsequent analyses.
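The screening step described above (before the ML confirmation) can be sketched as follows. The `Breakpoint` record, its field names and the function `candidate_linked_pairs` are hypothetical and were introduced only for this illustration; the sketch flags candidate pairs and does not reproduce the phylogenetic test.

```python
from itertools import combinations
from typing import NamedTuple


class Breakpoint(NamedTuple):
    sequence_id: str     # hypothetical identifier of the recombinant sequence
    position: int        # breakpoint coordinate in the alignment (nt)
    left_subtype: str    # subtype assigned upstream of the breakpoint
    right_subtype: str   # subtype assigned downstream of the breakpoint


def candidate_linked_pairs(breakpoints, tolerance=100):
    """Return pairs of breakpoints from different sequences that lie within
    `tolerance` nt of each other and have the same flanking subtypes."""
    pairs = []
    for a, b in combinations(breakpoints, 2):
        if a.sequence_id == b.sequence_id:
            continue
        close = abs(a.position - b.position) <= tolerance
        same_flanks = (a.left_subtype, a.right_subtype) == (
            b.left_subtype,
            b.right_subtype,
        )
        if close and same_flanks:
            pairs.append((a, b))
    return pairs
```

Each returned pair would then be examined phylogenetically on a fragment spanning the breakpoint before any event is excluded.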
Determination of sequence similarity across the HIV-1 genome.
Sequence similarity across the HIV-1 genome was calculated by a newly developed program termed SIMSCAN (available from the authors upon request). SIMSCAN was designed to calculate the mean similarity (measured as the proportion of sites that are identical between two sequences) between different sequences in a sliding window moving along the genome. The mean intersubtype sequence similarity was determined by averaging all pairwise intersubtype similarities of nine sequences representing all previously characterized pure HIV-1 subtypes (subtypes A–D, F–H, J and K). Sequence similarity was plotted for a moving window of 1000 bp in steps of 50 bp using the SIMSCAN program and compared with the recombination breakpoint frequency in the same figure. In a similar way, we compared the recombination breakpoint frequency with the mean intersubtype evolutionary distances corrected by TrN with gamma-distributed rate heterogeneity among sites, as estimated in TREE-PUZZLE.
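The sliding-window calculation described above can be sketched as follows. This is not the SIMSCAN program, whose source is not shown here; the function names, the `(subtype, sequence)` input format and the rule of ignoring gapped columns are assumptions made for illustration.

```python
from itertools import combinations


def identity(seq1, seq2):
    """Proportion of identical sites, ignoring columns with a gap ('-')
    in either sequence; returns None if no column can be compared."""
    compared = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not compared:
        return None
    return sum(a == b for a, b in compared) / len(compared)


def sliding_intersubtype_similarity(labelled, window=1000, step=50):
    """Mean pairwise identity between sequences of different subtypes.

    `labelled` is a list of (subtype, aligned_sequence) tuples.
    Returns a list of (window_midpoint, mean_similarity) tuples.
    """
    length = len(labelled[0][1])
    if any(len(seq) != length for _, seq in labelled):
        raise ValueError("all sequences must have the same aligned length")
    profile = []
    for start in range(0, length - window + 1, step):
        values = []
        for (sub_a, seq_a), (sub_b, seq_b) in combinations(labelled, 2):
            if sub_a == sub_b:
                continue  # only intersubtype pairs contribute
            value = identity(seq_a[start:start + window],
                             seq_b[start:start + window])
            if value is not None:
                values.append(value)
        if values:
            profile.append((start + window // 2, sum(values) / len(values)))
    return profile
```

With the defaults, each point of the profile is the mean identity in a 1000-column window, and consecutive points are 50 columns apart.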
Statistics.
The association between sequence similarity and recombination breakpoint frequency was modelled using a linear regression model, and the coefficient of determination (r²) was estimated. To detect a possible phase difference between the two series, this procedure was repeated with the similarity plot shifted either upstream or downstream, in steps of 50 bp over a range of 1000 bp relative to the recombination frequency plot, and the maximum r² value was recorded. To examine whether the recombination breakpoints are more clustered than would be expected by chance, we compared the observed with the expected number of breakpoints in the major gene products using the chi-squared test; the expected number of breakpoints per gene region was estimated from the total number of breakpoints averaged over the sequence length of the alignment.
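The search for a phase difference can be illustrated with a short sketch. For a simple linear regression, r² equals the squared Pearson correlation, which is what is computed here. The function names and the sign convention (a positive lag pairs the similarity value `lag` steps further along the genome with each frequency value) are assumptions of this example.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear association
    return sxy * sxy / (sxx * syy)


def max_lagged_r_squared(similarity, frequency, max_lag_steps=20):
    """Shift `similarity` against `frequency` by up to `max_lag_steps`
    grid steps in either direction and return (best_r2, best_lag).

    With a 50 bp grid, 20 steps correspond to a range of 1000 bp.
    """
    if len(similarity) != len(frequency):
        raise ValueError("both series must be sampled on the same grid")
    n = len(frequency)
    best_r2, best_lag = -1.0, 0
    for lag in range(-max_lag_steps, max_lag_steps + 1):
        if lag >= 0:
            x, y = similarity[lag:], frequency[:n - lag]
        else:
            x, y = similarity[:lag], frequency[-lag:]
        if len(x) < 3:
            continue  # too few overlapping points for a regression
        r2 = r_squared(x, y)
        if r2 > best_r2:
            best_r2, best_lag = r2, lag
    return best_r2, best_lag
```

Multiplying the returned lag by the grid spacing gives the phase difference in nucleotides.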
Simulations.
Simulations were used to test the statistical significance of the maximum r² between sequence similarity and recombination breakpoint frequency, and of the difference between the observed and the randomly derived frequency of recombination breakpoints. Recombination was simulated under a uniform distribution of breakpoints, and the distributions of the maximum r² and chi-squared values obtained from the simulated datasets were used as the null distributions. More specifically, the positions of the recombination breakpoints were simulated using a Monte Carlo-based algorithm under the assumption that breakpoints are distributed uniformly along the genome, within positions 400–7800 of the alignment. For simplicity, breakpoints were simulated at positions that are multiples of 50 nt, since the accuracy for detection of the breakpoints is below 50 nt.
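A minimal sketch of such a simulation is given below. The function names, the treatment of the upper bound as exclusive and the `breakpoints_per_sequence` input are assumptions made for this example; the number of breakpoints in each observed sequence would have to be supplied by the user.

```python
import random


def simulate_breakpoints(n_breakpoints, rng, start=400, end=7800, step=50):
    """Draw breakpoint positions uniformly from multiples of `step`
    in [start, end). The exclusive upper bound is an assumption."""
    grid = range(start, end, step)
    return sorted(rng.choice(grid) for _ in range(n_breakpoints))


def simulate_dataset(breakpoints_per_sequence, rng, **kwargs):
    """One simulated dataset: a list of breakpoint lists, one per sequence."""
    return [simulate_breakpoints(n, rng, **kwargs)
            for n in breakpoints_per_sequence]


def window_counts(dataset, window=1000, step=50, start=400, end=7800):
    """Count breakpoints in half-open sliding windows [left, left + window).
    Returns a list of (window_midpoint, count) tuples."""
    positions = [p for seq in dataset for p in seq]
    counts = []
    for left in range(start, end - window + 1, step):
        right = left + window
        hits = sum(left <= p < right for p in positions)
        counts.append((left + window // 2, hits))
    return counts
```

Repeating `simulate_dataset` many times with a seeded `random.Random` instance, and recomputing the test statistic on each replicate, yields a null distribution against which the observed statistic can be compared.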
RESULTS
First, the frequency of recombination breakpoints, adjusted per 1000 nt, was plotted along the HIV-1 genome (Fig. 2). According to Fig. 2, there is only a twofold fluctuation of the breakpoint frequency throughout the HIV-1 genome; for example, the frequency of breakpoints within the pol gene was twofold higher than that in gag or env. In contrast, when a more detailed approach was used that mapped the recombination breakpoints within a sliding window of 1000 bp (BREAKSCAN), the frequency of intersubtype recombination was found to differ considerably (fivefold) within certain genomic regions (Jetzt et al., 2000). Recombination frequency peaks were found in RT, in the accessory genes vif and vpr, in the first exons of tat/rev, in vpu and in gp41, indicating that these regions are intersubtype recombination hot spots. Conversely, in gp120 and in p17/p24 (gag), the frequency of recombination was the lowest in the complete HIV-1 genome. To examine whether the recombination breakpoints are more clustered than expected by chance, we compared the observed with the expected number of breakpoints in the major gene products. This analysis revealed that the observed distribution of breakpoints was significantly different from the expected one (P<0·002), suggesting that intersubtype recombination does not occur randomly across the HIV-1 genome.
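The comparison of observed and expected breakpoint counts can be sketched as follows, under the assumption that the expected count for a region is the total number of breakpoints multiplied by that region's share of the total length. The function name and the dictionary inputs are hypothetical, and no counts from this study are reproduced here.

```python
def chi_squared_uniform(observed, lengths):
    """Chi-squared statistic for breakpoint counts per region against a
    uniform-density expectation.

    `observed` maps region name -> observed breakpoint count.
    `lengths` maps region name -> region length in nt.
    Returns (statistic, expected_counts).
    """
    total_observed = sum(observed.get(region, 0) for region in lengths)
    total_length = sum(lengths.values())
    statistic = 0.0
    expected = {}
    for region, length in lengths.items():
        # Expected count is proportional to the length of the region.
        expected[region] = total_observed * length / total_length
        diff = observed.get(region, 0) - expected[region]
        statistic += diff * diff / expected[region]
    return statistic, expected
```

The statistic can then be referred to a chi-squared distribution with one fewer degree of freedom than the number of regions, or to a simulated null distribution.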
To test for the statistical significance of the estimated maximum r² value (r²max=0·69), we estimated the distribution of the coefficient of determination (r²) between the similarity and the frequency of recombination under the assumption that the positions of the breakpoints are distributed uniformly across the genome. More specifically, the location of the breakpoints was simulated in 1000 replicates of 34 sequences, and the correlation between recombination frequency and similarity was then calculated for each dataset using a sliding window of 1000 nt, in the same way as for the original data. The value obtained for the real dataset (r²=0·69) was found to lie in the extreme 5 % of the distribution of the 1000 maximized coefficients of determination, indicating that there is a statistically significant association between the frequency of recombination breakpoints and sequence similarity (Fig. 4).
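Locating an observed statistic within a simulated null distribution can be sketched in a few lines. The function name and the add-one correction (a common convention that avoids reporting a P value of exactly zero) are assumptions of this example, not details given in the text.

```python
def empirical_p_value(observed, null_values):
    """One-sided empirical P value: the share of simulated statistics that
    are at least as large as the observed one, with an add-one correction."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

Here `null_values` would hold one maximized r² per simulated dataset, and a result at or below 0.05 would place the observed value in the extreme 5 % of the null distribution.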
DISCUSSION
The identification of recombination hot spots may be important for the development of retrovirus constructs used as reagents for HIV-1 vaccine development (Kuiken et al., 2000). Additionally, hot spots are the most appropriate regions for detecting HIV-1 intersubtype recombinants and thus facilitate HIV-1 molecular epidemiological studies based on the analysis of partial genomic regions. Furthermore, in genomic regions where recombination occurs less frequently, the assumption that virus divergence is dominated by point mutations is not violated, which makes these regions the most appropriate for phylogenetic analyses and molecular clock calculations. The existence of hot spots within different genomic regions could possibly be explained by structural features of the RT, which causes template jumping, or by functional constraints between genomic regions belonging to diverse HIV-1 subtypes (Gao et al., 1996; Paraskevis et al., 2000; Worobey & Holmes, 1999).
On the other hand, our analysis indicates that sequence similarity is of vital importance for intersubtype recombination events, as has been suggested previously (Worobey & Holmes, 1999). It is of particular interest that no recombination breakpoints were observed in the C2V3 region of env, which is the most variable region of the HIV-1 genome (Kuiken et al., 2000). Furthermore, the importance of sequence similarity for the efficiency of recombination has been demonstrated previously in vitro, where the rate of recombination between two non-homologous RNAs was 100 to 1000 times lower than that between two entirely homologous sequences (Zhang & Temin, 1993). Thus, in accordance with previous findings about the correlation between sequence similarity and recombination, we provide direct evidence based on data observed in vivo: first, that sequence similarity drives template jumping in retrovirus homologous recombination and, second, that sequence dissimilarity may be the predominant constraint on recombination events between quite divergent strains. According to our analysis, there might be a similarity threshold for template switching and recombination. In the case of HIV-1 intersubtype recombination, where there is considerable divergence between different lineages, recombination occurs more frequently in the most conserved regions, such as pol. These findings are in accordance with recent findings from in vitro studies (Iglesias-Sanchez & Lopez-Galindez, 2002).
Although different factors such as selection, protein functionality and epidemiological circumstances may have played a role in the successful spread of the HIV-1 intersubtype recombinants detected until now, we provide evidence that sequence similarity is a major constraint on the generation of HIV-1 recombinants. This means that the association between sequence similarity and recombination may affect the formation of mosaic strains before other forces, such as selection or virulence, play their role in the spread of newly generated recombinants.
On the other hand, the expected profile of recombination frequency would be quite different if evolutionary selection were the main driving force: recombination should be more frequent adjacent to divergent regions, such as env, and less frequent in conserved regions, such as the accessory genes and pol/RT (see supplementary Fig. 2, available online at http://vir.sgmjournals.org), resulting in a negative correlation between sequence similarity and recombination frequency. While point mutations arise randomly because of RT errors, they are strongly selected by evolutionary forces, resulting in more conserved regions, such as pol, and less conserved regions, such as env. Surprisingly, recombination does not seem to be affected by evolution in the same way, and the observed pattern must be shaped by another, less obvious, underlying mechanism.
Moreover, our analysis revealed a newly identified phase difference of approximately 650 nt between sequence similarity and recombination frequency. According to the models proposed for retrovirus recombination, hybridization between the nascent DNA and the acceptor molecule is required for priming an efficient continuation of DNA synthesis (Hu & Temin, 1990b; Negroni & Buc, 2001). Thus, if sequence similarity is important in this process, it will appear as a phase shift with respect to the recombination breakpoint: downstream if recombination occurs during first-strand cDNA synthesis and upstream if it occurs during second (or plus)-strand synthesis (Fig. 5). Our observations thus lend more support to the strand displacement-assimilation model operating in vivo.
Previous in vitro experiments documented that retrovirus recombination may occur during both plus- and minus-strand DNA synthesis (Hu & Temin, 1990a, b; Junghans et al., 1982), although it is widely believed to occur mainly during minus-strand DNA synthesis through the copy-choice model (Jetzt et al., 2000; Negroni & Buc, 2001). It is interesting that, since the first observation of the DNA H structures by electron microscopy and the formulation of the strand displacement-assimilation model, no other evidence has supported this model as predominant in any other retrovirus (Negroni & Buc, 2001). In contrast, our analysis provides evidence that strand displacement-assimilation might play a significant role in intersubtype recombination, although further study is required to assess the contribution of this model to the generation of HIV-1 intersubtype recombinants.
Recombination has a strong influence on retrovirus evolution, and it has been estimated that, for HIV-1, recombination occurs on average more frequently than mutations accumulate (Jetzt et al., 2000; Negroni & Buc, 2001). The evolutionary advantages of homologous recombination are well established (Burke, 1997). On closer analysis, recombination emerges as a primitive form of sexual reproduction, a type of amphigony (Temin, 1991). It may therefore serve as an emergency exit from the accumulation of deleterious mutations known as Muller's ratchet (Chao, 1990). Furthermore, it makes possible broad evolutionary jumps between fitness peaks, allows more efficient exploration of sequence space and increases genetic diversity.
In brief, our findings indicate that there are intersubtype recombination hot spots across the HIV-1 genome and that sequence similarity plays a pivotal role in recombination events between different subtypes. Sufficient sequence similarity appears to be required upstream of the recombination breakpoint, which suggests that the strand displacement-assimilation model might be the dominant model for HIV-1 intersubtype recombination.
REFERENCES
Boone, L. R. & Skalka, A. M. (1981b). Viral DNA synthesized in vitro by avian retrovirus particles permeabilized with melittin. II. Evidence for a strand displacement mechanism in plus-strand synthesis. J Virol 37, 117–126.
Burke, D. S. (1997). Recombination in HIV: an important viral evolutionary strategy. Emerg Infect Dis 3, 253–259.
Chao, L. (1990). Fitness of RNA virus decreased by Muller's ratchet. Nature 348, 454–455.
Clavel, F., Hoggan, M. D., Willey, R. L., Strebel, K., Martin, M. A. & Repaske, R. (1989). Genetic recombination of human immunodeficiency virus. J Virol 63, 1455–1459.
Coffin, J. M. (1979). Structure, replication, and recombination of retrovirus genomes: some unifying hypotheses. J Gen Virol 42, 1–26.
Dougherty, J. P. & Temin, H. M. (1988). Determination of the rate of base-pair substitution and insertion mutations in retrovirus replication. J Virol 62, 2817–2822.
Felsenstein, J. (1993). PHYLIP: Phylogeny Inference Package. Department of Genetics, University of Washington, Seattle, WA, USA.
Gao, F., Robertson, D. L., Morrison, S. G. & 8 other authors (1996). The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. J Virol 70, 7013–7029.
Ho, D. D., Neumann, A. U., Perelson, A. S., Chen, W., Leonard, J. M. & Markowitz, M. (1995). Rapid turnover of plasma virions and CD4 lymphocytes in HIV-1 infection. Nature 373, 123–126.
Hu, W. S. & Temin, H. M. (1990a). Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination. Proc Natl Acad Sci U S A 87, 1556–1560.
Hu, W. S. & Temin, H. M. (1990b). Retroviral recombination and reverse transcription. Science 250, 1227–1233.
Iglesias-Sanchez, M. J. & Lopez-Galindez, C. (2002). Analysis, quantification, and evolutionary consequences of HIV-1 in vitro recombination. Virology 304, 392–402.
Jetzt, A. E., Yu, H., Klarmann, G. J., Ron, Y., Preston, B. D. & Dougherty, J. P. (2000). High rate of recombination throughout the human immunodeficiency virus type 1 genome. J Virol 74, 1234–1240.
Junghans, R. P., Boone, L. R. & Skalka, A. M. (1982). Retroviral DNA H structures: displacement-assimilation model of recombination. Cell 30, 53–62.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16, 111–120.
Kuiken, C., Foley, B., Hahn, B., Marx, P., McCutchan, F., Mellors, J. W., Mullins, J., Wolinsky, S. & Korber, B. (2000). HIV Sequence Compendium 2000. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA.
Negroni, M. & Buc, H. (2001). Mechanisms of retroviral recombination. Annu Rev Genet 35, 275–302.
Paraskevis, D., Magiorkinis, M., Paparizos, V., Pavlakis, G. N. & Hatzakis, A. (2000). Molecular characterization of a recombinant HIV type 1 isolate (A/G/E/?): unidentified regions may be derived from parental subtype E sequences. AIDS Res Hum Retroviruses 16, 845–855.
Posada, D. & Crandall, K. A. (1998). MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818.
Preston, B. D., Poiesz, B. J. & Loeb, L. A. (1988). Fidelity of HIV-1 reverse transcriptase. Science 242, 1168–1171.
Ray, S. C. (1998). SIMPLOT. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, USA.
Roberts, J. D., Bebenek, K. & Kunkel, T. A. (1988). The accuracy of reverse transcriptase from HIV-1. Science 242, 1171–1173.
Robertson, D. L., Hahn, B. H. & Sharp, P. M. (1995a). Recombination in AIDS viruses. J Mol Evol 40, 249–259.
Robertson, D. L., Sharp, P. M., McCutchan, F. E. & Hahn, B. H. (1995b). Recombination in HIV-1. Nature 374, 124–126.
Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–425.
Schmidt, H. A., Strimmer, K., Vingron, M. & von Haeseler, A. (2002). TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–504.
Sturges, H. (1926). The choice of a class-interval. JASA 21, 65–66.
Swofford, D. L. (1998). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), 4 edn. Sinauer Associates, Sunderland, MA, USA.
Temin, H. M. (1991). Sex and recombination in retroviruses. Trends Genet 7, 71–74.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Vogt, P. K. (1971). Genetically stable reassortment of markers during mixed infection with avian tumor viruses. Virology 46, 947–952.
Wain-Hobson, S. (1993). The fastest genome evolution ever described: HIV variation in situ. Curr Opin Genet Dev 3, 878–883.
Wong, P. K. & McCarter, J. A. (1973). Genetic studies of temperature-sensitive mutants of Moloney-murine leukemia virus. Virology 53, 319–326.
Worobey, M. & Holmes, E. C. (1999). Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80, 2535–2543.
Zhang, J. & Temin, H. M. (1993). Rate and mechanism of nonhomologous recombination during a single cycle of retroviral replication. Science 259, 234–238.
Received 20 February 2003; accepted 23 June 2003.