*Laboratoire Génome et Informatique, Université de Versailles Saint Quentin-en-Yvelines, Versailles, France
Laboratoire d'Informatique Fondamentale de Lille, Equipe Bioinformatique, Université des Sciences et Technologie de Lille, Villeneuve d'Ascq, France
Deutsches Krebsforschungszentrum Theoretische Bioinformatik (H0300) Im Neuenheimer Feld 280, Heidelberg, Germany
Institut de Génétique Humaine, Montpellier, France
Abstract
Introduction
In the present study, we tested these evolutionary hypotheses by focusing on the long terminal repeats (LTRs), and their flanking sequences, that bound the proviral DNA of two groups of human immunodeficiency viruses (HIV). The first group comprises 15 HIV-1's together with a chimpanzee simian immunodeficiency virus (SIV). The second comprises 9 HIV-2's together with a macaque SIV and a sooty mangabey SIV.

Following retrovirus integration into the host-cell genome, the double-stranded proviral DNA is flanked by two identical LTRs, and the 5' LTR serves as the binding site for transcription factors (reviewed in Ou and Gaynor 1995; Pereira et al. 2000). The HIV nef open reading frame partially overlaps the 3' LTR (fig. 1). The HIV LTRs are short enough to be compared visually. They have also been exhaustively sequenced and studied biologically, both because of their pathological importance in the worldwide AIDS epidemic and because they carry transcription control sites. Retrovirus LTRs are already known to contain multiplied motifs, some of which may correspond to experimentally determined regulatory elements (Frech, Brack-Werner, and Werner 1996). Several studies have shown that LTR structure and regulation are of particular importance for HIV expression (Gaynor 1992).

Here, we use the control sites conserved during evolution as starter homology blocks for a reliable multiple alignment of the 27 LTR sequences. We also list overrepresented words using a new calculation strategy (Klaerr-Blanchard, Chiapello, and Coward 2000) that is applicable to short sequences such as the LTRs and their coding and noncoding sectors. CTG is often found in the overlapping multiplied motifs described in HIV-1 LTRs (Seto, Brunck, and Bernstein 1989). Moreover, HIV-1 and HIV-2 sequences are the most strongly biased in favor of the trinucleotides overrepresented in the LTRs (Laprevotte et al. 1997). We search for putative short- and longer-range duplications/deletions in two ways: by comparing the sequences with shuffled sequences, and by scrutinizing the carefully assessed alignment of the 27 sequences sector by sector. The results agree with our previous hypotheses on the molecular evolution of retroviral nucleotide sequences by scrambled stepwise short- and longer-range duplications/deletions.
Materials and Methods
|
|
The Alignment Strategy
The alignment can be found on the web page http://genome.genetique.uvsq.fr/laprevotte/. Within each of the HIV-1 and HIV-2 groups, the sequences were closely related, so the alignments were easily constructed and the two consensuses easily deduced. The point was to align the HIV-1 and HIV-2 sequences together, since these sequences are assumed to share a common evolutionary progenitor. The alignment was constructed manually.

It was initially based on eight consensus elements (or groups of elements), that is, 18 positions highlighted by Frech, Brack-Werner, and Werner (1996), who studied common modular structures in primate lentiviral LTR sequences. Most of these core blocks make it possible to propose a reliable alignment of the corresponding and neighboring HIV-1 and HIV-2 sectors, provided that some local corrections are made. The PPT and the PBS were also highly significant core blocks, together with the 5' and 3' ends of the LTRs, respectively.

The rest of the alignment was built by recursively searching the intercalary sectors for perfectly matched segments at least three bases long. At each step, the new intercalary alignment was based on the longest perfect match between any pair of HIV-1 and HIV-2 sequences that was consistent with the prealigned bordering sectors. This match was assumed a priori to be the closest to the putative original sequence. Probable duplication/deletion events were also taken into account. In particular, when HIV-1's and HIV-2's carry unequal numbers of a repeated motif, gaps were inserted. Gaps were not treated explicitly; they remain as those parts of the sequences that did not belong to any of the aligned segments (Morgenstern, Dress, and Werner 1996). Eventually, the alignment was based on the nucleotides printed on the line labeled "common sectors." These nucleotides covered 643 positions (about 58%) of the alignment.

In the coding reading frame, the polypeptide alignment was constructed in the same way, based on the conserved amino acids and the corresponding codons. Alphabets of more than four letters were also used, both to assess and locally correct the nucleotide alignment and to increase the signal-to-noise ratio. One was the alphabet of the polypeptide alignment in the coding reading frame, as just mentioned. The others were those obtained by the 8- and 12-ranked N-block presentations of the whole of the sequences. The polypeptide and nucleotide alignments agreed well except at a few locations (see the web page). All of the aligned sequences were coded using both a 12-ranked and an 8-ranked N-block presentation, the latter being less stringent. The N-block presentation corroborated the homology blocks (see the web page) and also made local corrections of the alignment possible.
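The recursive anchoring step described above can be sketched for a single pair of sequences. This is a minimal illustration under stated assumptions, not the authors' procedure. In the paper, the alignment was built by hand across 27 sequences and checked against known control sites. Here, `anchored_alignment` and `render` are hypothetical helpers built on Python's standard `difflib.SequenceMatcher.find_longest_match`. In each still-unaligned interval, the sketch picks the longest exact match of at least three bases, then recurses on the left and right intercalary sectors so that anchors never cross. Unanchored stretches are only padded with '-', which loosely mirrors the idea that gaps are not treated explicitly.

```python
from difflib import SequenceMatcher


def anchored_alignment(a, b, min_len=3):
    """Return non-crossing anchors (i, j, size): exact matches of at least
    min_len bases, each chosen as the longest match inside the interval
    left unaligned by previously chosen anchors (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    anchors = []

    def recurse(a_lo, a_hi, b_lo, b_hi):
        if a_hi - a_lo < min_len or b_hi - b_lo < min_len:
            return
        m = matcher.find_longest_match(a_lo, a_hi, b_lo, b_hi)
        if m.size < min_len:
            return
        recurse(a_lo, m.a, b_lo, m.b)                    # left intercalary sector
        anchors.append((m.a, m.b, m.size))
        recurse(m.a + m.size, a_hi, m.b + m.size, b_hi)  # right intercalary sector

    recurse(0, len(a), 0, len(b))
    return anchors


def render(a, b, anchors):
    """Write both sequences in aligned form; unanchored stretches are
    left-justified and padded with '-' (an arbitrary display choice)."""
    top, bottom = [], []
    i = j = 0
    for ai, bj, size in anchors + [(len(a), len(b), 0)]:
        seg_a, seg_b = a[i:ai], b[j:bj]
        width = max(len(seg_a), len(seg_b))
        top.append(seg_a.ljust(width, "-"))
        bottom.append(seg_b.ljust(width, "-"))
        top.append(a[ai:ai + size])
        bottom.append(b[bj:bj + size])
        i, j = ai + size, bj + size
    return "".join(top), "".join(bottom)


if __name__ == "__main__":
    x, y = "AAATTTGGGCCC", "AAAGGGTTTCCC"
    for row in render(x, y, anchored_alignment(x, y)):
        print(row)
```

Greedy choice of the longest match in each interval corresponds to the a priori assumption stated above, that the longest perfect match is the closest to the putative original sequence. A real multi-sequence version would also have to reconcile anchors across all pairs.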
Results
|
In the coding sector (table 3), only CIVCG and RESIVSMM showed overrepresented oligonucleotides including CTG. In addition, two HIV-1's and six HIV-2's showed an overrepresented CCAG. The noncoding sector (table 4) appeared to be much more congruent with the retroviral signature. All of the sequences studied except K03456, M26727, RESIVSMM, and RESIMM251 showed at least one overrepresented oligonucleotide including CTG, and six of the nine HIV-2's showed at least one overrepresented word including CAG. For K03455, X01762, and M17449, only the entire sequences were studied because of a premature stop codon (tables 3–5).
Table 5 displays the tri- and tetranucleotides that were found to be overrepresented when a first-order Markov chain model was used. Only the sequence RESIVSMM had an overrepresented oligonucleotide congruent with the so-called retroviral signature (CTGG, in the entire sequence and in the coding sector). GGGA remained significantly overrepresented in the noncoding sectors of all of the HIV-1 sequences tested (except CIVCG) and of three HIV-2 sequences (L07625, M30895, and X61240). Most of these overrepresented GGGA's were clustered in the sectors aligned with CIVCG 397–458, where the repeated NF-κB and SP.1 sites are located (see the alignment on the web page). In the sequences with overrepresented GGGA's, the noncoding sectors were 341–547 bases long and contained between 7 and 10 occurrences of this word. Within these sectors, the zone clustering the NF-κB and SP.1 sites is only 59–73 bases long, yet it contained as many as four or five occurrences of GGGA (boxed by a thick line in the alignment).
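The contrast between the Bernoulli and first-order Markov results can be made concrete by computing expected word counts under both models. This is an assumption-laden sketch, not the calculation of Klaerr-Blanchard, Chiapello, and Coward (2000), and the function names are hypothetical. It uses the classical maximum-likelihood estimators. Under the first-order Markov model, a word's expectation is built from the observed dinucleotide counts. Because those counts are conserved, every dinucleotide is expected exactly as often as it is observed. As a result, a word whose excess is driven by doublet frequencies tends to lose its significance under this model.

```python
from collections import Counter


def count_overlapping(seq, word):
    """Observed number of (possibly overlapping) occurrences of word."""
    return sum(1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i))


def expected_bernoulli(seq, word):
    """Expected count if bases were independent with the observed composition."""
    n = len(seq)
    composition = Counter(seq)
    p = 1.0
    for base in word:
        p *= composition[base] / n
    return (n - len(word) + 1) * p


def expected_markov1(seq, word):
    """Classical first-order Markov estimate for a word w1...wk (k >= 2):
    N(w1w2) N(w2w3) ... N(w(k-1)wk) / (N(w2) ... N(w(k-1))),
    where N(x) counts x as the first letter of a dinucleotide."""
    if len(word) < 2:
        raise ValueError("word must have at least two letters")
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    first = Counter(seq[:-1])
    value = 1.0
    for i in range(len(word) - 1):
        value *= di[word[i:i + 2]]
    for base in word[1:-1]:
        if first[base] == 0:          # the numerator is then already 0
            return 0.0
        value /= first[base]
    return value


if __name__ == "__main__":
    s = "CTGGCTGACTGGAGCTGGTTCTGG"   # made-up sequence, for illustration only
    for w in ("CTG", "CTGG"):
        print(w, count_overlapping(s, w),
              round(expected_bernoulli(s, w), 2), round(expected_markov1(s, w), 2))
```

Observed counts well above the Bernoulli expectation but close to the Markov-1 expectation are the pattern that points to dinucleotide composition as the explanation.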
In figure 2, the numerical values defined for table 6 (column A) are displayed for three sets of 2,700 (27 × 100) shuffled sequences. For each distribution graph, each of the 27 starter sequences was shuffled 100 times (Materials and Methods). For the left graph, the shuffled sequences exactly conserved only the starter nucleotide compositions. For the middle graph, the dinucleotide counts were also conserved exactly, and for the right graph, the trinucleotide counts as well. The middle graph was shifted further from the left graph than the right graph was from the middle one. Most of the increase in the random values was therefore already accounted for by the first-order Markov chain model. Hence, the repeated sequences investigated in table 6 (column A) appear to be accounted for, to a large extent, by the dinucleotide compositions of the sequences.
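The shuffles behind figure 2 conserve exact mono-, di-, or trinucleotide counts. The reference list cites dedicated methods for this (Kandel et al. 1996; Coward 1999). The sketch below is an independent, hypothetical implementation of the standard Eulerian-path approach, not Shufflet's code. For k ≥ 2, the vertices are the (k-1)-lets and the edges are the k-lets of the sequence. The procedure draws a random "last exit" edge for each vertex so that these edges form a tree leading to the final vertex, then walks the remaining edges in shuffled order. This yields a new sequence with exactly the same k-let counts. `empirical_fraction` (also hypothetical) shows how an observed statistic could be compared with 100 shuffles per sequence.

```python
import random
from collections import Counter, defaultdict


def _reaches(u, end, last):
    """True if following the chosen last-exit edges from u leads to end."""
    seen = set()
    while u != end:
        if u in seen:
            return False
        seen.add(u)
        u = last[u]
    return True


def klet_shuffle(seq, k, rng=random):
    """Random sequence with exactly the same k-let counts as seq
    (k = 1: plain permutation; k >= 2: Eulerian walk on (k-1)-lets)."""
    if k == 1:
        letters = list(seq)
        rng.shuffle(letters)
        return "".join(letters)
    if len(seq) <= k:
        return seq
    out = defaultdict(list)              # (k-1)-let -> successor (k-1)-lets
    for i in range(len(seq) - k + 1):
        out[seq[i:i + k - 1]].append(seq[i + 1:i + k])
    start, end = seq[:k - 1], seq[-(k - 1):]

    # Draw a last-exit edge for every vertex except `end` until these edges
    # form a tree rooted at `end`; this guarantees the walk uses every edge.
    while True:
        last = {u: rng.choice(vs) for u, vs in out.items() if u != end}
        if all(_reaches(u, end, last) for u in last):
            break

    order = {}
    for u, vs in out.items():
        rest = list(vs)
        if u in last:
            rest.remove(last[u])         # set aside one copy of the last edge
        rng.shuffle(rest)
        order[u] = rest + ([last[u]] if u in last else [])

    used = Counter()
    cur, pieces = start, [start]
    for _ in range(len(seq) - k + 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        pieces.append(nxt[-1])
        cur = nxt
    return "".join(pieces)


def empirical_fraction(seq, statistic, k, n=100, rng=random):
    """Fraction of n k-let-preserving shuffles scoring at least as high as seq."""
    observed = statistic(seq)
    hits = sum(statistic(klet_shuffle(seq, k, rng)) >= observed for _ in range(n))
    return hits / n
```

With k = 2 the shuffles play the role of the "Markov-1" random sequences. For k = 3 on DNA the graph has at most 16 vertices, so the rejection loop stays cheap.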
|
Column J of table 6 accounts for significant overrepresentations of sectors more than 15 bases long that are made up of no more than two letters. Against the Bernoulli model, the HIV-1 sequences (except for K02013, K02007, and K03456) were marginally significant. Against a first-order Markov chain model, none of them except M17451 was significant. In the HIV-2 group, when a first-order Markov chain model was used, RESIVSMM, J04542, L07625, M30502, and M31113 remained significant, whereas J03654, J04498, and M15390 did not. The three other sequences in the group were not significant even against the Bernoulli model. For the sequences that remained significant against shuffled sequences conserving the exact starter dinucleotide counts (first-order Markov chain model), the numerical value was 3.6% or 3.7%. The exception was M17451 (2.3%), which retained only borderline significance. For the sequences that lost their significance or were never significant, the value ranged from 0% to 2.3%.
Column K of table 6 accounts for significant overrepresentations of sectors at least 30 bases long made up of no more than three letters. At most one exception base is allowed, and it is not included in the computation of the numerical value defined above. The HIV-1 sequences (except for M17449 and L20571) were significant even against a first-order Markov chain model. For the significant sequences, the numerical values ranged from 18.7% to 32%; for M17449 and L20571 they were 14% and 10.1%, respectively. Overall, the HIV-2 sequences (values from 0% to 17%) were not significant.
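Columns J and K rest on a coverage value for "monotonous" sectors. Its exact definition accompanies table 6, which is not reproduced here, so the following is only one plausible reading. `low_complexity_coverage` is a hypothetical helper. It marks every base lying in a window of at least `min_len` bases that uses at most `max_letters` distinct letters, allowing up to `max_exceptions` other bases, which are themselves not counted. It then returns the covered fraction. For each allowed letter set, a two-pointer scan finds the longest admissible window ending at each position. The union of those windows equals the union of all admissible sectors.

```python
from itertools import combinations


def low_complexity_coverage(seq, max_letters, min_len, max_exceptions=0, alphabet="ACGT"):
    """Fraction of seq covered by sectors of at least min_len bases using at
    most max_letters distinct letters, with up to max_exceptions other bases
    per sector; exception bases are not counted as covered (hypothetical)."""
    covered = [False] * len(seq)
    for letters in combinations(alphabet, max_letters):
        allowed = set(letters)
        left = 0
        exceptions = 0
        for right, base in enumerate(seq):
            if base not in allowed:
                exceptions += 1
            while exceptions > max_exceptions:      # shrink window from the left
                if seq[left] not in allowed:
                    exceptions -= 1
                left += 1
            if right - left + 1 >= min_len:         # longest window ending at right
                for i in range(left, right + 1):
                    if seq[i] in allowed:
                        covered[i] = True
    return sum(covered) / len(seq) if seq else 0.0
```

Under this reading, column J would correspond to `max_letters=2, min_len=16, max_exceptions=0` and column K to `max_letters=3, min_len=30, max_exceptions=1`. Significance would then come from comparing the value with the same statistic computed on shuffled sequences.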
The Reliability of the Alignment of the 27 Nucleotide Sequences
A reliable alignment is an essential tool in the present work. The accurate alignment of previously identified benchmarks supported its reliability for testing local molecular evolution hypotheses. So did its congruence with the polypeptide alignment and with the N-block presentation coding of the sequences (Materials and Methods and the alignment on the web page). Three available multiple-sequence alignment programs were tested (table 7) against the benchmarks found in both the HIV-1 and HIV-2 groups, in order to select the algorithm best suited to aligning the sequences studied in this work. Clustal-X (Thompson, Plewniak, and Poch 1999) is a progressive alignment method that compares individual residues using a Needleman-Wunsch-based algorithm (Needleman and Wunsch 1970) with gap penalties. Mabios (Abdeddaïm 1997) and Dialign (Morgenstern, Dress, and Werner 1996) instead calculate homology blocks. They then choose the best combinations of these blocks as the benchmarks on which the rest of the alignment is constructed.

At first, no two programs produced the same alignment. Clustal-X produced a total misalignment downstream of the HIV-1 deletion zone following the "TAR Common Sector" (CIVCG-519), and it also aligned the deleted sequence J03654 oddly (data not shown). Between them, Mabios and Dialign aligned all of the benchmarks except the R-U5 junction. Five of these benchmarks were aligned more accurately by Mabios, with Dialign constructing a less accurate or only partial alignment. However, Dialign was the only program that aligned all but one of the indicated benchmarks, in particular the highly conserved polyadenylation site (Poly (A)). It also highlighted the duplication of the NF-κB site in the HIV-1 group, by inserting a gap in the HIV-2 sequences in front of one of the two NF-κB copies. Hence, for the present alignment, Dialign appeared to be the most reliable of the programs tested.

Dialign was further tested on two sectors where the alignment was difficult to construct even manually: the set of sequences aligned with CIVCG 328–463 and the set aligned with CIVCG 558–609 (see the web page). The nucleotides of the Dialign alignment in these sectors were coded by the 8-ranked N-block presentation, with HIV-1 and HIV-2 matching letters highlighted in red as on the web page (data not shown). These highlighted letters were less numerous than in the corresponding manually aligned sectors. This suggests that an alignment constructed with such a program must at least be refined visually.
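For readers unfamiliar with the Needleman-Wunsch algorithm on which Clustal-X's pairwise comparison is based, here is a minimal global alignment with a linear gap penalty. It illustrates only the dynamic-programming principle. Clustal-X itself adds guide-tree progressive alignment, gap opening and extension penalties, residue-specific gap adjustments, and sequence weighting, none of which is modeled here. The scoring values are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back one optimal path from the bottom-right corner.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

Every column is scored residue by residue. This differs from the segment-based strategy of Mabios and Dialign, which first looks for conserved blocks.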
Discussion
By no means can biological mechanisms be deduced from the correlation of overrepresented words and significant local repetitions with the dinucleotide compositions of the corresponding sequences. Two hypotheses cannot be distinguished. Under the first, the doublet frequencies, whatever their cause, account for the words found to be overrepresented or clustered when compared with a Bernoulli model. Under the second, a large number of oligonucleotide duplications (such as AG and CT; tables 2–4) bias the dinucleotide composition of the sequence. Those duplications would then account for the nonsignificance of many repetitions when tested against a first-order Markov chain model. Such duplications could favor particular nucleotide motifs for biochemical reasons or because of tandemly repeated starter sequences.

Previous studies of complete retroviral sequences (Laprevotte 1992; Laprevotte et al. 1997) strengthen the second hypothesis. They demonstrated that, for most of the overrepeated oligonucleotides, the observed frequency is not merely a consequence of the dinucleotide distribution. Many overrepresented oligonucleotides remained significant against a first-order Markov chain model. Moreover, the correlation between the dinucleotide distribution in the subset of overrepresented oligonucleotides and that of the whole sequence was variable: high, weak, or even null. RESIVSMM is assumed to be the evolutionary progenitor of the HIV-2's (Gao et al. 1999), and in it CTGG is overrepresented even when a first-order Markov chain model is used (table 5). This fits the same hypothesis.

Many of the putative locally repeated sectors also remain significant even against a first-order Markov chain model (table 6). They provide examples of probable duplications that are clearly not accounted for by the dinucleotide composition of the sequence: tandem repeats, local repetitions, clusters of oligonucleotides, and "monotonous" sectors made up of no more than two or three letters. Columns F and K of table 6 show many repetitions and "monotonous" sectors that may cover up to about 30% of the sequence and that are significant even against "Markov-1" random sequences. In these cases, the percentage is overestimated by about 10%–15%. Moreover, about one third of the alignment consists of sectors boxed by a thick line in at least one sequence or in one HIV-1/HIV-2 consensus (see the web page). As discussed below, these sectors suggest local repetition events. Hence, the dinucleotide compositions cannot account for all of the listed repetitive patterns, and these patterns cover a large portion of the sequences. The results for the HIV-1 and HIV-2 groups differ (tables 2–6), which suggests distinct mono- or oligonucleotide duplication/deletion scenarios that may have occurred since the two groups diverged. This led us to search the reliable alignment of the sequences for patterns that are both statistically significant and evolutionarily suggestive.
Differentiated sectors can be delineated in the alignment (see the web page) according to the degree of homology between aligned HIV-1 and HIV-2 sectors. Two landmarks are highly conserved and highlighted by the 12-ranked N-block presentation. The 5' landmark is formed by the PPT together with the 5' end of the LTR (CIVCG 10–37); the 3' landmark is formed by the PBS (CIVCG 682–705). Six other homology blocks are highlighted in the same way. Two of them lie in the noncoding part of the sequence: the NF-κB site (CIVCG 397–407 and 409–418) and the polyadenylation signal (CIVCG 570–581). The other four (CIVCG 105–136, 168–181, 215–229, and 284–296) align with conserved sectors of the Nef polypeptide sequence.
Most of the coding sectors (up to and including position CIVCG-346) are to be distinguished from the rest of the alignment: all but two of the aligned sectors have about the same length (337–346 bases). Except for M17449, which exhibits a premature stop codon, the lengths are equal or differ, as expected, by multiples of three. The L20571 sequence is longer (349 bases). A scan of the colored alignment corroborates that L20571 is a divergent isolate within the HIV-1 group (Gurtler et al. 1994).

In the HIV-2 group, J03654 (Zagury et al. 1988) carries a deletion between positions CIVCG-88 and CIVCG-317 (excluded). This could be accounted for by two successive deletion events. Consider the HIV-2 consensus between positions CIVCG-79 and CIVCG-94, and assume a jump of the reverse transcriptase (Katz and Skalka 1990; Zhang and Temin 1994) from the first aga to the second. The sequence then becomes TATACTTAGAAGG. Eleven of the 13 letters of this motif match the HIV-2 consensus between positions CIVCG-309 and CIVCG-321 (TATARYTACAAGG), suggesting a second jump between the two motifs.

Furthermore, even though the lengths of most of the coding sectors are conserved, gaps have to be inserted in the sequences in order to align the homology blocks, suggesting a number of duplications/deletions. For instance, for the sectors aligned with CIVCG from position 39 to position 58, four demonstrative sequences lead to the proposal of a suggestive alignment:
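The 11-of-13 match quoted above for the second jump can be checked mechanically with the IUPAC ambiguity codes (R = A or G, Y = C or T). The sketch also includes a hypothetical `template_jump` helper. It models a single reverse-transcriptase jump between two copies of a motif as the loss of one copy plus the intervening sequence. The CIVCG-79 to CIVCG-94 consensus itself is not given in this text, so the jump is demonstrated on a made-up sequence.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_matches(seq, consensus):
    """Count positions where a base of seq is compatible with an IUPAC consensus."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(seq.upper(), consensus.upper()))


def template_jump(seq, motif):
    """Hypothetical copy-choice deletion: the copy jumps from the first
    occurrence of motif to the next non-overlapping one, so one motif copy
    and the sequence between the two copies are lost."""
    first = seq.find(motif)
    if first < 0:
        return seq
    second = seq.find(motif, first + len(motif))
    if second < 0:
        return seq
    return seq[:first] + seq[second:]


if __name__ == "__main__":
    print(iupac_matches("TATACTTAGAAGG", "TATARYTACAAGG"), "of 13")
    print(template_jump("TTAGACCCAGAGG", "AGA"))
```

The two mismatches fall at the R and the C of the consensus (positions 5 and 9). Every other position, including the Y, is compatible.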
|
From position CIVCG-347 downward, most of the aligned sequences are noncoding. Consequently, their lengths do not necessarily differ by multiples of three. They are much more divergent between the HIV-1 and HIV-2 groups and even, within the HIV-2 group, between the two SIVs and the HIV-2 sequences. Duplication/deletion events appear to have been much less constrained during evolution here than in the coding parts. In this respect, several sectors deserve scrutiny.
The alignment between positions CIVCG-348 and CIVCG-386 can be accounted for by stepwise duplications/deletions (see the web page). HIV-1 clones have been described (Estable et al. 1996) in which the empty HIV-1 sectors are occupied by the so-called "most frequent naturally occurring length polymorphism" (MFLNP on the web page). This polymorphism has varying lengths and appears, more or less clearly, to contain repeated sectors. Here, the aligned sectors in the HIV-2 group do not appear to carry deletions.
Between CIVCG-394 and CIVCG-413 (excluded), the HIV-2 sequences (SIVs excluded) exhibit two imperfectly repeated sectors that could be the remnant of a duplication event. The HIV-2 sequences L07625 and X61240 are to be distinguished from the other seven (Kreutz et al. 1992; Barnett et al. 1993), as they differ at numerous locations all along the alignment. In each of them, the two homologous sectors (boxed by a thick line in the alignment) extend from position L07625-436 to position L07625-460 and from position L07625-461 to position L07625-484, respectively. These sectors do not coincide with those of the other HIV-2 sequences:
|
|
Column K of table 6 shows that the HIV-1 sequences (except for M17449 and L20571) include overrepresented sectors at least 30 bases long made up of no more than three letters (with at most one exception base). Such a sector is found in these sequences between positions K02013-462 and K02013-494; the corresponding sector is boxed by a thick line at the HIV-1 consensus (see the web page). In this sector, as well as upstream and downstream of it, the HIV-1 group shows clusters of CT's, CTG's, and CTGG's (table 6, columns F, G, H, and I). These words are boxed by a thick line in the alignment whenever the corresponding pattern is overrepresented against a first-order Markov chain model in the sequence taken as a whole (table 6). All of these words are scattered in a region that could be accounted for, at least partly, by stepwise duplications/deletions of mono- or oligonucleotides taken from tandemly repeated CTG's.
The aligned sectors extending from the 5' end of the R region (CIVCG-501, the beginning of the viral RNA; see above) to the positions aligned with CIVCG-565 correspond to the TAR region. This region has been extensively studied for its biological role and for the stable stem-loop structure formed by TAR RNA (reviewed in Ou and Gaynor 1995; Rabson and Graves 1997). The HIV-1 TAR RNA contains both a loop and a bulge structure that are critical for Tat-mediated activation. The HIV-2 TAR RNA can form a complex structure consisting of two discrete stem-loop regions. Possible evolutionary routes from simple one-hairpin to complex branched TAR structures have been discussed in the literature. The portion by which the HIV-2 TAR extends beyond the HIV-1 TAR is most similar to a human immunoglobulin pseudogene sequence, suggesting (see above) that this subsequence is a captured element (reviewed in Myers 1997).

In the alignment, the sector referred to as "TAR Common Sector" is conserved between HIV-1 and HIV-2. It corresponds to the upper portion of the HIV-1 stem-loop (the bulge-and-loop zone) and to the 5' HIV-2 stem-loop region. The HIV-2 consensus contains two successive sectors boxed by a thick line in the alignment on the web page, the first of which includes the TAR Common Sector. Apart from a few bases, they correspond to the two discrete stem-loop regions of HIV-2 TAR RNA:
As a whole, the results discussed above fit the molecular-evolution model hypothesized previously (Laprevotte 1989, 1992; Laprevotte et al. 1997). First, overrepresented oligonucleotides are scattered throughout the entire range of the retroviral sequences. Second, they share complementary core consensuses that fit the rule of a trend toward TG/CT excess (Ohno and Yomo 1990). These consensuses also suggest tandemly repeated starter oligonucleotides, that is, short tandem repeats giving rise to longer oligonucleotide repeats, as hypothesized previously (Southern 1972; Ohno 1988). Third, they are mixed with scrambled short-scale repetitions, deletions/duplications, tandem repeats, and cryptic simplicity patterns. Together, these features suggest molecular evolution by scrambled stepwise short- and longer-range duplications/deletions, in addition to nucleotide miscopying.
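The TG/CT excess and TA/CG deficiency invoked above are commonly quantified with a dinucleotide odds ratio, rho(XY) = f(XY) / (f(X) f(Y)). It compares each doublet frequency with the product of its mononucleotide frequencies. The function below is a minimal single-strand sketch of that measure; its name is hypothetical, and it is not a statistic used in this paper. Values clearly above 1 indicate an excess, and values clearly below 1 a deficiency.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq, alphabet="ACGT"):
    """rho(XY) = f(XY) / (f(X) f(Y)) for every dinucleotide XY whose letters
    both occur in seq; single strand only, no reverse-complement pooling."""
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in alphabet:
        for y in alphabet:
            fx, fy = mono[x] / n, mono[y] / n
            if fx and fy:
                ratios[x + y] = (di[x + y] / (n - 1)) / (fx * fy)
    return ratios


if __name__ == "__main__":
    r = dinucleotide_odds_ratios("CTGGCTGACTGGAGCTGGTTCTGG")  # made-up sequence
    print({k: round(v, 2) for k, v in r.items() if k in ("TG", "CT", "CG", "TA")})
```

Because the ratio is conditioned on base composition, it separates a genuine doublet bias from a simple excess of the individual bases.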
Even though this model gives a good account of the repetitive aspects of retroviral nucleotide sequences, other evolutionary processes may be considered. These include gene conversion, which leads to homogeneity throughout DNA sequences (see discussion in Laprevotte 1989), and convergent evolution toward repeated motifs serving useful functions (Laprevotte et al. 1997). This also raises the question of possible selective pressures maintaining the repeats.
Conclusions
Sector by sector, we hypothesize a large number of local duplication/deletion scenarios that span a great portion of the alignment and could account for length divergences between the HIV-1 and HIV-2 groups. Consequently, base substitutions are by no means the only evolutionary process to take into account when comparing such sequences and analyzing their phylogeny. Altogether, our results support our previous hypotheses on the molecular evolution of retroviral nucleotide sequences: a large portion of the sequences can be accounted for by scrambled stepwise short- and longer-range duplications/deletions. An emerging hypothesis is that reverse transcriptase plays an important role in duplications/deletions. In addition to already-proposed scenarios, it could generate perfect or stuttering tandem repeats and, in turn, a cryptic simplicity of the sequence. The consensus overrepresented motifs and the numerous cryptic simplicity sectors observed suggest one or several tandemly repeated short starter motif(s). Additional comparisons of decreasingly homologous sequences, using a fast and reliable alignment method, could further unravel these evolutionary patterns.
A reliable and accurate alignment of the compared sequences is an essential tool for a high-resolution molecular evolution study. The nucleotide alignment was assessed against already-identified benchmarks, against the polypeptide alignment, and against the N-block presentation coding of the sequences, which allows us to consider it reliable. The enlarged alphabet obtained by the mathematical strategy called N-block presentation appears to be a promising way to increase the signal-to-noise ratio in nucleotide alignment studies.
It is well known that in eukaryotic cells, reverse transcription is not restricted to parasitic retroviruses. A diverse set of genes, referred to as retrotranscripts, have derived from their normal progenitor genes via an mRNA intermediate (Boeke and Stoye 1997). These elements, together with retroviruses and retrotransposons, are a source of genomic variation. So may be the increasing number of human endogenous retrovirus sequences that have been identified (Kjellman, Sjogren, and Widegren 1999). The endogenous IAP particles of mice may also contribute to the generation of genetic diversity in this host population. Furthermore, it has been hypothesized that if the prebiotic genetic material was RNA, reverse transcription might have been required to formulate DNA-based genetic information (Katz and Skalka 1990). Taken together, these data and others suggest that further investigation of reverse transcription could shed light on aspects of eukaryotic genome evolution, beyond the biology of retroviruses alone.
Supplementary Material
Footnotes
1 Keywords: HIV-1 and HIV-2 LTR nucleotide sequences; multiple alignment; N-block presentation; "retroviral signatures" of overrepeated oligonucleotides; scrambled stepwise duplications/deletions; cryptic simplicity.
2 Address for correspondence and reprints: Ivan Laprevotte, Laboratoire Génome et Informatique, Université de Versailles Saint Quentin-en-Yvelines, 45 avenue des Etats-Unis, 78035 Versailles cedex, France. E-mail: laprevotte{at}genetique.uvsq.fr
References
Abdeddaïm S., 1997 Incremental computation of transitive closure and greedy alignment Lect. Notes Comput. Sci 1264:167-179
Antezana M. A., M. Kreitman, 1999 The nonrandom location of synonymous codons suggests that reading frame-independent forces have patterned codon preferences J. Mol. Evol 49:36-43
Averof M., A. Rokas, K. H. Wolfe, P. M. Sharp, 2000 Evidence for a high frequency of simultaneous double-nucleotide substitutions Science 287:1283-1286
Barnett S. W., M. Quirogu, A. Werner, D. Dina, J. A. Levy, 1993 Distinguishing features of an infectious molecular clone of the highly divergent and noncytopathic human immunodeficiency virus type 2 UC1 strain J. Virol 67:1006-1014
Beasty A. M., M. J. Behe, 1988 An oligopurine sequence bias occurs in eukaryotic viruses Nucleic Acids Res 16:1517-1528
Berkhout B., 1996 Structure and function of the human immunodeficiency virus Prog. Nucleic Acid Res. Mol. Biol 54:1-34
Boeke J. D., J. P. Stoye, 1997 Retrotransposons, endogenous retroviruses, and the evolution of the retroelements Pp. 343-435 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, New York
Chaboissier M. C., D. Finnegan, A. Bucheton, 2000 Retrotransposition of the I factor, a non-long terminal repeat retrotransposon of Drosophila, generates tandem repeats at the 3' end Nucleic Acids Res 28:2467-2472
Coward E., 1998 Mathematical methods for repeated patterns in biological sequences Dr.Ing. thesis, Norwegian University of Science and Technology, Trondheim, Norway
Coward E., 1999 Shufflet: shuffling sequences while conserving the k-let counts Bioinformatics 15:1058-1059
Devereux J., 1989 The GCG sequence analysis software package Version 6.0. Genetics Computer Group, Madison, Wis
Didier G., 1999 Caractérisation des N-écritures et application à l'étude des suites de complexité ultimement n + cste Theor. Comput. Sci 215:31-49
Estable M. C., B. Bell, A. Merzouki, J. S. G. Montaner, M. V. O'Shaughnessy, I. J. Sadowski, 1996 Human immunodeficiency virus type 1 long terminal repeat variants from 42 patients representing all stages of infection display a wide range of sequence polymorphism and transcription activity J. Virol 70:4053-4062
Frech K., R. Brack-Werner, T. Werner, 1996 Common modular structure of lentivirus LTRs Virology 224:256-267
Gao F., E. Bailes, L. Robertson, et al. (12 co-authors) 1999 Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436-441
Gaynor R., 1992 Cellular transcription factors involved in the regulation of HIV-1 gene expression AIDS 6:347-363
Gurtler L. G., P. H. Hauser, J. Eberli, A. von Brunn, S. Knapp, L. Zekeng, J. M. Tsague, L. Kaptue, 1994 A new subtype of human immunodeficiency virus type 1 (MVP-5180) from Cameroon J. Virol 68:1581-1585
Kandel D., Y. Matias, R. Unger, P. Winkler, 1996 Shuffling biological sequences Discrete Appl. Math 71:171-185
Katz R. A., A. M. Skalka, 1990 Generation of diversity in retroviruses Annu. Rev. Genet 24:409-445
Kjellman C., H. O. Sjogren, B. Widegren, 1999 HERV-F, a new group of human endogenous retrovirus sequences J. Gen. Virol 80:2383-2392
Klaerr-Blanchard M., H. Chiapello, E. Coward, 2000 Detecting localized repeats in genomic sequences: a new strategy and its application to B. subtilis and A. thaliana sequences Comput. Chem 24:57-70
Kreutz R., U. Dietrich, H. Kühnel, K. Nieselt-Struwe, M. Eigen, H. Rübsamen-Waigmann, 1992 Analysis of the envelope region of the highly divergent HIV-2 ALT isolate extends the known range of variability within the primate immunodeficiency viruses AIDS Res. Hum. Retroviruses 8:1619-1629
Laprevotte I., 1989 Scrambled duplications in the feline leukemia virus gag gene: a putative pattern for molecular evolution J. Mol. Evol 29:135-148
Laprevotte I., 1992 Mo-MuLV nucleotide sequence exhibits three levels of oligomeric repetitions, suggesting a stepwise molecular evolution J. Mol. Evol 35:420-428
Laprevotte I., S. Brouillet, C. Terzian, A. Hénaut, 1997 Retroviral oligonucleotide distributions correlate with biased nucleotide compositions of retrovirus sequences, suggesting a duplicative stepwise molecular evolution J. Mol. Evol 44:214-225
Laprevotte I., A. Hampe, C. J. Sherr, F. Galibert, 1984 Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus J. Virol 50:884-894
Malik H. S., W. D. Burke, T. H. Eickbush, 2000 Putative telomerase catalytic subunits from Giardia lamblia and Caenorhabditis elegans. Gene 251:101-108
Morgenstern B., A. Dress, T. Werner, 1996 Multiple DNA and protein sequence alignment based on segment-to-segment comparison Proc. Natl. Acad. Sci. USA 93:12098-12103
Myers G., 1997 Retroviral sequences Pp. 709-755 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, New York
Needleman S. B., C. D. Wunsch, 1970 A general method applicable to the search for similarities in the amino acid sequence of two proteins J. Mol. Biol 48:443-453
Ohno S., 1988 Codon preference is but an illusion created by the construction principle of coding sequences Proc. Natl. Acad. Sci. USA 85:4378-4382
Ohno S., T. Yomo, 1990 Various regulatory sequences are deprived of their uniqueness by the universal rule of TA/CG deficiency and TG/CT excess Proc. Natl. Acad. Sci. USA 87:1218-1222
Ou S.-H. I., R. B. Gaynor, 1995 Intracellular factors involved in gene expression of human retroviruses Pp. 97-184 in J. Levy, ed. The Retroviridae. Vol. 4. Plenum Press, New York and London
Pereira L. A., K. Bentley, A. Peeters, M. J. Churchill, N. J. Deacon, 2000 A compilation of cellular transcription factor interactions with the HIV-1 LTR promoter Nucleic Acids Res 28:663-668
Peterlin B. M., 1995 Molecular biology of HIV Pp. 185-238 in J. Levy, ed. The Retroviridae. Vol. 4. Plenum Press, New York and London
Rabson A. B., B. J. Graves, 1997 Synthesis and processing of viral RNA Pp. 205-261 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, New York
Seto M. H., T. K. Brunck, R. L. Bernstein, 1989 Overlapping redundant sextuplets identical with regulatory elements of HIV-1 and SV40 Nucleic Acids Res 17:2783-2800
Smith T. F., M. S. Waterman, 1981 Identification of common molecular subsequences J. Mol. Biol 147:195-197
Southern E., 1972 Repetitive DNA in mammals Pp. 19-27 in R. A. Pfeiffer, ed. Modern aspects of cytogenetics: constitutive heterochromatin in man. Symposia Medica Hoechst No. 6. Schattauer Verlag, Stuttgart, Germany
Tautz D., M. Trick, G. A. Dover, 1986 Cryptic simplicity in DNA is a major source of genetic variation Nature 322:652-656
Terzian C., I. Laprevotte, S. Brouillet, A. Hénaut, 1997 Genomic signatures: tracing the origin of retroelements at the nucleotide level Genetica 100:271-279
Thompson J. D., F. Plewniak, O. Poch, 1999 A comprehensive comparison of multiple sequence alignment programs Nucleic Acids Res 27:2682-2690
Treier M., C. Pfeifle, D. Tautz, 1989 Comparison of the gap segmentation gene hunchback between Drosophila melanogaster and Drosophila virilis reveals novel modes of evolutionary change EMBO J 8:1517-1525
Vartanian J.-P., A. Meyerhans, B. Asjo, S. Wain-Hobson, 1991 Selection, recombination, and GA hypermutation of human immunodeficiency virus type 1 genomes J. Virol 65:1779-1788
Vogt P. K., 1997 Historical introduction to the general properties of retroviruses Pp. 1-25 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, New York
Wu W., B. M. Blumberg, P. J. Fay, R. A. Bambara, 1995 Strand transfer mediated by immunodeficiency virus reverse transcriptase in vitro is promoted by pausing and results in misincorporation J. Biol. Chem 270:325-332
Zagury J. F., G. Franchini, M. Reitz, et al. (15 co-authors) 1988 Genetic variability between isolates of human immunodeficiency virus (HIV) type 2 is comparable to the variability among HIV type 1 Proc. Natl. Acad. Sci. USA 85:5941-5945
Zhang H., H. M. Temin, 1994 Retrovirus recombination depends on the length of sequence identity and is not error prone J. Virol 68:2409-2414