Unusually long target site duplications flanking some of the long terminal repeats of human endogenous retrovirus K in the human genome

Ilgar Z. Mamedov, Yuri B. Lebedev and Eugene D. Sverdlov

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10 Miklukho-Maklaya Street, 117997 Moscow, Russia

Correspondence
Ilgar Z. Mamedov
Imam{at}humgen.siobc.ras.ru


   ABSTRACT
Human endogenous retroviruses (HERVs) make up a substantial part of the human genome. HERVs and solitary long terminal repeats (solo LTRs) are usually flanked by short direct repeats of 4–6 nt, which are generated by the well-known mechanism of their integration. A number of solo LTRs flanked by unusually long direct repeats were detected in the human genome. These unusual structures might be the product of an alternative virus insertion mechanism.

Supplementary data associated with this article are available at http://humgen.siobc.ras.ru/supplement/suppl.html.


   MAIN TEXT
Human endogenous retroviruses (HERVs) and related sequences comprise 8 % of the human genome (International Human Genome Sequencing Consortium, 2001). They are subdivided into different families and subfamilies and are represented by either almost full-length proviruses harbouring the main retroviral genes (gag, pol and env) or solitary long terminal repeats (solo LTRs), which are presumably the products of recombination between identical 5' and 3' LTRs of intact proviruses (Wilkinson et al., 1994). HERVs are assumed to be footprints of ancient infections of germ line cells by retroviruses during evolution of the primate lineage, amplified in the genome by a retrotransposition mechanism (Sverdlov, 1998). Due to a variety of regulatory elements present in their structures, including promoters, enhancers, hormone-responsive elements and polyadenylation signals, HERVs might cause significant and evolutionarily important changes in expression patterns of neighbouring genes (Sverdlov, 2000). The HERV-K family is probably the most functionally and transpositionally active group (Lower et al., 1996; Medstrand et al., 2002), represented by nearly 60 full-length proviruses and 1000–2000 solo LTRs in the human genome (Mager & Medstrand, 2002).

In the host genome, almost all HERVs and solo LTRs are flanked by short direct repeats (SDRs), also called target site duplications (TSDs), at the sites of insertion. TSDs are duplicated genomic sequences introduced by the mechanism of retroviral integration. The SDRs of HERV-related elements are 4–6 nt long; the two copies are identical at the moment of integration but can diverge significantly with time.

In this report we describe unusually long TSDs flanking some of the retroviral LTRs in the human genome and discuss their origin.

The LTR insertions with unusually long TSDs were first identified in a library of human-specific HERV-K LTR integrations, which was obtained using a new method that we described recently (Mamedov et al., 2002). The human specificity of individual LTRs was confirmed by comparing PCR amplification products from human and great ape DNA samples, using primers targeted at unique sequences flanking the LTR integration sites at the 5' and 3' ends. The amplification product of a locus with an LTR insert is generally about 960 bp (the LTR length) longer than that derived from an orthologous locus lacking the LTR. However, this was not the case for two human-specific LTRs. For the fragments amplified from the locus containing the LTR AC006035, located on human chromosome 7, and from the orthologous site in chimpanzee lacking the LTR, the difference in length was approximately 250 bp greater than expected (i.e. ~1210 bp instead of 960 bp; see Fig. 1 for details). Here, a standard PCR protocol with the Gibco PCR Reagents System and primers 5'-AACCACGTGAATACACTTTCTCA-3' (forward) and 5'-GTCCAGTTAGACCCCTCAACTAG-3' (reverse) was used. The samples were amplified for 28 cycles at 94 °C for 20 s, 65 °C for 20 s and 72 °C for 40 s. A similar deviation was also observed for the LTR AC009132, located on chromosome 16. In this case the amplification profile was 28 cycles of 94 °C for 20 s, 53 °C for 20 s and 72 °C for 40 s, and primers 5'-ACGAGATTGGGTAGTTAAAATCC-3' (forward) and 5'-TACACTGTAACATGAATGTACCA-3' (reverse) were employed.
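The size arithmetic behind this PCR assay can be written out as a short sketch. It assumes only what the text states: an LTR of about 960 bp, and a TSD that is present once at the pre-integration locus and twice at the LTR-containing locus. The function and constant names are illustrative and are not part of the original protocol.

```python
# Approximate HERV-K LTR length given in the text (bp).
LTR_LENGTH_BP = 960


def expected_size_difference(tsd_length_bp: int,
                             ltr_length_bp: int = LTR_LENGTH_BP) -> int:
    """Length difference between LTR-containing and LTR-lacking PCR products.

    The LTR-lacking (pre-integration) locus carries one copy of the target
    site. The LTR-containing locus carries the LTR plus a second copy of the
    target site, so the products differ by the LTR length plus one TSD length.
    """
    if tsd_length_bp < 0 or ltr_length_bp < 0:
        raise ValueError("lengths must be non-negative")
    return ltr_length_bp + tsd_length_bp


# A typical 5 bp TSD barely changes the ~960 bp difference,
# whereas a 250 bp TSD gives the ~1210 bp difference seen for AC006035.
typical_difference = expected_size_difference(5)
ac006035_difference = expected_size_difference(250)
```

With a TSD of the usual 4–6 bp the extra length is too small to resolve on a standard agarose gel, which is why a difference of about 250 bp stands out.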



Fig. 1. Example of PCR amplification of an individual LTR-containing locus in the human genome and orthologous loci in chimpanzee and gorilla genomes. (A) Diagram showing the LTR-containing locus (human) with a 250 bp TSD and the orthologous LTR-deficient locus (chimpanzee) with a non-duplicated 250 bp sequence (Target site), i.e. the pre-integration state. (B) Electrophoretic analysis of LTR-containing (white arrow) and LTR-lacking (black arrow) PCR products. Lanes: Hum1 and Hum2, genomic DNA from two humans; Chimp1 and Chimp2, genomic DNA from two chimpanzees (Pan troglodytes); Gor1, genomic DNA from a gorilla (Gorilla gorilla). Ladder, DNA fragments of a 1 kb DNA length marker (SibEnzyme).

 
The sequences of both LTR-containing loci were analysed and no SDRs flanking the LTRs were detected. Sequences of 500 bp flanking these two LTRs in the human genome were subjected to a plot analysis using the VectorNTI program, which revealed unusually long TSDs of 250 and 61 bp for the LTRs AC006035 and AC009132, respectively. The long TSD flanking the former LTR (AC006035) was nearly perfect, the only differences being a deletion of one T in a 13T tract and a single G/T substitution (for alignments of TSDs, see data available at http://humgen.siobc.ras.ru/supplement/suppl.html). It consisted of part of AluJo, a short non-repetitive genomic sequence, and part of MER67B. The LTR AC009132 was integrated into an L1MB3 element. Its long TSD was formed by a duplicated fragment corresponding to positions 6044–6106 of the L1 consensus sequence. This duplication was perfect.
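The idea of searching flanking sequences for a direct repeat can be illustrated with a minimal sketch. This is not the VectorNTI plot analysis used in the study. It assumes that a TSD appears as the end of the upstream flank repeated at the start of the downstream flank, and it tolerates substitutions but not insertions or deletions (so a one-base deletion such as the one in the AC006035 repeat would need a proper alignment). The function name and parameters are hypothetical.

```python
def longest_flanking_repeat(left_flank: str, right_flank: str,
                            max_mismatches: int = 0) -> int:
    """Length of the longest direct repeat abutting an insertion.

    left_flank  : sequence immediately upstream of the LTR (5' to 3')
    right_flank : sequence immediately downstream of the LTR (5' to 3')

    Returns the largest k for which the last k bases of left_flank match the
    first k bases of right_flank with at most max_mismatches substitutions.
    Returns 0 if no such k exists. Gaps are not considered.
    """
    left, right = left_flank.upper(), right_flank.upper()
    best = 0
    for k in range(1, min(len(left), len(right)) + 1):
        mismatches = sum(a != b for a, b in zip(left[-k:], right[:k]))
        if mismatches <= max_mismatches:
            best = k
    return best


# Toy example (made-up sequences): an 8 bp target site duplicated
# on both sides of a hypothetical insertion.
tsd = "ACGTTGCA"
upstream = "CCCC" + tsd
downstream = tsd + "GGGG"
repeat_length = longest_flanking_repeat(upstream, downstream)
```

Very short matches can arise by chance, so in practice a result of only a few bases says little on its own; the 500 bp flanks examined in the text are short enough for this quadratic scan to be quick.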

Both the LTRs were specific to the human genome and belonged to the youngest subfamily amplified late in primate evolution (Table 1). Their pre-integration states were characterized by shorter PCR fragments amplified from orthologous loci in chimpanzee genomic DNA. The absence of the LTR insertions in orthologous genomic loci of chimpanzee and gorilla suggested that the LTRs were integrated after the split of the chimpanzee and human lineages, which occurred 5–6 million years ago. A low divergence of TSD sequences in human DNA implies recent duplication events. To evaluate the duplication time, PCR products obtained from chimpanzee and gorilla genomic DNAs from the locus orthologous to the LTR AC009132 were cloned using a PCR products T-easy cloning kit (Promega) and sequenced using an Applied Biosystems 373 automatic DNA sequencer. An analysis of the obtained sequences and sequenced fragments of the chimpanzee genome revealed that the orthologous loci of the two closely related primate genomes lacked the duplication (see supplementary data), thus confirming that this duplication did occur after the divergence of the human and chimpanzee lineages.


Table 1. Location of LTRs flanked by unusually long TSDs

 
A search for HERV-K LTRs in the non-redundant and high-throughput genome sequence databases of GenBank, using the NCBI BLAST program (http://www.ncbi.nlm.nih.gov/BLAST) and the RepeatMasker program by A. Smith and P. Green (http://ftp.genome.washington.edu), also revealed other long TSDs. We analysed the SDRs of about 400 solo HERV-K LTRs and 35 proviruses randomly selected from GenBank. Most of the analysed sequences had perfect SDRs of the usual length (4–6 bp). Some of the SDRs were not perfect and contained one or two nucleotide substitutions. Some of the LTRs were truncated at their 5' or 3' ends, which prevented identification of SDRs. However, some full-length LTRs were flanked by SDRs of unusual size. Sequences of 500 bp flanking these LTRs were subjected to plot analysis. As a result, two more LTRs, flanked by 96 and 262 bp long TSDs (AC108063 and Z95704, respectively; see Table 1), were detected. The former LTR was integrated into a unique genomic sequence and its direct repeats were more diverged than those of the two LTRs described above. The LTR sequence allowed us to assign it to an intermediate-age subfamily (Table 1) that was transpositionally active before the divergence of the gorilla and human/chimpanzee lineages.

To examine whether the duplication arose due to the LTR insertion, PCR assays of various primate genomic DNAs were performed using primers 5'-TCATAGATAGAAACAAGGTCCTCCT-3' and 5'-CCCCAGTGGCTCGTACTAGAG-3' targeted at the LTR AC108063 flanks. The amplification was performed for 30 cycles at 94 °C for 20 s, 63 °C for 20 s and 72 °C for 40 s. The PCR fragments corresponding to the LTR AC108063 insertion site with 96 bp long TSDs were detected in DNA samples from human, chimpanzee and gorilla. With several gibbon DNA samples presumably lacking the LTR, the corresponding PCR fragments were shorter. A PCR product derived from the gibbon genome of a Hylobates lar individual was cloned and sequenced (accession no. AY536064), confirming the absence of duplication in the ‘pre-integration’ site.

The LTR Z95704 belongs to an old subfamily and is integrated into an L1 retrotransposon. Its direct repeats are highly (~8 %) diverged. The sequences of the sites corresponding to the human LTRs AC108063 and Z95704, taken from the recently published chimpanzee genome (available at http://genome.ucsc.edu), revealed similar duplications flanking the LTRs.
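Divergence between the two copies of a direct repeat, such as the ~8 % quoted here, can be expressed as the percentage of differing positions in a pairwise alignment. The sketch below assumes the two copies have already been aligned to equal length, with '-' marking gaps, and it skips gapped columns. The source does not state how its divergence figure was computed, so this is one plausible convention and not the authors' procedure; the function name is hypothetical.

```python
def percent_divergence(copy_a: str, copy_b: str) -> float:
    """Percentage of mismatched positions between two aligned repeat copies.

    Both sequences must be pre-aligned and of equal length. Columns in which
    either sequence has a gap ('-') are excluded from the comparison.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(copy_a.upper(), copy_b.upper())
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    mismatches = sum(a != b for a, b in compared)
    return 100.0 * mismatches / len(compared)


# Toy example (made-up sequences): one substitution in ten positions.
example_divergence = percent_divergence("ACGTACGTAC", "ACGTACGTAA")
```

Under this convention, ~8 % divergence over a 262 bp repeat would correspond to roughly 21 differing positions.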

So far only a few examples of long TSDs flanking other transposable elements in the sites of their integration are known, among them a 214 bp long TSD produced due to the integration of a human L1 (Feng et al., 1996), 952 bp long repeats surrounding an IS476 element in a recombinant plasmid (Chen et al., 1999) and 82 bp long repeats flanking the intracisternal A particle (IAP) in the mouse genome (Tanaka & Ishihara, 1995). The long interspersed elements (LINE) retrotransposition mechanism differs from that of retroviruses and includes nicking of the target DNA. Feng et al. (1996) suggested that the formation of such long TSDs was due to peculiarities of helicase activity at the site of integration. In the case of the bacterial insertion sequence (IS) element, the duplication was suggested to be due to plasmid recombination at the site of insertion (Chen et al., 1999). Similar to retroviruses, the integrated form of an IAP element has gag, pol and env genes between two LTRs and uses the same mechanism of retrotransposition. In this context, the closest example is an 82 bp long TSD flanking the de novo integration of an IAP element into the IL-3 gene of myeloid leukaemia cells, generated by whole-body irradiation of mice. Tanaka & Ishihara (1995) suggested that such a long target site duplication was somehow associated with the impact of radiation, which might cause rearrangements or induce an unusual mechanism of retrotransposition.

It has been hypothesized (Morrish et al., 2002) that retrotransposons sometimes take part in the repair of DNA breaks arising from various causes. A similar repair process coupled with HERV-K retrotransposition might have formed the LTR-containing structures described in this work. However, the actual forces and participants responsible for the integration-associated duplication remain unclear. They may be identified when other mammalian genomes are sequenced and more examples of such rare events become available. Studying the mechanism of long TSD formation should give deeper insight into retroviral transposition and the interactions of endogenous retroviruses with the host genome.


   ACKNOWLEDGEMENTS
 
We would like to express our gratitude to Dr B. Glotov for helpful and constructive discussion and help in manuscript preparation. This work was supported by INTAS-2001-0759, the Russian Foundation for Basic Research 02-04-48614 and 2006.20034 grants, as well as by the Physico-Chemical Biological Program of the Russian Academy of Sciences.


   REFERENCES
Chen, J. H., Hsieh, Y. Y., Hsiau, S. L., Lo, T. C. & Shau, C. C. (1999). Characterization of insertions of IS476 and two newly identified insertion sequences, IS1478 and IS1479, in Xanthomonas campestris pv. campestris. J Bacteriol 181, 1220–1228.

Feng, Q., Moran, J. V., Kazazian, H. H., Jr & Boeke, J. D. (1996). Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916.

International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Lebedev, Y. B., Belonovitch, O. S., Zybrova, N. V., Khil, P. P., Kurdyukov, S. G., Vinogradova, T. V., Hunsmann, G. & Sverdlov, E. D. (2000). Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene 247, 265–277.

Lower, R., Lower, J. & Kurth, R. (1996). The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc Natl Acad Sci U S A 93, 5177–5184.

Mager, D. & Medstrand, P. (2002). Retroviral repeat sequences. In Encyclopedia of the Human Genome. Edited by K. Gardiner. London: Nature Publishing Group. Available at http://www.ehgonline.net/

Mamedov, I., Batrak, A., Buzdin, A., Arzumanyan, E., Lebedev, Y. & Sverdlov, E. D. (2002). Genome-wide comparison of differences in the integration sites of interspersed repeats between closely related genomes. Nucleic Acids Res 30, e71.

Medstrand, P., van de Lagemaat, L. N. & Mager, D. L. (2002). Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res 12, 1483–1495.

Morrish, T. A., Gilbert, N., Myers, J. S., Vincent, B. J., Stamato, T. D., Taccioli, G. E., Batzer, M. A. & Moran, J. V. (2002). DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet 31, 159–165.

Sverdlov, E. D. (1998). Perpetually mobile footprints of ancient infections in human genome. FEBS Lett 428, 1–6.

Sverdlov, E. D. (2000). Retroviruses and primate evolution. Bioessays 22, 161–171.

Tanaka, I. & Ishihara, H. (1995). Unusual long target duplication by insertion of intracisternal A-particle element in radiation-induced acute myeloid leukemia cells in mouse. FEBS Lett 376, 146–150.

Wilkinson, D. A., Mager, D. L. & Leong, J. C. (1994). Endogenous human retroviruses. In The Retroviridae, vol. 3. Edited by J. A. Levy. New York: Plenum.

Received 14 October 2003; accepted 22 February 2004.


