Physiologische Chemie I, Biozentrum, University of Würzburg, Würzburg, Germany
Long-terminal-repeat (LTR) retrotransposons from the Ty3/Gypsy superfamily have been detected in various eukaryotic taxa, including some vertebrate lineages (lampreys, bony fishes, amphibians, and reptiles; Miller et al. 1999 and references therein). Nevertheless, molecular and database screenings failed to detect such elements in the genome of mammals. Considering the huge amount of sequence information available on mammalian genomes and transcriptomes, this suggested that Ty3/Gypsy retrotransposons either have been lost or are present at an extremely low copy number in mammals.
By examination of public sequence databases, we identified Ty3/Gypsy-like sequences in mammals. The human protein KIAA1051, obtained from a brain cDNA library (Kikuno et al. 1999), shows significant similarity to the Gag structural core protein of some Ty3/Gypsy retrotransposons from the Ty3 family, including Sushi from the pufferfish Fugu rubripes (Poulter and Butler 1998) (42.5% similarity, expected value E = 10^-24) and Skippy (Anaya and Roncero 1995), Maggy (Farman et al. 1996), and Cft1 (Curtis and Oliver 1996) from different fungi (fig. 1). In particular, the C-terminal putative nucleic acid-binding site CX2CX4HX4C is conserved. No significant similarity to other families of Ty3/Gypsy retrotransposons and retroviruses was found. Two additional human proteins, the brain protein KIAA1318 (accession number BAA92556; Nagase et al. 2000) and the putative nuclear protein LDOC1 (Nagasaki et al. 1999), also showed significant similarity to the KIAA1051 and Sushi Gag(-like) proteins, but outside of the nucleic acid-binding domain.
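The motif notation CX2CX4HX4C reads as Cys, any two residues, Cys, any four residues, His, any four residues, Cys. A minimal sketch of how such a pattern could be scanned for in a protein sequence follows; the function name `find_cchc` and the toy sequence are illustrative assumptions, not data from this study.

```python
import re

# CX2CX4HX4C: C, 2 arbitrary residues, C, 4 arbitrary residues, H, 4 arbitrary residues, C
CCHC_MOTIF = re.compile(r"C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}C")


def find_cchc(protein):
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in CCHC_MOTIF.finditer(protein.upper())]


# Hypothetical toy sequence for illustration only (not KIAA1051 or Sushi).
toy = "MKTAYCFNCGKPGHIARDCPEE"
print(find_cchc(toy))
```

The scan reports non-overlapping hits only; a sequence without the four spaced Cys/His residues returns an empty list.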
The KIAA1051 gene has been mapped to human chromosome 7 (Kikuno et al. 1999). Despite the large amount of human genomic sequences present in public databases (more than 80% estimated coverage with working draft sequences at the time of the analysis), nucleotide database searching using the gag/pol-like sequence as a query identified the KIAA1051 gene only in the human low-pass sequence sampling clone RP11-648L18 (R. H. Waterston; accession number AC069292, htgs database). Other sequences from this clone (e.g., the epsilon sarcoglycan gene) have already been mapped to chromosome 7, like the KIAA1051 gene. This confirmed the location of the KIAA1051 gene and suggested that this retrotransposon-like sequence was not reiterated (or was reiterated only at an extremely low level) in the human genome. Analysis of the database genomic sequences, as well as our PCR analysis (data not shown), showed an absence of introns in the KIAA1051 gene, consistent with its retrotransposon origin. Database searching identified KIAA1051 ESTs from numerous different human organs and tissues. Their high level of identity to the KIAA1051 sequence was compatible with the presence of a unique transcribed gene.
MyEF-3, a putative mouse homolog of KIAA1051 isolated from a brain expression library (Steplewski et al. 1998), displays 69% similarity (E = 2 × 10^-62) to the KIAA1051 Gag-like protein. The potential nucleic acid-binding site is conserved (fig. 1). Because the published MyEF-3 cDNA was too short to detect a pol-like sequence, we amplified by PCR a MyEF-3-like sequence from mouse genomic DNA (accession number AF302691; see legend of fig. 1). One unique sequence, called MyEF-3*, was reproducibly obtained using different primer combinations and mouse genomic DNA sources. MyEF-3* shows more than 98% nucleotide identity to the published MyEF-3 cDNA sequence. Two frameshifts are present in the published MyEF-3 gag-like sequence compared with KIAA1051 and MyEF-3*: a 2-nt deletion, followed approximately 150 nt downstream by a 1-nt deletion, restoring the original reading frame. This did not introduce any stop codon in MyEF-3 but generated a 50-aa central region with a much lower degree of similarity to MyEF-3* and KIAA1051 (from PLGYCQ to DQTSAP in MyEF-3; fig. 1). Apart from this region of lower similarity, the Gag-like sequences of MyEF-3 and MyEF-3* are absolutely identical (fig. 1). MyEF-3* also contains a partial pol-like sequence (fig. 1), which is identical to the unique related mouse EST present in public databases (accession number AW209996). Compared with KIAA1051, the protease-encoding region of MyEF-3* is separated from the partial RT-encoding region by an approximately 600-nt insertion very rich in oligonucleotide repeats and introducing stop codons into the partial pol open reading frame (this insertion was removed for protein sequence analysis). RT-PCR analysis showed that the whole gag/partial pol MyEF-3* sequence amplified from genomic DNA was transcribed in mouse brain and that the 600-nt insertion was not spliced out (data not shown).
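The frame arithmetic behind the two compensating deletions described above can be illustrated with a short sketch. A 2-nt deletion shifts the reading frame, and a further 1-nt deletion brings the total to 3 nt and restores it, so only the stretch between the two events is read out of frame. The function name and the coordinates 300 and 450 are hypothetical, chosen only so that the two deletions lie 150 nt apart.

```python
def out_of_frame_regions(deletions):
    """Given (position, length) deletions sorted by position in reference
    coordinates, return the (start, end) reference intervals that are read
    out of frame. end is None if the frame is never restored."""
    regions = []
    offset = 0      # cumulative deleted length modulo 3
    start = None    # where the current out-of-frame stretch began
    for pos, length in deletions:
        new_offset = (offset + length) % 3
        if offset == 0 and new_offset != 0:
            start = pos                    # frame is lost here
        elif offset != 0 and new_offset == 0:
            regions.append((start, pos))   # frame is restored here
            start = None
        offset = new_offset
    if start is not None:
        regions.append((start, None))
    return regions


# Hypothetical coordinates: a 2-nt deletion, then a 1-nt deletion 150 nt downstream.
regions = out_of_frame_regions([(300, 2), (450, 1)])
start, end = regions[0]
print(regions, (end - start) // 3, "codons affected")
```

With these assumed positions the out-of-frame interval spans 150 nt, that is, 50 codons, which matches the size of the divergent central region reported for the published MyEF-3 sequence.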
The KIAA1051 pol-like nucleotide sequence is 89% identical to the TRT1 (partial) cDNA sequence from mink lung epithelial cells, whose RNA level is decreased after transforming growth factor β treatment (Ralph, McClelland, and Welsh 1993). This sequence was too short to detect any gag-like gene. Related ESTs were also detected in rats and bovines (not shown), suggesting, in contrast to earlier reports, that a Ty3/Gypsy-like sequence is widely distributed and even transcribed in mammals.
The phylogeny of the Ty3/Gypsy superfamily is difficult to establish using individual enzymatic domains (Malik and Eickbush 1999). Accordingly, analysis of RT domains 1 and 2 did not allow us to determine the phylogenetic position of mammalian Pol-like sequences (data not shown). Extension of the region of analysis about 50 amino acids upstream of RT domain 1 placed the mammalian sequences into a group of Sushi/Maggy-related elements within the Ty3 family (fig. 2), supporting the database analysis results. Analysis of sequences from the protease active site to RT domain 2 supported the same conclusion (not shown) but was not taken into consideration because of the high level of sequence variability and ambiguous alignments. The low level of conservation between Gag sequences within the Ty3/Gypsy superfamily (including presence or absence of CX2CX4HX4C domains, depending on the element) allowed only the comparison of closely related Gag sequences. This analysis suggested a close relationship of KIAA1051 and MyEF-3 sequences to the Sushi Gag protein from the fish F. rubripes (fig. 2).
If the binding of MyEF-3 to the promoter of the myelin basic protein (MBP) gene is of significance for the expression of this myelin gene, as suggested by Steplewski et al. (1998), our analysis would suggest a very unusual hypothesis: a Ty3/Gypsy-related retrotransposon would have evolved into a regulatory DNA-binding protein in mammals. It has already been suggested that important cellular functions like the RAG1/RAG2 recombinase (Agrawal, Eastman, and Schatz 1998; Hiom, Melek, and Gellert 1998) and the RT telomerase (Eickbush 1997) might have evolved from transposable elements. More recently, a retroviral envelope-like protein called syncytin has been implicated in human placental morphogenesis (Mi et al. 2000). While retroviral envelope proteins and syncytin probably have similar molecular functions (promotion of cell fusion; Mi et al. 2000), MyEF-3 might represent a case of a genome parasite protein having evolved into an extremely divergent host molecular function.
The rate of substitutions between KIAA1051 and MyEF-3 gag-like genes is fivefold higher at synonymous sites than at nonsynonymous sites, and even 13.8-fold higher in the 130-nt region surrounding the nucleic acid-binding domain-encoding sequence. The rate of substitutions between KIAA1051 and MyEF-3 pol-like genes is 3.8-fold higher at synonymous sites. Such values are suggestive of maintenance of the functionality of the gag and (partial) pol genes during evolution. This might be related to retrotranspositional activity of the KIAA1051-related retrotransposons after separation of the mouse lineage from the human/mink lineage approximately 110 MYA (Kumar and Hedges 1998).
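The text does not state which estimator was used to compare synonymous and nonsynonymous rates. As an illustration only, the sketch below uses a simplified Nei-Gojobori-style counting with a Jukes-Cantor correction. This is an assumption, as are all function names and the toy sequences. Unlike the full method, it does not exclude mutational paths through stop codons, and it expects two gap-free, in-frame sequences of equal length.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (each position contributes 0..1)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def diffs(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutation orders."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sd = nd = 0.0
    paths = list(permutations(positions))
    for path in paths:
        cur = c1
        for pos in path:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[cur] == CODE[nxt]:
                sd += 1
            else:
                nd += 1
            cur = nxt
    return sd / len(paths), nd / len(paths)


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (requires p < 0.75)."""
    return -0.75 * log(1 - 4 * p / 3)


def ds_dn(seq1, seq2):
    """Return (dS, dN) for two aligned coding sequences (toy-scale sketch)."""
    s_sites = s_diff = n_diff = 0.0
    n_codons = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = diffs(c1, c2)
        s_diff += sd
        n_diff += nd
        n_codons += 1
    n_sites = 3 * n_codons - s_sites
    return jukes_cantor(s_diff / s_sites), jukes_cantor(n_diff / n_sites)


# Toy sequences (hypothetical): one synonymous change, TTT -> TTC (both Phe).
dS, dN = ds_dn("TTTCTG", "TTCCTG")
print(dS, dN)
```

A ratio dS/dN well above 1, such as the fivefold, 13.8-fold and 3.8-fold values reported above, is the pattern expected when amino acid changes have been selected against.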
On the other hand, KIAA1051, TRT1, and MyEF-3* show a much lower degree of conservation with other Sushi-related retrotransposons downstream of RT domains 1 and 2, including the absence of the same extremely conserved residues in domains 3 and 4 (indicated with arrows in fig. 1 according to Xiong and Eickbush [1990]). The degree of similarity within this region is relatively high between the three mammalian elements. Hence, this suggests that the loss of conserved RT residues occurred in a common ancestral retrotransposon before separation of the mouse lineage from the human/mink lineage. We cannot exclude the possibility that the ability to retrotranspose might have been conserved even with these mutations. Nevertheless, some of the lost residues are extremely conserved in all autonomous retroelements (Xiong and Eickbush 1990), strongly suggesting that they are necessary for retrotransposition. Therefore, a defective retrotransposon might have been present for at least 100 Myr in the genome of mammals without retrotransposing. Despite its inactivity, this retrotransposon-related element was not lost, its Gag- and protease-encoding sequences did not suffer any disrupting mutation, and its gene product was even selected for its functionality. This suggests that a retrotransposon-like protein has undertaken a new selected function independent of retrotransposition and uncovers a fascinating aspect of genome ecology in living organisms.
Footnotes
1 Keywords: KIAA1051, MyEF-3, TRT1, Sushi, Gag, Pol, phylogeny
2 Address for correspondence and reprints: Jean-Nicolas Volff, Physiological Chemistry I, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany. E-mail: volff{at}biozentrum.uni-wuerzburg.de
Literature Cited
Agrawal, A., Q. M. Eastman, and D. G. Schatz. 1998. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature 394:744–751.
Anaya, N., and M. I. Roncero. 1995. Skippy, a retrotransposon from the fungal plant pathogen Fusarium oxysporum. Mol. Gen. Genet. 249:637–647.
Curtis, M. D., and R. P. Oliver. 1996. Gypsy-class retrotransposon sequences in organisms related to the leaf mould fungus Cladosporium fulvum. Microbiol. Res. 151:113–119.
Eickbush, T. H. 1997. Telomerase and retrotransposons: which came first? Science 277:911–912.
Farman, M. L., Y. Tosa, N. Nitta, and S. A. Leong. 1996. MAGGY, a retrotransposon in the genome of the rice blast fungus Magnaporthe grisea. Mol. Gen. Genet. 251:665–674.
Hiom, K., M. Melek, and M. Gellert. 1998. DNA transposition by the RAG1 and RAG2 proteins: a possible source of oncogenic translocations. Cell 94:463–470.
Kikuno, R., T. Nagase, K. Ishikawa, M. Hirosawa, N. Miyajima, A. Tanaka, H. Kotani, N. Nomura, and O. Ohara. 1999. Prediction of the coding sequences of unidentified human genes. XIV. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro. DNA Res. 6:197–205.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.
Malik, H. S., and T. H. Eickbush. 1999. Molecular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.
Mi, S., X. Lee, X. Li et al. (12 co-authors). 2000. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403:785–789.
Miller, K., C. Lynch, J. Martin, E. Herniou, and M. Tristem. 1999. Identification of multiple Gypsy LTR-retrotransposons lineages in vertebrate genomes. J. Mol. Evol. 49:358–366.
Nagasaki, K., T. Manabe, H. Hanzawa, N. Maass, T. Tsukada, and K. Yamaguchi. 1999. Identification of a novel gene, LDOC1, down-regulated in cancer cell lines. Cancer Lett. 140:227–234.
Nagase, T., R. Kikuno, K. Ishikawa, M. Hirosawa, and O. Ohara. 2000. Prediction of the coding sequences of unidentified human genes. XVI. The complete sequences of 150 new cDNA from brain which code for large proteins in vitro. DNA Res. 7:65–73.
Poulter, R., and M. Butler. 1998. A retrotransposon family from the pufferfish (fugu) Fugu rubripes. Gene 215:241–249.
Ralph, D., M. McClelland, and J. Welsh. 1993. RNA fingerprinting using arbitrarily primed PCR identifies differentially regulated RNAs in mink lung (Mv1Lu) cells growth arrested by transforming growth factor β1. Proc. Natl. Acad. Sci. USA 90:10710–10714.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
Steplewski, A., B. Krynska, A. Tretiakova, S. Haas, K. Khalili, and S. Amini. 1998. MyEF-3, a developmentally controlled brain-derived nuclear protein which specifically interacts with myelin basic protein proximal regulatory sequences. Biochem. Biophys. Res. Commun. 243:295–301.
Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.