Department of Biochemistry, University of Otago, Dunedin, New Zealand
The pursuit of genome sequencing projects has added impetus to the discovery and analysis of the repetitive DNA sequences that are present in all eukaryotes. Much interest has been focused on retroelements, elements propagated via an RNA intermediate, as these are the most abundant form of repetitive DNA in eukaryote genomes. One class of these elements is the retrotransposons, which can further be subdivided into two major groups: those that possess long terminal repeats (LTRs) and those that do not. The LTR retrotransposons contain open reading frames (ORFs) which encode the proteins (GAG and POL) required for the reverse transcription of the element mRNA and integration of the resulting cDNA into the host genome. The classification of LTR retroelements is currently under intense analysis, and several schemes have recently been proposed (Bowen and McDonald 1999; Hull 1999; Pringle 1999; Cook et al. 2000).
The phylogenetic analyses of LTR retrotransposons based on the predicted amino acid sequences of their reverse transcriptases (RTs) indicate that they fall into at least four major groups: the two extensively reviewed Ty1/copia and Ty3/gypsy groups, the BEL-like group that includes several newly discovered elements (Cook et al. 2000), and the retroviruses. Ty1/copia retroelements can be distinguished from members of the other three groups not only on the basis of their RT sequences, but also on the basis of the order of amino acid motifs within their POL ORFs. The order of domains in the POL ORF of Ty1/copia elements is protease, integrase, RT/RNase H, while in Ty3/gypsy elements, BEL elements, and the retroviruses, the order is protease, RT/RNase H, integrase. It has been proposed that the difference in the orders of the POL domains be the main defining feature in the classification of LTR-containing retroelements (Bowen and McDonald 1999; Hull 1999; Pringle 1999). Such a scheme is debatable, however, in that phylogenetic analyses suggest similar divergence between the BEL elements, the Ty3/gypsy elements, and the Ty1/copia elements.
Until fairly recently, it was assumed that retroviruses were confined to vertebrate hosts and that the LTR retrotransposons were present only in nonvertebrate eukaryotes. Lately, examples of retrotransposons with envelope domains characteristic of a retrovirus life cycle have been found in Drosophila and plants (Wright and Voytas 1998), including plant elements of the Ty1/copia group (Peterson-Burch et al. 2000).
Conversely, LTR retrotransposons, or parts thereof, have been discovered in many species of vertebrates (Britten et al. 1995; Poulter and Butler 1998; Miller et al. 1999). While a small number of fragments of Ty1/copia elements have been found in fish and reptiles (Flavell et al. 1995; Roest Crollius et al. 2000), Ty3/gypsy-type retrotransposons are represented in a wide range of vertebrate classes (Miller et al. 1999), although none have yet been reported in birds or mammals.
Here, we describe a full-length LTR retrotransposon in a sequence from the Atlantic cod, Gadus morhua. This retrotransposon, which we call Gmr1, is unusual in that sequence comparisons clearly show that it is a member of the Ty3/gypsy group, but the order of the domains within its pol ORF is the same as that of Ty1/copia group elements. Analysis of additional vertebrate retrotransposons, including a previously undetected element in the sturgeon Acipenser baeri, suggests that there exists a new vertebrate retrotransposon lineage with an unusual POL domain order.
The DNA sequence of Gmr1, the G. morhua LTR retrotransposon, is present between base pair 7520 and base pair 12788 in GenBank accession AF104899 (Widholm et al. 1999). This entry describes 14.976 kb of the G. morhua immunoglobulin light-chain gCL5 gene cluster region. The retrotransposon lies between two immunoglobulin light-chain L1 regions. The 5' LTR extends from position 12788 to position 12385, and the 3' LTR extends from position 7925 to position 7520. The retrotransposon is 5,269 bp long, and its structure is illustrated in figure 1a. The LTRs differ in length by 2 bp and share 98% identity. Both LTRs start and end with a 5-bp inverted repeat, 5'-TGTGG ... CCACA-3', which is similar to the retroviral consensus (Temin 1980). Two base pairs downstream of the 5' LTR is an 18-bp primer-binding site (PBS) (fig. 1a). This PBS is a 17/18-bp match to the 3' end of a human tRNA(Ala). This suggests that a Gadus tRNA(Ala) (not present in the database) is used to prime the minus-strand DNA synthesis of Gmr1. A run of 12 consecutive purine residues is found 2 bp upstream of the 3' LTR and likely serves as the priming site for plus-strand DNA synthesis of the retrotransposon. The element is not flanked by a short duplication of the genomic target site.
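The coordinate arithmetic and sequence features described above can be checked with a short script. The following is a minimal sketch, not the method used in the original analysis: the function names are hypothetical, the coordinates are those quoted in the text (treated as 1-based and inclusive), and the purine-run example uses a made-up toy sequence rather than the actual Gmr1 sequence.

```python
import re

# Coordinates quoted in the text for GenBank accession AF104899,
# treated here as 1-based, inclusive intervals (an assumption).
# The 5' LTR is listed with descending coordinates in the text, which
# suggests the element is in reverse orientation relative to the entry.
ELEMENT = (7520, 12788)
LTR_5PRIME = (12385, 12788)
LTR_3PRIME = (7520, 7925)


def span_length(span):
    """Return the length of a 1-based, inclusive interval (start, end)."""
    start, end = span
    return end - start + 1


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_terminal_inverted_repeat(left, right):
    """True if `right` is the reverse complement of `left`."""
    return reverse_complement(left) == right.upper()


def find_purine_runs(seq, min_len=12):
    """Return (0-based start, run) for each run of >= min_len purines (A/G)."""
    pattern = re.compile("[AG]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(span_length(ELEMENT))      # 5269, the element length given in the text
    print(span_length(LTR_5PRIME))   # 404
    print(span_length(LTR_3PRIME))   # 406, i.e. 2 bp longer than the 5' LTR
    print(is_terminal_inverted_repeat("TGTGG", "CCACA"))  # True
    # Toy sequence (hypothetical), containing one 12-nt purine run.
    print(find_purine_runs("TTCAGGAGAGGAAGACT"))  # [(3, 'AGGAGAGGAAGA')]
```

Under these assumptions, the computed lengths agree with the 5,269-bp element length and the 2-bp difference between the LTRs reported above.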
The coding capacity of each of the cod element ORFs is outlined below. The 5' region of Gmr1 encodes the GAG protein, which is the structural component of the virus-like particle of the LTR retrotransposons. In Gmr1, the GAG protein contains a putative zinc-finger RNA-binding site, CX2CX4HX4C, which is found in many LTR retrotransposons. Apart from this motif, gag sequences are generally poorly conserved among different LTR retrotransposons. However, further sequences upstream of this motif in the cod element can be recognized as homologous to the corresponding regions of Abr1 from the sturgeon Acipenser and of a retroelement from Xenopus (1A11) already described by Greene et al. (1993) (fig. 1b). Gmr1, Abr1, and 1A11 also each contain, in a region 3' of the Zn-finger motif, a motif (DS/TG) resembling the active site of the aspartic protease domain of POL. The RNA-binding domain of the putative gag gene and the protease domain of the POL region in Gmr1 are encoded in the same reading frame without an intervening termination codon. The arrangements of the elements in Acipenser (Abr1) and Xenopus (1A11) are similar to that of Gmr1, with the Zn-finger and the protease encoded in the same ORF.
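The two motifs named above translate directly into regular expressions. The sketch below is illustrative only: the function name and the toy protein sequence are hypothetical and are not taken from Gmr1, Abr1, or 1A11. It reads CX2CX4HX4C as Cys, any two residues, Cys, any four residues, His, any four residues, Cys, and DS/TG as Asp followed by Ser or Thr and then Gly.

```python
import re

# CX2CX4HX4C: Cys, any 2 residues, Cys, any 4, His, any 4, Cys.
ZINC_FINGER = re.compile(r"C.{2}C.{4}H.{4}C")
# DS/TG: Asp, then Ser or Thr, then Gly (aspartic protease active-site motif).
PROTEASE_ACTIVE_SITE = re.compile(r"D[ST]G")


def find_motif(pattern, protein):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, constructed only to contain both motifs.
    toy_protein = "MKRCFNCGKLGHIARDCPEPLLDSGAS"
    print(find_motif(ZINC_FINGER, toy_protein))           # [(3, 'CFNCGKLGHIARDC')]
    print(find_motif(PROTEASE_ACTIVE_SITE, toy_protein))  # [(22, 'DSG')]
```

Because the DS/TG pattern is only three residues long, chance matches are expected in any long protein, so a hit is suggestive only when it lies in the expected position relative to the zinc finger and within a region alignable to known protease domains.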
A phylogenetic analysis was conducted using multiple alignments of each of two POL domains (RT and IN) so that the relationship of Gmr1 to other LTR retrotransposons could be examined. The tree constructed using the seven motifs of the RT domain (Xiong and Eickbush 1990) is shown in figure 2. Gmr1 is clearly and robustly grouped among the Ty3/gypsy elements on the basis of the RT domain. Initial BLASTP searches had indicated that over the RT/RNaseH region, the retrotransposons most similar to Gmr1 were Osvaldo from Drosophila buzzatii; Ted, the cabbage looper retrotransposon; 17.6 from Drosophila melanogaster; and Tom from Drosophila ananassae. These are all Ty3/gypsy elements.
A further feature of the phylogenetic analyses is that both the RT and the IN trees indicate that Gmr1 belongs within a group of Ty3/gypsy elements which is phylogenetically distinct from the group which contains sushi, the only full-length vertebrate Ty3/gypsy element previously described (Poulter and Butler 1998).
The classification of retrotransposons has been based mainly on the phylogenetic relationships generated by comparison of the amino acid sequence of the shared characteristic, the RT domain (Xiong and Eickbush 1990). Other characteristics, however, can also be used to distinguish one group from another: for example, the presence or absence of LTRs. The presence of an env-like domain was thought to be a distinguishing feature of vertebrate retroviruses, but it is now known that many Ty3/gypsy retrotransposons, some Ty1/copia retrotransposons, and some BEL-like elements have env genes. Another major distinction that has been used to classify LTR elements is the domain order in the POL ORF (Bowen and McDonald 1999). Ty3/gypsy and BEL elements share the POL arrangement of the vertebrate retroviruses, PRO-RT/RNaseH-IN. In contrast, Ty1/copia elements have the order PRO-IN-RT/RNaseH. On the basis of the order of POL domains, LTR retrotransposons can therefore be divided into two groups. There are at least two problems with this bipartite division. First, phylogenetic analyses suggest that Ty1/copia elements are more closely related to Ty3/gypsy and retroviral elements than are the BEL-like retrotransposons (fig. 2). The Ty3/gypsy–vertebrate retrovirus–BEL grouping therefore appears to be polyphyletic. The present analysis presents a second difficulty for this bipartite division of the LTR retrotransposons. The cod element, Gmr1, and the sturgeon homolog, Abr1, belong with the Ty3/gypsy group of retrotransposons on the basis of RT and IN sequence similarity. However, their domain order in the POL ORF is that found in Ty1/copia retrotransposons. All other retroelements in which the IN domain is 5' of the RT/RNaseH domain have previously been shown to be members of the Ty1/copia phylogenetic group. Gmr1 and Abr1 do not fit easily into present schemes of LTR retrotransposon classification.
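The conflict between the two classification criteria can be made explicit in a few lines. This is a minimal sketch under stated assumptions: the dictionary, the function name, and the "typical" entries are hypothetical labels introduced for illustration, and the group assignments are simply those stated in the text (RT/IN phylogeny for Gmr1 and Abr1, canonical domain orders for the established groups).

```python
# Groups expected for each POL domain order, as described in the text.
GROUPS_BY_DOMAIN_ORDER = {
    ("PRO", "IN", "RT/RNaseH"): {"Ty1/copia"},
    ("PRO", "RT/RNaseH", "IN"): {"Ty3/gypsy", "BEL", "retrovirus"},
}


def is_discordant(phylogenetic_group, domain_order):
    """True if the RT-based group is not one expected for this domain order."""
    expected = GROUPS_BY_DOMAIN_ORDER.get(tuple(domain_order))
    if expected is None:
        raise ValueError("unrecognized POL domain order: %r" % (domain_order,))
    return phylogenetic_group not in expected


# (element, group from RT/IN phylogeny, POL domain order).
# The "typical" rows are generic placeholders, not specific named elements.
ELEMENTS = [
    ("Gmr1", "Ty3/gypsy", ("PRO", "IN", "RT/RNaseH")),
    ("Abr1", "Ty3/gypsy", ("PRO", "IN", "RT/RNaseH")),
    ("typical Ty3/gypsy element", "Ty3/gypsy", ("PRO", "RT/RNaseH", "IN")),
    ("typical Ty1/copia element", "Ty1/copia", ("PRO", "IN", "RT/RNaseH")),
    ("typical BEL element", "BEL", ("PRO", "RT/RNaseH", "IN")),
]

if __name__ == "__main__":
    for name, group, order in ELEMENTS:
        status = "discordant" if is_discordant(group, order) else "concordant"
        print("%-28s %-10s %-18s %s" % (name, group, "-".join(order), status))
```

In this sketch only Gmr1 and Abr1 are flagged as discordant, which restates the observation that a scheme based solely on domain order would place them apart from their closest relatives by RT and IN sequence.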
It may be suggested that the domain order in Gmr1 is simply due to some internal rearrangement subsequent to its integration. For several reasons, however, we believe that the structure shown in figure 1 represents the original form of Gmr1. The LTRs are almost identical, implying that the element was recently mobile. Gmr1 also contains all of the expected motifs of a functional element; that is, there are no essential parts missing from the IN and RT/RNaseH domains, and these domains are not internally rearranged. Indeed, only five nucleotide changes would be necessary to re-create an apparently intact element. Abr1, an element from the distantly related sturgeon, has an identical structure but has obviously been evolving independently for a sufficient length of time for all but the most highly conserved regions to diverge. It seems unlikely that Gmr1 and its closest relative would each suffer the same rearrangement subsequent to their integration, a type of rearrangement we have not encountered in any other retrotransposon. In LTR retrotransposons, the PRO, RT/RNaseH, and IN domains are all separated by extensive spacer regions, which would facilitate retention of functionality following internal rearrangements.
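The argument from LTR similarity rests on the fact that the two LTRs of an element are identical when it integrates and then accumulate differences independently, so high identity points to a recent insertion. The sketch below shows one common way to compute percent identity from a pairwise alignment. It is an assumption-laden illustration: the function name is hypothetical, the sequences are toy strings rather than the Gmr1 LTRs, and skipping gapped columns is only one of several conventions (the text does not state how the reported figure was calculated).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are ignored under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): one gap and one mismatch.
    ltr_a = "TGTGGACCTAGGATTCCACA"
    ltr_b = "TGTGGACCTAG-ATACCACA"
    print(round(percent_identity(ltr_a, ltr_b), 1))  # 18 of 19 columns match: 94.7
```

Counting gaps as mismatches instead would lower the value, which matters when, as here, the two LTRs differ slightly in length.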
Another feature of interest is that Gmr1 appears to fall within a Ty3/gypsy lineage not previously encountered in vertebrates. This is supported by a recent analysis (Marin and Llorens 2000) which tentatively placed the RT sequence of Gmr1 within a group named the "Osvaldo" group by Malik and Eickbush (1999). Gmr1 and Abr1 therefore belong within a group which is phylogenetically distinct from the two LTR retrotransposon groups previously described from vertebrates, the Tf1/sushi (Poulter and Butler 1998) and mag/easel (Miller et al. 1999) groups. The Tf1/sushi group contains sushi and Hsr1 from the cave salamander Hydromantes supramontis (Marracci et al. 1996). Many vertebrate LTR retrotransposon fragments fall into the Tf1/sushi group (Miller et al. 1999). The mag/easel group contains easel, an LTR retrotransposon fragment from the chum salmon (Tristem et al. 1995), and some related fragments (Miller et al. 1999). Gmr1, which is one of the few full-length vertebrate LTR retrotransposons described to date, represents a distinct vertebrate retrotransposon lineage. Further elements from the Gmr1 lineage would assist phylogenetic analysis. As the analysis of retroelements continues, not only their abundance in genomes, but also the great plasticity apparent in their structure is becoming clearer. The discovery of a lineage of vertebrate Ty3/gypsy retrotransposons with a Ty1/copia-like POL domain order illustrates this plasticity. The structure of Gmr1 and related elements would almost certainly prevent their detection by methods employing redundant PCR primers corresponding to conserved sequences in the PRO and RT domains (Miller et al. 1999). It is therefore possible that the lineage may be widespread in vertebrates, given its occurrence in two divergent fish species and an amphibian.
Footnotes
Pekka Pamilo, Reviewing Editor
1 Keywords: vertebrate LTR retrotransposon, Gadus morhua, POL domain order
2 Address for correspondence and reprints: Margaret Butler, Department of Biochemistry, University of Otago, P.O. Box 56, Dunedin, New Zealand. E-mail: margi@sanger.otago.ac.nz.
Literature Cited
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Bowen, N. J., and J. F. McDonald. 1999. Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Res. 9:924–935.
Britten, R. J., T. J. McCormack, T. L. Mears, and E. H. Davidson. 1995. Gypsy/Ty3-class retrotransposons integrated in the DNA of herring, tunicate, and echinoderms. J. Mol. Evol. 40:13–24.
Cook, J. M., J. Martin, A. Lewin, R. E. Sinden, and M. Tristem. 2000. Systematic screening of Anopheles mosquito genomes yields evidence for a major clade of Pao-like retrotransposons. Insect Mol. Biol. 9:109–17.
Felsenstein, J. 1989. PHYLIP - phylogeny inference package (version 3.2). Cladistics 5:164–166.
Flavell, A. J., V. Jackson, M. P. Iqbal, I. Riach, and S. Wadell. 1995. Ty1-copia group retrotransposon sequences in amphibia and reptilia. Mol. Gen. Genet. 246:65–71.
Frame, I. G., J. F. Cutfield, and R. T. M. Poulter. 2001. New BEL-like LTR-retrotransposons in Fugu rubripes, Caenorhabditis elegans, and Drosophila melanogaster. Gene (in press).
Greene, J. M., H. Otani, P. J. Good, and I. B. Dawid. 1993. A novel family of retrotransposon-like elements in Xenopus laevis with a transcript inducible by two growth factors. Nucleic Acids Res. 21:2375–2381.
Hull, R. 1999. Classification of reverse transcriptase transcribing elements: a discussion document. Arch. Virol. 144:209–214.
Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.
Marin, I., and C. Llorens. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol. Biol. Evol. 17:1040–1049.
Marracci, S., R. Batistoni, G. Pesole, L. Citti, and I. Nardi. 1996. Gypsy/Ty3-like elements in the genome of the terrestrial Hydromantes (Amphibia, Urodela). J. Mol. Evol. 43:584–593.
Miller, K., C. Lynch, J. Martin, E. Herniou, and M. Tristem. 1999. Identification of multiple Gypsy LTR-retrotransposon lineages in vertebrate genomes. J. Mol. Evol. 49:358–366.
Pantazidis, A., M. Labrador, and A. Fontdevila. 1999. The retrotransposon Osvaldo from Drosophila buzzatii displays all structural features of a functional retrovirus. Mol. Biol. Evol. 16:909–921.
Peterson-Burch, B. D., D. A. Wright, H. M. Laten, and D. F. Voytas. 2000. Retroviruses in plants? Trends Genet. 16:151–152.
Poulter, R., and M. I. Butler. 1998. A retrotransposon family from the pufferfish (fugu) Fugu rubripes. Gene 215:241–249.
Pringle, C. R. 1999. Virus taxonomy - 1999. The universal system of virus taxonomy, updated to include the new proposals ratified by the International Committee on Taxonomy of Viruses during 1998. Arch. Virol. 144:421–429.
Roest Crollius, H., O. Jaillon, C. Dasilva et al. (12 co-authors). 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. 10:939–949.
Temin, H. M. 1980. Origin of retroviruses from cellular moveable genetic elements. Cell 21:599–600.
Tristem, M., P. Kabat, E. Herniou, A. Karpas, and F. Hill. 1995. Easel, a gypsy LTR-retrotransposon in the Salmonidae. Mol. Gen. Genet. 249:229–236.
Widholm, H., A. S. Lundback, A. Daggfeldt, B. Magnadottir, G. W. Warr, and L. Pilstrom. 1999. Light chain variable region diversity in Atlantic cod (Gadus morhua L.). Dev. Comp. Immunol. 23:231–240.
Wright, D. A., and D. F. Voytas. 1998. Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins. Genetics 149:703–715.
Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.