Department of Biochemistry, University of Otago, Dunedin, New Zealand
Correspondence: E-mail: timg{at}sanger.otago.ac.nz.
Abstract
Key Words: DIRS, Ngaro, retrotransposon, tyrosine recombinase
Introduction
Most retrotransposons belong to one of two major classes, commonly known as the long terminal repeat (LTR) retrotransposons, and the non-LTR retrotransposons (Eickbush and Malik 2002). A typical LTR retrotransposon consists of two directly repeated LTR sequences flanking a protein-coding internal region. The LTRs regulate transcription and play essential roles in the copying of the element's RNA into DNA. The internal region often contains two long open reading frames (ORFs). The first of these, gag, encodes proteins which combine to form a particle within which the reverse transcription reactions take place. The second, pol, encodes a polyprotein bearing various enzymatic activities: aspartic protease (PR), reverse transcriptase (RT), ribonuclease H (RH), and a DDE-type (Fayet et al. 1990; Capy et al. 1997) integrase (IN). Aspartic protease cleaves the initial polyprotein translation product into its active domains; RT and RH together catalyze the copying of the element's RNA into a double-stranded DNA; and IN is involved in the insertion of this DNA into the host genome.
Non-LTR retrotransposons (Xiong and Eickbush 1988) have structures and replication mechanisms quite distinct from those of LTR retrotransposons (Malik, Burke, and Eickbush 1999). The RTs of all known LTR and non-LTR retrotransposons are, however, clearly homologous, indicating that these two major classes are derived, at least in part, from a common ancestor (Xiong and Eickbush 1990).
LTR retrotransposons can be classified into a number of distinct groups on the basis of sequence similarity and various structural features (Xiong and Eickbush 1990; Malik and Eickbush 2001). Five major groups have been described to date. These are known as the Ty3/gypsy, vertebrate retrovirus, Ty1/copia, BEL, and DIRS groups. The members of the first four of these groups are generally similar in overall structure and replication mechanism. The major differences are that some have acquired additional protein-coding domains. For instance, some, most notably the exogenous members of the vertebrate retroviral lineage, have an additional ORF, known as env, which encodes proteins that allow these elements to be infectious. In other elements, the order of domains within the pol gene has been rearranged (Goodwin and Poulter 2002). Some of these groups are widespread in eukaryotes; for instance, members of both the Ty3/gypsy and Ty1/copia groups have been found in protists, plants, fungi, and animals. On the one hand, this widespread distribution probably reflects an origin of these elements early in eukaryote evolution. On the other hand, other groups such as BEL (metazoa) and the vertebrate retroviruses (vertebrates) have a more restricted distribution, suggesting that they either arose later in eukaryote evolution or that they have been lost from (or not yet detected in) several major eukaryote lineages.
The most distinctive LTR retrotransposons are the members of the DIRS group (Goodwin and Poulter 2001; Duncan et al. 2002). These elements differ in structure from all other LTR retrotransposons, encode a distinct complement of proteins, and have different replication mechanisms. DIRS1 itself (fig. 1A), from the slime mold Dictyostelium discoideum (Cappello, Handelsman, and Lodish 1985), consists of inverted terminal repeats flanking an internal region containing genes for a putative Gag protein, RT/RH, and a putative tyrosine recombinase (YR; Goodwin and Poulter 2001). It does not encode an aspartic protease or a DDE-type integrase. The 3' end of the internal region contains a sequence known as the internal complementary region (ICR), which is complementary to the extreme ends of the element and is believed to play an essential role in the replication cycle. Elements essentially identical in structure to DIRS1 have been found in vertebrates (e.g., TnDirs1 from the pufferfish Tetraodon nigroviridis; Goodwin and Poulter 2001). The DIRS1-like elements PAT from the nematode Panagrellus redivivus (de Chastonay et al. 1992) and kangaroo from the green alga Volvox carteri (Duncan et al. 2002) each encode a similar set of proteins to DIRS1, but they have direct, rather than inverted, repeats, and these are arranged in a nested fashion along the elements (fig. 1A). The only fungal DIRS element described to date, Prt1 from the zygomycete Phycomyces blakesleeanus (Ruiz-Perez, Murillo, and Torres-Martinez 1996), has inverted terminal repeats, although these are much shorter than those in DIRS1, and it lacks an ICR. The wide distribution of DIRS elements (plants, fungi, protists, and animals) suggests that the group arose early in eukaryote evolution.
In this report we first describe some interesting new DIRS-like elements, and the analysis of a novel protein domain present in all members of the DIRS group. We then report the detection and analysis of a number of novel and phylogenetically distinct tyrosine recombinase-encoding retrotransposons, which we refer to as the "Ngaro group," after an element from the zebrafish. Together, the DIRS and Ngaro groups of tyrosine recombinase-encoding retrotransposons contain elements with an impressive diversity of structures and a variety of unusual features, such as extensive overlapping ORFs, novel protein-coding domains, and spliceosomal introns.
Materials and Methods
Polymerase Chain Reactions (PCRs) and Sequencing
Danio rerio BAC clone CH211-287K8 in E. coli was obtained from BACPAC Resources, Bruce Lyon Memorial Research Building, Oakland, Calif. 94609, USA. BAC DNA was prepared by alkaline lysis followed by isopropanol precipitation. The sequence of the novel retrotransposon on this clone was obtained from a series of overlapping PCR products. The PCR primers were from Proligo (http://www.gensetoligos.com/). The PCRs were performed on an Eppendorf Mastercycler Gradient instrument using the Expand High Fidelity PCR system (Roche). Sequencing was performed using an ABI377 DNA Sequencer at the University of Otago.
Accession Numbers
Accession numbers of many elements can be found in Goodwin and Poulter (2001). Sources of additional elements are described in the paragraphs that follow.
Retrotransposons
Danio rerio: DrNgaro1, AY152729; DrNgaro2, AL591418 or AL929508; DrNgaro3, AL591180; DrNgaro4, AL603743; DrNgaro5, BX000452; DrDirs2, AL645756; DrDirs3, AL713862. Caenorhabditis briggsae: CbRecom2, AC084491. Chicken: CR1, AF308606. Chlamydomonas reinhardtii: TOC1, X56231; TOC2, AV393766, BI527265; TOC3, draft genome sequence (http://genome.jgi-psf.org/chlre1/chlre1.home.html) scaffold 2543 (4435971). Coprinopsis cinerea: CcNgaro1, AACS01000194 (94647100808); CcNgaro2, AACS01000053 (580020700); CcNgaro3, AACS01000092 (2571831848); CcNgaro4, AACS01000045 (192300199600). Fugu rubripes: Maui, AF086712. Lytechinus variegatus: LvNgaro1, AC131494; LvDirs1, AC131505. Phanerochaete chrysosporium: draft genome sequence (http://www.jgi.doe.gov/programs/whiterot.htm); PcNgaro1, scaffold66 (133564139703); PcNgaro2, scaffold72 (9020796024); PcNgaro3, scaffold256 (20807580), also a partial sequence from strain F99 in GenBank (AY453069); PcNgaro4, scaffold11 (391412394088); PcNgaro5, scaffold141 (10703139); PcNgaro6, scaffold141 (31905648); PcNgaro7, scaffold278 (15946326); PcNgaro8, scaffold5 (173870177833). Rhizopus oryzae: RoDirs1 assembled consensus sequence, http://biocadmin.otago.ac.nz/retrobase/home.htm. Strongylocentrotus purpuratus: SpPat1, SpNgaro1, and SpNgaro2 assembled consensus sequences, http://biocadmin.otago.ac.nz/retrobase/home.htm. Trypanosoma brucei: VIPER1Tb, AC007865; VIPER2Tb, AC087701. Trypanosoma cruzi: VIPER1Tc, Y09442; VIPER2Tc, AF052831; VIPER3Tc, AC114397. Turtle: PsCR1, AB005891. Volvox carteri: kangaroo, AY137241. Xenopus laevis: XlDirs1, BJ036703, BJ044614; XlNgaro1, BG163190, BJ040278, and others; XlNgaro2, BG553380, BJ040416, and others; XlNgaro5, BE575389, BJ040291, BJ039040, and others; XlNgaro6, BQ737268, BG515648; XlNgaro7, BJ039522. Xenopus tropicalis: XtDirs1, AC144974; XtDirs2, AC145807. Note: In an earlier publication (Goodwin, Butler, and Poulter 2003) LvNgaro1 was referred to as Goliath and several PcNgaro elements were referred to as Apollo.
Some of these elements are available in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under the following accession numbers (TPA): LvNgaro1, BK001253; DrNgaro2, BK001254; DrNgaro3, BK001255; DrNgaro4, BK001256; DrNgaro5, BK001717; CcNgaro1, BK001716; CcNgaro3, BK001748; LvDirs1, BK001257; DrDirs2, BK001258; DrDirs3, BK001259.
Enzymes Related to the Additional ORFs in LvNgaro1 and Some XlNgaro Elements
Escherichia coli: TEP-I, AE005230. Saccharomyces cerevisiae: IAH1, X82930. Homo sapiens: PAF-AH, BC007863. Arabidopsis thaliana: AY086102.
Bacteriophage Methyltransferases:
Haemophilus phage HP1, U24159; phage phi 4795, AJ487680; Salmonella typhimurium phage ST64B, AY055382; Shigella flexneri bacteriophage V (SfV), U82619.
Results
In an attempt to determine the true nature of TOC1 we searched the available C. reinhardtii expressed sequence tag (EST) and genomic sequence data for elements related to TOC1. We detected several EST sequences in GenBank (including AV393766 and BI527265) that appear to be derived from a C. reinhardtii TOC1-like element (which we name TOC2). This element contains repeat sequences very similar to those of TOC1, but it has an apparently intact coding region (not shown). Immediately upstream of one repeat sequence in TOC2 is a tyrosine recombinase gene, suggesting that TOC2 (and thus also TOC1) is indeed a DIRS-related element. Furthermore, we detected several full-length and relatively intact DIRS retrotransposons in the draft genome sequence of C. reinhardtii (http://genome.jgi-psf.org/chlre1/chlre1.home.html). One of these elements, which we name TOC3, is depicted in figure 1B. TOC3 has a very similar arrangement of repeat sequences to TOC1, suggesting that the two elements are related. The internal regions of TOC3 have the capacity to encode a protein with RT and RH domains similar in sequence to those of DIRS elements (see fig. 1 and fig. 2 in the Supplementary Material online) as well as a tyrosine recombinase (fig. 2A). It should be noted that numerous additional DIRS-related elements appear in the C. reinhardtii draft sequence; these will not be analyzed further here.
In addition to these new DIRS-like elements from plants, an unusual DIRS-like element was identified in the sea urchin Strongylocentrotus purpuratus. This element, SpPat1, corresponds to a full-length version of an element previously identified only by its recombinase gene (SpRecom8 in Goodwin, Butler, and Poulter 2003). A full-length consensus sequence (available online at http://biocadmin.otago.ac.nz/retrobase/home.htm) was obtained from the sea urchin WGS sequence section of GenBank (as described in Materials and Methods). The structure of this element is very similar to that of TOC3 (fig. 1B), even though one is from an animal and the other is from a plant.
Finally, it is interesting to note that we have detected elements closely related to the unusual Prt1 element from the zygomycetous fungus Phycomyces blakesleeanus, in the WGS sequences of the zygomycete Rhizopus oryzae. We constructed a consensus sequence (http://biocadmin.otago.ac.nz/retrobase/home.htm) of one of these elements, RoDirs1, and found it to have a structure very similar to that of typical DIRS elements; i.e., it contains inverted terminal repeats, an ICR, and long overlapping ORFs (not shown). The finding that RoDirs1 and Prt1 are closely related (see below) suggests that their active forms should have similar structures. The unusual structure of the degenerate Prt1 element (short inverted repeats and lack of an ICR) may therefore be the result of deletions since its last active transposition. A recombination between the ICR and the right terminal repeat sequence could have converted the structure found in RoDirs1 to that found in Prt1.
DIRS-Like Elements Encode a Methyltransferase-Like Domain
It was reported that the RT/RH ORFs of DIRS-like elements encode a conserved domain, C-terminal to the last conserved motif of RH, which is of unknown function (Duncan et al. 2002). This domain is apparently absent from all other known groups of retrotransposons. We found that this conserved domain is also present in the new DIRS elements described above (fig. 1). Most interestingly, we observed that this domain is similar in sequence to the DNA adenine methyltransferases encoded by various bacteriophages, such as SfV of Shigella flexneri (Allison et al. 2002) and ST64B from Salmonella enterica (Mmolawa, Schmieger, and Heuzenroeder 2003). From an alignment of the two sets of proteins (fig. 2B) it can be seen that the DIRS element proteins share a number of conserved residues with the phage methyltransferases. Indeed, the majority of residues conserved among the phage methyltransferases are also found in the retrotransposon proteins. Included among these is a motif similar in sequence to the conserved phage (D/N)PP(Y/F) motif known to be important in binding the S-adenosylmethionine substrate (Kossykh, Schlagman, and Hattman 1993). The observed level of sequence similarity suggests that the two sets of proteins are homologous and may have related activities. At present, the role of these putative methyltransferase domains in the replication of DIRS elements can only be the subject of speculation.
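As an illustration of how the (D/N)PP(Y/F) motif mentioned above could be located in a protein sequence, the following is a minimal Python sketch. The function name and the example peptide are hypothetical and are not part of the original analysis. Because the text describes the retrotransposon motif only as "similar in sequence" to the phage motif, an exact-match scan like this one may miss diverged variants.

```python
import re

# Regular-expression form of the (D/N)PP(Y/F) motif: D or N, then PP, then Y or F.
DPPY_MOTIF = re.compile(r"[DN]PP[YF]")


def find_dppy_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for every exact motif occurrence."""
    return [(m.start(), m.group()) for m in DPPY_MOTIF.finditer(protein.upper())]


# Made-up peptide for illustration only; it is not a real methyltransferase sequence.
example_peptide = "MKTLDPPYGAANPPFLS"
print(find_dppy_motifs(example_peptide))
```

A scan of this kind only flags candidate sites; an alignment such as the one in fig. 2B is still needed to judge whether the surrounding residues are conserved.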
A Novel Tyrosine Recombinase-Encoding Retrotransposon in Zebrafish
In our earlier report we mentioned a partial retrotransposon on sequence AL591418 from the zebrafish Danio rerio (Goodwin and Poulter 2001). This element was truncated at its 5' end and contained a corrupt RT/RH gene followed by a relatively intact recombinase gene. The recombinase encoded by this element appeared to be phylogenetically distinct from the known DIRS-like recombinases, suggesting that the element might be a different type of retrotransposon. The truncated nature of the sequence, however, prevented a thorough analysis. Recently, a large amount of zebrafish genomic sequence data has become available (http://www.sanger.ac.uk/Projects/D_rerio/). We have screened the new data for sequences related to the atypical element on entry AL591418 and can now provide a detailed description of these retrotransposons.
Using the predicted protein sequences of the AL591418 element as queries in TBLASTN searches of the available zebrafish sequence (August 2002) we detected several distinct families of related retrotransposon-like elements. Most of these elements appeared to be somewhat corrupt, having a variable number of frameshifts and premature stop codons within their ORFs, and/or large insertions and deletions (not shown), which made analyses of these elements difficult. A partial sequence of an element which appeared to be intact was, however, detected in an unfinished BAC sequence (AL844899) in the HTGS division of GenBank. To facilitate the analysis of these elements, we obtained the relevant BAC clone (CH211-287K8; BACPAC Resources) and the element was completely sequenced. We have named elements of this general type (see below) Ngaro elements, after the New Zealand Maori word for fly. We refer to this specific element (Accession Number AY152729) as DrNgaro1 (Danio rerio Ngaro element no. 1).
The termini of DrNgaro1 (and related elements; see below) were defined by a series of comparisons (not shown) between multiple (>30) different copies of these elements at different genomic loci, and by comparisons between the sequences flanking the various elements and the sequences of related empty sites. This work revealed that the termini of DrNgaro1 elements comprise a set of nested direct repeats, and that the ends of these repeats correspond precisely to the ends of the elements (fig. 1C). The arrangement of repeats is similar to that in PAT and TOC1, etc., and we refer to them as A1, B1, A2, and B2. In the case of DrNgaro1 the A repeats are 157 bp long and are 100% identical. The B repeats are 158 bp long and are also 100% identical. The A and B repeats share no significant sequence similarity with each other.
The sequenced copy of DrNgaro1 contains three long ORFs. These are all located between the A1 and B1 repeats (fig. 1C). The first ORF possibly encodes a Gag-like protein, as it is in a similar position to the gag ORFs of other retrotransposons, and the predicted translation product contains three zinc finger-like motifs (Cx2Cx4Hx4C, Cx7Cx2HxC, and Cx2Cx3Hx4C) similar to those commonly found in Gag proteins. The second ORF encodes a protein with RT and RH domains (see fig. 1 and fig. 2 in the Supplementary Material online). The third ORF encodes a putative tyrosine recombinase bearing the conserved tetrad of catalytic residues (RHRY; Nunes-Duby et al. 1998) that is characteristic of these proteins (fig. 2A). Comparisons (not shown) between our sequence of DrNgaro1 and 20 additional sequences from DrNgaro1 elements available in the zebrafish WGS sequence data (available via the NCBI) suggest that the observed frameshifts between the putative Gag and RT/RH-encoding regions (fig. 1C) are mutations and that active elements most likely have these two coding regions within a single ORF. In contrast, similar comparisons reveal that most elements have the YR-encoding ORF in a different reading frame to the upstream RT/RH ORF. The fact that these two ORFs overlap (fig. 1C) raises the possibility that translation of the YR ORF is achieved by programmed ribosomal frameshifting, as has been shown for other retrotransposons, such as Ty1 (Belcourt and Farabaugh 1990) and Ty3 (Farabaugh, Zhao, and Vimaladithan 1993).
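The zinc finger-like motifs are written in a spacing shorthand in which "xN" stands for N arbitrary residues and a bare "x" for a single residue. The sketch below shows one way such shorthand could be turned into regular expressions and used to scan a predicted protein. The helper names and the test peptide are hypothetical; this is not the procedure used to annotate DrNgaro1.

```python
import re

# The three motifs reported for the putative Gag protein of DrNgaro1.
ZINC_FINGER_MOTIFS = ["Cx2Cx4Hx4C", "Cx7Cx2HxC", "Cx2Cx3Hx4C"]


def shorthand_to_regex(pattern: str) -> re.Pattern:
    """Convert shorthand such as 'Cx7Cx2HxC' into a compiled regular expression.

    Upper-case letters are literal residues; 'xN' means N arbitrary residues,
    and a bare 'x' means exactly one arbitrary residue.
    """
    parts = []
    for residue, _x, count in re.findall(r"([A-Z])|(x)(\d*)", pattern):
        if residue:
            parts.append(residue)
        else:
            parts.append(".{%d}" % (int(count) if count else 1))
    return re.compile("".join(parts))


def scan_for_motifs(protein: str) -> list[tuple[str, int]]:
    """Return (motif shorthand, 0-based start) for every match in the protein."""
    hits = []
    for motif in ZINC_FINGER_MOTIFS:
        for match in shorthand_to_regex(motif).finditer(protein):
            hits.append((motif, match.start()))
    return hits


# Made-up peptide for illustration only.
print(scan_for_motifs("AACGGCAAAAHKKKKCAA"))
```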
Additional DrNgaro Elements in Zebrafish
As alluded to above, DrNgaro1 is a member of one of several distinct families of related elements in the zebrafish genome. These families have often diverged quite considerably in sequence. For instance, DrNgaro1 shares 60% amino acid sequence identity with the element on sequence AL591418 (DrNgaro2) over the highly conserved RT/RH region. The overall structures of these elements (examples in fig. 1C) are, however, generally very similar to that of DrNgaro1, indicating that this structure is required for activity of these elements.
Interestingly, some elements have similar repeat structures to that of DrNgaro1, and even have sequence similarity with the nested repeats of DrNgaro1, yet do not appear to have any protein-coding capabilities. An example of such an element is DrNgaro3 (AL591180; fig. 1C). It is possible that these elements are non-autonomous, but can still transpose when supplied with the appropriate proteins in trans, as is the case with, for example, rodent VL30 elements (Carter et al. 1986; Adams et al. 1988). Other DrNgaro elements appear to have suffered internal deletions (DrNgaro4), while others may still be intact (DrNgaro5).
We also found many examples in zebrafish of Ngaro-like elements that appear to consist of just the A2B2 region (not shown). It is possible that these elements result from homologous recombination between the A1 and A2 repeats of full-length elements. Such an event would result in the loss of an element's coding region, the B1 repeat, and a total of one copy of the A repeat. Such structures may be analogous to the so-called solo LTRs that are thought to result from recombination between the LTRs of conventional LTR retrotransposons.
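The proposed deletion can be pictured with a toy model in which an element is an ordered list of labelled segments and recombination between two homologous direct repeats removes everything from the first repeat up to, but not including, the second. The Python sketch below is purely illustrative: the function name and segment labels are hypothetical, and it does not model the actual sequence crossover.

```python
def recombine_direct_repeats(segments: list[str], first: str, second: str) -> list[str]:
    """Model homologous recombination between two direct repeats.

    The interval from `first` up to (but not including) `second` is lost,
    leaving a single repeat copy. In reality the surviving copy would be a
    hybrid of the two repeats; here it simply keeps the label of `second`.
    """
    i, j = segments.index(first), segments.index(second)
    if i >= j:
        raise ValueError("first repeat must lie upstream of second repeat")
    return segments[:i] + segments[j:]


# Full-length element with nested direct repeats, as described for DrNgaro1.
full_length = ["A1", "coding", "B1", "A2", "B2"]

# Recombination between the A repeats leaves only the A2B2-like region.
print(recombine_direct_repeats(full_length, "A1", "A2"))
```

In this model the lost interval is one A repeat copy, the coding region, and B1, which matches the losses listed above.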
The overall structures of DrNgaro elements appear to be similar to those of TOC3 and SpPat1 (fig. 1). In contrast, the RT and RH domains of DrNgaro elements are not particularly similar in sequence to any known DIRS-like elements (see fig. 1 and fig. 2 in the Supplementary Material online), and appear to be phylogenetically distinct from DIRS elements (see below). In addition, none of the DrNgaro elements contains the methyltransferase-like domain which was found in all the DIRS elements. These differences suggest that, despite some structural similarities, Ngaro elements might belong to a major distinct group of tyrosine recombinase-encoding retrotransposons. To test this possibility, we sought to detect additional Ngaro-like elements in other species.
Ngaro-Like Retrotransposons Are Present in a Variety of Animals and Fungi
Animals
Ngaro-like elements were detected in EST sequences from the amphibians Xenopus laevis and Xenopus tropicalis. Although we could not compile a full-length sequence of any of these elements using the available data, by combining overlapping EST sequences we managed to assemble several full-length X. laevis recombinase genes (e.g., XlNgaro1; fig. 2A), as well as RT and RH coding domains (XlNgaro6 and 7; not shown), which are clearly similar in sequence to the zebrafish Ngaro elements.
Preliminary results suggest that, as expected, these XlNgaro elements contain repeat sequences downstream of their recombinase genes, although the exact nature of these repeats has not yet been determined. Interestingly, however, these elements each contain an additional ORF (relative to DrNgaro elements) between the end of the recombinase gene and the start of the repeat sequences. The products of these ORFs are highly conserved among the different elements. Sequence comparisons (fig. 2C) suggest that these ORFs encode proteins most closely related to the C-terminal halves of the ORF1 proteins of several vertebrate members of the CR1 clade of non-LTR retrotransposons, such as CR1 from the chicken (Haas et al. 1997) and Maui from the pufferfish Fugu rubripes (Poulter, Butler, and Ormandy 1999). As has recently been noted by Kapitonov and Jurka (2003), the ORF1 proteins of these non-LTR retrotransposons are related to a variety of cellular enzymes, including TEP-I (thioesterase) from E. coli (Huang et al. 2001), brain platelet-activating factor acetyl hydrolase (PAF-AH) from the cow (Ho et al. 1997), and the isoamyl acetate hydrolyzing esterase (IAH1) from Saccharomyces cerevisiae (Fukuda et al. 2000). Several of these related enzymes, including brain PAF-AH (Ho et al. 1997) and TEP-I (Huang et al. 2001), have been experimentally characterized and found to possess catalytic triads containing conserved serine, aspartate, and histidine residues. The proteins encoded by the retrotransposons contain the same conserved residues, and these appear within similar contexts and with similar spacing (fig. 2C). These proteins may therefore have some related enzymatic function. We shall consider their possible roles in the Discussion.
A Ngaro-like element was found in a sequence from the sea urchin Lytechinus variegatus (AC131494: Davidson et al. 2002). This element, which we call LvNgaro1, has a similar overall structure to the zebrafish DrNgaro elements (fig. 1D). LvNgaro1 also has an ORF between the recombinase gene and the B1 repeat which encodes a protein related to those of the similarly located ORFs in the XlNgaro elements (ORF4 in fig. 1D; fig. 2C). Like the zebrafish Ngaro elements, LvNgaro1 does not contain the methyltransferase-like domain characteristic of DIRS elements.
Ngaro-like elements were also identified in the WGS sequences of the sea urchin S. purpuratus. A full-length consensus sequence of one of these, SpNgaro1, and a partial sequence of another, SpNgaro2, were assembled. These elements have overall structures (not shown) similar to those of the intact zebrafish DrNgaro elements.
Small fragments of Ngaro-like RT genes were detected in sequences from several mammals including humans (AC092110, nucleotides 1902619346), rats (AC094494), and mice (AC100020). These are just small, corrupt fragments of Ngaro-like RT genes, however, and are likely to be ancient "fossil" sequences.
Fungi
Full-length and apparently intact Ngaro-like retrotransposons were detected in the draft genome sequences of the basidiomycetes Phanerochaete chrysosporium (white rot fungus; http://www.jgi.doe.gov/programs/whiterot.htm) and Coprinopsis cinerea (mushroom; WGS sequencing project accession no. AACS00000000). We refer to these elements as PcNgaro and CcNgaro, respectively. These fungal elements have overall structures similar to those of the metazoan elements described above, with nested direct repeats and a similar complement of coding domains, albeit with a few interesting differences as discussed below. Most of these fungal elements are somewhat degenerate, with occasional frameshift and nonsense mutations. In addition, several elements have been disrupted by the insertion of other transposable elements, and some appear to have undergone recombinations between their repeat sequences so that they now have structures of the form A-[coding region]-B or AB.
In contrast to the metazoan Ngaro elements described above, some of the fungal Ngaro elements, such as CcNgaro1 (fig. 1D), have their RT/RH ORF entirely overlapped by the 5' end of the recombinase ORF. This arrangement appears to be similar to that seen in the original DIRS elements (fig. 1A), where it is thought to provide a mechanism for translating the recombinase ORF in such a way that the recombinase does not end up being covalently attached to the RT/RH protein. Despite the apparent similarity in the arrangement of these overlapping ORFs, it appears that they have arisen independently, rather than by descent. This is indicated by the finding that in the DIRS-like elements the overlapping recombinase ORF is in the −1 phase relative to the RT/RH ORF (fig. 1A, and not shown), whereas in the fungal Ngaro elements with overlapping ORFs, the recombinase ORF is in the +1 phase relative to the RT/RH ORF (fig. 1D, and not shown).
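The relative phase of two overlapping ORFs on the same strand can be worked out from the positions of their first codons. The sketch below assumes one common labelling convention, which the text does not state explicitly: a downstream ORF whose codons are displaced by one nucleotide toward the 3' end is called +1, and one displaced by two nucleotides (equivalently, one nucleotide toward the 5' end) is called −1. The coordinates in the example are hypothetical.

```python
def relative_phase(upstream_start: int, downstream_start: int) -> int:
    """Return the reading phase of a downstream ORF relative to an upstream ORF.

    Both arguments are coordinates of the first nucleotide of a codon in each
    ORF, on the same strand. The result is 0 (same frame), +1, or -1 under the
    convention assumed here (offset of 1 nt -> +1, offset of 2 nt -> -1).
    """
    offset = (downstream_start - upstream_start) % 3
    return {0: 0, 1: 1, 2: -1}[offset]


# Hypothetical coordinates, for illustration only.
print(relative_phase(100, 401))  # codons displaced by 1 nt
print(relative_phase(100, 399))  # codons displaced by 2 nt
```

Two overlapping arrangements that look alike on a diagram can therefore be told apart by this single value, which is the basis of the argument for independent origins.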
A second unusual feature of some of the fungal Ngaro elements is the apparent presence of spliceosomal introns. These have been identified at the same site in the RT/RH ORF of some Ngaro elements from both P. chrysosporium and C. cinerea (e.g., CcNgaro3; fig. 1D), and also at an additional site in the YR ORF of some of the P. chrysosporium elements (fig. 3). Introns have been detected in retroelements previously; for instance, splicing is performed in vertebrate retroviruses to permit the production of envelope and accessory proteins (for a review, see Rabson and Graves 1997). In these cases, however, the excised introns actually contain protein-coding sequences, and so are unlike typical spliceosomal introns. Noncoding introns have recently been detected in some retroelements (Arkhipova et al. 2003), including the DIRS element kangaroo (Duncan et al. 2002), although they are exceptionally rare and not well understood. The presence of spliceosomal introns in fungal Ngaro elements was first suggested when comparisons (not shown) of various related elements detected the presence of poorly conserved regions, often of variable length and containing frameshifts and stop codons, in otherwise well-conserved ORFs. Comparisons of the putative protein products of these elements revealed that these poorly conserved regions appear as inserts relative to the predicted protein products encoded by other elements. More in-depth analyses revealed that the inserts are flanked by the canonical 5'-GT...AG-3' nucleotides of spliceosomal introns. The assignment of these inserts as introns has been essentially confirmed by the identification of intron-less copies of some elements (e.g., PcNgaro6 in fig. 3).
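The two observations used here, that the inserts begin with GT and end with AG and that they interrupt otherwise open reading frames, can be expressed as simple checks. The following Python sketch uses hypothetical function names and made-up sequences; it illustrates the logic only and is not the analysis pipeline used for the fungal elements.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_splice_sites(insert: str) -> bool:
    """True if the candidate insert starts with GT and ends with AG."""
    seq = insert.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def count_internal_stops(orf: str) -> int:
    """Count in-frame stop codons in frame 0, ignoring the final codon."""
    seq = orf.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    return sum(codon in STOP_CODONS for codon in codons)


# Made-up sequences for illustration only.
exon1, candidate_intron, exon2 = "ATGGCT", "GTAAGTTGACAG", "GAAAAATAA"

unspliced = exon1 + candidate_intron + exon2
spliced = exon1 + exon2

print(has_canonical_splice_sites(candidate_intron))
print(count_internal_stops(unspliced), count_internal_stops(spliced))
```

In this toy case the insert carries an in-frame stop codon that disappears once the insert is removed, mirroring the pattern of stop codons and frameshifts confined to the poorly conserved regions.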
Ngaro-like elements were not detected in several eukaryotes whose genomes have been nearly completely sequenced, such as Drosophila melanogaster, Anopheles gambiae, Caenorhabditis elegans, Arabidopsis thaliana, and Saccharomyces cerevisiae. Similarly, only small fragments were detected in the pufferfish Fugu rubripes, although these elements were abundant in the zebrafish. It thus appears that Ngaro-like elements, although found in a variety of animals and fungi, have a somewhat patchy distribution.
Evolutionary Analyses
To examine the relationship between Ngaro and DIRS elements, and between the YR- and DDE-IN-encoding LTR retrotransposons, phylogenetic trees were constructed, based on an alignment of RT and RH sequences. The alignment included representative sequences from the various groups of LTR retrotransposons, as well as sequences from the related hepadnaviruses and caulimoviruses. We also included some trypanosome sequences which are similar to the VIPER elements of Trypanosoma cruzi. VIPERs are recently described elements that contain RT and RH domains similar in sequence to those of LTR retrotransposons (Vazquez et al. 2000). Little else is known about these elements, however.
A tree obtained by the Neighbor-Joining method is shown in figure 4. The overall topology of this tree is similar to that of trees reported previously (e.g., Malik and Eickbush 2001), and the major groups of elements are all resolved and receive high levels of bootstrap support. On the tree, the Ngaro-like elements form a monophyletic group within the LTR retrotransposons, which is well-supported by bootstrap resampling (99% support). Within the Ngaro group, the metazoan elements form a monophyletic group (100% support). The fungal Ngaro elements which contain introns (PcNgaro1–3 and CcNgaro3 and 4) also group together, as do those that contain the long overlapping ORFs (CcNgaro1 and 2).
Perhaps the most interesting aspect of this tree is that it does not group together the Ngaro and the DIRS-like elements. Similar results were also obtained on trees constructed by alternative methods such as maximum parsimony and on trees based on separate alignments of RT and RH sequences (not shown). These findings suggest that the Ngaro and DIRS-like elements should be considered as distinct groups of YR retrotransposons. It should be noted, however, that these trees, while they do not indicate a close relationship between the Ngaro and DIRS groups, also do not rule out a sister group relationship between them (i.e., a monophyletic origin). This is because the nodes connecting well-established major groups of elements seldom receive convincing levels of support. This is likely a result of the divisions between major groups representing very ancient events (prior to the divergence of major eukaryotic lineages). Coupled with a high rate of evolution in retroelements, this has likely obscured the relationships among the various groups. Overall, the findings suggest that either the Ngaro and DIRS groups had separate origins, or alternatively, that they had a common but very ancient origin, with the divergence between the two groups occurring prior to the separation of the ancestors of animals and fungi (because representatives of each group are found in both of these kingdoms).
Trees based on an alignment of tyrosine recombinase sequences (the regions encompassing the RHRY tetrad, as shown in fig. 2A) were also constructed (not shown). Although these trees generally have low resolution, because of the great diversity among tyrosine recombinase sequences, they also separate the Ngaro and DIRS1-like elements. The trees also suggest a possible relationship between the YR retrotransposons and the Crypton group of YR-encoding DNA transposons, as discussed previously (Goodwin, Butler, and Poulter 2003).
Discussion
In this report we have identified interesting features of both the DIRS and Ngaro groups which seem worthy of further analysis. For instance, the finding that spliceosomal introns occur in members of both the DIRS and Ngaro groups is clearly a discovery worthy of additional study, and one which may have implications for the transposition mechanisms of these elements. Likewise, the presence of extensive, and independently evolved, overlapping ORFs is a matter of considerable evolutionary interest.
Also of interest is the finding that all elements of the DIRS group encode a domain with striking sequence similarity to the methyltransferases of a number of bacteriophages. The level of sequence similarity between the retrotransposon proteins and the phage methyltransferases suggests that they are homologous and may have similar activities. A methyltransferase domain has not, to our knowledge, been recognized in any retroelement before, and its role in the replication of these retrotransposons is unknown at present. It does not seem unreasonable, however, that modification of nucleic acids by methylation could serve an important role at some stage in an element's life cycle.
Similarly, several deuterostome members of the Ngaro group were found to contain a gene lying downstream of the YR gene that codes for a protein of unknown function. The predicted products of these additional genes are most similar in sequence to the C-terminal halves of the ORF1 proteins of several vertebrate members of the CR1 clade of non-LTR retrotransposons. These retrotransposon proteins all share several blocks of conserved sequences with a variety of cellular enzymes, including the critical Ser-Asp-His residues of the enzymes' catalytic triads. The sequence similarities and levels of conservation suggest that the retrotransposon proteins have some enzymatic function. The presence in all these proteins of Ser-Asp-His catalytic triads, and structural considerations (Huang et al. 2001), suggest that they may belong to the α/β hydrolase fold family, a very large collection of structurally related enzymes characterized by catalytic triads in which the catalytic residues always appear in the order nucleophile (often Ser)-acid (often Asp)-His (Ollis et al. 1992; Schrag and Cygler 1997; Nardini and Dijkstra 1999). Members of the α/β hydrolase fold family are very diverse in sequence (often with no detectable sequence similarity apart from the catalytic residues), and they have a diverse range of functions, as esterases, lipases, proteases, peroxidases, and dehalogenases, among others. The retrotransposon proteins are not highly similar in sequence to any of these enzymes, but they do appear to be most closely related to a particular group of enzymes first described by Upton and Buckley (1995). This group includes TEP-I (thioesterase) from E. coli (Huang et al. 2001), brain PAF-AH (acetylhydrolase) from the cow (Ho et al. 1997), and numerous other enzymes variously described as, for example, lipases, thioesterases, and arylesterases. The diverse activities of these enzymes and the low level of sequence similarity mean that it is not a trivial task to assign a function to the putative retrotransposon enzymes. On the basis of several sequence comparisons, Kapitonov and Jurka (2003), in their paper reporting the identification of these genes in non-LTR retrotransposons, concluded that the proteins function as esterases, and they speculated that they might be involved in the penetration of cell membranes and the horizontal transmission of these elements. We suggest an alternative possibility: that these proteins might cleave peptide bonds and play a role in protein processing and maturation, analogous to the roles played by aspartic proteases in many LTR retrotransposons. However, accurate assignment of function to these proteins will most likely have to wait until their experimental characterization.
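The ordering constraint on the catalytic triad described above (nucleophile, then acid, then His along the polypeptide) can be written as a simple check. The following is only an illustrative sketch: the function name, the toy sequence, and the residue positions are hypothetical and are not taken from any of the proteins discussed here.

```python
def has_triad_order(seq, nucleophile, acid, his):
    """Return True if the annotated catalytic residues occur in the
    order nucleophile -> acid -> His along the sequence.

    seq          : protein sequence in one-letter code
    nucleophile,
    acid, his    : 0-based positions of the annotated residues
    """
    # The third member of the triad must actually be a histidine.
    if seq[his] != "H":
        return False
    # The three positions must be strictly increasing.
    return nucleophile < acid < his


# Hypothetical toy sequence (not a real protein): Ser at 3, Asp at 13, His at 16.
toy = "MGDSLSDTGNAAADGLHPS"
print(has_triad_order(toy, nucleophile=3, acid=13, his=16))
```

A check of this kind only tests the linear order of annotated residues; it says nothing about whether the residues are positioned to form a functional active site, which, as noted above, would need experimental characterization.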
In their original report, Cappello, Handelsman, and Lodish (1985) proposed a detailed model for the replication of the DIRS1 element. This model, based on analyses of the repeat sequences within DIRS1 and knowledge of the transcripts it produces, involves interactions between all the identified repeat sequences within the DIRS1 element, and results in a double-stranded circular DNA molecule. The model neatly explains many of the puzzling features of DIRS1, such as why some copies of the repeats are actually not perfectly complementary. Our subsequent finding, that DIRS-like elements from organisms as distantly related as vertebrates and slime molds have virtually identical structures, is consistent with the proposed model. Furthermore, the finding that DIRS-like elements all encode apparent tyrosine recombinases is consistent with the end-product of replication being a circular molecule, as this would be a more appropriate substrate for the reactions catalyzed by a tyrosine recombinase than a linear molecule. The model does not cover all aspects of DIRS1 replication, however. For instance, it does not specify what the primer of minus-strand DNA synthesis might be.
The replication of Ngaro and PAT-like elements must differ in some respects from the replication of DIRS1, as Ngaro and PAT elements all contain direct, rather than inverted and complementary, repeats. We propose in figure 5 a model for the replication of these elements. As with the replication of other retrotransposons, the first step in this model is the production of a slightly less-than-full-length transcript, beginning within the A1 repeat and terminating within the B2 repeat. This RNA molecule is then copied into minus-strand DNA by reverse transcriptase. The primer for this reaction is not known, but presumably it acts at or near the 3' end of the RNA. Subsequently, the RNA of the RNA/DNA hybrid is degraded by the element's RH, leaving an almost-full-length minus-strand DNA molecule. The ends of this molecule correspond to partial copies of the A1 and B2 repeats. These repeats could anneal to the internal B1/A2 region of one of several potential nucleic acid species, such as a second full-length RNA (or a partial RNA containing just the B1/A2 region, or a plus-strand DNA synthesized using the minus-strand B1/A2 region as a template). Binding of the ends of the minus-strand DNA to this second molecule would form a largely single-stranded, gapped, circular molecule. The gap in the minus-strand DNA could be closed using the B1/A2 region as a template, followed by ligation to form a circle. Plus-strand DNA synthesis could then proceed using the minus-strand DNA as a template, and perhaps an RNA fragment resistant to RH-digestion as a primer. The final result would be a full-length, double-stranded, circular molecule which could be inserted into the host genome by recombination using the encoded tyrosine recombinase.
This model thus (1) allows for the synthesis of a full-length DNA molecule from a slightly less-than-full-length RNA, (2) results in the production of a circular, double-stranded, DNA, (3) provides a primer for plus-strand DNA synthesis, and (4) utilizes all the identified repeat structures within the Ngaro and PAT-like elements. The model shares some features with the model for the replication of DIRS1, such as the use of internal copies of the element's termini to regenerate the ends and circularize the resulting DNA; but differs in other respects, such as requiring a second molecule as the template for regenerating the ends, rather than the internal region of the same molecule. The model also has some similarities with the replication of conventional LTR retrotransposons, such as strand transfers and the use of partially degraded RNA as a plus-strand primer. Several predictions can be derived from the model, opening it up for experimental testing.
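The central bookkeeping of the proposed model, that a slightly less-than-full-length copy can be restored to a full-length circle by using the internal B1/A2 region as a template, can be illustrated with a toy string model. This is a minimal sketch under several assumptions that are not part of the model itself: the element is laid out as A1, internal region, B1, A2, internal region, B2; all sequences are hypothetical placeholders; only one strand is tracked (strand complementarity, priming, and the enzymes involved are ignored); and the function names are ours.

```python
def simulate_gap_repair(a, b, internal1, internal2, trim5, trim3):
    """Toy model of end regeneration using the internal B1/A2 junction.

    a, b       : sequences of the A and B repeats (hypothetical)
    internal1  : sequence between A1 and B1 (assumed layout)
    internal2  : sequence between A2 and B2 (assumed layout, may be empty)
    trim5      : bases of A1 missing from the 5' end of the transcript
    trim3      : bases of B2 missing from the 3' end of the transcript
    """
    # Assumed linear layout of the element: A1 - internal1 - B1 - A2 - internal2 - B2
    element = a + internal1 + b + a + internal2 + b

    # Transcript begins within A1 and terminates within B2.
    transcript = element[trim5:len(element) - trim3]

    # The copy therefore ends in partial repeats.
    left_end = a[trim5:]              # partial A1
    right_end = b[:len(b) - trim3]    # partial B2

    # Internal B1/A2 junction of a second molecule acts as the template.
    junction = b + a
    assert junction.startswith(right_end) and junction.endswith(left_end)

    # Bases of the junction not covered by the annealed ends fill the gap.
    gap_fill = junction[len(right_end):len(junction) - len(left_end)]

    # Closing the gap joins the 3' end back to the 5' end: a circle,
    # represented here as a linear string read once around.
    circle = transcript + gap_fill
    return element, transcript, circle


def is_rotation(x, y):
    """True if circular sequence x is the same circle as y (any start point)."""
    return len(x) == len(y) and x in y + y


element, transcript, circle = simulate_gap_repair(
    a="ACGTTGCA", b="GGATCCTA",
    internal1="TTTTCCCCGGGG", internal2="CATG",
    trim5=3, trim3=2,
)
print(len(element), len(transcript), len(circle))
print(is_rotation(circle, element))
```

In this sketch the repaired circle contains every base of the full-length element even though the transcript lacked part of each terminal repeat, which corresponds to point (1) of the model. The sketch does not address the identity of the second template molecule or the primers, which the model leaves open.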
![]() |
Supplementary Material |
---|
![]() |
Acknowledgements |
---|
![]() |
Footnotes |
---|
![]() |
Literature Cited |
---|
Adams, S. E., P. D. Rathjen, C. A. Stanway, S. M. Fulton, M. H. Malim, W. Wilson, J. Ogden, L. King, S. M. Kingsman, and A. J. Kingsman. 1988. Complete nucleotide sequence of a mouse VL30 retro-element. Mol. Cell. Biol. 8:2989-2998.
Allison, G. E., D. Angeles, N. Tran-Dinh, and N. V. Verma. 2002. Complete genomic sequence of SfV, a serotype-converting temperate bacteriophage of Shigella flexneri. J. Bacteriol. 184:1974-1987.
Aparicio, S., J. Chapman, and E. Stupka, et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301-1310.
Arkhipova, I. R., K. I. Pyatkov, M. Meselson, and M. B. Evgen'ev. 2003. Retroelements containing introns in diverse invertebrate taxa. Nat. Genet. 33:123-124.
Belcourt, M. F., and P. J. Farabaugh. 1990. Ribosomal frameshifting in the yeast retrotransposon Ty: tRNAs induce slippage on a 7 nucleotide minimal site. Cell 62:339-352.
Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. Pp. 343-435 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Cappello, J., K. Handelsman, and H. F. Lodish. 1985. Sequence of Dictyostelium DIRS-1: an apparent retrotransposon with inverted terminal repeats and an internal circle junction sequence. Cell 43:105-115.
Capy, P., T. Langin, D. Higuet, P. Maurer, and C. Bazin. 1997. Do the integrases of LTR-retrotransposons and class II element transposases have a common ancestor? Genetica 100:63-72.
Carter, A. T., J. D. Norton, Y. Gibson, and R. J. Avery. 1986. Expression and transmission of a rodent retrovirus-like (VL30) gene family. J. Mol. Biol. 188:105-108.
Davidson, E. H., J. P. Rast, and P. Oliveri, et al. (25 co-authors). 2002. A provisional regulatory gene network for specification of endomesoderm in the sea urchin embryo. Dev. Biol. 246:162-190.
Day, A., and J. D. Rochaix. 1991. A transposon with an unusual LTR arrangement from Chlamydomonas reinhardtii contains an internal tandem array of 76 bp repeats. Nucleic Acids Res. 19:1259-1266.
Day, A., M. Schirmer-Rahire, M. R. Kuchka, S. P. Mayfield, and J.-D. Rochaix. 1988. A transposon with an unusual arrangement of long terminal repeats in the green alga Chlamydomonas reinhardtii. EMBO J. 7:1917-1927.
de Chastonay, Y., H. Felder, C. Link, P. Aeby, H. Tobler, and F. Muller. 1992. Unusual features of the retroid element PAT from the nematode Panagrellus redivivus. Nucleic Acids Res. 20:1623-1628.
Duncan, L., K. Bouckaert, F. Yeh, and D. L. Kirk. 2002. kangaroo, a mobile element from Volvox carteri, is a member of a newly recognised third class of retrotransposons. Genetics 162:1617-1630.
Eickbush, T. H., and H. S. Malik. 2002. Origins and evolution of retrotransposons. Pp. 1111-1144 in N. L. Craig, R. Craigie, M. Gellert, and A. M. Lambowitz, eds. Mobile DNA II. ASM Press, Herndon, Va.
Farabaugh, P. J., H. Zhao, and A. Vimaladithan. 1993. A novel programed frameshift expresses the POL3 gene of retrotransposon Ty3 of yeast: frameshifting without tRNA slippage. Cell 74:93-103.
Fayet, O., P. Ramond, P. Polard, M. F. Prere, and M. Chandler. 1990. Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences? Mol. Microbiol. 4:1771-1777.
Fukuda, K., Y. Kiyokawa, T. Yanagiuchi, Y. Wakai, K. Kitamoto, Y. Inoue, and A. Kimura. 2000. Purification and characterization of isoamyl acetate-hydrolyzing esterase encoded by the IAH1 gene of Saccharomyces cerevisiae from a recombinant Escherichia coli. Appl. Microbiol. Biotechnol. 53:596-600.
Galtier, N., M. Gouy, and C. Gautier. 1996. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12:543-548.
Genetics Computer Group. 1994. Program manual for the Wisconsin package. Version 8. Genetics Computer Group, Madison, Wis.
Goodwin, T. J. D., M. I. Butler, and R. T. M. Poulter. 2003. Cryptons: a group of tyrosine-recombinase-encoding DNA transposons from pathogenic fungi. Microbiology 149:3099-3109.
Goodwin, T. J. D., and R. T. M. Poulter. 2001. The DIRS1 group of retrotransposons. Mol. Biol. Evol. 18:2067-2082.
Goodwin, T. J. D., and R. T. M. Poulter. 2002. A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics 267:481-491.
Haas, N. B., J. M. Grabowski, A. B. Sivitz, and J. B. E. Burch. 1997. Chicken repeat 1 (CR1) elements, which define an ancient family of vertebrate non-LTR retrotransposons, contain two closely spaced open reading frames. Gene 197:305-309.
Ho, Y. S., L. Swenson, and U. Derewenda, et al. (12 co-authors). 1997. Brain acetylhydrolase that inactivates platelet-activating factor is a G-protein-like trimer. Nature 385:89-93.
Huang, Y-T., Y-C. Liaw, V. Y. Gorbatyuk, and T. H. Huang. 2001. Backbone dynamics of Escherichia coli Thioesterase/Protease I: evidence of a flexible active-site environment for a serine protease. J. Mol. Biol. 307:1075-1090.
Kapitonov, V. V., and J. Jurka. 2003. The esterase and PHD domains in CR1-like non-LTR retrotransposons. Mol. Biol. Evol. 20:38-46.
Kossykh, V. G., S. L. Schlagman, and S. Hattman. 1993. Conserved sequence motif DPPY in region IV of the phage T4 Dam DNA-[N6-adenine]-methyltransferase is important for S-adenosyl-L-methionine binding. Nucleic Acids Res. 21:4659-4662.
Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793-805.
Malik, H. S., and T. H. Eickbush. 2001. Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 11:1187-1197.
Mmolawa, P. T., H. Schmieger, and M. W. Heuzenroeder. 2003. Bacteriophage ST64B, a genetic mosaic of genes from diverse sources isolated from Salmonella enterica serovar Typhimurium DT 64. J. Bacteriol. 185:6481-6485.
Nardini, M., and B. W. Dijkstra. 1999. α/β Hydrolase fold enzymes: the family keeps growing. Curr. Opin. Struct. Biol. 9:732-737.
Nunes-Duby, S. E., H. Joo Kwan, R. S. Tirumalai, T. Ellenberger, and A. Landy. 1998. Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res. 26:391-406.
Ollis, D. L., E. Cheah, and M. Cygler, et al. (13 co-authors). 1992. The α/β hydrolase fold. Protein Eng. 5:197-211.
Poulter, R., M. Butler, and J. Ormandy. 1999. A LINE element from the pufferfish (fugu) Fugu rubripes which shows similarity to the CR1 family of non-LTR retrotransposons. Gene 227:169-179.
Rabson, A. B., and B. J. Graves. 1997. Synthesis and processing of viral RNA. Pp. 205-261 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Ruiz-Perez, V. L., F. J. Murillo, and S. Torres-Martinez. 1996. Prt1, an unusual retrotransposon-like sequence in the fungus Phycomyces blakesleeanus. Mol. Gen. Genet. 253:324-333.
Schrag, J. D., and M. Cygler. 1997. Lipases and α/β hydrolase fold. Methods Enzymol. 284:85-107.
Swofford, D. L. 1998. PAUP*. Phylogenetic analysis using parsimony (* and other methods), version 4. Sinauer Associates, Sunderland, Mass.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.
Upton, C., and J. T. Buckley. 1995. A new family of lipolytic enzymes? Trends Biochem. Sci. 20:178-179.
Vazquez, M., C. Ben-Dov, H. Lorenzi, T. Moore, A. Schijman, and M. J. Levin. 2000. The short interspersed repetitive element of Trypanosoma cruzi, SIRE, is part of VIPER, an unusual retroelement related to long terminal repeat retrotransposons. Proc. Natl. Acad. Sci. USA 97:2128-2133.
Xiong, Y., and T. H. Eickbush. 1988. The site-specific ribosomal DNA insertion element R1Bm belongs to a class of non-long-terminal-repeat retrotransposons. Mol. Cell. Biol. 8:114-123.
Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.