Sequence Analysis of Transposable Elements in the Sea Squirt, Ciona intestinalis

Martin W. Simmen and Adrian Bird

Institute of Cell and Molecular Biology, University of Edinburgh, Edinburgh, Scotland


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Conclusions
 Acknowledgements
 literature cited
 
A systematic search of 1 Mb of genomic sequences from the sea squirt, Ciona intestinalis, revealed the presence of six families of transposable elements. The Cigr-1 retrotransposon contains identical 245-bp long terminal repeats (LTRs) and a 3,630-bp open reading frame (ORF) encoding translation products in the same order as the domains characteristic of gypsy/Ty3-type LTR retrotransposons. The closest homologs of the reverse transcriptase domain were in gypsy elements from Drosophila and the sushi element from the pufferfish. However, the capsid-nucleocapsid region shows the clearest homology to an echinoderm element, Tgr1. Database searches also indicated two classes of non-LTR retrotransposon, named Cili-1 and Cili-2. The Cili-1 sequences show matches to regions of the ORF2 product of mammalian L1 elements. The Cili-2 sequences possess similarity to the RNaseH domain of Lian-Aa1, a mosquito non-LTR retrotransposon. The most abundant element was a short interspersed nucleotide element named Cics-1 with a copy number estimated at 40,000. Cics-1 consists of two conserved domains separated by an A-rich stretch. The 172-bp 5' domain is related to tRNA sequences, whereas the 110-bp 3' domain is unique. Cics-1 is unusual, not just in its modular structure, but also in its lack of a 3' poly(A) tail or direct flanking repeats. A second abundant element, Cimi-1, has an A+T-rich 193-bp consensus sequence and 30-bp terminal inverted repeats (TIRs) and is usually flanked by A+T-rich 2–4-bp putative target site duplications—characteristics of miniature inverted-repeat transposable elements found in plants and insects. A single 2,444-bp foldback element was found, possessing long TIRs containing an A+T-rich internal domain, an array of subrepeats, and a flanking domain at the TIR ends; this is the first example of a chordate foldback element. This study provides the first systematic characterization of the families of transposable elements in a lower chordate.


    Introduction
 
Eukaryote genomes harbor a bewildering variety of transposable elements, the function and evolutionary significance of which are under debate (e.g., Britten and Davidson 1969; Doolittle and Sapienza 1980; Orgel and Crick 1980; Brookfield 1995; Labrador and Corces 1997). These elements can be split into two broad classes according to their modes of transposition (Finnegan 1992). Class I elements perform replicative transposition via an RNA intermediate which is then reverse transcribed into a cDNA molecule and integrated into the genome. Such "retroelements" fall into three categories. First, the long terminal repeat (LTR) retrotransposons encode the proteins necessary for their own replication and are closely related to retroviruses (Boeke and Stoye 1997). Second, the non-LTR retrotransposons are also autonomous but have a different replicative mechanism, which is believed to utilize the poly(A) tail in their 3' ends (Luan et al. 1993). Finally, short interspersed nucleotide elements (SINEs) are short elements containing an RNA polymerase III promoter and a 3' end which is A-rich or, less frequently, consists of simple repeats. In most SINE families the pol III promoter region is derived from a particular tRNA gene, although the abundant human Alu sequences have homology to part of the 7SL RNA gene (Deininger 1989). The discovery that some SINEs are homologous to the extreme 3' ends of non-LTR retrotransposons (Ohshima et al. 1996; Okada et al. 1997) has given support to the hypothesis that SINEs depend on the transpositional machinery of non-LTR retrotransposons for mobility (Luan et al. 1993).

Class II elements mobilize via a DNA intermediate which is excised and reintegrated elsewhere in the genome by a transposase (Plasterk 1995). Such DNA transposons possess terminal inverted repeats (TIRs) containing transposase-binding sites. Some elements contain an open reading frame (ORF) encoding transposase (e.g., P elements in Drosophila), but copies often become nonautonomous through mutation. Other element families show characteristics of DNA-mediated mobilization but neither encode transposase nor appear to be derivatives of autonomous DNA transposons. Examples are the miniature inverted-repeat transposable elements (MITEs) found in plants (Wessler, Bureau, and White 1995) and animals (Ünsal and Morgan 1995; Tu 1997) and the foldback elements distinguished by long modular TIRs containing arrays of direct subrepeats (Truett, Jones, and Potter 1981; Liebermann et al. 1983; Rebatchouk and Narita 1997).

Molecular studies on ascidian development play an important role in attempts to understand the origin of vertebrates. As primitive members of the phylum Chordata, ascidians in larval stages display many vertebrate-like characteristics, such as a dorsal nerve and a tail region. We recently analyzed the genome of one ascidian, the sea squirt, Ciona intestinalis, using sequence data from short fragments and cosmid inserts of genomic DNA to estimate the number of protein-coding genes (Simmen et al. 1998).

Apart from the partial sequencing of an LTR retrotransposon (Britten et al. 1995), we are unaware of other work on repeats in ascidians. Here, we report a systematic search for repetitive elements in the C. intestinalis sequences. Members of several element classes are described, namely, a gypsy/Ty3-type LTR retrotransposon, non-LTR retrotransposons, a tRNA-derived SINE, a MITE, and a foldback element. A report on the methylation status of some of these elements has been presented elsewhere (Simmen et al. 1999).


    Materials and Methods
 
DNA Sequences
Ciona intestinalis DNA sequences generated in a previous study were used (Simmen et al. 1998). These comprised 1,486 sequences from randomly generated short fragments (mean read length 592 bp) determined on a single strand (EMBL accession numbers AJ226133–AJ227618), and four cosmid sequences, referred to here by their EMBL entry names (accession numbers in parentheses): cicos1 (Z80904), cicos2 (Z79640), cicos41 (Z83760), and cicos46 (Z83861). The CIR2 amino acid sequence is given in figure 2 of Britten et al. (1995).



 
Fig. 2. Phylogenetic tree of Cigr-1 and other gypsy/Ty3-class LTR retrotransposons and retroviruses. The tree is based on the alignment of the amino acids in the reverse transcriptase domain shown in figure 1, with the addition of two vertebrate retroviruses included for comparison: Moloney murine leukemia virus, MoMLV (AF033811), and Rous sarcoma virus, RSV (V01197). A Copia LTR retrotransposon (EMBL accession number M11240) was used to root the tree. The analysis was performed using the neighbor-joining method (Saitou and Nei 1987), and the bootstrap values shown are percentages from 1,000 replicates performed using CLUSTAL X (Thompson et al. 1997)

 
Identification of Repetitive Elements
To search for known repetitive elements, all of the DNA sequences were scanned against the nonredundant NCBI databases using BLASTN and BLASTX (Altschul et al. 1990). To search for novel repetitive elements, the 1,486 short sequences were used to form a BLAST database, and the four cosmid sequences were used as queries against this. By visualizing the resulting hits with the MSPCRUNCH and BLIXEM programs (Sonnhammer and Durbin 1994), repetitive elements were identified by their occurrence in different cosmid regions and in many of the 1,486 short sequences. Preliminary consensus sequences were derived from multiple-sequence alignments of the cosmid hits for each of the putative elements so detected. Final alignments were then constructed by aggregating all the sequences from the short fragments and the cosmids with over 70% nucleotide identity to each of the preliminary consensus sequences. The Inverted program from the EGCG package (Rice et al. 1996) was used to search the cosmid sequences for inverted repeats.
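The aggregation step above (keeping sequences with over 70% nucleotide identity to a preliminary consensus) can be illustrated with a minimal Python sketch. The text does not say how identity was computed, so the gap handling here (gapped columns are ignored) and the helper names `percent_identity` and `select_members` are assumptions made for illustration, not the authors' procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are skipped; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def select_members(consensus: str, candidates: dict, cutoff: float = 70.0) -> list:
    """Names of candidates whose identity to the consensus exceeds the cutoff."""
    return [name for name, seq in candidates.items()
            if percent_identity(consensus, seq) > cutoff]


# Toy sequences, invented for illustration only.
toy = {"frag_a": "ACGTACGTTT", "frag_b": "TTTTACGTTT"}
print(select_members("ACGTACGTAC", toy))
```

In this toy case only `frag_a` (8 of 10 positions identical) passes the 70% cutoff.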

Sequence Alignments and Phylogenetic Analysis
Unless stated otherwise, multiple-sequence alignments were generated using Pileup from the GCG Wisconsin Package, version 9.1 (Genetics Computer Group), using default parameters. In some cases, subsequent manual refinement was performed with an alignment editor, CINEMA (http://www.biochem.ucl.ac.uk/bsm/dbbrowser/CINEMA2.1/). Consensus sequences were derived from the multiple-sequence alignments using the GCG utility Pretty, with parameter values described in the text. Pairwise alignments were made using the GCG programs Gap and Bestfit. CLUSTAL X (Thompson et al. 1997) was used to perform the phylogenetic analysis using the neighbor-joining approach (Saitou and Nei 1987).
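As a rough illustration of how a consensus can be read off a multiple-sequence alignment, the sketch below applies a simple column-wise plurality rule in Python. It is not a reimplementation of GCG Pretty; the function name `plurality_consensus`, the interpretation of "plurality" as the minimum number of sequences that must agree at a column, and the use of `n` for unresolved columns are all assumptions.

```python
from collections import Counter


def plurality_consensus(alignment: list, plurality: int, unknown: str = "n") -> str:
    """Column-wise consensus: emit the most common base if at least
    `plurality` sequences carry it, otherwise emit `unknown`.
    Columns consisting only of gaps are skipped."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c != "-")
        if not counts:
            continue  # all-gap column
        base, votes = counts.most_common(1)[0]
        consensus.append(base if votes >= plurality else unknown)
    return "".join(consensus)


# Toy alignment, invented for illustration only.
rows = ["ACGT", "ACGA", "ACTA"]
print(plurality_consensus(rows, plurality=2))
print(plurality_consensus(rows, plurality=3))
```

Raising the threshold trades completeness for stringency: with three toy rows, a threshold of 2 resolves every column, while a threshold of 3 leaves the two variable columns unresolved.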


    Results and Discussion
 
Gypsy/Ty3-Class LTR Retrotransposon
The cicos41 cosmid contains a 3,630-bp ORF (starting at position 1310 on the reverse strand) flanked 45 bp upstream and 63 bp downstream by a pair of identical 245-bp direct repeats. The repeat unit begins TG ... and contains a potential polyadenylation signal at position 231. Immediately 5' of the downstream repeat is a polypurine tract. These features are characteristic of LTRs, although there is no indication of either a promoter or the tRNA primer binding site usually found 3' of the left LTR in retroviruses and retrotransposons. The LTRs are flanked by a putative 2-bp target site duplication TA. The 1,209-aa putative ORF product has similarity to domains encoded by the gag and pol genes of LTR retrotransposons. Several features indicate membership of the gypsy/Ty3 class of LTR retrotransposons, so we call the complete 4,226-bp element Cigr-1 (Ciona intestinalis gypsy/Ty3 retrotransposon). First, the order of the domains (nucleocapsid [NC], protease [PR], reverse transcriptase [RT], RNaseH [RH], integrase [IN]) is that found in gypsy/Ty3-type elements. Second, the top hits in a TBLASTN search of the NCBI nucleotide database with the ORF product were all to gypsy/Ty3 elements (P values were in the range 10^-63 to 10^-40; e.g., the hits to nomad of Drosophila melanogaster and sushi of Fugu rubripes had P values of 4 × 10^-63 and 2 × 10^-57, respectively), followed by hits to caulimoviruses (P > 10^-37), and then to retroviruses (P > 10^-22). A similar pattern emerged when searching with just the RT domain, residues 464–644 (data not shown), regarded as the most reliable domain upon which to base classification (Xiong and Eickbush 1990). An alignment of the RT domains from Cigr-1 and other LTR retrotransposons reveals all the motifs expected in RT (fig. 1).



 
Fig. 1. Multiple-sequence alignment of reverse transcriptase (RT) sequences from Cigr-1 and other gypsy/Ty3-class retrotransposons. The sequences are ordered by the degree of identity to Cigr-1 RT. Shading is according to a 50% column consensus: amino acids are shaded black if they match a consensus and gray if they are similar to a consensus. The seven domains previously identified as highly conserved are demarcated by vertical struts and labeled I–VII (Xiong and Eickbush 1990). The alignment was constructed using CLUSTAL X (Thompson et al. 1997) and manual adjustment, and the shading is by MACBOXSHADE. The sequence details are as follows: nomad, Drosophila melanogaster (AF039416); gypsy, D. melanogaster (AF033821); ZAM, D. melanogaster (AJ000387); TED, Trichoplusia ni (M32662); Ty3-2, Saccharomyces cerevisiae (M23367); sushi, Fugu rubripes (AF030881); MAGGY, Magnaporthe grisea (L35053); Tgr1, Tripneustes gratilla (M75723); Mag, Bombyx mori (X17219); Cer1, Caenorhabditis elegans (U15406); Cigr-1, Ciona intestinalis (Z83760)

 
Several gypsy/Ty3-type elements include a short env gene coding for a polypeptide similar to that necessary for the extracellular transmission of retroviruses; thus, they are better classed as endogenous retroviruses, e.g., gypsy (Pelisson et al. 1994) and ZAM (Leblanc et al. 1997). Cigr-1 lacks an env gene. Although there are 187 amino acids between the integrase D-DX35E motif and the C-terminus, this is consistent with the size of IN domains in other elements. There is also no evidence of the transmembrane domain expected in an Env polypeptide.

Several findings indicate that Cigr-1 is a member of a recently active family. First, Southern blots with PstI-digested genomic DNA reveal multiple bands, with different individuals having distinct banding patterns, indicating that the genomic location of Cigr-1 elements differs between individuals (Simmen et al. 1999). Second, searching the sequences from 1,486 fragments of C. intestinalis genomic DNA (see Materials and Methods) revealed four fragments with DNA similarity (BLASTN; P < 10^-24) to Cigr-1. In three cases (accession numbers AJ226321, AJ227419, and AJ226522), the match was 98% or 99%, suggesting that these sequences lie in recently inserted Cigr-1–type elements. Extrapolating this hit rate directly to the genome yields a Cigr-1 copy number estimate of 75. In contrast, the match with AJ226402 has only 55% nucleotide identity. It is notable, however, that the entire AJ226402 sequence is an ORF encoding 221 amino acids of the IN domain that is 50% identical to the equivalent stretch of the Cigr-1 IN. This suggests that this genome contains two families of gypsy/Ty3-type elements.
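The extrapolation from hit rate to copy number is not spelled out in the text. One plausible reading, sketched below in Python, treats the fraction of random reads that fall in the element as the fraction of the genome it occupies, then divides the implied amount of element DNA by the element length. The function name `copy_number_long_element` and the formula itself are assumptions; the inputs (3 near-identical hits, 1,486 reads, a 4,226-bp element, and the 162-Mb genome size cited later in the paper) come from the text.

```python
def copy_number_long_element(hits: int, n_reads: int,
                             element_bp: int, genome_bp: int) -> float:
    """Estimate copies of an element much longer than a read.

    Assumption: hits / n_reads approximates the fraction of the genome
    occupied by the element, so copies = fraction * genome / element length.
    """
    genome_fraction = hits / n_reads
    return genome_fraction * genome_bp / element_bp


estimate = copy_number_long_element(
    hits=3,                 # fragments matching Cigr-1 at 98-99% identity
    n_reads=1486,           # random genomic fragments
    element_bp=4226,        # full-length Cigr-1
    genome_bp=162_000_000,  # genome size from Simmen et al. (1998)
)
print(round(estimate))
```

Under these assumptions the estimate lands in the high 70s, close to the reported figure of 75; with only three hits, the sampling error on any such estimate is large.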

Multiple gypsy/Ty3 subfamilies within single species have been found before (e.g., Britten et al. 1995). Further evidence of this in C. intestinalis comes from CIR2, a 176-aa fragment of the RT/RH domain of a retroelement found in C. intestinalis during a study (Britten et al. 1995) of elements in marine species by PCR amplification using degenerate primers from Tgr1, a member of the SURL gypsy/Ty3 family, in the Hawaiian sea urchin, Tripneustes gratilla (Springer, Davidson, and Britten 1991). CIR2 shows only a 24% match with the equivalent region of the Cigr-1 product. Also, a TBLASTN search shows that the sequences most similar to CIR2 are Tgr1 (P = 2 × 10^-24) and the silkworm gypsy/Ty3 element Mag (P = 3 × 10^-22), with the match to Cigr-1 being weak (P = 0.003). Thus, distinct gypsy/Ty3 families in C. intestinalis can be more similar to elements in other species than to each other.

Evolutionary Relationship of Cigr-1 to Other gypsy/Ty3 Elements
To investigate Cigr-1's relationship to LTR retrotransposons in other species, a phylogenetic analysis was performed. This was based on the alignment in figure 1 of the RT domains of Cigr-1 and representative LTR retrotransposons, plus two retroviruses, using the copia element from D. melanogaster to root the tree (the complete alignment is entry ds43388 in the EMBL sequence alignment database). The neighbor-joining tree (fig. 2) gives a phylogeny broadly similar to those found in previous studies (e.g., Malik and Eickbush 1999). As C. intestinalis is a nonvertebrate chordate, we were interested in Cigr-1's relationship to the puffer fish element sushi (Poulter and Butler 1998), which has been shown to be representative of most putative vertebrate gypsy/Ty3 elements found to date (Miller et al. 1999). Miller et al. (1999) concluded that there are at least two, and possibly four, vertebrate gypsy/Ty3 lineages (on the basis of partial RT sequences), with the non-sushi-like vertebrate elements being related to the Mag/Tgr1 group (discussed in Springer and Britten 1993). The fact that the sushi-like elements cluster with fungal elements (e.g., MAGGY) in phylogenetic reconstructions, coupled with their apparent absence in other deuterostomes, has led to the conjecture (e.g., Poulter and Butler 1998; Miller et al. 1999) that they arose by horizontal transmission from either fungi or plants to an early vertebrate.

Cigr-1 affords a test of this idea, for if a horizontal transmission event occurred earlier in the primitive chordates, then the descendant lineage in C. intestinalis should form a sister group to the vertebrate sushi-like elements. The fact that Cigr-1 and sushi are not neighbors (fig. 2) suggests either that the putative horizontal transmission event took place after the divergence of the ascidians from the protovertebrate line or that a family of sushi-like elements exists in the Ciona genome that is yet to be discovered. Either way, the situation is complex, as the analysis also shows that nonvertebrate deuterostomes can contain gypsy/Ty3 elements bearing more similarity (in the RT domain) to the sushi-like branch than to the Mag/Tgr1 branch. Additional data and more rigorous phylogenetic analyses would clearly be useful in clarifying these issues.

In addition, a TBLASTN search revealed that in the capsid and nucleocapsid domains, Cigr-1 shows striking similarity to Tgr1 (P = 10^-16) and Mag (P = 10^-9), weaker similarity (P > 10^-5) to Arabidopsis thaliana and HIV-1 sequences, and none to the other sequences represented in figure 2. Lack of sequence conservation precludes a reliable phylogenetic analysis based on the CA/NC domains, but a close relationship between Cigr-1 and Tgr1 is also supported by their sharing two rare features: two CX2CX4HX4C RNA-binding motifs in the NC domain (separated in both cases by six amino acids), and only one ORF.
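The CX2CX4HX4C pattern (Cys, two residues, Cys, four residues, His, four residues, Cys) translates directly into a regular expression. The Python sketch below shows one way to locate such motifs and measure the spacing between consecutive matches; the function name `find_cchc` and the toy peptide are invented for illustration and are not taken from Cigr-1 or Tgr1.

```python
import re

# CX2CX4HX4C: C, any 2 residues, C, any 4 residues, H, any 4 residues, C
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")


def find_cchc(protein: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping motif."""
    return [(m.start(), m.group()) for m in CCHC.finditer(protein.upper())]


# Toy peptide, invented: two motifs separated by six residues.
toy = "MK" + "CAACGGGGHKKKKC" + "PPPPPP" + "CWGCRRRRHTTTTC" + "LE"
hits = find_cchc(toy)
spacer = hits[1][0] - (hits[0][0] + len(hits[0][1]))
print(hits, spacer)
```

Because `finditer` reports non-overlapping matches, the spacer length between two motifs is simply the second start minus the end of the first.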

The differing phylogenetic signals in the CA/NC and RT/RH regions suggest that perhaps recombination events have brought together domains from previously distinct elements. We speculate that there may be a family of elements in urochordates with homology to the Mag/Tgr1 group in both the gag and the pol genes. Support for this hypothesis comes from the short CIR2 sequence, which shows the strongest similarity to RT/RH of Tgr1 and Mag and little to Cigr-1. Given the evidence for recent horizontal transmission of SURL elements (a family of which Tgr1 is a member) within echinoderms (Gonzalez and Lessios 1999), another possibility, albeit a more speculative one, is of a similar, ancient transmission to C. intestinalis.

Non-LTR Retrotransposons
Searching the genomic sequences against the protein database revealed seven fragments which had non-LTR retrotransposons as their closest matches. Three show similarity (P < 10^-6) to the ORF2 products of various vertebrate L1 elements. Figure 3A indicates the similarities with respect to a typical full-length mouse element, L1spa (EMBL accession number AF016099) (Naas et al. 1998). In AJ226259 and AJ226190, the pattern of L1 homology is suggestive of the 5' truncated copies known to vastly outnumber full-length copies of mammalian L1's (Voliva et al. 1983). In AJ226870, the homology is interrupted by a frameshift and three short insertions relative to L1spa. In the absence of any overlap between these sequences, there is no formal proof that they derive from insertions of a common retrotransposon. However, given their common similarity to vertebrate L1-like elements, we suggest that there is such an element, or a set of closely related families of elements, which we label Cili-1.



 
Fig. 3. Schematic illustration of the relationship between seven Ciona intestinalis fragments and non-LTR retrotransposons. The shaded portions of the fragments show similarity at the amino acid level to the corresponding regions of the full-length comparison element. Fragments are labeled with their EMBL accession numbers (RC denotes reverse complement) and the BLASTX (version 2.0.8) P value and percentage amino acid identity with the comparison full-length element over the shaded region(s). A, Alignment of the three Ciona fragments which had strong BLASTX matches to mammalian L1 elements with respect to the representative mouse element L1spa (EMBL accession number AF016099) (Naas et al. 1998). The locations of the endonuclease (EN) and RT domains in L1spa were determined by comparison to data in Feng et al. (1996) and Xiong and Eickbush (1990), respectively. AJ226870 RC also has some similarity to L1spa in its 5' end, but in a different frame from that in the rest of the fragment (data not shown). B, Alignment of the four Ciona fragments which had the strongest BLASTX matches to the mosquito element Lian-Aa1 (EMBL accession number U87543). The locations of EN, RT, and RNaseH domains in Lian-Aa1 are according to Tu, Isoe, and Guzova (1998), and the shaded portion of Lian-Aa1 is that which has similarity to one or more Ciona fragments. An arrow indicates the location of a stop codon found in all four fragments, and the horizontal bar indicates the putative 3' untranslated region

 
Figure 3B shows the relationship between four sequences that have their closest matches to Lian-Aa1 (EMBL accession number U87543), a non-LTR retrotransposon containing a single 1,189-aa ORF found in the Aedes aegypti mosquito (Tu, Isoe, and Guzova 1998). The sequences overlap each other (mean pairwise nucleotide identity by Bestfit 96.7%) and have amino acid homology to a region starting in the 3' end of the RH domain of Lian-Aa1 and extending to within seven amino acids of the Lian-Aa1 C-terminus. The putative Ciona element's ORF extends seven amino acids farther at the C-terminus than does the Lian-Aa1 ORF and is followed by a conserved stretch of 98 bp, which we speculate is a 3' untranslated region (UTR) (fig. 3B). If these sequences reflect insertions of a non-LTR retrotransposon, we would predict multiple matches to the putative 3' UTR in a search of genomic sequence due to the high frequency of extreme 5' truncation events. This was indeed observed: BLASTN searches using the putative 3' UTR (bases 271–368 of AJ226391) found over a dozen strong matches (P < 10^-6) to the short Ciona sequences and three to the cosmids (data not shown). There is therefore evidence for a second non-LTR retrotransposon in C. intestinalis, Cili-2, which, at least in its 3' end, bears more homology to insect elements than to vertebrate ones. Extrapolating the observed sample frequencies of Cili-1 and Cili-2 to the genome suggests copy numbers of approximately 50 per element.

These findings support a recent phylogenetic analysis which classified all non-LTR elements into 11 clades and suggested that each clade originated in the Precambrian era and has since evolved purely by vertical descent (Malik, Burke, and Eickbush 1999). Under this scheme, Cili-1 would likely be in the L1 clade and Cili-2 in the LOA clade. Cili-2 therefore significantly broadens the species distribution of the LOA clade, which previously contained only arthropod elements.

Composite tRNA-Derived SINE
Three short novel repetitive sequences were identified via the strategy detailed in Materials and Methods. The distribution of two of these, termed α and γ (approximately 170 and 100 bp long, respectively), in the cosmids revealed that they tended to colocalize, with α often being found upstream of one or more γ sequences. An association between α and γ was also evident from their distribution in the 1,486 random fragment sequences of genomic DNA.

Further analysis suggested that α and γ are the primary components of a composite tRNA-derived SINE, which we label Cics-1 (fig. 4). The 172-bp α consensus sequence was derived from the 23 near-full-length copies of α found in the sequences (mean similarity of the copies to the consensus 94%, SD 3%). Immediately downstream of all but one of these α copies is a short poly(A) region followed by a 12-bp motif (consensus TAATCACCCACA, termed β) and at least a partial γ sequence. (A similar pattern is seen in the data set in which only the 3' end of α is complete.)



 
Fig. 4. Modular structure of the Cics-1 SINE and component consensus sequences. α: boxed regions indicate candidate RNA polymerase III promoter A and B sites; the single underlined region is tRNA-related according to the tRNAscan-SE program (Lowe and Eddy 1997), and the double-underlined region is that also found in the AFC family of SINEs in various fish species (see main text). γ: the boxed TTTT motif is a potential RNA polymerase III transcriptional stop signal. Examples of various Cics-1 variants can be found in the relevant sequence annotations, as follows: α-p(A)-β-γ (AJ226662), α 3'-p(A)-β-γ₃ (Z80904), p(A)-β-γ₄ (AJ226976)

 
A total of 35 near-full-length γ sequences or γ clusters (2–4 copies head to tail) were found. Almost all (32) were flanked immediately 5' by the β motif. In 21 cases, the flanking region also had similarity to the α 3' end (or farther), but in the other 14 cases, the γ cluster appeared to be independent (e.g., in AJ226267). We derived two consensus γ sequences, one from copies 3' of an α, and the other from copies not flanked by α. The former 98-bp consensus is shown in figure 4; the latter 98-bp consensus is 96% identical but lacks the TTTT motif.

As indicated in figure 4, a 72-bp tRNA-derived region lies at the α 5' end, containing RNA pol III promoter sites separated by 34 bp, typical of their spacing in tRNA genes and SINEs (Deininger 1989). The similarity to individual tRNA sequences is moderate; the closest match is to a tRNA-Thr gene from D. melanogaster (X02575), with 70% identity over bases 5–74 of α. The relationship was also detected by the tRNAscan-SE program (Lowe and Eddy 1997). As in most other SINEs, this segment is followed by a tRNA-unrelated sequence. BLASTN searches indicate that the closest homologs of Cics-1 in this region are AFC SINEs in African cichlids (Takahashi et al. 1998). Bases 91–112 of α perfectly match bases from almost the same location in AFCs in several cichlids (e.g., sequences AB016544 from Julidochromis transcriptus and AB009707 from Tropheus moorii; BLASTN P = 0.008). Interestingly, this tRNA-unrelated segment of AFCs has been found to be 74% identical to a 65-bp "core" sequence shared by many families of SINEs in eukaryotes (Gilbert and Labuda 1999). Comparing the reference core sequence used in that study (human Ther-1 consensus; see fig. 4 of Gilbert and Labuda 1999) with Cics-1-α revealed 55% identity over bases 87–150 of α, indicating that Cics-1 belongs to the superfamily of SINEs containing this component.

In other respects, however, Cics-1 is unusual. First, whereas many SINEs have a poly(A) tail, the data indicate that α-p(A) is rarely, if ever, mobilized on its own. Rather, the almost ubiquitous presence downstream of β and at least part of γ suggests that it is the α-p(A)-β-γ fusion that is mobile. Composite SINEs have previously been found (Kaukinen and Varvio 1992; Izsvák et al. 1996; Serdobova and Kramerov 1998). Second, the 3' ends of many SINEs are similar to the 3' ends of non-LTR retrotransposons and are thought to rely on the latter for mobility (Okada et al. 1997). We therefore searched for any association between the Cili-2 3' UTR sequences and Cics-1, but none was apparent.

From sequence data alone, it is impossible to fully describe how Cics-1 arose or how it mobilizes. However, one possible scenario is that a pol III readthrough transcript of a tRNA gene or pseudogene coupled to the SINE core segment was aberrantly polyadenylated, then reverse transcribed and integrated (by enzymes encoded by Cigr or Cili elements) adjacent to the β-γ sequence. This event brought into proximity the pol III promoter in α and the pol III transcriptional stop signal in the γ 3' end (fig. 4). It remains unclear, though, how pol III transcripts of the element are reverse transcribed, as the copies lack flanking target site duplications and Cics-1 lacks the 3' poly(A) tail believed to help prime this step in other SINEs (Deininger 1989). Cics-1 is not unique in this regard: composite SINEs in artiodactyls have simple repeats at the 3' end. It may be significant that several Cics-1 copies have two to four tandem CATT repeats at the 3' end (e.g., in AJ226376, AJ226486, and AJ227046).

Another puzzle concerns the origin of the solitary β-γ copies and γ clusters, as β-γ lacks a pol III promoter. Perhaps such sequences are the result of incomplete reverse transcription of full-length transcripts (Weiner, Deininger, and Efstratiadis 1986; Tu 1999). The mechanism generating γ clusters is unknown, although it may be relevant that a 71-bp sequence (not shown) containing a 69% match to bases 1–45 of γ occurs in head-to-tail arrays in C. intestinalis; the cosmid cicos1, for example, contains a 29-copy array spanning bases 26504–28535. Whatever the mechanisms allowing it or parts of it to mobilize, Cics-1 has been highly successful in proliferating: extrapolation from the frequency of complete or partial hits (232 in total) in the sequence sample suggests a genomic copy number of 40,000.
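For an element shorter than a typical read, a simple way to scale a hit count to the genome is to multiply the hits per sampled base by the genome size. The Python sketch below applies that rule to the 232 Cics-1 hits. The formula and the function name `copy_number_short_element` are assumptions, as is the use of roughly 1 Mb for the sample (the figure given in the abstract); the 162-Mb genome size is the value cited in the paper.

```python
def copy_number_short_element(hits: int, sample_bp: int, genome_bp: int) -> float:
    """Scale a count of element hits in a sequence sample to the whole genome.

    Assumption: each hit represents one copy and the sample is a random
    draw from the genome, so copies = hits * genome / sample.
    """
    return hits * genome_bp / sample_bp


cics1 = copy_number_short_element(
    hits=232,               # complete or partial Cics-1 hits
    sample_bp=1_000_000,    # about 1 Mb of sampled sequence (assumed exact here)
    genome_bp=162_000_000,  # genome size from Simmen et al. (1998)
)
print(round(cics1))
```

This simple scaling gives a value a little under 40,000, the same order as the reported estimate; the exact figure depends on how the sample size and partial hits are counted.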

Miniature Inverted-Repeat Transposable Element
A third short novel repeat was identified via the strategy detailed in Materials and Methods. Fifteen near-full-length copies were found, and a 193-bp consensus sequence was derived (fig. 5). Many incomplete copies, either truncated or containing internal deletions, were also found; the copy number was estimated to be 17,000. The element's features are characteristic of MITEs found in plants and insects (Wessler, Bureau, and White 1995; Tu 1997), so we label it Cimi-1. First, Cimi-1 has perfectly matching 30-bp TIRs. Second, the sequence is A+T-rich (60%). Third, the elements are usually flanked by 2–4-bp A+T-rich direct repeats, consistent with the bias to A+T-rich insertion target sites found for other MITEs (Tu 1997). Thirteen out of 15 copies are immediately flanked by TA on both sides; the two copies that are not are also those with the least similarity to the consensus (fig. 5), consistent with the possibility that the putative original TA repeats have been altered by mutation. Furthermore, in 6 out of these 13 cases, the direct repeat is TATA. This is a far higher frequency than expected by chance. In the flanking sequences shown in figure 5, 27% of the dinucleotides are TA, so the proportion of copies in which the TA direct repeats are also embedded purely by chance within TATA repeats on both sides can be estimated as 0.27^2 = 0.07, sixfold less than the observed proportion (6/13). As TA and TATA are palindromic, this analysis cannot establish whether these repeats are target site duplications or part of Cimi-1's TIRs. In principle, this can be resolved by examining cases in which Cimi-1 inserts into a known sequence, but unfortunately no such cases were found.
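The chance expectation for TATA flanks can be checked with a few lines of Python. The first part reproduces the arithmetic in the text (0.27 squared, compared with 6/13). The binomial tail that follows is an addition not found in the paper: it asks how likely 6 or more TATA-flanked copies out of 13 would be if each copy independently had the chance probability, which assumes independence between copies and between the two flanks.

```python
from math import comb

p_ta = 0.27                      # TA dinucleotide frequency in the flanks
p_tata_by_chance = p_ta ** 2     # an extra TA on both sides by chance
observed = 6 / 13                # copies flanked by TATA
fold = observed / p_tata_by_chance


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative significance check (not part of the original analysis).
tail = binomial_tail(6, 13, p_tata_by_chance)
print(round(p_tata_by_chance, 2), round(fold, 1), tail)
```

The ratio comes out slightly above six, matching the "sixfold" statement, and the tail probability is well below 0.001 under the stated independence assumptions.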



 
Fig. 5.—Analysis of the Cimi-1 elements. A multiple-sequence alignment was constructed from the 15 full-length Cimi-1 copies found in the short genomic sequence data set using Pileup. The consensus sequence in the upper panel was derived from this alignment using the GCG Pretty program with plurality set to 7; copies found in the cosmid sequences give the same consensus (data not shown). The underlined segments indicate the terminal inverted repeats. The lower panel summarizes the multiple-sequence alignment. The sequences flanking the elements are indicated, and putative target site duplications are shown in bold italics. Degree of similarity of the element itself to the consensus is indicated in the central portion. Sequence identifiers are EMBL accession numbers

 
Recent work indicates that several MITE families share TIR sequence similarities with DNA transposons and that one such family is derived from a Tc1/mariner-class transposon (Feschotte and Mouchès 2000). Cimi-1, however, does not share TIR similarity with those MITE families. Database searches found no non-Ciona Cimi-1 homologs, but revealed copies in the UTRs and introns of various C. intestinalis genes; specifically, five homeobox genes (X83444, X83447, X83453, X83446, and AJ002028), a MyoD family gene (U80080), and myoplasmin-C1 (D42167). Other examples of Cimi-1 copies in genes can be found in figure 1 of Simmen et al. (1999). This association of MITEs with genes is also found in other animal and plant species (Wessler, Bureau, and White 1995; Tu 1997). In contrast, the abundance of Cimi-1 in a genome of only 162 Mb (Simmen et al. 1998) argues against the hypothesis (Tu 1997) that abundant MITEs will be found only in large, highly repetitive genomes.

Foldback Element
A scan of the four C. intestinalis cosmid sequences for inverted repeats found one prominent pair in Cicos41. Subsequent analysis revealed a 2,444-bp element (fig. 6) spanning bases 18327–20770, in which each inverted repeat arm has a modular architecture, including a tandem array of subrepeats. These features are shared by foldback transposable elements in various eukaryotes, e.g., in Drosophila (Potter 1982), the sea urchin (Hoffman-Liebermann et al. 1985), and plants (Rebatchouk and Narita 1997).



Fig. 6.—Structure of the foldback element. Component names: IR, inverted repeat; ID, inner domain; OD, outer domain; FD, flanking domain; M, middle sequence; L, left; R, right. Relative to the element's 5' end, the components have the following nucleotide coordinates: IR-FD-L, 1–130; IR-OD-L, 131–696; IR-ID-L, 709–748; M, 749–1717; IR-ID-R, 1718–1757; IR-OD-R, 1758–2315; IR-FD-R, 2316–2444. The IR-OD 32-bp subrepeat consensus sequence is AGTCTGACAGTTGCAGGTCGTTTTTTTAAAGT
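The component sizes implied by the coordinates in this legend can be tabulated directly. The sketch below copies the coordinates as printed (1-based, inclusive); the helper names are hypothetical, and the gap check is simply a consistency test on the printed numbers.

```python
# Coordinates copied from the figure 6 legend (1-based, inclusive).
COMPONENTS = [
    ("IR-FD-L", 1, 130),
    ("IR-OD-L", 131, 696),
    ("IR-ID-L", 709, 748),
    ("M", 749, 1717),
    ("IR-ID-R", 1718, 1757),
    ("IR-OD-R", 1758, 2315),
    ("IR-FD-R", 2316, 2444),
]

def span_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1

def unassigned_gaps(components):
    """Return (start, end) stretches lying between consecutive components."""
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(components, components[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps

lengths = {name: span_length(start, end) for name, start, end in COMPONENTS}
for name, size in lengths.items():
    print(f"{name:8s} {size:5d} bp")
print("element length from cosmid coordinates:", span_length(18327, 20770))
print("positions not assigned to a component:", unassigned_gaps(COMPONENTS))
```

With the printed numbers, M comes out at 969 bp and each IR-ID at 40 bp, in agreement with the text. The check also reports positions 697–708 as belonging to no listed component; whether that reflects a short spacer or a slip in the legend cannot be decided from the text alone.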

 
Dominating each inverted repeat (IR) arm is an array of contiguous 32-bp subrepeats (in previously studied foldbacks, this was located at the IR termini and labeled OD). The left IR-OD (IR-OD-L) contains 19 subrepeat copies, and IR-OD-R contains 18 copies; in both domains, the final subrepeat is incomplete. Three IR-OD-L subrepeats contain short deletions, and one IR-OD-R subrepeat contains a 2-bp insertion. Homology between the subrepeats is high; the majority-rule 32-bp consensus sequences from the two domains are identical. Internal to IR-OD is a 40-bp domain (IR-ID) which shows a high match (37/40 bases) between the sequence of one arm and the reverse complement of the other. This domain is highly A+T-rich (77%), characteristic of IR-IDs in other species (Liebermann et al. 1983; Rebatchouk and Narita 1997). A novel feature is the extra domain (IR-FD) flanking the ODs. One hundred twenty-seven of the 130 bases in IR-FD-L are matched in the complementary IR-FD-R. There is no notable sequence similarity between the different IR domains, nor do they have any database homologs. The element is immediately flanked on both sides by the sequence GATATGTTT, consistent with the 8–10-bp target insertion sequences in other foldbacks (Truett, Jones, and Potter 1981; Hoffman-Liebermann et al. 1985; Rebatchouk and Narita 1997).
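Two of the quantities used above, the match between one arm and the reverse complement of the other and the A+T content of a domain, are simple to compute. The following is a minimal sketch with hypothetical function names and invented toy arms; it compares the arms position by position without gaps, which is an assumption, since arms of unequal length (such as the 130-bp and 129-bp flanking domains) would need a gapped alignment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def at_fraction(seq: str) -> float:
    """Fraction of positions that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def arm_matches(left_arm: str, right_arm: str) -> tuple[int, int]:
    """Ungapped comparison of the left arm with the reverse complement of the
    right arm; returns (matching positions, positions compared)."""
    folded = reverse_complement(right_arm)
    compared = min(len(left_arm), len(folded))
    matches = sum(a == b for a, b in zip(left_arm.upper(), folded))
    return matches, compared

# Toy arms invented for illustration (not the Ciona sequences).
left = "AATTTACGTA"
right = reverse_complement("AATTTACGAA")  # differs from `left` at one position

print(arm_matches(left, right))
print(at_fraction(left))
print(reverse_complement("GATATGTTT"))  # flanking sequence quoted in the text
```

A perfect inverted repeat gives a match count equal to the number of positions compared; each substitution in either arm lowers the count by one.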

How foldback elements mobilize is unknown, although their structural similarities to class II elements suggest transposition mediated by a transposase. By analogy with DNA transposons, the transposase would be expected to be encoded in the nonrepetitive middle domain (M). However, most foldbacks show no evidence of M encoding proteins, and the size and sequence of M can vary among members of a family (Hoffman-Liebermann et al. 1985), suggesting that most copies are nonautonomous. The Ciona M domain is only 969 bp and shows no sign of encoding a transposase: the longest ORF encodes a 99-aa product with no similarity to any known protein. The foldback in cosmid Cicos41 was the only example found, so proof that it belongs to a family of dispersed repeats will require further work. If this were found to be true, it would imply that the Ciona genome also contains an as yet unidentified DNA transposon encoding a transposase capable of mobilizing the foldback element as well.
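A search for the longest ORF in a domain such as M can be sketched as follows. The ORF definition used here (an ATG followed in frame by the first stop codon, on either strand) is an assumption, since the text does not state which definition was applied; the function name and the toy sequence are likewise hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def longest_orf(seq: str) -> str:
    """Longest ATG-to-stop ORF (stop codon included) over all six frames.
    Returns an empty string if no complete ORF is present."""
    seq = seq.upper()
    best = ""
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOPS:
                    orf = strand[start:i + 3]      # close it at the first stop
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best

# Toy sequence invented for illustration (not the Ciona M domain).
toy = "CCATGAAATTTGGGTAACC"
orf = longest_orf(toy)
print(orf, "encodes", len(orf) // 3 - 1, "aa")
```

Under this definition a 99-aa product corresponds to a 300-nt ORF including the stop codon, which is short relative to the coding regions expected for a transposase.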


    Conclusions
Ascidians and the other urochordates form a sister group of vertebrates within the chordate phylum. Larval ascidians share many morphological similarities with higher chordates (Satoh and Jeffery 1995), and this, combined with their well-characterized development (Nishida 1987), has led to their use as model systems for studies of chordate development. Until now, little has been published about the repetitive elements in ascidians or other nonvertebrate chordates. In the current work, we searched a small sample of genomic sequences (1 Mb) from the ascidian C. intestinalis and found examples from five major groups of transposable elements: a gypsy/Ty3-type LTR retrotransposon, two families of non-LTR retrotransposons, a SINE, a MITE, and a foldback element.

The discovery of these elements should aid efforts to unravel the evolutionary history and significance of various classes of eukaryotic mobile elements. Analysis of the Cigr-1 LTR retrotransposon indicates that its history may have involved domain-swapping, as the RT/RH domains are similar to those in vertebrate sushi-like elements, whereas the CA/NC domains bear close similarity to those in echinoderm SURL elements. The non-LTR elements support a recent phylogeny (Malik, Burke, and Eickbush 1999) which classed all non-LTR elements into 11 clades; the two Ciona elements fall into separate clades and significantly broaden the species distribution in the LOA clade. The two most abundant elements are a MITE and a modular tRNA-derived SINE with several unusual features: no flanking repeats, an internal poly(A) region, and a downstream segment that is also found independently in the genome. Finally, the foldback element is, to our knowledge, the first example of this class in a chordate. We also speculate that the genome may harbor additional families of elements, specifically, another branch of gypsy/Ty3 LTR retrotransposons and an autonomous DNA transposon.

Further study of these repeats should be particularly useful in tracing the origins of vertebrate elements. We have already shown that the Ciona host genome is unlikely to suppress element mobility via the mechanism often suggested as serving this function in mammalian genomes, i.e., cytosine methylation (Simmen et al. 1999). Finally, we also believe that the current study validates the strategy of systematically searching genomic sequences for repetitive elements, rather than just detecting elements from well-known families.


    Acknowledgements
We thank Susan Tweedie, Jillian Charlton, and the anonymous referees for comments on the manuscript. This work was supported by the Wellcome Trust and the Biotechnology and Biological Sciences Research Council (United Kingdom). M.W.S. is supported by a Research Training Fellowship in Mathematical Biology from the Wellcome Trust.


    Footnotes
 
Howard Ochman, Reviewing Editor

1 Abbreviations: RT, reverse transcriptase; TIR, terminal inverted repeat.

2 Keywords: retrotransposon, SINEs, LINEs, foldback element, inverted repeat, Ciona intestinalis.

3 Address for correspondence and reprints: Martin W. Simmen, Institute of Cell and Molecular Biology, University of Edinburgh, Mayfield Road, King's Buildings, Edinburgh EH9 3JR, United Kingdom. E-mail: m.simmen@ed.ac.uk


    literature cited

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Boeke, J. D., and J. P. Stoye. 1997. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. Pp. 343–435 in J. M. Coffin, S. H. Hughes, and H. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, Plainview, N.Y.

    Britten, R. J., and E. H. Davidson. 1969. Gene regulation for higher cells: a theory. Science 165:349–357.

    Britten, R. J., T. J. McCormack, T. L. Mears, and E. H. Davidson. 1995. Gypsy/Ty3-class retrotransposons integrated in the DNA of herring, tunicate, and echinoderms. J. Mol. Evol. 40:13–24.

    Brookfield, J. F. Y. 1995. Transposable elements as selfish DNA. Pp. 130–153 in D. J. Sherratt, ed. Mobile genetic elements. Oxford University Press, Oxford, England.

    Deininger, P. L. 1989. SINEs: short interspersed repeated DNA elements in higher eukaryotes. Pp. 619–636 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society of Microbiology, Washington, D.C.

    Doolittle, W. F., and C. Sapienza. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284:601–603.

    Feng, Q., J. V. Moran, H. H. Kazazian Jr., and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.

    Feschotte, C., and C. Mouchès. 2000. Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from the Arabidopsis thaliana genome has arisen from a pogo-like DNA transposon. Mol. Biol. Evol. 17:730–737.

    Finnegan, D. J. 1992. Transposable elements. Curr. Opin. Genet. Dev. 2:861–867.

    Gilbert, N., and D. Labuda. 1999. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. USA 96:2869–2874.

    Gonzalez, P., and H. A. Lessios. 1999. Evolution of sea urchin retroviral-like (SURL) elements: evidence from 40 echinoid species. Mol. Biol. Evol. 16:938–952.

    Hoffman-Liebermann, B., D. Liebermann, L. H. Kedes, and S. N. Cohen. 1985. TU elements: a heterogeneous family of modularly structured eucaryotic transposons. Mol. Cell. Biol. 5:991–1001.

    Izsvák, Z., Z. Ivics, D. Garcia-Estefania, S. C. Fahrenkrug, and P. B. Hackett. 1996. DANA elements: a family of composite, tRNA-derived short interspersed DNA elements associated with mutational activities in zebrafish (Danio rerio). Proc. Natl. Acad. Sci. USA 93:1077–1081.

    Kaukinen, J., and S. Varvio. 1992. Artiodactyl retroposons: association with microsatellites and use in SINEmorph detection by PCR. Nucleic Acids Res. 20:2955–2958.

    Labrador, M., and V. G. Corces. 1997. Transposable element-host interactions: regulation of insertion and excision. Annu. Rev. Genet. 31:381–404.

    Leblanc, P., S. Desset, B. Dastugue, and C. Vaury. 1997. Invertebrate retroviruses: ZAM, a new candidate in D. melanogaster. EMBO J. 16:7521–7531.

    Liebermann, D., B. Hoffman-Liebermann, J. Weinthal, G. Childs, R. Maxson, A. Mauron, S. N. Cohen, and L. Kedes. 1983. An unusual transposon with long terminal inverted repeats in the sea urchin Strongylocentrotus purpuratus. Nature 306:342–347.

    Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955–964.

    Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.

    Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.

    Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.

    Miller, K., C. Lynch, J. Martin, E. Herniou, and M. Tristem. 1999. Identification of multiple gypsy LTR-retrotransposon lineages in vertebrate genomes. J. Mol. Evol. 49:358–366.

    Naas, T. P., R. J. DeBerardinis, J. V. Moran, E. M. Ostertag, S. F. Kingsmore, M. F. Seldin, Y. Hayashizaki, S. L. Martin, and H. H. Kazazian Jr. 1998. An actively retrotransposing, novel subfamily of mouse L1 elements. EMBO J. 17:590–597.

    Nishida, H. 1987. Cell lineage analysis in ascidian embryos by intracellular injection of a tracer enzyme iii: up to the tissue restricted stage. Dev. Biol. 121:526–541.

    Ohshima, K., M. Hamada, Y. Terai, and N. Okada. 1996. The 3' ends of tRNA-derived short interspersed repetitive elements are derived from the 3' ends of long interspersed repetitive elements. Mol. Cell. Biol. 16:3756–3764.

    Okada, N., M. Hamada, I. Ogiwara, and K. Ohshima. 1997. SINEs and LINEs share common 3' sequences: a review. Gene 205:229–243.

    Orgel, L. E., and F. H. C. Crick. 1980. Selfish DNA: the ultimate parasite. Nature 284:604–607.

    Pelisson, A., S. Song, N. Prud'homme, P. Smith, A. Bucheton, and V. Corces. 1994. Gypsy transposition correlates with the production of a retroviral envelope-like protein under the tissue-specific control of the Drosophila flamenco gene. EMBO J. 13:4401–4411.

    Plasterk, R. H. A. 1995. Mechanisms of DNA transposition. Pp. 18–37 in D. J. Sherratt, ed. Mobile genetic elements. Oxford University Press, Oxford, England.

    Potter, S. S. 1982. DNA sequence of a foldback transposable element in Drosophila. Nature 297:201–204.

    Poulter, R., and M. Butler. 1998. A retrotransposon family from the pufferfish (fugu) Fugu rubripes. Gene 215:241–249.

    Rebatchouk, D., and J. O. Narita. 1997. Foldback transposable elements in plants. Plant Mol. Biol. 34:831–835.

    Rice, P., R. Lopez, R. Doelz, and J. Leunissen. 1996. EGCG 8.1 release notes. EMBNET News 3:2–4.

    Saitou, N., and M. Nei. 1987. The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Satoh, N., and W. R. Jeffery. 1995. Chasing tails in ascidians: developmental insights into the origin and evolution of chordates. Trends Genet. 11:354–359.

    Serdobova, I. M., and D. A. Kramerov. 1998. Short retroposons of the B2 superfamily: evolution and application for the study of rodent phylogeny. J. Mol. Evol. 46:202–214.

    Simmen, M. W., S. Leitgeb, J. Charlton, S. J. M. Jones, B. R. Harris, V. H. Clark, and A. Bird. 1999. Nonmethylated transposable elements and methylated genes in a chordate genome. Science 283:1164–1167.

    Simmen, M. W., S. Leitgeb, V. H. Clark, S. J. M. Jones, and A. Bird. 1998. Gene number in an invertebrate chordate, Ciona intestinalis. Proc. Natl. Acad. Sci. USA 95:4437–4440.

    Sonnhammer, E. L. L., and R. Durbin. 1994. A workbench for large scale sequence homology analysis. Comput. Appl. Biosci. 10:301–307.

    Springer, M. S., and R. J. Britten. 1993. Phylogenetic relationships of reverse transcriptase and RNase H sequences and aspects of genome structure in the gypsy group of retrotransposons. Mol. Biol. Evol. 10:1370–1379.

    Springer, M. S., E. H. Davidson, and R. J. Britten. 1991. Retroviral-like element in a marine invertebrate. Proc. Natl. Acad. Sci. USA 88:8401–8404.

    Takahashi, K., Y. Terai, M. Nishida, and N. Okada. 1998. A novel family of short interspersed repetitive elements (SINEs) from cichlids: the patterns of insertion of SINES at orthologous loci support the proposed monophyly of four major groups of cichlid fishes in Lake Tanganyika. Mol. Biol. Evol. 15:391–407.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTALX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Truett, M. A., R. S. Jones, and S. S. Potter. 1981. Unusual structure of the FB family of transposable elements in Drosophila. Cell 24:753–763.

    Tu, Z. 1997. Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci. USA 94:7475–7480.

    Tu, Z. 1999. Genomic and evolutionary analysis of Feilai, a diverse family of SINES in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 16:760–772.

    Tu, Z., J. Isoe, and J. A. Guzova. 1998. Structural, genomic, and phylogenetic analysis of Lian, a novel family of non-LTR retrotransposons in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 15:837–853.

    Ünsal, K., and G. T. Morgan. 1995. A novel group of families of short interspersed repetitive elements (SINEs) in Xenopus: evidence of a specific target site for DNA-mediated transposition of inverted-repeat SINEs. J. Mol. Biol. 248:812–823.

    Voliva, C. F., C. L. Jahn, M. B. Comer, C. A. Hutchison III, and M. H. Edgell. 1983. The L1Md long interspersed repeat family in the mouse: almost all examples are truncated at one end. Nucleic Acids Res. 11:8847–8859.

    Weiner, A. M., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661.

    Wessler, S. R., T. E. Bureau, and S. W. White. 1995. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5:814–821.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

Accepted for publication July 13, 2000.