Structural and Phylogenetic Analysis of TRAS, Telomeric Repeat-Specific Non-LTR Retrotransposon Families in Lepidopteran Insects

Yoko Kubo, Satoshi Okazaki, Tomohiro Anzai and Haruhiko Fujiwara

Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan


    Abstract
TRAS1 is a non-LTR retrotransposon that inserts specifically into the telomeric repeat (TTAGG)n of the silkworm, Bombyx mori. To characterize the evolutionary origin of TRAS-like elements, we identified seven TRAS families (TRAS3, TRAS4, TRAS5, TRAS6, TRASY, TRASZ, and TRASW) from B. mori and four elements from two other Lepidoptera, Dictyoploca japonica (TRASDJ) and Samia cynthia ricini (TRASSC3, TRASSC4, and TRASSC9). More than 2,000 copies of various Bombyx TRAS elements have accumulated within (TTAGG)n sequences as unusual but orderly tandem repeats. Within each class of Bombyx TRAS elements, the 5' and 3' regions were highly conserved and showed no truncation, suggesting that distinct classes of TRAS have been maintained independently by retrotransposition into (TTAGG)n. A phylogenetic tree of site-specific retroelements showed that the nine TRAS families in Lepidoptera constitute a single phylogenetic group that is closely related to the R1 family, which inserts specifically into arthropod 28S rDNA. The amino acid sequence identity in the region from the endonuclease (EN) to the reverse transcriptase (RT) domain is higher among TRAS groups (about 37%–70%) than between TRAS elements and R1Bm (about 25%–30%), which may reflect structural features responsible for their target specificity. Comparison of the EN-to-RT region among non-LTR elements revealed several regions conserved only within TRAS elements, including a highly conserved region between the EN and RT domains that resembles a Myb-like DNA-binding domain. These regions may be involved in the site-specific integration of TRAS elements into (TTAGG)n telomeric repeats.


    Introduction
The telomeres of most eukaryotes consist of short tandem repeats, called telomeric repeats, that are synthesized by a reverse-transcriptase-like enzyme, telomerase (Greider and Blackburn 1989). Telomeric repeat units of 6 to 26 nucleotides have been identified at the chromosomal ends of protozoa, fungi, nematodes, plants, and vertebrates. Many species from a wide variety of insect groups have (TTAGG)n telomeric repeats (Okazaki et al. 1993; Sahara, Marec, and Traut 1999). Some insects, however, lack this type of repeat (Okazaki et al. 1993; Sasaki and Fujiwara 2000) and maintain their telomeres by specialized, telomerase-independent mechanisms. Drosophila telomeres are elongated by two non-LTR retrotransposons, HeT-A and TART, which retrotranspose onto the chromosomal ends (Biessmann et al. 1990; Levis et al. 1993). Other dipteran insects, such as Anopheles gambiae and Chironomus pallidivittatus, probably maintain their telomeres by recombination or gene conversion (Cohn and Edström 1991; Roth et al. 1997).

The telomeres of the silkworm, Bombyx mori, consist of (TTAGG)n telomeric repeats and harbor many types of non-LTR retrotransposons (Okazaki, Ishikawa, and Fujiwara 1995; Takahashi, Okazaki, and Fujiwara 1997; Fujiwara et al. 2000). More than 2,000 copies of these retrotransposons are inserted into the repeats in a highly sequence-specific manner. They are classified into two groups, TRAS and SART, based on their insertion sites and orientations. We have suggested that more than eight families of TRAS or SART elements are present in the silkworm telomere, but only two families, TRAS1 and SART1, have been characterized completely.

Most non–long terminal repeat (non-LTR) retrotransposons integrate randomly into the host genome, while some have preferred target sites. Several non-LTR elements in Drosophila, such as Jockey (Priimagi, Mizrokhi, and Ilyin 1988), F (Minchiotti and Di Nocera 1991), and I (McLean, Bucheton, and Finnegan 1993), seem to have no target specificity. The human L1 element L1Hs has preferred target sequences, but they are not strictly defined (Jurka 1997; Cost and Boeke 1998). Some non-LTR retrotransposons, however, have a very restricted target in the genome. R1 and R2 are inserted at specific sites in the 28S rDNA of insects (Jakubczak, Burke, and Eickbush 1991). Tx1L of Xenopus laevis is located at a specific site within another family of transposable elements (Tx1D) (Christensen, Pont-Kingdon, and Carroll 2000). RT1 and RT2 are inserted into specific sites of the 28S rDNA of both A. gambiae and Anopheles arabiensis (Paskewitz and Collins 1989; Besansky et al. 1992).

Recent studies have revealed that the endonuclease domains encoded by non-LTR retrotransposons are involved in target recognition and cleavage (Feng et al. 1996). Endonucleases in the N-terminal region of ORF2 (L1Hs [Cost and Boeke 1998], R1Bm [Feng, Schumann, and Boeke 1998], and Tx1L [Christensen, Pont-Kingdon, and Carroll 2000]) and in the C-terminal region of the R2 ORF (Yang, Malik, and Eickbush 1999) have been shown to cut their target sequences at specific sites. We also recently found that the TRAS1 endonuclease of the silkworm cuts (TTAGG)n at specific sites (Anzai, Takahashi, and Fujiwara 2001). These observations indicate that the endonuclease domain makes the first cleavage, on the bottom strand of the target sequence. However, it is still unknown which sequences in the EN (endonuclease) domain, or in any other region within the ORFs of retrotransposons, are involved in target-site selection or recognition.

Comparing amino acid sequences may reveal elements that are conserved only among retrotransposons sharing the same target sequence, and thereby point to putative sequences involved in target specificity. In this study, we screened different classes of TRAS elements from the silkworm and from two other Lepidoptera. By comparing amino acid sequences from the EN to the RT (reverse transcriptase) domain of ORF2 among non-LTR retroelements, we found regions that are highly conserved only among TRAS-like elements. One of these resembles a Myb-like DNA-binding domain.


    Materials and Methods
Biological Materials
A strain of B. mori, P788, was maintained in the laboratory on an artificial diet. Genomic DNAs of Dictyoploca japonica and Samia cynthia ricini were kindly provided by Dr. T. Shimada, University of Tokyo.

Cloning, PCR Amplification, and Sequence Analyses
Novel TRAS families of B. mori were screened from the genomic lambda phage library used for TRAS1 isolation (Okazaki, Ishikawa, and Fujiwara 1995). Phage clones containing (TTAGG)n repeats were isolated with a 32P-labeled (TTAGG)5 probe. In addition to TRAS1, we found five classes of TRAS families, named TRAS3, TRAS4, TRASY, TRASW, and TRASZ, among the (TTAGG)n-bearing phage clones. The complete sequence of TRAS3 was determined. For TRAS4, the region from the EN to the RT domain was amplified by PCR (see below) and sequenced. For TRASY, TRASZ, and TRASW, only the junction regions between the retrotransposon and (TTAGG)n were analyzed. Because several regions in the EN and RT domains are highly conserved among non-LTR retrotransposons, we amplified the region from EN to RT (covering approximately 70% of the RT domain) by PCR from the respective TRAS families of the silkworm and other lepidopteran insects. Some primers were designed according to the consensus-degenerate hybrid oligonucleotide primer (CODEHOP) strategy (Rose et al. 1998). The PCR primers were as follows: TR4QSAG, 5'-AGGGCGCAAGATCTTCCAAAGCGCTGGCCC-3'; TR5GYKG, 5'-GTTCTTCAACAGTGAGGGGATATAAAGGAGC-3'; TR6SLEN, 5'-GCCTGGCAATACTCTCCACTGTTTTCGAGAC-3'; GTVK, 5'-GGGACTGTNAAAGCNGCNAT-3'; CH-VVGI, 5'-ACCACCAACAACATCGTCGTRGTNGRRRTC-3'; CH-FADD, 5'-GTCTCCGTCGAAAACCAGGACCACRTCRTCNGCRAA-3'. The following primer sets (annealing temperatures in parentheses) were used to amplify the respective classes of TRAS elements: TRAS4, TR4QSAG + CH-FADD (61°C); TRAS5, TR5GYKG + CH-FADD (51°C); TRAS6, GTVK + TR6SLEN (51°C); TRAS in D. japonica, CH-VVGI + CH-FADD (51°C); TRAS in S. cynthia, GTVK + CH-FADD (51°C). PCR was performed for 35 cycles of 94°C for 45 s, 51–61°C for 45 s, and 72°C for 90 s. Amplified products were cloned into the pGEM®-T Easy vector (Promega) and sequenced with an ABI-310 DNA sequencer (PE Applied Biosystems).
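The degenerate primers above use IUPAC ambiguity codes (N = any base, R = A or G), so each is really a pool of oligonucleotides. The short Python sketch below is illustrative only and not part of the original protocol; the helper names `degeneracy` and `expand` are hypothetical. It counts how many distinct sequences each primer pool contains.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total

def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence in the pool (sensible only for small pools)."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]

# Primer sequences copied from Materials and Methods (5' to 3').
PRIMERS = {
    "TR4QSAG": "AGGGCGCAAGATCTTCCAAAGCGCTGGCCC",
    "GTVK": "GGGACTGTNAAAGCNGCNAT",
    "CH-VVGI": "ACCACCAACAACATCGTCGTRGTNGRRRTC",
    "CH-FADD": "GTCTCCGTCGAAAACCAGGACCACRTCRTCNGCRAA",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, {degeneracy(seq)}-fold degenerate")
```

For CH-FADD, the four degenerate positions (R, R, N, R) give 2 × 2 × 4 × 2 = 32 variants, whereas a fully specified primer such as TR4QSAG is a single sequence.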

Structural Prediction
The EN-to-RT regions of non-LTR retrotransposons were aligned using the multiple alignment options in CLUSTAL W (Thompson, Higgins, and Gibson 1994). A phylogenetic tree of retrotransposons was constructed from the EN-to-RT sequences by the neighbor-joining method (Saitou and Nei 1987). Secondary-structure prediction was carried out with DNASIS, version 3.7 (Hitachi).
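The neighbor-joining step can be illustrated with a minimal, self-contained Python sketch. It is not CLUSTAL W's implementation: it takes a ready-made distance matrix, omits bootstrapping and distance corrections, breaks ties by taking the first minimal pair, and places the root at the midpoint of the last edge only so that a Newick string can be written. The five-taxon matrix (taxa a–e) is a textbook toy example, not data from this study.

```python
from itertools import combinations

def neighbor_joining(labels, matrix):
    """Neighbor-joining (Saitou and Nei 1987) on a symmetric distance matrix.

    Returns a Newick string; each new internal node is keyed by its subtree string.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Q criterion: join the pair minimizing (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2  # midpoint placement of the root, for Newick output only
    return f"({a}:{half:g},{b}:{half:g});"

# Textbook toy matrix (not data from this study).
LABELS = ["a", "b", "c", "d", "e"]
MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print(neighbor_joining(LABELS, MATRIX))
```

On this additive toy matrix the first join is (a, b) with branch lengths 2 and 3, and the recovered branch lengths reproduce the input distances exactly.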


    Results and Discussion
Unusual Tandem Arrays of Non-LTR Retrotransposons and Short (TTAGG)n Stretches in Subtelomeres of B. mori
We previously found that several distinct classes of non-LTR retrotransposons are integrated specifically into the (TTAGG/CCTAA)n telomeric repeats of the silkworm, although most of them, other than TRAS1, remained uncharacterized. To assess the overall structure of the B. mori telomere and the evolutionary origin of telomere-specific retrotransposons in insects, we identified and characterized further TRAS families. When the telomeric repeat (TTAGG)5 was used as a probe to screen a lambda EMBL-3 phage library, about 0.5% of all plaques gave positive signals. These represented clones containing internal telomeric repeats, which lie just inside the 7–8-kb stretch of (TTAGG/CCTAA)n at the extreme ends of chromosomes. Most of the clones that hybridized with 32P-labeled (TTAGG)5 also contained polyA stretches, which are hallmarks of non-LTR retrotransposons. These clones were therefore a promising source for analyzing various classes of telomeric repeat-associated retrotransposons.

From more than 100 positive clones, 8 were selected and subcloned into plasmid vectors for further characterization and sequence analysis (fig. 1). Based on structural differences in the junction regions between the 5' ends of the retroelements and the target (TTAGG/CCTAA)n, we identified six new families, which fall into two large groups. Five families, named TRAS3, TRAS4, TRASY, TRASW, and TRASZ, are oriented distal to the telomeric end and are adjacent to CC of the (CCTAA)n telomeric strand, like TRAS1 (fig. 3A). In contrast, one family, named SART2, is oriented opposite to the TRAS group and is inserted between the T and A nucleotides of (TTAGG)n, like SART1 (fig. 1). Restriction mapping, partial sequencing, and hybridization studies of these lambda phage clones revealed that different classes of TRAS or SART elements are clustered between short stretches of telomeric repeats. Based on the numbers of positive clones in plaque hybridization with each class of retroelement as a probe, we roughly estimate that more than 2,000 copies of TRAS and SART elements, averaging 7.5 kb in length, occupy about 3% of the silkworm genome (copies per haploid genome: SART1, 600; TRAS1, 300; TRAS3, 300; other TRAS and SART families, 50–200 each; data not shown). As the structures of the phage clones in figure 1 suggest, the subtelomeric region of the silkworm may consist of alternating tandem arrays of retroelements (more than 40 copies at each end) and short (TTAGG/CCTAA)n sequences.
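As a rough consistency check, the arithmetic behind these estimates can be written out. The haploid genome size is not given in this section; the calculation below assumes a round figure of roughly 500 Mb for B. mori and uses the haploid chromosome number n = 28 (56 chromosome ends). Both values are brought in for illustration only.

```latex
\begin{aligned}
2{,}000\ \text{copies} \times 7.5\ \text{kb} &= 15\ \text{Mb},\\
\frac{15\ \text{Mb}}{\approx 500\ \text{Mb}} &\approx 0.03 = 3\%,\\
56\ \text{ends} \times 40\ \text{copies per end} &= 2{,}240\ \text{copies} > 2{,}000.
\end{aligned}
```

Under these assumptions, the per-end array size (more than 40 copies) and the total copy number (more than 2,000) are mutually consistent.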



Fig. 1.—Clustering structures of different families of TRAS or SART elements in several phage clones of Bombyx mori. Schematic structures of eight lambda phage clones screened with (TTAGG)5 and plasmid subclones (see fig. 3 ) derived from each phage clone are shown. The 5'–3' direction of transcription of the retrotransposon unit is indicated by an open arrow. Six different TRAS families (TRAS1, TRAS3, TRAS4, TRASY, TRASZ, and TRASW) and two SART families (SART1 and 2) are inserted into the telomeric repeats (indicated by solid triangles) in the opposite directions and are tandemly clustered in the subtelomere region of B. mori.

 


Fig. 3.—Junction regions between the Bombyx TRAS elements and the (CCTAA)n telomeric repeats. A, The 5'-end sequences of TRAS elements. Most TRAS elements start at the same position, at nucleotide C just after CC of the (CCTAA) telomeric repeat. The consensus sequence of TRAS1 is shown at the bottom. Dots denote nucleotides identical to those of the pBT3-3 clone. Hyphens indicate gaps introduced to maximize homology. Underlines show putative sequences involved in transcription initiation of non-LTR retrotransposons (see text). B, The 3'-end sequences of TRAS elements. Dots denote nucleotides identical to those of the TRAS3-1 sequence. Each TRAS clone ends with a polyA tail of variable length. The extreme 3' end of TRAS1 (41 bp*) is not shown. N.D. = not determined.

 
TRAS3 Is a Distinct Family of TRAS Elements Which Is Similar to TRAS1
The 5'-end regions of TRAS3 were highly conserved in many subclones isolated from several lambda clones (fig. 3A). To clarify the structural features of a major TRAS class other than TRAS1, we sequenced a complete unit of the TRAS3 element. Using the 5'-end region of the TRAS3 unit in λB1 as a probe, we identified a phage clone, λTRAS3-1, that contained a complete TRAS3 unit. Several overlapping parts of the unit were subcloned separately and sequenced. The 7,988-bp TRAS3 unit contains gag- and pol-like ORFs, ends with a polyA tail, and is 136 bp longer than TRAS1 (fig. 2). Putative functional domains, including three zinc finger domains (CCHC) in ORF1 (gag-like ORF) and the EN, RT, RNase H (R/H), and CCHC domains in ORF2 (pol-like ORF), were all conserved in TRAS3. The overall amino acid sequence similarities between TRAS3 and TRAS1 are 43% in ORF1 and 60% in ORF2. In the frameshift region of both TRAS1 and TRAS3, the two ORFs overlap but are one nucleotide out of frame (+1 frameshift). Near the C-terminal end of ORF1, only a TGCTAA sequence (boxed in fig. 2) was conserved between TRAS1 and TRAS3, suggesting that this sequence may be involved in the frameshifting mechanism. Although +1 frameshifting has been well studied in LTR-type retrotransposons, such as Ty elements in S. cerevisiae (Burck et al. 1999), it is unclear whether TRAS and other non-LTR elements use frameshifting mechanisms similar to those of LTR-type elements. Although TRAS1 and TRAS3 resemble each other in structure, they exist as distinct groups of retrotransposon families in the silkworm genome. Southern hybridization with several parts of the TRAS3 element as probes showed only one or two prominent bands in each restriction digest (with 6-bp cutters) of silkworm genomic DNA, reflecting the conserved structure of most TRAS3 copies (data not shown).
The genomic Southern hybridization results were consistent with the restriction map predicted from the TRAS3 sequence, except that two restriction sites, HindIII and SalI (in parentheses in fig. 2), were not detected by Southern hybridization. Like TRAS3, most genomic copies of TRAS1 are structurally conserved and lack 5' truncation (fig. 2; Okazaki, Ishikawa, and Fujiwara 1995). Comparison of sequences and restriction maps between TRAS1 and TRAS3 showed that they are distinct families of retrotransposons.
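Predicting a restriction map from a sequenced unit, as was done for TRAS3, amounts to locating each enzyme's recognition site. The Python sketch below is a hypothetical helper (the function names are ours, not from any published tool) covering the ten enzymes in the fig. 2 legend. It reports site start positions rather than exact cut positions and ignores methylation sensitivity, so fragment sizes are approximate.

```python
import re

# Recognition sites for the enzymes listed in the fig. 2 legend.
# All ten are palindromic, so scanning one strand finds sites on both.
SITES = {
    "EcoRI": "GAATTC", "HindIII": "AAGCTT", "XbaI": "TCTAGA", "SalI": "GTCGAC",
    "EcoRV": "GATATC", "KpnI": "GGTACC", "BamHI": "GGATCC", "PstI": "CTGCAG",
    "XhoI": "CTCGAG", "SacI": "GAGCTC",
}

def restriction_map(seq):
    """Return sorted (position, enzyme) pairs; positions are 1-based site starts."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        # Zero-width lookahead so that overlapping occurrences are all reported.
        for m in re.finditer(f"(?={site})", seq):
            hits.append((m.start() + 1, enzyme))
    return sorted(hits)

def fragment_lengths(seq, enzyme):
    """Approximate single-digest fragment sizes of a linear sequence,
    treating each site start as the cut point."""
    starts = [pos - 1 for pos, name in restriction_map(seq) if name == enzyme]
    cuts = [0] + starts + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:]) if b > a]

if __name__ == "__main__":
    toy = "AAGAATTCAAAGGATCCAA"  # toy sequence, not TRAS3
    print(restriction_map(toy))
    print(fragment_lengths(toy, "EcoRI"))
```

Comparing the predicted site list against bands seen on a Southern blot is what flags sites, such as the HindIII and SalI sites above, that are present in the sequenced copy but not in most genomic copies.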



Fig. 2.—Comparison of complete retrotransposon units for TRAS3 and TRAS1. Restriction maps of TRAS3 and TRAS1 are shown on the lines. Abbreviations for restriction sites used in the map are as follows: E, EcoRI; H, HindIII; X, XbaI; S, SalI; V, EcoRV; K, KpnI; B, BamHI; P, PstI; Xh, XhoI; Sc, SacI. The restriction sites in parentheses were not detected by Southern hybridization. (CCTAA)n = telomeric repeats; An, polyA tail. A closed box indicates the (CA) repeats in the 5' untranslated region of both TRAS elements. Schematic structures of the two open reading frames (ORFs) (gag and pol) are shown below the restriction maps. Putative functional domains are as follows: ZF, zinc finger domains (CCHC); EN, endonuclease; RT, reverse transcriptase; R/H, RNase H; Myb, Myb-like DNA-binding domain. The amino acid sequences of the putative frameshift region between the two ORFs are shown below the ORF structures. The TGCTAA sequence (boxed) is conserved near the C-terminal end of the gag ORFs of both TRAS families.

 
Multiple Families of TRAS in the Silkworm: Their Structural Features
Figure 3 shows alignments of the junction regions between the telomeric repeats and both the 5' (fig. 3A) and 3' (fig. 3B) ends of each TRAS clone (for the original lambda clones, see fig. 1). Based on the first 120 bp of the 5'-end regions, we classified the clones into six TRAS families. This classification seems reasonable because genomic Southern hybridization with 5'-region probes showed that genomic copies of TRASY, TRASZ, and TRASW, like those of TRAS1 and TRAS3, are structurally highly conserved (data not shown). Most clones of each class showed no truncation at their 5' ends. Like TRAS1, all TRAS3 clones analyzed so far, and most other TRAS clones, start at nucleotide C, just after CC of the (CCTAA)n telomeric repeat (fig. 3A). The first 8 nt, CAGTCTGC, and the CGTC sequence (CGTT in TRAS1, TRASW, and TRASZ) around position +40, which we previously suggested are essential for transcription initiation (underlined in fig. 3A), are widely conserved among all TRAS families and other non-LTR retrotransposons (Takahashi and Fujiwara 1999). Interestingly, the 5'-end sequence CAGTCTGC of TRAS and other non-LTR elements also resembles the first 8 bp of the terminal inverted repeats of Tc1-like and hAT-type elements (CAGTGTNN; Besansky et al. 1996). The structural conservation at the 5'-terminal regions implies that all TRAS families may be actively transcribed. This possibility is supported by the presence of TRASY-like and TRASW-like 5'-end sequences (NV060578 and NV021159, respectively) in the silkworm expressed sequence tag (EST) database (Silkbase; http://samia.ab.a.u-tokyo.ac.jp/silkbase/). The AAGTG sequence (AACTG in TRASZ) around position +100 (underlined in fig. 3A) is also conserved among all TRAS families, but its function is unknown.
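The junction rule used above (a TRAS copy begins at the C immediately after CC of a (CCTAA)n run, with CAGTCTGC as its first 8 nt) can be sketched as a simple pattern search. This Python sketch is an illustration under stated assumptions, not the procedure used for fig. 3: it requires at least two full CCTAA units before the junction, and it treats "around +40" and "around +100" as a ±5-nt window, a tolerance chosen arbitrarily.

```python
import re

# Assumed junction pattern: two or more CCTAA units, then "CC", immediately
# followed (zero-width lookahead) by the conserved TRAS 5' end CAGTCTGC.
JUNCTION = re.compile(r"(?:CCTAA){2,}CC(?=CAGTCTGC)")

def tras_5prime_start(seq):
    """0-based index of the first TRAS nucleotide after the telomeric repeat, or None."""
    m = JUNCTION.search(seq.upper())
    return m.end() if m else None

def _near(element, motifs, pos, window):
    """True if any motif starts within +/- window nt of 0-based position pos."""
    lo = max(0, pos - window)
    return any(element.find(m, lo, pos + window + len(m)) != -1 for m in motifs)

def check_5prime_motifs(element, window=5):
    """Check the conserved 5' motifs described for TRAS families (labels use 1-based positions)."""
    element = element.upper()
    return {
        "CAGTCTGC at +1": element.startswith("CAGTCTGC"),
        "CGTC/CGTT near +40": _near(element, ("CGTC", "CGTT"), 39, window),
        "AAGTG/AACTG near +100": _near(element, ("AAGTG", "AACTG"), 99, window),
    }

if __name__ == "__main__":
    # Synthetic clone: telomeric repeats joined to a made-up element carrying the motifs.
    element = "CAGTCTGC" + "A" * 31 + "CGTC" + "A" * 56 + "AAGTG" + "A" * 20
    clone = "GG" + "CCTAA" * 3 + "CC" + element
    start = tras_5prime_start(clone)
    print(start, check_5prime_motifs(clone[start:]))
```

A clone whose element starts elsewhere, or lacks CAGTCTGC, returns None, which is how a 5'-truncated copy would show up under this simplified rule.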

We also compared the 3'-end structures of each TRAS class (fig. 3B). Although the data are not conclusive, nucleotide similarity in the 3'-terminal 120 bp was about 83% between two TRAS3 clones, TRAS3-1 (complete unit) and p3-3L, but only about 56% between TRAS3-1 and the TRAS1 consensus sequence. The 3'-tail regions of LINE and SINE families are strongly conserved in many species (Malik and Eickbush 1998; Ogiwara et al. 1999), possibly because the enzymatic machinery of LINEs recognizes higher-order structures in the 3' tails of LINEs and SINEs. Among a dozen genomic copies of L1Bm, another major non-LTR retrotransposon of the silkworm, the 3'-tail regions share about 85% sequence similarity on average (Ichimura, Mita, and Sugaya 1997). The higher 3'-tail similarity within the TRAS3 group (83%) than between TRAS1 and TRAS3 (56%) therefore suggests that each TRAS family may be recognized by its own retrotransposition machinery but not by that of other families.
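The text does not specify how gaps were handled when these similarity values were computed. One common convention, shown in the Python sketch below (the function name is hypothetical), counts identical positions over aligned columns in which neither sequence has a gap. Other conventions, such as dividing by the full alignment length, give different numbers, so the sketch is not meant to reproduce the 83% and 56% figures.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the
    denominator (one of several conventions in common use).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)

if __name__ == "__main__":
    # Toy aligned fragments, not TRAS sequences.
    print(percent_identity("ACGT-A", "ACGTTT"))
```

In the toy example, the gapped column is skipped and four of the five remaining columns match, giving 80%.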

In most non-LTR retrotransposons, the 5'-terminal regions are not conserved or are truncated by incomplete reverse transcription. The sequence uniformity of genomic copies of TRAS and SART elements, especially at their 5'-terminal regions, is therefore unusual among retroelements. This uniformity may be partly maintained by homogenizing forces such as unequal crossover and gene conversion, as proposed for the R1 and R2 insertions in 28S rDNA. Eickbush and colleagues concluded that the recombinational forces driving the concerted evolution of the rRNA genes can by themselves rapidly amplify and eliminate copies of R1 and R2, independent of their ability to retrotranspose (Jakubczak et al. 1992). Even in organisms that rely on telomerase, telomere-telomere recombination is thought to proceed by gene conversion and to result in a net increase in telomeric DNA. Recombination of this kind in the telomere region may thus contribute to the structural uniformity of telomeric-repeat-associated retrotransposons.

Identification of TRAS-like Elements in Lepidoptera
To characterize TRAS-like elements from other insect species, we used the CODEHOP strategy (Rose et al. 1998), which is designed for PCR amplification of unknown targets related to a set of multiply aligned protein sequences (see Materials and Methods). This method has been applied to detect diverse reverse transcriptase-like genes in the human genome. In general, the RT domain of non-LTR retrotransposons consists of seven conserved regions (Xiong and Eickbush 1990; Nakamura et al. 1997). The EN domain in the N-terminal region of ORF2 is thought to be responsible for target digestion and is relatively conserved among many non-LTR retroelements. We therefore tried to amplify TRAS-like elements by PCR across the region from the EN to the RT domain. When we began the experiment, sequence information was available for TRAS1, TRAS3, TRAS4, TRAS5, and TRAS6 (partial sequence) of the silkworm. Based on a comparison between these TRAS elements and other non-LTR elements, we designed primers within conserved regions of EN and RT to amplify TRAS-like elements specifically, but not other retroelements. To amplify TRAS-like elements from insects phylogenetically distant from the silkworm, we also used two CODEHOP primers, CH-VVGI and CH-FADD (see Materials and Methods). Each consists of a short degenerate 3' core region and a longer 5' consensus clamp region.
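The clamp/core layout of the two hybrid primers can be made explicit with a short Python sketch. It rests on a simplifying assumption of ours: the core is taken to start at the first degenerate IUPAC base. This only approximates where the designed core begins, since a core may also include nondegenerate positions, and `split_codehop` is a hypothetical name.

```python
# IUPAC codes that stand for more than one base.
DEGENERATE = set("RYSWKMBDHVN")

def split_codehop(primer):
    """Split a CODEHOP-style primer into (5' clamp, 3' degenerate core).

    Assumption: the core begins at the first degenerate base.
    """
    primer = primer.upper()
    for i, base in enumerate(primer):
        if base in DEGENERATE:
            return primer[:i], primer[i:]
    return primer, ""  # no degenerate positions: all clamp

if __name__ == "__main__":
    for name, seq in {
        "CH-VVGI": "ACCACCAACAACATCGTCGTRGTNGRRRTC",
        "CH-FADD": "GTCTCCGTCGAAAACCAGGACCACRTCRTCNGCRAA",
    }.items():
        clamp, core = split_codehop(seq)
        print(f"{name}: clamp {len(clamp)} nt, core {core} ({len(core)} nt)")
```

For CH-FADD this gives a 24-nt clamp and a 12-nt core (RTCRTCNGCRAA); for CH-VVGI, a 20-nt clamp and a 10-nt core. Both fit the description of a longer 5' clamp and a short 3' core.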

Southern hybridization with the Bombyx TRAS1 sequence as a probe suggested that TRAS-like elements are present in a variety of insect groups (data not shown). However, despite several attempts, we failed to detect TRAS-like elements from insects outside Lepidoptera by the CODEHOP method, suggesting that the regions targeted by the PCR primers may have diverged in these insects. We did, however, isolate TRAS-like sequences from two Lepidoptera, S. cynthia ricini and D. japonica. Seven clones from S. cynthia ricini were divided into three families, TRASSC3, TRASSC4, and TRASSC9, based on sequence comparison. From D. japonica, we isolated one clone, named TRASDJ. Because we did not determine the 5'- and 3'-terminal sequences of the TRAS-like elements in S. cynthia and D. japonica, it is uncertain whether these retrotransposons reside in telomeric repeats.

In addition, we isolated the EN-RT regions of two further families, TRAS5 and TRAS6, from the silkworm, B. mori. The relationship of TRAS5 and TRAS6 to TRASY, TRASZ, and TRASW, which were identified from lambda clones, is not yet known, because the latter groups were analyzed only in their junction regions (see fig. 3).

Comparison of TRAS elements from Bombyx and from the two other Lepidoptera revealed several features. The EN-to-RT regions of B. mori TRAS elements are highly conserved (50%–72% identity in amino acid sequence and 56%–70% in nucleotide sequence). The amino acid identity between the Bombyx TRAS elements and R1Bm is only 25%–27%, probably reflecting structural differences in functional domains, such as the region involved in sequence-specific digestion. Similar results were obtained in a comparison of amino acid sequences among lepidopteran TRAS elements (37%–55% identity within the TRAS group; 25%–30% between TRAS elements and R1Bm). As far as we know, R1 is the element phylogenetically most closely related to TRAS. Therefore, the higher amino acid sequence identity within the TRAS group (about 37%–70%) than between TRAS elements and R1 (about 25%–30%) suggests that the putative TRAS-like elements identified here belong to a single group. A phylogenetic tree based on the EN-to-RT amino acid sequences supports this idea. As shown in figure 4, all TRAS elements isolated from the three lepidopteran species constitute a single phylogenetic group and are closely related to other site-specific elements, R1Bm, RT1Ag, and SART1.



Fig. 4.—Phylogenetic tree of TRAS-like elements from Lepidoptera and other site-specific non-LTR retrotransposons. The tree was constructed using CLUSTAL W based on amino acid sequences from the EN to the RT domain. The region compared is shown below the tree. Nine TRAS families from lepidopteran insects (TRAS1, TRAS3, TRAS4, TRAS5, and TRAS6 from Bombyx mori; TRASDJ from Dictyoploca japonica; TRASSC3, TRASSC4, and TRASSC9 from Samia cynthia) identified in this study constitute a single phylogenetic group (shaded region). R1Bm, RT1Ag, SART1, and TARTDm are other site-specific non-LTR retrotransposons (see fig. 5). The bootstrap value is shown at each branch.

 
Screening Amino Acid Sequences Involved in (TTAGG)n Recognition of TRAS Elements
Figure 5 shows an alignment of the EN-to-RT amino acid sequences of the nine lepidopteran TRAS families, three site-specific retroelements (R1Bm, SART1, and RT1Ag), the Drosophila telomeric retrotransposon TART, and human L1. The alignment was arranged for maximum matching, and amino acids highly conserved across all elements (at least 12 of the 14 sequences, about 86%) are marked with asterisks. These highly conserved amino acids lie mostly within the EN and RT domains and rarely in the region between them. They are probably involved in the respective enzymatic activities of EN and RT, as already suggested for some of them (Feng et al. 1996; Lingner et al. 1997). Interestingly, in the region between EN and RT, many amino acid residues are conserved only within TRAS elements. In particular, 12 amino acids in the boxed TRAS-specific region (TSR) and about 80 amino acids that may form a putative Myb-like domain (see below) are highly conserved.
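The two kinds of conservation used in figure 5, residues shared by nearly all 14 sequences and residues conserved only within the TRAS group, can be expressed as column filters on an alignment. This Python sketch is illustrative. The threshold of 12 sequences follows the text, but the out-group rule (the in-group residue occurring in fewer than half of the other sequences) is our own assumption. The input must contain explicit residues rather than the dot notation used in the figure.

```python
from collections import Counter

GAPS = "-"

def conserved_columns(alignment, min_count):
    """0-based columns whose most frequent residue (gaps ignored) appears in
    at least `min_count` of the aligned sequences."""
    seqs = list(alignment.values())
    hits = []
    for i in range(len(seqs[0])):
        residues = [s[i] for s in seqs if s[i] not in GAPS]
        if residues and Counter(residues).most_common(1)[0][1] >= min_count:
            hits.append(i)
    return hits

def group_specific_columns(alignment, group, max_outgroup_fraction=0.5):
    """Columns where every member of `group` has the same residue, while that
    residue occurs in fewer than `max_outgroup_fraction` of the other sequences."""
    inside = [alignment[name] for name in group]
    outside = [s for name, s in alignment.items() if name not in group]
    hits = []
    for i in range(len(inside[0])):
        column = {s[i] for s in inside}
        if len(column) != 1 or column & set(GAPS):
            continue
        (res,) = column
        shared = sum(s[i] == res for s in outside)
        if outside and shared / len(outside) < max_outgroup_fraction:
            hits.append(i)
    return hits

if __name__ == "__main__":
    # Toy alignment, not the fig. 5 sequences.
    aln = {"T1": "MKLW", "T2": "MKLY", "T3": "MKLW", "R1": "MAVW", "L1": "MAIW"}
    print(conserved_columns(aln, 4))
    print(group_specific_columns(aln, ["T1", "T2", "T3"]))
```

For the real alignment, `conserved_columns(aln, 12)` would correspond to the asterisk rule, and `group_specific_columns` with the nine TRAS names as the group would flag candidate TRAS-only positions such as those in the TSR.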



Fig. 5.—Comparison of amino acid sequences among TRAS elements and other site-specific non-LTR retrotransposons. The amino acid sequences from EN to RT domains of each element were aligned using CLUSTAL W. Nine TRAS families were isolated from three Lepidoptera (see fig. 4 ). R1Bm is a site-specific retrotransposon integrated into 28S rDNA of Bombyx mori. SART1 is another telomeric repeat-specific element of B. mori. RT1Ag is integrated specifically into another site of 28S rDNA of Anopheles gambiae (mosquito). TARTDm is the telomere-forming retrotransposon of Drosophila melanogaster. L1Hs is a human retrotransposon that has a preferable target sequence. Dots denote amino acids identical to those of TRAS1. Dashes indicate gaps introduced to maximize homology. Highly conserved amino acids among all elements (more than 12 matched in 14 sequences) are indicated by asterisks. The EN and RT domains are indicated in the figure. The conserved region only within TRAS elements, A, B, TSR (TRAS-specific region), and the putative Myb-like domain are shown in the boxes. Primer regions used for PCR are underlined

 
The endonuclease domain is thought to make the first nick in the target DNA, opening the way for target-primed reverse transcription (TPRT) (Yang and Eickbush 1998). A recent study also showed that the TRAS1 endonuclease domain cleaves short stretches of double-stranded (TTAGG/CCTAA)n substrate in a highly specific manner (Anzai, Takahashi, and Fujiwara 2001). Thus, the endonuclease of TRAS elements is itself responsible for the initial recognition of the (TTAGG)n target. Only two regions, named En-A and En-B (fig. 5), appear relatively conserved within TRAS elements, and En-B is also conserved in R1Bm. Regions conserved in TRAS elements but not in R1 may be involved in binding to their different target sites. Alternatively, they may simply reflect the close phylogenetic relationship among TRAS-like elements. To identify conserved regions involved in target binding more clearly, it will be necessary to isolate TRAS-like elements from more distantly related insects outside Lepidoptera.

Strict target specificity may also be ensured by a region other than the EN domain, which might be required to recognize longer arrays of telomeric repeats or telomeres. In searching for a TTAGG recognition domain in TRAS, it is notable that the human telomeric repeat binding factor hTRF1 binds (TTAGGG)2 through its Myb domain (König, Fairall, and Rhodes 1998). In addition, a recent report from Eickbush's group suggested that another site-specific retroelement, R2, also contains a Myb-like domain near the N-terminal region of its ORF (Burke et al. 1999). The Myb domain usually consists of 50–60 amino acids forming three helices and is found in many DNA-binding proteins, such as the MYB oncoprotein (Ogata et al. 1994) and Engrailed (Ades and Sauer 1994). To determine whether the TTAGG-specific retrotransposon TRAS also has a Myb-like domain, we searched for helical structures from the EN to the RT domain using a secondary-structure prediction program (DNASIS, Hitachi). We found a helix-turn-helix motif between the EN and RT domains not only in all TRAS elements but also in other non-LTR retrotransposons, R1Bm, SART1, RT1Ag, TARTDm, and L1Hs (data not shown).

In the Myb-related proteins reported so far, the amino acid residues themselves are not strictly conserved in the three helices, but the positions occupied by hydrophobic or charged amino acids are conserved (König and Rhodes 1997). To determine whether such charged and hydrophobic residues are conserved within the putative helix-turn-helix regions predicted above, we compared the amino acid sequences of the corresponding regions of TRAS and other retrotransposons (data not shown). TRAS elements share these conserved residues with known Myb domains (fig. 6), whereas the residues are altered at several sites in retrotransposons other than TRAS. We therefore speculate that the highly conserved region of TRAS between EN and RT may mediate DNA binding through a Myb-like function. It is also possible that the cysteine-histidine motifs found in ORF1 and ORF2, or this Myb-like domain, are required to recognize longer arrays of telomeric repeats or telomeres. Further studies, such as DNA-binding assays, may test this hypothesis.
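The comparison described here, asking whether aligned positions carry the same kind of residue rather than the same residue, can be sketched in Python. The three-way residue classes below are a simplification we chose for illustration; published Myb analyses use structure-based criteria. The position list passed in is likewise hypothetical, standing in for the core positions highlighted in fig. 6.

```python
# Simplified residue classes (an assumption, not the classification behind fig. 6).
HYDROPHOBIC = set("AVLIMFW")
CHARGED = set("DEKR")

def residue_class(aa):
    """'h' for hydrophobic, 'c' for charged, '.' for anything else (including gaps)."""
    if aa in HYDROPHOBIC:
        return "h"
    if aa in CHARGED:
        return "c"
    return "."

def class_pattern(seq):
    """Map an aligned amino acid segment to its residue-class string."""
    return "".join(residue_class(aa) for aa in seq.upper())

def pattern_agreement(query, reference, positions):
    """Fraction of the given 0-based positions at which two aligned segments
    fall into the same residue class."""
    if not positions:
        raise ValueError("no positions to compare")
    q, r = class_pattern(query), class_pattern(reference)
    return sum(q[i] == r[i] for i in positions) / len(positions)

if __name__ == "__main__":
    # Toy segments, not sequences from fig. 6.
    print(class_pattern("WDEKLG"))
    print(pattern_agreement("WDLG", "KDLG", [0, 1]))
```

Scoring class agreement at selected core positions, rather than identity along the whole segment, mirrors the observation that Myb-related domains conserve the hydrophobic or charged character of key positions more than the residues themselves.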



Fig. 6.—Amino acid sequence alignment of the putative Myb domain. The upper section shows several known Myb DNA-binding motifs. The lower section shows the putative Myb-like domain between the EN and RT domains of TRAS elements and the R1Bm retrotransposon (see fig. 5). RAP1Sc is a telomere-binding protein of Saccharomyces cerevisiae. MYB is a transcriptional activator that binds to the specific DNA sequence AACYG (Biedenkapp et al. 1988). ENGRAILED (Drosophila) is a transcription factor that binds to TAATTA (Ades and Sauer 1994). The human telomeric repeat-binding factor hTRF1 binds to (TTAGGG)2. Important amino acid residues that form the core of the DNA-binding domain in the three helices are marked: hydrophobic residues are shaded black; charged residues that form interhelical salt bridges are shaded gray; and residues that interact specifically with DNA bases are boxed (based on König and Rhodes 1997).

 

    Supplementary Material
 
The nucleotide sequences of the TRAS-like elements identified in this study have been deposited in the DDBJ, EMBL, and GenBank nucleotide databases under accession numbers AB46668–AB46688.



Fig. 5 (Continued)

 

    Acknowledgements
 
We thank Dr. T. Shimada for providing DNA samples and Dr. Takahashi for helpful comments. This work was supported by grants from the Ministry of Education, Science, Sports and Culture of Japan.


    Footnotes
 
Thomas H. Eickbush, Reviewing Editor

3 Abbreviations: EN, endonuclease; EST, expressed sequence tag; non-LTR, non–long terminal repeat; ORF, open reading frame; PCR, polymerase chain reaction; RT, reverse transcriptase.

1 Keywords: site-specific non-LTR retrotransposon, TRAS family, telomeric repeat, Bombyx mori, evolution, Myb-like domain

2 Address for correspondence and reprints: Haruhiko Fujiwara, Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan. haruh@k.u-tokyo.ac.jp


    literature cited
 

    Ades, S. E., and R. T. Sauer. 1994. Differential DNA-binding specificity of the engrailed homeodomain: the role of residue 50. Biochemistry 33:9187–9194.

    Anzai, T., H. Takahashi, and H. Fujiwara. 2001. Sequence-specific recognition and cleavage of telomeric repeat (TTAGG)n by endonuclease of non-long terminal repeat retrotransposon TRAS1. Mol. Cell. Biol. 21:100–108.

    Besansky, N. J., S. M. Paskewitz, D. M. Hamm, and F. H. Collins. 1992. Distinct families of site-specific retrotransposons occupy identical positions in the rRNA genes of Anopheles gambiae. Mol. Cell. Biol. 12:5102–5110.

    Besansky, N. J., O. Mukabayire, J. A. Bedell, and H. Lusz. 1996. Pegasus, a small terminal inverted repeat transposable element found in the white gene of Anopheles gambiae. Genetica 98:119–129.

    Biedenkapp, H., U. Borgmeyer, A. E. Sippel, and K. H. Klempnauer. 1988. Viral myb oncogene encodes a sequence-specific DNA-binding activity. Nature 335:835–837.

    Biessmann, H., J. M. Mason, K. Ferry, M. D'hulst, K. Valgeirsdottir, K. L. Traverse, and M. L. Pardue. 1990. Addition of telomere-associated HeT DNA sequences "heals" broken chromosome ends in Drosophila. Cell 61:663–673.

    Burck, C. L., Y. O. Chernoff, R. Liu, P. J. Farabaugh, and S. W. Liebman. 1999. Transcriptional suppressors and antisuppressors alter the efficiency of the Ty1 programmed translational frameshift. RNA 5:1451–1457.

    Burke, W. D., H. S. Malik, J. P. Jones, and T. H. Eickbush. 1999. The domain structure and retrotransposition mechanism of R2 elements are conserved throughout arthropods. Mol. Biol. Evol. 16:502–511.

    Christensen, S., G. Pont-Kingdon, and D. Carroll. 2000. Target specificity of the endonuclease from the Xenopus laevis non-long terminal repeat retrotransposon, Tx1L. Mol. Cell. Biol. 20:1219–1226.

    Cohn, M., and J. E. Edström. 1991. Evolutionary relations between subtypes of telomere-associated repeats in Chironomus. J. Mol. Evol. 32:463–468.

    Cost, G. J., and J. D. Boeke. 1998. Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry 37:18081–18093.

    Feng, Q., J. V. Moran, H. H. Kazazian Jr., and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.

    Feng, Q., G. Schumann, and J. D. Boeke. 1998. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc. Natl. Acad. Sci. USA 95:2083–2088.

    Fujiwara, H., Y. Nakazato, S. Okazaki, and O. Ninaki. 2000. Stability and telomere structure of chromosomal fragments in two different mosaic strains of the silkworm, Bombyx mori. Zool. Sci. 17:743–750.

    Greider, C. W., and E. H. Blackburn. 1989. A telomeric sequence in the RNA of Tetrahymena telomerase required for telomere repeat synthesis. Nature 337:331–337.

    Ichimura, S., K. Mita, and K. Sugaya. 1997. A major non-LTR retrotransposon of Bombyx mori, L1Bm. J. Mol. Evol. 45:253–264.

    Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295–3299.

    Jakubczak, J. L., M. K. Zenni, R. C. Woodruff, and T. H. Eickbush. 1992. Turnover of R1 (type I) and R2 (type II) retrotransposable elements in the ribosomal DNA of Drosophila melanogaster. Genetics 131:129–142.

    Jurka, J. 1997. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc. Natl. Acad. Sci. USA 94:1872–1877.

    König, P., L. Fairall, and D. Rhodes. 1998. Sequence-specific DNA recognition by the Myb-like domain of the human telomere binding protein TRF1: a model for the protein-DNA complex. Nucleic Acids Res. 26:1731–1740.

    König, P., and D. Rhodes. 1997. Recognition of telomeric DNA. Trends Biochem. Sci. 22:43–47.

    Levis, R. W., R. Ganesan, K. Houtchens, L. A. Tolar, and F. M. Sheen. 1993. Transposons in place of telomeric repeats at a Drosophila telomere. Cell 75:1083–1093.

    Lingner, J., T. R. Hughes, A. Shevchenko, M. Mann, V. Lundblad, and T. R. Cech. 1997. Reverse transcriptase motifs in the catalytic subunit of telomerase. Science 276:561–567.

    Malik, H. S., and T. H. Eickbush. 1998. The RTE class of non-LTR retrotransposons is widely distributed in animals and is the origin of many SINEs. Mol. Biol. Evol. 15:1123–1134.

    McLean, C., A. Bucheton, and D. J. Finnegan. 1993. The 5' untranslated region of the I factor, a long interspersed nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that regulate expression. Mol. Cell. Biol. 13:1042–1050.

    Minchiotti, G., and P. P. Di Nocera. 1991. Convergent transcription initiates from oppositely oriented promoters within the 5' end regions of Drosophila melanogaster F elements. Mol. Cell. Biol. 11:5171–5180.

    Nakamura, T. M., G. B. Morin, K. B. Chapman, S. L. Weinrich, W. H. Andrews, J. Lingner, C. B. Harley, and T. R. Cech. 1997. Telomerase catalytic subunit homologs from fission yeast and human. Science 277:955–959.

    Ogata, K., S. Morikawa, H. Nakamura, A. Sekikawa, T. Inoue, H. Kanai, A. Sarai, S. Ishii, and Y. Nishimura. 1994. Solution structure of a specific DNA complex of the Myb DNA-binding domain with cooperative recognition helices. Cell 79:639–648.

    Ogiwara, I., M. Miya, K. Ohshima, and N. Okada. 1999. Retropositional parasitism of SINEs on LINEs: identification of SINEs and LINEs in elasmobranchs. Mol. Biol. Evol. 16:1238–1250.

    Okazaki, S., H. Ishikawa, and H. Fujiwara. 1995. Structural analysis of TRAS1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol. Cell. Biol. 15:4545–4552.

    Okazaki, S., K. Tsuchida, H. Maekawa, H. Ishikawa, and H. Fujiwara. 1993. Identification of a pentanucleotide telomeric sequence, (TTAGG)n, in the silkworm Bombyx mori and in other insects. Mol. Cell. Biol. 13:1424–1432.

    Paskewitz, S. M., and F. H. Collins. 1989. Site-specific ribosomal DNA insertion elements in Anopheles gambiae and A. arabiensis: nucleotide sequence of gene-element boundaries. Nucleic Acids Res. 17:8125–8133.

    Priimagi, A. F., L. J. Mizrokhi, and Y. V. Ilyin. 1988. The Drosophila mobile element jockey belongs to LINEs and contains coding sequences homologous to some retroviral proteins. Gene 70:253–262.

    Rose, T. M., E. R. Schultz, J. G. Henikoff, S. Pietrokovski, C. M. McCallum, and S. Henikoff. 1998. Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res. 26:1628–1635.

    Roth, C. W., F. Kobeski, M. F. Walter, and H. Biessmann. 1997. Chromosome end elongation by recombination in the mosquito Anopheles gambiae. Mol. Cell. Biol. 17:5176–5183.

    Sahara, K., F. Marec, and W. Traut. 1999. TTAGG telomeric repeats in chromosomes of some insects and other arthropods. Chromosome Res. 7:449–460.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Sasaki, T., and H. Fujiwara. 2000. Detection and distribution patterns of telomerase activity in insects. Eur. J. Biochem. 267:1–8.

    Takahashi, H., and H. Fujiwara. 1999. Transcription analysis of the telomeric repeat-specific retrotransposons TRAS1 and SART1 of the silkworm Bombyx mori. Nucleic Acids Res. 27:2015–2021.

    Takahashi, H., S. Okazaki, and H. Fujiwara. 1997. A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucleic Acids Res. 25:1578–1584.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

    Yang, J., and T. H. Eickbush. 1998. RNA-induced changes in the activity of the endonuclease encoded by the R2 retrotransposable element. Mol. Cell. Biol. 18:3455–3465.

    Yang, J., H. S. Malik, and T. H. Eickbush. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. USA 96:7847–7852.

Accepted for publication January 11, 2001.