Evolution of Target Specificity in R1 Clade Non-LTR Retrotransposons

Kenji K. Kojima and Haruhiko Fujiwara

Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan


    Abstract
Although most non-long terminal repeat (non-LTR) retrotransposons are inserted throughout the host genome, many non-LTR elements in the R1 clade are inserted into specific sites within the target sequence. Four R1 clade families have distinct target specificity: R1 and RT insert into specific sites of 28S rDNA, and TRAS and SART insert into different sites within the (TTAGG)n telomeric repeats. To study the evolutionary history of target specificity of R1 clade retrotransposons, we extensively screened for novel representatives of the clade from various insects by in silico and degenerate polymerase chain reaction (PCR) cloning. We found four novel sequence-specific elements: Waldo (WaldoAg1, WaldoAg2, and WaldoFs1) inserts into ACAY repeats, Mino (MinoAg1) into AC repeats, R6 into another specific site of the 28S rDNA, and R7 into a specific site of the 18S rDNA. In contrast, several elements (HOPE, WISHBm1, HidaAg1, NotoAg1, KagaAg1, HalFs1) have lost target sequence specificity, although some of them have preferred target sequences. Phylogenetic trees based on the RT and EN domains of each element showed that (1) three rDNA-specific elements, RT, R6, and R7, diverged from Waldo; (2) the elements having similar target sequences are phylogenetically related; and (3) the target specificity in the R1 clade was obtained once and thereafter altered and lost several times independently. These data indicate that the target specificity in R1 clade retroelements has changed during evolution and is more divergent than has been speculated so far.

Key Words: non-LTR retrotransposon • R1 clade • AP-EN domain • sequence-specific retrotransposition • evolution


    Introduction
Non-long terminal repeat (non-LTR) retrotransposons, also called long interspersed nuclear elements (LINEs), are transposable elements that encode a reverse transcriptase and insert into genomic locations via RNA intermediates. The recent progress of the human genome project has revealed that one non-LTR retrotransposon, L1, integrates essentially throughout chromosomes and occupies more than 20% of the genome (Lander et al. 2001). The integration of L1 might have a role in genetic diseases and cancers (Miki 1998), and in genome reconstruction and gene evolution (Courseaux and Nahon 2001).

Some non-LTR retrotransposons have very restricted integration targets within the genome. Such target-specific retroelements fall into two distinct groups. Phylogenetically, non-LTR retrotransposons have been classified into 12 clades (Malik, Burke, and Eickbush 1999; Malik and Eickbush 2000). One target-specific group is an ancient class of non-LTR retroelements, the CRE, NeSL, R2, and R4 clades, which encode only one open reading frame (ORF). This ORF includes a restriction enzyme–like endonuclease (RLE) in the C-terminal region. Most elements in these four RLE-encoding clades are target specific. CRE1/CRE2/SLACS/CZAR (trypanosome) and NeSL-1 (nematode) are found in the spliced leader exons (Aksoy et al. 1990; Gabriel et al. 1990; Malik and Eickbush 2000). R2 is located at specific sites in the 28S rDNA of most insects (Jakubczak, Burke, and Eickbush 1991). R4 (Ascaris) is found at another site of the 28S rDNA (Burke, Müller, and Eickbush 1995). In contrast, most of the recently branched clades, which, like human L1 elements, encode two ORFs and an apurinic/apyrimidinic endonuclease-like endonuclease (AP-EN), do not insert in a sequence-specific manner into the host genome. Among the AP-EN-encoding retrotransposons, only two groups have been known to be sequence specific: one is Tx1L/Tx2L, a member of the L1 clade in Xenopus laevis (Garrett, Knutzon, and Carroll 1989), and the other comprises several elements within the R1 clade in arthropods.

Five families of non-LTR retrotransposons, R1, RT, TRAS, SART, and Waldo, have been classified into the R1 clade. R1 exists at a specific site of 28S rDNA in most insects and arthropods (Jakubczak, Burke, and Eickbush 1991; Burke et al. 1993, 1998). RT is found 630 bp downstream of the R1 integration site of the Anopheles 28S rDNA (Paskewitz and Collins 1989; Besansky et al. 1992). TRAS and SART are accumulated in the telomeric regions of Bombyx mori, and inserted into different sites of (TTAGG)n telomeric repeats in the opposite direction (Okazaki, Ishikawa, and Fujiwara 1995; Takahashi, Okazaki, and Fujiwara 1997; Kubo et al. 2001). In the R1 clade, only Waldo in Drosophila melanogaster seems to integrate nonspecifically, although it also has a preferred target region near AC repeats (Busseau, Berezikov, and Bucheton 2001). Target-specific retrotransposition is a general and remarkable feature of the R1 clade, but we do not yet know the evolutionary origin of the target specificity of the R1 clade and how the members of this clade have diversified during evolution. Recent studies have shown that AP-EN in the N-terminal region of ORF2 determines the target recognition and cleavage of the R1 elements (Feng, Schumann, and Boeke 1998; Takahashi and Fujiwara 2002). Systematic comparisons of R1 clade elements should clarify the evolutionary history of target specificity and identify what kind of structure in the AP-EN domain is required for specificity.

In this study, we have extensively screened R1 clade elements to elucidate the evolution of target-specific retrotransposition in the R1 clade. We found four new sequence-specific retrotransposons in the R1 clade: Waldo integrates into ACAY repeats; Mino, into AC repeats; R6, into another target site in the 28S rDNA; and R7, into a target of the 18S rDNA. Phylogenetic trees among the R1 clade elements revealed that the target specificity has been independently altered and lost several times during evolution.


    Materials and Methods
Cloning Template
Genomic DNA of B. mori (silkworm), Melanotus legatus (click beetle), Acyrthosiphon pisum (aphid), Forficula scudderi (earwig), Teleogryllus taiwanemma (cricket), and Gryllus bimaculatus (cricket) had been stored in our laboratory and was used as the cloning template (Okazaki et al. 1993; Kojima, Kubo, and Fujiwara 2002). Papilio xuthus (swallowtail butterfly) was purchased from Eiko-Kagaku Co. (Osaka, Japan).

Database Analysis
To screen non-LTR retrotransposons in the R1 clade, we used different BLAST search programs (Altschul et al. 1990) in the National Center for Biotechnology Information (NCBI: www.ncbi.nlm.nih.gov) database to perform computer-based nucleotide and protein searches of the following GenBank databases: Non-redundant (NR), dbEST (expressed sequence tags database), dbSTS (sequence tag sites database), dbGSS (genome survey sequences database), HTGS (unfinished high-throughput genomic sequences-phase 0, 1 and 2), Drosophila genome, WGS anopheles (Anopheles gambiae whole genome shotgun sequences database). In addition, all eukaryotic genomic sequences in NCBI were used for screening.

We also searched SilkBase (http://samia.ab.a.u-tokyo.ac.jp/silkbase/) and FlyBase (http://shigen.lab.nig.ac.jp:7081/) with several BLAST programs. We used ORF2 protein sequences and full-length nucleotide sequences of R1Dm (X51968), R1Dmerc (U23194, AF015277), R1Bm (M19755), R1Sc (L00945), R1tarantula (AF015489), R1Scolopendra (AF015820, AF015821), TRAS1 (D38414), TRAS3 (AB046668), SART1 (D85594), RT1 (M93690), RT2 (M93691), Waldo-A (AC007575), Waldo-B (AJ278684), and all newly identified elements in this study as queries for database searches. We also used ORF1 protein sequences of SART1, RT1, RT2, Waldo-A, Waldo-B, and all newly identified elements in this study as queries. Sequence information was analyzed by DNASIS-Mac version 3.7 (Hitachi) and Vector NTI Suite version 7.1 (InforMax).

Degenerate PCR Cloning and Sequencing
To amplify partial sequences of novel representatives of the R1 clade retrotransposons effectively, we designed two sets of degenerate primers based on four known sequences, SART1 (Bombyx), RT1 (Anopheles), RT2 (Anopheles), and Waldo-B (Drosophila). The primers were placed in regions that are conserved among SART, RT, and Waldo but not conserved in R1 and TRAS. The primers used for PCR were as follows:

rt745f: 5'-GAyGTnAArAAyGCmTTCAAyAC-3' (corresponding to the sequences from 4776 to 4798 in SART1; RT domain peak 3)
rt800r: 5'-CCnAGDAyCGACCCyTGnGGrAC-3' (4944 to 4966; RT domain peak 4)
rt975f: 5'-wCnGTnyTGmGnTATGCTGCGCC-3' (5427 to 5449; RT domain peak 8)
rt1125r: 5'-AAnCCrTGyCCsGAsArrACCTG-3' (5850 to 5872; CCHC zinc finger motif).
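
The lowercase letters (and the uppercase D) in the primers above are IUPAC ambiguity codes, so each primer is in fact a pool of distinct oligonucleotides. As a rough illustration only, the following Python sketch computes the size of that pool for each primer and can enumerate the individual sequences. The function names are hypothetical and are not part of the original methods; the primer strings are copied from the list above.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as listed in the text (case is ignored).
PRIMERS = {
    "rt745f": "GAyGTnAArAAyGCmTTCAAyAC",
    "rt800r": "CCnAGDAyCGACCCyTGnGGrAC",
    "rt975f": "wCnGTnyTGmGnTATGCTGCGCC",
    "rt1125r": "AAnCCrTGyCCsGAsArrACCTG",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """Yield every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ("".join(combo) for combo in product(*pools))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under these definitions each primer is 23 nt long, consistent with the SART1 coordinates given above, and the pools range from 128 to 512 sequences.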

Polymerase chain reaction (PCR) testing was performed for 35 to 40 cycles (96°C for 20 s; 50° or 55°C for 20 s; 72°C for 20 or 40 s) using the genomic DNA from various insects as template. Amplified PCR products were cloned into the pGEM-T Easy vector (Promega) and sequenced with ABI PRISM 310 Genetic analyzer (PE Applied Biosystems) using a BigDye cycle sequencing kit (PE Applied Biosystems).

Inverse PCR
Based on the sequences obtained by degenerate PCR methods, we performed inverse PCR. The genomic DNA was digested with EcoRI, BglII, HindIII, XhoI, SalI, MboI, and HhaI, respectively. After the enzyme activity was inactivated by incubation at 70°C for 15 min, the digested DNA was circularized by self-ligation using DNA Ligation kit version 2 (Takara Co.). Inverse PCR was performed using the circularized DNA as template with appropriate primer sets within the sequenced regions. The PCR products were cloned into the pGEM-T Easy vector and sequenced as described above.
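
The logic of inverse PCR can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the procedure used in the laboratory: the locus, the flanking sequences, and the function names are invented for illustration, and the cut is placed at the start of each recognition site for simplicity. The recognition sequences themselves are the standard ones for the enzymes named above.

```python
# Recognition sequences of the enzymes named in the text.
SITES = {
    "EcoRI": "GAATTC", "BglII": "AGATCT", "HindIII": "AAGCTT",
    "XhoI": "CTCGAG", "SalI": "GTCGAC", "MboI": "GATC", "HhaI": "GCGC",
}


def flanking_sites(genome, site, known_start, known_end):
    """Nearest recognition sites on either side of the known region.

    Returns (left, right) start indices, or None if one side has no site.
    """
    left = genome.rfind(site, 0, known_start)
    right = genome.find(site, known_end)
    if left == -1 or right == -1:
        return None
    return left, right


def inverse_pcr_product(fragment, fwd_start, rev_end):
    """Top-strand product amplified from a self-ligated (circular) fragment.

    fwd_start: index where the outward-facing forward primer begins.
    rev_end:   index just past the footprint of the outward-facing reverse primer.
    """
    if not 0 <= rev_end <= fwd_start <= len(fragment):
        raise ValueError("primers must face outward from the known region")
    # Read from the forward primer to the fragment end, across the ligation
    # junction, and on to the reverse primer.
    return fragment[fwd_start:] + fragment[:rev_end]


# Invented toy locus: a known region flanked by unknown sequence and MboI sites.
KNOWN = "GGGTTTCCCAAA"
GENOME = "CC" + "GATC" + "ACGTAC" + KNOWN + "TGCATG" + "GATC" + "GG"


def demo():
    start = GENOME.index(KNOWN)
    end = start + len(KNOWN)
    left, right = flanking_sites(GENOME, SITES["MboI"], start, end)
    fragment = GENOME[left:right]  # simplification: cut at the start of each site
    offset = start - left
    # Forward primer covers the last 6 nt of KNOWN, reverse primer the first 6 nt.
    return inverse_pcr_product(fragment, offset + 6, offset + 6)


if __name__ == "__main__":
    print(demo())
```

In the toy product, the previously unknown right and left flanks appear joined at the ligation junction, which is why sequencing the product reveals the sequences on both sides of the known region.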

The PCR conditions and primer sets were adjusted for each element. To obtain long WaldoFs1 clones, PCR was performed for 35 cycles (96°C for 20 s, 65°C for 20 s, 72°C for 4 min) with primers FsAF1 (CAGTAGCCAGCACATAGCGG) and FsAR1 (CGCGGATGGGGGGAATCTTC). To clone long HalFs1 sequences, PCR was performed for 35 cycles (96°C for 20 s, 54°C for 20 s, 72°C for 4 min) with FsBF1 (AGGAAGCAATATGCAAGGCA) and FsBR1 (TTTATAGAGGGTTGGGGAAG). To obtain a full-length sequence of SARTPx1, we cloned two different PCR products and assembled the overlapping sequences. For this purpose, we used two primer sets: TTAGG6 (TTAGGTTAGGTTAGGTTAGGTTAGGTTAGG)/ageha6 (TTCACAGCAGGGGAAGCGTC) and CCTTA6 (CCTAACCTAACCTAACCTAACCTAACCTAA)/ageha7 (GGCAAAGACTGCCTTACGAC).
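
The two telomeric-repeat primers are reverse complements of each other, so together they prime from (TTAGG)n repeats on either strand. A minimal Python check, using only the primer sequences given above (the helper name is ours):

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer sequences copied from the text.
TTAGG6 = "TTAGGTTAGGTTAGGTTAGGTTAGGTTAGG"
CCTTA6 = "CCTAACCTAACCTAACCTAACCTAACCTAA"

if __name__ == "__main__":
    print(TTAGG6 == "TTAGG" * 6)                 # six copies of the repeat unit
    print(reverse_complement(TTAGG6) == CCTTA6)  # the primers are reverse complements
```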

We also used inverse PCR to identify the target sequences of four elements in the 3' junction regions. The MboI- and HhaI-digested DNA was circularized and amplified in two steps by PCR of 30 cycles each (96°C for 20 s, annealing temperature Ta for 20 s, 72°C for 1 min). To obtain the 3' junction of HOPEBm1, the first PCR (Ta: 58°C) was performed with hopeS778F (CCATGCAGATGCGTTGAGTC) and hopeS740R (AGCCCACCCACTAAACGACT), and the second PCR (Ta: 60°C) with hopeS829F (GGACCTCATAGGAGGTTCGG) and hopeS717R (CTGCACTCTTACACCGACGTC). To clone the 3' junction of SARTPx1, the first PCR (Ta: 60°C) was performed with agehaF3 (GTGACAGGACTCAGTGGTGG) and agehaR4 (GGGCAATACGCCCGTCTTCT), and the second PCR (Ta: 60°C) with agehaF5 (CGCTCCCGGTGGAAAGTGAA) and agehaR5 (CTCTCCGCTTCTTCCTTGGT). To obtain the 3' junction of WaldoFs1, the first PCR (Ta: 60°C) was performed with haaF4 (ATGCTTAAGTCCCAAGACAACTGGG) and haaR9 (TGTTTCCGGCGTCGGAGGTT), and the second PCR (Ta: 60°C) with haaF8 (GCTGGTACAGCTAAGGCCTC) and haaR10 (TCCTCCCCACCCATGTCTCA). To clone the 3' junction of HalFs1, the first PCR (Ta: 56°C) was performed with FsBF3 (TGGAACAGCACCAGTGAGTT) and FsBR3 (AGATGCACGTTGGCCTATCC), and the second PCR (Ta: 60°C) with FsBF2 (ACGCGGTTCCTGGTCGTCTA) and FsBR2 (CCGAAACATCCGTGCCCTGA).
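
For readers who wish to relate primer composition to the annealing temperatures quoted above, the following Python sketch computes GC content and a crude melting-temperature estimate. It uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is our choice for illustration and only a rough guide for oligonucleotides of this length; the text does not state how the annealing temperatures were chosen. The two primer sequences are those of the first HOPEBm1 PCR.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# First-round primers for the HOPEBm1 3' junction, copied from the text.
PRIMERS = {
    "hopeS778F": "CCATGCAGATGCGTTGAGTC",
    "hopeS740R": "AGCCCACCCACTAAACGACT",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)} C")
```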

Sequence Alignment and Phylogenetic Analysis
Amino acid sequences of the EN and RT domains of each element were aligned using CLUSTAL_X (Thompson et al. 1997), followed by manual gap adjustments. Phylogenetic trees were constructed by the Neighbor-Joining and Maximum-Parsimony methods, with the MEGA2 program (Kumar et al. 2001). The significance of the various phylogenetic lineages was assessed by bootstrap analysis. All parameters used in both programs were default.
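
The two ingredients of a distance-based bootstrap analysis, a pairwise distance and the resampling of alignment columns, can be sketched as follows. This Python fragment is an illustration only: it uses the simple proportion of differing sites (p-distance), which is not necessarily the distance that MEGA2 applies by default, and it stops short of building a tree. All names are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping element name to aligned sequence."""
    names = list(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping rows in register."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


if __name__ == "__main__":
    # Invented toy alignment; the names and residues are placeholders.
    toy = {"elemA": "MKV-LT", "elemB": "MKVALS", "elemC": "MRIALS"}
    rng = random.Random(0)
    for _ in range(3):
        rep = bootstrap_replicate(toy, rng)
        print(distance_matrix(rep)[("elemA", "elemB")])
```

A tree would be built from each replicate's distance matrix, and the percentage of replicates that recover a given grouping is the bootstrap value reported next to the corresponding node.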


    Results and Discussion
Identification of Novel Retrotransposon Families in the R1 Clade
Although several site-specific non-LTR retrotransposons in the R1 clade have been reported in insects, there is limited information as to how many elements are included in the clade. To study the divergence of R1 clade retrotransposons and their evolutionary history, we screened for novel retrotransposon families in the R1 clade, by in silico cloning and degenerate PCR methods. The elements newly identified in this study are summarized in figure 1.



FIG. 1. Non-LTR retrotransposons in the R1 clade identified in this study. A. Elements identified by in silico cloning. B. Elements identified by degenerate and inverse PCR. The organism and accession number for each element are shown. Elements, except short sequences, were reconstructed from several copies in order to code longer ORFs similar to those of known elements. The structural features for the longest sequence, size, and domain organization of each retrotransposon are schematized on the right. Some have fragmented information because of incomplete sequencing. An asterisk on the accession number indicates the clone name as it appeared in SilkBase. The elements identified here were categorized into three classes (see tables 1, 2, and 3 and the text): Class I: sequence-specific elements, whose target sequence was already characterized; Class II: novel sequence-specific elements; Class III: non-sequence-specific elements. We have very little sequence information on WRSR (Waldo/RT/SART-like retrotransposon) elements; thus, they could not be classified into the above groups. We do not know the phylogenetic relationship among the WRSR elements at present

 
Identification of uncharacterized sequences in the DNA database, also known as in silico cloning, is a powerful tool for extensive screening of R1 clade elements. A BLAST search was performed of all known genomic and expressed sequences (see Materials and Methods). Our goal in this study was to create, from all available sequence information, a complete list of the elements belonging to the R1 clade. This search identified dozens of R1 clade elements from six insect species, D. melanogaster, A. gambiae, Aedes aegypti, B. mori, Manduca sexta, and Hyphantria cunea (fig. 1A). We also searched the remaining eukaryotic organisms on the available databases but could not find R1 clade elements. The search results included the already reported and characterized elements R1Dm, RT1, RT2, Waldo-A, and Waldo-B (Busseau, Berezikov, and Bucheton 2001; Besansky et al. 1992; Jakubczak et al. 1992). We also found two reported elements, HOPE (Kravariti, Lecanidou, and Rodakis 1995) and Guildenstern (Hill et al. 2001) that have not been fully characterized.

Using a degenerate PCR method, we also screened for novel R1 clade elements in the genomes of a wide variety of arthropods and obtained partial sequences within the RT domain of several different elements, including SART, Waldo, and Hal, from six different insects in five orders (fig. 1B). By inverse PCR, we cloned a full-length SARTPx1 and the ORF2-3' untranslated regions (UTRs) of WaldoFs1 and HalFs1.

On the basis of phylogenetic position and target specificity (see below), we determined whether each element represents a novel class. On this basis, we named two novel R1 clade elements, R6 and R7, which integrate into the 28S and 18S rDNA of Anopheles, respectively. Hal and WISH were also newly named (fig. 1). Because only fragmentary sequence information was obtained for the WRSR (Waldo/RT/SART-like retrotransposon) elements (fig. 1), we do not know at present the phylogenetic relationships among them or their target sequences.

Analyses of Target Site Duplication and Target Specificity
To determine whether each retrotransposon integrates into a specific target site, we investigated the boundary sequences between the host genome and the retroelement. We could determine both the 5' and 3' ends of copies identified by in silico cloning because non-LTR retrotransposons usually end in a poly(A) tail (except R1, which has no poly(A) tail). The 5' end could be determined by comparing the sequences of multiple copies in the database. Furthermore, target site duplication (TSD), which is usually observed at both ends of non-LTR retrotransposons, revealed distinct boundary regions. Table 1 lists each TSD sequence detected in each retroelement copy in the database, which gives clear information on whether the element is target specific. For some copies, only the 3' junction site was detectable by in silico cloning (table 2) or by inverse PCR (table 3). In these cases, the target specificity could still be determined by comparing the 3' junction sequences of multiple copies. Indeed, the sequence-specific elements (classes I and II, see below) have highly conserved 3' junction sequences in multiple copies (tables 2 and 3). In the non-sequence-specific elements (class III), it is possible that the divergence in the 3' junction sequence reflects mutations accumulated after insertion into the target. In most cases, however, we observed that the mutation frequency within the retrotransposon units does not differ greatly among the three classes, I, II, and III (data not shown). This indicates that the 3' junction sequence closely reflects the original target sequence of retrotransposition.
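
The boundary analysis described above can be illustrated with a short Python sketch. It assumes that the host sequence immediately upstream of the element's 5' end and immediately downstream of its 3' end have already been extracted; the TSD is then the longest sequence that ends the first flank and begins the second. The function names and the flanking bases in the example are invented; only the 14-bp R1 duplication (5'-TGTCCCTATCTACT-3') is taken from this article.

```python
from collections import Counter


def find_tsd(upstream_flank, downstream_flank, min_len=2):
    """Longest suffix of the upstream flank that is also a prefix of the downstream flank."""
    max_k = min(len(upstream_flank), len(downstream_flank))
    for k in range(max_k, max(min_len, 1) - 1, -1):
        if upstream_flank[-k:] == downstream_flank[:k]:
            return upstream_flank[-k:]
    return ""


def junction_conservation(junctions):
    """Mean, over aligned positions, of the frequency of the most common base.

    Sequences are compared position by position from the junction outward;
    1.0 means all copies share the same junction sequence.
    """
    columns = list(zip(*junctions))
    if not columns:
        raise ValueError("need at least one non-empty junction sequence")
    per_column = [Counter(col).most_common(1)[0][1] / len(col) for col in columns]
    return sum(per_column) / len(per_column)


if __name__ == "__main__":
    r1_tsd = "TGTCCCTATCTACT"            # R1 target site duplication from the text
    upstream = "GGCA" + r1_tsd            # invented flank ending at the element's 5' end
    downstream = r1_tsd + "TTAG"          # invented flank starting after the 3' end
    print(find_tsd(upstream, downstream))
```

A conservation score near 1.0 across many copies would be expected for a sequence-specific element, whereas divergent 3' junctions give a markedly lower score.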


Table 1 Summary of 3' Ends of Elements and Target Site Duplications Identified In Silico.

 

Table 2 3' Junction Sequences Determined In Silico.

 

Table 3 3' Junction Sequences Determined by Inverse PCR.

 
Based on the target sequences shown in tables 1, 2, and 3, we categorized the elements into three classes (I, II, and III in fig. 1 and tables 1–3). The first group (I) comprises sequence-specific elements whose target sequences are the same as those of already characterized elements (RTAg3, RTAg4, and SARTPx1), or already reported sequence-specific elements (R1Dm, RT1, and RT2). The second group (II) comprises novel sequence-specific elements so far uncharacterized: Waldo (WaldoAg1, WaldoAg2, and WaldoFs1) inserts specifically into ACAY repeats; MinoAg1, into AC repeats; R6 (R6Ag1, R6Ag2, R6Ag3), into another specific site of the 28S rDNA; and R7 (R7Ag1 and R7Ag2), into a specific site of the 18S rDNA. The third group (III) comprises elements that show no obvious sequence specificity and includes HOPE (HOPEBm1, HOPEBm2, HOPEHc1), HidaAg1, NotoAg1, KagaAg1, WISHBm1, and HalFs1. These elements seem to integrate throughout the genome.

Class I: Additional Information on Formerly Reported Sequence-Specific Elements (RT, SART)
In this study, we obtained additional RT-class elements, designated RTAg3 and RTAg4 (fig. 1), which use the same target sequence as the previously reported RT1 and RT2 in A. gambiae. In these elements, the 5'-ends of TSD are highly conserved, but the 3'-ends of TSD are occasionally variable. It is evident that the 5'-end of TSD is consistent with the cleavage site on the bottom strand by the endonuclease domain of retrotransposons (Feng, Schumann, and Boeke 1998; Christensen, Pont-Kingdon, and Carroll 2000; Anzai, Takahashi, and Fujiwara 2001). The 3' ends of TSD in these clones seem to correspond to the cleavage site on the top strand by the EN domain, although there is a possibility that several nucleotides are deleted from the 3' ends just after the cleavage by the EN domain. However, various 3'-ends of TSD are also shown between CT and A in R1Dm (data not shown). In addition, the polymorphism of the length of TSD was often observed in sequence-specific elements described below. These observations support the possibility that various 3'-ends of TSD in RT elements are mainly produced by weak specificity for top strand cleavage.

From the swallowtail butterfly (P. xuthus), we cloned a telomeric-repeat-specific element, SARTPx1, and found that all copies were inserted into the same target sequence and in the same direction as SART1 of B. mori (Takahashi, Okazaki, and Fujiwara 1997) (table 3, I).

Class II: Novel Sequence-Specific Elements (Waldo, Mino, R6, and R7)
Two Waldo elements (A and B) in D. melanogaster were reported to be nonspecific elements that prefer to integrate near AC repeats (Busseau, Berezikov, and Bucheton 2001). The Waldo elements identified here (fig. 1A, II, and B, II), which integrate specifically into ACAY repeats, are closely related phylogenetically to Waldo-A and Waldo-B (fig. 2), and thus they are categorized in the same group. We identified two new Waldo elements in A. gambiae (Diptera) and one in F. scudderi (Dermaptera), indicating a widespread distribution of Waldo among different orders of insects. All cloned copies of Waldo elements in A. gambiae and F. scudderi are inserted into ACAY repeats, which consist primarily of AC repeats (table 1, II, table 3, II, and data not shown). MinoAg1 inserts into AC repeats, the same target sequence as that of Waldo, although the phylogenetic analysis indicates that this element obtained sequence specificity for AC repeats independently of Waldo (see fig. 2).



FIG. 2. Phylogeny of retrotransposons in the R1 clade. A. Neighbor-Joining (NJ) tree of EN domain. B. NJ tree of RT domain. The phylogeny is a 50% consensus tree and is rooted on IDm (I factor of D. melanogaster). The number next to each node indicates the bootstrap value as a percentage of 1000 replicates. Elements newly identified in this study are shown with circles. The non-sequence-specific elements are shown with asterisks (*). Included in this analysis are (accession number in parentheses): SART1 (D85594), R1Bm (M19755), TRAS1 (D38414), and TRAS3 (AB046668) from Bombyx mori; RT1 (M93690) and RT2 (M93691) from A. gambiae; Waldo-A (AC007575), Waldo-B (AJ278684), R1Dm (X51968), and IDm (G157749) from D. melanogaster; R1Sc (L00945) from Sciara coprophila; R1tarantula (AF015489) from Dugesiella sp.; and LOA (X60177) from D. silvestris. Bootstrap values less than 50% are not shown.

 
R6 inserts into a site identical to that of the R2 element in 28S rDNA, which is 60 bp upstream of the R1 insertion site (fig. 3). R6 generates a 15 bp TSD, which includes a 2 bp target site deletion sequence of R2. All R6Ag1 copies in the Anopheles genome insert into the specific 28S rDNA sequence, according to data of the 3' junction sequences (table 2, II), although we could find only one TSD (table 1, II). In contrast, at least ten R6Ag2 copies retain TSD, although eight of ten copies insert into a region other than 28S rDNA (table 1, II). These insertion sites resemble the 28S rDNA target sequence, but the sequence specificity of R6Ag2 retrotransposition seems less strict. Among the four copies of R6Ag3, we identified two TSD-retaining copies. One copy inserts into the 28S rDNA target sequence (TSD: 5'-AAGGTAGCCAATGC-3'), and the others insert into similar sequences in non-rDNA locations (TSD: 5'-AAGGTAACCTGCTGT-3'), indicating that R6Ag3 is sequence specific.



FIG. 3. Locations of insertion sites of rDNA-specific retroelements in an rDNA unit. Insertion sites of the novel sequence-specific R1 clade retrotransposons, R6 and R7, are shown as vertical lines on the rDNA unit, with other reported elements, R1, R2, and RT. Arrows indicate the 5' to 3' orientation of retroelement insertion. Only RT was inserted in the opposite (antisense) direction. The sequences show the precise integration site on which both bottom and top strand cleavage sites generated by endonuclease, as indicated by vertical lines. The sequence between the bottom (R6 and R7, lower strand; RT, upper strand) and top strand cleavage site should be duplicated in retrotransposition of each element (TSD). Insertion of R1 generates a 14-bp target site duplication (5'-TGTCCCTATCTACT-3'). R2 insertion, however, generates a 2-bp target site deletion (5'-GG-3'), due to inverted cleavage of the bottom and top strands, shown as thin lines. The insertion site of R2 overlaps the R6 site. The R1 insertion site is about 60 bp downstream of the R6 insertion site

 
R7 inserts into a specific sequence in the 18S rDNA (fig. 3; table 2, II). R7 generates a 15–17 bp TSD (table 1, II). R7 is the first sequence-specific element found to insert into the 18S rDNA. It is noteworthy that the R7 target sequence resembles the RT target in the 28S rDNA (fig. 3). The 6 bp sequence encompassing the bottom strand cleavage site, 5'-CACAAG-3', is completely conserved in both elements. The 6 bp sequence 5'-TGYGGY-3' (5'-TGCGGC-3' in R7, 5'-TGTGGT-3' in RT) is also conserved near the top strand cleavage site. In addition, it is remarkable that a 5'-AAG-3' near the bottom strand cleavage site and a 5'-TGC-3' near the top strand cleavage site are conserved in the target sites of both R6 and R7. The structural similarity of target sequences among RT, R6, and R7 indicates that they must have originated from a common ancestral element.
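
The degenerate consensus quoted above can be derived mechanically from the two target sequences. The Python sketch below, an illustration with hypothetical function names, builds an IUPAC consensus from equal-length sequences and tests whether a sequence matches a degenerate pattern; the same test shows that an AC repeat unit (ACAC) is a special case of the ACAY repeat.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single ambiguity code.
SET_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus(seqs):
    """IUPAC consensus of equal-length, unambiguous DNA sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    columns = zip(*(s.upper() for s in seqs))
    return "".join(SET_TO_CODE[frozenset(col)] for col in columns)


def matches(pattern, seq):
    """True if seq is one of the sequences encoded by the degenerate pattern."""
    return len(pattern) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(pattern.upper(), seq.upper())
    )


if __name__ == "__main__":
    print(consensus(["TGCGGC", "TGTGGT"]))   # top strand motifs of R7 and RT
    print(matches("ACAY", "ACAC"), matches("ACAY", "ACAT"))
```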

Class III: Non-Sequence-Specific Elements (Hida, Noto, HOPE, WISH, Kaga, Hal)
Only a single TSD could be identified for each of HidaAg1, NotoAg1, and HOPEHc1 (table 1, III), and therefore it is difficult to determine from TSDs alone whether they are sequence specific. However, 3' junction analyses revealed that the copies of HidaAg1 and of NotoAg1 are integrated into different sequences (table 2, III), indicating that both can be classified as non-sequence-specific elements. We also found that a formerly reported element, Guildenstern (Hill et al. 2001), is a member of the NotoAg1 group.

HOPE was first reported as a non-LTR retrotransposon inserted into the early chorion 6F6.3 gene in B. mori (Kravariti, Lecanidou, and Rodakis 1995). The three HOPE elements characterized in this study include HOPEBm1, which is the first reported HOPE in the chorion gene, and SART2, which we previously reported as a telomere-specific element (Kubo et al. 2001). HOPEBm1 has been classified as a non-sequence-specific retroelement, because 10 of 11 copies identified by in silico cloning (table 1, III and data not shown) and 13 of 16 clones identified by inverse PCR (table 3, III) are inserted into a variety of sequences. The remaining copies, however, are inserted into telomeric repeats at the same site as the SART family, and therefore HOPEBm1 appears to have a weak specificity for telomeric repeats. The weak similarity of the various insertion sites to telomeric repeats (table 3, III) also supports this possibility. It is more evident that HOPEBm2 is a nonspecific element, as shown in table 1. Because of insufficient data, we could not judge the sequence specificity of HOPEHc1.

WISHBm1 is similar to the SART1 sequence in structure, but it inserts into random genomic sites (table 1, III). Even in the 3' UTR region, WISHBm1 and SART1 have structural similarity. However, SART1 (B. mori) and SARTPx1 (P. xuthus), which are in the same group but in different insects, have distinct 3' UTR. This observation indicates that WISHBm1 is a derivative of SART1 and has lost sequence specificity for telomeric repeats.

Six copies of KagaAg1 have different TSDs (table 1, III), and the four 3' junction sequences of HalFs1 are not similar to one another (table 3, III), indicating that both are non-sequence-specific elements.

Evolution of Sequence-Specificity in the R1 Clade
We constructed phylogenetic trees of the R1 clade elements by endonuclease (EN) domain and reverse transcriptase (RT) domain with two different methods: Neighbor-Joining (NJ) and Maximum-Parsimony (MP). Because the results of the two methods showed nearly identical topology, and because the MP trees revealed less resolution, only the NJ trees are shown (fig. 2). Several groups such as the group [R6 and WaldoAg1, WaldoAg2] or the group [Hal and other Waldo elements] were not resolved in the MP tree of the RT domain. IDm (I factor in D. melanogaster) was used as an out-group in the trees. LOA is also an out-group of R1 clade in the RT tree. However, it was reported that LOA fell within the R1 clade in the EN tree (Malik, Burke, and Eickbush 1999), and this element is located at the same position in our trees. The tree of EN has less resolution than that of RT, because the EN domain is shorter and less conserved than the RT domain. This may also explain the inconsistent position of R6Ag2 between the RT and EN trees.

The relationship between phylogenetic location and sequence specificity revealed that non-sequence-specific elements (shown with an asterisk in fig. 2), Hal, HOPE, KagaAg1, HidaAg1, and NotoAg1 are distantly positioned on the trees. This positioning indicates that the non-sequence-specific element has independently lost its sequence specificity from the closely related sequence-specific element. In other words, the sequence specificity could not be obtained independently from a closely related non-sequence-specific element. Thus the non-sequence-specific elements HOPE and WISH, which are closely related to SART, might have lost their sequence specificity independently.

Another important point is that the elements having the same sequence specificity (in groups of TRAS, R1, SART, RT, R7, and R6) are closely located on the trees. This fact suggests that the elements with the same target sequence have diverged from the same ancestral element, which obtained its specificity from a single evolutionary event.

One exception was found in ACAY repeat-specific elements. MinoAg1 is clearly distant from all Waldo elements, being more homologous to the RT elements, and is thought to have obtained AC target independently. Five Waldo elements constitute a monophyly in the EN tree, but not in the RT tree. The RT tree showed that Waldo-A, Waldo-B, and WaldoFs1 are closely related to Hal, but WaldoAg1 and WaldoAg2 constitute a monophyly with R6/R7/RT rDNA inserted elements. Because the EN domain is the primary determinant for target sequence selection (Feng, Schumann, and Boeke 1998; Christensen, Pont-Kingdon, and Carroll 2000; Anzai, Takahashi, and Fujiwara 2001; Takahashi and Fujiwara 2002), the EN domain could be more conserved than the RT domain, among elements having the same sequence specificity, such as the Waldo elements. According to this idea, the EN tree indicates that the five Waldo elements originated from a common ACAY-integrating ancestor. The phylogenetic relationship on the RT tree indicates that the R6/R7/RT have branched from the Waldo elements.

The position of HalFs1 is also different on the two trees. Accumulation of a mutation or base substitution resulting from loss of function of the element could explain the inconsistency between the two trees. In fact, all five clones of HalFs1 that we identified in this study were defective (GenBank accession numbers AB078940 to AB078944).

Based on the above results and phylogenetic trees, we created a hypothetical model for the evolution of target specificity in R1 clade elements (fig. 4). The sequence specificity among R1 clade retrotransposons might have been obtained at one event before diversification. The first segregation occurred to produce the TRAS/R1 branch and then the SART/Waldo branch. R1 has been reported to exist among a wide variety of arthropod species (Jakubczak, Burke, and Eickbush 1991; Burke et al. 1993, 1998). TRAS is also suggested to exist among many insect orders (Kubo and Fujiwara, unpublished data). In this study, we showed that Waldo is found in two distant insect orders, Diptera and Dermaptera. These observations demonstrated that R1, TRAS, and Waldo are ancient retrotransposons that branched at a very early stage, although we do not yet have the evidence for the wide distribution of the SART element. These four elements have different target sequences, and thus the sequence specificity might have changed from the common ancestor, being a TRAS or R1 prototype. Later, R6/RT/R7 might have diverged from Waldo. R7 and RT are closely related phylogenetically and have very similar target sequences. R6 and R7 also have similar target sequences, not only in the bottom strand cleavage but also in the top strand cleavage. In addition, MinoAg1, which appears to have branched from RT, has similar sequences 5'-CACACA-3'/5'-TGTGTG-3' near the bottom strand of the RT element (5'-CCACAA-3'/5'-TTGTGG-3').



 
FIG. 4. Evolution of sequence specificity of non-LTR retrotransposons in the R1 clade. The phylogeny is constructed on the basis of figure 3. To show the alteration of sequence specificity more simply, non-sequence-specific elements have been omitted. Target and flanking sequences of each element are shown on the right. The bottom to top strand cleavage in each element is represented by bent lines. Arrows indicate variation of top strand cleavage. Broken boxes indicate homologous sequences near the cleavage sites among Waldo, R6, R7, RT, and Mino. Broken lines between the top and bottom strands in TRAS, SART, Waldo, and Mino indicate that the exact cleavage sites are unidentified, because these elements target tandem repeats (TTAGG repeats and ACAY or AC repeats).

 
Here, we have identified eight different sequences as targets for R1 clade elements (R1, TRAS, SART, RT, R6, R7, Waldo, and Mino), although all targets were restricted to only three genomic locations: rDNA, telomeric repeats, and ACAY repeats (including AC repeats), all of which are repeated sequences. Why could we find no other target sequence? One possibility is that sequences present in only a few copies in the host genome are not suitable targets for sequence-specific retrotransposition, and an element inserted at such a position would be eliminated by selective pressure during evolution. In this study, we also found that many elements have lost their target specificity and have integrated into essentially random genomic locations, indicating that changes in target specificity, which might reflect structural changes in the EN domain, have occurred frequently during evolution. If the changed sequence specificity happened to hit a repeated sequence such as rDNA, the element could survive selection and thereby become a new sequence-specific retroelement. Frequent, random changes in the EN domain might thus have produced the various sequence-specific elements among R1 clade non-LTR retrotransposons.
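
The tandem-repeat targets named above can be described as simple patterns, which also shows why AC repeats fall within the ACAY class. The sketch below is a hedged illustration in Python: Y is the standard IUPAC code for a pyrimidine (C or T), while the helper names `repeat_pattern` and `tracts`, the threshold of three units, and the example sequences are assumptions made for illustration and are not data from this study.

```python
import re

# IUPAC expansion for the symbols used here (Y = pyrimidine, C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]"}


def repeat_pattern(unit, min_units=3):
    """Compile a regex matching at least `min_units` tandem copies of `unit`."""
    core = "".join(IUPAC[base] for base in unit)
    return re.compile(f"(?:{core}){{{min_units},}}")


def tracts(unit, seq, min_units=3):
    """Return (start, matched tract) for each tandem-repeat tract in `seq`."""
    pattern = repeat_pattern(unit, min_units)
    return [(m.start(), m.group()) for m in pattern.finditer(seq)]


# Hypothetical example sequences, not taken from any genome.
print(tracts("TTAGG", "GGTTAGGTTAGGTTAGGCC"))  # one telomeric-type tract
print(tracts("ACAY", "ACACACACACAC"))          # an AC repeat is also an ACAY repeat
print(tracts("AC", "ACATACATACAT"))            # an ACAT repeat is not an AC repeat
```

An AC repeat corresponds to the case Y = C, so a pure AC tract matches the ACAY pattern, whereas an ACAT tract does not match the AC pattern.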


    Acknowledgements
 
This work was supported by grants from the Ministry of Education, Science, and Culture of Japan (MESCJ).


    Footnotes
 
E-mail: haruh{at}k.u-tokyo.ac.jp.

Thomas Eickbush, Associate Editor


    Literature Cited
 

    Aksoy, S., S. Williams, S. Chang, and F. F. Richards. 1990. SLACS retrotransposon from Trypanosoma brucei gambiense is similar to mammalian LINEs. Nucleic Acids Res. 18:785-792.

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

    Anzai, T., H. Takahashi, and H. Fujiwara. 2001. Sequence-specific recognition and cleavage of telomeric repeat (TTAGG)n by endonuclease of non-long terminal repeat retrotransposon TRAS1. Mol. Cell. Biol. 21:100-108.

    Besansky, N. J., S. M. Paskewitz, D. M. Hamm, and F. H. Collins. 1992. Distinct families of site-specific retrotransposons occupy identical positions in the rRNA genes of Anopheles gambiae. Mol. Cell. Biol. 12:5102-5110.

    Browne, M. J., C. A. Read, H. Roiha, and D. M. Glover. 1984. Site specific insertion of a type I rDNA element into a unique sequence in the Drosophila melanogaster genome. Nucleic Acids Res. 12:9111-9122.

    Burke, W. D., D. G. Eickbush, Y. Xiong, J. Jakubczak, and T. H. Eickbush. 1993. Sequence relationship of retrotransposable elements R1 and R2 within and between divergent insect species. Mol. Biol. Evol. 10:163-185.

    Burke, W. D., H. S. Malik, W. C. Lathe III, and T. H. Eickbush. 1998. Are retrotransposons long-term hitchhikers? Nature 392:141-142.

    Burke, W. D., F. Müller, and T. H. Eickbush. 1995. R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes. Nucleic Acids Res. 23:4628-4634.

    Busseau, I., E. Berezikov, and A. Bucheton. 2001. Identification of Waldo-A and Waldo-B, two closely related non-LTR retrotransposons in Drosophila. Mol. Biol. Evol. 18:196-205.

    Christensen, S., G. Pont-Kingdon, and D. Carroll. 2000. Target specificity of the endonuclease from the Xenopus laevis non-long terminal repeat retrotransposon, Tx1L. Mol. Cell. Biol. 20:1219-1226.

    Courseaux, A., and J. L. Nahon. 2001. Birth of two chimeric genes in the Hominidae lineage. Science 291:1293-1297.

    Feng, Q., G. Schumann, and J. D. Boeke. 1998. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc. Natl. Acad. Sci. USA 95:2083-2088.

    Gabriel, A., T. J. Yen, D. C. Schwartz, C. L. Smith, J. D. Boeke, B. Sollner-Webb, and D. W. Cleveland. 1990. A rapidly rearranging retrotransposon within the miniexon gene locus of Crithidia fasciculata. Mol. Cell. Biol. 10:615-624.

    Garrett, J. E., D. S. Knutzon, and D. Carroll. 1989. Composite transposable elements in the Xenopus laevis genome. Mol. Cell. Biol. 9:3018-3027.

    Hill, S. R., S. S. Leung, N. L. Quercia, D. Vasiliauskas, J. Yu, I. Pasic, D. Leung, A. Tran, and P. Romans. 2001. Ikirara insertions reveal five new Anopheles gambiae transposable elements in islands of repetitious sequence. J. Mol. Evol. 52:215-231.

    Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295-3299.

    Jakubczak, J. L., M. K. Zenni, R. C. Woodruff, and T. H. Eickbush. 1992. Turnover of R1 (type I) and R2 (type II) retrotransposable elements in the ribosomal DNA of Drosophila melanogaster. Genetics 131:129-142.

    Kidd, S. J., and D. M. Glover. 1980. A DNA segment from D. melanogaster which contains five tandemly repeating units homologous to the major rDNA insertion. Cell 19:103-119.

    Kojima, K. K., Y. Kubo, and H. Fujiwara. 2002. Complex and tandem repeat structure of subtelomeric regions in the Taiwan cricket, Teleogryllus taiwanemma. J. Mol. Evol. 54:474-485.

    Kravariti, L., R. Lecanidou, and G. C. Rodakis. 1995. Sequence analysis of a small early chorion gene subfamily interspersed within the late gene locus in Bombyx mori. J. Mol. Evol. 41:24-33.

    Kubo, Y., S. Okazaki, T. Anzai, and H. Fujiwara. 2001. Structural and phylogenetic analysis of TRAS, telomeric repeat-specific non-LTR retrotransposon families in Lepidopteran insects. Mol. Biol. Evol. 18:848-857.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

    Lander, E. S., L. M. Linton, B. Birren, et al. (100 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

    Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793-805.

    Malik, H. S., and T. H. Eickbush. 2000. NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans. Genetics 154:193-203.

    Miki, Y. 1998. Retrotransposal integration of mobile genetic elements in human diseases. J. Hum. Genet. 43:77-84.

    Okazaki, S., H. Ishikawa, and H. Fujiwara. 1995. Structural analysis of TRAS1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol. Cell. Biol. 15:4545-4552.

    Okazaki, S., K. Tsuchida, H. Maekawa, H. Ishikawa, and H. Fujiwara. 1993. Identification of a pentanucleotide telomeric sequence, (TTAGG)n, in the silkworm Bombyx mori and in other insects. Mol. Cell. Biol. 13:1424-1432.

    Paskewitz, S. M., and F. H. Collins. 1989. Site-specific ribosomal DNA insertion elements in Anopheles gambiae and A. arabiensis: nucleotide sequence of gene-element boundaries. Nucleic Acids Res. 17:8125-8133.

    Takahashi, H., and H. Fujiwara. 2002. Transplantation of target site specificity by swapping the endonuclease domains of two LINEs. EMBO J. 21:408-417.

    Takahashi, H., S. Okazaki, and H. Fujiwara. 1997. A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucleic Acids Res. 25:1578-1584.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

Accepted for publication October 11, 2002.