Identification of Waldo-A and Waldo-B, Two Closely Related Non-LTR Retrotransposons in Drosophila

Isabelle Busseau, Eugène Berezikov and Alain Bucheton

*Institut de Génétique Humaine, Centre National de la Recherche Scientifique, Montpellier, France; and
†Institute of Cytology and Genetics, Novosibirsk, Russia


    Abstract
 
We have identified two novel, closely related subfamilies of non-long-terminal-repeat (non-LTR) retrotransposons in Drosophila melanogaster, the Waldo-A and Waldo-B subfamilies, that are in the same lineage as site-specific non-LTR retrotransposons of the R1 clade. Both contain potentially active copies with two large open reading frames, having coding capacities for a nucleoprotein as well as endonuclease and reverse transcriptase activities. Many copies are truncated at the 5' end, and most are surrounded by target site duplications of variable lengths. Elements of both subfamilies have a nonrandom distribution in the genome, often being inserted within or very close to (CA)n arrays. At the DNA level, the longest elements of Waldo-A and Waldo-B are 69% identical over their entire length, except for the 5' untranslated regions, which have a mosaic organization, suggesting that one arose from the other following new promoter acquisition. This event occurred before the speciation of the D. melanogaster subgroup of species, since both Waldo-A and Waldo-B coexist in other species of this subgroup.


    Introduction
 
Non-long-terminal-repeat (non-LTR) retrotransposons are a nearly ubiquitous component of eukaryotic genomes. A recent extensive phylogenetic study of the endonuclease and reverse transcriptase domains of many non-LTR retrotransposons allowed the investigators to distinguish 11 distinct clades (Malik, Burke, and Eickbush 1999). In Drosophila melanogaster, four of these clades have been identified: the Jockey, I, R1, and R2 clades, with the last three being represented by the I, R1Dm, and R2Dm elements, respectively. The Jockey clade contains several subfamilies, including Jockey, F, Doc, BS, G, and TART. Elements from the Jockey, I, and R1 clades share a common organization. They contain two large open reading frames (ORFs). The first one encodes a protein with several zinc finger motifs of the CCHC type whose function remains unknown. A possible role as a nucleocapsid was suggested based on nucleic acid–binding properties (Dawson et al. 1997). The second ORF encodes endonuclease and reverse transcriptase (Martin et al. 1995; Feng et al. 1996; Feng, Schumann, and Boeke 1998) and, in the case of the I factor, RNase H activities (Fawcett et al. 1986; Abad et al. 1989). Elements from the R2 clade contain only one ORF, encoding endonuclease and reverse transcriptase activities (Xiong and Eickbush 1988a). Most non-LTR retrotransposons are terminated at the 3' end by an A-rich sequence, usually a polyA (F, G, Doc, Jockey, BS, R2, TART), occasionally (TAA)n (I).

I elements and elements from the Jockey clade, except TART, insert at random locations, although they show a marked preference for AT-rich sites. TART elements insert preferentially at the ends of the chromosomes, where they are found associated with HeT-A elements, playing the role of telomeres (Levis et al. 1993). R1 and R2 elements are site-specific and are mostly found inserted at the same positions within the 28S rDNA genes (Xiong and Eickbush 1988b; Jakubczak, Xiong, and Eickbush 1990).

Retrotransposition of these elements is believed to start with the synthesis of a full-length transcript that may serve both as the messenger for protein translation and as the transposition intermediate (Chaboissier et al. 1990). The full-length transcript is produced from an internal promoter located within the 5' untranslated region (UTR), as was shown for Jockey (Mizrokhi, Georgieva, and Ilyin 1988), I (McLean, Bucheton, and Finnegan 1993), Doc, and F elements (Minchiotti and Di Nocera 1991; Contursi, Minchiotti, and Di Nocera 1995). In the case of R2Bm in Bombyx mori, reverse transcription initiates at the site of integration, using as a primer the 3' OH end of the target DNA that was liberated after cleavage by the endonuclease (Luan et al. 1993). This target-primed reverse transcription (TPRT) model is currently suggested for other non-LTR retrotransposons. Non-LTR retrotransposons often tend to lose their 5' ends upon transposition due to incomplete reverse transcription.

Until recently, the only means to recover new transposons in Drosophila were either to wait until a serendipitous study of a spontaneous mutation revealed a new insertion or to perform PCR experiments with degenerate oligonucleotides designed from conserved sites such as the reverse transcriptase domains of non-LTR retrotransposons. The genome of D. melanogaster is now almost entirely sequenced (Adams et al. 2000), and this, along with the availability of powerful tools allowing very rapid sequence searches and analyses, greatly facilitates the identification of new transposable elements. Here, we describe two closely related new subfamilies of non-LTR retrotransposons in D. melanogaster, the Waldo-A and Waldo-B subfamilies.


    Materials and Methods
 
Sequence Analyses
Searches for matches of amino acid sequences in the nonredundant database were done at the National Center for Biotechnology Information (NCBI) using BLASTP, version 2.0.12 (Altschul et al. 1997). Searches for matches of nucleotide sequences in the Drosophila genomic sequences were performed at the Berkeley Drosophila Genome Project (BDGP; http://www.fruitfly.org) using BLASTN, version 2.0a19 (Washington University) (Altschul et al. 1990), or at the NCBI using BLASTN, version 2.0.12. Searches for ORFs, restriction sites, and putative target site duplications were performed using the DNA Strider 1.3f11 program (Commissariat à l'Energie Atomique).

Reverse transcriptase and apurinic/apyrimidinic endonuclease domains of Waldo-A and Waldo-B elements were aligned to previously established alignments DS36752 and DS36736 (Malik, Burke, and Eickbush 1999) by the hmmalign program from the HMMER package, version 2.1.1 (http://hmmer.wustl.edu). Reconstruction of neighbor-joining phylogenetic trees and bootstrap analysis were carried out with MEGA, version 1.02 (Kumar, Tamura, and Nei 1993).

Protein sequences of the regions of ORF1 containing CCHC motifs in non-LTR retrotransposons were aligned with the Multialin program (http://www.toulouse.inra.fr) (Corpet 1998) and shaded with Boxshade, version 3.21 (http://www.ch.embnet.org).

PCR Amplifications
PCR amplifications were performed using standard conditions with Taq DNA polymerase (Promega). The Waldo ORF PCR fragment was amplified from clone J1 DNA using primers ww1 (5'-AGGTGGACAGAAACCACTCGACGGG-3') and ww2 (5'-CCTCCTTAGCTTTTTGGTAACAAGC-3'). The probes used in Southern blot hybridizations were amplified from genomic DNA from the Cha strain using primers Ber1up (5'-CGAGAGACAAAGGGCATAGCTTCC-3') and Ber1do (5'-GTTGCTGATCGATGCCCATAGCCG-3') for PCR Waldo-A ORF1, Ber2up (5'-TGTTATAAAAGCAGTGGCGCTGGG-3') and Ber2do (5'-CGCCACTCCGCATTAGGCTGAGAG-3') for PCR Waldo-A ORF2, 341up (5'-AGCGAGGATAGGGGCGTGCTAGTG-3') and 341do (5'-TAGGGAGCTCGGTGGCCGAATTCG-3') for PCR Waldo-B ORF1, and 342up (5'-GATCGTCAAAGCAGCCGCCACCGC-3') and 342do (5'-ATCCCTCACTCGCAAGGTATTTGC-3') for PCR Waldo-B ORF2.

Southern Blots
Digestion of genomic DNA, gel electrophoresis, transfer to Nytran N nylon membranes (Schleicher and Schuell), and hybridization with 32P-labeled DNA probes were performed following standard procedures (Sambrook, Fritsch, and Maniatis 1989) and suppliers' specifications. Hybridizations were carried out overnight at 42°C in 50% formamide. Washes were in 2× SSC, 0.1% SDS, followed by 0.1× SSC, 0.1% SDS at 42°C.

Inverse PCR
After digestion with SspI, which does not cut within known Waldo elements, genomic DNA from flies of the Cha strain was self-ligated and amplified (35 cycles of 94°C for 45 s, 50°C for 45 s, and 72°C for 2 min) with Taq DNA polymerase (Promega) using primers Rw172 (5'-ACCTTGACTGGCAGTCCCGGTGAGC-3') and Berdo87 (5'-GCTCTACTGTCGCAACACAACACTG-3') specific for Waldo-A elements or primers Rw172 and 34do76 (5'-TGCAGTTTACGGCTGACCGGACTCG-3') specific for Waldo-B elements. PCR fragments were cloned using the PCR-Script Amp cloning kit (Stratagene) and sequenced using standard procedures by Genome Express S.A.


    Results
 
Identification of an Endonuclease Domain Encoded by a Repeated Mobile Sequence
The starting point of the present work was the identification, within the D. melanogaster genomic clone containing the Jockey J1 element (Priimagi, Mizrokhi, and Ilyin 1988), of an ORF immediately adjacent to a large (CA)n microsatellite array (fig. 1). This ORF, called Waldo, was conceptually translated into an amino acid sequence, which was used for a BLAST search of the protein sequences present in the current databases. It was found to be related to the endonuclease domain encoded by non-LTR retrotransposons, most closely to those of the mosquito RT1 element (Besansky et al. 1992) and the insect R1 elements. A PCR fragment encompassing the Waldo ORF (fig. 1) was synthesized and used as a probe to hybridize Southern blots of genomic DNA extracted from four different D. melanogaster strains and digested with various restriction enzymes. Complex patterns of hybridization with strain-to-strain variations (not shown) suggested that the Waldo ORF belonged to a larger unit of at least 1.5 kb that appeared to be mobile and repeated several times in the genome. Taken together, these observations suggested that the Waldo ORF belonged to a yet-unidentified non-LTR retrotransposon in D. melanogaster. This motivated further studies, presented below.



Fig. 1.—Sequence organization between SalI and SmaI restriction sites of clone J1. The drawing is not to scale. A light striped box represents the Jockey element, and the arrow above indicates transcriptional orientation. A black box represents the Waldo ORF, and the arrow above indicates the coding strand. A heavy striped box represents CA repeats. White boxes represent genomic DNA flanking the Jockey element

 
Two New Subfamilies of Non-LTR Retrotransposons
The DNA sequence of the Waldo ORF was used for a BLAST search in the genome sequences released by the BDGP. Two categories of sequences homologous to the Waldo ORF and with coding capacities came out of this search. The first category of sequences, contained in AC005734, AC006563, and AC007575, were 100% identical to the Waldo ORF; they were designated Waldo-A sequences. The second category of sequences, contained in AC005847 and AC004349, showed 69% similarity to the Waldo ORF; they were designated Waldo-B sequences. Further studies, presented below, showed that these sequences specified two distinct but closely related subfamilies of non-LTR retrotransposons.

The Waldo-A Subfamily
The sequences surrounding the Waldo ORF homology present in AC005734, AC006563, and AC007575 were analyzed using DNA Strider. Their organization was typical of that of non-LTR retrotransposons, with two large ORFs of 1,497 and 2,964 bp (fig. 2 ). Conceptual translation of these ORFs indicated that the first ORF may encode a protein of 498 amino acids containing three CCHC motifs, and the second ORF may encode a protein of 987 amino acids containing endonuclease and reverse transcriptase domains analogous to those of other non-LTR retrotransposons, as well as one CCHC motif. No RNaseH domain could be identified. The two ORFs overlap by 24 bp. The Waldo-A element in AC006563 has a stop codon in frame at position 1093 of ORF1 and is presumably unable to encode a full-length product. The 3' UTR is 328 bp and terminates with a polyA stretch. Putative target site duplications (TSDs) were identified at the ends of each element. For simplicity, sequences lying between the putative 5' TSD and the first ATG of ORF1 will be referred to as the 5' UTR of the element considered.
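The correspondence between the ORF lengths and the predicted protein sizes quoted for both subfamilies can be checked with a few lines of Python. This is only an illustrative sketch, not the procedure used in DNA Strider: it assumes that each quoted ORF length includes the terminating stop codon, and the function name `protein_length` is hypothetical.

```python
def protein_length(orf_bp: int) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs.

    Assumption: the ORF length counts the stop codon, so one codon
    is subtracted from the total number of triplets.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


# ORF lengths (bp) as reported in the text for the longest elements.
reported = {
    "Waldo-A ORF1": 1497,
    "Waldo-A ORF2": 2964,
    "Waldo-B ORF1": 1467,
    "Waldo-B ORF2": 2970,
}

for name, bp in reported.items():
    print(name, bp, "bp ->", protein_length(bp), "aa")
```

Under this assumption the four lengths give 498, 987, 488, and 989 amino acids, which agrees with the product sizes stated in the text.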



Fig. 2.—Sequence organization of Waldo-A and Waldo-B elements from the BDGP. Accession numbers are indicated on the left, followed by the chromosomal locations of the clones that contain Waldo elements. Waldo elements are represented as white boxes inside which ORFs are indicated as thin arrows. At the 3' ends of the elements, the numbers of A residues are indicated. Putative target site duplications are indicated with thick black arrows. Positions of restriction sites relevant to the study are indicated, as are positions of the PCR probes that were used in Southern blot hybridization

 
The 3'-most 400 bp (excluding the polyA stretch) of the Waldo-A element was used for a BLAST search of the genomic sequences released by the BDGP. This allowed us to recover several other copies of the Waldo-A element that were variously truncated at the 5' end. They all terminated at the 3' end with a polyA stretch and were surrounded by putative TSDs. The sequence organization of those in AC007818, AC007669, AC007356, AC005430, AC007147, and AC004251 is shown in figure 2. All of these elements were present within clones that were localized at dispersed sites on chromosomal arms (Hartl et al. 1994; Hoskins et al. 2000).
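One simple way to look for a putative TSD is to compare the genomic sequence immediately upstream of an element's 5' end with the sequence immediately downstream of its polyA stretch. The Python sketch below illustrates that idea under strong assumptions: it requires a perfect, directly adjacent repeat, whereas real TSDs may contain mismatches or be slightly offset. The function name and the toy flanking sequences are hypothetical, and this is not the DNA Strider procedure used in the study.

```python
def putative_tsd(upstream: str, downstream: str, min_len: int = 4) -> str:
    """Longest sequence that ends `upstream` and also begins `downstream`.

    `upstream` is the genomic DNA just 5' of the element and `downstream`
    the genomic DNA just 3' of its polyA stretch. Returns "" when no
    perfect repeat of at least `min_len` bp is found.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(len(upstream), len(downstream))
    # Try the longest candidate first and shorten until a match is found.
    for k in range(longest, max(min_len, 1) - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return downstream[:k]
    return ""


# Toy flanks (invented for illustration): the 6-bp repeat GTACCA
# borders the element on both sides.
print(putative_tsd("GGATTCAGTACCA", "GTACCATTGC"))
```

Because the repeats around Waldo copies are of variable length, a search of this kind is best treated as a first pass whose candidates are then inspected by eye.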

Our BLAST searches for Waldo-A–related sequences also identified, in addition to the elements presented in figure 2, a number of variously degenerated short elements with 75%–93% sequence similarity, usually rearranged and devoid of coding capacities (data not shown). Some of them were located in region 41–43 of chromosome II and were associated with other defective, rearranged copies of other known retrotransposons, either LTR or non-LTR (data not shown).

The Waldo-B Subfamily
Analyses of the Waldo-B elements were conducted in the same way as analyses of the Waldo-A elements. Their sequence organizations were very similar (fig. 2). Two long Waldo-B elements were identified within AC005847 and AC004349. They contained two ORFs of 1,467 and 2,970 bp that overlapped by 20 bp. These two ORFs may encode proteins of 488 and 989 amino acids with the same domain organization as in Waldo-A elements. BLAST searches of BDGP sequences with the 3'-most 400 bp of the Waldo-B element in AC005847 allowed us to recover one 5'-truncated copy of Waldo-B, present in AC007851. Other copies of Waldo-B were identified in this search but were not included in this study because their complete sequences were not available, and we did not know whether they were complete or truncated at the 5' end. The Waldo-B element present in AC004349 has a deletion of the 3' end starting 6 bp before the polyA sequence. It is not flanked by TSDs, suggesting that the 3' end of the element was deleted after insertion. The 3' UTRs of the Waldo-B elements in AC005847 and AC007851 are 302 bp long and terminate with a polyA stretch, and putative short TSDs could be identified. All of these elements were present within clones that were localized at dispersed sites on chromosomal arms (Hartl et al. 1994; Hoskins et al. 2000). As in the case of Waldo-A, we also identified variously degenerated Waldo-B elements associated with other defective, rearranged copies of other known retrotransposons (data not shown).

The Waldo elements described in this work were all retrieved from the sequences released by the BDGP before the publication of the complete sequence of the Drosophila genome (Release 1) by Adams et al. (2000). We also searched for Waldo elements in the sequences from Release 1. As expected, we found in Release 1 all of the Waldo copies that we had identified in the sequences released by the BDGP, along with a few more copies of Waldo-A and Waldo-B elements. However, none of the Waldo-A or Waldo-B elements found in the sequences of Release 1 are capable of encoding complete products of ORF1 or ORF2. In fact, all of the sequences that could be found in both the BDGP and the Release 1 databases contain several differences in the bodies of the Waldo elements, whereas the surrounding sequences are identical. This is assumed to be due to the high level of errors within repetitive sequences in Release 1 (Myers et al. 2000; see also http://www.celera.com/genomeanalysis/ and http://www.fruitfly.org/sequence/faq.html). Therefore, Waldo-A and Waldo-B elements that were found in sequences from Release 1 were not included in the present study.

The 5' UTRs of Waldo Elements
The sequence organization of the 5' UTRs of the three longer Waldo-A elements (in AC006563, AC005734, and AC007575) and of the Waldo-B element in AC005847 is shown in figure 3a. The 5' UTR of the longest Waldo-A element (in AC006563) is 615 bp long. The 5' UTRs of the other two Waldo-A elements are truncated at the 5' ends, and the Waldo-A element in AC007575 also has an internal deletion between nucleotides -222 and -440. The putative 5' UTR of the Waldo-B element in AC005847 is 434 bp long and contains, between nucleotides -129 and -246, a short region of similarity (71%) with the sequences lying between nucleotides -287 and -404 in the Waldo-A element in AC006563. However, outside this short region there are no similarities between the putative 5' UTRs of Waldo-A and Waldo-B elements. Minchiotti, Contursi, and Di Nocera (1997) have identified an 18-bp-long consensus sequence that is located around 20 nt from the transcription start in the 5' UTR of several non-LTR retrotransposons and that is required for proper transcription initiation. We searched within the 5' UTRs of Waldo-A and Waldo-B for the presence of this consensus sequence. The longest Waldo-A element (AC006563) contained, starting 20 nt from its 5' end (position -595), sequences that matched this consensus well (fig. 3b). It is therefore likely that this Waldo-A element is full length. No such sequences were identified in Waldo-B.
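A search for a degenerate consensus of this kind can be sketched in Python by translating each degenerate symbol into a character class. The symbol meanings follow the legend of figure 3 (R, purine; Y, pyrimidine; W, A or T; M, A or C; X, any nucleotide). The 18-bp consensus itself is not reproduced in the text, so the short pattern used below is a made-up placeholder, and the function names are hypothetical. The sketch also demands an exact match at every position, which is stricter than the "matched this consensus well" criterion applied to Waldo-A.

```python
import re

# Degenerate symbols as defined in the legend of figure 3.
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",
    "M": "[AC]",
    "X": "[ACGT]",  # any nucleotide
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a degenerate consensus into a regular expression."""
    return re.compile("".join(SYMBOLS[s] for s in consensus.upper()))


def find_consensus(consensus: str, sequence: str) -> list:
    """0-based start positions of all (possibly overlapping) exact matches."""
    pattern = consensus_to_regex(consensus)
    seq = sequence.upper()
    last_start = len(seq) - len(consensus)
    return [i for i in range(last_start + 1) if pattern.match(seq, i)]


# Placeholder consensus and sequence, invented for illustration only.
print(find_consensus("RYWMX", "TTGCAAGGG"))
```

To tolerate a few mismatches, as a comparison against a published consensus usually requires, the exact match could be replaced by a per-position score with a threshold.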



Fig. 3.—The 5' UTRs of Waldo elements. a, Sequence organization of the 5' UTRs of the longest Waldo elements. Numbers indicate positions in the elements. Position +1 is the first nucleotide of the ATG start codon of ORF1. White boxes represent 5' UTRs of Waldo-A elements, stippled boxes represent 5' UTRs of Waldo-B elements, and striped boxes represent the regions of sequence similarity between the two. Unboxed hatchings indicate a deletion. b, Alignment of the 18-bp consensus sequence identified in the 5' UTR of the longest Waldo-A element and of other non-LTR retrotransposons (Minchiotti, Contursi, and Di Nocera 1997). Numbers indicate the position of each sequence relative to the 5' end of the element. In the consensus sequence at the bottom, R stands for purine, Y for pyrimidine, W for A or T, and M for A or C, and X denotes any nucleotide.

 
The putative 5' UTRs of Waldo-A in AC006563 and of Waldo-B in AC005847 were used in BLAST searches of the expressed sequence tags (ESTs) of the BDGP. No EST similar to the Waldo-A 5' UTR was found. Three ESTs similar to the Waldo-B 5' UTR were found, corresponding to two cDNAs obtained with RNAs extracted from larvae and early pupae (clone LP01280) and from head, brain, and sensory organs (clone HL01331). These two cDNAs contain both Waldo-B sequences and 5' adjacent sequences and therefore probably correspond to read-through transcripts.

Genomic Organization of Waldo-A and Waldo-B Elements
We designed four PCR probes using relevant oligonucleotides, each of them specific to ORF1 or ORF2 of each element (fig. 2). These probes were hybridized under high stringency to Southern blots of genomic DNA from four strains of D. melanogaster digested with either SmaI or SalI/NcoI (fig. 4). SmaI cuts both elements once, while SalI/NcoI digests should release two internal fragments containing either ORF1 or ORF2 from both elements. Under these conditions, each probe revealed a specific set of fragments, indicating that the Waldo-A and Waldo-B elements can be distinguished by hybridization. When the DNAs were digested with SmaI, many fragments hybridized with all probes, confirming that several copies of Waldo-A and Waldo-B elements are present in all four tested strains. The patterns of hybridization with each probe were largely similar, but some bands differed from strain to strain, as expected for mobile elements. When the DNAs were digested with SalI/NcoI, bands corresponding to internal fragments of the expected sizes were intensely revealed in all cases, in addition to several other bands giving a weaker signal and corresponding to higher- and lower-molecular-weight fragments. These results indicate that both the Waldo-A and the Waldo-B subfamilies comprise several potentially full-length elements containing both ORF1 and ORF2, along with a number of defective elements. This is in agreement with the findings of our BLAST searches.



Fig. 4.—Southern blot analyses of Drosophila melanogaster genomic DNA probed with various parts of Waldo elements. Strains Cha (a), w1118 (b), misy (c), and JA (d) were used in the study. The enzymes that were used to digest genomic DNA and the PCR probes that were used for hybridization are indicated at the top of the figure. The positions of the probes on Waldo sequences are indicated in figure 2

 
Nonrandom Distribution of Waldo-A and Waldo-B Elements
Strikingly, the 5' ends of many copies of Waldo-A (4/9) and Waldo-B (2/3) that we found in the BDGP sequences were located near (CA)n repeats or inserted within such sequences (fig. 5 ). Moreover, the original Waldo ORF was found very close to a long stretch of (CA)n. Other known non-LTR retrotransposons (I, F, Jockey, Doc, FB elements) are seldom located near such sequences. In order to verify whether this observation reflects a property of Waldo-A and Waldo-B elements, we identified by inverse PCR and sequenced the ends and adjacent DNA of some long copies of these elements present in the strain Cha of D. melanogaster. We thus recovered sequences from the ends of four Waldo-A elements and two Waldo-B elements (fig. 5 ). Some of these elements indeed appeared to be located close to (CA)n sequences, as we expected. Some others were not associated with (CA)n sequences, but with other kinds of repeats: (TTTACACA)n in the case of CBE2 and (CAACA)n in the case of C21. Finally, three elements, CBA4, CBE2, and C16–C24, did not seem to be associated with any kind of repeated sequences. Therefore, the Waldo-A and Waldo-B elements certainly show a strong tendency to insert very close to microsatellite sequences mostly of the kind (CA)n, but this is not a mandatory rule.
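Screening flanking sequences for microsatellite arrays such as (CA)n, (CAACA)n, or (TTTACACA)n can be sketched with a regular expression in Python. The threshold of five consecutive units, the function name, and the toy sequence are assumptions made for this illustration; the text does not state how many repeats were required to call an array. Only uninterrupted arrays on the strand as written are reported, so an array on the opposite strand would have to be searched for with the reverse-complement unit (for example TG for CA).

```python
import re


def find_repeat_arrays(sequence, unit="CA", min_repeats=5):
    """Return (start, end, n_repeats) for each uninterrupted array of `unit`.

    Coordinates are 0-based and `end` is exclusive.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit.upper()), min_repeats))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(sequence.upper())
    ]


# Toy flanking sequence, not taken from figure 5.
flank = "GGT" + "CA" * 6 + "TTG"
print(find_repeat_arrays(flank))             # (CA)n arrays
print(find_repeat_arrays(flank, unit="TG"))  # same motif on the other strand
```

The distance between each reported array and the 5' end of the element could then be used to decide whether a copy counts as "near" a microsatellite, a cutoff that would also have to be chosen explicitly.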



Fig. 5.—Sequences at the ends of Waldo-A and Waldo-B elements; sequences at the ends of Waldo-A elements from the BDGP (a) and from inverse-PCR analyses (b), and sequences at the ends of Waldo-B elements from the BDGP (c) and from inverse-PCR analyses (d). Waldo sequences are shown in bold, adjacent DNA sequences are shown in plain type, target site duplications are underlined, and repeats are italicized

 
Sequences adjacent to the Waldo elements recovered by inverse PCR were used for BLAST search in the Drosophila genome sequence (Adams et al. 2000). This allowed us to identify the empty sites for the copies of Waldo-A in CBA4, C8, and C21 (not shown), indicating that these are recently transposed copies.

Relationship Between Waldo-A, Waldo-B, and Other Non-LTR Retrotransposons
Waldo-A and Waldo-B are 69% similar to each other at the DNA level for their entire length except for their 5' UTRs. The putative protein products of ORF1 and ORF2 are 62.3% and 66.8% similar at the amino acid level, respectively, between Waldo-A and Waldo-B.
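A percentage of this kind is computed from a pairwise alignment. The text does not specify how gaps were treated, so the Python sketch below makes an explicit assumption: columns containing a gap in either sequence are excluded, and the remaining columns are scored as identical or not. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the ungapped alignment columns.

    Both inputs must come from the same alignment and therefore have
    the same length. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ACGTAC-GT", "ACGAACTGT"))
```

Counting gapped columns as mismatches instead would lower the figure, which is one reason why identity values from different programs are not always directly comparable.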

The reverse transcriptase and endonuclease domains of Waldo-A and Waldo-B were compared with the alignments of Malik, Burke, and Eickbush (1999). Both domains were found to be related to those of elements of the R1 clade (fig. 6). This clade comprises site-specific non-LTR retrotransposons RT1 and RT2 in Anopheles gambiae (Besansky et al. 1992), SART1 (Takahashi, Okazaki, and Fujiwara 1997) and TRAS1 (Okazaki, Ishikawa, and Fujiwara 1995) in Bombyx mori, and R1 in insects. On the reverse transcriptase tree (fig. 6a), Waldo and RT elements are grouped together, whereas SART1 is an external branch. On the endonuclease tree (fig. 6b), RT elements group with SART1 with high confidence, but the position of Waldo elements is uncertain, and the Waldo branch appears equally distant from all other site-specific elements in the clade.



Fig. 6.—Phylogeny of Waldo-A and Waldo-B elements; neighbor-joining tree of reverse transcriptase (a) and AP endonuclease (b) domains of non-LTR retrotransposons. Only the parts of the trees containing elements of the R1 and LOA clades are shown. Numbers to the left of the nodes are bootstrap values as percentages of 500 replicates. Sequence alignments were produced by HMMER by adding Waldo-relevant domains to alignments DS36752 and DS36736 (Malik, Burke, and Eickbush 1999).

 
Proteins encoded by ORF1 of non-LTR retrotransposons are usually much more divergent than those encoded by ORF2. The only domain that can be recognized within many ORF1 products is a short region containing three zinc finger–like motifs of the CCHC type. Comparison of these domains between several non-LTR retrotransposons emphasized the close relationship between Waldo and other elements from the R1 clade (fig. 7 ). The CCHC domain of the I element from D. melanogaster added to the study is more divergent.



Fig. 7.—Alignments of amino acid sequences of the CCHC region of the ORF1 from non-LTR retrotransposons. Black dots indicate amino acids that are present in all aligned sequences. Open dots indicate amino acids that are present in at least four of the R1 clade elements

 
Coexistence of the Waldo-A and Waldo-B Subfamilies in Other Species of the D. melanogaster Subgroup
Given the strong similarity of Waldo-A and Waldo-B, they appear to be closely related, and they obviously define two subfamilies deriving from the same original non-LTR retrotransposon. As a preliminary attempt to trace their history and date the divergence, we looked for their presence in other Drosophila species from the D. melanogaster subgroup: three strains of D. melanogaster, two strains of D. simulans, and one strain each of D. mauritiana, D. teissieri, and D. yakuba. One strain of D. virilis belonging to a distant group was also added to the study. Genomic DNAs were digested with NcoI, which should release an internal fragment of 2,236 bp hybridizing with probe Waldo-A ORF2 and an internal fragment of 1,930 bp hybridizing with probe Waldo-B ORF2. One of the two NcoI sites lies within a 26-bp sequence perfectly conserved between the Waldo-A and Waldo-B elements. Southern blots are shown in figure 8. With Waldo-A, a band of 2.2 kb was revealed with an intense hybridization signal in D. melanogaster, D. simulans, and D. mauritiana, in addition to many other bands with weaker signal. The overall intensities of the signal were very similar in D. melanogaster and the sibling species D. simulans and D. mauritiana. Sequences similar to the Waldo-A probe were also distinguishable in D. teissieri and D. yakuba as bands of a weaker intensity, but not in D. virilis. With Waldo-B, a band of 1.9 kb with an intense hybridization signal was revealed only in D. melanogaster. Several bands heterogeneous in size were revealed in D. simulans and D. mauritiana, but their intensity was much weaker than that of those seen in D. melanogaster. In D. teissieri and D. yakuba, the signal was even weaker. There were no detectable signals in D. virilis. The patterns of hybridization with the Waldo-A ORF2 probe and with the Waldo-B ORF2 probe never overlapped, indicating that both subfamilies coexist in all tested species of the D. melanogaster subgroup. Therefore, the divergence between the Waldo-A and the Waldo-B elements occurred before the emergence of these species.
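The size of an internal restriction fragment, such as the NcoI fragments expected here, is simply the distance between two consecutive cut positions in the element sequence. The Python sketch below computes fragment lengths for a complete digest of a linear sequence, assuming the NcoI recognition site CCATGG with cleavage after the first C on the top strand. The function name and the toy sequence are hypothetical, and the Waldo sequences themselves are not reproduced.

```python
def digest_fragments(sequence, site="CCATGG", cut_offset=1):
    """Fragment lengths from a complete digest of a linear molecule.

    `cut_offset` is the number of bases of the recognition site that
    lie 5' of the top-strand cut (1 for NcoI, C^CATGG).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragments run between consecutive boundaries (ends plus cut sites).
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy 47-bp sequence with two NcoI sites; the middle value is the
# internal fragment released by the digest.
toy = "A" * 10 + "CCATGG" + "T" * 20 + "CCATGG" + "G" * 5
print(digest_fragments(toy))
```

On a Southern blot only the fragments overlapping the probe are detected, so for a full-length element the internal fragment gives a band of constant size, while fragments extending into flanking DNA vary from copy to copy.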



Fig. 8.—Southern blot analyses of genomic DNA from various Drosophila species. Genomic DNA was digested with NcoI and hybridized with the ORF2 internal PCR probes of Waldo-A (a) and Waldo-B (b). The positions of the probes on Waldo sequences are indicated in figure 2

 

    Discussion
 
Waldo-A and Waldo-B are two previously undescribed transposable elements found in D. melanogaster. Sequence comparisons showed that they are very similar and represent two closely related subfamilies. Phylogenetic analyses of the endonuclease and reverse transcriptase domains of the product of ORF2 indicated that they belong to the R1 clade defined by Malik, Burke, and Eickbush (1999). Comparisons of the sequences around the CCHC domain of the ORF1 product also emphasized the close relationship between Waldo elements and retrotransposons of the R1 clade. This clade includes site-specific non-LTR retrotransposons like RT1 and RT2 in mosquitoes, R1 in dipterans, and SART and TRAS in silkworms. However, Waldo elements cannot be considered as bona fide site-specific retrotransposons (see below). In vitro studies of the endonuclease domain of R1 elements in B. mori have revealed a preference for the DNA sequence into which it is inserted in the genome, suggesting that the site specificity of integration of R1 elements is largely determined by the endonuclease domain (Feng, Schumann, and Boeke 1998). The Waldo-A and Waldo-B endonuclease domains, which are probably devoid of strong cleavage specificity, appear equally distant from those of other elements of the R1 clade.

Copies of Waldo-A and Waldo-B are often inserted near or within repeats such as (CA)n or related sequences. This might be the reason why they were not discovered before, since these repeats are rarely found within genes. The frequent association between Waldo-A and Waldo-B elements and microsatellite-like sequences indicates a nonrandom distribution of Waldo elements in the genome. This might reflect a preference of integration of Waldo elements, not at the sequence level but possibly by interaction with some higher-order chromatin structures determined by microsatellite regions. Alternatively, this could result from a better conservation of the elements that integrated in these types of regions than of those that integrated elsewhere. However, in this case, one would expect degenerated elements to be associated with microsatellite-like sequences as well, and this does not appear to be the case. Noticeably, other non-LTR retrotransposons of Drosophila are not found preferentially associated with repeated DNA, so the distribution pattern of Waldo elements appears specific to this family.

The available data provide very little insight into the frequency of retrotransposition of Waldo-A and Waldo-B elements. Southern blots revealed that some genomic restriction fragments containing Waldo sequences are variable from strain to strain, suggesting that Waldo elements have recently transposed. In addition, inverse PCR analyses of Waldo elements in the Cha strain identified copies inserted within genomic sequences that are found empty in the strain that was used for sequencing by Adams et al. (2000). These observations indicate that the Waldo elements are capable of transposition. However, the intensity of their transpositional activity is difficult to estimate, although it is probably not very high. Since the production of a full-length transcript is a prerequisite for mobility of non-LTR retrotransposons, it would be of interest to determine whether Waldo-A and Waldo-B are transcriptionally active. Searches for ESTs did not identify any candidate Waldo transcript, but this might be because their transcription is restricted to particular tissues. In general, non-LTR retrotransposons are transcribed from an internal promoter located within their 5' UTRs. We were able to identify within the Waldo-A 5' UTR some sequences matching a consensus found in the promoter of other non-LTR retrotransposons (Minchiotti, Contursi, and Di Nocera 1997). It therefore seems reasonable to speculate that Waldo also uses an internal promoter located within the 5' UTR. Further work, including Northern and RT-PCR analyses, will be necessary to address the question of the transcriptional activity of Waldo. However, such studies might not be very informative with regard to retrotranspositional activity. Among D. melanogaster non-LTR retrotransposons, the I factor is the only one for which a strong correlation between transcription and retrotransposition has been established (Chaboissier et al. 1990; McLean, Bucheton, and Finnegan 1993). By contrast, Jockey, F, and Doc are actively transcribed in various tissues (Mizrokhi, Georgieva, and Ilyin 1988; Minchiotti et al. 1994; Zhao and Bownes 1998) but undergo extremely low levels of retrotransposition.

Waldo-A and Waldo-B represent two closely related subfamilies that coexist within the same species. This situation is reminiscent of that of L1 elements in some mammals. In the mouse, several subfamilies of L1 elements coexist. Two of them, the A and TF subfamilies, contain retrotranspositionally active members (DeBerardinis et al. 1998; Naas et al. 1998). Full-length copies of the A and TF subfamilies are very similar, except within their 5' UTRs, which consist of several monomeric repeats retaining promoter activity (Severyinse, Hutchison, and Edgell 1992; DeBerardinis and Kazazian 1999). Monomeric sequences of the TF-type L1 5' UTR are different from those of the A type, and therefore the two subfamilies are under distinct transcriptional control. It is believed that a new subfamily of mouse L1 may be formed following occasional capture of a new 5' UTR with promoter activity (Adey et al. 1994). The same could be true for the Waldo-A and Waldo-B subfamilies, with one having derived from the other after accidental acquisition of a new promoter. This probably resulted from complex events, given the mosaic structures of the 5' UTRs, which contain a short (~100 bp) region of similarity surrounded by unrelated blocks. The region that is conserved between the two 5' UTRs might contain sequences that are important for the activity or regulation of the elements. This event can be dated to before the formation of the species of the D. melanogaster subgroup; since Waldo-A and Waldo-B coexist in all tested species from this subgroup, it is likely that they were both present within their common ancestor.

All non-LTR retrotransposon families that are currently known in D. melanogaster are old components of the genome that are also found in sibling species. Some of them, like R1 and R2, are common to all dipterans (Jakubczak, Burke, and Eickbush 1991) and are also found outside this order (Malik, Burke, and Eickbush 1999). At least one case of loss of a functional element followed by reinvasion has been documented: the I factor, which existed in the common ancestor of the D. melanogaster subgroup, was apparently lost in D. melanogaster and very efficiently reinvaded the species in the middle of the 20th century (Bucheton et al. 1986, 1992; Sezutsu, Nitasaka, and Yamazaki 1995). Studies of the I factor family and, to a lesser extent, of other families of non–site-specific, non-LTR retrotransposons have revealed that recently integrated copies (full-length or 5'-truncated) are found mostly in euchromatic sites, whereas defective, rearranged, inactive copies, corresponding to old components of the genome, have accumulated in the pericentromeric heterochromatic regions (Crozatier et al. 1988; Simonelig et al. 1988; Vaury, Bucheton, and Pélisson 1989; Pimpinelli et al. 1995). All Waldo-A and Waldo-B elements shown in figure 2 map to euchromatic sites. They are more than 99.5% similar within each subfamily, variably truncated at the 5' end, and, except for Waldo-B in AC004349, surrounded by target site duplications. Therefore, they most likely correspond to recently transposed copies. Our BLAST searches also identified more divergent elements, variously mutated and deleted, with many of them being present in clones for which no specific chromosomal location could be assigned by the BDGP. These degenerated elements could very well correspond to pericentromeric copies. It is interesting to note that these degenerated elements also fall into two subfamilies, one more related to Waldo-A and the other more related to Waldo-B. Determination of the sequences of the elements located in pericentromeric heterochromatin would allow thorough studies of these elements and bring insight into the evolutionary history of the Waldo-A and Waldo-B subfamilies.
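
The two criteria used here to recognize recently transposed copies, high sequence identity within a subfamily and the presence of flanking target site duplications, can each be expressed as a short computation. The Python sketch below is illustrative only and is not the method used in this study. It assumes that the two sequences have already been aligned (equal length, gaps written as "-") and that the upstream and downstream flanks of an element have already been extracted; the function names and the 30-bp search limit are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def target_site_duplication(upstream, downstream, max_len=30):
    """Longest sequence that both ends the upstream flank and starts the downstream flank.

    Returns an empty string when no direct repeat abuts the element.
    Only perfect duplications are detected; max_len is an assumed limit.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    limit = min(max_len, len(upstream), len(downstream))
    for k in range(limit, 0, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""
```

Very short matches of 1 or 2 bp are expected by chance, so a minimum length would have to be imposed before a hit is called a target site duplication.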


    Supplementary Material
 
The sequences reported in this paper have been deposited in the GenBank database (accession numbers AF281636–AF281649).


    Note Added in Proof
 
The pilger non-LTR retrotransposon (GenBank accession number AJ278684) corresponds to the Waldo-B element in AC005847.


    Acknowledgements
 
This paper is dedicated to the memory of Laurent "Doone" Caron, who inspired the name "Waldo." We thank Matthieu Seveau for his help in the study of Waldo elements in the Cha strain, and Christophe Terzian for critical reading of the manuscript. This work was supported by grants from the Centre National de la Recherche Scientifique (CNRS) and from the Association pour la Recherche sur le Cancer (ARC).


    Footnotes
 
Pierre Capy, Reviewing Editor

1 Abbreviations: ORF, open reading frame; TSD, target site duplication; UTR, untranslated region.

2 Keywords: Drosophila, non-LTR retrotransposon, microsatellite, phylogeny

3 Address for correspondence and reprints: Isabelle Busseau, Institut de Génétique Humaine, Centre National de la Recherche Scientifique, 141 rue de la Cardonille, 34396 Montpellier cedex 05, France. E-mail: busseau{at}igh.cnrs.fr


    literature cited
 

    Abad, P., C. Vaury, A. Pélisson, M.-C. Chaboissier, I. Busseau, and A. Bucheton. 1989. A long interspersed repetitive element—the I factor of Drosophila teissieri—is able to transpose in different Drosophila species. Proc. Natl. Acad. Sci. USA 86:8887–8891.

    Adams, M. D., S. E. Celniker, R. A. Holt et al. (195 co-authors). 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195.

    Adey, N. B., T. O. Tollefsbol, A. B. Sparks, M. H. Edgell, and C. A. Hutchison III. 1994. Molecular resurrection of an extinct ancestral promoter for mouse L1. Proc. Natl. Acad. Sci. USA 91:1569–1573.

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.

    Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Besansky, N. J., S. M. Paskewitz, D. M. Mills-Hamm, and F. H. Collins. 1992. Distinct families of site-specific retroposons occupy identical positions in the rRNA genes of Anopheles gambiae. Mol. Cell. Biol. 12:5102–5110.

    Bucheton, A., M. Simonelig, C. Vaury, and M. Crozatier. 1986. Sequences similar to the I transposable element involved in I-R hybrid dysgenesis in D. melanogaster occur in other Drosophila species. Nature 322:650–652.

    Bucheton, A., C. Vaury, M.-C. Chaboissier, P. Abad, A. Pélisson, and M. Simonelig. 1992. I elements and the Drosophila genome. Genetica 86:175–190.

    Chaboissier, M. C., I. Busseau, J. Prosser, D. J. Finnegan, and A. Bucheton. 1990. Identification of a potential RNA intermediate for transposition of the LINE-like element I factor in Drosophila melanogaster. EMBO J. 9:3557–3563.

    Contursi, C., G. Minchiotti, and P. P. Di Nocera. 1995. Identification of sequences which regulate the expression of Drosophila melanogaster Doc elements. J. Biol. Chem. 270:26570–26576.

    Corpet, F. 1988. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16:10881–10890.

    Crozatier, M., C. Vaury, I. Busseau, A. Pélisson, and A. Bucheton. 1988. Structure and genomic organization of I elements involved in I-R hybrid dysgenesis in Drosophila melanogaster. Nucleic Acids Res. 16:9199–9213.

    Dawson, A., E. Hartswood, T. Paterson, and D. J. Finnegan. 1997. A LINE-like transposable element of Drosophila, the I factor, encodes a protein with properties similar to those of retroviral nucleocapsids. EMBO J. 16:4448–4455.

    DeBerardinis, R. J., J. L. Goodier, E. M. Ostertag, and H. H. Kazazian. 1998. Rapid amplification of a retrotransposon subfamily is evolving the mouse genome. Nat. Genet. 20:288–290.

    DeBerardinis, R. J., and H. H. Kazazian. 1999. Analysis of the promoter from an expanding mouse retrotransposon subfamily. Genomics 56:317–323.

    Fawcett, D. H., C. K. Lister, E. Kellett, and D. J. Finnegan. 1986. Transposable elements controlling I-R hybrid dysgenesis in D. melanogaster are similar to mammalian LINEs. Cell 47:1007–1015.

    Feng, Q., J. V. Moran, H. H. Kazazian, and J. D. Boeke. 1996. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87:905–916.

    Feng, Q., G. Schumann, and J. D. Boeke. 1998. Retrotransposon R1Bm endonuclease cleaves the target sequence. Proc. Natl. Acad. Sci. USA 95:2083–2088.

    Hartl, D. L., D. I. Nurminsky, R. W. Jones, and E. R. Lozovskaya. 1994. Genome structure and evolution in Drosophila: applications of the framework P1 map. Proc. Natl. Acad. Sci. USA 91:6824–6829.

    Hoskins, R. A., C. R. Nelson, B. P. Berman et al. (21 co-authors). 2000. A BAC-based physical map of the major autosomes of Drosophila melanogaster. Science 287:2271–2274.

    Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991. Retrotransposable elements R1 and R2 interrupt the rRNA genes of most insects. Proc. Natl. Acad. Sci. USA 88:3295–3299.

    Jakubczak, J. L., Y. Xiong, and T. H. Eickbush. 1990. Type I (R1) and type II (R2) ribosomal DNA insertions of Drosophila melanogaster are retrotransposable elements closely related to those of Bombyx mori. J. Mol. Biol. 212:37–52.

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.02. Pennsylvania State University, University Park.

    Levis, R. W., R. Ganesan, K. Houtchens, L. A. Tolar, and F. M. Sheen. 1993. Transposons in place of telomeric repeats at a Drosophila telomere. Cell 75:1083–1093.

    Luan, D. D., M. H. Korman, J. L. Jakubczak, and T. H. Eickbush. 1993. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72:595–605.

    McLean, C., A. Bucheton, and D. J. Finnegan. 1993. The 5' untranslated region of the I factor, a long interspersed nuclear element-like retrotransposon of Drosophila melanogaster, contains an internal promoter and sequences that regulate expression. Mol. Cell. Biol. 13:1042–1050.

    Malik, H. S., W. D. Burke, and T. H. Eickbush. 1999. The age and evolution of non-LTR retrotransposable elements. Mol. Biol. Evol. 16:793–805.

    Martin, F., C. Maranon, M. Olivares, C. Alonso, and M. C. Lopez. 1995. Characterization of a non-long terminal repeat retrotransposon cDNA (L1Tc) from Trypanosoma cruzi: homology of the first ORF with the Ape family of DNA repair enzymes. J. Mol. Biol. 247:49–59.

    Minchiotti, G., C. Contursi, and P. P. Di Nocera. 1997. Multiple downstream promoter modules regulate the transcription of the Drosophila melanogaster I, Doc and F elements. J. Mol. Biol. 267:37–46.

    Minchiotti, G., C. Contursi, F. Graziani, G. Gargiulo, and P. P. Di Nocera. 1994. Expression of Drosophila melanogaster F elements in vivo. Mol. Gen. Genet. 245:152–159.

    Minchiotti, G., and P. P. Di Nocera. 1991. Convergent transcription initiates from oppositely oriented promoters within the 5' end regions of Drosophila melanogaster F elements. Mol. Cell. Biol. 11:5171–5180.

    Mizrokhi, L. J., S. G. Georgieva, and Y. V. Ilyin. 1988. Jockey, a mobile Drosophila element similar to mammalian LINEs, is transcribed from the internal promoter by RNA polymerase II. Cell 54:685–691.

    Myers, E. W., G. G. Sutton, A. L. Delcher et al. (29 co-authors). 2000. A whole-genome assembly of Drosophila. Science 287:2196–2204.

    Naas, T. P., R. J. DeBerardinis, J. V. Moran, E. M. Ostertag, S. F. Kingsmore, M. F. Seldin, Y. Hayashizaki, S. L. Martin, and H. H. Kazazian. 1998. An actively retrotransposing, novel subfamily of mouse L1 elements. EMBO J. 17:590–597.

    Okazaki, S., H. Ishikawa, and H. Fujiwara. 1995. Structural analysis of TRAS1, a novel family of telomeric repeat-associated retrotransposons in the silkworm, Bombyx mori. Mol. Cell. Biol. 15:4545–4552.

    Pimpinelli, S., M. Berloco, L. Fanti, P. Dimitri, S. Bonaccorsi, E. Marchetti, R. Caizzi, C. Caggese, and M. Gatti. 1995. Transposable elements are stable structural components of Drosophila melanogaster heterochromatin. Proc. Natl. Acad. Sci. USA 92:3804–3808.

    Priimagi, A. F., L. J. Mizrokhi, and Y. V. Ilyin. 1988. The Drosophila mobile element Jockey belongs to LINEs and contains coding sequences homologous to some retroviral proteins. Gene 70:253–262.

    Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning, a laboratory manual. Cold Spring Harbor Laboratory Press, New York.

    Severyinse, D. M., C. A. Hutchison III, and M. H. Edgell. 1992. Identification of transcriptional activity within the 5' A-type monomer sequence of the mouse LINE-1 retroposon. Mamm. Genome 2:41–50.

    Sezutsu, H., E. Nitasaka, and T. Yamazaki. 1995. Evolution of the LINE-like I element in the Drosophila melanogaster species subgroup. Mol. Gen. Genet. 249:168–178.

    Simonelig, M., C. Bazin, A. Pélisson, and A. Bucheton. 1988. Transposable and nontransposable elements similar to the I factor involved in Inducer-Reactive (IR) hybrid dysgenesis in Drosophila melanogaster coexist in various Drosophila species. Proc. Natl. Acad. Sci. USA 85:1141–1145.

    Takahashi, H., S. Okazaki, and H. Fujiwara. 1997. A new family of site-specific retrotransposons, SART1, is inserted into telomeric repeats of the silkworm, Bombyx mori. Nucleic Acids Res. 25:1578–1584.

    Vaury, C., A. Bucheton, and A. Pélisson. 1989. The beta heterochromatic sequences flanking the I elements are themselves defective transposable elements. Chromosoma 98:215–224.

    Xiong, Y., and T. H. Eickbush. 1988a. Functional expression of a sequence-specific endonuclease encoded by the retrotransposon R2Bm. Cell 55:235–246.

    ———. 1988b. The site specific ribosomal DNA insertion element R1Bm belongs to a class of non-terminal repeat retrotransposons. Mol. Cell. Biol. 8:114–123.

    Zhao, D., and M. Bownes. 1998. The RNA product of the Doc retrotransposon is localized on the Drosophila oocyte cytoskeleton. Mol. Gen. Genet. 257:497–504.

Accepted for publication October 12, 2000.