Ancient Lineages of Non-LTR Retrotransposons in the Primitive Eukaryote, Giardia lamblia

William D. Burke, Harmit S. Malik, Stephen M. Rich, and Thomas H. Eickbush

*Department of Biology, University of Rochester;
†Fred Hutchinson Cancer Research Center, Seattle;
‡Department of Biomedical Sciences, Tufts University


    Abstract
 
Mobile elements that use reverse transcriptase to make new copies of themselves are found in all major lineages of eukaryotes. The non–long terminal repeat (non-LTR) retrotransposons have been suggested to be the oldest of these eukaryotic elements. Phylogenetic analysis of non-LTR elements suggests that they have predominantly undergone vertical transmission, as opposed to the frequent horizontal transmissions found for other mobile elements. One prediction of this vertical model of inheritance is that the oldest lineages of eukaryotes should exclusively harbor the oldest lineages of non-LTR retrotransposons. Here we characterize the non-LTR retrotransposons present in one of the most primitive eukaryotes, the diplomonad Giardia lamblia. Two families of elements were detected in the WB isolate of G. lamblia currently being used for the genome sequencing project. These elements are clearly distinct from all other previously described non-LTR lineages. Phylogenetic analysis indicates that these Genie elements (for Giardia early non-LTR insertion element) are among the oldest known lineages of non-LTR elements consistent with strict vertical descent. Genie elements encode a single open reading frame with a carboxyl terminal endonuclease domain. Genie 1 is site specific, as seven to eight copies are present in a single tandem array of a 771-bp repeat near the telomere of one chromosome. The function of this repeat is not known. One additional, highly divergent, element within the Genie 1 lineage is not located in this tandem array but is near a second telomere. Four different telomere addition sites could be identified within or near the Genie elements on each of these chromosomes. The second lineage of non-LTR elements, Genie 2, is composed of about 10 degenerate copies. Genie 2 elements do not appear to be site specific in their insertion. An unusual aspect of Genie 2 is that all copies contain inverted repeats up to 172 bp in length.


    Introduction
 
Genome sequencing projects are providing remarkable insights into the evolution of transposable elements and the role they have played in shaping eukaryotic genomes. Transposable elements can undergo rapid expansion in number over short periods of evolution (e.g., maize; SanMiguel et al. 1998) or slowly increase in number over hundreds of millions of years (e.g., mammals; International Human Genome Consortium 2001). Meanwhile, organisms with small genomes have mechanisms that prevent transposable elements from expanding in number (e.g., Drosophila; Charlesworth and Langley 1989). Only through a greater understanding of the recombinational processes, selection forces, and cellular control mechanisms that allow transposable elements to accumulate in some species but not others will we be able to understand what controls the size and structure of eukaryotic genomes.

One of the major classes of transposable elements present in eukaryotes comprises the non–long terminal repeat (non-LTR) retrotransposons (also called LINEs, polyA retrotransposons, or retroposons). To date, non-LTR elements have been identified in all major groups of eukaryotes, with the exception of the bdelloid rotifers (Arkhipova and Meselson 2000). On the basis of phylogenetic analysis of their protein-coding sequences, non-LTR retrotransposons have been suggested to be the progenitor of eukaryotic LTR retrotransposons, which have in turn given rise to several classes of viruses, including the vertebrate retroviruses (Malik, Henikoff, and Eickbush 2000; Malik and Eickbush 2001).

Current evidence suggests that non-LTR elements utilize a relatively simple mechanism of insertion (Luan et al. 1993; Moran et al. 1996; Chaboissier, Finnegan, and Bucheton 2000). An endonuclease encoded by the element first cleaves a chromosomal target site. A reverse transcriptase (RT), also encoded by the element, then uses the 3' end of the cleaved DNA as primer and polymerizes a cDNA copy of the element's RNA transcript directly onto the target site. The precise steps in this simple mechanism probably vary between non-LTR elements because these elements exhibit considerable flexibility in structure (reviewed in Eickbush and Malik 2001). For example, the oldest lineages encode a single open reading frame (ORF) with the endonuclease domain located carboxyl terminal (C-terminal) to the RT domain (e.g., R2 in fig. 1). These C-terminal endonucleases have active sites similar to those found in certain restriction enzymes (Yang, Malik, and Eickbush 1999). The younger lineages of non-LTR elements encode their endonuclease domain amino terminal (N-terminal) to the RT domain (e.g., L1 in fig. 1). This N-terminal endonuclease appears to have been derived from a cellular apurinic endonuclease (Martin et al. 1995; Feng et al. 1996).



 
Fig. 1.—Structure of the Genie elements from G. lamblia. The structures of the R2 elements from insects (Burke et al. 1998) and the L1 elements from mammals (Kazazian and Moran 1998) are shown as examples of the two most frequently encountered structures for non-LTR retrotransposons. Genie 2 is represented by a consensus sequence because each of the individual copies has undergone multiple deletions. To obtain an ORF, changes in reading frame or bypassing of termination codons (thin vertical lines) were postulated in this consensus sequence. The central RT domain of each element is shaded, as are the endonuclease domains found either N- or C-terminal to the RT domain. The N-terminal endonuclease domain (APE) has similarity to apurinic endonucleases (Martin et al. 1995; Feng et al. 1996). The positions of two critical motifs, a C-X-C-X-H-C (CCHC) and a PD-X-D (PD...D), within the C-terminal endonuclease are indicated (Yang, Malik, and Eickbush 1999). Zinc-finger domains of the structure CCHH within the N-terminal domain of some elements are indicated. The 5' untranslated regions of Genie 1 and 2 are postulated to be extremely short. The 5' end of the single Genie 1A element has not been cloned, and the size of the 3' UTR cannot be determined because the uninserted site is not available. Genie 2 elements contain a terminal inverted repeat up to 170 bp in length (arrows).

 
Phylogenetic analysis of several different classes of non-LTR elements has shown vertical modes of transmission (Burke et al. 1998; Furano 2000). Although a few instances of horizontal transfer of non-LTR elements between taxa have been suggested (Zupunski, Gubensek, and Kordis 2001), many deep lineages of non-LTR elements are limited to specific groups of organisms, which suggests a predominantly vertical mode of descent (Malik, Burke, and Eickbush 1999; Eickbush and Malik 2001). If non-LTR retrotransposons use a vertical mode of transmission in eukaryotes, then the oldest lineages of eukaryotes should only contain descendants of the earliest branching clades of non-LTR elements. The diplomonad, Giardia lamblia, is one of the most primitive eukaryotes (Adam 2000). Indeed, its divergence from other eukaryotes is regarded as lying close to the transition between prokaryotes and eukaryotes (Sogin et al. 1989; Hashimoto et al. 1995). In this report we have used sequences generated by the G. lamblia genome sequencing project (The Josephine Bay Paul Center at the Marine Biological Laboratory, Woods Hole) to identify two new lineages of non-LTR elements. Consistent with a vertical descent model, these G. lamblia elements are among the oldest lineages of non-LTR elements.


    Materials and Methods
 
Cells and DNA Extraction
Axenic cultures of G. lamblia (isolate WB) were maintained by passage every 72 h in TYI-S-33 medium supplemented with bile (Keister 1983). DNA was extracted from encysted (infectious) cells grown to 4.6 × 10^8 cells/ml. Cysts were ruptured by two rounds of freeze-thawing. Cysts were then extracted with 0.2% SDS, 0.2 mg/ml proteinase K at 45°C. Resulting lysates were extracted with phenol-chloroform and precipitated with NaCl-ethanol. Genomic DNA concentrations were estimated on agarose gels by comparing the intensity of the ethidium bromide–stained high molecular weight genomic DNA to known quantities of insect genomic DNA.

Sequence Analysis
The "single pass reads" submitted to GenBank by the G. lamblia Genome Project of The Josephine Bay Paul Center at the Marine Biological Laboratory were used in this analysis. A shotgun sequencing strategy is being used by this group based on clones generated by enzyme digestion and random shearing. The final screens of this database for the present report (August 2001) were performed on an approximately three- to fourfold coverage of the 12-Mbp G. lamblia genome. A detailed description of the genome project can be found at http://www.bpc.mbl.edu. All Genie 1 reads were nearly identical in sequence, and overlapping reads could be readily assembled into a single sequence. Using these assembled sequences, oligonucleotide primers were constructed. The entire Genie 1 ORF was PCR amplified on overlapping clones and sequenced. The sequence and location of the PCR primers used for the assembly are as follows: forward primers: CTAACACACAGCCTGCAGCGCCTGGC (1), GCTTCCTCAGGCCCAGAGC (297), ATGAGGCCTGTCGCGGCCGGACC (1410), CAAGGCCATTAACGGGATTC (2732), and CTTCCTCGATGGGACTCGTCATC (3838); reverse primers: GCTCGCAGACAGTCGTACGGC (750), ACTGCGATAGGGCGCCATCGGC (1612), GGCATTCCGCACATCCACCGTGAGC (1787), GAAGGAAGGGCTGCTTGCAGTGGC (2712), and CGGTGCGATGGTACGAATG (3988). All primer sequences are presented in a 5' to 3' orientation with the numbers in parentheses corresponding to the location of the 5' end of each primer within the Genie 1 sequence. PCR products were cloned into the M13 phage vector, mp18T2 (Burke, Müller, and Eickbush 1995), and multiple clones were organized into complementary pairs and sequenced. These compiled sequences of G. lamblia Genie 1 have been deposited in GenBank (accession number AF440196).

In pairwise comparisons of nucleotide sequences, greater polymorphism was observed among Genie 2 elements than among Genie 1 elements. All single pass reads from the database revealed disrupted ORFs. Attempts to use PCR to recover and sequence an intact Genie 2 element were not successful. Therefore, five full-length Genie 2 elements were constructed by contig assembly of single pass reads from the database. The nucleotide sequence of a putative ancestral Genie 2 was then reconstituted by removing the deletions found in the individual copies.

We have submitted composite sequences based on the single pass reads to the GenBank Third Party Annotation database under the following accession numbers: Genie 1 consensus, BK000095; 771 bp repeat consensus, BK000096; Genie 1A, BK000097; and Genie 2 (putative progenitor sequence), BK000098.

Genomic Blots and Probes
The Genie 1 target site probe corresponded to a 292-bp region starting 45 bp downstream of the Genie 1 insertion site. This probe was generated by PCR amplification of genomic DNA using the primers ATGGTGCCTCGCGTATGTGC (nucleotide position 262–281) and ATGGTGTGGGTGGACCGATTC (534–554). The Genie 1 probe was a 513-bp KpnI to PstI fragment from near the middle of the element (position 2.4–2.9 kb). This fragment was derived by KpnI-PstI digestion of one of the cloned Genie 1 segments used in the sequencing. The Genie 2 probe was obtained by PCR amplification of genomic DNA using primers CAGGGCAGCCCGCTCAGCACGTTCCTC (1466–1492) and TCAGAAGTAGGATGACAGGAGACGGTC (2746–2772). This 1.2-kb fragment was cloned into mp18T2, and the entire insert of one clone was used as the probe. All probe fragments were labeled with 32P-dCTP using the Rediprime II random primer labeling kit (Amersham Pharmacia). Approximately 0.5–1.0 µg genomic DNA was digested with the appropriate restriction enzymes, separated on a 1.0% agarose gel, blotted onto nitrocellulose paper, and hybridized with the 32P-labeled probe. Hybridization conditions were as described previously (Jakubczak et al. 1992), except that the NaCl concentration was 0.3 M, and the final wash of the filter was in 0.075 M NaCl.
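As a quick consistency check on the probe primers listed above, the length of each primer should equal the length of its stated position range. The following is only an illustrative sketch: the helper name `coords_match` and the dictionary keys are hypothetical, and any whitespace inside a primer string is treated as a typesetting artifact and removed before counting.

```python
# Probe primer sequences with the 1-based, inclusive coordinates given in the text.
PROBE_PRIMERS = {
    "target_site_fwd": ("ATGGTGCCTCGCGTATGTGC", 262, 281),
    "target_site_rev": ("ATGGTGTGGGTGGACCGATTC", 534, 554),
    "genie2_fwd": ("CAGGGCAGCCCGCTCAGCACGTTCCTC", 1466, 1492),
    "genie2_rev": ("TCAGAAGTAGGATGACAGGAGACGGTC", 2746, 2772),
}


def coords_match(seq, start, end):
    """Return True if the primer length equals the inclusive range length."""
    cleaned = "".join(seq.split())  # drop any stray whitespace
    return len(cleaned) == end - start + 1


for name, (seq, start, end) in PROBE_PRIMERS.items():
    print(name, coords_match(seq, start, end))
```

A mismatch reported by such a check would point to a transcription problem in either the sequence or the coordinates; it says nothing about primer performance.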

Phylogenetic Analysis
Most sequences used in the phylogenetic analysis were obtained from our previous report (Malik, Burke, and Eickbush 1999). The following sequences have been added: Retroplasmid from Neurospora crassa, g11466082; Group-II introns from Saccharomyces cerevisiae, g6226520; Schizosaccharomyces pombe, g101099; Lactococcus lactis, g7474150; Escherichia coli, g10955409; Sinorhizobium meliloti, g15141360; non-LTR element NeSL, g7508572; Cnl1, (www.sequence.stanford.edu/group/C.neoformans/); Rex6, AB021490; L1 from gorilla, AF036235; L1 from loris, g126296; Tdd3, g7489905; Tx1 from Danio rerio, AC091626; Zorro1, g7264294; and Zorro3, AF254443.

RT domains of non-LTR elements were aligned using the multiple alignment options in CLUSTALX (Thompson et al. 1997), followed by minor manual adjustments of gaps. The alignments of the RT domain and combined RT-endonuclease domains have been deposited at EMBL with accession numbers ALIGN_000231 and 000232, respectively. Phylogenetic trees were generated by the Neighbor-Joining method using the PAM250 matrix of PHYLIP (Felsenstein 1993) and maximum parsimony heuristic options as implemented in PAUP* version 4.0d64 (tree-bisection-reconnection branch swapping with maximum number of trees saved at each step limited to five). Bootstrapping was also carried out using PAUP* version 4.0d64 (Swofford 1999).


    Results and Discussion
 
Using conserved regions of the RT domain of various LTR and non-LTR retrotransposons, we identified two distinct lineages of non-LTR elements in the public releases of the G. lamblia genome sequencing project conducted by The Josephine Bay Paul Center. We were not able to identify LTR retrotransposons. Consistent with our prediction that these retrotransposons should be among the earliest non-LTR lineages, we called these elements Genie 1 and 2 for Giardia early non-LTR insertion elements. It should be noted that a small segment of the RT domain from both lineages had been previously identified by Arkhipova and Meselson (2000) in their PCR survey of different classes of transposable elements in eukaryotes.

The Structure and Distribution of Genie 1
Using the single pass reads available in the database we constructed oligonucleotide primers complementary to segments within this element to PCR amplify and sequence the element on overlapping segments (see Materials and Methods). Based on our sequencing and the single pass reads available in the database, the average nucleotide sequence difference between all pairs of Genie 1 copies was less than 0.5%, and most of the individual copies have an intact ORF.

The structure of the 4.7-kb Genie 1 element is diagrammed in figure 1. The element encodes a single 1,109 codon ORF with a centrally located RT domain. C-terminal to this RT domain is a region with strong sequence similarity to the endonuclease domain of R2 and other phylogenetically ancient non-LTR elements (Malik, Burke, and Eickbush 1999; Yang, Malik, and Eickbush 1999). The critical residues of this C-terminal endonuclease include a cysteine-histidine motif of the structure C-X2-C-X7-H-X3-C (CCHC) and a P-D-X8-D (PD...D) motif. Mutagenesis of the two aspartic acid residues of the latter motif has identified these residues as a part of the active site of the R2 endonuclease (Yang, Malik, and Eickbush 1999; W. Burke and T. H. Eickbush, unpublished data). Located N-terminal to the RT domain of the Genie 1 ORF are two putative zinc-finger nucleic acid binding motifs of the sequence C-X2-C-X12-H-X4–5-H (CCHH). The presence of such motifs is again similar to R2 and other non-LTR elements with C-terminal endonuclease domains. In the case of R2, the N-terminal zinc-finger motifs have been shown to be involved in sequence-specific recognition of the DNA target site (Yang, Malik, and Eickbush 1999; S. Christensen and T. H. Eickbush, unpublished data).
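The motif spacings given above translate directly into regular expressions, which is one simple way to scan a conceptual translation for candidate motifs. The sketch below is an illustration rather than the procedure used in this study; the function name `find_motifs` and the toy peptide are made up, and a pattern match alone does not establish that a domain is functional.

```python
import re

# Motif spacings as written in the text; "X" (any residue) becomes ".".
MOTIFS = {
    "CCHC": re.compile(r"C.{2}C.{7}H.{3}C"),     # C-X2-C-X7-H-X3-C
    "PD...D": re.compile(r"PD.{8}D"),            # P-D-X8-D
    "CCHH": re.compile(r"C.{2}C.{12}H.{4,5}H"),  # C-X2-C-X12-H-X4-5-H
}


def find_motifs(protein):
    """Return {motif name: [0-based start positions]} for a protein string.

    finditer reports non-overlapping matches only.
    """
    protein = protein.upper()
    return {
        name: [match.start() for match in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Made-up peptide (not a Genie sequence) with one CCHC and one PD...D motif.
cchc = "C" + "A" * 2 + "C" + "A" * 7 + "H" + "A" * 3 + "C"
pd_d = "PD" + "A" * 8 + "D"
toy = "MK" + cchc + "GG" + pd_d + "K"
print(find_motifs(toy))  # {'CCHC': [2], 'PD...D': [20], 'CCHH': []}
```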

Two properties of the Genie 1 structure are unusual. First, the ORF of Genie 1 contains an additional 150 amino acid segment at its C-terminal end that is not found in R2 or other non-LTR elements. The function of this extra domain is unknown. Second, the 5' and 3' untranslated regions (UTRs) of Genie 1 are unusual in size. Whereas the 3' UTR is over 1.5 kb in length, significantly larger than that of most non-LTR elements, the 5' UTR of Genie 1 is only 15–18 bp in length. Such a short 5' UTR has not been found for other non-LTR elements but is common for characterized genes of G. lamblia (Adam 2000).

All Genie 1 elements are inserted into the same site of a 771-bp chromosomal DNA repeat (fig. 2). Comparison of inserted and uninserted repeats revealed that Genie 1 insertions result in either a 2-, 3-, or 4-bp target site deletion. Whereas no sequence variation is found at the 3' junction of different copies, the 5' junctions of Genie 1 elements contain the variation associated with the target site deletion and variation in the length of a poly(A) sequence (12–16 nucleotides). Five representative 5' junctions are presented in figure 2. Identical 3' junctions with variable 5' junctions are the hallmark of non-LTR retrotransposons (Eickbush and Malik 2001).



 
Fig. 2.—Target site of the Genie 1 elements. Shown at the top is a portion of the uninserted 771-bp repeat sequence. Shown below are representative examples of the 5' and 3' junctions of Genie 1 elements inserted into this repeat (accession numbers at left). Genie 1 insertion results in a 2- to 4-bp deletion of the target site (nucleotides between the two vertical lines of the uninserted sequence). The 5' end of Genie 1 elements contains a poly(A) tail that varies from 13 to 16 nucleotides. Nucleotide variations within the target sequence or Genie 1 elements are underlined.

 
The 771-bp chromosomal target sequences of the Genie 1 element are tandemly arranged in the genome. Sequence difference between these repeats is extremely low (<0.2%). The tandem organization of this target sequence was initially apparent from the database searches, as all single pass reads containing this repeat extended into an adjacent repeat or into a Genie 1 element. However, no junction of this tandem array with flanking chromosome sequences was detected in the database.

To directly determine what fraction of the 771-bp repeats is occupied by Genie 1 elements we conducted genomic blots probed with sequences from the 771-bp tandem repeat or the Genie 1 element. Shown in figure 3B is a genomic blot probed with a segment from the tandem repeat. The DNA was digested with either PstI or BglI, which cleave once or twice, respectively, within the 771-bp repeat. The BglI digest gave rise to a hybridizing band at approximately 0.70 kb and a more intense band at 0.87 kb. As shown by the restriction map in figure 3A, the 0.70-kb band was derived from the uninserted tandem repeats, whereas the 0.87-kb band was derived from Genie 1 inserted repeats. The PstI digest also gave rise to a weaker band (0.77 kb), representing uninserted repeats and a more intense band (1.35 kb), representing Genie 1 inserted repeats. Quantification of the relative intensity between the two bands in each lane on a PhosphorImager indicated that about 75% of the 771-bp repeats contain Genie 1 insertions.
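The arithmetic behind this estimate can be sketched as follows. Because the probe lies within the repeat, each repeat should contribute one hybridizing fragment, so the inserted fraction is roughly the inserted-band signal divided by the summed signal of both bands in a lane. This is a minimal sketch under the assumption that both fragments transfer and hybridize with equal efficiency; the function name and the example counts are hypothetical and are not measured values.

```python
def inserted_fraction(inserted_signal, uninserted_signal):
    """Fraction of repeats carrying an insertion, from two band intensities."""
    total = inserted_signal + uninserted_signal
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return inserted_signal / total


# Hypothetical PhosphorImager counts in arbitrary units (3:1 ratio).
print(inserted_fraction(3000.0, 1000.0))  # 0.75
```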



 
Fig. 3.—Genomic blots to determine the number and organization of Genie 1 elements. A, Diagram of a Genie 1 inserted 771-bp target repeat in tandem arrangement with an uninserted repeat. The restriction sites used for the genomic blots and to generate the hybridization probes are shown. S, SstI; P, PstI; B, BglI; H, HindIII; and K, KpnI. Probe locations are indicated with the shaded boxes. The location and sizes of the PstI and BglI fragments hybridizing to the target repeat probe in panel B are indicated below this map. B, Genomic DNA probed with the target repeat probe. Genomic DNA was digested with either BglI or PstI and probed with a 292-bp segment from the tandem repeat (location diagrammed in A). Based on ethidium bromide staining, the amount of DNA in the BglI digest is approximately twice that in the PstI digest. C, Genomic DNA probed with a central Genie 1 segment. DNA was digested with either SstI or HindIII and probed with a 492-bp fragment near the middle of Genie 1 (shaded box in A). Hybridization and washing conditions were the same as those in B. The intense bands are consistent with the fragment size generated by Genie 1 elements inserted into consecutive 771-bp repeats. Larger genomic fragments represent insertions separated by one or more uninserted 771-bp units. Lower bands in the SstI digest probably represent additional SstI sites within copies of the element

 
To determine if there are Genie 1 copies located outside the 771-bp repeat we conducted additional genomic blots (fig. 3C). Genomic DNA was digested with SstI, which cleaves several times near the 5' end of the Genie 1 element, or HindIII, which cleaves the Genie 1 element once. The resulting digests were probed with a DNA fragment that hybridizes near the middle of the element. The size of the major hybridizing band in each digest was consistent with the length predicted for Genie 1 elements inserted into two consecutive repeats of the 771-bp sequence. The weaker bands above these major bands in the SstI and HindIII digests are larger than the main band by multiples of approximately 0.8 kb, and thus they appear to represent the Genie 1 elements separated by one or more uninserted units. Two weak bands that were lower than the major band were also detected in the SstI digest. The origin of one of these lower bands is attributed to an additional SstI site in at least one copy of Genie 1, whereas the origin of the second lower fragment is unknown. Quantification of the hybridizing bands in the SstI digest indicates that the major band is approximately three to four times more intense than each of the fainter (presumably single copy) bands, suggesting that there are seven to eight Genie 1 elements in G. lamblia isolate WB.

The variable 5' junction of these Genie 1 copies (fig. 2) suggests that they arose by multiple independent insertion events rather than a single insertion into one repeat followed by an expansion to other repeats through recombination. It seems unlikely that Genie 1 elements would have had sufficient time to evolve specificity for the 771-bp repeat unless it encoded a useful function for the organism. GenBank searches for sequence identities in other organisms gave no significant matches; thus, the origin or function of this repeat is unknown. Because several previously defined non-LTR elements with C-terminal endonuclease domains insert specifically into leader exon repeats of kinetoplastids and nematodes (Aksoy et al. 1990; Gabriel et al. 1990; Villanueva et al. 1991; Teng, Wang, and Gabriel 1995; Malik and Eickbush 2000), one possible function for the 771-bp repeat was that it encoded a short leader RNA that is trans-spliced onto mRNA before translation. Although no evidence for such trans-splicing has been obtained in G. lamblia (Adam 2000), we directly tested this model using two different genes of G. lamblia, actin and β-tubulin. On the basis of the known sequences of these genes (L29032 and X06748), we used primers complementary to sequences near the 5' end of the coding region of each gene and reverse transcribed G. lamblia RNA. This cDNA was then used as a template in a PCR reaction using one primer complementary to the coding region of the gene and a second primer to one of two positions within the 771-bp repeat. On the basis of the position of the non-LTR insertions relative to the transcribed region in the spliced leaders of other species, these primers were complementary to the 771-bp repeat a short distance upstream of the Genie 1 insertions. We obtained no evidence that this sequence had been trans-spliced onto either the actin or β-tubulin mRNA (data not shown). These experiments do not eliminate the possibility that the 771-bp repeat contains a leader exon that is spliced onto only a limited set of G. lamblia genes or during only limited periods of the life cycle. It will be interesting to learn if similar tandem repeats exist in other diplomonads.

The Structure of Genie 1A Elements
During our searches of the G. lamblia database with Genie 1 sequences, we identified one additional element that was clearly related to Genie 1 elements but was highly divergent in sequence. We refer to this element as Genie 1A. Using the single pass reads, a complete element could not be assembled because the 5' end of the element was not located in the database. Because this Genie 1A element was located next to a telomere (see below), we attempted to clone the 5' end of the element using one PCR primer near the end of the available sequences and a second primer to telomeric sequences. Unfortunately, this approach was also unsuccessful.

The available Genie 1A ORF is intact and encodes an endonuclease domain C-terminal of the RT domain (see fig. 1). Similar to that in Genie 1, a 150 amino acid extension beyond this endonuclease domain was found in the Genie 1A ORF. The N-terminal domain of the Genie 1A ORF contains at least one zinc-finger motif, but the available sequences end before the probable location of a second motif. The genomic sequences flanking the 3' end of the element could be followed for some distance using the overlapping single pass reads available in the database. No sequence identities could be detected between the 3' UTRs of Genie 1 and 1A, and a potential unoccupied target site could not be detected in this downstream sequence. Thus, neither the size of the 3' UTR nor the Genie 1A target site could be determined.

Genie 1 and 1A Elements are at Chromosomal Telomeres
Giardia lamblia is believed to have five chromosomes ending in typical telomeric repeats of the sequence TAGGG (reviewed in Adam 2000). The putative telomerase gene itself has recently been identified (Malik, Burke, and Eickbush 2000). Several of the G. lamblia chromosomes have been shown to have tandemly repeated ribosomal (rRNA) gene units located near one end. These rDNA repeats have been shown to be interrupted by telomeric repeats at four different sites (Adam, Nash, and Wellems 1991; Hou et al. 1995). Other than these rDNA unit junctions, only one other telomere junction has been reported (Hou et al. 1995). The 771-bp tandem repeat containing the Genie 1 elements as well as the single copy of Genie 1A also appear to be located near the telomeres of G. lamblia. As shown in figure 4, typical telomere repeats were found at positions 109, 841, and 3389 of the Genie 1 elements as well as at position 21 of the 771-bp target repeat. In the case of Genie 1A, telomere repeats were found at positions 116 and 689 within the element as well as at a position 1,962 bp downstream of the Genie 1A termination codon.
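Telomere addition sites of this kind can be located in sequence reads by searching for tandem runs of the TAGGG repeat on either strand. The sketch below illustrates the idea and is not the search procedure used here: the function names are hypothetical, the `min_units` threshold of three repeats is an arbitrary assumption, and only exact repeat units are matched.

```python
import re

TELOMERE_UNIT = "TAGGG"  # G. lamblia telomeric repeat, as given in the text


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_runs(read, min_units=3):
    """Find runs of at least min_units exact tandem telomere repeats.

    Returns sorted (start, end, strand) tuples with 0-based start and
    exclusive end; strand "-" means the reverse-complement unit matched.
    """
    read = read.upper()
    runs = []
    for strand, unit in (("+", TELOMERE_UNIT),
                         ("-", reverse_complement(TELOMERE_UNIT))):
        pattern = re.compile("(?:%s){%d,}" % (unit, min_units))
        runs.extend((m.start(), m.end(), strand) for m in pattern.finditer(read))
    return sorted(runs)


# Toy read (not real data): 8 bp of filler followed by four CCCTA units.
print(telomere_runs("ACGTACGT" + "CCCTA" * 4))  # [(8, 28, '-')]
```

The start coordinate of a run, compared with the intact element sequence, gives the position of the junction; junctions with extra non-templated nucleotides would still need to be inspected by eye.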



 
Fig. 4.—Telomeric repeat addition sites within or near the Genie 1 and 1A elements. In each comparison, the sequence of the chromosomal segment without telomere addition is shown at the top, and the sequence of this region with telomere repeats is shown at the bottom. The position of the addition within the element is indicated. In the case of the third Genie 1A site, the telomere sequences are located 1,962 bp downstream of the Genie 1A termination codon. Element sequences unchanged by the telomere additions are indicated with vertical lines. Telomere repeats are underlined with arrows and are presented in the opposite orientation to the more traditional manner (i.e., TAGGG) in order that the Genie 1 and 1A sequences can be shown in their 5' to 3' direction. Nucleotides at each junction included within a box represent sequences that do not correspond to either the telomere repeat or the chromosomal site before addition. The telomere addition site within the large subunit rRNA gene that was reported to contain extra nucleotides is shown at the bottom for comparison (Hou et al. 1995). It should be noted that these extra sequences are a nested series of additions of the sequence TCTGTCTGT.

 
The telomere junctions associated with Genie 1 and 1A elements are similar to those that have been found for telomere junctions associated with the rDNA units of G. lamblia. First, in several instances, the addition of telomeric repeats involved 2–3 nucleotide identities with the telomeric repeat (Genie 1 positions 109 and 841, Genie 1A position 116). Second, in the four remaining examples, 1–7 additional nucleotides were found at the junctions that do not correspond to either the Genie elements or the telomeric repeat (boxed nucleotides in fig. 4). The presence of such additional nucleotides was also found in one of the telomere junctions associated with the rDNA unit (Hou et al. 1995) and is shown in figure 4. The extra nucleotides found at these junctions are variable lengths of the sequence TCTGTCTGT. On the basis of current models of telomere addition (reviewed in Blackburn 2001), these extra nucleotides could represent the sequence of the telomere-associated RNA used as template for telomere addition. In such a model, for those chromosomal ends in which this telomere RNA cannot anneal to the free DNA end, reverse transcription by the G. lamblia telomerase could start at variable locations upstream of the telomere repeat sequences in this RNA. This model could be confirmed by the identification of the telomere-associated RNA of G. lamblia.

Finally, evidence is accumulating that many, if not all, chromosomes of G. lamblia are tetraploid (Yang and Adam 1994; Hou et al. 1995). Our analysis of Genie 1 and 1A elements adds further support for this suggestion. As shown in figure 4, the database screens are consistent with four different telomeric ends at both the Genie 1 and Genie 1A loci. All four ends associated with the Genie 1 repeat array were cloned (three within the element and one within the target sequence). Only three telomeric ends within or near the single Genie 1A were cloned. Because 116 bp of Genie 1A sequence is available in the database that extends beyond the 5'-most telomere location, a fourth telomere site can be postulated that is located nearer to or beyond the 5' end of the Genie 1A element.

The Structure and Distribution of Genie 2
The second family of non-LTR elements in G. lamblia, Genie 2, is composed of inactive, highly divergent copies. Using the overlapping single pass reads, we could assemble five potentially full-length Genie 2 elements. Unlike Genie 1, the level of nucleotide difference between the five Genie 2 copies ranged between 9% and 11% (excluding indels). The total level of divergence between these copies was significantly higher because of multiple deletions (1–53 bp in length) associated with each copy. Segments of at least five additional copies of Genie 2 could also be identified in the database. These partial copies also had accumulated deletions. To determine whether additional copies of Genie 2 were present in the G. lamblia genome but were not cloned by the procedure used in the genome sequencing effort, we probed genomic blots with internal Genie 2 sequences (fig. 5). Using low stringency hybridization and washing criteria, 8–10 bands of similar intensity could be distinguished in these blots. Based on these genomic blots, we suggest that the G. lamblia genome contains only the divergent copies of Genie 2 detected in the database. The many deletions associated with the individual Genie 2 copies are similar to those reported for old, inactive copies of non-LTR elements in different species of Drosophila (Petrov et al. 2000).
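A percent nucleotide difference that excludes indels can be computed from a pairwise alignment by skipping every column in which either sequence has a gap. The sketch below shows one way to do this; it assumes the two sequences are already aligned to equal length with "-" as the gap character, and the function name and toy sequences are illustrative only.

```python
def percent_difference(aligned_a, aligned_b, gap="-"):
    """Percent mismatches between two aligned sequences, ignoring gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # indel column: excluded from the comparison
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * mismatches / compared


# Toy alignment: 8 gap-free columns, 1 mismatch.
print(percent_difference("ACGTACGTAC", "ACGA--GTAC"))  # 12.5
```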



 
Fig. 5.—Genomic blot to determine the number of Genie 2 elements. Approximately 1.0 µg of G. lamblia DNA was digested with HindIII and probed with a 1.2-kb segment from near the middle of a Genie 2 element. Gel electrophoresis, DNA hybridization, and washing conditions are the same as those used in figure 3.

 
Because no copy of Genie 2 currently present in the WB strain appears to contain an intact ORF, we attempted to reconstitute an intact Genie 2 element by removing all internal deletions from the five full-length copies that were assembled. The final length of this reconstituted element was 3.3 kb. However, even this hypothetical progenitor sequence did not contain a completely intact ORF. Like the Genie 1 and 1A elements, the ORF encoded by Genie 2 elements contains RT and C-terminal endonuclease domains (fig. 1). Two reading frame shifts were required to assemble this portion of the hypothetical ORF. Like R2 elements, the C-terminal end of the Genie 2 ORF extended only a short distance downstream of the endonuclease domain, with no evidence of the 150 amino acid C-terminal extension found in Genie 1 and 1A elements.
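One simple way to see whether a reconstituted sequence carries an intact ORF, or needs frame shifts to string one together, is to look for the longest stop-free stretch in each forward reading frame. The Python sketch below illustrates that idea under stated assumptions: the standard stop codons are assumed to apply, the function name `longest_open_stretch` is hypothetical, and the toy sequence is invented. It is not the procedure used for Genie 2.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def longest_open_stretch(seq: str, frame: int) -> Tuple[int, int]:
    """Return (start, end) nucleotide coordinates (0-based, end exclusive)
    of the longest run of codons without a stop in forward frame 0, 1 or 2."""
    seq = seq.upper()
    best = (frame, frame)
    run_start = frame
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3  # next run starts after the stop codon
    # Close the final run at the last complete codon in this frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    if end - run_start > best[1] - best[0]:
        best = (run_start, end)
    return best


# Toy sequence, invented for illustration only.
toy = "ATGAAATAGCCCGGGTTTAAACCC"
for frame in range(3):
    print(frame, longest_open_stretch(toy, frame))
```

If the longest open stretch moves from one frame to another along the element, as it does in this toy example, a continuous ORF can be assembled only by allowing frame shifts between the stretches.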

Because of the low sequence similarity to other non-LTR elements, the N-terminal end of the Genie 2 ORF was more difficult to define. The sequence of the reconstituted Genie 2 element suggested that the ORF would begin approximately 330 bp from its 5' end. However, because all G. lamblia genes (Adam 2000) as well as the Genie 1 elements contain extremely short 5' UTRs, we postulate that two closely spaced termination codons in this reconstituted sequence should be ignored (i.e., they are not representative of the sequence of an active element), which would enable the ORF to begin at an ATG codon near the 5' end of the element.

The most unusual structural aspect of the Genie 2 element was the presence of long inverted terminal repeats (arrows in fig. 1). Shown in figure 6 are the sequences of these repeats along with short segments of the flanking chromosomal sequences for the eleven 5' and ten 3' ends that could be identified in the database. In no instance were we able to identify an uninserted version of the target site in the database, suggesting that Genie 2 elements are fixed on all chromosomal homologues of the WB strain. Because of the absence of uninserted sites and the possibility of DNA deletions subsequent to the insertion event, the exact boundaries of the Genie 2 insertions could not be determined. However, based simply on sequence identity between Genie 2 elements, the length of the inverted repeat on these different copies varied from 126 to 172 bp as a result of variable truncations at both the 5' and 3' ends of the element. Although it was not always possible to correlate the 5' end of one copy with its 3' end, the length of the truncations at the two ends of individual copies was generally not the same. We could find no evidence of a distinct target site associated with the Genie 2 elements.
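Comparing inverted terminal repeats requires taking the reverse complement of one end so that the two ends can be read in the same orientation. The Python sketch below shows this on a toy sequence; the function names and the sequences are illustrative assumptions, and the comparison is ungapped, which is simpler than an alignment of real, truncated Genie 2 ends would need to be.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_repeat_identity(element: str, repeat_len: int) -> float:
    """Fraction of matching bases between the first `repeat_len` bases and
    the reverse complement of the last `repeat_len` bases (no gaps)."""
    if repeat_len <= 0 or 2 * repeat_len > len(element):
        raise ValueError("repeat_len must be positive and at most half the element length")
    five_prime = element[:repeat_len].upper()
    three_prime_rc = reverse_complement(element[-repeat_len:]).upper()
    matches = sum(a == b for a, b in zip(five_prime, three_prime_rc))
    return matches / repeat_len


# Toy elements, invented for illustration: a perfect 9-bp inverted repeat,
# and the same element with one substitution in its 3' repeat.
perfect = "GATTACAGG" + "CCCCCTTTTT" + "CCTGTAATC"
diverged = "GATTACAGG" + "CCCCCTTTTT" + "CCTGTTATC"
print(terminal_repeat_identity(perfect, 9))
print(round(terminal_repeat_identity(diverged, 9), 3))
```

An identity well below 1.0 between the two ends of the same copy is the kind of signal that would indicate the repeats are not copied from each other at each round of retrotransposition.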



 
Fig. 6.—Comparison of the terminal inverted repeats of Genie 2 elements. The reverse complements of the Genie 2 3' junctions are shown to allow direct comparison of the inverted repeats. The uninserted chromosomal sequence is not known for any Genie 2 insertion, and it is also possible that some of these junctions have undergone a deletion subsequent to their insertion event. Therefore, the precise border between element and flanking chromosomal sequence is unknown. Nucleotides found in a majority of the junctions are assumed to be Genie 2 sequences and are shown in uppercase letters within a shaded box. Flanking sequences are shown in lowercase letters. Sequences are identified by their GenBank accession numbers shown at the left. Several additional Genie 2 junctions could be identified in the database but were more divergent in sequence and contained internal deletions.

 
Nucleotide divergence among the 5' inverted repeats of different Genie 2 elements varied from 1% to 22%, whereas the divergence among the 3' inverted repeats varied from 1% to 29%. In contrast, all 5' repeats differ in sequence from all 3' repeats by at least 23%. Analysis of these Genie 2 sequences suggests that recombination can occur between different copies of the element, as well as between the ends of the same element. In two instances, the 5' flanking regions of different copies of Genie 2 are identical (compare AC033970 and AC059321, as well as AC051569 and AC067610). In these examples, the elements are defined as different copies because the Genie 2 sequences at these same sites diverged by 19% and 14%, respectively. Although these results could be explained by the independent insertion of Genie 2 elements into the same genomic location, it seems more likely that this situation arose by recombination between two different Genie 2 elements. In two other instances, the flanking sequences at the 5' end of a Genie 2 insertion were identical to the flanking sequences at the 3' end of another Genie 2 insertion (compare sequences AC075161 and AC057954, as well as AC051569 and AC078299). Again, although these examples could be explained by independent insertions of Genie 2 into the same genomic location (this time in opposite orientation), they could also be explained by recombination between the inverted repeats at the 5' and 3' ends of the same element. Such a recombination, if it resulted in a simple crossover, would invert the element. Consistent with this suggestion of recombination between the inverted repeats at the ends of Genie 2 elements, a number of nucleotide substitutions are shared between the 3' and 5' ends of certain Genie 2 elements. This is most readily seen in the region between nucleotide positions 92 and 112 in figure 6.

The only other non-LTR element that has been shown to contain inverted terminal repeats is the EhRLE element of Entamoeba histolytica (Sharma et al. 2001). The inverted repeats in these elements are only 27 bp in length. The possible function of these inverted repeats can only be speculated upon. Because the repeat sequence at the 5' end of each Genie 2 copy is divergent in sequence from that at the 3' end of the same copy, the terminal repeats are not regenerated from each other during the retrotransposition process. This would suggest that the element's RNA transcript is likely to contain both repeats, and that integration could still occur by a target-primed reverse transcription mechanism. In one scenario, the terminal repeats could anneal, blocking too high a level of translation, while at the same time providing stability for the full-length transcript. Alternatively, the RNA duplex region could serve as a means for the element's RT to recognize the ends of the transcript for reverse transcription.

The Phylogenetic Relationship of Genie Elements to Other Non-LTR Elements
A possible phylogenetic relationship of the Genie elements with previously described non-LTR elements is shown in figure 7. This phylogeny was based on two sets of sequence data. Shown is the analysis based on the highly conserved palm and fingers regions of the RT domain, which can be found in all retroelements (Xiong and Eickbush 1990). The non-LTR phylogeny was rooted using representative taxa of the retroelements with the closest sequence similarity to the non-LTR elements: bacterial and mitochondrial mobile group-II introns and the Mauriceville retroplasmid of N. crassa (Belfort et al. 2001). To provide better resolution among the non-LTR lineages, we also repeated the analysis with the more variable thumb region of the RT domain included (see Burke et al. 1999). The bootstrap values from this second analysis are also shown in the figure. The analysis was conducted with an updated compilation of all reported non-LTR element sequences; however, to conserve space, only the oldest clades of non-LTR elements are drawn in figure 7. The relationships of the younger clades of non-LTR elements, specifically the RTE, TAD, LOA, R1, Jockey, Cr1, and I clades, are unchanged from those reported previously (Malik, Burke, and Eickbush 1999).
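Bootstrap values of the kind reported for this phylogeny come from rebuilding the tree on many pseudo-replicate alignments, each made by sampling alignment columns with replacement. The Python sketch below shows only that resampling step, on an invented toy alignment; the function name `bootstrap_columns` and the sequences are assumptions for illustration, and the tree-building step applied to each replicate is not shown.

```python
import random
from typing import Dict


def bootstrap_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap pseudo-replicate: the same taxa, with alignment
    columns drawn with replacement (the same draw is applied to every taxon)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Toy amino acid alignment, invented for illustration only.
toy = {
    "elementA": "MKLVDGSA",
    "elementB": "MKLIDGTA",
    "elementC": "MRLVEGSA",
}
replicates = [bootstrap_columns(toy, random.Random(seed)) for seed in range(3)]
for rep in replicates:
    print(rep)
```

The bootstrap value for a node is then the percentage of replicate trees in which that grouping of taxa reappears, so a longer or more informative alignment tends to raise support for a real grouping.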



 
Fig. 7.—Phylogeny of non-LTR elements based on their RT domains. The name of each element and the species of origin are given to the right of each branch. The non-LTR elements have been previously divided into distinct clades, which were named after the earliest element identified from each clade (Malik, Burke, and Eickbush 1999). The relationships between members of the later branching clades of non-LTR elements are not shown and are essentially unchanged from our previous report (Malik, Burke, and Eickbush 1999). The phylogeny presented is derived by the neighbor-joining method based on ~315 amino acid residue positions corresponding to the fingers and palm region of the RT domain. The phylogeny is rooted on the retroplasmid and mobile group-II intron sequences. Numbers next to each node indicate bootstrap values as percentages of 1,000 replicates. Numbers in parentheses represent bootstrap values for the major nodes (i.e., those defining each distinct clade of non-LTR elements) based on the entire ~440 amino acid residue RT domain. Major nodes also supported by maximum parsimony methods (greater than 50% bootstrap values) are indicated with an asterisk. A scale of amino acid divergence per position is indicated.

 
Genie 1, 1A, and 2 are members of the same lineage of non-LTR elements. The level of divergence between these elements is high. Based on typical rates of divergence of non-LTR elements (Malik, Burke, and Eickbush 1999), divergence of the RT domain of Genie 1 and 2 is consistent with a time of separation of over 600 Myr. The Genie elements are in turn most closely related to the previously oldest known clade of elements, the CRE clade. Bootstrap support for this relationship is low using only the palm and fingers of the RT domain (53%) but increases when the thumb region is included (89%). The CRE clade originally was defined as containing those elements which specifically insert into the spliced leader exons of diverse species of kinetoplastids. More recently, the Cnl1 element from the basidiomycetous fungus Cryptococcus neoformans has been found to be a member of this clade (Goodwin and Poulter 2001). Cnl1 elements do not insert into spliced leader exons but instead preferentially insert into preexisting copies of Cnl1. These nested arrays of Cnl1 are in close proximity to telomeres. A second addition to the updated phylogeny is the NeSL elements of Caenorhabditis elegans (Malik and Eickbush 2000). NeSL elements are members of the same lineage as the R4, Dong, and Rex6 elements (Xiong and Eickbush 1993; Burke, Müller, and Eickbush 1995; Volff et al. 2001). After the Genie and CRE clades, the R4-NeSL lineage is older than any other non-LTR lineage.

Because the different non-LTR lineages in figure 7 are still poorly resolved, we have attempted to improve the resolution by including the sequence information of the C-terminal endonuclease domain (fig. 8). Because of the inclusion of the C-terminal endonuclease domain, non-LTR elements with an N-terminal apurinic endonuclease domain are excluded from the analysis, as are the retroplasmid and group-II intron sequences. We have therefore drawn the phylogeny in figure 8 using the same rooting as in figure 7. The addition of the endonuclease domain provides greater resolution of the old non-LTR lineages based on neighbor-joining methods, with a significant number of nodes now also supported by maximum parsimony methods. The Genie and CRE clades still appear to be sister groups to the R2 and R4 clades, and NeSL remains within the R4 clade. Presumably, as the characterization of more genomes of primitive eukaryotes proceeds and additional members of these old non-LTR lineages are identified, we will be better able to infer the evolutionary history of these oldest retrotransposons.



 
Fig. 8.—Phylogeny of the oldest lineages of non-LTR elements based on both their RT and C-terminal endonuclease domains (~600 amino acid residues). Bootstrap values for each node by the neighbor-joining (NJ) method are shown. Major nodes also supported by maximum parsimony (MP) methods (greater than 50% bootstrap values) are indicated with an asterisk. The tree is drawn rooted as in figure 7.

 
Concluding Comments
Consistent with a strict vertical descent, the non-LTR retrotransposons identified in G. lamblia correspond to the oldest lineages of elements characterized to date. Indeed, non-LTR elements with a single ORF and a C-terminal endonuclease domain appear to be the predominant elements that can be found in the most primitive eukaryotes. Many of these primitive non-LTR elements will probably exhibit target specificities for tandemly repeated DNA sequences. The most likely such sites include the rRNA genes and spliced leader exons. This report provides evidence for a third site. All Genie 1 copies were located in a 771-bp tandem repeat. Although we do not know if the 771-bp repeat has an important function in G. lamblia, it seems unlikely that an element would have had sufficient time to evolve site specificity unless this repeat does have a function. It will be interesting to determine if other diplomonads contain similar tandem repeats and whether Genie-like elements remain associated with these repeats.

Based on our data, Genie 1A and 2 are like Cnl1, Dong, and Rex6 and do not appear to be site specific. It should be noted, however, that both Genie 1A and Cnl1 elements are associated with telomeric repeats (Goodwin and Poulter 2001), whereas Dong and Rex6 elements can frequently be found in tandem TA-rich repeats (Xiong and Eickbush 1993; Volff et al. 2001; H. Malik, unpublished data). Thus, some level of insertion specificity may exist even for these elements. In addition, because only a single, partial copy of Genie 1A was identified, and all Genie 2 copies represented old degenerate copies, it remains possible that these elements do have significant insertion specificity for a tandem repeat from which they can be rapidly lost. In this case, what we have observed in the WB isolate may be the occasional copies that can insert outside the repeat and are now degenerate. Thus, the analysis of additional strains and species of Giardia is needed before we can infer the insertion specificity of Genie 1A and 2.


    Acknowledgements
 
This work was supported by N.S.F. grant MCB-9974606 to T.H.E., N.I.H. grant R01-GM060759 to S.M.R., and a postdoctoral fellowship to H.S.M. from the Helen Hay Whitney Foundation. Giardia lamblia cysts were provided by the Tufts University Center for Gastroenterology Research on Absorptive and Secretory Processes, NIDDK P30 DK34928. We thank D. Eickbush for useful comments on the manuscript.


    Footnotes
 
Pierre Capy, Reviewing Editor

Keywords: Genie retrotransposons vertical evolution phylogenetic analysis insertion specificity

Abbreviations: non-LTR, non–long terminal repeat; RT, reverse transcriptase; ORF, open reading frame; C-terminal, carboxyl terminal; N-terminal, amino terminal.

Address for correspondence and reprints: Dr. Thomas H. Eickbush, Department of Biology, Hutchison Hall, University of Rochester, Rochester, New York 14627. eick@mail.rochester.edu.


    References
 

    Adam R. D., 2000 The Giardia lamblia genome Int. J. Parasitol 30:475-484

    Adam R. D., T. E. Nash, T. E. Wellems, 1991 Telomeric location of Giardia rDNA genes Mol. Cell. Biol 11:3326-3330

    Aksoy S., S. Williams, S. Chang, F. F. Richards, 1990 SLACS retrotransposon from Trypanosoma brucei gambiense is similar to mammalian LINEs Nucleic Acids Res 18:785-792

    Arkhipova I., M. Meselson, 2000 Transposable elements in sexual and ancient asexual taxa Proc. Natl. Acad. Sci. USA 97:14473-14477

    Belfort M., V. Derbyshire, M. M. Parker, B. Cousineau, A. M. Lambowitz, 2001 Mobile introns: pathways and proteins In N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II, Chap. 31. American Society of Microbiology Press, Washington D.C. (in press)

    Blackburn E. H., 2001 Switching and signaling at telomeres Cell 106:661-673

    Burke W. D., H. S. Malik, J. P. Jones, T. H. Eickbush, 1999 Conserved structure and mechanism of integration of the R2 retrotransposable element in all arthropods Mol. Biol. Evol 16:502-511

    Burke W. D., H. S. Malik, W. C. Lathe III, T. H. Eickbush, 1998 Are retrotransposons long-term hitchhikers? Nature 392:141-142

    Burke W. D., F. Müller, T. H. Eickbush, 1995 R4, a non-LTR retrotransposon specific to the large subunit rRNA genes of nematodes Nucleic Acids Res 23:4628-4634

    Chaboissier M. C., D. Finnegan, A. Bucheton, 2000 Retrotransposition of the I factor, a non–long terminal repeat retrotransposon of Drosophila, generates tandem repeats at the 3' end Nucleic Acids Res 28:2467-2472

    Charlesworth B., C. H. Langley, 1989 The population genetics of Drosophila transposable elements Annu. Rev. Genet 23:251-287

    Eickbush T. H., H. S. Malik, 2001 Evolution of retrotransposons In N. Craig, R. Craigie, M. Gellert, and A. Lambowitz, eds. Mobile DNA II, Chap. 47. American Society of Microbiology Press, Washington D.C. (in press)

    Felsenstein J., 1993 PHYLIP (phylogeny inference package). Version 3.55 Distributed by the author. Department of Genetics, University of Washington, Seattle

    Feng Q., J. V. Moran, H. H. Kazazian Jr., J. D. Boeke, 1996 Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition Cell 87:905-916

    Furano A. V., 2000 The biological properties and evolutionary dynamics of mammalian LINE-1 retrotransposons Prog. Nucleic Acids Res. Mol. Biol 64:255-294

    Gabriel A., T. J. Yen, D. C. Schwartz, C. L. Smith, J. D. Boeke, B. Sollner-Webb, D. W. Cleveland, 1990 A rapidly rearranging retrotransposon within the miniexon gene locus of Crithidia fasciculata Mol. Cell. Biol 10:615-624

    Goodwin T. J. D., R. T. M. Poulter, 2001 The diversity of retrotransposons in the yeast Cryptococcus neoformans Yeast 18:865-880

    Hashimoto T., Y. Nakamura, T. Kamaishi, F. Nakamura, J. Adachi, K. Okamoto, M. Hasegawa, 1995 Phylogenetic place of mitochondrion-lacking protozoan, Giardia lamblia, inferred from amino acid sequences of elongation factor 2 Mol. Biol. Evol 12:782-793

    Hou G., S. M. Le Blancq, E. Yaping, H. Zhu, M. G. Lee, 1995 Structure of a frequently rearranged rRNA-encoding chromosome of Giardia lamblia Nucleic Acids Res 23:3310-3317

    International Human Genome Sequencing Consortium. 2001 Initial sequencing and analysis of the human genome Nature 409:860-921

    Jakubczak J. L., M. K. Zenni, R. C. Woodruff, T. H. Eickbush, 1992 Turnover of R1 (Type I) and R2 (Type II) retrotransposable elements in the ribosomal DNA of Drosophila melanogaster Genetics 131:129-142

    Kazazian H. H. Jr., J. V. Moran, 1998 The impact of L1 retrotransposition on the human genome Nat. Genet 19:19-24

    Keister D. B., 1983 Axenic culture of Giardia lamblia in TYI-S-33 medium supplemented with bile Trans. R. Soc. Trop. Med. Hyg 77:487-488

    Luan D. D., M. H. Korman, J. L. Jakubczak, T. H. Eickbush, 1993 Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition Cell 72:595-605

    Malik H. S., W. D. Burke, T. H. Eickbush, 1999 The age and evolution of non-LTR retrotransposable elements Mol. Biol. Evol 16:793-805

    Malik H. S., W. D. Burke, T. H. Eickbush, 2000 Telomerase catalytic subunits from Giardia lamblia and Caenorhabditis elegans Gene 251:101-108

    Malik H. S., T. H. Eickbush, 2000 NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans Genetics 154:193-203

    Malik H. S., T. H. Eickbush, 2001 Phylogenetic analysis of Ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses Genome Res 11:1187-1197

    Malik H. S., S. Henikoff, T. H. Eickbush, 2000 Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses Genome Res 10:1307-1318

    Martin F., C. Maranon, M. Olivares, C. Alonso, M. C. Lopez, 1995 Characterization of a non–long terminal repeat retrotransposon cDNA (L1Tc) from Trypanosoma cruzi: homology of the first ORF with the Ape family of DNA repair enzymes J. Mol. Biol 247:49-59

    Moran J. V., S. E. Holmes, T. P. Naas, R. J. DeBerardinis, J. D. Boeke, H. H. Kazazian Jr., 1996 High frequency retrotransposition in cultured mammalian cells Cell 87:917-927

    Petrov D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, K. L. Shaw, 2000 Evidence for DNA loss as a determinant of genome size Science 287:1060-1062

    SanMiguel P., B. S. Gaut, A. Tikhonov, Y. Nakajima, J. L. Bennetzen, 1998 The paleontology of intergene retrotransposons of maize Nat. Genet 20:43-45

    Sharma R., A. Bagchi, A. Bhattacharya, S. Bhattacharya, 2001 Characterization of a retrotransposon-like element from Entamoeba histolytica Mol. Biochem. Parasitol 116:45-53

    Sogin M. L., J. H. Gunderson, H. J. Elwood, R. A. Alonso, D. A. Peattie, 1989 Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia Science 243:75-77

    Swofford D. L., 1999 PAUP 4.0 Laboratory of Molecular Systematics. Smithsonian Institution, Washington, D.C

    Teng S.-H., S. X. Wang, A. Gabriel, 1995 A new non-LTR retrotransposon provides evidence for multiple distinct site-specific elements in Crithidia fasciculata miniexon arrays Nucleic Acids Res 23:2929-2936

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882

    Villanueva M. S., S. P. Williams, C. B. Beard, F. F. Richards, S. Aksoy, 1991 A new member of a family of site-specific retrotransposons is present in the spliced leader RNA genes of Trypanosoma cruzi Mol. Cell. Biol 11:6139-6148

    Volff J. N., C. Korting, A. Froschauer, K. Sweeney, M. Schartl, 2001 Non-LTR retrotransposons encoding a restriction-like endonuclease in vertebrates J. Mol. Evol 52:351-360

    Xiong Y., T. H. Eickbush, 1990 Origin and evolution of retroelements based upon their reverse transcriptase sequences EMBO J 9:3353-3362

    Xiong Y., T. H. Eickbush, 1993 Dong, a new non-long terminal repeat (non-LTR) retrotransposable element from Bombyx mori Nucleic Acids Res 21:1318

    Yang Y. M., R. D. Adam, 1994 Allele-specific expression of a variant-specific surface protein (VSP) of Giardia lamblia Nucleic Acids Res 22:2102-2108

    Yang J., H. S. Malik, T. H. Eickbush, 1999 Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements Proc. Natl. Acad. Sci. USA 96:7847-7852

    Zupunski V., F. Gubensek, D. Kordis, 2001 Evolutionary dynamics and evolutionary history in the RTE clade of non-LTR retrotransposons Mol. Biol. Evol 18:1849-1863

Accepted for publication November 20, 2001.