CsRn1, a Novel Active Retrotransposon in a Parasitic Trematode, Clonorchis sinensis, Discloses a New Phylogenetic Clade of Ty3/gypsy-like LTR Retrotransposons

Young-An Bae, Seo-Yun Moon, Yoon Kong, Seung-Yull Cho and Mun-Gan Rhyu

Department of Microbiology, College of Medicine, Catholic University of Korea, Seoul, Korea
Department of Molecular Parasitology, Sungkyunkwan University School of Medicine, Suwon, Korea


    Abstract
 
We screened the genome of a trematode, Clonorchis sinensis, in order to identify novel retrotransposons and thereby provide additional information on retrotransposons for comprehensive phylogenetic study. Considering the vast potential of retrotransposons to generate genetically variable regions among individual genomes, randomly amplified polymorphic DNAs (RAPDs) detected by arbitrarily primed polymerase chain reactions were selected as candidates for retrotransposon-related sequences. From RAPD analysis, we isolated and characterized a novel retrotransposon in C. sinensis as the first member of uncorrupted long-terminal-repeat (LTR) retrotransposons in phylum Platyhelminthes. The retrotransposon, which was named Clonorchis sinensis Retrotransposon 1 (CsRn1), showed a genomewide distribution and had a copy number of more than 100 per haploid genome. CsRn1 encoded an uninterrupted open reading frame (ORF) of 1,304 amino acids, and the deduced ORF exhibited similarities to the pol proteins of Ty3/gypsy-like LTR retrotransposons. The mobile activity of master copies was predicted by sequence analysis and confirmed by the presence of mRNA transcripts. Phylogenetic analysis of Ty3/gypsy-like LTR retrotransposons detected a new clade comprising CsRn1, Kabuki of Bombyx mori, and an uncharacterized element of Drosophila melanogaster. With its high repetitiveness and preserved mobile activity, it is proposed that CsRn1 may play a significant role in the genomic evolution of C. sinensis.


    Introduction
 
A remarkable wealth of data is now available on the diversity and distribution of retrotransposons in nearly all eukaryotes (see Boeke and Stoye 1997 and references therein). As a class of transposable elements, retrotransposons move and integrate into new sites within genomes via reverse transcription of an RNA intermediate (Boeke et al. 1985) and can therefore provide genetic variations ranging from simple sequence polymorphism within the elements to dramatic alterations in chromosomal structure (Kidwell and Lisch 1997; Fedoroff 2000). Based on such vast potential to generate genetic variations, retrotransposons have been recognized as major agents in evolution that give rise to phenotypic variants (Long et al. 2000) and, in the long term, drive speciation (McDonald 1990).

With the cumulative data on retrotransposons, various studies have discussed the probable evolutionary course of retrotransposons, including studies of the long-terminal-repeat (LTR) family, the non-LTR family, and exogenous retroviruses (Xiong and Eickbush 1990; Malik, Burke, and Eickbush 1999; Malik and Eickbush 1999). However, the phylogenetic relationship of diverse retrotransposons remains ambiguous, mainly with regard to factors concerning the branching patterns in phylogenetic trees, such as the formation of polytomy (Malik and Eickbush 1999) and polyphyletic distribution (Malik and Eickbush 1999; Marín and Lloréns 2000). Thus, the number of ancient classes responsible for diverse clades of retrotransposons and the extent of possible horizontal transfer between different species are difficult to define. Such ambiguities are likely to arise from limitations in sequence information on retrotransposons and unbalanced sampling from each taxon.

There have been a number of reports on the LTR retrotransposons of animals such as nematodes (Felder et al. 1994; Bowen and McDonald 1999), insects (Lindsley and Zimm 1992; Biessmann et al. 1999; Abe et al. 2000), echinoderms (Britten et al. 1995), and fish (Poulter and Butler 1998). In Platyhelminthes, however, Gulliver of Schistosoma japonicum (Laha et al. 2001) is the only full-length LTR retrotransposon so far described, although Arkhipova and Meselson (2000) have recently reported a segmental sequence of an LTR retrotransposon in Dugesia. Moreover, the sequence of Gulliver is corrupted, even though its expression has been demonstrated at the level of transcription by reverse transcription-PCR (RT-PCR) (Laha et al. 2001). Thus, currently there are no reports on uncorrupted LTR retrotransposons, which may play a significant role in the formation of genomes in Platyhelminthes, including trematodes.

Retrotransposons introduce variations through their heterogeneous integration and subsequent sequence divergence, and these polymorphic regions can be identified as randomly amplified polymorphic DNAs (RAPDs) by arbitrarily primed PCR (AP-PCR) (Abe et al. 1998). In the present study, we attempted to identify retrotransposons from the genome of Clonorchis sinensis, an important human liver fluke in East Asia, based on the analysis of RAPD sequences. By screening variable genetic regions from individual C. sinensis using the AP-PCR method, we isolated a retrotransposon, named Clonorchis sinensis Retrotransposon 1 (CsRn1), which is the second complete but the first uncorrupted LTR retrotransposon identified in the phylum Platyhelminthes. Structural and genomic analyses of CsRn1 and its phylogenetic relationship with other Ty3/gypsy-like LTR retrotransposons are presented.


    Materials and Methods
 
Parasite and DNA Extraction
Adult C. sinensis were collected from the livers of experimental rabbits which were challenged orally with metacercariae obtained from naturally infected fish in Kimhae, Korea, 3 months prior to dissection. The worms were washed with physiological saline five times at 4°C. Fresh worms were used immediately for DNA extraction with the Wizard DNA Purification Kit (Promega, Madison, Wis.) according to the manufacturer's instructions.

AP-PCR and Cloning of Individual Worm-Specific Products
Genomic DNAs extracted from individual worms were used for the amplification of worm-specific RAPD regions by PCR under low-stringency conditions. The following arbitrarily designed primers were used in the PCR reactions: AP-1 (5'-GATCCGTTCA-3'), AP-3 (5'-ACCCATACCC-3'), B4 (5'-GGACTGGAGT-3'), B5 (5'-TGCGCCCTTC-3'), B8 (5'-GTCCACACGG-3'), B12 (5'-CCTTGACGCA-3'), B13 (5'-TTCCCCCGCT-3'), F6 (5'-GGGAATTCGG-3'), and F15 (5'-CCAGTACTCC-3'). The reaction mixtures included 40 ng of genomic DNA, 1.25 µM of primers, 0.2 mM each of dATP, dGTP, dCTP, and dTTP, and 1.25 U of Taq polymerase (Takara, Shiga, Japan) in a total reaction volume of 20 µl. PCR conditions were as follows: 32 cycles of 1 min at 94°C, 1 min at 37°C, and 2 min at 72°C, and a final extension of 10 min at 72°C. The reproducibility of the results was tested by repeating the reactions under identical conditions three times. The PCR products were fractionated by electrophoresis on agarose gels and stained with ethidium bromide. Individual-specific bands were recovered from agarose gels, reamplified with the corresponding primers, and then cloned into pGEM-T Easy Vector (Promega) for nucleotide sequencing.

Southern Blot Hybridization
Five micrograms of genomic DNAs isolated from C. sinensis adult worms were digested with restriction enzymes. After being fractionated through 0.8% agarose gels, the DNAs were blotted onto nylon membrane (Hybond-N+; Amersham Pharmacia Biotech, Uppsala, Sweden) by capillary action in 10 x standard saline citrate (SSC). The blots were hybridized with probes enzymatically labeled with the ECL Direct Labeling Kit (Amersham Pharmacia Biotech). The labeling and hybridizing conditions were according to the manufacturer's instructions. The membranes were washed twice in 6 M urea, 0.4% sodium dodecyl sulfate, and 0.1 x SSC at 42°C for 20 min and twice in 2 x SSC at room temperature for 5 min.

Dot-Blot Analysis
Ten micrograms of genomic DNA were blotted onto nylon membrane according to the standard procedure of dot-blot analysis (Sambrook, Fritsch, and Maniatis 1989). Two identical membranes were prepared, of which one was hybridized with the probe for repetitive sequences and the other with the probe for cysteine protease as a single-copy control (GenBank accession number AF271091). The probe for protease was amplified from C. sinensis genomic DNA with the CsCP3-S1 (5'-GCTGGACTCCGACTACCCATATG-3') and CsCP3-R3 (5'-GGTTTAAACGATTGTGCATCGC-3') primers. Probe labeling and hybridizing conditions were the same as those for Southern blot hybridization. The intensities of signals were measured using the LAS-1000plus system (FUJIFILM, Tokyo, Japan).
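
The comparison of signal intensities between the two dot blots can be sketched as a ratio of dose-response slopes. This is a minimal illustration and not the authors' procedure: the function names, the straight-line fit through the origin, and the assumption that both probes label and hybridize with equal efficiency are ours.

```python
def slope_through_origin(amounts, signals):
    """Least-squares slope of signal versus DNA amount for a line through the origin."""
    numerator = sum(a * s for a, s in zip(amounts, signals))
    denominator = sum(a * a for a in amounts)
    if denominator == 0:
        raise ValueError("at least one non-zero DNA amount is required")
    return numerator / denominator


def estimate_copy_number(amounts, repeat_signals, single_copy_signals):
    """Ratio of repeat-probe slope to single-copy-probe slope.

    Assumes (hypothetically) that both probes give the same signal per
    hybridized copy, so the slope ratio approximates copies per haploid genome.
    """
    repeat_slope = slope_through_origin(amounts, repeat_signals)
    single_slope = slope_through_origin(amounts, single_copy_signals)
    return repeat_slope / single_slope
```

With made-up intensities, `estimate_copy_number([1, 2, 4], [1000, 2000, 4000], [10, 20, 40])` gives a ratio of 100. In practice, probe length and labeling efficiency would also need to be taken into account.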

Construction and Screening of Genomic DNA Library
DNAs from adult worms were partially digested with Sau3AI (Takara). DNA fragments of 9–23 kb were recovered by ultracentrifugation through sucrose gradients, purified, and then cloned into lambda FIX II vector predigested with XhoI (Stratagene, La Jolla, Calif.). The constructs were packaged into lambda particles using Gigapack III Gold-11 packaging extract. The unamplified libraries were screened with DNA probe labeled with the ECL labeling kit. The conditions for plaque-lift hybridization were identical to those for Southern blot hybridization. The inserts of lambda clones were amplified by long PCR (Chen, Fockler, and Higuchi 1994) using primers designed from vector regions (5'-CTAATACGACTCACTATAGGGCGTCG-3' and 5'-CCCTCACTAAAGGGAGTCGACTCG-3') and LA Taq polymerase according to the standard cycle conditions (Takara). The amplified products were digested with restriction enzymes and were then cloned into pBluescript II SK(-) phagemid (Stratagene) for nucleotide sequencing.

Screening of cDNA Library by PCR Methods
A cDNA library of adult C. sinensis was constructed in λZAP II vector using a cDNA synthesis kit (Stratagene) according to the manufacturer's instructions. The library was amplified and was then used in standard PCR reactions for the detection of mRNA transcripts. The PCR products were cloned into pGEM-T Easy Vector for sequencing.

Sequence Analysis
The nucleotide sequences were automatically determined with an ABI PRISM 377 DNA sequencer (Applied Biosystems, Foster City, Calif.) and a BigDye Terminator Cycle Sequencing Reaction Kit (Perkin Elmer Corporation, Foster City, Calif.). To ensure the accuracy of sequencing reactions, sequences of single strands from five clones of vector-ligated DNA fragments were determined. For PCR products obtained during cDNA library screening, nucleotide sequences from both strands were determined. After sequencing, homology searches were performed against the nonredundant database at the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) using BLASTN and BLASTX (Altschul et al. 1997). The REPEAT program in the GCG package, version 8. (University of Wisconsin), was used to determine the direct repeat sequences. The putative open reading frames (ORFs) were predicted by GeneScan (Burge and Karlin 1997), GeneMark (Borodovsky and Lukashin, unpublished; http://genemark.biology.gatech.edu/GeneMark/), and ORF Finder (NCBI). A search for the functional protein domains was performed using ProDom (Gouzy, Corpet, and Kahn 1999) and ProfileScan (Gribskov, McLachlan, and Eisenberg 1987).

Phylogenetic Analysis
The nucleotide sequences were aligned with ClustalX (Thompson et al. 1997). After optimizing the sequence alignments using the PHYDIT program (Chun 1995), divergence values were calculated and a dendrogram was drawn using the programs DNADIST and NEIGHBOR, respectively, of PHYLIP (Felsenstein 1993). Based on the previous reports (Xiong and Eickbush 1990; Malik and Eickbush 1999), reverse transcriptase (RT), RNase H (RH), and integrase (IN) domains in pol proteins were defined from alignments using CLUSTAL W (Thompson, Higgins, and Gibson 1994), and amino acid sequences of the three domains were combined for the phylogenetic analysis of LTR retrotransposons. After aligning the combined sequences with ClustalX and optimizing the alignment with GeneDoc (Nicholas and Nicholas 1997), a phylogenetic analysis was performed using PROTDIST and NEIGHBOR of PHYLIP. The trees were displayed by TreeView (Page 1996), and the statistical significance of branching points was evaluated with 1,000 random samplings of the input sequence alignments using SEQBOOT.
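
The two computational ideas in this workflow, a pairwise divergence value and bootstrap resampling of alignment columns, can be illustrated with a short sketch. It is not a reimplementation of PHYLIP: DNADIST and PROTDIST use model-based distances, whereas the sketch uses the uncorrected proportion of differing sites as a stand-in, and all function names are hypothetical.

```python
import random


def p_distance(seq_a, seq_b, gap="-"):
    """Uncorrected proportion of differing sites, ignoring columns with a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (the idea behind SEQBOOT).

    `alignment` maps sequence names to equal-length aligned strings; the same
    resampled column indices are applied to every sequence.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}
```

A bootstrap support value would then be the fraction of resampled alignments, for example 1,000 of them, whose tree recovers a given grouping.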


    Results
 
Isolation of RAPDs by AP-PCR
With nine 10-mer arbitrary primers, singly or in pairs, AP-PCRs were performed to find band shifts in genomic DNAs separately extracted from individual C. sinensis worms. In AP-PCRs with DNAs from three worms, three individual-specific band shifts were discovered (fig. 1A), and their sequences were used to search the GenBank database using the BLAST algorithm. The sequence of a band shift with the B13 primer, named P-B13 (GenBank accession number AZ551682), showed significant identity at the amino acid (aa) sequence level to pol proteins in Kabuki of Bombyx mori (AB032718; identity value of 34%) and in bovine syncytial virus (U94514, 23%). No significant identity to any known gene was found in the homology searches using the sequences of the other two bands. To determine the genomic distribution of P-B13, the P-B13 DNA fragment was used to probe a Southern blot of genomic DNA from C. sinensis. Multiple bands of hybridization to the genomic DNA digested with a series of restriction enzymes indicated that P-B13 represents a highly repetitive region (fig. 1B), which is one of the prominent features of retrotransposons. Thus, it was proposed that P-B13 is a portion of a retrotransposon present within the C. sinensis genome.



 
Fig. 1.—Arbitrarily primed PCR (AP-PCR) analysis of individual worms for isolating randomly amplified polymorphic DNAs (RAPDs). A, Electrophoresis of AP-PCR products on agarose gel visualized by ethidium bromide staining. Separately extracted genomic DNAs from individual Clonorchis sinensis worms were amplified using AP primers. Band shifts corresponding to individual-specific RAPD markers are indicated by arrows. Numbers at the top refer to individual worms, and primers used in each reaction are indicated above the numbers. The positions of DNA size standards (in bp) are shown on the left. B, Southern blot hybridization of P-B13 to genomic DNAs of C. sinensis. Restriction endonucleases used for the digestion of DNAs are indicated at the top. The positions of DNA size standards (in kb) are shown on the right.

 
The Structure of a Novel LTR Retrotransposon, CsRn1
The overall structure of a novel retrotransposon encompassing the P-B13 marker was determined from three randomly selected lambda clones, labeled λCs-1, λCs-2, and λCs-4, obtained by screening the genomic DNA library with the P-B13 probe (fig. 2). In each of these three clones, the full lengths of the CsRn1 retrotransposon differed slightly due to several insertions and deletions (indels), and a consensus sequence determined from the three clones had a size of 5,026 bp. The whole units of CsRn1 were bounded by direct repeats of 4 bp known as target site duplications (TSDs) introduced during the process of integration.
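
A target site duplication can be recognized computationally as an identical short stretch immediately upstream and downstream of the element. The sketch below is only an illustration under that definition; the function name and the toy flanking sequences in the usage note are hypothetical and are not CsRn1 data.

```python
def target_site_duplication(upstream_flank, downstream_flank, length=4):
    """Return the duplicated target site, or None if the flanks do not match.

    `upstream_flank` ends where the element begins, and `downstream_flank`
    starts where the element ends. The last `length` bases of the former are
    compared with the first `length` bases of the latter.
    """
    left = upstream_flank[-length:].upper()
    right = downstream_flank[:length].upper()
    if len(left) == length and left == right:
        return left
    return None
```

For example, `target_site_duplication("ccgtATGC", "ATGCttaa")` returns `"ATGC"`, while flanks that differ at any of the four positions return `None`.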



 
Fig. 2.—Overall structure of the CsRn1 long-terminal-repeat (LTR) retrotransposon. Boxes containing black arrows represent flanking LTRs. Striped boxes within the open reading frame (ORF) show the conserved functional domains of Gag, protease (PR), reverse transcriptase (RT), RNase H (RH), and three subdomains of integrase (IN). Duplicated target sites of 4 bp are represented as target site duplications (TSDs). Positions of the P-B13 RAPD and the LTR probe for Southern hybridization (P-LTR) are indicated. A restriction map of the full-length CsRn1 element is presented above the structure.

 
The complete CsRn1 elements were flanked by LTRs with an average size of 471 bp (fig. 2). As in most LTR retrotransposons (Boeke and Stoye 1997), the LTRs contained at both ends short inverted repeats of 3 bp that initiated as TG and terminated as CA (5'-TGT ... ACA-3'). A sequence motif (AATACA) similar to a typical poly A signal sequence (AATAAA) was found within the LTR sequence, but a promoter signal sequence (TATA box) could not be defined. The sequence of 12 bp adjacent to the 5'-LTR was complementary to the nucleotides at the 3' ends of bovine and chicken tRNA(Trp). Thus, this region seems to be a likely primer-binding site (PBS) for the synthesis of the first cDNA strand. An additional priming site (5'-GGGGGAGTAG-3') for the synthesis of the second cDNA strand (polypurine tract [PPT]) was found in the direct upstream region of 3'-LTR (fig. 3).
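
Both the terminal inverted repeats and the PBS assignment rest on reverse complementarity, which can be checked as in the following sketch. The helper names are hypothetical, and the sequences in the usage note are toy strings chosen for illustration, not the actual CsRn1 LTR or tRNA sequences.

```python
# Map each base to its complement; U is included so RNA input can be compared with DNA.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Reverse complement of a DNA or RNA string, returned as DNA."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(ltr, length=3):
    """True if the first `length` bases equal the reverse complement of the last ones."""
    ltr = ltr.upper()
    return ltr[:length] == reverse_complement(ltr[-length:])


def pbs_matches_trna(pbs, trna_3prime_end):
    """True if the candidate PBS (DNA, 5'->3') is fully complementary to the tRNA 3' end."""
    return pbs.upper() == reverse_complement(trna_3prime_end)
```

A toy LTR such as `"TGTAAGGCCACA"` passes the inverted-repeat test because the reverse complement of its terminal ACA is TGT, which matches the 5'-TGT ... ACA-3' pattern described above.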



 
Fig. 3.—Sequences of the putative primer-binding site (PBS) and polypurine tract (PPT) of CsRn1. The sequences of 5' long-terminal-repeat (LTR) termini, PBS, PPT, and 3' LTR termini are aligned and presented. The 3' end of bovine tRNA(Trp) is also presented in its corresponding region. Dots represent gaps introduced into sequences to increase their similarity. Sequences of Kabuki of Bombyx mori (AB032718) and an undescribed LTR retrotransposon of Drosophila melanogaster on AE003787 are used for the alignment.

 
An internal region of the CsRn1 copy from lambda clone 4 (λCs-4) contained one large, uninterrupted ORF (1,304 aa). The copies from the other two lambda clones (λCs-1 and λCs-2) showed similar results after the sequences were corrected for corruptions. The deduced ORF included well-conserved functional domains in the order protease (PR), RT, RH, and IN (fig. 2). Instead of a conventional Gag motif (CCHC), the ORF had an apparent nucleic-acid-binding site (CHCC) at the predicted 3' ends of Gag just prior to the DTG aspartic protease active site. Although the CHCC motif is unusual, it is also repetitively found in Gag proteins of Kabuki and AE003787 (fig. 4). The amino acid conservation (fig. 4) and domain order (PR-RT-RH-IN) in the ORF indicated that CsRn1 belongs to a family of Ty3/gypsy-like LTR retrotransposons (Pringle 1999).
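
Whether a copy carries an uninterrupted reading frame can be checked by scanning each forward frame for the longest run of codons without a stop. The sketch below is a simplified stand-in for the ORF prediction tools named in Materials and Methods: it uses the standard stop codons, ignores start codons and the reverse strand, and its function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_open_frame(seq):
    """Return (frame, codons) for the longest stop-free run in the forward frames.

    `frame` is 0, 1 or 2, and `codons` is the number of consecutive complete
    codons in that run. Ties are resolved in favor of the first run found.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon closes the current open stretch
            else:
                run += 1
                if run > best[1]:
                    best = (frame, run)
    return best
```

On a copy with an intact frame, the longest run would cover the whole coding region, whereas an indel or a substitution that creates a stop codon would shorten it or move it to another frame.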



 
Fig. 4.—Multiple-sequence alignments of functional protein domains from various long-terminal-repeat (LTR) retrotransposons. The amino acid sequences of putative protein domains, Gag, protease, reverse transcriptase, RNase H, and integrase, are aligned. Seven conserved subdomains of reverse transcriptase are underlined (Xiong and Eickbush 1990). The amino acid sequences of CsRn1 domains are compared with those of other LTR retrotransposons: Kabuki of Bombyx mori (AB032718), undescribed LTR retrotransposon of Drosophila melanogaster on AE003787, Sushi of Fugu rubripes (AF030881), and Osvaldo of Drosophila buzzatii (Z46728). For the CHCC Gag domain, only those of Kabuki and AE003787 are used for the alignment, since the other elements have a CCHC Gag motif instead of CHCC. Sequence identities with CsRn1 are highlighted in black, while similar residues are highlighted by gray boxes.

 
Identification of CsRn1 Master Copies
CsRn1 elements were interspersed throughout the genome of C. sinensis, rather than being tandemly arrayed or clustered at limited loci, and the copy number was estimated to be approximately 100 per haploid genome under high-stringency conditions (0.1 x SSC) (fig. 5). In addition to the 3 initially identified copies, 11 copies of full-length CsRn1 elements were further obtained from the C. sinensis genomic library for comparison of their overall sequences and structures. The 14 CsRn1 copies (GenBank accession numbers AY013558–AY013571) shared an overall sequence identity of 98.7%, with slightly different values for the LTRs (98.3%) and the internal coding regions (98.9%) (table 1). In three copies (CsRn1-4, CsRn1-7, and CsRn1-52), the reading frames were retained, while in the others they were corrupted by indels and/or stop codons introduced by base substitutions. All copies were bounded by TSDs of 4 bp and adjacent unique single-copy regions, supporting the heterogeneous genomewide distribution of CsRn1.



 
Fig. 5.—Intragenomic distribution and copy number of CsRn1. A, Genomic Southern blots showing that CsRn1 is highly reiterated throughout the Clonorchis sinensis genome. Restriction endonucleases used for the digestion of DNAs are indicated at the top. Arrows in lane 1 (AccI digests) indicate two prominent bands of 950 bp and 1,059 bp corresponding to the expected sizes of AccI digests spanning the LTRs and internal sequences at the 5' and 3' regions, respectively (see fig. 2). The positions of DNA size standards (in kb) are shown on the right. B, Dot-blot analysis for determining the copy number of CsRn1. Each of two membranes was dotted in duplicate with varying amounts (µg) of genomic DNAs as shown on the right. The probes used for hybridization are indicated at the top: P-B13 for CsRn1 (see fig. 2) and Cys-Pro for cysteine protease as a single-copy control. The signal intensities between the two blots were compared to estimate the copy number of CsRn1

 

 
Table 1 Sequence Similarities Among CsRn1 Copies

 
A sequence alignment of 14 full-length CsRn1 copies showed base substitutions shared by more than two copies at numerous positions (90 of 5,023 bp), which showed clear subdivisions within the CsRn1 elements. For example, the nucleotide pair at positions 2666 and 2692 divided CsRn1 copies into two groups, designated group G, with diagnostic bases of G·T, and group C, with C·C. The two groups were further distinguished by nucleotide pairs observed at eight other positions, although the bases at these positions were less distinctive (data not shown). This, together with the finding of well-conserved sequences at the remaining positions, demonstrated that the members of group G formed a clonal lineage originating from a common progenitor copy. The three diagnostic bases at positions 66, 4240, and 4619 suggested further divergence of this group into two subgroups (fig. 6).
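
The assignment of copies to groups by diagnostic bases amounts to a lookup at fixed alignment columns. The following sketch takes the two positions and the base pairs G·T and C·C from the text; the function name, the "unassigned" label, and the use of 1-based coordinates on an aligned sequence are assumptions made for illustration.

```python
# Diagnostic alignment positions (1-based) and base pairs as given in the text.
DIAGNOSTIC_POSITIONS = (2666, 2692)
GROUPS = {("G", "T"): "G", ("C", "C"): "C"}


def classify_copy(aligned_seq, positions=DIAGNOSTIC_POSITIONS, groups=GROUPS):
    """Return the group label for one aligned copy, or 'unassigned'.

    `aligned_seq` must be the copy's row in the multiple alignment, so that
    the positions refer to the same columns in every copy.
    """
    bases = tuple(aligned_seq[p - 1].upper() for p in positions)
    return groups.get(bases, "unassigned")
```

The same lookup, with positions 66, 4240, and 4619 and a suitable table of base triplets, would express the further split of group G into subgroups. The diagnostic bases for that split are not listed in the text, so they are not filled in here.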



 
Fig. 6.—Phylogenetic analysis of the full-length CsRn1 elements. The analysis was conducted using PHYLIP based on an alignment of full-length nucleotide sequences. The tree was constructed using the neighbor-joining algorithm and was unrooted. The number at a particular node indicates its percentage of appearances in 1,000 bootstrap replicates, and only values of >60% are indicated. Groupings supported by the bootstrap analysis, as well as by the diagnostic bases, are marked as thicker branches (see text)

 
Short branch lengths relating the members of group G in a phylogenetic tree (average length = 0.001; fig. 6) (Medstrand and Mager 1998) and nearly identical LTR sequences of the copies (table 1) (Dangel et al. 1995) suggested that the CsRn1 copies of this group have recently been replicated. Together with the fact that the major fraction of the CsRn1 elements belong to group G (7 of 14 sequenced copies), these results demonstrate a role of the elements belonging to group G as active master copies for the multiplication of CsRn1 elements. The copies of group C also had numerous diagnostic bases, but they were heterogeneous and showed higher levels of sequence divergence when compared with group G (fig. 6). Thus, these divergent copies were thought to be inactive variant forms of CsRn1 that expanded prior to the expansion of group G.

The Mobile Activity of the CsRn1 Master Copy
The genomic distribution of the element was found to be heterogeneous among individuals of C. sinensis, which supports the recent expansion of CsRn1 (fig. 7A). Based on this finding, we attempted to examine the presence of CsRn1 transcripts in the total RNA molecules extracted from the adult worms by Northern blot analysis but failed to detect any signals when the blots were probed with the P-B13 probe (data not shown). We then performed PCRs with higher sensitivity using a cDNA library as template and four primer sets covering the full length of the mRNA transcripts (see fig. 7B and its legend). As shown in figure 7B, the 3'-end and P-B13 regions were well amplified, whereas the 5'-end and RT regions were not amplified or were weakly amplified, possibly due to the incomplete extension of cDNAs from 3' ends during the construction of the cDNA library.



 
Fig. 7.—Mobile activity of CsRn1. A, Southern blot analysis of CsRn1 with genomic DNAs separately extracted from individual worms (presented as numbers on the top) and P-LTR probe (see fig. 2). HindII restriction endonuclease was used for the digestion of the DNAs. Several polymorphic bands among individuals introduced by recent expansion of CsRn1 can be seen between 0.56 and 1.4 kb. B, Amplification of 5'-end, reverse transcriptase (RT), P-B13 marker, and 3'-end regions of CsRn1 from a cDNA library of Clonorchis sinensis by PCR methods. Primers used in each PCR reaction are as follows: T3 and 5LTR-R (5'-CGACTAAATCCGCTGAATC-3') for the 5'-end region; RT-F (5'-GACGAAAGGTCCACCTGTC-3') and RT-R (5'-GGGCAATGGTGAAATACCTG-3') for RT; Pro-F (5'-TCTGGTTGAGCGTTTCCATC-3') and Pro-R (5'-CACTAGAGCGTTGACCGTG-3') for P-B13; and T7 and 3LTR-F (5'-GAAACTTGAAGTGAGCAAC-3') for the 3'-end region. M = 100-bp size marker. C, Homology search of CsRn1 genomic sequence against the dbEST of GenBank databases. The full-length CsRn1 and three expressed sequence tags (ESTs) of Schistosoma mansoni with the highest match are presented. The hatched boxes indicate the matched regions, and their positions in the CsRn1 sequence are shown, together with the homology values. The ESTs of S. mansoni are presented with accession numbers in the databases and were obtained from cDNA libraries of male (AI977543) and female (AI975406 and AI976475) adult worms of the trematode

 
The PCR products of 3'-end regions were cloned and sequenced, and 12 randomly chosen clones showed six different sequences. The 3' ends of CsRn1 transcripts lay within 141–146 bp downstream of the 5' end of the 3' LTR and about 15 bp downstream of the presumed AATACA poly A signal sequence. When compared with genomic copies of CsRn1 elements, the transcripts shared the same bases with the copies of group G at diagnostic substitution positions (data not shown). A homology search of CsRn1 sequence against dbEST of GenBank revealed that a yet-unidentified CsRn1-like element is also expressed in Schistosoma mansoni, one of the well-studied trematodes (fig. 7C). Since the expansion of retrotransposons is largely restricted to germ cells and/or early embryonic stages, the frequency of CsRn1 amplification could not be determined by Northern blot analysis, in which the total RNA molecules from the whole organism were used. However, the polymorphic distribution of CsRn1 among individuals and the presence of its mRNA transcripts suggest constant, uninterrupted expansion of the element responsible for the high copy number.

The Evolutionary Relationship of CsRn1 with Other Retrotransposons
The amino acid sequence of RT encoded in CsRn1 showed strong homology with that of RT in many Ty3/gypsy-like LTR retrotransposons, particularly in Kabuki (identity value of 63%) and AE003787 (58%), and a similar pattern of homology was also found with the sequences of IN (fig. 4). Moreover, the nucleotide sequence of CsRn1 exhibited homology with the DNA sequences of the two elements over a relatively long range of nucleotide positions (Kabuki, 53.6% identity in 2,972 nt; AE003787, 53.5% in 3,583 nt). In view of these observations, we performed a phylogenetic analysis using amino acid sequences of pol proteins in order to gain further understanding of the relationship of CsRn1 with other Ty3/gypsy-like LTR retrotransposons. In addition to RT, the amino acid sequences of RH and IN were selected for this analysis to increase the resolution (Malik and Eickbush 1999).

In a previous report, Malik and Eickbush (1999) divided Ty3/gypsy-like LTR retrotransposons into eight distinct clades. As shown in figure 8, the members of the eight clades were well separated in a tree constructed by the UPGMA method. Interestingly, however, CsRn1 formed a previously undetected, tightly conserved clade with Kabuki and AE003787. A similar clustering pattern was observed in a tree constructed with the neighbor-joining algorithm (data not shown), and the statistical significance of the branching points was well supported by bootstrap analysis. As members of a new clade (designated the CsRn1 clade), the three elements shared a number of common features, such as a CHCC Gag motif, a TSD of 4 bases (in the case of Kabuki, TSD cannot be determined; see Abe et al. 2000), and sequence conservations in the PBS and PPT (fig. 3), as well as functional protein domains (fig. 4).
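
For readers unfamiliar with UPGMA, the clustering step can be sketched in a few lines: the two closest clusters are merged repeatedly, and distances to the new cluster are size-weighted averages. This is a bare-bones illustration that returns only the topology as nested tuples and omits branch lengths. It is not the PHYLIP implementation used for figure 8, and the function name and input format are assumptions.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster `names` by UPGMA and return the topology as nested tuples.

    `distances` maps frozenset({a, b}) to the distance between taxa a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(distances)
    while len(sizes) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted average.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))
```

With three toy taxa where A and B are closest, the result nests `("A", "B")` inside the final cluster. A tightly conserved clade such as the one formed by CsRn1, Kabuki, and AE003787 would appear as such an inner tuple.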



 
Fig. 8.—Phylogenetic relationship of CsRn1 with other Ty3/gypsy-like retrotransposons. The analysis was based on an alignment of amino acid sequences from reverse transcriptase to the central subdomain of integrase (DDE motif), and the tree was constructed by the UPGMA algorithm using PHYLIP and was unrooted. Branching nodes separating each clade are represented as thicker bars, and bootstrap values of these nodes are shown as numbers at the nodes. The elements marked with asterisks indicate retrotransposons belonging to the genus Errantivirus. The elements used in the analysis are as follows: 412, Drosophila melanogaster (X04132); AE003787, undescribed retrotransposon of D. melanogaster; AF262041, undescribed retrotransposon of Arabidopsis thaliana; AF026205 and U88169, undescribed retrotransposons of Caenorhabditis elegans; Athila, A. thaliana (AB005248); Blastopia, D. melanogaster (Z27119); Cer1, C. elegans (U15406); Cft1, Cladosporium fulvum (AF051915); Deal, Ananas comosus (Y12432); Cyclops, Vicia faba (AB007466); Gulliver, Schistosoma japonicum (AF243513); Gypsy, D. melanogaster (M12927); Kabuki, Bombyx mori (AB032718); Mag, B. mori (S08405); MarY1, Tricholoma matsutake (AB028236); Mdg1, D. melanogaster (X59545); Mdg3, D. melanogaster (X95908); Osvaldo, D. buzzatii (AJ133521); RIRE7, Oryza sativa (AB033235); Sushi, Fugu rubripes (AF030881); Ted, Trichoplusia ni (M32662); Tf1, Schizosaccharomyces pombe (M38526); Tom, D. ananassae (Z24451); Tv1, D. virilis (AF056940); Ty3-2, Saccharomyces cerevisiae (S53577); Ulysses, D. virilis (X56645); Woot, Tribolium castaneum (U09586); Yoyo, Ceratitis capitata (U60529); ZAM, D. melanogaster (AJ000387).

 

    Discussion
 
In the present study, we characterized CsRn1 from the genetically polymorphic regions among individual C. sinensis worms as the first member of uncorrupted LTR retrotransposons found in the phylum Platyhelminthes. The full-length CsRn1 encodes a single uninterrupted ORF which resembles pol proteins of Ty3/gypsy-like elements. A phylogenetic analysis showed that CsRn1 is a member of a distinct, previously undetected clade of LTR retrotransposons which exhibit definitive characteristics such as a highly conserved PBS (tRNA(Trp)) and PPT, a highly conserved and unusual CHCC Gag motif, a TSD of 4 bases, and strong similarity of functional protein domains.

For expansion, retrotransposons are transcribed by host RNA polymerase and then reverse-transcribed by their own reverse transcriptase. Because neither enzyme has proofreading capacity (Varmus and Brown 1989), cDNAs produced during transposition tend to acquire random base substitutions. These substitutions frequently inactivate the progeny copies, and the resulting "dead-on-arrival" (DOA) copies, which have no mobile activity, are likely to evolve neutrally and therefore to accumulate sequence variation at an accelerated rate (Petrov, Lozovskaya, and Hartl 1996). Thus, together with the low fidelity of RT, the neutral evolution of inactive DOA copies may further increase sequence divergence among individual genomes of a species. In this study, a genomic region that varied among individuals was identified by AP-PCR-based RAPD analysis and shown to be a sequence related to the CsRn1 LTR retrotransposon.
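
The effect of replication without proofreading can be illustrated with a small calculation. The following Python sketch is not part of the original analysis. It assumes independent errors at a uniform per-nucleotide rate per replication cycle, and the error rates in `ASSUMED_ERROR_RATES` are hypothetical placeholders; only the ORF length (1,304 codons) is taken from this study.

```python
# Illustrative sketch only: chance that one replication cycle introduces
# at least one substitution into the CsRn1 ORF.
# Assumption: errors are independent and uniform across sites.

ORF_CODONS = 1304               # ORF length reported for CsRn1
ORF_NT = ORF_CODONS * 3         # coding nucleotides (stop codon excluded)

# Hypothetical combined error rates (per nucleotide per cycle); placeholders,
# not measured values for C. sinensis or CsRn1.
ASSUMED_ERROR_RATES = (1e-6, 1e-5, 1e-4)


def prob_at_least_one_substitution(length_nt: int, error_rate: float) -> float:
    """Return 1 - (1 - error_rate) ** length_nt."""
    if length_nt < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("length must be >= 0 and error_rate within [0, 1]")
    return 1.0 - (1.0 - error_rate) ** length_nt


if __name__ == "__main__":
    for rate in ASSUMED_ERROR_RATES:
        p = prob_at_least_one_substitution(ORF_NT, rate)
        print(f"error rate {rate:.0e}: P(>=1 substitution in ORF) = {p:.3f}")
```

Under these assumptions, a higher error rate or a longer element raises the fraction of progeny copies that carry at least one change, although only some of those substitutions would inactivate a copy.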

Over evolutionary time, a particular subset of retrotransposons with a selective advantage for expansion expands preferentially, rather than simultaneously with other variant subsets (Clough et al. 1996). Thus, a recently expanded master copy can be distinguished as the subset with the largest population (Boissinot, Chevret, and Furano 2000) and with low levels of sequence divergence among its members (Medstrand and Mager 1998). In addition, for LTR retrotransposons, high sequence identity between the two flanking LTRs of an individual copy can be accepted as a hallmark of recent integration, because both LTRs are synthesized from the same template sequences during replication (Dangel et al. 1995). CsRn1 copies of group G satisfied all of these criteria (table 1 and fig. 6), suggesting that group G acts as an active master copy. The preserved mobile activity was supported by the uncorrupted ORF, the heterogeneous distribution among individual genomes (fig. 7A), and the presence of CsRn1 mRNA transcripts (fig. 7B).
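
The sequence-based criteria for a recently expanded master copy can be expressed as two simple measures. The Python sketch below is an illustration under stated assumptions, not the procedure used in this study: it expects sequences that are already aligned to equal length (gaps written as "-"), the function names are hypothetical, and the example sequences are invented toy data rather than CsRn1 sequences.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def mean_pairwise_divergence(aligned_seqs) -> float:
    """Mean percent divergence (100 - identity) over all pairs in a group."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(100.0 - percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 5' and 3' LTRs of one hypothetical copy (not real CsRn1 data).
    ltr_5 = "TGTAGCAAT-CGGTACCTTA"
    ltr_3 = "TGTAGCAATACGGTACGTTA"
    print(f"LTR-LTR identity: {percent_identity(ltr_5, ltr_3):.1f}%")

    # Toy group of aligned copies; low divergence suggests recent expansion.
    group = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
    print(f"mean divergence: {mean_pairwise_divergence(group):.2f}%")
```

In this sketch, a copy whose two LTRs are nearly identical and a group whose members show low mean divergence would both be consistent with recent integration; the thresholds for "high" and "low" are left to the analyst.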

The coding capacity of CsRn1 suggests that the element is a member of the metaviruses, which have no env gene (Pringle 1999). Although no significant homology to other LTR retrotransposons was found at the nucleotide sequence level anywhere in the element, the sequence motifs for the synthesis of double-stranded cDNA (PBS and PPT) showed strong identity with those of Kabuki (B. mori) and AE003787 (D. melanogaster) (fig. 3). Moreover, the amino acid sequences in the unusual CHCC Gag motif, RT, and IN were well conserved in the three elements (fig. 4). Consistent with these shared properties, a phylogenetic analysis suggested that CsRn1 belongs to an ancient, previously undetected clade of Ty3/gypsy-like LTR retrotransposons found in insects and trematodes (CsRn1 clade; fig. 8). The nucleotide sequences of Kabuki and AE003787 are corrupted in their putative ORF regions, which would inactivate the elements and could account for their low copy numbers (the copy number of Kabuki was estimated as eight per haploid genome; Abe et al. 2000). In contrast, several copies of CsRn1 were uncorrupted and retained mobile capacity (fig. 7), suggesting that the element acquired its high copy number in the C. sinensis genome (more than 100 per haploid genome; fig. 5) through continuous expansion.

Although the members of the new clade are phylogenetically close, they differ in the number of ORFs and show no significant homology at the nucleotide sequence level, which reduces the probability of recent horizontal transfer between insects and trematodes. Thus, it is likely that these elements either evolved from a common ancestor present in the common progenitor of insects and trematodes or were transferred from insects to trematodes, or vice versa, during the early stage of the divergence of the two taxa. However, the possibility of horizontal transfer between insects and trematodes remains uncertain, because only a few cases, all between closely related species, have been reported (Gonzalez and Lessios 1999; Jordan, Matyunina, and McDonald 1999). The isolation of further elements belonging to the CsRn1 clade in insects and trematodes, or the finding of errantiviruses that are phylogenetically related to the clade (Malik and Eickbush 1999), will help clarify the evolutionary history of the members of the clade.

Only limited data on LTR retrotransposons are available for the Platyhelminthes, despite the large content of repetitive elements in their genomes (see Regev, Lamb, and Jablonka 1998 and references therein). Model organisms such as D. melanogaster, A. thaliana, and S. cerevisiae, commonly used in previous studies, have genome structures with relatively low complexity and repetitive elements with low copy numbers, which makes it difficult to estimate the actual significance of retrotransposons in complex animal genomes. Thus, a trematode, with its small but complex genome, offers advantages for the study of retrotransposons. We confirmed the presence of partial sequences similar to those of CsRn1 in other trematodes, S. mansoni (fig. 7C) and Paragonimus westermani (unpublished data). These results indicate that a unique LTR retrotransposon family belonging to the CsRn1 clade is present in lower animal taxa, where it may play significant roles in genome evolution. Our results concerning an active LTR retrotransposon and its related elements will broaden current knowledge of LTR retrotransposons and provide a clue for further studies on the evolutionary origin of diverse reverse-transcribing elements.


    Supplementary Material
 
All of the nucleotide sequences described in this article were deposited in GenBank of NCBI, and their accession numbers are as follows: P-B13 RAPD marker, AZ551682; CsRn1-1, AY013558; CsRn1-2, AY013570; CsRn1-4, AY013569; CsRn1-7, AY013563; CsRn1-15, AY013566; CsRn1-16, AY013562; CsRn1-26, AY013571; CsRn1-39, AY013564; CsRn1-52, AY013565; CsRn1-5lb4, AY013568; CsRn1-74, AY013560; CsRn1-82, AY013561; CsRn1-86, AY013567; CsRn1-89, AY013559.

A sequence alignment of multiple CsRn1 copies was deposited in GenBank, linked to each of the nucleotide entries whose accession numbers are listed above. The alignment of pol proteins used for the phylogenetic analysis in this work was deposited in linked form with the nucleotide sequence of CsRn1-4 (AY013569).


    Acknowledgements
 
This work was supported by a grant from the Korea Science and Engineering Foundation (KOSEF) (number 1999 2-208-004 5).


    Footnotes
 
Kenneth Wolfe, Reviewing Editor

1 Abbreviations: IN, integrase; PBS, primer-binding site; PPT, polypurine tract; PR, protease; RT, reverse transcriptase; TSD, target site duplication; RH, RNase H.

2 Keywords: Clonorchis sinensis, trematode, LTR retrotransposon, AP-PCR, RAPD, master copy

3 Address for correspondence and reprints: Mun-Gan Rhyu, Department of Microbiology, College of Medicine, Catholic University of Korea, Seoul 137-701, Korea. rhyumung{at}cmc.cuk.ac.kr


    References
 

    Abe H., M. Kanehara, T. Terada, F. Ohbayashi, T. Shimada, S. Kawai, M. Suzuki, T. Sugasaki, T. Oshiki, 1998 Identification of novel random amplified polymorphic DNAs (RAPDs) on the W chromosome of the domesticated silkworm, Bombyx mori, and the wild silkworm, B. mandarina, and their retrotransposable element-related nucleotide sequences Genes Genet. Syst 73:243-254

    Abe H., F. Ohbayashi, T. Shimada, T. Sugasaki, S. Kawai, K. Mita, T. Oshiki, 2000 Molecular structure of a novel gypsy-Ty3-like retrotransposon (Kabuki) and nested retrotransposable elements on the W chromosome of the silkworm Bombyx mori Mol. Gen. Genet 263:916-924

    Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402

    Arkhipova I., M. Meselson, 2000 Transposable elements in sexual and ancient asexual taxa Proc. Natl. Acad. Sci. USA 97:14473-14477

    Biessmann H., M. F. Walter, D. Le, S. Chuan, J. G. Yao, 1999 Moose, a new family of LTR-retrotransposons in the mosquito Anopheles gambiae Insect Mol. Biol 8:201-212

    Boeke J. D., D. J. Garfinkel, C. A. Styles, G. R. Fink, 1985 Ty elements transpose through an RNA intermediate Cell 40:491-500

    Boeke J. D., J. P. Stoye, 1997 Retrotransposons, endogenous retroviruses, and the evolution of retroelements Pp. 343–435 in J. M. Coffin, S. H. Hughes, and H. E. Varmus, eds. Retroviruses. Cold Spring Harbor Laboratory Press, New York

    Boissinot S., P. Chevret, A. V. Furano, 2000 L1 (LINE-1) retrotransposon evolution and amplification in recent human history Mol. Biol. Evol 17:915-928

    Bowen N. J., J. F. McDonald, 1999 Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements Genome Res 9:924-935

    Britten R. J., T. J. McCormack, T. L. Mears, E. H. Davidson, 1995 Gypsy/Ty3-class retrotransposons integrated in the DNA of herring, tunicate, and echinoderms J. Mol. Evol 40:13-24

    Burge C., S. Karlin, 1997 Prediction of complete gene structures in human genomic DNA J. Mol. Biol 268:78-94

    Chen S., C. Fockler, R. Higuchi, 1994 Efficient amplification of long targets from human genomic DNA and cloned inserts Proc. Natl. Acad. Sci. USA 91:5695-5699

    Chun J., 1995 PHYDIT Distributed by the author (http://kctc2.kribb.re.kr/jchun/phydit/right.html)

    Clough J. E., J. A. Foster, M. Barnett, H. A. Wichman, 1996 Computer simulation of transposable element evolution: random template and strict master models J. Mol. Evol 42:52-58

    Dangel A. W., B. J. Baker, A. R. Mendoza, C. Y. Yu, 1995 Complement component of C4 gene intron 9 as a phylogenetic marker for primates: long terminal repeats of the endogenous retrovirus ERV-K (C4) are a molecular clock of evolution Immunogenetics 42:41-52

    Fedoroff N., 2000 Transposons and genome evolution in plants Proc. Natl. Acad. Sci. USA 97:7002-7007

    Felder H., A. Herzceg, Y. de Chastonay, P. Aeby, H. Tobler, F. Müller, 1994 Tas, a retrotransposon from the parasitic nematode Ascaris lumbricoides Gene 149:219-225

    Felsenstein J., 1993 PHYLIP (phylogeny inference package) Version 3.5c. Distributed by the author (http://evolution.genetics.washington.edu/phylip.html), Department of Genetics, University of Washington, Seattle

    Gonzalez P., H. A. Lessios, 1999 Evolution of sea urchin retroviral-like (SURL) elements: evidence from 40 Echinoid species Mol. Biol. Evol 16:938-952

    Gouzy J., F. Corpet, D. Kahn, 1999 Whole genome protein domain analysis using a new method for domain clustering Comput. Chem 23:330-340

    Gribskov M., A. D. McLachlan, D. Eisenberg, 1987 Profile analysis: detection of distantly related proteins Proc. Natl. Acad. Sci. USA 84:4355-4358

    Jordan I. K., L. V. Matyunina, J. F. McDonald, 1999 Evidence for the recent horizontal transfer of long terminal repeat retrotransposon Proc. Natl. Acad. Sci. USA 96:12621-12625

    Kidwell M. G., D. Lisch, 1997 Transposable elements as sources of variation in animals and plants Proc. Natl. Acad. Sci. USA 94:7704-7711

    Laha T., A. Loukas, C. K. Verity, D. P. McManus, P. J. Brindley, 2001 Gulliver, a long terminal repeat retrotransposon from the genome of the oriental blood fluke Schistosoma japonicum Gene 264:59-68

    Lindsley D. L., G. G. Zimm, 1992 The genome of Drosophila melanogaster Academic Press, New York

    Long A. D., R. F. Lyman, A. H. Morgan, C. H. Langley, T. F. C. Mackay, 2000 Both naturally occurring insertions of transposable elements and intermediate frequency polymorphisms at the achaete-scute complex are associated with variation in bristle number in Drosophila melanogaster Genetics 154:1255-1269

    McDonald J. F., 1990 Macroevolution and retroviral elements Bioscience 40:183-191

    Malik H. S., W. D. Burke, T. H. Eickbush, 1999 The age and evolution of non-LTR retrotransposable elements Mol. Biol. Evol 16:793-805

    Malik H., T. H. Eickbush, 1999 Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons J. Virol 73:5186-5190

    Marín I., C. Lloréns, 2000 Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data Mol. Biol. Evol 17:1040-1049

    Medstrand P., D. L. Mager, 1998 Human-specific integrations of the HERV-K endogenous retrovirus family J. Virol 72:9782-9787

    Nicholas K. B., H. B. Nicholas Jr., 1997 GeneDoc: a tool for editing and annotating multiple sequence alignments Distributed by the authors (www.cris.com/~ketchup/genedoc.shtmi)

    Page R. D., 1996 TreeView: an application to display phylogenetic trees on personal computers Comput. Appl. Biosci 12:357-358

    Petrov D. A., E. R. Lozovskaya, D. L. Hartl, 1996 High intrinsic rate of DNA loss in Drosophila Nature 384:346-349

    Poulter R., M. Butler, 1998 A retrotransposon family from the pufferfish (fugu) Fugu rubripes Gene 215:241-249

    Pringle C. R., 1999 Virus taxonomy—1999. The universal system of virus taxonomy, updated to include the new proposals ratified by the International Committee on Taxonomy of Viruses during 1998 Arch. Virol 144:421-429

    Regev A., M. J. Lamb, E. Jablonka, 1998 The role of DNA methylation in invertebrates: developmental regulation or genome defense? Mol. Biol. Evol 15:880-891

    Sambrook J., E. F. Fritsch, T. Maniatis, 1989 Molecular cloning: a laboratory manual 2nd edition. Cold Spring Harbor Press, New York

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Varmus H., P. Brown, 1989 Retroviruses Pp. 53–108 in D. E. Berg and M. H. Howe, eds. Mobile DNA. American Society of Microbiology, Washington, D.C

    Xiong Y., T. H. Eickbush, 1990 Origin and evolution of retroelements based upon their reverse transcriptase sequences EMBO J 9:3353-3362

Accepted for publication April 4, 2001.