Structural and Evolutionary Analysis of the copia-like Elements in the Arabidopsis thaliana Genome

Javier Terol, Mari Cruz Castillo, Mónica Bargues, Manuel Pérez-Alonso and Rosa de Frutos

Departamento de Genética, Facultad de Ciencias Biológicas, Universitat de València, Valencia, Spain


    Abstract
 
The analysis of 460 kb of genomic sequence of Arabidopsis thaliana chromosome III allowed us to identify two new transposable elements named AtC1 and AtC2. AtC1 shows identical long terminal repeats (LTRs) and all the structural features characteristic of active copia-like elements. AtC2 is also a complete copia-like element, but a putative stop codon in the open reading frame (ORF) would produce a truncated protein. In order to identify the copia-like fraction of the A. thaliana genome, a careful computer-based analysis of the available sequences (which correspond to 92% of the genome) was performed. Approximately 300 nonredundant copia-like sequences homologous to AtC1 and AtC2 were detected, which showed an extreme heterogeneity in size and degree of conservation. This number of copies would correspond to approximately 1% of the A. thaliana genome. Seventy-one sequences were selected for further analysis, with 23 of them being complete elements. Five corresponded to previously described ones, and the remaining ones, named AtC3 to AtC18, are new elements described in this work. Most of these elements presented a putative functional ORF, nearly identical LTRs, and the other elements necessary for retrotransposon activity. Phylogenetic trees, supported by high bootstrap values, indicated that these 23 elements could be considered separate families. In turn, these 23 families could be clustered into six major lineages, named copia I–VI. Most of the 71 analyzed sequences clustered into these six main clades. The widespread presence of these copia-like superfamilies throughout plant genomes is discussed.


    Introduction
 
Repetitive DNA sequences represent the major fraction of plant genomes. In some species, they constitute up to 50% of the genome, contributing directly to evolution and host genome organization (Pearce et al. 1996a; SanMiguel et al. 1996). Plant genomes are especially rich in copia-like retroelements, which have been found in all the species that have been analyzed (Flavell et al. 1992, 1997; Voytas et al. 1992). The copia-like elements are usually present in high copy numbers; for example, more than 10^5 copies of BARE-1 have been estimated to be present in barley (Manninen and Schulman 1993), 10^5 copies in the rye genome (which constitute 3.5% of the genome) (Pearce et al. 1997), around 10^6 copies in Vicia faba (Pearce et al. 1996a), and 10^4 copies in Avena (Linares, Serna, and Fominaya 1999). However, in some cases, such as that of rice, the copy number of copia-like elements may be as low as 100 per genome (Wang et al. 1999), and the Tpv2 family of Phaseolus vulgaris seems to consist of 40 copies (Garber et al. 1999). Most of the copia-like plant elements tend to be dispersed throughout the euchromatin, although their specific location depends on the plant species and the particular retrotransposon (Brandes et al. 1997; Kumar et al. 1997; Miller et al. 1998; Kumar and Bennetzen 1999). In contrast, some of them tend to be concentrated at the paracentromeric regions, as is the case for Arabidopsis thaliana, Allium cepa, and Cicer arietinum (Pearce et al. 1996b; Brandes et al. 1997; Heslop-Harrison et al. 1997). A striking feature of the copia-like elements is that they are present as highly heterogeneous populations in plant genomes. An extreme example was obtained by sequencing 31 fragments of the reverse transcriptase region from the potato. It was found that all the sequences were different, with similarities varying from 5% to 75% (Flavell, Smith, and Kumar 1992). Extreme sequence heterogeneity has also been found in a wide range of species from both dicotyledons and monocotyledons (Flavell et al. 1992, 1997; VanderWiel, Voytas, and Wendel 1993; Kumar 1996; Matsuoka and Tsunewaki 1996, 1999; Kumar et al. 1997; Pearce et al. 1997; Wang et al. 1997; Kuipers, Heslop-Harrison, and Jacobsen 1998; Yáñez et al. 1998; Kumar and Bennetzen 1999). Most of these works show that within the genome of a species, a particular family of copia-like elements is usually composed of multiple subfamilies that could be considered analogous to retroviral quasispecies.

The small genome of A. thaliana (130 Mb) has a low content of repetitive DNA. The interspersed DNA fraction is especially low, constituting 2% of the genome (Meyerowitz 1992). During the last decade, some families of transposable elements have been described in this species: the non-long-terminal-repeat (non-LTR) retrotransposons Ta11-1 (Wright et al. 1996) and TSCL (Chye, Cheung, and Xu 1997), the LTR-retrotransposon gypsy-like elements Tat1 (Peleman et al. 1991; Wright and Voytas 1998) and Athila (Pélisier et al. 1995, 1996), the transposon-like elements Limpet1 (Klimyuk and Jones 1997), Tag1 (Tsay et al. 1993; Frank et al. 1997), and Tag2 (Henk, Warren, and Innes 1999), the superfamilies Arnold and Harbinger (Kapitonov and Jurka 1999) and Basho (Le et al. 2000), and the foldback transposon Hairpin (Adé and Belzile 1999). Some families of repetitive elements structurally related to the miniature inverted-repeat transposable elements (MITEs) have also been described (Casacuberta et al. 1998; Surzycki and Belknap 1999; Feschotte and Mouchès 2000; Le et al. 2000). As for the copia-like elements, 10 related families, designated Ta1–Ta10, were considered a superfamily within the genome of A. thaliana (Voytas and Ausubel 1988; Voytas et al. 1990; Konieczny et al. 1991). Some sequences related to Ta1–Ta10 elements have been reported (Brandes et al. 1997), and the divergence between members of these families is high. Voytas (1992) considered that elements that share >95% nucleotide identity can be regarded as members of the same family, and those that share <85% can be considered members of different families. The copy numbers are low, with only a few copies per family, compared with other copia-like plant families. It has been estimated that the Ta1–Ta10 superfamily constitutes 0.1% of the A. thaliana genome (Konieczny et al. 1991), with most of the elements located in clusters at the paracentromeric heterochromatin (Brandes et al. 1997). These data are in accord with works that show that the copia-like elements concentrate in the centromeric regions (Heslop-Harrison et al. 1997) and, more generally, that regions flanking the centromeres are densely populated by transposable elements (Copenhaver et al. 1999; Cold Spring Harbor Laboratory et al. 2000). copia-like elements other than those of the Ta1–Ta10 superfamily, such as Evelknievel (Henikoff and Comai 1998), Art1 (Hervé et al. 1999), Meta1 (Kapitonov and Jurka 1999), and AtRE1-AtRE2 (Kuwahara, Kato, and Komeda 2000), have been described within the A. thaliana genome. It has also been reported that the presence of copia-like elements extends to the mitochondrial genome of this species (Knoop et al. 1996).
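The family criterion attributed to Voytas (1992) above can be written as a simple threshold rule. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" as the gap character; the function names and the "unresolved" label for identities between 85% and 95% are illustrative assumptions, since the text states only the two thresholds.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite a
    nucleotide counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def family_relationship(identity: float) -> str:
    """Apply the >95% / <85% identity thresholds quoted in the text."""
    if identity > 95.0:
        return "same family"
    if identity < 85.0:
        return "different families"
    # The text does not say how to treat the 85-95% range.
    return "unresolved"
```

For example, two toy aligned sequences that differ at one of four positions share 75% identity and would be assigned to different families under this rule.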

The complete sequences of A. thaliana chromosomes II and IV have recently been reported (European Union Arabidopsis Genome Sequencing Consortium et al. 1999; Lin et al. 1999). The analysis of those sequences has shown the low content of repetitive DNA of the A. thaliana genome when compared with other plant species. Dispersed repeats, which consist predominantly of LTR and non-LTR retrotransposons, are found throughout the chromosome arms. However, the main fraction of transposable elements is concentrated at the pericentromeric heterochromatin, as had previously been described. This region is constituted mainly by a few genes and a high density of presumably inactive mobile elements (Lin et al. 1999). In addition to the complete sequences of chromosomes II and IV, studies of different stretches of the A. thaliana chromosomes have been performed (Quigley et al. 1996; Thompson, Schmidt, and Dean 1996; Comella et al. 1999; Terryn et al. 1999), as have computer-assisted analyses looking for the presence of different families of transposable elements (Kapitonov and Jurka 1999; Le et al. 2000). In this paper, we report the results of a careful computer-based analysis performed in order to identify, characterize, and establish the evolutionary relationships among copia-like elements that exist in the A. thaliana genome.


    Materials and Methods
 
DNA Sequencing
The P1 genomic clones T15C9, F11C1, F3A4, T20E23, and T12C14, from chromosome III, were completely sequenced in the frame of the European Arabidopsis Genome Sequencing Project. For each clone, a shotgun library was constructed and the DNA sequence was determined with an ABI automatic DNA sequencer (in collaboration with Sistemas Genómicos S.L.). The sequencing projects were managed with the Staden software package (Staden 1996), and a total of 460 kb were analyzed, with an average redundancy of 8.44 characters per position.

Sequence Analyses
The searches for open reading frames (ORFs) on the genomic clones were performed with the GENSCAN program (Burge and Karlin 1997) in collaboration with the Munich Information Center for Protein Sequences (MIPS). The GenBank (GB) and European Molecular Biology (EMB) databases were searched for sequence similarities using the BLAST programs at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/) (Altschul et al. 1997). An estimate of the similarity between sequences was obtained with the GAP program from the GCG Software Package (Wisconsin University). Multiple alignments were performed with the CLUSTAL W program (Thompson, Higgins, and Gibson 1994). Genetic distances were calculated with the Poisson correction method (Nei and Chakraborty 1976) for amino acid sequences. The phylogenetic trees were constructed with the neighbor-joining (Saitou and Nei 1987) and UPGMA (Swofford and Selander 1981) methods, and the bootstrap test was carried out with 1,000 iterations. These evolutionary analyses were performed with the MEGA platform (Kumar, Tamura, and Nei 1993).
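As an illustration of the Poisson correction mentioned above, the distance between two aligned amino acid sequences is commonly computed as d = -ln(1 - p), where p is the proportion of sites at which the residues differ. The sketch below is not the MEGA implementation; the function names are hypothetical, and the choice to skip any column containing a gap (pairwise deletion) is an assumption made for this example.

```python
import math


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Proportion of differing residues between two pre-aligned proteins.

    Columns with a gap ("-") in either sequence are ignored
    (pairwise deletion, assumed here for illustration).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


def poisson_distance(aligned_a: str, aligned_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(aligned_a, aligned_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)
```

With the toy peptides "MKVL" and "MKVA", p = 0.25 and the corrected distance is -ln(0.75), roughly 0.288. The correction grows faster than p because it accounts for multiple substitutions at the same site.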


    Results
 
Search and Characterization of copia-like Elements on 460 kb of Genomic Sequence
Within the frame of an international consortium for sequencing of the A. thaliana genome, we determined the nucleotide sequence of the P1 genomic clones T15C9, F11C1, F3A4, T20E23, and T12C14, corresponding to the short arm of chromosome III (European Union Chromosome 3 Arabidopsis Sequencing Consortium et al. 2000). The accession numbers for these sequences are AL132970, AL132976, AL132978, AL133363, and AL162507. Information on how the analysis was performed and a detailed annotation of these database entries can be viewed at the Munich Information Center for Protein Sequences (MIPS; http://www.mips.biochem.mpg.de/proj/thal/). Our contribution to the sequencing of chromosome III has been the determination of the nucleotide sequences of the inserts of these five clones, which amount to a total of 540,430 bp. We considered it interesting to analyze the presence and distribution of known transposable elements on these sequences in an effort to understand their dynamics and evolution in the A. thaliana genome.

We used the GENSCAN program to predict the ORFs present on the sequences we had determined. We obtained a total of 113 putative ORFs, which is in agreement with the analysis performed at the MIPS, which produced a total of 107 predicted coding sequences. The hypothetical proteins encoded by the predicted ORFs were used in systematic searches against the GB and EMB databases using the BLAST sequence similarity search tool (http://www.ncbi.nlm.nih.gov/BLAST/). These searches yielded only four sequences which presented similarity to previously described transposable elements: one similar to the Ac-like transposable element, another one similar to gypsy-like sequences, and the two remaining ones presenting a high degree of similarity to copia-like retrotransposons. We decided to concentrate our studies on the two copia-like sequences, which were named AtC1 (accession number AF287471) and AtC2 (accession number AF287472), and we performed further analyses on them.

AtC1 was predicted as a single 4,359-bp-long ORF coding for a hypothetical protein of 1,452 amino acids. Computer searches of the GB and EMB databases found significant similarity between AtC1 and several known copia-like retrotransposons. These comparisons revealed that AtC1 contains all the amino acid domains found to be conserved among autonomously active retroelements. Both the amino acid conservation and the domain order served to identify AtC1 as a copia-like retrotransposon.

We analyzed the genomic sequence to determine the structure of this new element (fig. 1A). AtC1 has a long internal region of 4,629 bp, bounded by 335-bp LTRs, which present the canonical inverted repeats at their ends (TG-CA). The nucleotide sequences of the LTRs were found to be completely identical; a 5-bp direct repeat (CTGCT) flanking the LTRs could correspond to the target site duplication (TSD) (fig. 1B). The internal region contains one large ORF which could code for the genes gag and pol, encoding the nucleic acid–binding, protease, integrase, reverse transcriptase, and RNase H domains, in that order (fig. 1A). The primer-binding site (PBS) is in accord with the one described by Gauss and Sprinzl (1983) for plant tRNAiMet (fig. 1B). There is also a polypurine tract (PPT) at the end of the ORF. Thus, the structural analysis of AtC1 suggests that it could be an active element, as it displays all the main features described in the retrotransposons that have been shown to be able to transpose.
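The LTR-related checks described above (identical LTRs, canonical TG...CA ends, and a short direct repeat flanking the element) can be expressed as a small routine. This is a minimal sketch rather than the procedure actually used in the analysis; the function name, argument layout, and toy sequences in the usage note are hypothetical, and only the 5-bp repeat length and the CTGCT motif come from the text.

```python
from typing import Optional


def ltr_features(ltr5: str, ltr3: str,
                 left_flank: str, right_flank: str,
                 tsd_length: int = 5) -> dict:
    """Summarize LTR features of a candidate element.

    ltr5, ltr3   : sequences of the 5' and 3' LTRs
    left_flank   : genomic sequence immediately upstream of the 5' LTR
    right_flank  : genomic sequence immediately downstream of the 3' LTR
    tsd_length   : expected target site duplication length (5 bp in the text)
    """
    ltr5, ltr3 = ltr5.upper(), ltr3.upper()
    left_flank, right_flank = left_flank.upper(), right_flank.upper()

    # A TSD is a direct repeat: the last bases before the 5' LTR
    # equal the first bases after the 3' LTR.
    tsd: Optional[str] = None
    if len(left_flank) >= tsd_length and len(right_flank) >= tsd_length:
        candidate = left_flank[-tsd_length:]
        if candidate == right_flank[:tsd_length]:
            tsd = candidate

    return {
        "identical_ltrs": ltr5 == ltr3,
        "canonical_ends": all(
            ltr.startswith("TG") and ltr.endswith("CA") for ltr in (ltr5, ltr3)
        ),
        "tsd": tsd,
    }
```

Calling `ltr_features("TGTTAGCA", "TGTTAGCA", "AAACTGCT", "CTGCTGGG")` on these made-up sequences reports identical LTRs, canonical ends, and a CTGCT duplication. Real LTRs are hundreds of base pairs long, so in practice the LTR boundaries would first have to be located by self-alignment of the element.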



 
Fig. 1.—A, Schematic representation of the elements AtC1 and AtC2. LTR = long terminal repeat; Gag = nucleic acid–binding domain; PR = protease domain; IN = integrase domain; RT = reverse transcriptase domain; RH = RNase H domain; PBS = primer-binding site; PPT = polypurine tract; TAA = stop codon on AtC2 open reading frame (ORF). B, Nucleotide sequence of the target site duplication (TSD), PBS, LTR ends, and PPT. The inverted repeats of the LTR from AtC1 are shown underlined.

 
The prediction programs defined AtC2 as an ORF composed of two exons, with the protein deduced from this ORF being 1,320 amino acids long. Comparisons of this protein sequence against the databases confirmed that AtC2 belonged to the copia-like family of retrotransposons. As the Ty-copia retroviral-like family is characterized by the encoding of a polyprotein with a single reading frame (Voytas and Ausubel 1988), we decided to check for the existence of the two exons predicted for AtC2. We found that a direct translation of the sequence beginning at the ORF initiation codon gave rise to a protein of 1,337 amino acids, 17 residues longer than the one predicted originally, with a stop codon at nucleotide position 1435. The alignments showed that the additional residues were conserved with respect to other copia-like proteins, confirming that the 51-bp intron predicted by the GENSCAN program was an artifact caused by the presence of the stop codon. The complete polyprotein could code for all of the amino acid domains described for copia-like elements; however, the presence of the stop codon would produce a truncated protein 478 amino acids long, which would lack the RT and RNase H domains, indicating that AtC2 is not able to transpose autonomously. The LTRs are identical except for the presence of five indels, which cause a small difference in the sizes of the LTRs (3 nt). This difference in size and the fact that the LTRs have also lost the inverted repeats at their ends (fig. 1B) indicate that the element has been inactive for a long enough period to accumulate changes in its sequence.
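The numbers in this paragraph can be cross-checked with a short calculation. The sketch below assumes that "nucleotide position 1435" is the 1-based position of the first base of the stop codon, counted from the first base of the initiation codon; under that assumption the stop codon is preceded by exactly 478 complete codons, matching the truncated protein length given in the text. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(orf: str):
    """Return the 1-based position of the first in-frame stop codon, or None.

    The reading frame is assumed to start at the first base of `orf`.
    """
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i + 1
    return None


def residues_before_stop(stop_position: int) -> int:
    """Number of codons translated before a stop codon at a 1-based position."""
    if (stop_position - 1) % 3 != 0:
        raise ValueError("stop position is not in frame")
    return (stop_position - 1) // 3


# Values taken from the text.
truncated_length = residues_before_stop(1435)   # 478 residues
extra_residues = 1337 - 1320                    # 17 residues
artifact_intron_bp = extra_residues * 3         # 51 bp
```

The 17 additional residues correspond to 51 nucleotides, which is the size of the intron that the gene prediction program had introduced to skip over the stop codon.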

The overall similarity at the amino acid level between AtC1 and AtC2 was 42.21%, although the conservation varied extensively along the sequence. The best conserved domains were the integrase (68.1% similarity) and the reverse transcriptase (60% similarity). The similarity data and the differences in the lengths and structures of the elements indicate that AtC1 and AtC2 diverged long ago, as the degree of conservation between them is similar to that found between these elements and the copia element from Drosophila melanogaster (43.2% and 44.15% similarity, respectively).

Search for Homologous Sequences
Database searches performed at the DNA and amino acid levels with the programs BLASTP and TBLASTN did not yield any sequences identical to AtC1 or to AtC2. However, the TBLASTN searches allowed us to identify two new copia-like sequences on clones T15G18 (accession number AC006567; located on chromosome IV) and K2N11 (accession number AB022213; located on chromosome V) that were very similar to AtC1 and AtC2, respectively. We named the elements AtC3 (accession number AJ292423) and AtC4 (accession number AJ293575).

The BLASTP searches performed on the GB and EMB databases with AtC1 and AtC2 produced several hundred sequences related to the Ty-copia family of retrotransposons, and we decided to study them in detail. Only a few of these sequences corresponded to previously described elements: Evelknievel (Henikoff and Comai 1998), the Ta1 (Voytas et al. 1990; Konieczny et al. 1991) and AtRE1 (Kuwahara, Kato, and Komeda 2000) families from A. thaliana, Hopscotch from Zea mays (White, Habera, and Wessler 1994), Bare-1 from barley (Manninen and Schulman 1993), Tnt from tobacco (Grandbastien, Spielmann, and Caboche 1989), and SIRE-1 from soybean (Laten, Majumdar, and Gaucher 1998). Most of the other database entries corresponded to predicted proteins from A. thaliana and had been produced by the automatic annotation of genomic clones. These sequences are the result of the work of the international consortium which is performing the sequencing of the genome of this model plant.

The copia-like sequences we obtained showed extreme heterogeneity in length and degree of conservation. The size of the predicted proteins ranged from approximately 80 to 1,500 amino acids, which indicates the existence of many defective elements. Many proteins had been described as the products of ORFs with a number of predicted exons that ranged from 2 to 8. We verified that, as was the case with AtC2, the predicted introns were artifacts caused by the presence of indels or stop codons. In this way, we established that the copia-like polyproteins described as multiexonic ORFs indicated the existence of defective or mutated copia-like elements that were unable to transpose autonomously. We then decided to select those copia-like polyproteins coded by a single ORF in order to identify putative active elements.

We chose a total of 25 sequences which might correspond to active elements and studied them in detail. Eight of the elements belonged to copia-like families previously described in A. thaliana: Ta1 (Voytas et al. 1990; Konieczny et al. 1991), Evelknievel (Henikoff and Comai 1998), and AtRE1 and AtRE2 (Kuwahara, Kato, and Komeda 2000). The remaining 17 sequences, including AtC1, AtC3, and AtC4, were characterized in this work for the first time, and the 14 new elements were named AtC5–AtC18. In spite of having described AtC2 as an element producing a truncated polyprotein, we included it in the study, as it presented most of the structural features of this kind of transposon and could be representative of a new copia-like family.

Structural Characterization of the New copia-like Elements
We identified the DNA sequences from which the analyzed polyproteins were derived and carried out a structural analysis in order to find all of the characteristic features of the copia-like retrotransposons. The results are summarized in table 1, in which the known families Ta-1, Evelknievel, AtRE1, and AtRE2 have also been included. The most striking observation is the high degree of heterogeneity among the 26 analyzed elements. They all have different sizes, ranging from 4,629 to 5,738 bp, a difference of more than 1 kb. The 26 elements present many of the structural features that characterize the autonomously active elements. All of them present LTRs, with 17 of them showing identical 5' and 3' repeats. In the remaining 9 elements (AtRE2, AtC2, AtC4, AtC5, AtC8, AtC11, AtC14, AtC15, and AtC17), the 5' and 3' LTRs present small differences in size, mainly due to one-base indels. The inverted repeats are displayed by 19 of our elements, while AtC2, AtC8, AtC11–AtC13, AtC17, and AtC18 present LTRs where the inverted repeats begin with the canonical TG sequence but have a 3' change. Only the elements from the families AtRE1 and Ta-1 present LTRs identical in size and sequence. The rest of the elements display LTRs that are very heterogeneous in size, ranging from 120 to 734 bp, and in sequence, with no significant similarities between them. When we analyzed the genomic sequences adjacent to the LTRs, we found that 17 presented direct repeats that could correspond to the TSD, suggesting recent transposition events. Finally, most of the elements presented both the primer-binding site (PBS) and the polypurine tract (PPT) (adjacent to the 5' LTR and before the 3' LTR, respectively), suggesting that they can transpose autonomously.


 
Table 1. Summary of All the Features Analyzed in the copia-like Elements Studied in This Work

 
A multiple alignment with the 26 A. thaliana polyproteins, the copia element from D. melanogaster, Hopscotch from Zea mays, Bare-1 from barley, Tnt from tobacco, and SIRE-1 from soybean was performed with the CLUSTAL W program (Thompson, Higgins, and Gibson 1994). The study of the alignment revealed that, in all but two cases, the predicted proteins displayed all of the highly conserved domains present in retrotransposons and that these domains were ordered as is usual in copia-like elements: Gag protein (Prats et al. 1988), protease (Pearl and Taylor 1987), integrase (Johnson et al. 1986), reverse transcriptase (Xiong and Eickbush 1990), and RNase H (Johnson et al. 1986) (data not shown). Only the Gag region of Ta1–Ta3 and AtC8 presented one change in the consensus zinc finger that characterizes this region (Prats et al. 1988), probably compromising the ability of these elements to engage in active transposition.

Copy Number and Chromosome Distribution
TBLASTN and BLASTN searches on the GB and EMB databases were performed for each of the elements studied in this work in order to find out how many identical copies existed in the A. thaliana genome (table 1). Most of the elements were present as single copies in the genome; AtC3, AtC5, AtC7, AtC13, AtC15, and AtC67 showed two identical copies in the same or different chromosomes, and only Evelknievel and AtC10 could be considered multicopy elements, with three and six copies, respectively. Nevertheless, these data will have to be confirmed when the A. thaliana genome is completely sequenced. The elements are widely dispersed on the five A. thaliana chromosomes, with seven copies on chromosome I, eight on chromosome II, seven on chromosome III, six on chromosome IV, and two on chromosome V. The smaller number of elements on chromosome V is probably due to the status of the sequencing project, with a smaller number of sequenced clones in the databases. The chromosome positions of the genomic clones which carry the retrotransposons indicate that they are dispersed along the chromosomes.

Search for Expressed Sequence Tags
The DNA sequences of the newly described elements were used to perform searches on the expressed sequence tag (EST) databases in order to find the corresponding ESTs. Three of the analyzed copia-like sequences produced several identical ESTs: AtC7, with eight different ESTs; AtC10, with two ESTs; and AtC18, with one EST. The fact that AtC7 and AtC10 present both transcriptional activity and several copies (two and six, respectively) in the A. thaliana genome reinforces the idea that they might be active retrotransposons.

Phylogenetic Analysis
The comparison of the amino acid sequences of the polyproteins coded by the copia-like elements characterized in this work reflects the high heterogeneity found at the structural level. The degree of similarity varies from 99% to 41%. As has been mentioned above, the similarity was much higher when the conserved domains, rather than the complete protein, were compared (data not shown). A phylogenetic tree was constructed with the neighbor-joining method based on complete sequences of the 26 polyproteins, and the bootstrap test was performed with a total of 1,000 iterations (fig. 2). A similar topology was obtained using the UPGMA method. We observed a distribution of the sequences into six major lineages, or families. Elements belonging to the same lineage showed similarities higher than 50%. The topology of the tree was supported by the high bootstrap values of the main branches. We decided to name the lineages copia I–VI.



 
Fig. 2.—Phylogenetic tree constructed with the complete polyproteins from the complete Arabidopsis thaliana elements, as well as Hopscotch from maize (ZMHOPS), Tnt1 from tobacco (TNT1), and SIRE-1 from soybean (GMSIRE1). The proteins were aligned with the CLUSTAL W program (Thompson, Higgins, and Gibson 1994), the distances were calculated with the Poisson correction for amino acids, the tree was constructed with the neighbor-joining method, and the bootstrap values were calculated over 1,000 iterations. The six main lineages in which the sequences can be grouped are indicated with gray and white boxes.

 
The most homogeneous lineage is copia I, mainly composed of the AtRE1 and AtRE2 elements, which are almost identical. The other lineages are more heterogeneous both in structure and sequence, and they could be split into new lineages, as in the cases of copia II and V. For example, copia II could be divided into two new lineages, composed of AtC6 and AtC7, and AtC1, AtC5, and AtC4, respectively. Following our previously established criteria, we maintained copia II as a single lineage, as the elements belonging to it present a similarity higher than 50%.

The tree also includes copia elements from different plant species: Hopscotch from maize (White, Habera, and Wessler 1994), Tnt from tobacco (Grandbastien, Spielmann, and Caboche 1989), and SIRE-1 from soybean (Laten, Majumdar, and Gaucher 1998). Each of these sequences groups with one of the lineages we have described, indicating that the copia elements of those families are closer to the elements from different species than to the other A. thaliana families (fig. 2).

In an effort to clarify the structure and evolutionary relationships of the copia-like sequence population in A. thaliana, we identified, in all the copia-like sequences we analyzed, the RT region used by Konieczny et al. (1991) in their study of the Ta-1 family. We performed a multiple alignment with the RT domains of 63 Arabidopsis polyproteins and included the 8 TA sequences (TA2–TA10) characterized in the above-mentioned work. The phylogenetic tree constructed from the alignment using the neighbor-joining method (fig. 3) displays a topology very similar to the one obtained for the complete polyprotein, although the bootstrap values are much lower. We also obtained a similar cluster configuration when the UPGMA method was used to construct the tree. Most of the RT sequences could be assigned to the six major lineages described in figure 2, although a number of them could form lineages different from the ones we have proposed.



 
Fig. 3.—Phylogenetic tree constructed with the RT regions from 71 Arabidopsis thaliana copia-like retrotransposons. The RT sequences from the previously described elements and the ones described in this work are indicated by their names; the other sequences are indicated by their accession numbers. The sequences were aligned with the CLUSTAL W program (Thompson, Higgins, and Gibson 1994), the distances were calculated with the Poisson correction for amino acids, the tree was constructed with the neighbor-joining method, and the bootstrap values were calculated over 1,000 iterations. We have indicated the six major lineages we propose with gray and white boxes. The dashed lines on the right of the boxes indicate the sequences that might constitute additional major lineages.

 

    Discussion
 
In this work, we performed an extensive analysis of the copia-like elements in the A. thaliana genome. To achieve this, we used information provided by the genome sequencing project, both produced at our laboratory and available in the databases. Early and recent works on copia-like elements in A. thaliana have been performed by cloning and sequencing genomic clones carrying copia-like sequences or by PCR amplification of conserved regions of the elements (Voytas and Ausubel 1988; Voytas et al. 1990; Konieczny et al. 1991; Kuwahara, Kato, and Komeda 2000), which has limited the scope of these analyses to closely related sequences. Therefore, by using the genome sequencing data, our work offers a broader view of the nature and evolution of the copia-like retrotransposons in this model plant. The analysis of this information has become a powerful tool for studying the evolution of mobile elements, and several works have used this approach to analyze the evolution of different groups of elements in species whose genomes had been sequenced. This has been the case for Caenorhabditis elegans (Oosumi, Garlick, and Belknap 1996) and for Saccharomyces cerevisiae (Kim et al. 1998).

The estimated size for the A. thaliana genome is 130 Mb and, to date, 120.7 Mb are available on the databases (updated on July 27, 2000, obtained from the Arabidopsis Genome Initiative web page at http://www.arabidopsis.org/agi.html). Thus, 92.8% of the A. thaliana genome is available, and 60% has already been annotated. A systematic computer-based survey had already been performed to identify mobile elements near wild-type genes, with negative results in this species (Bureau, Ronald, and Wessler 1996). Recently, some analyses on the mobile element fraction of the A. thaliana genome using computer-based methodology have been performed. Six new lineages of the Ty3/gypsy group have been found, and phylogenetic analysis reveals that this group of plant retroelements forms two main monophyletic clades (Marin and Llorens 2000). The presence of 142 groups of putative transposable elements has been detected in a systematic survey of a large sample (17.2 Mb) of the A. thaliana genome, of which 27 are copia-like retrotransposons (Le et al. 2000). According to Kapitonov and Jurka (1999), at least 100 diverse families of copia-like retrotransposons are present in this species.

The BLASTN and TBLASTN searches performed on the DNA databases allowed us to estimate the number of copia-related sequences present in the A. thaliana genome at approximately 300. Previous works had estimated it at 200 copies (Voytas et al. 1990; Konieczny et al. 1991; Flavell et al. 1997); thus, our estimate represents a 50% increase over the previous one. This difference can be explained by the approach used in the previous works, in which the PCR technique limited detection to sequences that had conserved the binding sites for the primers used in the amplifications. If we assume that the mean size of the copia-like sequences is between 4 and 5 kb and that the copy number is approximately 300, it can be deduced that the copia-like fraction detected in this computer analysis represents approximately 1% of the A. thaliana genome. Previous data suggested that members of the Ta1–Ta10 families constituted approximately 0.1% of the genome of this species (Konieczny et al. 1991). Our results agree with recently published work suggesting that the copia-like fraction constitutes about 1% of the A. thaliana genome (Kapitonov and Jurka 1999). If this figure were confirmed, it would constitute an extremely low proportion of the A. thaliana genome when compared with the copia-like components of other plants, such as barley (Manninen and Schulman 1993), maize (SanMiguel et al. 1996), rye (Pearce et al. 1997), and Avena (Linares, Serna, and Fominaya 1999). Several works have pointed out that there is a relationship between the total copy number of retrotransposons and the host genome size and that this could largely account for genome size variation in plants (Katsiotis, Schmidt, and Heslop-Harrison 1996; Kumar et al. 1997; Linares, Serna, and Fominaya 1999). If this is the case, the low copy number of copia sequences in A. thaliana would be consistent with the small size of its genome.
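
The arithmetic behind the figures quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the constants are the values stated in the text (130 Mb genome, 120.7 Mb available, about 300 copies, mean size of 4 to 5 kb, earlier estimate of 200 copies), while the variable and function names are ours and hypothetical.

```python
# Values quoted in the text; names are illustrative, not from the original analysis.
GENOME_SIZE_MB = 130.0       # estimated A. thaliana genome size
SEQUENCED_MB = 120.7         # sequence available in the databases (July 27, 2000)
COPY_NUMBER = 300            # approximate number of copia-like sequences detected
PREVIOUS_COPY_NUMBER = 200   # earlier PCR-based estimate
MEAN_SIZE_KB = (4.0, 5.0)    # assumed lower and upper mean element size


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the genome available for the survey.
coverage = percent(SEQUENCED_MB, GENOME_SIZE_MB)

# copia-like fraction of the genome for the lower and upper mean sizes (kb -> Mb).
fraction_range = tuple(
    percent(COPY_NUMBER * size_kb / 1000.0, GENOME_SIZE_MB)
    for size_kb in MEAN_SIZE_KB
)

# Relative increase of the new copy-number estimate over the earlier one.
increase = percent(COPY_NUMBER - PREVIOUS_COPY_NUMBER, PREVIOUS_COPY_NUMBER)

print(f"coverage: {coverage:.1f}%")
print(f"copia-like fraction: {fraction_range[0]:.2f}% to {fraction_range[1]:.2f}%")
print(f"increase over previous estimate: {increase:.0f}%")
```

Under these assumptions the copia-like fraction falls between roughly 0.9% and 1.2% of the genome, which is in line with the figure of about 1% discussed above.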

We have analyzed 71 copia-like sequences, and most of them appeared to be highly heterogeneous in size and potentially inactive. Only 23 elements showed many of the structural characteristics necessary for potential activity, with five of these corresponding to families described in previous works. The presence of target site duplications in these 23 elements, along with the LTRs being identical or nearly identical for each element, points to recent activity. In this respect, the high sequence identity between the 5' and 3' LTRs of retrotransposons from different families of maize (SanMiguel et al. 1998) led to the proposal of a recent burst of activity of the elements in the maize genome. Although most of the analyzed sequences indicate the presence of inactive elements that could be considered relicts of ancient active ones, it is interesting to note the presence of potentially active copia-like elements in the A. thaliana genome. Supporting the potential activity of these elements, three of them have associated ESTs, which implies that they are transcribed. None of the analyzed elements are located in the heterochromatin, which suggests that they may have inserted in coding regions, although a more exhaustive analysis of the genomic sequences adjacent to the insertion points would be necessary to find how many genes, if any, have been interrupted by the insertions.

The most noticeable characteristic of the 23 analyzed copia-like elements is their heterogeneity. All of these elements vary in sequence and size, although they maintain similar copia-like structures. Divergence values ranging from 5% to 40% are in accord with the extreme variability previously described for copia-like elements in plants (Flavell et al. 1992, 1997; Flavell, Smith, and Kumar 1992; VanderWiel, Voytas, and Wendel 1993; Kumar 1996; Matsuoka and Tsunewaki 1996, 1999; Kumar et al. 1997; Pearce et al. 1997; Wang et al. 1997; Kuipers, Heslop-Harrison, and Jacobsen 1998; Yañez et al. 1998; Kumar and Bennetzen 1999). Some of these works find that the nucleotide divergence among copia-like elements ranges from 0.4% to 57.8% in several grass species (Matsuoka and Tsunewaki 1999) and from 17% to 61% in rye (Pearce et al. 1997), and that the divergence at the amino acid level varies from 1% to 64% in rice (Wang et al. 1997) and from 33% to 58% in Lycopersicon chilense (Yañez et al. 1998). Nevertheless, a relatively high homogeneity has been found among members of the copia-like SIRE-1 family in the soybean genome (Laten, Majumdar, and Gaucher 1998). In A. thaliana, high divergence values between members of the Ta1–Ta10 families were reported early (Voytas and Ausubel 1988; Voytas et al. 1990; Konieczny et al. 1991). This high divergence is confirmed by the fact that all of the elements analyzed in our work are very different from one another. It could be proposed that all of the elements belong to different families, and therefore that at least 23 families of different copia-like elements exist in the A. thaliana genome. The phylogenetic analysis indicated strong phylogenetic relationships among them, and six major clades were defined, supported by high bootstrap values. These results agree with recent data from Le et al. (2000), who mention the presence of 27 groups of copia-like elements in the A. thaliana genome. In contrast, Kapitonov and Jurka (1999) reported the presence of 100 diverse copia-like families in the genome of this species. These strong differences in the estimated number of families are probably caused by the criteria used to establish families of copia-like elements.

Ten families, Ta1–Ta10, were described early in the A. thaliana genome, and the authors considered that elements sharing >95% nucleotide identity could be considered members of the same family and that those sharing <85% identity could be regarded as members of different families (Voytas 1992). Assuming the same criterion, with the exception of AtRE-1 and AtRE-2, the rest of the elements could be considered members of separate families. Accordingly, we propose that there are at least 23 families of copia-like elements in the A. thaliana genome. These 23 families are grouped into six major lineages which display high divergence values between them. Interestingly, divergences between members belonging to different lineages are higher than divergences with respect to copia-like elements from other plant species. For example, the copia I lineage includes the Hopscotch element from maize, copia V includes SIRE-1 from soybean, and copia VI includes Tnt1 from Nicotiana tabacum. It is remarkable that the copia I lineage contains elements from species as distant as the monocotyledonous Z. mays and the dicotyledonous A. thaliana. Our present data suggest the existence of these 23 copia-like families among plant genomes; these families would be grouped into a few main lineages. Four superfamilies, named families G1–G4 by Matsuoka and Tsunewaki (1999), have been described across grass species. We think that the lineages copia I–VI described here are related to the G1–G4 superfamilies described in that work. In fact, copia VI and G2 share some elements analyzed in both works. The presence of sharply diverged lineages within the copia-like fraction of several plant species has been established, for example, in Vicia, Solanum, Gossypium, and Lycopersicon (VanderWiel, Voytas, and Wendel 1993; Kumar et al. 1997; Yañez et al. 1998). However, to establish the phylogenetic relationships between these interspecific lineages of copia-like elements, a more extensive analysis must be done.
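
The identity-based family criterion cited above can be written down as a small Python sketch. The two thresholds (>95% and <85% nucleotide identity) are taken from the text as stated; everything else is our assumption for illustration, including the function names, the way gap columns are ignored when computing identity, and the "undetermined" label for pairs falling between the two thresholds. The toy sequences are invented and are not real element sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are skipped; this gap
    handling is an assumption of the sketch, not a rule from the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def family_relationship(identity, same_min=95.0, different_max=85.0):
    """Classify a pair of elements from their nucleotide identity (in percent)."""
    if identity > same_min:
        return "same family"
    if identity < different_max:
        return "different families"
    # The text gives no rule for this interval; the label is hypothetical.
    return "undetermined"


# Toy aligned fragments (hypothetical data).
identity = percent_identity("ACGT-ACGTA", "ACGTTACGTT")
print(round(identity, 1), family_relationship(identity))
```

Applied to all pairwise comparisons of the complete elements, a rule of this kind would place most of them in separate families, which is the reasoning used in the paragraph above.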

As we discussed above, most of the analyzed members of the corresponding families are potentially active. They tend to maintain the structural characteristics necessary for retrotransposition. The existence in a given superfamily of active elements present in the genomes of phylogenetically very distant species suggests the presence of active copies of the ancient element in the ancestral species. Therefore, divergent changes operating in the elements of the different species would have been subjected to functional constraints. Finally, most of the 71 analyzed sequences were defective and probably located in the heterochromatin region. We estimate that approximately 80% of copia-like elements in the A. thaliana genome are defective, which is comparable to the defective fraction estimated for other plants, such as rye, with 96% defective elements (Pearce et al. 1997). The presence of such a large fraction of defective elements could constitute a good strategy for the plants to avoid the deleterious effects of the huge number of transposable elements that inhabit their genomes.


    Acknowledgements
 
This work was supported by grants BIO4-CT98-0549 (from EC) and BIO099-1320 (from DGESIC, Ministerio de Educación y Cultura) to M.P.-A. J.T. was supported by the program Reincorporación de Doctores a Grupos de Investigación of the Ministerio de Educación y Cultura.


    Footnotes
 
Pierre Capy, Reviewing Editor

1 Keywords: copia-like elements, Arabidopsis thaliana, evolution of transposable elements

2 Address for correspondence and reprints: Rosa de Frutos, Departamento de Genética, Facultad de Ciencias Biológicas, Dr. Moliner 50, 46100 Burjasot, Valencia, Spain. rosa.frutos{at}uv.es


    literature cited
 

    Adé, J., and F. J. Belzile. 1999. Hairpin elements, the first family of foldback transposons (FTs) in Arabidopsis thaliana. Plant J. 19:591–597.

    Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Brandes, A., J. S. Heslop-Harrison, A. Kamm, S. Kubis, R. L. Doudrick, and T. Schmidt. 1997. Comparative analysis of the chromosomal genomic organization of Ty1-copia-like retrotransposons in pteridophytes, gymnosperms and angiosperms. Plant Mol. Biol. 33:11–21.

    Bureau, T. E., P. C. Ronald, and S. R. Wessler. 1996. A computer-based systematic survey reveals the predominance of small inverted-repeat elements in wild-type rice genes. Proc. Natl. Acad. Sci. USA 93:8524–8529.

    Burge, C., and S. Karlin. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78–94.

    Casacuberta, E., J. M. Casacuberta, P. Puigdomènech, and A. Monfort. 1998. Presence of miniature inverted-repeat transposable elements (MITEs) in the genome of Arabidopsis thaliana: characteristics of the Emigrant family of elements. Plant J. 16:79–85.

    Chye, M. L., K. Y. Cheung, and J. Xu. 1997. Characterization of TSCL, a nonviral retroposon from Arabidopsis thaliana. Plant Mol. Biol. 35:893–904.

    Cold Spring Harbor Laboratory, Washington University Genome Sequencing Center, and PE Biosystems Arabidopsis Sequencing Consortium. 2000. The complete sequence of a heterochromatic island from a higher eukaryote. Cell 100:377–386.

    Comella, P., H. J. Wu, M. Laudie, C. Berger, R. Cooke, M. Delseny, and F. Grellet. 1999. Fine sequence analysis of 60 kb around the Arabidopsis thaliana AtEm1 locus on chromosome III. Plant Mol. Biol. 41:687–700.

    Copenhaver, G. P., K. Nickel, T. Kuromori et al. (14 co-authors). 1999. Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286:2468–2474.

    European Union Arabidopsis Genome Sequencing Consortium, Cold Spring Harbor Laboratory, Washington University in St Louis, and PE Biosystems Arabidopsis Sequencing Consortium. 1999. Progress in sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana. Nature 402:769–777.

    European Union Chromosome 3 Arabidopsis Sequencing Consortium, Institute for Genomic Research, and Kazusa DNA Research Institute. 2000. Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature 408:820–823.

    Feschotte, C., and C. Mouchès. 2000. Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from Arabidopsis thaliana genome has arisen from a pogo-like transposon. Mol. Biol. Evol. 17:730–737.

    Flavell, A. J., E. Dunbar, R. Anderson, S. R. Pearce, R. Hartley, and A. Kumar. 1992. Ty1-copia group retrotransposons are ubiquitous and heterogeneous in higher plants. Nucleic Acids Res. 20:3639–3644.

    Flavell, A. J., S. R. Pearce, P. Heslop-Harrison, and A. Kumar. 1997. The evolution of Ty1-copia group retrotransposons in eukaryote genomes. Genetica 100:185–195.

    Flavell, A. J., D. B. Smith, and A. Kumar. 1992. Extreme heterogeneity of Ty1-copia group retrotransposons in plants. Mol. Gen. Genet. 231:233–242.

    Frank, M. J., D. Liu, Y. F. Tsay, C. Ustach, and N. M. Crawford. 1997. Tag1 is an autonomous transposable element that shows somatic excision in both Arabidopsis and tobacco. Plant Cell 9:1745–1756.

    Garber, K., I. Bilic, O. Pusch, J. Tohme, A. Bachmair, D. Schweizer, and V. Jantsch. 1999. The Tpv2 family of retrotransposons of Phaseolus vulgaris: structure, integration characteristics, and use for genotype classification. Plant Mol. Biol. 39:797–807.

    Gauss, D. H., and M. Sprinzl. 1983. Compilation of sequences of tRNA genes. Nucleic Acids Res. 11:55–103.

    Grandbastien, M. A., A. Spielmann, and M. Caboche. 1989. Tnt1, a mobile retroviral-like transposable element of tobacco isolated by plant cell genetics. Nature 337:376–380.

    Henikoff, S., and L. Comai. 1998. A DNA methyltransferase homolog with a chromodomain exists in multiple polymorphic forms in Arabidopsis. Genetics 149:307–318.

    Henk, A. D., R. F. Warren, and R. W. Innes. 1999. A new Ac-like transposon of Arabidopsis is associated with a deletion of the RPS5 disease resistance gene. Genetics 151:1581–1589.

    Hervé, C., J. Serres, P. Dabos, H. Canut, A. Barre, P. Rougé, and B. Lescure. 1999. Characterization of the Arabidopsis lecRK-a genes: members of a superfamily encoding putative receptors with an extracellular domain homologous to legume lectins. Plant Mol. Biol. 39:671–682.

    Heslop-Harrison, J. S., A. Brandes, S. Taketa et al. (15 co-authors). 1997. The chromosomal distributions of Ty1-copia group retrotransposable elements in higher plants and their implications for genome evolution. Genetica 100:197–204.

    Hirochika, H., K. Sugimoto, Y. Otsuki, H. Tsugawa, and M. Kanda. 1996. Retrotransposons of rice involved in mutations induced by tissue culture. Proc. Natl. Acad. Sci. USA 93:7783–7788.

    Janetzky, B., and L. Lehle. 1992. Ty4, a new retrotransposon from Saccharomyces cerevisiae, flanked by tau-elements. J. Biol. Chem. 267:19798–19805.

    Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F. Doolittle. 1986. Computer analysis of retroviral pol genes: assignment of enzymatic functions to specific sequences and homologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA 83:7648–7652.

    Kapitonov, V. V., and J. Jurka. 1999. Molecular paleontology of transposable elements from Arabidopsis thaliana. Genetica 107:27–37.

    Katsiotis, A., T. Schmidt, and J. S. Heslop-Harrison. 1996. Chromosomal and genomic organization of Ty1-copia-like retrotransposon sequences in the genus Avena. Genome 39:410–417.

    Kim, J. M., S. Vanguri, J. D. Boeke, A. Gabriel, and D. F. Voytas. 1998. Transposable elements and genome organization: a comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 8:464–478.

    Klimyuk, V. I., and J. D. G. Jones. 1997. AtDMC1, the Arabidopsis homologue of the yeast DMC1 gene: characterization, transposon-induced allelic variation and meiosis-associated expression. Plant J. 11:1–14.

    Knoop, V., M. Unseld, J. Marienfeld, P. Brandt, S. Sünkel, H. Ullrich, and A. Brennicke. 1996. copia, gypsy and LINE-like retrotransposon fragments in the mitochondrial genome of Arabidopsis thaliana. Genetics 142:579–585.

    Konieczny, A., D. F. Voytas, M. P. Cummings, and F. M. Ausubel. 1991. A superfamily of Arabidopsis thaliana retrotransposons. Genetics 127:801–809.

    Kuipers, A. G., J. S. Heslop-Harrison, and E. Jacobsen. 1998. Characterisation and physical localisation of Ty1-copia-like retrotransposons in four Alstroemeria species. Genome 41:357–367.

    Kumar, A. 1996. The adventures of the Ty1-copia group of retrotransposons in plants. Trends Genet. 12:41–43.

    Kumar, A., and J. L. Bennetzen. 1999. Plant retrotransposons. Annu. Rev. Genet. 33:479–532.

    Kumar, A., S. R. Pearce, K. McLean, G. Harrison, J. S. Heslop-Harrison, R. Waugh, and A. J. Flavell. 1997. The Ty1-copia group of retrotransposons in plants: genomic organisation, evolution, and use as molecular markers. Genetica 100:205–217.

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetic analysis. Version 1.0. Pennsylvania State University, University Park.

    Kuwahara, A., A. Kato, and Y. Komeda. 2000. Isolation and characterization of copia-type retrotransposons in Arabidopsis thaliana. Gene 244:127–136.

    Laten, H. M., A. Majumdar, and E. A. Gaucher. 1998. SIRE-1, a copia/Ty1-like retroelement from soybean, encodes a retroviral envelope-like protein. Proc. Natl. Acad. Sci. USA 95:6897–6902.

    Le, Q. H., S. Wright, Z. Yu, and T. Bureau. 2000. Transposon diversity in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 97:7376–7381.

    Lin, X., S. Kaul, S. Rounsley et al. (37 co-authors). 1999. Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana. Nature 402:761–768.

    Linares, C., A. Serna, and A. Fominaya. 1999. Chromosomal organization of a sequence related to LTR-like elements of Ty1-copia retrotransposons in Avena species. Genome 42:706–713.

    Manninen, I., and A. H. Schulman. 1993. BARE-1, a copia-like retroelement in barley (Hordeum vulgare L.). Plant Mol. Biol. 22:829–846.

    Marin, I., and C. Llorens. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol. Biol. Evol. 17:1040–1049.

    Matsuoka, Y., and K. Tsunewaki. 1996. Wheat retrotransposon families identified by reverse transcriptase domain analysis. Mol. Biol. Evol. 13:1384–1392.

    ———. 1999. Evolutionary dynamics of Ty1-copia group retrotransposons in grass shown by reverse transcriptase domain analysis. Mol. Biol. Evol. 16:208–217.

    Meyerowitz, E. M. 1992. Introduction to the Arabidopsis genome. Pp. 100–118 in C. Koncz, N. Chua, and J. Schell, eds. Methods in Arabidopsis research. World Scientific Publishing, Singapore.

    Miller, J. T., F. Dong, S. A. Jackson, J. Song, and J. Jiang. 1998. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150:1615–1623.

    Nei, M., and R. Chakraborty. 1976. Empirical relationship between the number of nucleotide substitutions and interspecific identity of amino acid sequences in some proteins. J. Mol. Evol. 7:313–323.

    Oosumi, T., B. Garlick, and W. R. Belknap. 1996. Identification of putative nonautonomous transposable elements associated with several transposon families in Caenorhabditis elegans. J. Mol. Evol. 43:11–18.

    Pearce, S. R., G. Harrison, P. J. Heslop-Harrison, A. J. Flavell, and A. Kumar. 1997. Characterization and genomic organization of Ty1-copia group retrotransposons in rye (Secale cereale). Genome 40:617–625.

    Pearce, S. R., G. Harrison, D. Li, J. Heslop-Harrison, A. Kumar, and A. J. Flavell. 1996a. The Ty1-copia group retrotransposons in Vicia species: copy number, sequence heterogeneity and chromosomal localisation. Mol. Gen. Genet. 250:305–315.

    Pearce, S. R., U. Pich, G. Harrison, A. J. Flavell, J. S. Heslop-Harrison, I. Schubert, and A. Kumar. 1996b. The Ty1-copia group retrotransposons of Allium cepa are distributed throughout the chromosomes but are enriched in the terminal heterochromatin. Chromosome Res. 4:357–364.

    Pearl, L. H., and W. R. Taylor. 1987. A structural model for the retroviral proteases. Nature 329:351–354.

    Peleman, J., B. Cottyn, M. Van Montagu, and D. Inzé. 1991. Transient occurrence of extrachromosomal DNA of an Arabidopsis thaliana transposon-like element, Tat1. Proc. Natl. Acad. Sci. USA 88:3618–3622.

    Pélissier, T., S. Tutois, J. M. Deragon, S. Tourmente, S. Genestier, and G. Picard. 1995. Athila, a new retroelement from Arabidopsis thaliana. Plant Mol. Biol. 29:441–452.

    Pélissier, T., S. Tutois, S. Tourmente, J. M. Deragon, and G. Picard. 1996. DNA regions flanking the major Arabidopsis thaliana satellite are principally enriched in Athila retroelement sequences. Genetica 97:141–151.

    Prats, A. C., L. Sarih, C. Gabus, S. Litvak, G. Keith, and J. L. Darlix. 1988. Small finger protein of avian and murine retroviruses has nucleic acid annealing activity and positions the replication primer tRNA onto genomic RNA. EMBO J. 7:1777–1783.

    Quigley, F., P. Dao, A. Cottet, and R. Mache. 1996. Sequence analysis of an 81 kb contig from Arabidopsis thaliana chromosome III. Nucleic Acids Res. 24:4313–4318.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43–45.

    SanMiguel, P., A. Tikhonov, Y. K. Jin et al. (11 co-authors). 1996. Nested retro-transposons in the intergenic regions of the maize genome. Science 274:765–768.

    Staden, R. 1996. The Staden sequence analysis package. Mol. Biotechnol. 5:233–241.

    Surzycki, S. A., and W. R. Belknap. 1999. Characterization of repetitive DNA elements in Arabidopsis. J. Mol. Evol. 48:684–691.

    Swofford, D. L., and R. R. Selander. 1981. BIOSYS-1. A computer program for the analysis of allelic variation in genetics. University of Illinois, Urbana.

    Terryn, N., L. Heijnen, A. De Keyser et al. (21 co-authors). 1999. Evidence for an ancient chromosomal duplication in Arabidopsis thaliana by sequencing and analyzing a 400-kb contig at the APETALA2 locus on chromosome 4. FEBS Lett. 445:237–245.

    Thompson, H. L., R. Schmidt, and C. Dean. 1996. Analysis of the occurrence and nature of repeated DNA in an 850 kb region of Arabidopsis thaliana chromosome 4. Plant Mol. Biol. 32:553–557.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Tsay, Y. F., M. J. Frank, T. Page, C. Dean, and N. M. Crawford. 1993. Identification of a mobile endogenous transposon in Arabidopsis thaliana. Science 260:342–344.

    VanderWiel, P. L., D. F. Voytas, and J. F. Wendel. 1993. copia-like retrotransposable element evolution in diploid and polyploid cotton (Gossypium L.). J. Mol. Evol. 36:429–447.

    Voytas, D. F. 1992. Arabidopsis and cotton (Gossypium) as models for studying copia-like retrotransposon evolution. Genetica 86:13–20.

    Voytas, D. F., and F. M. Ausubel. 1988. A copia-like transposable element family in Arabidopsis thaliana. Nature 336:242–244.

    Voytas, D. F., M. P. Cummings, A. Konieczny, F. M. Ausubel, and S. R. Rodermel. 1992. copia-like retrotransposons are ubiquitous among plants. Proc. Natl. Acad. Sci. USA 89:7124–7128.

    Voytas, D. F., A. Konieczny, M. P. Cummings, and F. M. Ausubel. 1990. The structure, distribution and evolution of the Ta1 retrotransposable element family of Arabidopsis thaliana. Genetics 126:713–721.

    Wang, S., N. Liu, K. Peng, and Q. Zhang. 1999. The distribution and copy number of copia-like retrotransposons in rice (Oryza sativa L.) and their implications in the organization and evolution of the rice genome. Proc. Natl. Acad. Sci. USA 96:6824–6828.

    Wang, S., Q. Zhang, P. J. Maughan, and M. A. Saghai Maroof. 1997. copia-like retrotransposons in rice: sequence heterogeneity, species distribution and chromosomal locations. Plant Mol. Biol. 33:1051–1058.

    White, S. E., L. F. Habera, and S. R. Wessler. 1994. Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91:11792–11796.

    Wright, D. A., N. Ke, J. Smalle, B. M. Hauge, H. M. Goodman, and D. F. Voytas. 1996. Multiple non-LTR retrotransposons in the genome of Arabidopsis thaliana. Genetics 142:569–578.

    Wright, D. A., and D. F. Voytas. 1998. Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins. Genetics 149:703–715.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

    Yáñez, M., I. Verdugo, M. Rodríguez, S. Prat, and S. Ruiz-Lara. 1998. Highly heterogeneous families of Ty1/copia retrotransposons in the Lycopersicon chilense genome. Gene 222:223–228.

Accepted for publication January 23, 2001.