The Evolutionary Origin and Genomic Organization of SINEs in Arabidopsis thaliana

Alain Lenoir1, Laurence Lavie1, José-Luis Prieto, Chantal Goubely, Jean-Charles Cote, Thierry Pélissier and Jean-Marc Deragon

Centre National de la Recherche Scientifique, Université Blaise Pascal Clermont-Ferrand II, Aubière cedex, France
Agriculture Canada, Research Centre, St-Jean-sur-Richelieu, Canada


    Abstract
 
We have characterized the two families of SINE retroposons present in Arabidopsis thaliana. The origin, distribution, organization, and evolutionary history of RAthE1 and RAthE2 elements were studied and compared to the well-characterized SINE S1 element from Brassica. Our studies show that RAthE1, RAthE2, and S1 retroposons were generated independently from three different tRNAs. The RAthE1 and RAthE2 families are older than the S1 family and are present in all tested Cruciferae species. The evolutionary history of the RAthE1 family is unusual for SINEs. The 144 RAthE1 elements of the Arabidopsis genome cannot be classified in distinct subfamilies of different evolutionary ages as is the case for S1, RAthE2, and mammalian SINEs. Instead, most RAthE1 elements were probably derived steadily from a single source gene that was maintained intact and active for at least 12–20 Myr, a result suggesting that the RAthE1 source gene was under selection. The distribution of RAthE1 and RAthE2 elements on the Arabidopsis physical map was studied. We observed that, in contrast to other Arabidopsis transposable elements, SINEs are not concentrated in the heterochromatic regions. Instead, SINEs are grouped in the euchromatic chromosome territories several hundred kilobase pairs long. In these territories, SINE elements are closely associated with genes. A retroposition partnership between Arabidopsis SINEs and LINEs is proposed.


    Introduction
 
Transposable elements, in particular retroelements (class I transposable elements), are discrete components of the plant nuclear genome that can amplify and reinsert at other sites. Major classes of retroelements are recognized, including long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs), long terminal repeat (LTR)-retrotransposons, and retroviruses (Kumar and Bennetzen 1999; Schmidt 1999). Retroelements multiply by reverse transcription of the RNA they encode. They are highly amplified in plant genomes and, along with recognizable degenerate derivatives, frequently represent half of the nuclear DNA (Pearce et al. 1996; SanMiguel et al. 1996). They have been found in all plants investigated so far and are very heterogeneous (Flavell, Smith, and Kumar 1992). Because retroelements are such an important and mobile component of the genome, they are likely to be a major, if not the major, source of plant biodiversity.

SINEs are short (less than 500 bp), nonautonomous, noncoding elements transcribed by the RNA polymerase III complex and found in a wide variety of eukaryotes (Deininger 1989; Okada and Ohshima 1995). SINEs are ancestrally related to tRNA, with the exception of several families of mammalian SINEs related to 7SL RNA. LINEs are autonomous elements encoding a protein complex necessary for retroposition. LINEs are also widely distributed among eukaryotes and can be classified into a minimum of 12 distinct clades based on a phylogenetic analysis of their reverse transcriptase domain, each clade dating back to the Precambrian era (Malik, Burke, and Eickbush 1999; Malik and Eickbush 2000). Recently, the transposition of nonautonomous SINEs was suggested to depend on proteins encoded by autonomous LINE partners (Smit 1996; Boeke 1997; Jurka 1997; Okada et al. 1997). In that hypothesis, SINEs would have evolved as parasites of the LINE retroposition machinery.

SINEs and LINEs have been intensively studied in animals, where they represent the major class of retroelements (for a recent review see Deragon and Capy 2000). In plants, SINEs and LINEs are usually less abundant than the LTR-retrotransposons and are much less studied. The best characterized plant SINE is the S1 element from Brassica species (Deragon et al. 1994; Gilbert et al. 1997; Lenoir et al. 1997; Arnaud et al. 2000). Over 90% of the Arabidopsis thaliana genome is now available (The Arabidopsis Genome Initiative 2000), providing a unique opportunity to study all SINEs from a higher eukaryote genome. Earlier studies (Surzycki and Belknap 1999; Le et al. 2000) have revealed the presence of three SINE families in Arabidopsis (RAthE1 or SL1, SL2, and SL3). However, we found that the SL3 sequences were misclassified as SINEs and represent the 3' end of LINE elements (unpublished data). In this work, the origin, distribution, organization, and evolutionary history of RAthE1 (SL1) and RAthE2 (SL2) elements were studied and compared to the well-characterized SINE S1 element from Brassica.


    Materials and Methods
 
Phylogenetic Analyses
The SINE and tRNA sequences were aligned using the Clustal W multiple-alignment program (Version 1.5, Thompson, Higgins, and Gibson 1994) with some manual refinements (elimination of unnecessary gaps at the beginning and at the end of the Clustal W alignment). Evolutionary distances were calculated using the Jin-Nei distance method of the Dnadist program (PHYLIP package Version 3.573c, Felsenstein 1989). The coefficient of variation of the Gamma distribution (to incorporate rate heterogeneity) and the expected transition to transversion ratio (t) were obtained by preanalyzing the data with the Tree-Puzzle program (Version 5.0, Strimmer and Von Haeseler 1997). Phylogenetic trees were inferred using the Neighbor-Joining method (PHYLIP package Version 3.573c, Felsenstein 1989). Consensus trees were inferred using the consense program (PHYLIP package). The significance of the various phylogenetic lineages was assessed by bootstrap analyses (Hedges 1992).
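
The bootstrap step can be illustrated with a minimal Python sketch. It is not the PHYLIP procedure used in the study; it only shows the underlying idea of resampling alignment columns with replacement to build one pseudo-replicate, from which a tree would then be inferred. The function name `bootstrap_alignment` and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Rebuild every sequence from the same sampled columns.
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Hypothetical toy alignment (not data from the paper).
toy = ["ACGTAC", "ACGTTC", "AC-TAC"]
replicate = bootstrap_alignment(toy, random.Random(0))
```

Repeating this 1,000 times and counting how often a given cluster reappears in the resulting trees would give bootstrap percentages of the kind reported in the figures.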

χ² Analysis
We aligned the RAthE1, RAthE2, and LINE insertion sites. The target site duplications (TSDs) were adjusted to the left so that they all started at the same position, and to the right so that they all ended at the same position (as in Tatout, Lavie, and Deragon 1998). Only sites with perfect TSDs of a minimum of 9 bp were used (44 sites for RAthE1, 14 sites for RAthE2, and 30 sites for LINEs). The analysis of the dinucleotide distribution flanking SINEs was done essentially as described in Jurka (1997). Briefly, $\chi^2 = \sum_{i=1}^{4} (O_i - E_i)^2 / E_i$, where $O_i$ is the observed number of dinucleotide occurrences and $E_i$ is the expected number, calculated as the total number of dinucleotides at a given position × base composition. We used a significance level of P < 0.01 for 3 degrees of freedom. The data for the S1 insertion sites are described in Tatout, Lavie, and Deragon (1998).
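
As an illustration of the statistic, the following Python sketch computes the χ² sum over four categories and compares it to the standard table value for 3 degrees of freedom at P = 0.01 (11.345). The four categories are left abstract because the text does not enumerate them, and the counts shown are hypothetical, not data from the study.

```python
# Standard chi-square table value for 3 degrees of freedom at P = 0.01.
CRITICAL_3DF_P01 = 11.345


def chi_square(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_significant(observed, expected, critical=CRITICAL_3DF_P01):
    """True when the statistic exceeds the chosen critical value."""
    return chi_square(observed, expected) > critical


# Hypothetical counts for 44 sites spread over four categories.
expected = [11.0, 11.0, 11.0, 11.0]
flat = [12, 10, 11, 11]      # close to expectation
skewed = [24, 8, 6, 6]       # strongly enriched in the first category
```

With these made-up counts, only the skewed position would cross the significance line drawn in a plot of χ² against position.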

SINE and LINE Distribution
PCR primers corresponding to the RAthE1 (AY033656; RathE1.1 5'-AGTGTCGTTAGCTCAATTGG-3', RathE1.2 5'-GAYCCTAGGCGAAGCTTAG-3', RathE1.3 5'-CAGGAGGTTCTGGGCCG-3', and RathE1.4 5'-(T)15 GAATACCAGGAGGTTCTGG-3') and RAthE2 (AY033702; RathE2.1 5'-AGCCCAAGCATCTGTGGTC-3', RathE2.2 5'-TCAGCCCCGGTAGAAACGC-3', and RathE2.3 5'-CCCTATCGGCTGATCCGG-3') consensus sequences were used to amplify SINEs in various Cruciferae species. After a first PCR reaction (using primers RathE1.1 and RathE1.4 or RathE2.1 and RathE2.3), a second PCR amplification with more internal primers (RathE1.2 and RathE1.3 or RathE2.1 and RathE2.2) was used to confirm the presence of the SINE. For the amplification of LINEs, degenerate PCR primers targeting the reverse transcriptase domain were designed (LINE1.1 5'-GARTTYTTYARRGVAGCTTGG-3', LINE1.2 5'-TCGTCAGCRAARCATBARRTG-3', and LINE1.3 5'-RTCAAAMGCTTTNSARAGATC-3'). These primers can amplify Arabidopsis LINEs from one of the two families described by Noma, Ohtsubo, and Ohtsubo (2000). After a first PCR reaction (using primers LINE1.1 and LINE1.3), a second PCR amplification with a more internal primer pair (LINE1.1 and LINE1.2) was used to confirm the presence of LINEs. The species used for LINE and SINE amplification are representative of the diversity in the Cruciferae family, with at least one representative for each of the six major tribes (Hedge 1976). The species used (and their tribe classifications) are: Arabidopsis thaliana (Arabideae), Arabidopsis suecica (Arabideae), Cardaminopsis petraea (Arabideae), Barbarea vulgaris (Arabideae), Aphragmus oxycarpus (Sisymbrieae), Sisymbrium irio (Sisymbrieae), Armoracia rusticana (Drabeae), Capsella bursa-pastoris (Lepidieae), Erysimum cheiranthoides (Hesperideae), Crambe hispanica (Brassiceae), Zilla spinosa (Brassiceae), Sinapis pubescens (Brassiceae), and Brassica oleracea (Brassiceae).
For the PCR reactions, 100 ng of DNA was subjected to 30 PCR cycles using the Robocycler 96 (Stratagene) with 1 µM each of sense and antisense primers and 0.2 U of Gold Star DNA polymerase (Eurogentec) in a final volume of 25 µl. The annealing temperature used in all cases corresponds to the melting temperature of the primers (Tm) minus 4°C. The second PCR reaction was done under the same conditions, using 1 µl of the PCR product obtained after the first amplification.
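
Two aspects of the primer design can be sketched in Python: how many concrete sequences a degenerate primer represents, and how an annealing temperature of Tm minus 4°C might be derived. The text does not say how Tm was calculated, so the Wallace rule (2°C per A/T, 4°C per G/C) is used here purely as an assumption; the function names are hypothetical. The IUPAC table is the standard nucleotide ambiguity code.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def wallace_tm(primer):
    """Wallace-rule Tm in degrees C (assumed formula; short, non-degenerate oligos only)."""
    if set(primer) - set("ACGT"):
        raise ValueError("Wallace rule applied here only to A/C/G/T primers")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def annealing_temperature(primer, offset=4):
    """Annealing temperature as Tm minus a fixed offset (4 degrees C in the text)."""
    return wallace_tm(primer) - offset


line1_1 = "GARTTYTTYARRGVAGCTTGG"   # primer LINE1.1 from the text
rathe1_3 = "CAGGAGGTTCTGGGCCG"      # primer RathE1.3 from the text
print(degeneracy(line1_1))               # 96
print(annealing_temperature(rathe1_3))   # 54 under the assumed Wallace rule
```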


    Results
 
A BLAST search of the Arabidopsis database (March 2001), using the RAthE1 consensus sequence (Surzycki and Belknap 1999) or the SL2 sequences (Le et al. 2000), allowed the identification of 144 different copies of RAthE1 elements and 58 different copies of RAthE2 (SL2) elements. The characteristics of these elements are summarized in table 1 (RAthE1) and table 2 (RAthE2). The consensus RAthE1 sequence (GenBank #AY033656) is 148 bp long and possesses consensus A and B boxes that are characteristic of promoters targeted by RNA polymerase III. Most RAthE1 elements end with a poly(A) region, and 56% of them are flanked by TSDs of at least 9 bp. Surprisingly, 39% of the RAthE1 copies are truncated, most of them in their 5' region, a rare situation for SINE elements (Deininger and Batzer 1995; Okada and Ohshima 1995). One-third of the truncated copies are flanked by TSDs, suggesting that, at least in these cases, the truncation arose during the transposition process and not as a secondary rearrangement. The consensus RAthE2 sequence (GenBank #AY033702) is 303 bp long and also possesses consensus A and B boxes. Most RAthE2 elements end with a poly(A) region, and 43% of them are flanked by TSDs of at least 9 bp. The majority (57%) of RAthE2 elements are truncated, most of them in their 5' region. Again, one-third of the truncated copies are flanked by TSDs.
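
The criterion "flanked by TSDs of at least 9 bp" can be made concrete with a small Python sketch. It assumes the two flanks have already been extracted (the sequence immediately upstream of the element and the sequence immediately downstream of its poly(A) region) and looks for the longest perfect direct repeat that ends the first and begins the second. The function name and the example flanks are hypothetical.

```python
def find_tsd(upstream, downstream, min_len=9):
    """Longest perfect TSD: a suffix of `upstream` equal to a prefix of `downstream`.

    Returns None when no duplication of at least `min_len` bp is found.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    longest = min(len(upstream), len(downstream))
    # Try the longest candidate first so the first hit is the longest TSD.
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return None


# Hypothetical flanking sequences (not from the Arabidopsis genome).
upstream = "GGCATTAAGAATTCGA"
downstream = "TAAGAATTCGACCGT"
print(find_tsd(upstream, downstream))  # TAAGAATTCGA
```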


 
Table 1 Copy Number, Structure, and Localization of the RAthE1 Family

 

 
Table 2 Copy Number, Structure, and Localization of the RAthE2 Family

 
Origin and Evolutionary History of RAthE1 and RAthE2; Comparison with S1 from Brassica
Most SINE families are ancestrally derived from tRNA, with the exception of several mammalian SINEs that originate from 7SL RNA (Deininger and Batzer 1995; Okada and Ohshima 1995). The S1 from Brassica is a well-characterized tRNA-related SINE family that is used as a model to study retroposition in plants (Gilbert et al. 1997). A direct comparison of the S1, RAthE1, and RAthE2 consensus sequences revealed a low level of sequence identity (<50%), most of it in the pol III promoter region (not shown), suggesting that these three families arose independently. We have compared the 5' region of these three consensus SINE sequences with representative Arabidopsis tRNAs (fig. 1). We found that S1 and RAthE2 consensus sequences are related to proline and glycine tRNAs, respectively, whereas the RAthE1 consensus sequence is more related to cysteine tRNAs. These results confirm that RAthE1 and RAthE2 are, like S1, tRNA-related SINEs and that the three Cruciferae SINE families arose independently from three different tRNA precursors.



 
Fig. 1. Evolutionary origin of the Cruciferae SINE elements. The consensus tRNA domain from each of the three Cruciferae SINE families (RAthE1, RAthE2, and S1) was aligned with representative Arabidopsis tRNAs. The phylogeny was obtained using the Neighbor-Joining method. Numbers above each node indicate bootstrap values as percentages out of 1,000 replicates. The names of the SINE families are shown in bold. The nucleotide divergence scale is indicated.

 
The ages and the evolutionary history of S1, RAthE1, and RAthE2 elements were compared. Three different alignments composed of 46 S1, 46 RAthE1, or 32 RAthE2 full-length or near full-length elements were made. We first used these alignments to calculate the genetic distance of each sequence from the family consensus sequence. As the family consensus sequence approximates the sequence of the founder element (Jurka 1998), the genetic distance of SINE copies from their consensus is related to the age of the family. We found that the RAthE1 and RAthE2 families (mean distance from the consensus: 0.1775 and 0.2080, respectively) are older than the S1 family (mean distance from the consensus: 0.1008). These results are supported by the fact that RAthE1 and RAthE2 are present in all Cruciferae species tested (not shown; see Materials and Methods for the list of species tested), whereas S1 is almost exclusively present in species of the Brassiceae tribe (Lenoir et al. 1997).
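
The idea of using the mean distance to the consensus as a proxy for family age can be sketched in Python. This is a simplified stand-in, not the method of the study: it builds a majority-rule consensus and uses the Jukes-Cantor correction in place of the Jin-Nei gamma distance computed with Dnadist. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus; gap characters are ignored when counting."""
    cons = []
    for column in zip(*alignment):
        counts = Counter(b for b in column if b != "-")
        cons.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(cons)


def jc_distance(seq, ref):
    """Jukes-Cantor distance over positions where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq, ref) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def mean_distance_to_consensus(alignment):
    """Average corrected distance of every copy from the family consensus."""
    cons = consensus(alignment)
    return sum(jc_distance(s, cons) for s in alignment) / len(alignment)


# Hypothetical toy family: two intact copies and two copies with one change each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACCTAC"]
```

Under this simplification, a family whose copies have accumulated more changes since insertion yields a larger mean distance, which is the sense in which the larger values for RAthE1 and RAthE2 indicate older families than S1.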

Using the Neighbor-Joining method and the genetic distance matrices obtained from the SINE alignments, we have constructed three phylogenies (fig. 2). In previous work, we have shown that S1 elements in Brassica species are generated from a small number of founder elements, creating subfamilies of different evolutionary ages (Deragon et al. 1994; Lenoir et al. 1997). This pattern of evolution is very similar to the one described for mammalian SINEs (Deininger and Batzer 1995; Jurka 1995). New founder sequences are generated either by the mutation of old ones (serial formation) or from the mutation of previously inactive elements (parallel formation) (Deininger and Batzer 1995). The S1 phylogeny is composed of different clusters (fig. 2A), each representing the activity of distinct founder sequences generating distinct subfamilies (Lenoir et al. 1997). As for S1, the RAthE2 family can be separated into several clusters, suggesting the presence of different subfamilies and the use of different founder sequences (fig. 2C). The situation is very different for the RAthE1 family. Although this family is composed of young (distance from the consensus: 0.0591) and old (distance from the consensus: 0.4828) elements, they cannot be grouped into subfamilies of different evolutionary ages, as indicated by the absence of significant bootstrap values on the phylogeny (fig. 2B). Instead, all RAthE1 elements appear to have been generated by a single founder sequence that was maintained unchanged and active for at least 12–20 Myr, the age of the Cruciferae family (Cavell et al. 1998).



 
Fig. 2. Comparison of A, S1; B, RAthE1; and C, RAthE2 phylogenies. The phylogenies were obtained using the Neighbor-Joining method. Numbers above each node indicate bootstrap values as percentages out of 1,000 replicates. The S1 elements were named according to previous studies (Lenoir et al. 1997; Tatout, Lavie, and Deragon 1998; Tatout et al. 1999). The nucleotide divergence scale is indicated for each phylogeny.

 
Target Site Specificity and Genomic Localization of the RAthE1 and RAthE2 Elements
We found that RAthE1 and RAthE2 insertion sites present a specific DNA signature. The site of the first nick leading to RAthE1 and RAthE2 insertion is enriched in TA, CA, and TG dinucleotides followed by GA/AG/GG/AA dinucleotides (fig. 3). No sequence specificity is observed for the second nick (not shown). This signature is identical to the one described for S1 insertion sites (fig. 3) (Tatout, Lavie, and Deragon 1998). Therefore, the three SINE families share the same target site specificity at the DNA level. We next looked at the localization of the RAthE1 and RAthE2 elements at the gene level (using 10 kb windows; see tables 1 and 2) and at the chromosome level (using 1 Mb windows; fig. 4). We found that, in contrast to other Arabidopsis transposable elements (The Arabidopsis Genome Initiative 2000), most SINEs are interspersed with genes. Forty percent of the RAthE1 elements and 65% of RAthE2 elements are either in an intron or within 500 bp (5' or 3') of an exon (see tables 1 and 2). For one RAthE1 and three RAthE2 loci, the SINE is predicted to be part of a gene exon. A small proportion of elements (5% for RAthE1 and 2% for RAthE2) is localized more than 5 kb away from a gene. In these rare cases, the SINEs are positioned in repeat-rich regions. At the chromosome level, SINEs are grouped in euchromatic chromosome territories several hundred thousand base pairs long (fig. 4). The highest SINE density is found in gene-rich euchromatic territories flanking heterochromatic regions such as centromeres, nucleolar organizers (NORs), and heterochromatic knobs (see fig. 4).
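
The chromosome-level view rests on counting elements in 1 Mb windows. A minimal Python sketch of that binning is given below, assuming element positions are available as 0-based base-pair coordinates on one chromosome; the function name and the coordinates are hypothetical.

```python
from collections import Counter


def count_per_window(positions, window=1_000_000):
    """Map window index to the number of elements falling in that window."""
    return Counter(pos // window for pos in positions)


# Hypothetical SINE coordinates on a single chromosome (in bp).
positions = [150_000, 900_000, 1_200_000, 3_999_999, 4_000_000]
density = count_per_window(positions)
print(sorted(density.items()))  # [(0, 2), (1, 1), (3, 1), (4, 1)]
```

Windows with no elements are simply absent from the result, so plotting against a gene-density or heterochromatin track would require filling the missing indices with zero.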



 
Fig. 3. Dinucleotide distributions surrounding the RAthE1 (open circles), RAthE2 (black triangles), S1 (black circles), and LINE (open triangles) 5' nicking sites. The χ² values of the dinucleotide distribution for each position are presented. The horizontal line corresponds to the significance level of P < 0.01 for 3 degrees of freedom. χ² analysis indicates that positions 0 and +1 are significant for the LINE and the three SINE families. At the dinucleotide level, the SINE and LINE 5' nicking site is composed of a peak of kinkable dinucleotides (TG/CA/TA) followed by a peak of dipurines (GG/AA/GA/AG).

 


 
Fig. 4. Position of SINE elements on the Arabidopsis physical map. The five Arabidopsis chromosomes were divided into 1 Mb regions. Heterochromatic regions on the chromosomes are represented by black sectors. The gene density is symbolized by white (highest density) to dark gray (lowest density) sectors (from The Arabidopsis Genome Initiative 2000). Black dots represent RAthE1 elements and white dots represent RAthE2 elements. The NORs are located on the short arms of chromosome 2 and chromosome 4. Heterochromatic knobs are represented by stars (one is present on the short arm of chromosome 4, the other is located on the long arm of chromosome 5; The Arabidopsis Genome Initiative 2000).

 
Which Partner for the Two Arabidopsis SINE Families?
SINEs are nonautonomous elements that must use in trans the enzymatic machinery from another (autonomous) retroelement. Recently, primary sequence homology was found between the 3' UTR of LINEs (mainly belonging to the CR1 and RTE clades; Malik, Burke, and Eickbush 1999) and the 3' region of several tRNA-related SINEs from different species (Okada et al. 1997). This shared 3' region between SINEs and LINEs was proposed to facilitate the trans-selection of SINE RNA by the LINE retroposition complex (Okada et al. 1997; Weiner 2000). In a previous study, 198 LINEs were identified in the Arabidopsis (nuclear) genome, all belonging to the L1 clade (Noma, Ohtsubo, and Ohtsubo 2000). We searched the Arabidopsis database (using the program BLAST) for LINE elements that would share 3' sequence homologies with RAthE1 or RAthE2. No such element was found. We also made a consensus sequence of the 3' UTR region of Arabidopsis LINEs. This consensus sequence was compared to the consensus sequences of RAthE1 and RAthE2 elements. Again, no homology was detected. These results suggest that Arabidopsis LINEs are not involved in SINE retroposition or that the SINE-LINE relation in Arabidopsis is not based on a shared 3' end region. To test the possible implication of a LINE endonuclease (see Cost and Boeke 1998 for a functional description of a LINE endonuclease) in SINE integration, we analyzed the DNA signature of LINE integration sites as we did for SINEs (fig. 3). We observed that the DNA signature of LINE integration sites is identical to the one found for the RAthE1 and RAthE2 SINEs, suggesting that a LINE endonuclease is implicated in SINE integration.
Using degenerate primers targeting the reverse transcriptase of LINEs present in Arabidopsis (see Materials and Methods), we were able to amplify a LINE fragment in all Cruciferae species where we have detected RAthE1 and RAthE2 elements (not shown; see Materials and Methods for the list of the species tested). The common presence of SINEs and LINEs in all Cruciferae species tested and the common DNA signature of SINE and LINE integration suggest that the LINE integration machinery was implicated in SINE retroposition.
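
The shared DNA signature is described in terms of two dinucleotide classes, kinkable (TG/CA/TA) and dipurine (GG/AA/GA/AG). The Python sketch below tallies these classes at a chosen offset across a set of aligned insertion sites, which is one simple way to compare SINE and LINE sites. The alignment convention (offset as a 0-based index into each site string) and the example sites are assumptions for illustration only.

```python
from collections import Counter

KINKABLE = {"TG", "CA", "TA"}
DIPURINE = {"GG", "AA", "GA", "AG"}


def classify(dinucleotide):
    """Assign a dinucleotide to the kinkable, dipurine, or other class."""
    if dinucleotide in KINKABLE:
        return "kinkable"
    if dinucleotide in DIPURINE:
        return "dipurine"
    return "other"


def class_counts(sites, offset):
    """Tally dinucleotide classes at `offset` across aligned insertion sites."""
    return Counter(classify(site[offset:offset + 2]) for site in sites)


# Hypothetical aligned insertion sites (not data from the study).
sites = ["TAAG", "CAGG", "TGGA", "CCAA"]
first = class_counts(sites, 0)    # dinucleotides TA, CA, TG, CC
second = class_counts(sites, 2)   # dinucleotides AG, GG, GA, AA
```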


    Discussion
 
Most eukaryotic SINEs are derived ancestrally from tRNA (Okada and Ohshima 1995). This is the case for the S1 family from Brassica (Deragon et al. 1994). Using representative tRNAs from Arabidopsis, we were also able to retrace the origin of RAthE1 and RAthE2 to tRNAs (fig. 1). Unexpectedly, we observed that the three Cruciferae SINE families are related to different tRNAs and therefore emerged independently. The ages of the RAthE1 and RAthE2 families are similar, and it is likely that these two families were active in parallel. The presence of young SINE copies in the RAthE1 and RAthE2 families indicates that they were active in the recent past (i.e., within the last few million years), but we cannot determine if they are still active today. The Arabidopsis genome is known to possess a low proportion of transposable elements (approximately 10%; The Arabidopsis Genome Initiative 2000) compared with other plant species. The presence of two independent SINE families in a genome such as Arabidopsis and the presence of three independent SINE families in Cruciferae (speciation time: 12–20 Myr) suggest that SINEs are ubiquitous in plants.

The evolutionary history of the RAthE1 family is unusual for SINEs. Although the RAthE1 family is composed of young and old elements, they cannot be grouped into subfamilies as is the case for S1, RAthE2, and mammalian SINEs (Jurka 1995; Deininger and Batzer 1995). Instead, all RAthE1 elements appear to have been generated by a single founder sequence. This result strongly suggests that the RAthE1 source gene was (and may still be) under selection. Without selection, we expect the founder sequence to accumulate mutations (by genetic drift) and to propagate these mutations by retroposition, creating diagnostic mutations that are characteristic of SINE subfamilies (serial subfamily formation; Deininger and Batzer 1995). The absence of subfamilies suggests that the founder element was maintained intact by selection over the period of retroposition activity of the SINE family (at least 12–20 Myr). Furthermore, this also implies that the formation of new founding sequences from previously inactive elements (parallel subfamily formation; Deininger and Batzer 1995) was impaired. The RAthE1 (but not the RAthE2 or S1) consensus sequence presents two mismatches in the A box and two mismatches in the B box compared to the eukaryote consensus (not shown). This observation suggests that the RAthE1 internal promoter may not be fully autonomous. Transcription for this family may thus only be possible at the founder sequence locus because of the presence of external enhancers. The presence of external enhancers has been shown to be important for the activity of a putative S1 founder sequence (Arnaud et al. 2001). In that situation, the formation of new founding RAthE1 elements from retroposed elements would be more difficult and the parallel formation of subfamilies impaired.

Chromosomal Localization of Arabidopsis SINEs
In general, Arabidopsis transposable elements have the propensity to occupy gene-poor regions composed mainly of other repeats (The Arabidopsis Genome Initiative 2000). This is particularly the case for the class I retroelements that primarily occupy the centromeres and are excluded from regions flanking the rDNA repeats (The Arabidopsis Genome Initiative 2000). In contrast, we show here that RAthE1 and RAthE2 SINE retroelements are interspersed with genes and are enriched in chromosomal territories flanking the rDNA repeats. In fact, SINEs are more abundant in gene-rich euchromatic territories flanking several heterochromatic regions, including centromeres, NORs, and heterochromatic knobs (see fig. 4). SINEs are almost absent in centromeric and pericentromeric regions where other transposable elements are abundant. This chromosomal distribution of SINEs is intriguing and might be accounted for, in part, by target site specificity. We observed that Arabidopsis SINEs present the same insertion site specificity as S1 and mammalian SINEs (fig. 3). This specificity is not strict and includes a preference for certain dinucleotides, capable of forming kinks or bends, at the insertion sites (McNamara et al. 1990; Jurka et al. 1997; Tatout, Lavie, and Deragon 1998). This result suggests that DNA secondary structure, more than the primary sequence itself, is important for SINE insertion (Tatout, Lavie, and Deragon 1998). We have also shown recently that SINEs are targeted to matrix attachment regions (MARs) in plant genomes (Tikhonov et al. 2001). MARs are known to be enriched in kinkable dinucleotides and to adopt unusual DNA secondary structures (reviewed in Bode et al. 1998). Therefore, it is tempting to speculate that SINE-rich transition domains between heterochromatin and euchromatin can adopt unusual DNA secondary structures.

The chromosomal distribution of SINEs could also be explained in part by a selection process. In that case, the presence of SINEs in gene-rich regions may be explained by their short size. Although SINEs may have an impact on gene expression, these short elements may be better tolerated in gene-rich regions compared with longer ones. This cannot, however, explain the absence of SINEs in gene-rich chromosomal regions not flanking heterochromatic domains (see fig. 4).

SINE-LINE Relation in Arabidopsis thaliana
Our efforts to identify an Arabidopsis LINE element that would share primary sequence homologies with RAthE1 and RAthE2 failed. Despite this lack of sequence identity, we suggest that Arabidopsis LINEs are partners in RAthE1 and RAthE2 retroposition and that this partnership is similar to the one described for the mammalian L1 (LINE) and Alu (SINE) elements.

L1 is generally proposed as the LINE partner of the SINE Alu, despite a lack of primary sequence identity between the two sequences (Smit 1996; Boeke 1997; Jurka 1997). This hypothesis is supported by (1) a statistical analysis of the human LINE L1 and SINE Alu integration sites showing a common target site selection for the two elements (Feng et al. 1996; Jurka 1997), (2) the specificity of the L1 endonuclease that can explain L1 and Alu integration events (Feng et al. 1996; Cost and Boeke 1998), (3) the observation that L1 can generate processed pseudogenes in a trans-retroposition process analogous to Alu amplification (Esnault, Maestre, and Heidmann 2000), and (4) the observation that retroviral reverse transcriptase, the only other cellular enzyme that could mobilize SINEs, is unable to generate the footprints observed following SINE retroposition (Dornburg and Temin 1990; Esnault, Maestre, and Heidmann 2000). The putative partnership of L1 and Alu has been described in a model called the Poly(A) Connection (Boeke 1997). In this model, the binding of nascent L1 proteins to the poly(A) tract of translated L1 mRNA would play an important role in the retroposition efficiency of this element (Moran et al. 1996; Boeke 1997). The success of Alu amplification could be explained if Alu RNA also interacts with nascent LINE proteins through its own poly(A) tract. To do so, Alu RNA needs to be targeted to the ribosome during L1 mRNA translation. This targeting could be the result of the ability of Alu RNA to adopt a secondary structure similar to 7SL RNA (Sinnett et al. 1991) and to bind two cellular signal recognition particle (SRP) proteins (Chang and Maraia 1993). As the SRP complex is bound to ribosomes near the exit hole where nascent proteins emerge, Alu RNA would be suitably positioned to interact with nascent L1 proteins.

Like the mammalian L1 element, all Arabidopsis LINEs belong to the L1 clade and end with poly(A) tracts (Noma, Ohtsubo, and Ohtsubo 2000). LINEs belonging to the L1 clade have never been implicated in a SINE partnership based on primary sequence homologies (Okada et al. 1997). This suggests that LINEs from the L1 clade, in contrast to LINEs from other clades, do not use a sequence-specific selection process for trans-retroposition. RAthE1 and RAthE2 SINEs also end with long poly(A) tracts. We observed in this work that Arabidopsis SINEs and LINEs present a common target site DNA signature (fig. 3) and a common distribution among Cruciferae species. It is therefore possible that the SINE-LINE relationship in Arabidopsis is not based on primary sequence identity but on the presence of a common poly(A) region. If true, this would constitute the first nonmammalian poly(A) connection. In this model, the tRNA-like structure of the SINE RNA would be important to target the SINE RNA to the ribosome.

GenBank references: RAthE1 (AY033656-AY033701), RAthE2 (AY033702-AY033733, AY044647).


    Acknowledgements
 
This work was supported by the Centre National de la Recherche Scientifique (UMR 6547 Biomove and GDR2157) and by Université Blaise Pascal.


    Footnotes
 
Pierre Capy, Reviewing Editor

Keywords: repetitive sequence, retrotransposon, Alu element, transposable element, retroposition, Arabidopsis thaliana

Address for correspondence and reprints: Jean-Marc Deragon, Centre National de la Recherche Scientifique, UMR 6547 Biomove and GDR 2157, Université Blaise Pascal Clermont-Ferrand II, 24 Avenue des Landais, 63177, Aubière cedex, France. j-marc.deragon@univ-bpclermont.fr

1 The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.


    References
 

    Arnaud P., C. Goubely, T. Pélissier, J. M. Deragon, 2000 SINE retroposons can be used in vivo as nucleation centers for de novo methylation Mol. Cell. Biol 20:3434-3441[Abstract/Free Full Text]

    Arnaud P., Y. Yukawa, L. Lavie, T. Pélissier, M. Sugiura, J. M. Deragon, 2001 In vitro analysis of the SINE S1 POL III promoter from Brassica; impact of methylation and influence of external sequences Plant J 26:295-306

    Bode J., J. Bartsch, C. Mielke, D. Schubler, J. Seibler, C. Benham, T. Boulikas, M. C. Iber, 1998 Transcription-promoting genomic sites in mammalia: their elucidation and architectural principles Gene Ther. Mol. Biol 1:551-580

    Boeke J. D., 1997 LINEs and Alus - the polyA connection Nat. Genet 16:6-7

    Cavell A., D. Lydiate, I. Parkin, C. Dean, M. Trick, 1998 A 30 centimorgan segment of Arabidopsis thaliana chromosome 4 has six collinear homologues within the Brassica napus genome Genome 41:62-69

    Chang D. Y., R. J. Maraia, 1993 A cellular protein binds B1 and Alu small cytoplasmic RNAs in vitro J. Biol. Chem 268:6423-6428

    Cost G. J., J. D. Boeke, 1998 Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure Biochemistry 37:18081-18093

    Deininger P. L., 1989 SINEs: Short interspersed repeated DNA elements in higher eucaryotes Pp. 619–636 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C

    Deininger P. L., M. A. Batzer, 1995 SINE master genes and population biology Pp. 43–60 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex

    Deragon J. M., P. Capy, 2000 Impact of transposable elements on the human genome Ann. Med 32:264-273

    Deragon J. M., B. S. Landry, T. Pélissier, S. Tutois, S. Tourmente, G. Picard, 1994 An analysis of retroposition in plants based on a family of SINEs from Brassica napus J. Mol. Evol 39:378-386

    Dornburg R., H. M. Temin, 1990 cDNA genes formed after infection with retroviral vector particles lack the hallmarks of natural processed pseudogenes Mol. Cell. Biol 10:68-74

    Esnault C., J. Maestre, T. Heidmann, 2000 Human LINE retrotransposons generate processed pseudogenes Nat. Genet 24:363-367

    Felsenstein J., 1989 PHYLIP (phylogeny inference package) Cladistics 5:164-166

    Feng Q., J. V. Moran, H. H. Kazazian, J. D. Boeke, 1996 Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition Cell 87:905-916

    Flavell A. J., D. B. Smith, A. Kumar, 1992 Extreme heterogeneity of Ty1-copia group retrotransposons in plants Mol. Gen. Genet 231:233-242

    Gilbert N., P. Arnaud, A. Lenoir, S. I. Warwick, G. Picard, J. M. Deragon, 1997 Plant S1 SINEs as a model to study retroposition Genetica 100:155-160

    Hedge I. C., 1976 A systematic and geographical survey of the old world Cruciferae Pp. 1–45 in J. G. Vaughn, A. J. Macleod, and B. M. G. Jones, eds. The biology and chemistry of the Cruciferae. Academic Press, London

    Hedges S. B., 1992 The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies Mol. Biol. Evol 9:366-369

    Jurka J., 1995 Origin and evolution of Alu repetitive element Pp. 25–41 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex

    Jurka J., 1997 Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons Proc. Natl. Acad. Sci. USA 94:1872-1877

    Jurka J., 1998 Repeats in genomic DNA: mining and meaning Curr. Opin. Struct. Biol 8:333-337

    Kumar A., J. F. Bennetzen, 1999 Plant retrotransposons Ann. Rev. Genet 33:497-532

    Le Q. H., S. Wright, Z. Yu, T. Bureau, 2000 Transposon diversity in Arabidopsis thaliana Proc. Natl. Acad. Sci. USA 97:7376-7381

    Lenoir A., B. Cournoyer, S. I. Warwick, G. Picard, J. M. Deragon, 1997 Evolution of SINE S1 retroposons in Cruciferae plant species Mol. Biol. Evol 14:934-941

    Malik H. M., W. D. Burke, T. H. Eickbush, 1999 The age and evolution of non-LTR retrotransposable elements Mol. Biol. Evol 16:793-805

    Malik H. M., T. H. Eickbush, 2000 NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans Genetics 154:193-203

    McNamara P. T., A. Bolshoy, E. N. Trifonov, R. E. Harrington, 1990 Sequence-dependent kinks induced in curved DNA J. Biomol. Struct. Dyn 8:529-538

    Moran J. V., S. E. Holmes, T. P. Naas, R. J. Deberardinis, J. D. Boeke, H. H. Kazazian, 1996 High frequency retrotransposition in cultured mammalian cells Cell 87:917-927

    Noma K., H. Ohtsubo, E. Ohtsubo, 2000 ATLN Elements, LINEs from Arabidopsis thaliana: identification and characterization DNA Res 7:291-303

    Okada N., M. Hamada, I. Ogiwara, K. Ohshima, 1997 SINEs and LINEs share common 3' sequences: a review Gene 205:229-243

    Okada N., K. Ohshima, 1995 Evolution of t-RNA–derived SINEs Pp. 61–80 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex

    Pearce S. R., G. Harrison, D. Li, J. S. Heslop-Harrison, A. Kumar, A. J. Flavell, 1996 The Ty1-copia group retrotransposons in Vicia species: copy number, sequence heterogeneity and chromosomal localisation Mol. Gen. Genet 250:305-315

    SanMiguel P., A. Tikhonov, Y. K. Jin, et al. (11 co-authors) 1996 Nested retrotransposons in the intergenic regions of the maize genome Science 274:765-768

    Schmidt T., 1999 LINEs, SINEs and repetitive DNA: non-LTR retrotransposons in plant genomes Plant Mol. Biol 40:903-910

    Sinnett D., C. Richer, J. M. Deragon, D. Labuda, 1991 Alu RNA secondary structure consists of two independent 7SL RNA-like folding units J. Biol. Chem 266:8675-8678

    Smit A. F. A., 1996 The origin of interspersed repeats in the human genome Curr. Opin. Genet. Dev 6:743-748

    Strimmer K., A. Von Haeseler, 1997 Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment Proc. Natl. Acad. Sci. USA 94:6815-6819

    Surzycki S. A., W. R. Belknap, 1999 Characterization of repetitive DNA elements in Arabidopsis J. Mol. Evol 48:684-691

    Tatout C., L. Lavie, J. M. Deragon, 1998 Similar target site selection occurs in integration of plant and mammalian retroposons J. Mol. Evol 47:463-470

    Tatout C., S. I. Warwick, A. Lenoir, J. M. Deragon, 1999 SINE insertions as clade markers for wild Cruciferae species Mol. Biol. Evol 16:1614-1621

    The Arabidopsis Genome Initiative. 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408:796-815

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Tikhonov A. P., C. Tatout, L. Lavie, J. L. Bennetzen, Z. Avramova, J. M. Deragon, 2001 Matrix-attachment regions (MARs) as target sites for SINE integration in Brassica genomes Chromosome Res 9:325-337

    Weiner A. M., 2000 Do all SINEs lead to LINEs? Nat. Genet 24:332-333

Accepted for publication August 27, 2001.