Centre National de la Recherche Scientifique, Université Blaise Pascal Clermont-Ferrand II, Aubière cedex, France
Agriculture Canada, Research Centre, St-Jean-sur-Richelieu, Canada
Abstract
Introduction
SINEs are short (less than 500 bp) nonautonomous (and noncoding) elements transcribed by the RNA polymerase III complex and found in a wide variety of eukaryotes (Deininger 1989; Okada and Ohshima 1995). SINEs are ancestrally related to tRNA with the exception of several families of mammalian SINEs related to 7SL RNA. LINEs are autonomous elements encoding a protein complex necessary for retroposition. LINEs are also widely distributed among eukaryotes and can be classified in a minimum of 12 distinct clades based on a phylogenetic analysis of their reverse transcriptase domain, each clade dating back to the Precambrian era (Malik, Burke, and Eickbush 1999; Malik and Eickbush 2000). Recently, the transposition of nonautonomous SINEs was suggested to depend on proteins encoded by autonomous LINE partners (Smit 1996; Boeke 1997; Jurka 1997; Okada et al. 1997). In that hypothesis, SINEs would have evolved as parasites of the LINE retroposition machinery.
SINEs and LINEs have been intensively studied in animals, where they represent the major class of retroelements (for a recent review see Deragon and Capy 2000). In plants, SINEs and LINEs are usually less abundant than the LTR-retrotransposons and are much less studied. The best characterized plant SINE is the S1 element from Brassica species (Deragon et al. 1994; Gilbert et al. 1997; Lenoir et al. 1997; Arnaud et al. 2000). Over 90% of the Arabidopsis thaliana genome is now available (The Arabidopsis Genome Initiative 2000), providing a unique opportunity to study all SINEs from a higher eukaryote genome. Earlier studies (Surzycki and Belknap 1999; Le et al. 2000) have revealed the presence of three SINE families in Arabidopsis (RAthE1 or SL1, SL2, and SL3). However, we found that the SL3 sequences were misclassified as SINEs and represent the 3' end of LINE elements (unpublished data). In this work, the origin, distribution, organization, and evolutionary history of RAthE1 (SL1) and RAthE2 (SL2) elements were studied and compared to the well-characterized SINE S1 element from Brassica.
Materials and Methods
χ² Analysis
We aligned the RAthE1, RAthE2, and LINE insertion sites. The target site duplications (TSDs) were adjusted to the left so that they all started at the same position, and to the right so that they all ended at the same position (as in Tatout, Lavie, and Deragon 1998). Only sites with perfect TSDs of a minimum of 9 bp were used (44 sites for RAthE1, 14 sites for RAthE2, and 30 sites for LINEs). The analysis of the dinucleotide distribution flanking SINEs was done essentially as described in Jurka (1997). Briefly, $\chi^2 = \sum_{i=1}^{4} (O_i - E_i)^2 / E_i$, where $O_i$ is the number of dinucleotide occurrences and $E_i$ is the total number of dinucleotides at a given position × base composition. We used a significance level of P < 0.01 for 3 degrees of freedom. The data for the S1 insertion sites are described in Tatout, Lavie, and Deragon (1998).
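The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the example counts, and the uniform composition are hypothetical and do not come from the study; the expected counts are taken as the total number of observations multiplied by an assumed composition, and 11.345 is the standard χ² critical value for P = 0.01 at 3 degrees of freedom.

```python
# Standard chi-square critical value for P = 0.01 with 3 degrees of freedom.
CHI2_CRIT_P01_DF3 = 11.345


def chi_square(observed, composition):
    """Return sum((O_i - E_i)^2 / E_i), with E_i = total * composition_i.

    observed    -- counts for each category at one flanking position
    composition -- expected fraction of each category (assumed to sum to 1)
    """
    if len(observed) != len(composition):
        raise ValueError("observed and composition must have the same length")
    total = sum(observed)
    stat = 0.0
    for o, p in zip(observed, composition):
        expected = total * p
        stat += (o - expected) ** 2 / expected
    return stat


# Hypothetical counts for four categories at one position across 44 sites.
observed = [30, 6, 4, 4]
composition = [0.25, 0.25, 0.25, 0.25]  # assumed uniform composition

stat = chi_square(observed, composition)
significant = stat > CHI2_CRIT_P01_DF3
print(f"chi2 = {stat:.3f}, significant at P < 0.01: {significant}")
```

With four categories the statistic has 3 degrees of freedom, which matches the threshold quoted in the text. In a real analysis the composition would be estimated from the sequences themselves rather than assumed uniform.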
SINE and LINE Distribution
PCR primers corresponding to RathE1 (AY033656, RathE1.1 5'-AGTGTCGTTAGCTCAATTGG-3', RathE1.2 5'-GAYCCTAGGCGAAGCTTAG-3', RathE1.3 5'-CAGGAGGTTCTGGGCCG-3', and RathE1.4 5'-(T)15 GAATACCAGGAGGTTCTGG-3') and RathE2 (AY033702, RathE2.1 5'-AGCCCAAGCATCTGTGGTC-3', RathE2.2 5'-TCAGCCCCGGTAGAAACGC-3', and RathE2.3 5'-CCCTATCGGCTGATCCGG-3') consensus sequences were used to amplify SINEs in various Cruciferae species. After a first PCR reaction (using primers RathE1.1 and RathE1.4 or RathE2.1 and RathE2.3), a second PCR amplification with more internal primers (RathE1.2 and RathE1.3 or RathE2.1 and RathE2.2) was used to confirm the presence of the SINE. For the amplification of LINEs, degenerate PCR primers targeting the reverse transcriptase domain were designed (LINE1.1 5'-GARTTYTTYARRGVAGCTTGG-3', LINE1.2 5'-TCGTCAGCRAARCATBARRTG-3', and LINE1.3 5'-RTCAAAMGCTTTNSARAGATC-3'). These primers can amplify Arabidopsis LINEs from one of the two families described by Noma, Ohtsubo, and Ohtsubo (2000). After a first PCR reaction (using primers LINE1.1 and LINE1.3), a second PCR amplification with a more internal primer pair (LINE1.1 and LINE1.2) was used to confirm the presence of LINEs. The species used for LINE and SINE amplification are representative of the diversity in the Cruciferae family, with at least one representative for each of the six major tribes (Hedge 1976). The species used (and their tribe classifications) are: Arabidopsis thaliana (Arabideae), Arabidopsis suecica (Arabideae), Cardaminopsis petraea (Arabideae), Barbarea vulgaris (Arabideae), Aphragmus oxycarpus (Sisymbrieae), Sisymbrium irio (Sisymbrieae), Armoracia rusticana (Drabeae), Capsella bursa-pastoris (Lepidieae), Erysimum cheiranthoides (Hesperideae), Crambe hispanica (Brassiceae), Zilla spinosa (Brassiceae), Sinapis pubescens (Brassiceae), Brassica oleracea (Brassiceae). For the PCR reactions, 100 ng of DNA was subjected to 30 PCR cycles using the Robocycler 96 (Stratagene) with 1 µM each of sense and antisense primers and 0.2 U of Gold Star DNA polymerase (Eurogentec) in a final volume of 25 µl. The annealing temperature used in all cases corresponds to the melting temperature of the primers (Tm) minus 4°C. The second PCR reaction was done under the same conditions, using 1 µl of the PCR product obtained after the first amplification.
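To make the primer description more concrete, the sketch below counts how many distinct sequences a degenerate primer represents (using the standard IUPAC nucleotide codes) and derives an annealing temperature as Tm minus 4°C. The text does not say how Tm was calculated, so the Wallace rule used here, Tm = 2(A+T) + 4(G+C), is an assumption made only for illustration, and all function names are hypothetical.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a (possibly degenerate) primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def wallace_tm(primer):
    """Approximate Tm in degrees C by the Wallace rule (assumed, not from the paper).

    Only defined here for non-degenerate primers.
    """
    p = primer.upper()
    if any(base not in "ACGT" for base in p):
        raise ValueError("Wallace rule sketch expects a non-degenerate primer")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def annealing_temperature(primer, offset=4):
    """Annealing temperature taken as Tm minus an offset (4 degrees C in the text)."""
    return wallace_tm(primer) - offset


line_primers = {
    "LINE1.1": "GARTTYTTYARRGVAGCTTGG",
    "LINE1.2": "TCGTCAGCRAARCATBARRTG",
    "LINE1.3": "RTCAAAMGCTTTNSARAGATC",
}
for name, seq in line_primers.items():
    print(name, "degeneracy:", degeneracy(seq))

rathe1_1 = "AGTGTCGTTAGCTCAATTGG"
print("RathE1.1 annealing (Wallace-rule sketch):", annealing_temperature(rathe1_1))
```

Under these assumptions the three LINE primers represent 96, 48, and 64 sequences respectively. The temperatures returned by the Wallace rule are rough estimates and may differ from the values actually used in the experiments.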
Results
Using the Neighbor-Joining method and the genetic distance matrices obtained from the SINE alignments, we have constructed three phylogenies (fig. 2). In previous works, we have shown that S1 elements in Brassica species are generated from a small number of founder elements creating subfamilies of different evolutionary ages (Deragon et al. 1994; Lenoir et al. 1997). This pattern of evolution is very similar to the one described for mammalian SINEs (Deininger and Batzer 1995; Jurka 1995). New founder sequences are generated either by the mutation of old ones (serial formation) or from the mutation of previously inactive elements (parallel formation) (Deininger and Batzer 1995). The S1 phylogeny is composed of different clusters (fig. 2A), each representing the activity of distinct founder sequences generating distinct subfamilies (Lenoir et al. 1997). As for S1, the RAthE2 family can be separated into several clusters, suggesting the presence of different subfamilies and the use of different founder sequences (fig. 2C). The situation is very different for the RAthE1 family. Although this family is composed of young (distance from the consensus: 0.0591) and old (distance from the consensus: 0.4828) elements, they cannot be grouped into subfamilies of different evolutionary ages, as indicated by the absence of significant bootstrap values on the phylogeny (fig. 2B). Instead, all RAthE1 elements appear to have been generated by a single founder sequence that was maintained unchanged and active for at least 12-20 Myr, the age of the Cruciferae family (Cavell et al. 1998).
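The "distance from the consensus" values above can be illustrated with a short Python sketch. The distance model used for the published values is not stated in this section, so the uncorrected p-distance and the Jukes-Cantor correction shown here are assumptions chosen for illustration; the function names and the toy sequences are hypothetical.

```python
from math import log


def p_distance(seq, consensus):
    """Proportion of mismatches over aligned columns where neither sequence has a gap."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance, d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * log(1 - 4 * p / 3)


# Toy aligned sequences (hypothetical, not taken from the RAthE1 alignment).
consensus = "ACGTACGTAA"
element = "ACGTACGTAC"

p = p_distance(element, consensus)
d = jukes_cantor(p)
print(f"p-distance = {p:.4f}, Jukes-Cantor distance = {d:.4f}")
```

A matrix of such pairwise distances is the input that a Neighbor-Joining implementation would take; the correction matters most for the older, more divergent elements, where multiple substitutions at the same site become likely.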
Discussion
The evolutionary history of the RAthE1 family is unusual for SINEs. Although the RAthE1 family is composed of young and old elements, they cannot be grouped into subfamilies as is the case for S1, RAthE2, and mammalian SINEs (Jurka 1995; Deininger and Batzer 1995). Instead, all RAthE1 elements appear to have been generated by a single founder sequence. This result strongly suggests that the RAthE1 source gene was (and may still be) under selection. Without selection, we expect the founder sequence to accumulate mutations (by genetic drift) and to propagate these mutations by retroposition, creating diagnostic mutations that are characteristic of SINE subfamilies (serial subfamily formation, Deininger and Batzer 1995). The absence of subfamilies suggests that the founder element was maintained intact by selection over the period of retroposition activity of the SINE family (at least 12-20 Myr). Furthermore, this also implies that the formation of new founding sequences from previously inactive elements (parallel subfamily formation, Deininger and Batzer 1995) was impaired. The RAthE1 (but not the RAthE2 or S1) consensus sequence presents two mismatches in the A box and two mismatches in the B box compared to the eukaryote consensus (not shown). This observation suggests that the RAthE1 internal promoter may not be fully autonomous. Transcription for this family may thus only be possible at the founder sequence locus because of the presence of external enhancers. The presence of external enhancers has been shown to be important for the activity of a putative S1 founder sequence (Arnaud et al. 2001). In that situation, the formation of new founding RAthE1 elements from retroposed elements would be more difficult and the parallel formation of subfamilies impaired.
Chromosomal Localization of Arabidopsis SINEs
In general, Arabidopsis transposable elements have the propensity to occupy gene-poor regions composed mainly of other repeats (The Arabidopsis Genome Initiative 2000). This is particularly the case for the class I retroelements that primarily occupy the centromeres and are excluded from regions flanking the rDNA repeats (The Arabidopsis Genome Initiative 2000). In contrast, we show here that RAthE1 and RAthE2 SINE retroelements are interspersed with genes and are enriched in chromosomal territories flanking the rDNA repeats. In fact, SINEs are more abundant in gene-rich euchromatic territories flanking several heterochromatic regions, including centromeres, NORs, and heterochromatic knobs (see fig. 4). SINEs are almost absent in centromeric and pericentromeric regions where other transposable elements are abundant. This chromosomal distribution of SINEs is intriguing and might be accounted for, in part, by a target site specificity. We observed that Arabidopsis SINEs present the same insertion site specificity as S1 and mammalian SINEs (fig. 3). This specificity is not strict and includes a preference for certain dinucleotides, capable of forming kinks or bends, at the insertion sites (McNamara et al. 1990; Jurka et al. 1997; Tatout, Lavie, and Deragon 1998). This result suggests that DNA secondary structure, more than the primary sequence itself, is important for SINE insertion (Tatout, Lavie, and Deragon 1998). We have also shown recently that SINEs are targeted to matrix attachment regions (MARs) in plant genomes (Tikhonov et al. 2001). MARs are known to be enriched in kinkable dinucleotides and to adopt unusual DNA secondary structures (reviewed in Bode et al. 1998). Therefore, it is tempting to speculate that SINE-rich transition domains between heterochromatin and euchromatin can adopt unusual DNA secondary structures.
The chromosomal distribution of SINEs could also be explained in part by a selection process. In that case, the presence of SINEs in gene-rich regions may be explained by their short size. Although SINEs may have an impact on gene expression, these short elements may be better tolerated in gene-rich regions compared with longer ones. This cannot, however, explain the absence of SINEs in gene-rich chromosomal regions not flanking heterochromatic domains (see fig. 4).
SINE-LINE Relation in Arabidopsis thaliana
Our effort to identify an Arabidopsis LINE element that would share primary sequence homologies with RAthE1 and RAthE2 failed. Despite this lack of sequence identity, we suggest that Arabidopsis LINEs are partners of RAthE1 and RAthE2 retroposition and that this partnership is similar to the one described for the mammalian L1 (LINE) and Alu (SINE) elements.
L1 is generally proposed as the LINE partner of the SINE Alu, despite a lack of primary sequence identity between the two sequences (Smit 1996; Boeke 1997; Jurka 1997). This hypothesis is supported by (1) a statistical analysis of the human LINE L1 and SINE Alu integration sites showing a common target site selection for the two elements (Feng et al. 1996; Jurka 1997), (2) the specificity of the L1 endonuclease that can explain L1 and Alu integration events (Feng et al. 1996; Cost and Boeke 1998), (3) the observation that L1 can generate processed pseudogenes in a trans-retroposition process analogous to Alu amplification (Esnault, Maestre, and Heidmann 2000), and (4) the observation that retroviral reverse transcriptase, the only other cellular enzyme that could mobilize SINEs, is unable to generate the footprints observed following SINE retroposition (Dornburg and Temin 1990; Esnault, Maestre, and Heidmann 2000). The putative partnership of L1 and Alu has been described in a model called the Poly(A) Connection (Boeke 1997). In this model, the binding of nascent L1 proteins to the poly(A) tract of translated L1 mRNA would play an important role in the retroposition efficiency of this element (Moran et al. 1996; Boeke 1997). The success of Alu amplification could be explained if Alu RNA also interacts with nascent LINE proteins through its own poly(A) tract. To do so, Alu RNA needs to be targeted to the ribosome during L1 mRNA translation. This targeting could be the result of the ability of Alu RNA to adopt a secondary structure similar to 7SL RNA (Sinnett et al. 1991) and to bind two cellular signal recognition particle (SRP) proteins (Chang and Maraia 1993). As the SRP complex is bound to ribosomes near the exit hole where nascent proteins emerge, Alu RNA would be suitably positioned to interact with nascent L1 proteins.
As for the mammalian L1 element, all Arabidopsis LINEs belong to the L1 clade and end with poly(A) tracts (Noma, Ohtsubo, and Ohtsubo 2000). LINEs belonging to the L1 clade were never implicated in a SINE partnership based on primary sequence homologies (Okada et al. 1997). This suggests that LINEs from the L1 clade, in contrast to LINEs from other clades, do not use a sequence-specific selection process for trans-retroposition. RAthE1 and RAthE2 SINEs also end with long poly(A) tracts. We observed in this work that Arabidopsis SINEs and LINEs present a common target site DNA signature (fig. 3) and a common distribution among Cruciferae species. It is therefore possible that the SINE-LINE relationship in Arabidopsis is not based on primary sequence identity but on the presence of a common poly(A) region. If true, this would constitute the first nonmammalian poly(A) connection. In this model, the tRNA-like structure of the SINE RNA would be important to target the SINE RNA to the ribosome.
GenBank references: RAthE1 (AY033656-AY033701), RAthE2 (AY033702-AY033733, AY044647).
Acknowledgements
Footnotes
Keywords: repetitive sequence, retrotransposon, Alu element, transposable element, retroposition, Arabidopsis thaliana
Address for correspondence and reprints: Jean-Marc Deragon, Centre National de la Recherche Scientifique, UMR 6547 Biomove and GDR 2157, Université Blaise Pascal Clermont-Ferrand II, 24 Avenue des Landais, 63177, Aubière cedex, France. j-marc.deragon{at}univ-bpclermont.fr.
1 The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
References
Arnaud P., C. Goubely, T. Pélissier, J. M. Deragon, 2000 SINE retroposons can be used in vivo as nucleation centers for de novo methylation Mol. Cell. Biol 20:3434-3441
Arnaud P., Y. Yukawa, L. Lavie, T. Pélissier, M. Sugiura, J. M. Deragon, 2001 In vitro analysis of the SINE S1 POL III promoter from Brassica; impact of methylation and influence of external sequences Plant J 26:295-306
Bode J., J. Bartsch, C. Mielke, D. Schubler, J. Seibler, C. Benham, T. Boulikas, M. C. Iber, 1998 Transcription-promoting genomic sites in mammalia: their elucidation and architectural principles Gene Ther. Mol. Biol 1:551-580
Boeke J. D., 1997 LINES and Alus-the polyA connection Nat. Genet 16:6-7
Cavell A., D. Lydiate, I. Parkin, C. Dean, M. Trick, 1998 A 30 centimorgan segment of Arabidopsis thaliana chromosome 4 has six collinear homologues within the Brassica napus genome Genome 41:62-69
Chang D. Y., R. J. Maraia, 1993 A cellular protein binds B1 and Alu small cytoplasmic RNAs in vitro J. Biol. Chem 268:6423-6428
Cost G. J., J. D. Boeke, 1998 Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure Biochemistry 37:18081-18093
Deininger P. L., 1989 SINEs: Short interspersed repeated DNA elements in higher eucaryotes Pp. 619-636 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C
Deininger P. L., M. A. Batzer, 1995 SINE master genes and population biology Pp. 43-60 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex
Deragon J. M., P. Capy, 2000 Impact of transposable elements on the human genome Ann. Med 32:264-273
Deragon J. M., B. S. Landry, T. Pélissier, S. Tutois, S. Tourmente, G. Picard, 1994 An analysis of retroposition in plants based on a family of SINEs from Brassica napus J. Mol. Evol 39:378-386
Dornburg R., H. M. Temin, 1990 cDNA genes formed after infection with retroviral vector particles lack the hallmarks of natural processed pseudogenes Mol. Cell. Biol 10:68-74
Esnault C., J. Maestre, T. Heidmann, 2000 Human LINE retrotransposons generate processed pseudogenes Nat. Genet 24:363-367
Felsenstein J., 1989 PHYLIP (phylogeny inference package) Cladistics 5:164-166
Feng Q., J. V. Moran, H. H. Kazazian, J. D. Boeke, 1996 Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition Cell 87:905-916
Flavell A. J., D. B. Smith, A. Kumar, 1992 Extreme heterogeneity of Ty1-copia group retrotransposons in plants Mol. Gen. Genet 231:233-242
Gilbert N., P. Arnaud, A. Lenoir, S. I. Warwick, G. Picard, J. M. Deragon, 1997 Plant S1 SINEs as a model to study retroposition Genetica 100:155-160
Hedge I. C., 1976 A systematic and geographical survey of the old world Cruciferae Pp. 1-45 in J. G. Vaughn, A. J. Macleod, and B. M. G. Jones, eds. The biology and chemistry of the Cruciferae. Academic Press, London
Hedges S. B., 1992 The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies Mol. Biol. Evol 9:366-369
Jurka J., 1995 Origin and evolution of Alu repetitive element Pp. 25-41 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex
Jurka J., 1997 Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons Proc. Natl. Acad. Sci. USA 94:1872-1877
Jurka J., 1998 Repeats in genomic DNA: mining and meaning Curr. Opin. Struct. Biol 8:333-337
Kumar A., J. F. Bennetzen, 1999 Plant retrotransposons Ann. Rev. Genet 33:497-532
Le Q. H., S. Wright, Z. Yu, T. Bureau, 2000 Transposon diversity in Arabidopsis thaliana Proc. Natl. Acad. Sci. USA 97:7376-7381
Lenoir A., B. Cournoyer, S. I. Warwick, G. Picard, J. M. Deragon, 1997 Evolution of SINE S1 retroposons in Cruciferae plant species Mol. Biol. Evol 14:934-941
Malik H. M., W. D. Burke, T. H. Eickbush, 1999 The age and evolution of non-LTR retrotransposable elements Mol. Biol. Evol 16:793-805
Malik H. M., T. H. Eickbush, 2000 NeSL-1, an ancient lineage of site-specific non-LTR retrotransposons from Caenorhabditis elegans Genetics 154:193-203
McNamara P. T., A. Bolshoy, E. N. Trifonov, R. E. Harrington, 1990 Sequence-dependent kinks induced in curved DNA J. Biomol. Struct. Dyn 8:529-538
Moran J. V., S. E. Holmes, T. P. Naas, R. J. Deberardinis, J. D. Boeke, H. H. Kazazian, 1996 High frequency retrotransposition in cultured mammalian cells Cell 87:917-927
Noma K., H. Ohtsubo, E. Ohtsubo, 2000 ATLN Elements, LINEs from Arabidopsis thaliana: identification and characterization DNA Res 7:291-303
Okada N., M. Hamada, I. Ogiwara, K. Ohshima, 1997 SINEs and LINEs share common 3' sequences: a review Gene 205:229-243
Okada N., K. Ohshima, 1995 Evolution of tRNA-derived SINEs Pp. 61-80 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes Company, Springer, Austin, Tex
Pearce S. R., G. Harrison, D. Li, J. S. Heslop-Harrison, A. Kumar, A. J. Flavell, 1996 The Ty1-copia group retrotransposons in Vicia species: copy number, sequence heterogeneity and chromosomal localisation Mol. Gen. Genet 250:305-315
SanMiguel P., A. Tikhonov, Y. K. Jin, et al. (11 co-authors) 1996 Nested retrotransposons in the intergenic regions of the maize genome Science 274:765-768
Schmidt T., 1999 LINEs, SINEs and repetitive DNA: non-LTR retrotransposons in plant genomes Plant Mol. Biol 40:903-910
Sinnett D., C. Richer, J. M. Deragon, D. Labuda, 1991 Alu RNA secondary structure consists of two independent 7SL RNA-like folding units J. Biol. Chem 266:8675-8678
Smit A. F. A., 1996 The origin of interspersed repeats in the human genome Curr. Opin. Genet. Dev 6:743-748
Strimmer K., A. Von Haeseler, 1997 Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment Proc. Natl. Acad. Sci. USA 94:6815-6819
Surzycki S. A., W. R. Belknap, 1999 Characterization of repetitive DNA elements in Arabidopsis J. Mol. Evol 48:684-691
Tatout C., L. Lavie, J. M. Deragon, 1998 Similar target site selection occurs in integration of plant and mammalian retroposons J. Mol. Evol 47:463-470
Tatout C., S. I. Warwick, A. Lenoir, J. M. Deragon, 1999 SINE insertions as clade markers for wild Cruciferae species Mol. Biol. Evol 16:1614-1621
The Arabidopsis Genome Initiative. 2000 Analysis of the genome sequence of the flowering plant Arabidopsis thaliana Nature 408:796-815
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Tikhonov A. P., C. Tatout, L. Lavie, J. L. Bennetzen, Z. Avramova, J. M. Deragon, 2001 Matrix-attachment regions (MARs) as target sites for SINE integration in Brassica genomes Chromosome Res 9:325-337
Weiner A. M., 2000 Do all SINEs lead to LINEs? Nat. Genet 24:332-333