*Department of Genètica Molecular, IBMB-CSIC, Barcelona;
Department of Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya
Abstract
Introduction
Miniature inverted-repeat transposable elements (MITEs) are a particular class of transposable element (TE) first described in plants but later found in other eukaryotic genomes (Bureau and Wessler 1992; Oosumi, Garlick, and Belknap 1996; Tu 1997). They are structurally similar to defective class II elements, but their high copy number and the size and sequence conservation of most MITE families suggest that they can be amplified by a replicative mechanism. It has recently been proposed that MITEs could be a particular type of defective class II element (Feschotte, Jiang, and Wessler 2002), some of them related to the pogo subclass of Mariner transposons or to bacterial insertion sequence elements (Feschotte and Mouchès 2000a; Le, Wright, and Bureau 2000; Zhang et al. 2001). Nevertheless, although it has been proposed that some MITE families could still be active in plants (Casacuberta et al. 1998; Zhang, Arbuckle, and Wessler 2000; Zhang et al. 2001), a mobile MITE copy that would allow the analysis of their transposition mechanism has yet to be characterized. In this context, analyzing the evolution of MITE families within their host genomes is probably the best available approach to study the lifestyle of these elements and the impact of their mobility on host genomes.
Here we present a genome-wide analysis of the Emigrant family of MITEs in Arabidopsis thaliana. To detect elements with a divergent internal sequence, representing either ancient Emigrant elements or previously undescribed low-copy-number Emigrant subfamilies, we developed a computer program that detects putative MITEs in a genomic sequence based solely on their terminal inverted-repeat (TIR) sequences. This approach has allowed us to perform, for the first time, an evolutionary analysis of a MITE family within a particular genome. Our results show that the different Emigrant subfamilies have probably been generated by the amplification of a small number of founder elements. They also show that, although Emigrant elements target very AT-rich regions for insertion, elements closely linked to genes are more frequently maintained during evolution.
Materials and Methods
The clustering program SPAT groups the sequences into a hierarchical classification, i.e., a nested sequence of partitions (Gordon 1999). Given a similarity measure between each pair of sequences, a complete weighted graph is constructed in which the nodes are the sequences and the weight of each edge is the similarity between the two sequences it connects. The maximum spanning tree (MST) of this graph is then found; this tree "spans" the graph, connecting all the nodes in such a way that the sum of the edge weights is maximized. The algorithm then proceeds by repeatedly removing the remaining edge with minimum weight, each removal dividing a tree into two disjoint subtrees (Zahn 1971; Delattre and Hansen 1980). A cluster is characterized by the weight of the removed edge that creates it and the weight of the edge whose removal later divides it. A cluster of sequences is considered cohesive when the successive removal of a significant number of edges does not change its composition.
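The divisive procedure described above can be illustrated with a short Python sketch. It is not the SPAT source code: it assumes a precomputed symmetric similarity matrix `sim`, builds the maximum spanning tree with Prim's algorithm (our choice; the text does not say which MST algorithm SPAT uses), and then removes tree edges from the weakest to the strongest, recording the partition into connected subtrees after each removal. All function names are hypothetical, and the cohesiveness threshold (how many successive removals a cluster must survive unchanged) is left to the caller.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[float, int, int]  # (similarity, node u, node v)


def maximum_spanning_tree(sim: List[List[float]]) -> List[Edge]:
    """Prim's algorithm on the complete similarity graph, keeping the heaviest edges."""
    n = len(sim)
    in_tree = [False] * n
    best = [float("-inf")] * n  # heaviest known edge joining each node to the tree
    parent = [-1] * n
    best[0] = 0.0  # start the tree from node 0
    edges: List[Edge] = []
    for _ in range(n):
        # add the outside node attached to the tree by the heaviest edge
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((sim[u][parent[u]], parent[u], u))
        for v in range(n):
            if not in_tree[v] and sim[u][v] > best[v]:
                best[v] = sim[u][v]
                parent[v] = u
    return edges


def components(n: int, edges: List[Edge]) -> List[Set[int]]:
    """Connected components (clusters) of the forest formed by the remaining edges."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for _, u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen: Set[int] = set()
    result: List[Set[int]] = []
    for start in range(n):
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node])
        seen |= comp
        result.append(comp)
    return result


def divisive_hierarchy(sim: List[List[float]]) -> List[Tuple[Optional[float], List[Set[int]]]]:
    """Remove MST edges from weakest to strongest, recording each nested partition."""
    n = len(sim)
    remaining = sorted(maximum_spanning_tree(sim))  # ascending similarity
    hierarchy: List[Tuple[Optional[float], List[Set[int]]]] = [(None, components(n, remaining))]
    while remaining:
        weakest = remaining.pop(0)  # its removal splits one subtree into two
        hierarchy.append((weakest[0], components(n, remaining)))
    return hierarchy
```

Cutting maximum-spanning-tree edges in order of increasing similarity yields the same nested partitions as single-linkage clustering, which is why each removal splits exactly one subtree into two.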
TRANSPO and SPAT are available at www.lsi.upc.es/alggen.
Emigrant Element Mining
The TRANSPO program was used to search the entire available Arabidopsis genome sequence (www.arabidopsis.org) for inverted-repeat sequences at least 75% identical to the first 20 nt of the previously defined Emigrant TIR (CAGTAAAACCTCTATAAATT) and separated by 200 to 700 nt. Overlapping elements generated from subterminal inverted-repeat sequences were eliminated. A pairwise similarity matrix was calculated, and the sequences were grouped using the SPAT program. A graphical representation of the distribution of the different elements along the Arabidopsis chromosomes was obtained using the program CLUPH.
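A minimal sketch of a TIR-based search of this kind is shown below. It is an illustration, not TRANSPO: it counts substitutions only (Hamming distance), so 75% identity over the 20-nt probe means at most 5 mismatches; whether TRANSPO also tolerates insertions and deletions is not stated here. It also assumes that the 200 to 700 nt window refers to the total element length, from the first base of the left TIR to the last base of the inverted right TIR, and it does not implement the removal of overlapping elements. Function names are hypothetical.

```python
import math
from typing import List, Tuple

# First 20 nt of the Emigrant TIR, as given in the text.
EMIGRANT_TIR = "CAGTAAAACCTCTATAAATT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def approximate_hits(genome: str, probe: str, max_mismatches: int) -> List[int]:
    """Start positions where `probe` matches with at most `max_mismatches` substitutions."""
    k = len(probe)
    hits = []
    for i in range(len(genome) - k + 1):
        mismatches = 0
        for a, b in zip(genome[i:i + k], probe):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # give up on this window early
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def find_candidate_elements(genome: str, tir: str = EMIGRANT_TIR,
                            min_identity: float = 0.75,
                            min_len: int = 200, max_len: int = 700) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, bounded by a TIR and its inverted copy."""
    genome = genome.upper()
    k = len(tir)
    # 75% identity over 20 nt means at least 15 matches, i.e. at most 5 mismatches.
    max_mm = k - math.ceil(k * min_identity)
    left = approximate_hits(genome, tir, max_mm)
    right = approximate_hits(genome, reverse_complement(tir), max_mm)
    candidates = []
    for s in left:
        for r in right:
            length = r + k - s  # first base of left TIR to last base of right TIR
            if min_len <= length <= max_len:
                candidates.append((s, r + k))
    return candidates
```

Because the right end of an element carries the reverse complement of the TIR, an element in the opposite orientation has the same structure on the forward strand, so scanning one strand is enough in this simplified model. The nested loop over left and right hits is quadratic and is written for clarity rather than for a 115-Mbp genome.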
To obtain information about the open reading frames (ORFs) located close to the Emigrant elements, the 30 nt flanking each Emigrant element upstream and downstream were used as probes in sequence similarity searches (BLAST 2.0; Altschul et al. 1990; http://www.ncbi.nlm.nih.gov/blast/). A table containing the BAC accession number and nucleotide position of each Emigrant element, as well as the name of, and distance to, the closest upstream and downstream ORFs, is available as supplementary material. BLAST searches with the sequences flanking Emigrant elements were also used to look for related empty sites (RESites).
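The preparation of the BLAST queries could look like the following sketch. The BLAST searches themselves are not reproduced. The function names are hypothetical, and the way an empty-site query is built (upstream flank joined to downstream flank, with the TA target-site duplication collapsed to a single copy) is our assumption about what a related empty site would look like, not a procedure stated in the text.

```python
from typing import Dict


def flanking_probes(genome: str, start: int, end: int, flank: int = 30) -> Dict[str, str]:
    """Upstream and downstream probes for an element occupying genome[start:end]."""
    return {
        "upstream": genome[max(0, start - flank):start],
        "downstream": genome[end:end + flank],
    }


def empty_site_query(genome: str, start: int, end: int,
                     flank: int = 30, tsd: str = "TA") -> str:
    """Hypothetical RESite query: both flanks joined, keeping one copy of the TSD.

    Assumes the element is bracketed by two copies of the target-site
    duplication, i.e. the upstream probe ends with `tsd` and the downstream
    probe starts with it; otherwise the flanks are joined unchanged.
    """
    probes = flanking_probes(genome, start, end, flank)
    upstream, downstream = probes["upstream"], probes["downstream"]
    if upstream.endswith(tsd) and downstream.startswith(tsd):
        downstream = downstream[len(tsd):]  # collapse the duplication
    return upstream + downstream


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record, wrapped at `width` characters."""
    lines = [">" + name] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```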
Phylogenetic Analysis
Sequences belonging to a particular group obtained with SPAT were aligned with the multiple-alignment program CLUSTAL W (version 1.5; Thompson, Higgins, and Gibson 1994) using the default parameters, with some minor refinements. DNADIST from Felsenstein's PHYLIP package (Felsenstein 1989) was used to generate a distance matrix based on the Jukes-Cantor model (Jukes and Cantor 1969), which was then used to build neighbor-joining trees (Saitou and Nei 1987). Bootstrap analyses were performed with the SEQBOOT and CONSENSE programs from PHYLIP (Felsenstein 1989). Sequence variability, measured as Nei's nucleotide diversity, π (Nei 1987), and its standard deviation were calculated with the program DnaSP (Rozas and Rozas 1999).
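For reference, the two quantities used above can be written out in a few lines. This Python sketch is not DNADIST or DnaSP: it computes the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) from the proportion p of differing sites, and Nei's nucleotide diversity π as the mean per-site difference over all pairs of aligned sequences. Gap handling by pairwise deletion is an assumption and may differ from the programs used here, and the standard deviation of π is not computed.

```python
import math
from itertools import combinations
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


def nucleotide_diversity(alignment: List[str]) -> float:
    """Nei's pi: mean per-site difference over all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

The correction matters most for divergent groups: as p approaches 0.75 the corrected distance grows without bound, which is why highly divergent sequences such as the Emi0 elements are poor material for distance-based trees.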
Results
We searched the complete available Arabidopsis sequence, which covers 115.4 Mbp of the 125-Mbp genome and does not include telomeres, centromeres, or the rDNA repeat regions, for Emigrant elements by looking for Emigrant TIRs (tolerating up to 25% divergence) separated by more than 200 nt and less than 700 nt. We identified 151 sequences that could represent Emigrant elements. All of them have Emigrant-like TIRs, are very AT-rich, and lack coding capacity, and most are flanked by the dinucleotide TA. Although all these characteristics are typical of MITE-related sequences, most of these sequences are not annotated as Emigrant elements, MITEs, or possible TEs in the databases.
The high variability of the internal sequence prevents a correct alignment of all 151 sequences and hence their analysis by phylogenetic methods. We therefore developed the program SPAT, which proceeds by eliminating the most divergent sequence of a given group, in order to group the sequences tentatively so that conventional phylogenetic methods could be applied. Based on pairwise identity comparisons, SPAT tentatively classified most of the 151 sequences into three main groups, which we have named EmiA (41 sequences), EmiB (26 sequences), and EmiC (37 sequences). Forty-seven sequences were too divergent to be included in any of these groups and have been named Emi0 elements.
All the previously described Emigrant elements (Casacuberta et al. 1998) belong to the EmiA group. We have previously shown that EmiA elements were mobile in the recent past, because some of them are polymorphic among Arabidopsis ecotypes (Casacuberta et al. 1998). To obtain data on the possible mobility of the other groups of elements, we searched the Arabidopsis genome for RESites, which represent genome duplication events that occurred before the transposition of these newly described elements. The presence of RESites within a genome has been successfully used as an indication of mobility when analyzing possible TEs within a single genome (Le, Wright, and Bureau 2000; Tu 2001). We found more than 20 well-conserved RESites corresponding to the different groups of Emigrant elements, although RESites were more frequent for the EmiA, EmiB, and EmiC groups than for the Emi0 group (not shown). These data, together with the presence in each case of a TA duplication accompanying the insertion of the element, strongly suggest that the different elements described here are indeed mobile elements related to the Emigrant family of MITEs.
Analysis of the Sequence and Size Variability of the Different Emigrant Subfamilies
The relatively high sequence identity within each of the EmiA, EmiB, and EmiC groups allowed us to perform conventional phylogenetic analyses. The sequences of each group were aligned using CLUSTAL W, and the alignments were used to obtain neighbor-joining trees. Figure 1 presents the resulting trees. Several monophyletic groups supported by high bootstrap values can be defined within each tree, and within each Emigrant group most of the sequences can be subdivided into three subfamilies (A1, A2, and A3; B1, B2, and B3; C1, C2, and C3). By aligning the sequences of each subfamily separately, we deduced a consensus sequence for each subfamily and compared these consensus sequences to obtain information about the phylogenetic relationships among the different Emigrant subfamilies. A neighbor-joining tree obtained from the consensus sequences of the subfamilies is also shown in figure 1. The three subfamilies of each group (EmiA, EmiB, and EmiC) seem phylogenetically related, because they cluster together with high bootstrap values.
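The text does not state the rule used to deduce each subfamily consensus. A common choice is a column-wise majority rule, sketched below under that assumption: a base is called when it occurs in more than half of the aligned sequences, columns where a gap is the majority are dropped, and columns without a clear majority are reported as N. The function name and the threshold are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(alignment: List[str], min_fraction: float = 0.5, gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*alignment):
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            consensus.append("N")  # no strict majority at this position
        elif base != gap:
            consensus.append(base)  # gap-majority columns are dropped
    return "".join(consensus)
```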
Position of Emigrant Elements Relative to ORFs
Although MITEs seem to target AT-rich regions, they have often been found close to transcribed sequences (Wessler, Bureau, and White 1995; Yang et al. 2001). We analyzed the regions flanking the 151 Emigrant insertions and calculated the distance to the ATG or stop codon of the closest predicted gene. Ten percent of the elements lie within a predicted gene (7% in introns and 3% in exons), 24% lie less than 500 nt from an ORF, and 23% lie between 500 and 1,000 nt from an ORF. Twenty-nine percent are located more than 1,000 nt from any ORF, and 13% are inserted within a repetitive region. Nevertheless, the position of the Emigrant elements with respect to the predicted genes varies greatly among the subfamilies analyzed (see table 1). Whereas 55.5% of the EmiB3 elements lie less than 500 nt from an ORF and 42% of the Emi0 elements lie within or close to a predicted gene, the vast majority of the EmiA2 elements (85%) lie more than 500 nt from any ORF.
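The distance classes used above can be made explicit with a small classifier. In this sketch each predicted gene is represented by hypothetical (start, end) coordinates spanning its ATG and stop codon, the distance is taken to the nearest gene boundary regardless of strand, and an element exactly 500 or 1,000 nt away is assigned to the 500-1,000 nt class; these boundary conventions are assumptions, and the repetitive-region class is not modeled.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed (start, end) coordinates, start <= end


def distance_to_gene(element: Interval, gene: Interval) -> int:
    """0 if the element overlaps the gene, otherwise the gap between them in nt."""
    e_start, e_end = element
    g_start, g_end = gene
    if e_end < g_start:
        return g_start - e_end  # element lies before the gene
    if g_end < e_start:
        return e_start - g_end  # element lies after the gene
    return 0


def classify(element: Interval, genes: List[Interval]) -> str:
    """Assign an element to the distance classes used in the text."""
    d = min(distance_to_gene(element, g) for g in genes)
    if d == 0:
        return "within gene"
    if d < 500:
        return "<500 nt"
    if d <= 1000:
        return "500-1,000 nt"
    return ">1,000 nt"
```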
Among the 53 Emigrant elements located less than 500 nt from the closest ORF, 46.5% are downstream of it, 27.5% are upstream, and 26% are within a predicted gene. These elements could affect promoter activity, splicing, transcriptional termination, or RNA stability, as well as the coding capacity of the ORF. We therefore analyzed these insertions in some detail, and figure 3 shows examples of such close-to-gene insertions. Figure 3A shows an Emi0 element found within the transcribed downstream region of the Det1 gene, as an example of an element lying downstream of an ORF. The availability of both the genomic and the cDNA sequences of Det1 allowed us to determine that Det1 transcription stops within the Emigrant element, probably using polyadenylation sequences provided by Emi116. Figure 3B shows an example of an Emi0 element located within a predicted gene coding for a GATA-like transcription factor. The insertion of the element has provided a new putative ATG and 48 new amino acids within the C-terminal region of the protein. We also found five Emigrant elements lying less than 500 nt from two different ORFs. The insertion of these elements could potentially affect the expression of both the upstream and the downstream genes. Alternatively, the insertion of an Emigrant element in these extremely short intergenic regions could help to avoid transcriptional interference between the two genes. In this respect, it is interesting to note that some MITEs have been proposed to act as matrix attachment regions that isolate their neighboring genes (Tikhonov, Bennetzen, and Avramova 2000). This possible effect of MITE insertion could be particularly useful in Arabidopsis, whose genome is very compact and in which genes are sometimes found extremely close to one another.
Discussion
Different Amplification Bursts of Emigrant Elements have Occurred During Arabidopsis Evolution
The 47 Emi0 sequences are too divergent to be included in any of the nine Emigrant subfamilies defined here. Their high divergence suggests that they represent old Emigrant insertions that have accumulated many mutations. The phylogenetic analysis of the other Emigrant elements shows that they belong to different subfamilies with different degrees of variability. Whereas the 20 EmiA2 elements are highly homogeneous, the EmiB3 subfamily is highly variable. This suggests that amplification bursts have occurred at different times during the evolution of Arabidopsis, giving rise to these different subfamilies: the more variable a subfamily is, the older the amplification burst that generated it should be. The star-like topology of the Emigrant subfamilies in the different trees suggests that each subfamily has been generated by the amplification of a single Emigrant element. This could be explained by the presence of only one active, or master, element capable of amplification at a particular moment, as predicted by the master gene model developed for SINEs (Deininger and Batzer 1995), or simply by assuming that MITE amplification is an extremely rare event occurring stochastically on any Emigrant element, which would thus act as the founder element of a new subfamily. In either scenario, the result is that only very few Emigrant elements have been amplified during the evolution of Arabidopsis and that the insertion dynamics of Emigrant elements has been very similar to that of most SINEs, in spite of the important differences in their transposition mechanisms.
The evolutionary dynamics shown here for Emigrant is probably shared by other elements, as highly conserved subfamilies within a single host genome have also been described for other MITEs (Tu 2001; Yang et al. 2001).
Elements Close to Genes have been Preferentially Maintained During Arabidopsis Evolution
Although MITEs seem to target very AT-rich regions, they have often been found close to transcribed sequences (Wessler, Bureau, and White 1995; Yang et al. 2001; Feschotte, Jiang, and Wessler 2002). Nevertheless, it is not known whether this preferential location results from their insertion specificity. On the other hand, a recent survey failed to detect transposon insertions in A. thaliana coding regions, suggesting purifying selection against deleterious mutations (Le, Wright, and Bureau 2000). The presence of mobile elements at particular locations within a genome results both from their transpositional activity and from the selection of the fittest genomes. Thus, elements transposing randomly within a genome can be found at particular locations as the result of positive selection for insertions at those sites or negative selection against insertions elsewhere. The effect of target site specificity should be more easily detected for recently inserted elements, whereas the effect of selection will be more apparent for ancient insertions. Comparing the distribution of ancient versus recent elements should therefore reveal the effect of selection and thus the impact of transposon insertions. We therefore compared the distribution, with respect to predicted genes, of the different Emigrant subfamilies, which represent amplification bursts that occurred at different times during Arabidopsis evolution, in order to determine their insertion specificity as well as the effect of selection and the impact of Emigrant insertions.
EmiA2 is the most homogeneous subfamily described here, both in sequence and in size, and probably represents the most recent burst of Emigrant amplification. Eighty-five percent of the 20 EmiA2 elements lie more than 500 nt from the closest ORF (see table 1). The Arabidopsis genome is extremely compact, and its intergenic regions are very short. The mean size of Arabidopsis genes is 2 kbp and there is one gene every 5 kbp, which implies that the mean distance between two genes is only 3 kbp (The Arabidopsis Genome Initiative 2000). Thus genic regions occupy 40% of the genome, and the regions closely linked to genes, which most probably contain gene regulatory regions (arbitrarily taken here as the 500 nt on either side of a gene), occupy a further 20%; genes and their potential regulatory regions therefore account for 60% of the genome. Regions not closely linked to genes occupy only 40% of the genome (20% lying between 500 and 1,000 nt from a gene, and 20% lying more than 1,000 nt away). The distribution of EmiA2 elements is thus far from random, with Emigrant elements inserting preferentially far from ORFs. This difference is statistically highly significant, with a chi-square value of 19.76, whereas the critical chi-square value for three degrees of freedom at the 99% level is 11.34. The strict specificity of Emigrant and other MITEs for the TA dinucleotide as insertion site, together with their preference for very AT-rich regions (74.3% AT in the case of Emigrant), probably helps these elements to avoid genes even in extremely compact genomes such as that of Arabidopsis.
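The expected 40/20/20/20 split and the test statistic can be reproduced as follows. This is a sketch under stated assumptions: genes are idealized as evenly spaced (one 2-kbp gene every 5 kbp), the 500-nt and 1,000-nt bands are counted on both sides of each gene, and a chi-square goodness-of-fit test against these proportions is assumed, since the text does not give the observed counts per class. Any counts passed to `chi_square_statistic` in an example are illustrative, not the table 1 values. The upper-tail probability uses the closed form for three degrees of freedom, Q(x) = erfc(sqrt(x/2)) + sqrt(2x/π) e^(-x/2). Function names are hypothetical.

```python
import math
from typing import List


def expected_proportions(gene_len: float, spacing: float,
                         near: float = 500.0, mid: float = 1000.0) -> List[float]:
    """Genome fractions [genic, <near, near-mid, >mid] for evenly spaced genes.

    Each gene is flanked on both sides by a band of width `near` and a band
    of width `mid - near`; whatever intergenic space is left is ">mid".
    """
    intergenic = spacing - gene_len
    genic = gene_len / spacing
    close = 2 * near / spacing
    middle = 2 * (mid - near) / spacing
    far = (intergenic - 2 * mid) / spacing
    return [genic, close, middle, far]


def chi_square_statistic(observed: List[int], proportions: List[float]) -> float:
    """Pearson goodness-of-fit statistic against the expected proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))


def chi_square_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
```

With these definitions, `expected_proportions(2000, 5000)` gives 0.4, 0.2, 0.2, and 0.2 (up to floating-point rounding), `chi_square_sf_df3(11.34)` is close to 0.01, consistent with the critical value quoted above, and the reported statistic of 19.76 corresponds to a tail probability below 0.001.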
This preference for regions far from genes is less pronounced in other Emigrant subfamilies. More than 50% of the EmiB3 elements, and 43% of the Emi0 group of elements, lie less than 500 nt from the closest ORF. Interestingly, the Emi0 group contains the most divergent Emigrant elements, and EmiB3 is one of the most variable subfamilies, suggesting that both represent the most ancient insertions and have been subjected to selection for a relatively long period of time. In particular, the difference in distribution between Emi0 and EmiA2 elements is statistically significant, with a chi-square value of 19.76 (three degrees of freedom; critical value at the 99% level, 11.34). The other Emigrant subfamilies show different distribution patterns with respect to ORFs, and although the small number of elements makes it difficult to draw conclusions in some cases, the more variable a subfamily, the more closely it tends to be associated with genes.
One possible explanation for these results would be a domestication of the Emigrant transposase, which would have learned to avoid genes during evolution, inserting Emigrant elements farther and farther from them. Alternatively, these results could indicate that, whereas Emigrant elements preferentially insert far from ORFs, the elements closely linked to genes are more frequently maintained during evolution. This is reminiscent of what has been shown for the Alu family of SINEs in the human genome. Alu elements tend to insert in AT-rich regions, and recently transposed Alu subfamilies are found in gene-poor regions, whereas ancient Alu subfamilies are found preferentially in GC-rich regions closely associated with genes (International Human Genome Sequencing Consortium 2001). Although other explanations have been put forward (Brookfield 2001; Batzer and Deininger 2002), it has been proposed that positive selection in favor of the minority of Alu elements in GC-rich DNA could explain the difference in distribution between old and new Alu subfamilies (International Human Genome Sequencing Consortium 2001). Emigrant and other MITEs resemble SINEs in their short size and high copy number, and here we have shown that their amplification dynamics is also very similar. Thus, although the redistribution of Emigrant elements could also result from a preferential loss of elements located far from genes, it is tempting to hypothesize that, as proposed for Alu elements in the human genome, there has been positive selection for Emigrant elements lying within or close to genes during Arabidopsis evolution.
A Role for Emigrant Elements in the Evolution of Arabidopsis Genes
Over the last 10 years a growing body of evidence has pointed toward a modular organization of the regulation of gene expression. Promoters, and probably terminators, consist of a complex array of regulatory elements. Most of these elements are found in many different promoters or terminators, although each promoter or terminator contains a particular combination of them. With the completion of genome sequencing projects it has become increasingly clear that the coding regions of eukaryotic genes are also often composed of domains or modules that have been reshuffled during evolution. Many different mechanisms can account for the amplification and distribution of particularly successful coding or regulatory modules, but short replicative elements such as SINEs and MITEs would be particularly suitable candidates for such a function. SINEs are frequently found within or close to genes in Arabidopsis (Lenoir et al. 2001) and other organisms (Makalowski 1995), and it has recently been found that some of them can play an important biological role as coding or transcriptional regulatory regions (Shimamura et al. 1998; Ferrigno et al. 2001; Goodyer, Zheng, and Hendy 2001; Landry, Medstrand, and Mager 2001; Ackerman et al. 2002). Moreover, it has been proposed that B2 SINEs may have the potential to distribute a functional pol II promoter throughout the genome (Ferrigno et al. 2001). Here we show that a high number of Emigrant elements located within potential promoters, terminators, introns, and coding sequences, where they may affect gene coding capacity or regulation, have been conserved during evolution. Although molecular experiments to determine unambiguously the impact of these insertions have yet to be performed, our results suggest that the insertion of Emigrant elements has played an important role in the evolution of Arabidopsis genes. MITEs, as has been proposed for SINEs, could have been recruited by genomes as an evolutionary mechanism to generate novel coding or regulatory sequences. The fact that MITEs can probably be excised (Petersen and Seberg 2000; Yang et al. 2001) makes them even more suitable for such a function.
Acknowledgements
Footnotes
Keywords: Arabidopsis, evolution, MITE, Emigrant, master element
Address for correspondence and reprints: Josep M. Casacuberta, Department of Genètica Molecular, IBMB-CSIC. Jordi Girona 18, 08034 Barcelona, Spain. E-mail: jcsgmp{at}cid.csic.es.
References
Ackerman H., I. Udalova, J. Hull, D. Kwiatkowski. 2002. Evolution of a polymorphic regulatory element in interferon-gamma through transposition and mutation. Mol. Biol. Evol. 19:884-890.
Altschul S. F., W. Gish, W. Miller, E. W. Myers, D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.
Batzer M. A., P. L. Deininger. 2002. Alu repeats and human genomic diversity. Nat. Rev. Genet. 3:370-379.
Brookfield J. F. 2001. Selection on Alu sequences? Curr. Biol. 11:R900-R901.
Bureau T. E., S. R. Wessler. 1992. Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4:1283-1294.
Casacuberta E., J. M. Casacuberta, P. Puigdomènech, A. Monfort. 1998. Presence of miniature inverted-repeat transposable elements (MITEs) in the genome of Arabidopsis thaliana: characterisation of the Emigrant family of elements. Plant J. 16:79-85.
Casacuberta E., P. Puigdomènech, A. Monfort. 2000. Distribution of microsatellites in relation to coding sequences within the Arabidopsis thaliana genome. Plant Sci. 157:97-104.
Deininger P. L., M. A. Batzer. 1995. SINE master genes and population biology. Pp. 43-60 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. RG Landes Company, Austin, Tex.
Delattre M., P. Hansen. 1980. Bicriterion cluster analysis. IEEE Trans. Pattern Anal. Mach. Intelligence 4:277-291.
Felsenstein J. 1989. PHYLIP: phylogeny inference package (version 3.56). Cladistics 5:164-166.
Ferrigno O., T. Virolle, Z. Djabari, J. P. Ortonne, R. J. White, D. Aberdam. 2001. Transposable B2 SINE elements can provide mobile RNA polymerase II promoters. Nat. Genet. 28:77-81.
Feschotte C., N. Jiang, S. R. Wessler. 2002. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3:329-341.
Feschotte C., C. Mouchès. 2000a. Evidence that a family of miniature inverted-repeat transposable elements (MITEs) from the Arabidopsis thaliana genome has arisen from a pogo-like DNA transposon. Mol. Biol. Evol. 17:730-737.
Feschotte C., C. Mouchès. 2000b. Recent amplification of miniature inverted-repeat transposable elements in the vector mosquito Culex pipiens: characterization of the Mimo family. Gene 250:109-116.
Goodyer C. G., H. Zheng, G. N. Hendy. 2001. Alu elements in human growth hormone receptor gene 5' untranslated region exons. J. Mol. Endocrinol. 27:357-366.
Gordon A. D. 1999. Classification. Chapman & Hall/CRC, New York.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-922.
Jukes T. H., C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kumar A., J. Bennetzen. 1999. Plant retrotransposons. Annu. Rev. Genet. 33:479-532.
Landry J. R., P. Medstrand, D. L. Mager. 2001. Repetitive elements in the 5' untranslated region of a human zinc-finger gene modulate transcription and translation efficiency. Genomics 76:110-116.
Le Q. H., S. Wright, T. Bureau. 2000. Transposon diversity in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 97:7376-7381.
Lenoir A., L. Lavie, J. L. Prieto, C. Goubely, J. C. Cote, T. Pelissier, J. M. Deragon. 2001. The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol. Biol. Evol. 18:2315-2322.
Makalowski W. 1995. SINEs as a genomic scrap yard: an essay on genomic evolution. Pp. 81-104 in R. J. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. RG Landes Company, Austin, Tex.
Myers G. 1998. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Proc. Ninth Combinatorial Pattern Matching Conference. Springer-Verlag LNCS Series 1448:1-13.
Nei M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Oosumi T., B. Garlick, W. R. Belknap. 1996. Identification of putative nonautonomous transposable elements associated with several transposon families in Caenorhabditis elegans. J. Mol. Evol. 43:11-18.
Petersen G., O. Seberg. 2000. Phylogenetic evidence for excision of Stowaway miniature inverted-repeat transposable elements in Triticeae (Poaceae). Mol. Biol. Evol. 17:1589-1596.
Rozas J., R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Saitou N., M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Shimamura M., M. Nikaido, K. Ohshima, N. Okada. 1998. A SINE that acquired a role in signal transduction during evolution. Mol. Biol. Evol. 15:923-925.
Thompson J. D., D. G. Higgins, T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
The Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815.
Tikhonov A. P., J. L. Bennetzen, Z. V. Avramova. 2000. Structural domains and matrix attachment regions along colinear chromosomal segments of maize and sorghum. Plant Cell 12:249-264.
Tu Z. 1997. Three novel families of miniature inverted-repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci. USA 94:7475-7480.
Tu Z. 2001. Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc. Natl. Acad. Sci. USA 98:1699-1704.
Turcotte K., S. Srinivasan, T. Bureau. 2001. Survey of transposable elements from rice genomic sequences. Plant J. 25:169-179.
Wessler S., T. Bureau, S. E. White. 1995. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5:814-821.
Yang G., J. Dong, M. B. Chandrasekharan, T. C. Hall. 2001. Kiddo, a new transposable element family closely associated with rice genes. Mol. Genet. Genomics 266:417-424.
Zahn C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20:68-86.
Zhang Q., J. Arbuckle, S. R. Wessler. 2000. Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions in maize. Proc. Natl. Acad. Sci. USA 97:1160-1165.
Zhang X., C. Feschotte, Q. Zhang, N. Jiang, W. R. Eggleston, S. R. Wessler. 2001. P instability factor: an active maize transposon system associated with the amplification of Tourist-like MITEs and a new superfamily of transposases. Proc. Natl. Acad. Sci. USA 98:12572-12577.