Mobile Elements and the Genesis of Microsatellites in Dipterans

Jason Wilder and Hope Hollocher

Department of Ecology and Evolutionary Biology, Princeton University


    Abstract
Factors that influence the genesis and genomic distribution of microsatellite DNA are poorly understood. We have identified a novel class of Dipteran mobile elements, mini-me elements, which help elucidate both of these issues. These retroposons contain two internal proto-microsatellite regions that commonly expand into lengthy microsatellite repeats. These elements are highly abundant, accounting for approximately 1.2% of the Drosophila melanogaster genome, giving them the potential to be a prolific source of microsatellite DNA variation. They also give us the opportunity to observe the outcomes of multiple microsatellite genesis events (initiating from the same proto-microsatellite) at separate mini-me loci. Based on these observations, we determined that the genesis of microsatellites within mini-me elements occurs through two separate mutational processes: the expansion of preexisting tandem repeats and the conversion of sequence with high cryptic simplicity into tandemly repetitive DNA. These modes of microsatellite genesis can be generalized beyond the case of mini-me elements and help to explain the genesis of microsatellites in any sequence region that is not constrained by selection.


    Introduction
Microsatellites are hypervariable DNA sequences composed of tandemly repeated short motifs (Tautz and Renz 1984a; Tautz 1989). They are among the most commonly used molecular markers in population and evolutionary biology and are widely used in applications such as parentage analysis, gene mapping, assessments of population structure, and, more recently, phylogenetics (Schlötterer and Pemberton 1994; Goldstein and Pollock 1997). Additionally, recent studies have implicated microsatellites as the cause of numerous human diseases (Rubinsztein 1999). Despite these tremendous impacts on medical and biological science, the factors that determine the genesis and genomic distribution of microsatellite sequences remain obscure (Hancock 1999). DNA slippage, or the misalignment of repetitive DNA strands, has been described as the mechanism causing variation in the repeat number of preexisting microsatellites (reviewed in Eisen 1999), but few studies have elucidated the mechanisms that are actually involved in the creation of microsatellite DNA. Here, we identify a novel class of Dipteran retroposons, microsatellite initiating mobile elements (or mini-me), which serve both as dispersal agents and as sites for the genesis of microsatellite DNA. These abundant elements are widely conserved across at least three Dipteran families and illustrate a novel mechanism through which microsatellites are generated.

Although the association between mobile elements and microsatellite sequences has previously been documented, the actual molecular mechanisms underlying this relationship are not well understood. The influence of mobile element activity on the genesis of microsatellite repeats is best known from Alu and other mammalian short interspersed elements (SINEs; Alexander, Rohrer, and Beattie 1995; Arcot et al. 1995; Nadir et al. 1996; Gallagher et al. 1999). In these mobile elements, it is thought that retrotranscripts undergo 3' polyadenylation prior to their incorporation into the genome. As a result, polyA microsatellites have a strong association with the 3' ends of these SINEs, a feature which may serve to guide their retroposition in the genome (Nadir et al. 1996). While this mechanism is a convincing explanation for the association between SINE elements and their 3' polyA microsatellites, it cannot be invoked to address the origins of microsatellites that are associated with other regions of retroposons. For instance, Ramsay et al. (1999) found that microsatellites can also be associated with both 5' and internal regions of some retroposons, an association that the above mechanism does not explain.

In our study, we describe microsatellites that arise at internal regions of the mini-me retroposon. We have identified two loci, or proto-microsatellites, where microsatellites have repeatedly evolved. Because these elements are extremely abundant in the genome, we were able to observe the outcome of numerous microsatellite genesis events, all of which were generated from the same proto-microsatellites. Based on these observations, we characterize two modes of microsatellite genesis that can be applied to the creation of microsatellite loci anywhere in the genome.


    Materials and Methods
Drosophila Strains
We first detected mini-me elements in two species, Drosophila dunni dunni and Drosophila nigrodunni, members of the cardini species group. Strains were obtained from the National Drosophila Species Resource Center in Bowling Green, Ohio. Drosophila dunni specimens were taken from strain 15182-2291.0 (Mayaguez, Puerto Rico), and D. nigrodunni specimens were taken from strain 15182-2311.1 (Monkey Hill, Barbados).

DNA Isolation, Cloning, and Sequencing
Microsatellite libraries were created for D. dunni and D. nigrodunni by T. Diaz, R. W. O'Neill, and J. Kenney. Details of the microsatellite library construction will be presented in a separate manuscript; therefore, we present only a summary of the procedure here. Plasmid libraries for each species were created using DNA isolated from 200 starved, mixed-sex flies. Genomic DNA was digested using the restriction enzymes Sau3AI and RsaI and then ligated into the pBluescript II KS (+/-) cloning vector (Stratagene). The plasmid libraries were then probed using di-, tri-, and tetranucleotide repeat sequences labeled with [33P]γ-dATP. We sequenced 85 clones that hybridized strongly with the microsatellite probes along a single strand, either manually with an AmpliCycle sequencing kit (Perkin-Elmer) or using an ABI PRISM 377 automated DNA sequencer (Perkin-Elmer) maintained by the Princeton University synthesizing/sequencing facility.

Sequence Analysis and mini-me Isolation
We individually analyzed the sequences of 37 clones containing microsatellites from D. nigrodunni and 38 clones from D. dunni. Using the program BioEdit, version 4.7.4 (Hall 1999), we manually aligned the flanking regions of cloned microsatellites. Based on the alignment, we identified a short, highly conserved sequence adjacent to numerous microsatellites. Between November 1999 and March 2000, we performed BLAST searches using this short conserved region as a query string against the GenBank database (Altschul et al. 1997). Our analysis initially focused on three separate sequence regions: intron 1 of the RPII215 gene from the species Drosophila madeirensis, Drosophila subobscura, and Drosophila guanche (accession numbers Y18877, Y18876, and Y18878); the distal breakpoint of chromosomal inversion 2j in Drosophila buzzatii (accession numbers AF162799 and AF162797); and the DINE-1 element of Drosophila melanogaster (accession number U66884).

Sequences from the regions listed above were manually aligned using the highly conserved sequence region as a point of reference. Despite an overall lack of conserved nucleotide identity, numerous features were shared between these sequences. We identified a set of these conserved features (described below; fig. 1) and used these to create a composite pattern defining the mini-me class of Dipteran genomic elements.



Fig. 1.—The structure of Dipteran mini-me elements. A, Inverted repeats lie at both termini of the mobile elements (shaded triangles), with a partial duplication of the inverted repeat at the 5' end. The 3' repeat lies in a slightly preterminal position, 25–45 bp from the terminus of the retroposon. The highly conserved 33-bp core region is indicated by a black box. Immediately flanking this region are two separate proto-microsatellites (hatched boxes). Upstream of the core region is the (TA)n-producing proto-microsatellite, while the (GTCY)n-producing proto-microsatellite is downstream of the core region. B, The partial inverted repeat at the 5' terminus of mini-me has the capacity to produce a strong hairpin secondary structure, as illustrated by representative elements from Stomoxys calcitrans (Muscidae) and Drosophila nigrodunni. These loop structures are not produced at the 3' terminus of the retroposon

 
Phylogenetic Analysis of mini-me Across Dipteran Taxa
In order to identify copies of mini-me elements from additional taxa, we extended our BLAST search to the entire GenBank database. While our search string was still composed of the short conserved sequence region, positive hits were narrowed by comparing them with the newly characterized mini-me composite pattern. Sequences that contained the elements specified by the pattern were considered mini-mes and incorporated into subsequent analyses. These elements were examined for potential conserved coding regions and were also examined for the presence of RNA polymerase III promoter motifs and tRNA-derived elements using the program Pol3scan (Pavesi et al. 1994).

To confirm that the sequences isolated in this manner represented a discrete class of genomic elements (and not a random collection of sequences that had arbitrarily converged on our pattern-based definition), we performed a phylogenetic analysis on one section of the mini-me element. This section was approximately 120 bp in length and spanned from the putative 5' terminus of mini-me to the end of the conserved region (see fig. 1). Sequences were aligned using the program CLUSTAL W, version 1.4 (Thompson, Higgins, and Gibson 1994), and then rechecked manually. The alignment incorporated 28 copies of mini-me elements, isolated from 14 different species (accession numbers AF182164, D89934, M30316, M89990, U49102, AF098329, AF162799, X12536, L13721, AF012415, AF043638, X55391, Y18876, Y18877, AF043637, U66884, AF025540, X01918, AC003925, X62679, and AC005720). The elements used were chosen to maximize phylogenetic representation broadly across the order Diptera. An unrooted neighbor-joining tree was constructed using PAUP* (Swofford 1998). Maximum likelihood was used to calculate genetic distances based on the HKY substitution model (Hasegawa, Kishino, and Yano 1985), allowing variable substitution rates, with model parameters estimated during the tree search.

Genomic Distribution of mini-me Elements in D. melanogaster
In order to determine the density and distribution of mini-me elements in D. melanogaster, the entire genome (Celera, Drosophila Genome Project, version 1) was searched globally using a mini-me element isolated immediately downstream of the hermaphrodite gene (accession number AF025540). Additionally, chromosome 4 was analyzed separately in order to verify the results obtained in the global genome analysis. For chromosome 4, each positive hit recorded from a local BLAST search was manually verified (in order to check for false positives), and the number of mini-me elements counted multiple times was tallied (in order to quantify overcounting). A BLAST E-value cutoff of 1.0 was used for each of these analyses, and the data were not filtered in any way.
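The overcounting tally described above can be pictured as an interval-merging step. The following is a minimal sketch rather than the procedure actually used: the hit coordinates, the tuple layout, and the function names `merge_hits` and `overcount_rate` are hypothetical, and it assumes that hits whose coordinates overlap on the same sequence belong to a single element.

```python
from typing import List, Tuple

# Hypothetical hit layout: (sequence name, start, end)
Hit = Tuple[str, int, int]


def merge_hits(hits: List[Hit]) -> List[Hit]:
    """Collapse hits that overlap on the same sequence into one element."""
    merged: List[Hit] = []
    for name, start, end in sorted(hits):
        if merged and merged[-1][0] == name and start <= merged[-1][2]:
            prev_name, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_name, prev_start, max(prev_end, end))
        else:
            merged.append((name, start, end))
    return merged


def overcount_rate(hits: List[Hit]) -> float:
    """Fraction of raw hits that are redundant once overlaps are merged."""
    if not hits:
        return 0.0
    return (len(hits) - len(merge_hits(hits))) / len(hits)


# Made-up coordinates, for illustration only.
example_hits = [
    ("chr4", 100, 600),
    ("chr4", 550, 900),    # overlaps the first hit, so it is the same element
    ("chr4", 5000, 5600),
    ("chr4", 9000, 9500),
]
print(merge_hits(example_hits))
print(overcount_rate(example_hits))
```

In this toy input, four raw hits collapse to three elements, so one hit in four is redundant.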


    Results
Analysis of Microsatellite-Flanking Regions from D. dunni and D. nigrodunni
We sequenced 75 clones from our plasmid library that hybridized successfully with the di-, tri-, and tetranucleotide repeat probes (38 clones from D. dunni and 37 clones from D. nigrodunni). From these, we isolated 51 microsatellite loci, 5 of which were of the tetranucleotide motif (GTCY)n with n ≥ 4. Each of these 5 loci contained a highly conserved 33-bp region immediately flanking the microsatellite (fig. 2).



Fig. 2.—Conserved region with downstream microsatellite. The 33-bp region immediately adjacent to the (GTCY)n microsatellite (underlined) is conserved across numerous loci in the species Drosophila nigrodunni and Drosophila dunni. The tetranucleotide microsatellite, which is variable in length between loci, is immediately downstream of the 33-bp sequence

 
Analysis of (GTCY)n-Flanking Regions Reveals mini-me Element
A BLAST search of GenBank revealed numerous sequences from many different Dipteran taxa with high similarity to the 33-bp (GTCY)n-flanking region. Initially, we focused on three different sets of sequences, chosen because they contained a (GTCY)n microsatellite flanked by the 33-bp conserved region and because they provided broad phylogenetic representation across the Drosophila genus. The first of these, DINE-1, had previously been described as a repetitive element from the genome of D. melanogaster and was thought to be the remnant of a mobile element from that species (Locke et al. 1999). The second, ISBu-1 from D. buzzatii, is a sequence whose presence is known to be polymorphic at the distal breakpoint of the 2j chromosomal inversion (Cáceres et al. 1999). The final locus, which is located in intron 1 of the RPII215 gene from members of the obscura species group, is present in recently derived members of the group (D. madeirensis and D. subobscura) but is absent in more ancestral species (D. guanche), indicating that the sequence has recently inserted itself at this locus (obscura group phylogeny from O'Grady [1999]). By analyzing each of these sequences in conjunction with the (GTCY)n-flanking regions that we isolated from D. dunni and D. nigrodunni, we found that each contained a number of previously unrecognized shared features (fig. 1), which we used to define the class of mini-me elements. These features are present despite an overall lack of sequence identity between mini-me elements from separate species.

Conserved at both termini of mini-me elements are perfect inverted repeats that range in size from 10 to 20 bp (table 1). At the 5' end of the mini-me elements, the inverted repeat is partially duplicated, forming a complementary palindrome with the apparent ability to form a hairpin secondary structure. The 3' inverted repeat, which is located in a slightly preterminal position, cannot form this hairpin structure. Also conserved in mini-me elements is a core 33-bp motif that begins approximately 110 bp from the 5' end of the element. This core region has high sequence identity (approximately 80%) across all copies of the element, indicating a possible functional role in transposition of the elements. From the (GTCY)n microsatellite to the 3' preterminal inverted repeat, sequence identity is totally lost in between-species comparisons of the mini-me elements. Subsequent analysis has shown that even between copies of mini-me elements within a single species, the length and sequence identity in this region can be extremely variable. Despite these variations, the 3' end of the mini-me elements is clearly defined by the presence of the previously mentioned preterminal inverted repeat.
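The terminal inverted repeat described above can be checked computationally by comparing the 5' terminus with the reverse complement of a site near the 3' terminus. The sketch below is only an illustration under stated assumptions: the element sequence is invented, the function names are hypothetical, and the 45-bp search window and 10-bp minimum are taken loosely from the ranges quoted in the text (a preterminal repeat 25–45 bp from the terminus, repeats of 10 to 20 bp).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat(element: str, max_offset: int = 45, min_len: int = 10):
    """Find the longest perfect inverted repeat between the 5' terminus and a
    site lying 0..max_offset bp inside the 3' terminus.

    Returns (repeat_length, offset_from_3prime_end), or (0, None) when no
    repeat of at least min_len bp exists.
    """
    best = (0, None)
    for offset in range(max_offset + 1):
        tail = element[: len(element) - offset]
        rc_tail = reverse_complement(tail)
        limit = len(tail) // 2  # the two arms of the repeat must not overlap
        length = 0
        while length < limit and element[length] == rc_tail[length]:
            length += 1
        if length >= min_len and length > best[0]:
            best = (length, offset)
    return best


# Invented element: a 12-bp repeat arm, filler, its reverse complement, and a
# 30-bp tail that places the 3' arm in a preterminal position.
arm = "GACCTGTTAGCA"
example_element = (
    arm
    + "ACGTACGTTTGGAACC" * 2
    + reverse_complement(arm)
    + "TTTTTCCCCCAAAAAGGGGGTTTTTCCCCC"
)
print(terminal_inverted_repeat(example_element))
```

For this invented element the function reports a 12-bp repeat whose 3' arm ends 30 bp before the terminus.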


Table 1 Inverted Repeat Sequences of Mini-me Elements from Different Host Organisms

 
Mini-me Elements from Additional Taxa
Based on the features described above, we subsequently identified mini-me elements in 17 Drosophila species and 4 additional Dipteran species from two other families (Calliphoridae: Calliphora vicina and Lucilia cuprina; Muscidae: Musca domestica and Stomoxys calcitrans). Each of the sequences isolated from GenBank contains all of the features described above, and the sequences range from 500 to 1,200 bases in length. The majority of the length differences are due to insertions or deletions in the 3' region of the element or variation in the size of the (GTCY)n microsatellite.

Overall sequence identity is poorly conserved outside of the 33-bp region. This is evidenced in table 1, which shows the sequences of the inverted repeats from mini-me elements. Both the length and the sequence of the repeats are variable between taxa, although their presence at the termini of the elements was universal. The region between the 5' inverted repeat and the 33-bp core region could be aligned between mini-me isolates from different species (described below). However, beyond this region, sequence alignments between species became completely arbitrary. Across mini-me elements from all species, we could find no apparent coding regions or RNA polymerase III promoters that might enhance the mobility of these sequences.

Phylogenetic Analysis of mini-me Elements
Because our definition of mini-me elements is based on conserved sequence patterns but not conserved sequence identity, we performed a phylogenetic analysis of the mini-me region between the 5' terminus and the 33-bp core region. This analysis revealed a tight clustering of mini-me elements in complete concordance with the recognized taxonomic grouping of their host species (Throckmorton 1975; fig. 3). Mini-me elements from closely related taxa are clearly more similar to each other than to elements from more distantly related taxa. The phylogeny of the mini-me elements reveals clades at several taxonomic levels, including within each major family group, within Drosophila subgenera, and within individual species or closely related species groups. Thus, we believe that mini-me elements have been transmitted vertically between species since very early in the Dipteran radiation. Through this process, the nucleotide sequences of mini-me elements evolved to become nearly unrecognizable between lineages but retained basic sequence features and structural characteristics throughout the radiation.



Fig. 3.—Phylogenetic relationships among mini-me taxa. A neighbor-joining tree was constructed using PAUP* among aligned sequences from 28 retroposon isolates, derived from 14 different species. Bootstrap values (out of 100) provide support for the major nodes that distinguish each major Dipteran clade

 
Genomic Density of mini-me Elements
Our searches of the entire D. melanogaster genome revealed 3,395 copies of the mini-me element, or a density of 1 per 36 kb of euchromatic sequence. Our BLAST search of chromosome 4 had a 6.3% rate of overcounting (single elements counted twice in the BLAST search), indicating that the total number in the genome might be closer to 3,200 copies (1 per 38 kb). We encountered no false positives in our search of chromosome 4. Mini-me elements were three times as abundant on chromosome 4 as in the genome as a whole, with a density of 1 per 13 kb of sequence (89 copies in 1.16 Mb of sequence). This indicates that the distribution of these elements is variable across the genome.
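The copy-number and density figures above follow from simple arithmetic, reproduced here as a quick check. This is a sketch, not part of the original analysis: the variable names are ours, and the total length of searched sequence is not stated directly, so it is inferred from the reported raw density of 1 element per 36 kb.

```python
raw_hits = 3395          # copies reported by the whole-genome search
overcount_rate = 0.063   # overcounting rate measured on chromosome 4
raw_density_kb = 36      # reported raw density: 1 element per 36 kb

# Copy number after discounting elements counted twice.
corrected_copies = raw_hits * (1 - overcount_rate)

# ASSUMPTION: searched sequence length inferred from the reported raw density.
implied_genome_kb = raw_hits * raw_density_kb
corrected_density_kb = implied_genome_kb / corrected_copies

# Chromosome 4: 89 copies in 1.16 Mb of sequence.
chr4_density_kb = 1160 / 89

print(round(corrected_copies))            # about 3,200 copies
print(round(corrected_density_kb))        # about 1 per 38 kb
print(round(chr4_density_kb))             # about 1 per 13 kb
print(round(corrected_density_kb / chr4_density_kb, 1))  # roughly threefold enrichment
```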

Analysis of Microsatellites Generated Within mini-me Elements
Microsatellites are regularly associated with internal regions of mini-me elements. Two sites in particular tend to be composed of microsatellite repeats. The first, the tetranucleotide repeat region which initially signaled to us the presence of mini-me elements, lies immediately downstream of the 33-bp core region (see fig. 1). Microsatellites that are derived from this locus tend to be based on only two different 4-bp repeat motifs, (GTCT)n and (GTCC)n. While perfect repeats as long as (GTCT)13 have been observed at this locus, longer complex microsatellites (based on mixtures of both 4-bp motifs) are also common. The longest observed allele generated at this locus contains mixed (GTCT)n and (GTCC)n motifs and has expanded to a total size of (GTCY)37, which is among the largest microsatellite alleles ever reported from any Drosophila species. While microsatellites are common at this site, they are not universal among mini-me elements. Of the 47 mini-me elements analyzed, 23 (49%) had at least one perfect tandem repeat and had an average of 5.1 perfectly repeated (GTCT)n or (GTCC)n tandem motifs. The remaining 51% did not contain any tandemly repetitive DNA but did contain a 12-bp sequence composed of only the bases G, T, and C. The presence or absence of microsatellites at this site, as well as the lengths of microsatellites that did occur, was variable between mini-me copies both within and between separate Dipteran species.
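The distinction drawn above between perfect repeats, mixed (GTCY)n repeats, and non-repetitive G/T/C-only sequence can be made concrete with a short scan. The sketch below is illustrative only: the example sequences are invented, the function names are hypothetical, and it assumes that a "tandem repeat" means at least two adjacent copies of the 4-bp unit, a threshold the text does not specify.

```python
import re

# Y is the IUPAC code for C or T, so (GTCY)n covers both GTCT and GTCC units.
MIXED_RUN = re.compile(r"(?:GTC[CT]){2,}")
PERFECT_RUN = re.compile(r"(?:GTCT){2,}|(?:GTCC){2,}")


def longest_run(seq: str, pattern: "re.Pattern[str]", unit: int = 4) -> int:
    """Number of repeat units in the longest match of pattern (0 if none)."""
    lengths = [len(m.group(0)) // unit for m in pattern.finditer(seq)]
    return max(lengths, default=0)


def is_gtc_only(window: str) -> bool:
    """True if a non-empty window contains only the bases G, T, and C."""
    return bool(window) and set(window) <= set("GTC")


# Invented sequence: two GTCT units followed by three GTCC units.
example = "AAGTCTGTCTGTCCGTCCGTCCAA"
print(longest_run(example, MIXED_RUN))    # mixed (GTCY)n run length in units
print(longest_run(example, PERFECT_RUN))  # longest single-motif run in units
print(is_gtc_only("GTCTGTCCGTTC"))        # hypothetical 12-bp G/T/C-only window
```

Here the mixed run spans five units, while the longest perfect run is the three-unit (GTCC) stretch.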

The second microsatellite within mini-me elements produces simpler and shorter (TA)n repeats. Unlike the previous proto-microsatellite locus, tandem repeats were almost always present in this region, which lies immediately upstream of the 33-bp core sequence. In most copies of mini-me elements, including those which evidence suggests have been recently mobile, this region consists of a (TA)4 repeat. However, variability in size at this locus seems quite common, with continuous variation observed between (TA)1 and (TA)6. Again, the length variation at this microsatellite exists between mini-me loci both within and between species.


    Discussion
Our analysis of microsatellite-flanking regions in D. dunni and D. nigrodunni indicated that (GTCY)n microsatellite repeats are strongly associated with a novel mobile element that is widely conserved across Dipteran taxa. The sequence composition of these mini-me elements is extremely variable between species, although a pattern of structural sequence features is universally present. Additionally, the propensity to form internal microsatellites appears to be a regular feature of these elements. We believe that these elements are mobile within the genome. Evidence for this notion comes from observations of the activity of these elements in a number of taxa. The mobility of mini-me elements can most clearly be seen in the case of ISBu-1, the mini-me from D. buzzatii, in which the insertion of the element is temporally correlated with the downstream insertion of the Galileo transposon (Cáceres et al. 1999). Additional evidence comes from analyses of orthologous genetic loci among closely related species groups. For instance, a mini-me element has been inserted into the intron of the RPII215 gene in D. subobscura and D. madeirensis but not D. guanche, from which the former two are recently derived. A similar case exists in D. mojavensis, in which a mini-me element is inserted between the Adh-1 and Adh-2 genes but is not present at this locus in closely related members of the repleta species group, such as D. mulleri (accession numbers X12536 and X03048).

Mini-me elements are unusual in that they do not appear to code for any transposition enzymes such as transposase or reverse transcriptase. Consequently, these elements probably do not contain the necessary genetic machinery to initiate their own transposition. We also have not observed them to excise themselves from the genome once incorporated at a locus, indicating that they are probably a retro-element of some sort. However, they do not contain the usual hallmarks of the SINE family of elements, which are among the most common retro-elements in the eukaryotic genome (Makalowski 1995). Most notably, mini-me elements lack an internal RNA polymerase III promoter or other type of tRNA-derived sequence. Because mini-me elements are not structurally similar to any other known type of mobile sequences, we classify them as a new group of non-LTR retroposons.

Mini-me Elements Are Highly Abundant in the Host Genome
If D. melanogaster can be considered typical, mini-me elements are extraordinarily abundant in the Drosophila genome. With a copy number of >3,000 and a minimal observed size of 500 bp, these elements represent at least 1.2% of the total euchromatic genome of D. melanogaster. Despite their high copy number, their presence in the genome has not been widely detected. This is true even in highly studied portions of the genome, such as the recently annotated 2.9-Mb Adh region (Ashburner et al. 1999), in which we detected at least 11 copies of mini-me elements. Although, as stated earlier, individual copies of mini-me elements have previously been isolated from a number of species, including D. buzzatii and D. melanogaster, and also from various species in the D. obscura and D. funebris species groups (Marfany and Gonzàlez-Duarte 1992; Steinemann and Steinemann 1993; Hagemann et al. 1998; Amador and Juan 1999), their prevalence in the host genome and the fact that these copies all belong to the same class of genomic elements have not previously been reported.
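The 1.2% lower bound can be reproduced from the numbers given in the text. This is a rough check under one labeled assumption: the euchromatic genome size is not stated here, so it is inferred from the Results figures of 3,395 copies at 1 per 36 kb (about 122 Mb); the variable names are ours.

```python
min_copies = 3000      # lower bound on copy number (">3,000")
min_size_bp = 500      # minimal observed element size

# ASSUMPTION: euchromatic sequence length inferred from 3,395 copies at a
# reported density of 1 per 36 kb; it is not given directly in the text.
euchromatin_bp = 3395 * 36_000

percent_of_genome = 100 * min_copies * min_size_bp / euchromatin_bp
print(round(percent_of_genome, 1))  # lower bound, in percent
```

Because both the copy number and the element size are minimum values, the true fraction would be larger if typical elements are longer than 500 bp.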

Microsatellite Genesis in mini-me Elements
We observed that two types of microsatellites are regularly associated with mini-me elements, (TA)n and (GTCY)n repeats. These microsatellites are variable in length among copies of mini-me elements both within and between host taxa. Since our data indicate that mini-me elements are more closely related to each other within species than between species, we believe that this microsatellite variation has arisen de novo in each host lineage. Because microsatellite genesis repeatedly occurs at two specific loci within mini-me elements, we consider these areas "proto-microsatellites." While we do not contend that mini-me elements are responsible for all, or even most, Dipteran microsatellites, our observations of the patterns of molecular evolution that characterize mini-me proto-microsatellites have revealed two major modes of microsatellite genesis that may be generalized across the entire eukaryotic genome.

The first mode of microsatellite genesis is illustrated by the (TA)n repeat and is relatively simple. At this locus, microsatellite variation arises due to the addition or subtraction of repeat motifs from preexisting tandemly repeated DNA. In this case, a (TA)4 repeat is well conserved across most mini-me retroposon copies. Because the bases T and A are overrepresented in the genome (each comprising approximately 28%), it is possible that this is simply a chance feature of the mini-me element that has remained in all Dipteran lineages. Alternatively, the conservation of this locus might be explained by its being a necessary sequence feature for mini-me retroposition. Regardless, mini-me copies that become fixed in the genome contain this (TA)4 sequence, and this repeat invariably accumulates mutations along with the rest of the element (see fig. 4A). Because this locus is composed of tandemly repetitive DNA, slippage mutations may be common, which would tend to produce variation in the number of repeated motifs. Slippage mutations are the mechanism commonly accepted to cause variation at most microsatellite loci (Levinson and Gutman 1987). Here, we simply apply this mechanism to tandem repeats that are inserted into the genome through retroposition. Microsatellites produced at this locus tend to be invariant with respect to repeat motif, indicating that slippage mutations begin to accumulate before site substitutions break up the (TA)4 proto-microsatellite.



Fig. 4.—A model for the genesis of mini-me microsatellite motifs. A, Functional mini-me elements containing the proto-microsatellite sequence motifs (hatched boxes flanking the conserved 33-bp core sequence represented in black) replicate and integrate into the genome at many different sites. Once incorporated into the genome, mini-me elements are fixed at the locus but can still replicate and produce functional copies. As is the case for most retroposons, selection no longer acts on the fixed element; therefore, mutations begin to accumulate (progressively lowering the ability of the element to produce functional copies). Preexisting tandem DNA repeats (e.g., (TA)4) are immediately subject to expansion or reduction through replication slippage. Cryptically simple DNA is subject to a two-step mutational process (see B) which generates tandemly duplicated DNA (e.g., (GTCY)2) that is then also subject to replication slippage, generating novel microsatellite loci. B, Six separate single-base substitutions are each capable of producing tandemly repeated DNA motifs in the mini-me proto-microsatellite, shown above. Of these possibilities, four of the substitutions represent transversion mutations (line a). The two remaining mutations are transition mutations that create tandemly duplicated (GTCC) and (GTCT) motifs, respectively (line b). Once tandem duplications are generated, slippage mutations may begin to occur, which can add to the number of tandemly duplicated repeat motifs (line c). Our observations of 23 different microsatellites produced at this proto-microsatellite indicate that only (GTCC) and (GTCT) repeats are generated at this locus, reflecting the expected substitution bias in favor of C:T transitions

 
The second proto-microsatellite in the mini-me elements reveals a much more complex and interesting pattern of microsatellite genesis. Because microsatellites originate at this locus from DNA that does not initially contain simple sequence repeats, a multistep model is necessary to explain microsatellite genesis. As stated earlier, in its initial form, this proto-microsatellite is composed of a well-conserved DNA sequence. Because of the highly biased nucleotide composition of this region (containing only C, T, and G), it can be defined as a site with high cryptic simplicity (Tautz, Trick, and Dover 1986). Like the (TA)n proto-microsatellite, this sequence is highly conserved across all mini-me elements, indicating that it, too, may be necessary for successful retroposition. Again, however, mutations will accumulate once a copy of the element becomes incorporated in the host genome (fig. 4A). Initially, slippage mutations at this locus will be unlikely because of the absence of tandemly duplicated DNA. However, this proto-microsatellite is composed of a sequence in which individual substitution mutations at six different sites each have the ability to create tandemly duplicated DNA, four with the dinucleotide motifs (GT)2 and (CT)2, and two with the tetranucleotide motifs (GTCC)2 and (GTCT)2, respectively (fig. 4B). If any of these specific substitutions occur, slippage mutations may begin to accumulate, albeit slowly at first, eventually causing the generation of the full-length microsatellites. Thus, the repeat motifs of the microsatellites that arise at this locus are dependent on two separate mutational events: first, a base substitution that creates a tandemly duplicated motif, and then a slippage mutation (or series of slippage mutations), which adds additional copies of the motif. Biases with regard to the initial base substitution will be reflected in the repeat motif of the microsatellites that result. We clearly observe such biases at this proto-microsatellite, which so far has been seen to produce only (GTCC)n and (GTCT)n repeats. We hypothesize that this pattern is due to the fact that the substitutions necessary to create these repeat patterns are both C:T transitions, which have been well documented to be the most frequent substitution type (reviewed in Li 1997, pp. 30–34). The genesis of microsatellites based on any other sequence motif requires initial transversion mutations, which occur much less frequently and have not been observed.
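The first step of this two-step model, a single substitution that creates a tandem duplication, can be enumerated exhaustively for any short sequence. The sketch below is a minimal illustration, not the authors' analysis: the input `GTCTGTCC` is a hypothetical G/T/C-only stand-in (the actual proto-microsatellite sequence appears only in fig. 4B), and the function names are ours. The code mutates each position to each alternative base, asks whether the mutant contains two adjacent identical copies of a 2-bp or 4-bp motif that the original lacked, and labels the substitution as a transition or transversion.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def has_tandem_duplicate(seq: str, pos: int, unit: int) -> bool:
    """True if a window covering pos holds two adjacent identical copies of a
    unit-bp motif (single-base runs such as TTTT are ignored)."""
    first_start = max(0, pos - 2 * unit + 1)
    last_start = min(pos, len(seq) - 2 * unit)
    for start in range(first_start, last_start + 1):
        first = seq[start:start + unit]
        second = seq[start + unit:start + 2 * unit]
        if first == second and len(set(first)) > 1:
            return True
    return False


def duplicating_substitutions(seq: str, units=(2, 4)):
    """List single-base substitutions that create a new tandem duplication.

    Each result is (position, old base, new base, unit size, kind).
    """
    results = []
    for pos, old in enumerate(seq):
        for new in "ACGT":
            if new == old:
                continue
            mutant = seq[:pos] + new + seq[pos + 1:]
            kind = "transition" if (old, new) in TRANSITIONS else "transversion"
            for unit in units:
                if has_tandem_duplicate(mutant, pos, unit) and not has_tandem_duplicate(seq, pos, unit):
                    results.append((pos, old, new, unit, kind))
    return results


# Hypothetical stand-in sequence, NOT the sequence shown in fig. 4B.
for hit in duplicating_substitutions("GTCTGTCC"):
    print(hit)
```

For this stand-in, the only substitutions that create tetranucleotide duplications are the two C:T transitions yielding (GTCC)2 and (GTCT)2, while every dinucleotide duplication requires a transversion, which mirrors the bias argument made in the text.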

These data show that some DNA regions with high cryptic simplicity can be converted directly into microsatellite DNA due to the biased pattern of substitution mutations. Interestingly, the reverse of this process has been postulated by Hancock (1999), who hypothesizes that microsatellites can be converted into cryptically simple DNA due to substitution mutations that corrupt long stretches of tandemly repetitive sequence. Taken together with our observations from mini-me elements, it appears that some repetitive DNA regions could undergo a cyclic form of molecular evolution mediated by both slippage and substitution mutations, where cryptically simple DNA is converted into microsatellites and vice versa.

Microsatellite Genesis Within mini-me Elements Is Variable Across Taxa
Despite the conservation of the proto-microsatellite sequences, microsatellites are not equally generated within mini-me elements across taxa. This is especially apparent at the (GTCY)n microsatellite, which has expanded to exceptional lengths in numerous mini-me elements. We were able to isolate a total of five mini-me copies from D. virilis, which had an average length of over 11.4 (GTCY)n repeats at the locus, with a range of 0 to 37 repeats. Our elements from the cardini group showed a similar pattern. The five elements we examined had an average of 11.6 repeats (with a range of 4 to 21). Mini-me elements from D. melanogaster, however, have comparatively shorter repeat regions at this locus. The average length of the 10 mini-me copies we used in the phylogenetic analysis was 2.2 tandem (GTCY)n repeats, with a range of 0 to 3. It has been noted that D. melanogaster has unusually short microsatellite loci relative to other taxa (Schug et al. 1998; Bachtrog et al. 1999). Indeed, much longer microsatellites have been isolated from even closely related taxa, such as D. virilis (Tautz and Renz 1984b), although no systematic survey has been conducted across the Drosophila genus. Recent studies have also shown that D. melanogaster has an unusually high rate of DNA loss relative to other species (Petrov and Hartl 1997; Petrov et al. 2000). If these findings are peculiar to D. melanogaster and not the entire Drosophila genus, they might explain the variation in microsatellite length observed. In the case of mini-me elements, an increased rate of DNA loss might lead to faster degradation of the proto-microsatellite region (lowering the opportunities for DNA slippage) or remove repetitive DNA once slippage has caused it to accumulate.

Contribution of mini-me Elements to Overall Microsatellite Density
Mini-me elements are surely not the source of all microsatellites in the Dipteran genome. However, because these elements occur at a relatively high density in the genome and do regularly decay into microsatellite repeats, they can have a significant effect on the overall microsatellite density for some repeat types. This is definitely the case for the genomewide density of (GTCY)n microsatellites. We performed a BLAST search to survey the number of perfect (GTCC)≥6 and (GTCT)≥6 repeats among Dipterans. Of the 41 microsatellites identified, we found an approximately 2:1 ratio of those located inside of mini-me elements to those not associated with mini-me elements. Furthermore, we found that these types of microsatellites were two to three times as common in the Dipteran genome as arbitrary tetranucleotide repeats with similar base compositions. Because tetranucleotide repeats of any given base composition are relatively rare among Drosophila (Schug et al. 1998), and presumably other Dipterans, it is easy to envision how elements as common as mini-mes could have a dramatic effect on the relative frequency of any given repeat motif. This is not the case with (TA)n dinucleotide repeats, which are extremely common in the genome (Schug et al. 1998). We could detect no significant effect of mini-me elements on the overall distribution or frequency of this type of microsatellite.
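
As a rough illustration of what counts as a perfect repeat of at least six units, the sketch below scans a single sequence with a regular expression. This is not the BLAST-based survey described above; it is a simplified stand-in, and the function name, parameters, and example sequence are assumptions introduced here. It scans only the strand it is given and does not check the reverse complement.

```python
import re


def find_perfect_repeats(seq, motifs=("GTCC", "GTCT"), min_units=6):
    """Return (motif, start, unit_count) for each uninterrupted run of a motif
    that is at least min_units copies long. Hypothetical helper for illustration."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # Non-capturing group repeated min_units or more times, e.g. (?:GTCC){6,}
        pattern = re.compile(r"(?:%s){%d,}" % (re.escape(motif), min_units))
        for match in pattern.finditer(seq):
            units = (match.end() - match.start()) // len(motif)
            hits.append((motif, match.start(), units))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Invented sequence: 7 x GTCC, then 5 x GTCT (too short), then 6 x GTCT.
    example = "AAAA" + "GTCC" * 7 + "TTT" + "GTCT" * 5 + "A" + "GTCT" * 6
    for hit in find_perfect_repeats(example):
        print(hit)
```

Because the pattern requires an uninterrupted run of one motif, a mixed (GTCY)n tract that alternates GTCC and GTCT units would not be reported as a single perfect repeat under this sketch.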

General Implications For Microsatellite Genesis in Eukaryotes
Mini-me retroposons can clearly be implicated in the widespread genesis of microsatellite DNA in the Dipteran genome. Presently, we do not have examples from non-Dipteran taxa, but mini-me elements may represent a general mechanism of microsatellite genesis common to all organisms that harbor mobile retro-elements. Any such elements that contain preexisting simple or cryptic repeat regions may have the propensity to decay into microsatellite DNA, as we observe in the Dipteran genome. This mechanism offers a broad explanation for the association between mobile elements and microsatellites (e.g., Ramsay et al. 1999). Unlike the mechanism linking the 3' terminus of Alu elements with polyA microsatellites (Nadir et al. 1996), mini-me elements illustrate how microsatellites of any motif can be generated at both internal and terminal regions of retroposons.

The two modes of microsatellite genesis that we observe in mini-me elements are broadly applicable to contexts beyond mobile elements in all other eukaryotes. Preexisting tandem duplications or cryptically simple DNAs that are released from selection pressure (as might occur when retroposons integrate into the genome or when genes are retro-transcribed or duplicated; Messier, Li, and Stewart 1996) may tend to become microsatellites through the mutational processes that we observe in mini-me elements. We contend that this mechanism may be a very widespread source of microsatellite DNA across all taxa.


    Supplementary Material
 
Sequence data presented in this article from the species D. dunni and D. nigrodunni can be found in the GenBank database under accession numbers AF317291–AF317295.


    Acknowledgements
 
We would like to acknowledge T. Diaz, R. W. O'Neill, and J. Kenney for their work in constructing and analyzing the D. dunni and D. nigrodunni genomic libraries; E. Dyreson and other members of the Hollocher lab for helpful discussions; A. Sainz, M. Sainz, and R. Knight for comments on the manuscript; and J. Locke for providing DINE-1 data prior to publication and helpful comments. We also thank two anonymous reviewers for suggestions that helped to improve the manuscript. This work was supported by funds awarded to H.H. by the National Science Foundation, the Alfred P. Sloan Foundation, and Princeton University.


    Footnotes
 
Julian Adams, Reviewing Editor

1 Keywords: microsatellite genesis, mobile elements, retroposons, Diptera, Drosophila

2 Address for correspondence and reprints: Jason Wilder, Department of Ecology and Evolutionary Biology, Guyot Hall/Washington Road, Princeton University, Princeton, New Jersey 08544. E-mail: jawilder@princeton.edu


    literature cited
 

    Alexander, L., G. Rohrer, and C. Beattie. 1995. Porcine SINE-associated microsatellite markers: evidence for new artiodactyl SINEs. Mamm. Genome 6:464–468.

    Altschul, S., T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Amador, A., and E. Juan. 1999. Nonfixed duplication containing the Adh gene and a truncated form of the Adhr gene in the Drosophila funebris species group: different modes of evolution of Adh relative to Adhr in Drosophila. Mol. Biol. Evol. 16:1439–1456.

    Arcot, S., Z. Wang, J. Weber, P. Deininger, and M. Batzer. 1995. Alu repeats—a source for the genesis of primate microsatellites. Genomics 29:136–144.

    Ashburner, M., S. Misra, J. Roote et al. (27 co-authors). 1999. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. Genetics 153:179–219.

    Bachtrog, D., S. Weiss, B. Zangerl, G. Brem, and C. Schlötterer. 1999. Distribution of dinucleotide microsatellites in the Drosophila melanogaster genome. Mol. Biol. Evol. 16:602–610.

    Cáceres, M., J. Ranz, A. Barbadilla, M. Long, and A. Ruiz. 1999. Generation of a widespread Drosophila inversion by a transposable element. Science 285:415–418.

    Eisen, J. 1999. Mechanistic explanations for variation in microsatellite stability within and between species. Pp. 34–48 in D. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.

    Gallagher, P. C., T. L. Lear, L. D. Coogle, and E. Bailey. 1999. Two SINE families associated with equine microsatellite loci. Mamm. Genome 10:140–144.

    Goldstein, D. B., and D. D. Pollock. 1997. Launching microsatellites: a review of mutation processes and methods of phylogenetic inference. J. Hered. 88:335–342.

    Hagemann, S., W. Miller, E. Haring, and W. Pinsker. 1998. Nested insertions of short mobile sequences in Drosophila P elements. Chromosoma 107:6–16.

    Hall, T. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95–98.

    Hancock, J. 1999. Microsatellites and other simple sequences: genomic context and mutational mechanisms. Pp. 1–9 in D. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.

    Hasegawa, M., K. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.

    Levinson, G., and G. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203–221.

    Li, W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

    Locke, J., L. Howard, N. Aippersbach, L. Podemski, and R. Hodgetts. 1999. The characterization of DINE-1, a short, interspersed repetitive element present on chromosome 4 and in the centric heterochromatin of Drosophila melanogaster. Chromosoma 108:356–366.

    Makałowski, W. 1995. SINEs as a genomic scrap yard: an essay on genomic evolution. Pp. 81–103 in R. Maraia, ed. The impact of short interspersed elements (SINEs) on the host genome. R. G. Landes, Austin, Tex.

    Marfany, G., and R. Gonzàlez-Duarte. 1992. Evidence for retrotranscription of protein-coding genes in the Drosophila subobscura genome. J. Mol. Evol. 35:492–501.

    Messier, W., S.-H. Li, and C.-B. Stewart. 1996. The birth of microsatellites. Nature 381:483.

    Nadir, E., H. Margalit, T. Gallily, and S. A. Ben-Sasson. 1996. Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications. Proc. Natl. Acad. Sci. USA 93:6470–6475.

    O'Grady, P. 1999. Revaluation of phylogeny in the Drosophila obscura species group based on combined analysis of nucleotide sequences. Mol. Phylogenet. Evol. 12:124–139.

    Pavesi, A., F. Conterio, A. Bolchi, G. Dieci, and S. Ottonello. 1994. Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions. Nucleic Acids Res. 22:1247–1256.

    Petrov, D. A., and D. L. Hartl. 1997. Trash DNA is what gets thrown away: high rate of DNA loss in Drosophila. Gene 205:279–289.

    Petrov, D., T. Sangster, J. Spencer Johnston, D. Hartl, and K. Shaw. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060–1062.

    Ramsay, L., M. Macaulay, L. Cardle, M. Morgante, S. degli Ivanissevich, E. Maestri, W. Powell, and R. Waugh. 1999. Intimate association of microsatellite repeats with retrotransposons and other dispersed repetitive elements in barley. Plant J. 17:415–425.

    Rubinsztein, D. 1999. Trinucleotide expansion mutations cause diseases which do not conform to classic Mendelian expectations. Pp. 80–97 in D. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.

    Schlötterer, C., and J. Pemberton. 1994. The use of microsatellites for genetic analysis of natural populations. Pp. 203–214 in B. Schierwater, B. Streit, G. Wagner, and R. DeSalle, eds. Molecular ecology and evolution: approaches and applications. Birkhauser Verlag, Basel, Switzerland.

    Schug, M., K. Wetterstrand, M. Gaudette, R. Lim, C. Hutter, and C. Aquadro. 1998. The distribution and frequency of microsatellite loci in Drosophila melanogaster. Mol. Ecol. 7:57–70.

    Steinemann, M., and S. Steinemann. 1993. A duplication including the Y allele of Lcp2 and the TRIM retrotransposon at the Lcp locus on the degenerating neo-Y chromosome of Drosophila miranda: molecular structure and mechanisms by which it may have arisen. Genetics 134:497–505.

    Swofford, D. 1998. PAUP* phylogenetic analysis using parsimony (*and other methods). Version 4.0b-4a edit. Sinauer, Sunderland, Mass.

    Tautz, D. 1989. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17:6463–6471.

    Tautz, D., and M. Renz. 1984a. Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res. 12:4127–4138.

    ———. 1984b. Simple DNA sequences of Drosophila virilis isolated by screening with RNA. J. Mol. Biol. 172:229–235.

    Tautz, D., M. Trick, and G. Dover. 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Throckmorton, L. 1975. The phylogeny, ecology and geography of Drosophila. Pp. 421–469 in R. King, ed. Handbook of genetics, Vol. 3. Invertebrates of genetic interest. Plenum Press, New York.

Accepted for publication November 14, 2000.