Department of Biological Sciences, Stanford University, Stanford, California
Correspondence: E-mail: ndsingh{at}stanford.edu.
Abstract |
---|
Key Words: DNAREP1_DM, numt, intergenic, insertions, deletions, genome evolution
Introduction |
---|
This strong bias toward DNA loss in the D. melanogaster lineage apparently evolved before the separation of D. melanogaster and D. virilis (Petrov, Lozovskaya, and Hartl 1996; Petrov et al. 1998), approximately 60 MYA (Russo, Takezaki, and Nei 1995). In 60 Myr, without a counterbalancing source of DNA addition, small deletions are expected to remove approximately 95% of unconstrained DNA. Thus, one possible explanation for the apparent recent stability of genome size within the D. melanogaster species subgroup is simply that the genomes in this subgroup are composed entirely of functional sequences.
Is it plausible that all sequences in the Drosophila genome are functional and thereby retained by purifying selection? Although this is difficult to judge, the observation that most noncoding sequences evolve very quickly (close to the expected neutral rate of evolution) casts doubt on this interpretation. For instance, a study by Bergman and Kreitman (2001) revealed the presence of only short interspersed blocks of constrained sequences within intergenic and intronic DNA. Overall, less than 30% of intergenic DNA appears to be constrained. Of course, this may be an underestimate if what matters is not the precise sequence but the presence of DNA of particular length at a particular location in the genome. Nevertheless, the current evidence suggests that a significant proportion of noncoding DNA in Drosophila is truly unconstrained at least in its exact sequence content.
Given the strong mutational pressure towards DNA loss that is surely operating in this species group, how then is this unconstrained DNA maintained? The first possibility, as alluded to earlier, is that there is purifying selection acting on the length rather than the sequence of intergenic regions. Under this model, the sequence content may often be of no selective importance, but the lengths of intergenic regions have functional significance and are, therefore, maintained by purifying selection. This model predicts that the length of an intergenic region should remain constant over evolutionary time.
The alternative explanation is that intergenic DNA in Drosophila is maintained through a dynamic equilibrium between large DNA insertions and small DNA deletions (Petrov 2002b). The current measurements of indel biases are limited to small (<400 bp) indels, and we know that among such indels, deletions predominate. However, if insertions are more common among large indels, they could potentially offset the loss of DNA from frequent but small deletions. Although little is known about the rate of large indels, we can surmise that insertions are likely to be more common among those large indels that reach fixation and, thus, ultimately affect the lengths of intergenic loci. In a compact, gene-rich genome, a large deletion is likely to disrupt neighboring genes (Ptak and Petrov 2002) with at least one of its two breakpoints; these large deletions will quickly be removed by strong selection for genic maintenance. Insertions, however, have only one breakpoint, and accordingly, large insertions are as likely to land in an unconstrained place as small ones. This effect, wherein large insertions have a greater chance of reaching fixation than similarly sized deletions, should be most pronounced in short intergenic regions. Under this model of intergenic DNA maintenance, the predictions are twofold: (1) The length of an intergenic region should vary widely over evolutionary time. (2) The sequence content of orthologous intergenic loci in closely related Drosophila species could differ dramatically as a direct consequence of the balance of these two stochastic processes.
To distinguish between these two hypotheses for the maintenance of intergenic DNA and to start quantifying the rates of DNA addition through large insertions, we chose to study an intergenic region that contained potentially unconstrained DNA sequences whose evolutionary history we could trace. It is possible to identify three nuclear insertions of mitochondrial DNA (numt) into the D. melanogaster genome (Petrov 2002a); the only numt that is sufficiently long for analysis is 566 bp and was inserted on the fourth chromosome approximately 230 kb from the centromere. In addition to the numt insertion, this intergenic locus on the fourth chromosome also contains a single insertion of a nonautonomous DNA element, DNAREP1_DM.
To date, all available evidence suggests that both the numt and the copy of DNAREP1_DM are noncoding and unconstrained at the level of their sequence. Numts have never been seen to retain any coding function in metazoans (Bensasson et al. 2001), most likely because the mitochondrial genetic code in animals is distinct from the nuclear genetic code. In addition, DNAREP1_DM elements have no open reading frame and appear to have been immobile for millions of years (Kapitonov and Jurka 2003). These considerations suggested that the 1200-bp region, including both the numt and the DNAREP1_DM insertion, was a good candidate for an intergenic region that was unconstrained in terms of its sequence content.
However, the location of this locus on the fourth chromosome opened the possibility that this locus was constrained with respect to its length, because it is quite close to genes on either side. The annotated genes that are nearest to this locus are Crk (500 bp downstream) and CG31998 (approximately 3 kb upstream). Crk (synonym: CG1587) seems to be an SH3/SH2 adaptor protein involved in signal transduction and is expressed in the embryo. Although CG31998 (synonyms: CG11578 and CG11572) has no known function to date, the gene prediction is supported by EST data. Interestingly, these two genes are coded in opposite directions, and, accordingly, the region between them, in which our locus is found, is in the potential 5' upstream regulatory region for both genes. As a result, although this 1200-bp region may not be under selection for its sequence content, its length may in fact be constrained.
However, our results failed to show any evidence of selective maintenance of the ancestral length of this region. To the contrary, we argue that this region expanded in the ancestor of the D. melanogaster species complex through the insertions of the numt and several copies of DNAREP1_DM and has been going through persistent, and apparently random, shrinkage since then. Our results demonstrate the power of persistent DNA loss and support the predictions of the model of a dynamic equilibrium between rare but large insertions and more common but smaller deletions. The apparent stability of genomes in the D. melanogaster species complex belies a very rapid sequence turnover; although the amount of intergenic DNA may often be similar in sister species, very little of it may prove to be truly orthologous, even in very closely related species.
Materials and Methods |
---|
PCR and Sequencing
Amplifying conditions for each of the four species are as follows: D. melanogaster, D. simulans, and D. mauritiana, 94°C for 2 min, 30 cycles of 94°C for 30 s, 59°C for 30 s, 72°C for 30 s, and a final extension of 72°C for 7 min; D. sechellia, 94°C for 2 min followed by 37 cycles of 94°C for 30 s, 55°C for 30 s, 72°C for 2 min, followed by a final extension of 72°C for 7 min. All PCR reactions were 20 µl, and each contained 2 µl 10X Qiagen PCR Buffer, 2 µl 1.25 mM dNTP, 0.2 µl of each 20 µM primer, 0.2 µl Qiagen Taq, 13.4 µl H2O, and 2 µl genomic DNA. Amplifying primers (3844mt ±) were designed from the sequence of D. melanogaster obtained from GenBank, and internal primers were eventually designed from our own sequence data. Two internal primers were designed for D. melanogaster (3844IntF/R), one was designed for D. simulans (3844SimF), and two were designed for D. sechellia (3844SechF1/R1). Primer sequences, 5' to 3', are as follows: 3844mt+, CGA ATA AGC CAA GAA CCC TAA; 3844mt-, CTC CGG TCG CTA TCT GAT; 3844IntF, AAT TGGT TAA AAC TTA ACG AAA AT; 3844IntR, TCT TGT AAA TTT CTA TCG ATT TG; 3844SimF, CTC GAC GTT CAT ACG GAC; 3844SechF1, TAT TTT ATA TGT AAA AAT TGC; 3844SechR1, AGA GAT TTA CTA GAT TCG TTG. PCR reactions were enzymatically cleaned with exonuclease I and shrimp alkaline phosphatase, and were cycle-sequenced in half-strength half-reactions with Big Dye under standard cycling conditions. These reactions were precipitated using ethanol and MgSO4 and sequenced on an ABI 377 sequencer.
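As a quick consistency check on the reaction recipe above, the following minimal sketch sums the listed component volumes and derives the final primer concentration in the reaction. The dictionary keys and function names are illustrative and not part of the original protocol.

```python
# Reaction components in microliters, as listed in the text.
components_ul = {
    "10X PCR buffer": 2.0,
    "1.25 mM dNTP": 2.0,
    "forward primer (20 uM stock)": 0.2,
    "reverse primer (20 uM stock)": 0.2,
    "Taq polymerase": 0.2,
    "water": 13.4,
    "genomic DNA": 2.0,
}


def total_volume(components):
    """Sum component volumes, rounded to 0.1 ul to absorb float noise."""
    return round(sum(components.values()), 1)


def final_concentration(stock_conc, volume_ul, total_ul):
    """Concentration after dilution into the full reaction volume."""
    return stock_conc * volume_ul / total_ul


total = total_volume(components_ul)
primer_final_um = final_concentration(20.0, 0.2, total)
print(f"total volume: {total} ul, final primer: {primer_final_um:.2f} uM")
```

The components add up to the stated 20 µl, and under this recipe each primer ends up at roughly 0.2 µM in the reaction.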
DNAREP1_DM Analysis
To test hypotheses regarding the evolutionary history of DNAREP1_DM, we implemented a bioinformatic approach. We used NCBI's version of BlastN, blasting the reported consensus sequence for DNAREP1_DM (Kapitonov and Jurka 1999) against the D. melanogaster genome with the following parameters. The reward for a match, penalty for a mismatch, gap-opening penalty, and gap-extension penalty were 5, -5, 10, and 2, respectively. In addition, we used a word size of 23 bp, a Blast extension dropoff of 15, a final dropoff of 10, and an e-value of 0.01. We retrieved 5,000 one-line descriptions and 5,000 alignments for our genome-wide analyses of this element. Pairwise distances among elements were calculated based on the alignments performed in BlastN, discounting insertions and deletions. The parameters implemented above restrict retrievals to sequences differing by no more than 30% from the sequence query; in this respect, our search criteria were conservative.
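The distance measure described above (pairwise differences over aligned positions, discounting insertions and deletions) can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function name, the gap character, and the toy sequences are assumptions.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of mismatches over aligned columns, skipping gapped columns.

    Both inputs are rows of the same pairwise alignment, so they must
    have equal length. Columns with a gap in either row are ignored,
    which is one way to discount indels.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # indel column: not counted
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Toy alignment: 8 ungapped columns, 1 mismatch.
example = p_distance("ACGT-ACGTA", "ACCTTAC-TA")
print(example)
```

Skipping gapped columns keeps the distance a pure substitution measure, so length differences among element copies do not inflate the divergence estimates.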
To establish relationships of orthology and paralogy among the copies of DNAREP1_DM at our locus, we retrieved 76 fragments of DNAREP1_DM from the fourth chromosome of the D. melanogaster genome using all of the default parameters in NCBI's BlastN. Because these parameters are highly restrictive, the distribution of pairwise distances among these copies is extremely conservative for our purposes.
Sequence Alignment and Statistical Analyses
Sequences were aligned using a combination of Sequencher version 3.1.1 and MacVector version 7. Sequences were considered properly aligned if there was identity of at least 80% over a stretch of nucleotides. Sequences were screened for repetitive elements using RepeatMasker (Smit, A.F.A. & Green, P. RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html), which led to the identification of DNAREP1_DM elements in our sequence data. MEGA version 2.1 (Kumar, Tamura, and Nei 1994) was used to calculate $\theta$ and Tajima's D statistic (Tajima 1989). PAUP version 4.0b9 was used for reconstruction of the mitochondrial phylogeny, and was also used to compute pairwise distances among the 76 fourth chromosome copies of DNAREP1_DM.
The tests of goodness of fit were conducted using G-tests (Sokal and Rohlf 1997). Where necessary, the expectations from continuous distributions were converted into expectations for integer counts. For comparisons of molecular rates of evolution, this transformation involved calculating the expected number of substitutions over a particular amount of time for a sequence and comparing it to the observed number of substitutions. In addition, one goodness-of-fit test was performed on levels of polymorphism ($\theta = 4N_e\mu$); this test compared the number of segregating sites in sequences (whose lengths were known). The fit of certain data to the Poisson distribution was ascertained by testing whether the ratio of the observed variance to the observed mean was significantly different from one. The significance was derived from the $\chi^2$ distribution with n-1 degrees of freedom, where n is the number of observations.
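The two kinds of test statistics described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the G statistic uses the textbook form G = 2 Σ O ln(O/E), and the Poisson check uses the standard index-of-dispersion statistic (n-1) × variance/mean, which is one common way to test a variance-to-mean ratio against a chi-square distribution with n-1 degrees of freedom. The function names are hypothetical, and P values would still have to come from a chi-square routine in a statistics library.

```python
import math


def g_statistic(observed, expected):
    """Goodness-of-fit G statistic: 2 * sum(O * ln(O / E)).

    Categories with an observed count of zero contribute nothing.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total


def dispersion_test(values):
    """Return (variance/mean ratio, test statistic, degrees of freedom).

    Under a Poisson model the ratio is expected to be near one and the
    statistic (n - 1) * ratio is compared with chi-square on n - 1 df.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    ratio = variance / mean
    return ratio, (n - 1) * ratio, n - 1


print(g_statistic([12, 8], [10, 10]))
print(dispersion_test([2, 4, 6]))
```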
Results |
---|
Our sequence data revealed evidence of high sequence turnover because only 169 bp of the reconstructed ancestral sequence had been retained in all four species (fig. 1). We inferred that the minimum size of this locus in the most recent common ancestor (MRCA) of these four species was 1,847 bp. This estimate is highly conservative; in all likelihood this locus was over 2.3 kb in the MRCA. Because of the lack of sequence similarity, we were initially concerned that we were not amplifying orthologous loci in these four species. However, the PCR reactions were highly specific, with consistent production of one discrete band of approximately the correct size in each of these four species. In addition, the easily alignable sequences are located immediately adjacent to the primer sites on each side. Finally, comparing the sequence data from D. melanogaster and D. simulans in the region of overlap yielded a Jukes-Cantor substitution distance of 0.158, which is entirely consistent with interspecific divergence calculated from other pseudogene loci (table 1). Below we describe how we inferred the history of this region in detail.
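For readers unfamiliar with the Jukes-Cantor correction used for the 0.158 distance above, the sketch below shows the standard formula, d = -(3/4) ln(1 - 4p/3), and its inverse. The function names are illustrative, and the back-calculated proportion of differing sites is a derived illustration rather than a value reported by the authors.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor distance from the observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def p_from_jukes_cantor(d):
    """Inverse of the correction: proportion of differing sites implied by d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Proportion of differing sites implied by the reported distance of 0.158.
implied_p = p_from_jukes_cantor(0.158)
print(f"implied proportion of differing sites: {implied_p:.3f}")
```

The correction grows with divergence because multiple substitutions at the same site become more likely, so the corrected distance always exceeds the raw proportion of differences.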
However, in addition to the phylogenetic information implicit in the sequence of the numt, we can also investigate whether the lengths of the numt branches in the two trees are consistent with the known rates of molecular evolution of unconstrained Drosophila sequences. The hypothesis implied by the best tree suggests that the numt is 4.2 ± 0.95 Myr old, inserting in the nuclear genome approximately halfway between 6.2 and 2.3 MYA. Given that the numt is 11.4% different from its mitochondrial ancestor, we can then estimate the rate of evolution in the numt to be between $22 \times 10^{-3}$ and $35 \times 10^{-3}$ substitutions/site/Myr. This rate is similar to the rate reported for other Drosophila pseudogenes ($33.3 \times 10^{-3}$ substitutions/site/Myr) (table 1).
In contrast, the insertion of the numt after the split of the D. melanogaster and D. simulans lineages implies that the numt is less than 2.3 Myr old and, thus, has been evolving very quickly (faster than $50 \times 10^{-3}$ substitutions/site/Myr). This rate is significantly higher than the average rate reported for other Drosophila pseudogenes (P = 0.02, G-test). It is also significantly higher than the rate of evolution at the shared sequence at this locus (P < 0.001, G-test), suggesting that the rate of evolution in this region is not generally elevated. On balance, these results strongly favor the hypothesis of the numt inserting approximately 4 MYA in the ancestor of the D. melanogaster species complex. The absence of the numt in D. simulans, D. mauritiana, and D. sechellia, therefore, implies the loss of this sequence in those lineages since their split from the D. melanogaster lineage.
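The rate bounds in the two preceding paragraphs follow from dividing the numt divergence by the candidate insertion ages. A minimal sketch of that arithmetic, with illustrative variable names, is:

```python
def rate_per_site_per_myr(divergence, age_myr):
    """Substitutions per site per Myr implied by a divergence and an age."""
    if age_myr <= 0:
        raise ValueError("age must be positive")
    return divergence / age_myr


NUMT_DIVERGENCE = 0.114   # 11.4% different from the mitochondrial ancestor
AGE_MYR = 4.2             # age under the best tree
AGE_ERROR_MYR = 0.95      # reported uncertainty on that age
SPLIT_MYR = 2.3           # D. melanogaster / D. simulans split

# Older insertion gives the slower rate; younger insertion the faster one.
slow_rate = rate_per_site_per_myr(NUMT_DIVERGENCE, AGE_MYR + AGE_ERROR_MYR)
fast_rate = rate_per_site_per_myr(NUMT_DIVERGENCE, AGE_MYR - AGE_ERROR_MYR)
# Lower bound on the rate if the numt had inserted after the species split.
post_split_rate = rate_per_site_per_myr(NUMT_DIVERGENCE, SPLIT_MYR)

for label, rate in [("slow", slow_rate), ("fast", fast_rate),
                    ("post-split", post_split_rate)]:
    print(f"{label}: {rate * 1000:.1f} x 10^-3 substitutions/site/Myr")
```

Because the post-split age is an upper limit, the corresponding rate is a lower limit, which is why the text describes that scenario as requiring evolution faster than about 50 × 10^-3 substitutions/site/Myr.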
DNAREP1_DM Analysis
DNAREP1_DM was originally described by Kapitonov and Jurka (1999) as a 594-bp nonautonomous DNA transposon. This element is ubiquitous in the D. melanogaster genome; previous analysis suggested that there were several thousand copies in the genome (Kapitonov and Jurka 2003), and our own analysis, which was restricted to mostly euchromatic sequence with conservative search parameters, yielded almost 1,100 copies of this element scattered all over the genome. The sequence of our region in D. melanogaster contains a single copy of DNAREP1_DM, whereas the sequences in D. simulans, D. mauritiana, and D. sechellia each contain two copies of DNAREP1_DM. To determine the history of the acquisition and loss of these elements in our region, we needed to answer several questions. For instance, we needed to determine both when the DNAREP1_DM elements entered the genome and whether DNAREP1_DM has been transpositionally active in the recent past. In addition, we needed to establish the paralogy and orthology relationships among the seven identified copies of DNAREP1_DM in our region.
Evidence for the Burst of DNAREP1_DM Transposition in the Ancestor of the D. melanogaster Species Complex |
---|
It is entirely possible, however, that DNAREP1_DM has been active, albeit at a lower level, more recently than 4.6 MYA. Assuming that the active sequence has remained the same, we can estimate the proportion of recent transpositions by looking for DNAREP1_DM elements that are more similar to the consensus than expected under the Poisson distribution. The expected divergence of orthologous pseudogenes in D. melanogaster and D. simulans is 7.6%. Our analysis suggests that no more than 4% of all of the copies of DNAREP1_DM in the D. melanogaster genome could have transposed since the speciation of the D. melanogaster complex.
It is similarly possible that some old DNAREP1_DM elements have been duplicated (or even transposed) since their original transposition. Such copies might look old in a comparison with the ancestor, yet would have been inserted in our region recently. We can quantify the likelihood of this possibility by comparing the number of elements that are more similar to each other than expected under the Poisson distribution. We used the sequence of the copy of DNAREP1_DM present at our locus in D. melanogaster as a query for a Blast search and retrieved the 245 best hits for this sequence. This distribution of pairwise distances, with mean 19.9% and variance 17.9%, is not significantly different from a Poisson (P > 0.9, $\chi^2$ distribution). If we use our sample mean to generate a Poisson distribution, the expected numbers of comparisons yielding pairwise divergences below and above 7.6% are 0.2 and 244.8, respectively, whereas we observed, respectively, 0 and 245 such comparisons. This strongly suggests that there have not been many recent duplications of this element. Taken together, these analyses indicate that all observed copies of DNAREP1_DM at the studied locus inserted in an ancestor of the D. melanogaster species complex.
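The expected counts quoted above can be reproduced with a short sketch, assuming that pairwise distances are treated as whole-percent counts drawn from a Poisson distribution with the sample mean, so that "below 7.6%" corresponds to counts of 7 or fewer. That discretization is an assumption made for illustration; the text does not spell out how the continuous threshold was converted.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson variable with mean lam, by direct summation."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total


N_COMPARISONS = 245
MEAN_DIVERGENCE = 19.9   # sample mean, in percent
THRESHOLD = 7.6          # expected D. melanogaster / D. simulans divergence, percent

# Assumption: divergences are whole-percent counts, so "below 7.6" means <= 7.
k_max = math.floor(THRESHOLD)
expected_below = N_COMPARISONS * poisson_cdf(k_max, MEAN_DIVERGENCE)
expected_above = N_COMPARISONS - expected_below
print(f"expected below: {expected_below:.1f}, above: {expected_above:.1f}")
```

Under these assumptions the sketch gives about 0.2 comparisons below the threshold and 244.8 above it, matching the expectations stated in the text.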
Paralogy and Orthology Relationships Among the Seven Identified Copies of DNAREP1_DM |
---|
Although it is possible that different pseudogenes evolve at different rates, there is no significant difference in the substitution rate among the known pseudogenes (P > 0.75 for all pairwise comparisons, G-test). Moreover, the fact that the divergence among the orthologous copies of DNAREP1_DM in D. simulans, D. sechellia, and D. mauritiana is entirely consistent with expectation argues against DNAREP1_DM in general evolving at a higher rate than other pseudogenes. Altogether, our analyses suggest that the copy of DNAREP1_DM in D. melanogaster is indeed distinct from both copies present in its sister species, and as a result, that there were at least three copies of DNAREP1_DM at the studied locus in the MRCA of the D. melanogaster species complex.
Estimation of the Deletion/Insertion Biases |
---|
With regard to DNAREP1_DM, we are less confident in the exact sequence of the ancestor, but we do have confidence in its length. We, therefore, decided to avoid the identification of individual deletion and insertion events and instead compare the remaining lengths of DNAREP1_DM elements with the predicted length under exponential deletion-induced decay. The length of a given element is expected to contract exponentially according to the formula $L(t) = L_0 e^{-dt}$, where $L_0$ is the starting length, d is the rate of DNA loss per substitution per bp (which is conservatively estimated at 3.8 bp per substitution per bp [Blumenstiel, Hartl, and Lozovsky 2002] in D. melanogaster), and t is time measured in point substitutions per bp. Because DNAREP1_DM elements are on average 15.2% divergent from the ancestral sequence, they should be approximately 56% of their original length now ($e^{-3.8 \times 0.152} \approx 0.56$). The lengths of the elements, varying from 57% to 76%, are roughly consistent with these predictions.
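The decay model in the preceding paragraph is simple enough to evaluate directly. The sketch below is illustrative only: the function name is hypothetical, and using the 594-bp consensus length as the starting length is an assumption made for the example.

```python
import math


def fraction_remaining(loss_rate, divergence):
    """Fraction of the starting length left under exponential decay.

    loss_rate:  DNA loss per substitution per bp (d in the text)
    divergence: time measured in point substitutions per bp (t in the text)
    """
    return math.exp(-loss_rate * divergence)


LOSS_RATE = 3.8            # conservative estimate quoted in the text
MEAN_DIVERGENCE = 0.152    # average divergence from the ancestral sequence
CONSENSUS_LENGTH_BP = 594  # assumption: consensus length as starting length

fraction = fraction_remaining(LOSS_RATE, MEAN_DIVERGENCE)
expected_length_bp = CONSENSUS_LENGTH_BP * fraction
print(f"fraction remaining: {fraction:.2f}")
print(f"expected length: {expected_length_bp:.0f} bp")
```

With the values quoted in the paragraph, the function returns a remaining fraction of roughly 0.56. Because the decay is exponential in the product of d and t, a modest change in either input shifts the prediction noticeably, which is worth keeping in mind when comparing it with the observed range of 57% to 76%.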
Additionally, we can identify individual indels in a more shallow comparison, between D. simulans and D. sechellia (divergence time approximately 0.9 Myr). In this comparison, our observations are also consistent with other studies. The relative ratios of nucleotide substitutions to deletions (19:2) and deletions to insertions (2:0) are similar to ratios from both pseudogene studies (P = 0.98 and P = 0.80, G-test for substitutions versus deletions and deletions versus insertions, respectively) and transposable elements (P = 0.52, P = 0.80, G-test for substitutions versus deletions and deletions versus insertions, respectively).
Polymorphism Within Drosophila melanogaster
We also decided to compare patterns of polymorphism in our region with other known fourth chromosome loci. Using our amplifying primers, approximately 250 bp upstream of the numt and 400 bp downstream, we sequenced the resulting 1.2-kb product in 17 D. melanogaster strains. Fourteen strains came from North America: five strains from Ann Arbor, Mich. (A1, A3, A6, A8, and A18), seven strains from Davis, Calif. (WI1, WI15, WI41, WI45, WI68, WI83, and WI69), one strain from New York (W7), and one strain from Georgia, USA (W22). The remaining strains were collected in Australia (W9), Bermuda (W2), and Kenya (W31).
The polymorphic sites are shown in figure 5. Three sequence-length polymorphisms were detected at this locus, one of 1 bp, one of 6 bp, and one of 156 bp, which was associated with a T to G transversion at the nucleotide position immediately 5' of the deleted bases. Because these length polymorphisms were in regions of this locus that flank the numt and DNAREP1_DM, we cannot determine whether they are products of insertions or deletions. We also detected nine segregating single-nucleotide polymorphisms. The segregating sites (including indel polymorphisms) fall into nine distinct haplotypes, with no clear pattern of geographical structure. There are no recombination events that can be detected among the haplotypes. The number of observed haplotypes is within the range expected under neutrality, given the observed number of segregating sites; both the number of haplotypes (K) and the haplotype diversity (H) are within the 95% confidence interval for the expectation of these parameters under the assumption of no recombination (Depaulis and Veuille 1998). Although sequences for two haplotypes have missing data, the neutrality of the haplotype data is robust to possible findings of any of the polymorphic states within the missing data.
Discussion |
---|
Interestingly, all of the patterns of substitution at our locus matched the expectations based on other unconstrained Drosophila sequences. For instance, we estimated the rate of divergence in the numt to be $27.2 \times 10^{-3}$ substitutions/site/Myr, which is very close to the rate found in other pseudogenes (table 1). A very similar rate of evolution was observed for orthologous copies of DNAREP1_DM. Additionally, the length evolution of both the numt and the extant copies of DNAREP1_DM also conformed to the expectation derived from neutral sequences in Drosophila; indel sizes and rates matched the expectations deduced from several studies of noncoding DNA in this species group. Furthermore, the overall impact of deletions on the size of DNAREP1_DM copies since their insertion approximately 4.6 MYA is also in agreement with predictions based on previous work. As a result, we see no evidence of either changed mutational patterns in our locus or of selection substantially changing the rates or patterns of molecular evolution.
The Pattern of Intraspecific Variation at the Studied Locus
Substantial effort has been devoted to understanding the dynamics of nucleotide polymorphism within D. melanogaster, particularly on the fourth chromosome; this work allowed a comparative analysis to check that our region does not possess exceptional levels of nucleotide polymorphism. Whereas early studies revealed the absence of any observable genetic variation in the very proximal end of the fourth chromosome (Berry, Ajioka, and Kreitman 1991; Hilton, Kliman, and Hey 1994), a recent study (Wang et al. 2002) revealed that the fourth chromosome is organized into alternating blocks of high and low polymorphism (fig. 7). Although Wang et al. (2002) sequenced 18 gene regions, most of their efforts were concentrated in the half of the chromosome distal to the centromere, whereas our region is located in the proximal half of the chromosome. The two loci already characterized at the population level with respect to nucleotide polymorphism that are closest to our region are CG1710 (Wang et al. 2002) and ankyrin (Jensen, Charlesworth, and Kreitman 2002), both located at least 100 kb from the studied region. Ankyrin, which is located 140 kb 3' of our region, showed extremely low levels of variation, whereas CG1710 (located 100 kb 5') showed relatively high levels. Our region showed an amount of variation statistically indistinguishable from that in CG1710 (estimates of $\theta$ are 0.0019 and 0.0027 per nucleotide for CG1710 and our region, respectively; P = 0.9, G-test), suggesting that the proximal half of the fourth chromosome, too, may be organized into blocks of high and low variation. Most relevant to this study, however, the amount of genetic variation at the studied region does not appear to be exceptional.
Two possibilities can be envisioned: Intergenic sequences might be maintained by (1) selection on length rather than on the exact sequence (selective constraint hypothesis) or by (2) the balance between addition of DNA through large insertions and attrition of DNA through small deletions (dynamic equilibrium hypothesis). The selective constraint hypothesis predicts that intergenic regions should remain stable through time. The lengths of particular loci in extant species should be comparable not only to one another but also to that of the ancestral sequence. In contrast, the dynamic equilibrium hypothesis predicts that individual intergenic regions will go through large fluctuations in size, increasing sharply through large insertions and then continually shrinking from small deletions. Under this model, the maintenance of intergenic DNA would largely occur in aggregate across the whole genome and to a lesser extent at any region in particular.
With respect to the dynamic equilibrium hypothesis, selection on the function of genes will affect the rates of intergenic length evolution by eliminating any indel that removes a functional site (Ptak and Petrov 2002). As an intergenic sequence becomes shorter, for example, such constraint would retard further reduction in length. It is also possible that intergenic regions have both a minimum and a maximum length. When the length becomes very close to the minimum, increases in length from insertions may be promoted by positive selection. More complex phenomena may also be involved, with small deletions becoming slightly deleterious (and insertions slightly advantageous) as the length becomes too short. Conversely, if the intergenic region becomes too long, selection may promote fixations of deletions and retard fixation of insertions. The specifics of these processes are likely to vary significantly among different regions. The critical distinction from the selective constraint hypothesis is that the dynamic equilibrium model postulates that the lengths of intergenic regions may vary substantially between the possible low and high limits without strong impairment of function. The maintenance of the length between such boundaries may then be caused by the neutral or nearly neutral fixation of frequent but small deletions and rare but longer insertions.
The results presented in this paper are consistent with the dynamic equilibrium hypothesis. We documented the insertion of at least three approximately 600-bp transposable elements and one approximately 500-bp sequence of a numt in this region between 3 and 5 MYA in an ancestor of the D. melanogaster species complex. We saw no more insertions in this region since the diversification of the species complex, corresponding to a total of approximately 6.4 Myr of evolutionary time. The current lengths of this region in all of the species are shorter than they were in the ancestor, yet similar to each other because the attrition process has been occurring at similar rates for a similar amount of time. Also consistent with the dynamic equilibrium hypothesis, very little orthologous DNA has remained in all four species.
Although the pattern toward DNA loss is clear, we cannot distinguish the relative contributions of small deletions (<400 bp) versus larger ones to DNA attrition at this locus. Based on previous estimates of the rate of DNA loss through small deletions in Drosophila, we expect that since 2.3 MYA, an unconstrained region should retain approximately 75% of its DNA. In comparison, the average amount of DNA retained at the studied locus in the four species since their MRCA is between 77% (based on the minimum estimate of the ancestral length of 1,847 bp) and 62% (based on the more likely estimate of 2,329 bp). Based on these estimates, there appears to be no reason to invoke the effect of deletions longer than 400 bp.
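The comparison in the preceding paragraph can be illustrated with a small sketch. It assumes that the expected retention comes from the same exponential model of DNA loss, with a loss rate of 3.8 per substitution per bp and a divergence of 0.076 substitutions per bp accumulated since 2.3 MYA. Both the choice of model and the 0.076 value are assumptions made here for illustration; the paragraph itself does not state which inputs produced the 75% figure.

```python
import math

LOSS_RATE = 3.8               # assumed: loss rate per substitution per bp
DIVERGENCE_SINCE_MRCA = 0.076 # assumed: substitutions per bp since 2.3 MYA

# Observed average retention under the two ancestral-length estimates.
observed_retention = {
    "minimum ancestral length (1,847 bp)": 0.77,
    "more likely ancestral length (2,329 bp)": 0.62,
}

expected_retention = math.exp(-LOSS_RATE * DIVERGENCE_SINCE_MRCA)
low = min(observed_retention.values())
high = max(observed_retention.values())
within_observed_range = low <= expected_retention <= high

print(f"expected retention from small deletions: {expected_retention:.2f}")
print(f"observed range: {low:.2f} to {high:.2f}")
print(f"expectation within observed range: {within_observed_range}")
```

Under these assumptions the expected retention is about 75% and falls inside the observed 62% to 77% range, which is the sense in which small deletions alone appear sufficient to account for the attrition at this locus.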
However, the small size of the studied intergenic region (2 kb) and the requirement for a successful PCR, and thus for the presence of two priming sites approximately 1.2 kb apart, have biased our observations against longer indels (Ptak and Petrov 2002). Any deletion larger than 2 kb would by necessity have been missed, although this bias is also present in all previous studies of indels in Drosophila (Pritchard and Schaeffer 1997; Ramos-Onsins and Aguade 1998; Robin et al. 2000; Blumenstiel, Hartl, and Lozovsky 2002; Petrov 2002a). Thus, it is entirely possible that deletions longer than 400 bp both occur with reasonable frequency and contribute to the length evolution of longer intergenic regions.
Our results demonstrate that at least some intergenic loci in Drosophila are substantially longer than the minimum allowable length and that their maintenance in this state may in part be mediated by the interplay between sporadic and long insertions and continuous but smaller deletions. The comprehensive study of the exact balance between mutational and selective forces in the maintenance of intergenic DNA will have to wait until the sequencing of multiple strains of D. melanogaster and its sibling species.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Adams, M. D., S. E. Celniker, and R. A. Holt, et al. (100 co-authors). 2000. The genome sequence of Drosophila melanogaster. Science 287:2185-2195.
Bensasson, D., D.-X. Zhang, D. L. Hartl, and G. M. Hewitt. 2001. Mitochondrial pseudogenes: Evolution's misplaced witnesses. Trends Ecol. Evol. 16:314-321.
Bergman, C. M., and M. Kreitman. 2001. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11:1335-1345.
Berry, A. J., J. W. Ajioka, and M. Kreitman. 1991. Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129:1111-1118.
Blumenstiel, J. P., D. L. Hartl, and E. R. Lozovsky. 2002. Patterns of insertion and deletion in contrasting chromatin domains. Mol. Biol. Evol. 19:2211-2225.
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.
Hasegawa, M., and H. Kishino. 1990. Phylogenetic Inference from DNA Sequences. Fourth International Congress of Systematic and Evolutionary Biology. University of Maryland and The Smithsonian Institute.
Hilton, H., R. M. Kliman, and J. Hey. 1994. Using hitchhiking genes to study adaptation and divergence during speciation within the Drosophila melanogaster species complex. Evolution 48:1900-1913.
Jensen, M. A., B. Charlesworth, and M. Kreitman. 2002. Patterns of genetic variation at a chromosome 4 locus of Drosophila melanogaster and D. simulans. Genetics 160:493-507.
Kapitonov, V. V., and J. Jurka. 1999. DNAREP1_DM. Repbase Update Release 3. 4.
Kapitonov, V. V., and J. Jurka. 2003. Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci. USA 100:6569-6574.
Kumar, S., K. Tamura, and M. Nei. 1994. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput. Appl. Biosci. 10:189-191.
Petrov, D. A. 2002a. DNA loss and evolution of genome size in Drosophila. Genetica (Dordrecht) 115:81-91.
Petrov, D. A. 2002b. Mutational equilibrium model of genome size evolution. Theor. Popul. Biol. 61:531-544.
Petrov, D. A., Y.-C. Chao, E. C. Stephenson, and D. L. Hartl. 1998. Pseudogene evolution in Drosophila suggests a high rate of DNA loss. Mol. Biol. Evol. 15:1562-1567.
Petrov, D. A., E. R. Lozovskaya, and D. L. Hartl. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346-349.
Powell, J. R. 1997. Progress and prospects in evolutionary biology: the Drosophila model. Oxford University Press, New York.
Pritchard, J. K., and S. W. Schaeffer. 1997. Polymorphism and divergence at a Drosophila pseudogene locus. Genetics 147:199-208.
Ptak, S. E., and D. A. Petrov. 2002. How intron splicing affects the deletion and insertion profile in Drosophila melanogaster. Genetics 162:1233-1244.
Ramos-Onsins, S., and M. Aguade. 1998. Molecular evolution of Cecropin multigene family in Drosophila: functional genes vs. pseudogenes. Genetics 150:157-171.
Robin, G. C. D. Q., R. J. Russell, D. J. Cutler, and J. G. Oakeshott. 2000. The evolution of an alpha-esterase pseudogene inactivated in the Drosophila melanogaster lineage. Mol. Biol. Evol. 17:563-575.
Russo, C. A. M., N. Takezaki, and M. Nei. 1995. Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391-404.
Sokal, R., and F. J. Rohlf. 1997. Biometry. W. H. Freeman (New York).
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-596.
Wang, W., K. Thornton, A. Berry, and M. Long. 2002. Nucleotide variation along the Drosophila melanogaster fourth chromosome. Science 295:134-137.