An Investigation of the Cause of Low Variability on the Fourth Chromosome of Drosophila melanogaster

Martin Carr, Judith R. Soloway, Thelma E. Robinson and John F. Y. Brookfield

Institute of Genetics, University of Nottingham, Queens Medical Centre, United Kingdom


    Abstract
The fourth chromosome of Drosophila melanogaster lacks meiotic recombination. There is also a lack of nucleotide variation on the chromosome. This lack of variation could have been caused by a recent selective sweep, by background selection, or by a combination of these two forces. It should be possible to differentiate between the two mechanisms by studying the frequencies of polymorphic sites on the chromosome: a selective sweep should have resulted in low-frequency polymorphisms, whereas higher frequency polymorphisms would indicate the action of background selection. We have analyzed retrotransposable element insertions on the fourth chromosome in 11 strains of D. melanogaster. The polymorphisms found have a range of frequencies, with the presence of some insertions with high frequencies suggesting that the lack of variation is the result of background selection. We summarize the data using two statistics: the number of sites shared by more than one of the sample of 11 chromosomes (internal sites) and the mean number of transposable element differences in presence or absence between the sampled chromosomes. Simulations indicate that a selective sweep occurring more than 15,000 (0.03N) generations ago cannot be ruled out from the number of internal sites, although the number of differences between the chromosomes suggests either background selection or a sweep occurring more than 60,000 (0.12N) generations ago. Our results show no homoplasies and are thus consistent with no recombination occurring on the chromosome. The difficulties of distinguishing between the models using polymorphism data are discussed.


    Introduction
The genome of Drosophila melanogaster shows a positive correlation between recombination rate and nucleotide diversity (Begun and Aquadro 1992; Aquadro and Begun 1993). Interspecific comparisons have shown that the low-recombination regions do not have reduced rates of evolution, and so the correlation cannot be the result of a direct mutagenic effect of recombination. Rather, the alleles sampled from populations from the low-recombination regions must have more recent shared ancestry than do alleles from regions of higher recombination. Two contrasting theories have been put forward to explain this recent shared descent seen in the low-recombination regions.

An advantageous mutation spreading to fixation in such a region would drag the linked neutral variants to fixation along with it. This phenomenon has been termed a selective sweep and the reduction in linked diversity described as being the result of hitch-hiking (Maynard Smith and Haigh 1974). The effects of hypothetical selective sweeps have recently been identified in the D. melanogaster genome (Nurminsky et al. 1998). Whereas the expected genealogy of alleles linked to, but showing some recombination with, an advantageous nucleotide substitution can be complex (Kaplan, Hudson, and Langley 1989; Stephan, Wiehe, and Lenz 1992; Barton 1998), if there is zero recombination in a chromosome, the occurrence and subsequent fixation of a single advantageous mutation would result in the fixation of that chromosomal variant in the population. Polymorphic sites will reappear within the population in time, but as the variants are caused by recent mutation events, they are likely to have low frequencies.

A differing explanation for the correlation between recombination and gene diversity is that it is created by the effects of weakly deleterious mutations. There will be a loss of neutral variants because of their linkage to deleterious sites, and this has been termed background selection (Charlesworth, Morgan, and Charlesworth 1993). As neutral variants are constantly being lost through this process, the overall effect of background selection is to lower the effective size of a population. Because of the continual nature of background selection, the expected coalescent process of a sample of alleles evolving under background selection will be almost exactly the same as a neutral coalescent with a smaller effective size. The distribution of frequencies at variable sites will thus be as expected under neutrality. One consequence is that whereas the expected value of the D statistic of Tajima (1989) will be strongly negative following a selective sweep, as the samples will show an unexpectedly high number of low-frequency sites, the expectation of D will be approximately zero under background selection. It is intuitively obvious that D will be reduced after a selective sweep with no recombination, and even if there is some recombination between the sampled locus and the site of the sweep, an excess of low-frequency sites, and thus a negative D, will still be expected (Braverman et al. 1995). However, samples of D. melanogaster alleles in low-recombination regions show values of D which are close to zero. This suggests that selective sweeps cannot, on their own, explain the reduction in variability seen in these low-recombination regions.

The ideal test to resolve between the theories, however, would be to look in the regions where there is the minimum amount of recombination, such as the fourth chromosome. The fourth chromosome of D. melanogaster has many unusual features; it is often referred to as a minichromosome as it is made up of roughly 1 Mb of euchromatin (Locke et al. 1999). Individuals can be viable as monoploids, triploids, and tetraploids, whereas aneuploidy for chromosomes two or three results in death (Hochman 1976). Many of the noncoding regions consist of repetitive sequences (Locke et al. 1999), and the polytene chromosome appears diffuse and is similar to β-heterochromatin. It has long been known that the chromosome does not undergo meiotic recombination under natural conditions (Hochman 1976). However, recombination does occur if the chromosome is polyploid (Hochman 1976), if the flies are heat shocked (Grell 1971) or treated with X-rays (Williamson, Parker, and Manchester 1970), or if the individual is a recombination-defective meiotic mutant (Sandler and Szauter 1978).

Sequencing of a kilobase region of the gene cubitus interruptus Dominant by Berry, Ajioka, and Kreitman (1991) in 10 lines of D. melanogaster showed no nucleotide substitutions. The same region was sequenced in nine lines of D. simulans, and only a single nucleotide substitution was found in one line. Despite the lack of variation within each species, the divergence of the gene between the two species was about 5%, which is typical for these species. This suggests that some mechanism of selection is causing the chromosomes to share recent common ancestry within each species. The initial interpretation of these data was that selective sweeps may be occurring (Berry, Ajioka, and Kreitman 1991; Charlesworth 1992) in each species, whereas more recent studies have suggested that background selection could be the cause (Charlesworth 1994; Charlesworth, Jarne, and Assimacopoulos 1994; Charlesworth, Charlesworth, and Morgan 1995).

In order to try to resolve which mechanism is acting on the chromosome, we have screened the fourth chromosomes of 11 strains of D. melanogaster for polymorphic sites. As nucleotide polymorphisms are extremely rare on this chromosome, the markers we have used are retrotransposons. Such markers have been used, for example, in determining speciation events in salmonid fish (Murata et al. 1996). However, we are using them as polymorphisms between individuals of the same species.

We performed in situ hybridization using seven retrotransposon probes. Despite being mildly deleterious to their host, retrotransposons have certain advantages as markers. Unlike class II elements, such as the P element, retrotransposons have no innate excision mechanism. Therefore, once a copy inserts, it will remain at that location, although the chromosome bearing it may be lost by selection or drift. There may be occasional deletions which nonspecifically remove Drosophila DNA, including retrotransposons (Petrov, Lozovskaya, and Hartl 1996; Petrov and Hartl 1998), and we include the possibility of element loss in some of the simulations below. The rate of transposition of these elements is sufficiently low that if two or more strains of D. melanogaster show the same occupied site, it is more likely, under most population models, to be the result of common ancestry than of independent insertion events. In situ hybridization allows the entire chromosome to be screened, for a particular element, in a single step. The location of the element can be determined to the nearest lettered subdivision using the maps produced by Lefevre (1976). The technique only allows localization to the polytene band, and thus insertions of the same element family at the same apparent location in different strains might be many kilobases apart; however, multiple insertions of this kind would often be expected to create homoplasies in the tree of the chromosomes. We do not see such homoplasies in our results, and in the analysis below, we condition upon an assumption of an absence of apparent homoplasies, but not an absence of multiple insertions into the same site, as detected by in situ hybridization.


    Materials and Methods
Strains
Eleven inbred strains of D. melanogaster were used in the study: Bizarte (Tunisia), Canton S (United States), Draveil (France), Harwich (United States), Kishinev (Moldova), Loua (Zaire), Monty 9 and Monty 20 (France), Petit Bourg (Guadeloupe), Tahiti (Southern Pacific), and Tex (United States). It is known that our Canton S strain has been contaminated by another strain because P elements are present in its genome. However, the contamination is probably old as no heterozygous insertion sites were found, and the Canton fourth chromosome is not identical to any of the others in the study. Flies were raised on a yeast-glucose medium at 18° C as small mass cultures with an equal number of males and females in each vial.

Molecular Biology
The salivary glands of third instar larvae were removed and their polytene chromosomes prepared for in situ hybridization on poly-l-lysine–coated slides. The probes were biotinylated plasmids. The biotin was incorporated as Boehringer Mannheim biotin-16-dUTP via nick translation. Bound probe was visualized using Vector Laboratories' VECTASTAIN Elite ABC kit and diaminobenzidine tetrahydrochloride, darkened with nickel chloride. Prepared slides were washed in 2× standard saline citrate, 1× phosphate-buffered saline (PBS), and 1× PBS/0.1% Triton solutions. The polytene chromosomes were visualized using Giemsa's stain–improved R66 solution. The locations of element insertion sites were determined by viewing the fourth and other chromosomes using an objective with 100× magnification.

Bands in division 101 are particularly difficult to determine. Both roo and 297 have two insertion sites in this division; these have been labeled α and β, with the more distal sites being termed α and the proximal sites, β.

Six of the retrotransposable elements are long terminal repeat (LTR) retrotransposable elements; these are 412 (provided by D. Finnegan), mdg-1, mdg-3, roo, and 297 (provided by K. O'Hare), and opus (2217 in Charlesworth, Lapid, and Canada [1992]; provided by B. Charlesworth). One non-LTR retrotransposon was used, 2156 (provided by B. Charlesworth).

Simulations
In order to test the power of the data to resolve between the hypotheses, simulated phylogenetic trees were created using a coalescent approach, upon which mutations were created.

Creating the Phylogenies—Background Selection
The sample is modeled using a coalescent process with an effective size of Nb (greatly reduced relative to the census population size because of the effects of background selection). Following Hudson (1990), the time of a coalescent starting with i lineages is obtained by selecting a random number P in the range 0–1, and then calculating the coalescence time from
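
The equation that followed this sentence is missing from this copy. A reconstruction consistent with the sweep-phase derivation later in this section, assuming 2Nb gene copies in a diploid population of effective size Nb (so that i lineages coalesce at rate i(i - 1)/(4Nb) per generation), would be:

```latex
P = 1 - \exp\!\left(-\frac{i(i-1)}{4N_b}\,t\right)
\quad\Longrightarrow\quad
t = -\frac{4N_b}{i(i-1)}\,\ln(1-P)
```

Because P is uniform on (0, 1), writing ln(P) in place of ln(1 - P) gives the same distribution of times; which form the original used cannot be recovered here.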

The pair of lineages coalescing is chosen randomly from the i lineages still present. The process is started with i = 11, the number in our sample, and continued for the 10 coalescent events required to reduce the number of lineages to one.
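
The procedure just described can be sketched in Python. This is a minimal illustration, not the authors' program: the function names are hypothetical, and the waiting-time formula assumes a coalescence rate of i(i - 1)/(4Nb) per generation for a diploid population of effective size Nb.

```python
import math
import random

def coalescence_time(i, n_b, rng):
    """Waiting time (generations) until i lineages drop to i - 1.

    Assumes a coalescence rate of i(i-1)/(4*n_b) per generation.
    """
    p = rng.random()  # uniform in [0, 1)
    return -4.0 * n_b * math.log(1.0 - p) / (i * (i - 1))

def simulate_tree(sample_size=11, n_b=2000, seed=None):
    """Return a list of (time, child_a, child_b, parent) coalescent events."""
    rng = random.Random(seed)
    lineages = list(range(sample_size))  # tips are labelled 0..sample_size-1
    next_label = sample_size
    now = 0.0
    events = []
    while len(lineages) > 1:
        now += coalescence_time(len(lineages), n_b, rng)
        a, b = rng.sample(lineages, 2)  # pair chosen uniformly at random
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)
        events.append((now, a, b, next_label))
        next_label += 1
    return events

events = simulate_tree(seed=1)
print(len(events))  # 10 coalescent events for a sample of 11
```

Each event records the time (in generations before the present) at which two lineages merge, which is enough to recover branch lengths for the mutation step.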

Creating the Phylogenies—Selective Sweeps
Random trees are created as above, except that now the population size is not treated as constant. The size is constant recently, but during the period of the selective sweep prior to this, the population size is treated as being that of the subset of the population possessing the advantageous allele sweeping through. It is assumed that there is no background selection in this model, and the effective diploid population size after the selective sweep event is the same as that of the other autosomes, and is symbolized by N. The last selective sweep affecting the chromosome is assumed to have reached fixation T generations ago. Prior to this, the proportion of the population which possessed the advantageous allele spreading to fixation, and which thus could have been ancestors of the alleles in the sample, was increasing, and thus diminished as one follows time backwards in the coalescent process. Using an argument analogous to that of Stephan, Wiehe, and Lenz (1992) and Braverman et al. (1995), we say that T generations ago, the wild-type allele (being replaced by a codominant allele with a selective advantage of s) had been reduced to one copy, and the frequency of the wild-type allele t generations earlier than T would be $e^{st}/(2N + e^{st})$. The frequency of the advantageous mutant allele would thus be $2N/(2N + e^{st})$. The number of copies of the advantageous allele T + t generations ago would thus be $4N^{2}/(2N + e^{st})$.

If we have a coalescent process starting with i lineages τ generations into the selective sweep phase (with time being read backwards, i.e., T + τ generations ago), then if we are t - τ generations into this coalescent process, the probability of a coalescent event in a short time δt, which we can call δP (where P is the probability of a coalescence by time t), is given by


Integrating and taking limits implies
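
Both equations in this passage are missing from this copy. A reconstruction from the definitions above, assuming that each of the i(i - 1)/2 pairs of lineages coalesces at a rate equal to the reciprocal of the number of copies of the advantageous allele present T + t generations ago, would be:

```latex
\delta P = (1-P)\,\frac{i(i-1)}{2}\,\frac{2N + e^{st}}{4N^{2}}\,\delta t
\qquad\Longrightarrow\qquad
P = 1 - \exp\!\left\{-\frac{i(i-1)}{8N^{2}}
\left[\,2N\,(t-\tau) + \frac{e^{st}-e^{s\tau}}{s}\,\right]\right\}
```

As a check on the reconstruction, when $e^{st}$ is negligible relative to 2N the exponent reduces to $-i(i-1)(t-\tau)/(4N)$, the constant-size result.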


where τ is the time of the start of the ith coalescent. Thus, during the selective sweep phase, timings between coalescents are found by taking random P values and, by a process of iteration, finding t. The spread of the advantageous mutation is here treated as deterministic. Whereas one would expect some stochasticity during the initial spread of an advantageous mutation, particularly because a low-frequency advantageous mutation that ends up being fixed will typically spread more rapidly than expected, given its selective advantage (Barton 1998), this will have only a minor effect on the expected genealogy under the selective sweep model, when, as here, T > 1/s.
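
The iterative search for t can be sketched as follows. The cumulative coalescence probability used here is a reconstruction from the allele-copy-number expression given earlier, not the authors' printed equation, and the function names and the choice of bisection are assumptions made for illustration.

```python
import math
import random

def cumulative_hazard(t, tau, i, n, s):
    """Integrated coalescence rate between tau and t in the sweep phase.

    Assumes rate = [i(i-1)/2] * (2N + exp(s*u)) / (4N^2) at time u.
    """
    area = 2.0 * n * (t - tau) + (math.exp(s * t) - math.exp(s * tau)) / s
    return i * (i - 1) * area / (8.0 * n * n)

def sweep_coalescence_time(p, tau, i, n=5e5, s=0.005, tol=1e-6):
    """Find t such that P(coalescence by t) = p, by bracketing and bisection."""
    target = -math.log(1.0 - p)
    lo, hi = tau, tau + 1.0
    while cumulative_hazard(hi, tau, i, n, s) < target:
        hi = tau + 2.0 * (hi - tau)  # widen the bracket until it contains t
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cumulative_hazard(mid, tau, i, n, s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Chain the ten coalescent events of an 11-lineage sample that is still
# uncoalesced at the start of the sweep phase (tau = 0).
rng = random.Random(1)
tau = 0.0
for i in range(11, 1, -1):
    tau = sweep_coalescence_time(rng.random(), tau, i)
print(tau)  # generations into the sweep phase at the final coalescence
```

Bisection is used only because the cumulative probability is monotonic in t; any root finder would serve.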

Making the Mutations
It is assumed that the ancestral fourth chromosome has x element insertions. There are 56 distinguishable mutation events which are possible, representing eight chromosomal divisions and seven element families. As a given chromosomal division could contain more than one element of a given family, an array is defined in which each lineage has a certain number of insertions in each of the 56 boxes. The ancestral chromosome has x of the boxes with exactly one insertion, and the rest with none. The mutational events of each of the 56 boxes in each of the branches of the phylogeny are considered in turn. The rate of insertion of any element into any chromosomal division is assumed to have a constant expectation of µ per generation, and the rate of loss of any element is yµ. Thus, y represents how many times more likely it is that an element will be lost from an occupied division than that an element of a given family will be inserted into a previously empty division.

For any branch and any box, therefore, the expected number of new insertions is time × µ, where time is the length of the branch in generations. With probability 1 - exp(-time × µ), which is approximately time × µ if this is small, the number of elements in the box is increased by one. If there are m elements in the box at the start of a branch, then with probability 1 - exp(-time × µym), which is approximately time × µym if this is small, the number of elements in the box is decreased by one.
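
The per-branch mutation step might look like the following sketch. The function name and the list-of-counts representation of the 56 boxes are illustrative assumptions; the probabilities are the ones stated above.

```python
import math
import random

N_BOXES = 56  # 8 chromosomal divisions x 7 element families

def evolve_branch(parent_counts, branch_length, mu, y, rng):
    """Copy the parent's per-box insertion counts and apply one branch of change."""
    p_gain = 1.0 - math.exp(-branch_length * mu)
    child = list(parent_counts)
    for box, m in enumerate(parent_counts):
        # Loss probability scales with m, the count at the start of the branch.
        p_loss = 1.0 - math.exp(-branch_length * mu * y * m)
        if rng.random() < p_gain:
            child[box] += 1
        if rng.random() < p_loss:
            child[box] -= 1
    return child

rng = random.Random(0)
ancestor = [0] * N_BOXES  # x = 0: no insertions in the ancestral chromosome
tip = evolve_branch(ancestor, branch_length=5000.0, mu=2e-6, y=0.0, rng=rng)
print(sum(tip))  # number of insertions gained along this branch
```

With y = 0 the loss probability is zero in every box, which corresponds to the no-deletion case.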

There are three ways in which an insertion of a given element into a given division can have a frequency greater than 1 in the sample, and yet not be fixed. Firstly, if x > 0, an insertion could be descended from an element present in the ancestral chromosome, which has been deleted in some of the descendants. Secondly, the insertion, although a unique event, could be an insertion into a branch which has multiple descendants in the sample. Thirdly, there could be more than one insertion of the same type, occurring independently in different branches of the tree. Each process would create a site frequency greater than 1, but in the first and the third cases, but not in the second, there would be a possibility that homoplasy would be detected in the data. For this reason, once the mutations have been placed on the tree, all site-element combinations with a frequency above 1 in the sample are examined in all possible pairwise combinations, and for each pair, we see if there is evidence for homoplasy. In this context, a pair of sites are said to show evidence for homoplasy if all the four combinations are seen: (++), (+-), (-+), and (--), where (++), for example, represents the presence, in at least one individual in the sample, of elements at both sites.
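
The pairwise check described above is the four-gamete test. A minimal sketch, assuming each site is stored as a tuple of 0/1 presence flags with one entry per sampled chromosome (the function names are hypothetical):

```python
from itertools import combinations

def shows_homoplasy(site_a, site_b):
    """True if all four combinations (++), (+-), (-+), (--) occur in the sample."""
    return len(set(zip(site_a, site_b))) == 4

def has_homoplasy(sites):
    """Apply the test to every pair of sites present on more than one chromosome."""
    candidates = [s for s in sites if sum(s) > 1]
    return any(shows_homoplasy(a, b) for a, b in combinations(candidates, 2))

nested = [(1, 1, 1, 0, 0), (1, 1, 0, 0, 0), (0, 0, 0, 1, 1)]
conflict = [(1, 1, 0, 0, 0), (1, 0, 1, 0, 0)]
print(has_homoplasy(nested))    # False: compatible with a single tree
print(has_homoplasy(conflict))  # True: all four combinations are present
```

Sites found on a single chromosome are skipped because they can never produce the (++) and (+-) combinations together with a second site in a way that conflicts with a tree.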

Once the tree has been created and the mutations placed on the tree, the simulated genotypes of the sample are examined. If n is the total number of site-element combinations occupied in at least one of the individuals of the observed sample, we see if the number of sites in the simulated sample is n. If it is, the tree is retained provided that the homoplasy test shows no homoplasious combinations of sites.

For each type of simulation (the background selection model and the selective sweep models with different values of T), µ is chosen by trial and error such that the average n in the simulated data matches the n observed. s is always 0.005 in the selective sweep models. Then large numbers of simulations are performed, and from each of those retained in the analysis (retained because there are n sites and no homoplasies), we record the number, j, of what we call internal sites. These are sites where more than one chromosome in the sample possesses the site, and more than one chromosome lacks it. j can thus range from 0 to n. n - j is defined as the number of external sites. Thus, the expected frequency distribution of j (given n and the model) is observed. In addition, for simulations retained in the analysis, and thus with n sites, we observe $k_i$ (the number of chromosomes occupied out of 11) of the ith site in the sample, and calculate the mean number of differences between the chromosomes in the sample by

\[ \bar{d} = \frac{\sum_{i=1}^{n} k_i\,(11 - k_i)}{\binom{11}{2}} = \frac{1}{55}\sum_{i=1}^{n} k_i\,(11 - k_i) \]

We thus also compare the mean number of differences between the chromosomes with the mean number in the data.
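
Both summary statistics can be computed from the per-site occupancy counts. The sketch below assumes the mean number of differences is the usual average over all pairs of chromosomes, so that a site occupied in k of 11 chromosomes contributes k(11 - k)/55. The occupancy list is inferred from totals reported in the text (16 sites, nine of them unique to one strain, seven internal, frequencies between 1/11 and 4/11, gene diversity 4.218); it is an assumption, not a copy of Table 2.

```python
from math import comb

def count_internal(ks, sample_size=11):
    """Sites carried by more than one chromosome and absent from more than one."""
    return sum(1 for k in ks if 2 <= k <= sample_size - 2)

def mean_pairwise_differences(ks, sample_size=11):
    """Average number of presence/absence differences between two chromosomes."""
    return sum(k * (sample_size - k) for k in ks) / comb(sample_size, 2)

# Inferred occupancy counts (assumption): nine singletons, five sites on two
# chromosomes, one site on three chromosomes and one on four.
ks = [1] * 9 + [2] * 5 + [3, 4]
print(count_internal(ks))                       # 7
print(round(mean_pairwise_differences(ks), 3))  # 4.218
```

The same two functions can be applied unchanged to each simulated sample that passes the site-count and homoplasy filters.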


    Results
A total of 16 insertion sites were found on the fourth chromosomes of the 11 strains studied (Table 1). The element mdg-3 was the only one which was not found on the fourth chromosome of any strain. In contrast to mdg-3, 297 (which has previously been shown to be comparatively common on the fourth chromosome [Charlesworth, Lapid, and Canada 1992]) was present on 9 of the 11 fourth chromosomes inspected. The frequencies of the sites occupied on fourth chromosomes range from 0.091 to 0.364.


Table 1 Distribution of Retrotransposons in the Fourth Chromosomes of 11 Drosophila melanogaster Strains

 
Independent insertions into the same chromosomal division will elevate site frequencies. However, three of the four strains which have 297 in the same position on their fourth chromosomes also have an opus insert in the same location. In addition to this, two of these strains also share a 412 insertion in the same location. The two strains which share two mdg-1 sites also share a 297 site. The results are congruent with a bifurcating tree of the fourth chromosome, and there is no need to infer independent insertions or recombination between strains. The majority of sites have low frequencies, with only a small number of higher-frequency polymorphisms (Table 2).


Table 2 Numbers of Chromosomes Occupied (out of 11) for the 16 Retrotransposon Sites on the Fourth Chromosome

 
The insertions can be used to estimate a phylogenetic tree of the fourth chromosome (fig. 1). As there are only data from seven retrotransposable elements and 16 different insertion sites, the tree is not fully resolved. The 11 strains fall into six distinct clades.



Fig. 1.—Topological tree of the fourth chromosome. Insertions of elements are shown

 
The tree shows no geographical partitioning in the evolution of the fourth chromosome. The largest clade contains strains from France, the United States, Tunisia, and Guadeloupe. The other two multistrain clades contain strains from Zaire and France, and the United States and the Southern Pacific, respectively. Eight strains from three geographical areas, Bizarte and Loua from Africa, Draveil, Monty 9, and Monty 20 from France, and Canton S, Harwich, and Tex from the United States, were used in the study. None of the strains from the same geographical region fall within the same clade. This implies that the structure of the tree does not result from geographical subdivision in the ancestors.

The data show the number of variable element-division combinations to be 16, and the number of sites found more than once in the sample (internal sites) to be seven. Thus, in the simulations, we tried a background selection model and many selective sweep models, differing in the time to the selective sweep, and adjusted the insertion rate until the expected number of insertion sites was 16.

From the data, Tajima's D has a value of -0.905. This is not significantly different from 0, but the inaccuracy of the infinite sites model implies that D would be expected to be weakly positive, given neutrality.

The initial simulations assumed that there was no deletion process, i.e., y = 0. Consequently, as there were no sites fixed in the sample, the number of sites in the ancestral chromosome, x, must also be zero.

The background selection model used Nb = 2,000. For this model, the µ value which gives an expected n of 16 is $1.48 \times 10^{-5}$ per element per division per generation. The selective sweeps all use $N = 5 \times 10^{5}$ and s = 0.005. T has been varied, and for each T, a value of µ has been found that gives a mean number of 16 sites in the simulated trees. Thus, µ decreases with increasing T.

For each model, it is possible to observe the proportion of simulated trees in which there are a total of 16 sites and no homoplasies. For example, for the background selection model, 14,850 of 500,000 simulated trees met these two criteria. From these successful trees, we can observe the proportions of trees with all the possible numbers of internal sites (sites seen more than once and less than 10 times in the sample). Of the 14,850 successful trees, 842 or 5.67% had exactly the same number of seven internal sites as was seen in the data. The simulations also create distributions of the values of the mean number of differences between the chromosomes when 16 sites are present. Our distribution of site frequencies means that the 16 sites create a gene diversity of 4.218, i.e., two randomly chosen chromosomes show an average of 4.218 differences in the presence or absence of elements. This value is less than that expected under the background selection model with a total of 16 sites, as predicted from the negative D of Tajima, but more than that expected under any of the selective sweep models tried.

Table 3 shows the proportions of successful trees that had seven internal sites for each of the models tried. As can be seen, the likelihood of seven internal sites is reasonably high for the background selection model, very low for selective sweep models with low T, and quite high for selective sweep models with reasonably high T (more than 50,000 generations since the sweep). Figure 2 shows the distribution of the numbers of internal sites in the simulations for the selective sweep model with T = 2,000, the selective sweep model with T = 100,000, and the background selection model. Even though the probabilities of the seven internal sites under the latter two models are similar, the overall distributions are very different.


Table 3 Proportions of Successful Trees that Have Seven Internal Sites and the Proportions with the Observed Mean Number (or more) of Differences Between Chromosomes, for a Background Selection Model and for a Set of Differing Selective Sweep Models. The Background Selection Model Has a Population Size of 2000, and for the Selective Sweep Models, the Population Size After the Sweep Was 500,000. s Is 0.005 During the Sweep

 


Fig. 2.—The distribution of the numbers of internal sites out of 16 under the background selection model and two selective sweep models with differing T. An internal site is defined as one seen between two and nine times in the sample of 11 chromosomes. Our data showed seven internal sites. ♦, Selective sweep: T = 100,000; ■, selective sweep: T = 2,000; ▲, background selection

 
Figure 3 shows the variation in the probability of the seven internal sites as a function of the time to the selective sweep, T, using the data from Table 3.



Fig. 3.—The percentage of trees with seven internal sites for the selective sweep models with differing T. Data are taken from Table 3

 
As can be seen, only the recent selective sweeps, under 50,000 (0.1N) generations ago, give probabilities of the seven internal sites that are much less than the 5.67% expected under background selection. Indeed, a likelihood ratio of 20 in favor of the background selection model requires that the selective sweep have taken place more recently than 15,000 (0.03N) generations ago; thus, using this heuristic argument, only selective sweeps that were this recent can be said to be shown to be unlikely by our data. Clearly, the earlier that the most recent selective sweep took place, the less would be the reduction in fourth-chromosome nucleotide variability expected in a sample of chromosomes of today. For each selective sweep model, we optimize µ such that the expected number of sites is 16. As the expected number of sites corresponds closely to this mutation rate multiplied by the total branch length, the total expected branch length will vary with the reciprocal of the estimated mutation rate. A selective sweep 20N generations ago, for example, requires a mutation rate of $0.059 \times 10^{-6}$, which we take as the mutation rate required to create 16 variable sites in the absence of any selective sweep. The nucleotide variation on the fourth chromosome, as measured by the number of variable sites in a sample of 11 chromosomes, will increase with the total branch length. Thus, each value of T tried, in addition to yielding a likelihood ratio in favor of background selection, will yield, through its optimized value of µ, a prediction of the expected reduction in nucleotide variability (measured by the number of variable sites) compared with a population in which no sweep occurred.
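
The two quantities combined in this argument can be written out as a short sketch. The function names are hypothetical. The example insertion rate of 0.28 × 10^-6 is an assumption: it is back-calculated from the lowest per-copy transposition rate quoted later in the text (0.56 × 10^-5, divided by the factor of 20) rather than read from Table 3.

```python
def likelihood_ratio(p_background, p_sweep):
    """Ratio of the probabilities of seven internal sites under the two models."""
    return p_background / p_sweep

def relative_variability(mu_sweep, mu_no_sweep=0.059e-6):
    """Expected variability relative to a no-sweep population.

    Total branch length is taken to scale as 1/mu when the expected number
    of sites is held at 16, so the ratio of branch lengths is
    mu_no_sweep / mu_sweep.
    """
    return mu_no_sweep / mu_sweep

# A likelihood ratio of 20 corresponds to a sweep-model probability of
# 5.67% / 20 for seven internal sites.
print(likelihood_ratio(0.0567, 0.0567 / 20))

# Assumed insertion rate for the oldest sweep considered (see lead-in).
print(round(relative_variability(0.28e-6), 2))  # 0.21
```

Under these assumptions the oldest sweep considered leaves roughly a fifth of the variability expected without any sweep.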

The relationship between the likelihood ratio from our data and the expected reduction in nucleotide variability on the fourth chromosome is shown in figure 4. Thus, if one believed that the nucleotide variation on the fourth chromosome was 5% of that of chromosomal regions unaffected by background selection or sweeps, then our data would produce a likelihood ratio of around 10 in favor of background selection. If one believed that nucleotide variability on the fourth chromosome was 20% of that in chromosomes unaffected by background selection or sweeps, our data would be more consistent with the selective sweep model.



Fig. 4.—The relationship between the likelihood ratio in favor of background selection supplied by our data when compared to different selective sweep models and the expected number of polymorphic nucleotide sites in 11 chromosomes under the relevant selective sweep model, relative to the number of polymorphic sites in a freely recombining chromosome. The data points represent the likelihood ratios when background selection is compared to selective sweeps occurring (from left to right) 2,000, 5,000, 10,000, 20,000, 40,000, 60,000, 80,000, 100,000, 120,000, and 140,000 generations ago

 
Table 3 also shows the probabilities under each selective sweep model of as high a mean difference between chromosomes as that seen in the data. The probability of the exact mean difference of 4.218 changes between random chromosomes seen in the data is very low for all models, but the probabilities in the last column of Table 3 allow all the selective sweep models with the sweep more recent than 60,000 (0.12N) generations ago to be rejected at the 5% level. The evidence against a selective sweep within the last 10,000 (0.02N) generations is significant at the 0.1% level. The background selection model, while predicting a higher mean difference between the chromosomes conditional upon 16 sites, nevertheless yields a mean difference as low as that seen in 17.2% of the simulations, or lower. There is thus no significant evidence against the background selection model from this statistic.

The estimated insertion rates can be converted into transposition rates per element copy in the following way. As µ is the rate of insertion of a given family into a fourth chromosomal division, and as each fourth chromosomal division is around 0.166% of the total genome, the rate of insertion per genome, per element family, is around 600µ. If each family is represented by 30 element copies per haploid genome, the rate of transposition per element copy is 20µ. Thus, from the selective sweep model results in Table 3, transposition rates per element copy range from $0.56 \times 10^{-5}$ to $10.2 \times 10^{-5}$. These rates are consistent with independent estimates of the rates of retrotransposon movement, e.g., those described by Charlesworth, Sniegowski, and Stephan (1994) as "of the order of $10^{-4}$ or $10^{-5}$." Thus, the mutation rates required to create these insertion numbers cannot be used to resolve the timing of any selective sweep.
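
The conversion is simple arithmetic and can be checked with a short sketch. The function name is hypothetical; the two default values are the figures quoted in the paragraph above.

```python
def transposition_rate_per_copy(mu, division_fraction=0.00166, copies_per_family=30):
    """Convert an insertion rate per division into a rate per element copy.

    mu / division_fraction is the insertion rate per genome per family
    (about 600 * mu); dividing by the copy number gives about 20 * mu.
    """
    per_genome = mu / division_fraction
    return per_genome / copies_per_family

# Multiplier applied to mu under the stated figures.
print(round(transposition_rate_per_copy(1.0), 1))  # 20.1
```

The exact multiplier is slightly above the rounded value of 20 used in the text because 1/0.00166 is closer to 602 than to 600.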

These simulations have all assumed an absence of insertion sites in the ancestral chromosome. However, if there were sites in the ancestral chromosome, these could have been lost in some or all of the descendant chromosomes, with the result that a site in the frequency range from two to nine chromosomes (an internal site) could be descended from a site in the ancestral chromosome which has been deleted in the ancestor(s) of two or more of the chromosomes in the sample. Thus, we repeated the simulations for T = 10,000, but now allowing y to be 1 or 5, and allowing x to range from 0 to 3. The results are shown in figure 5a–c. Figure 5a shows the distribution of external sites when a single site was present in the ancestral chromosome, and with three y values (relative deletion rates) of zero, one, and five times the insertion rate. There is very little difference between the three distributions. The same is seen in figure 5b. Here, the distributions of external sites, when the deletion rate is equal to the insertion rate, are shown when there are zero to three sites in the ancestral chromosome. The corresponding distributions when the deletion rate is five times the insertion rate are shown in figure 5c. However, the higher the deletion rate, the lower the probability that the tree will be accepted because there will be a much higher probability of apparent homoplasies. Thus, with the deletion rate five times the insertion rate in figure 5c, the proportions of trees accepted, relative to the corresponding proportions for the equal rates simulations in figure 5b, are 99%, 72%, 57%, and 47% for zero, one, two, and three initial sites, respectively.



Fig. 5. The effect of sites in the ancestral chromosome. T = 10,000, and s = 0.005 in all these simulations. (a) The distributions of the numbers of internal sites as a function of the relative deletion rate, y, when the number of elements in the ancestral chromosome, x, is one. Solid line with filled triangles, y = 0; dotted line with crosses, y = 1; dashed line with filled squares, y = 5. Insertion rates, µ, are 2.15 × 10⁻⁶, 2.21 × 10⁻⁶, and 2.54 × 10⁻⁶ for the three cases, respectively. (b) The distributions of the number of internal sites as a function of the number of sites in the ancestral chromosome, x, when y = 1. Insertion rates, µ, for x = 0, 1, 2, and 3 are 2.34 × 10⁻⁶, 2.21 × 10⁻⁶, 2.09 × 10⁻⁶, and 1.97 × 10⁻⁶, respectively. (c) The distributions of the number of internal sites as a function of the number of sites in the ancestral chromosome, x, when y = 5. Insertion rates, µ, for x = 0, 1, 2, and 3 are 2.72 × 10⁻⁶, 2.54 × 10⁻⁶, 2.38 × 10⁻⁶, and 2.21 × 10⁻⁶, respectively.

 
When x is 0, there are still many trees rejected in the simulations because of apparent homoplasies. The proportion of trees rejected for this reason varies in the simulations reported in Table 3 from 47.3% (for background selection) to 64.3% (for the selective sweep with T = 140,000).


    Discussion
 
We found 16 retrotransposable element insertion sites within the fourth chromosomes of 11 different strains of D. melanogaster. Only nine of these sites were unique to particular strains. No homoplasies were found, but it is probable that some of our shared sites result from independent insertions. The fact that only eight chromosomal divisions can be distinguished means that the situation is unlikely to fit exactly with the infinite sites model for mutation. This is illustrated by considering our background selection simulation. This model uses the neutral coalescent upon which insertion mutations are placed. For an effective size of 2,000, we found that the mutation rate that maximized the probability of seeing 16 sites in the sample was 1.48 × 10⁻⁵. According to the infinite sites model, the expected total number of insertions in the tree for a sample of 11, summed over the 56 possible types of insertion (not all of which will be distinguishable from each other), is given by

$$E(S) = 56 \times 4N\mu \sum_{i=1}^{10} \frac{1}{i}$$

where µ is the rate of each one of the 56 types of insertion possible and N is the effective size. In our case this is 1,312,178µ, but with µ = 1.48 × 10⁻⁵, the expected number of sites is 19.4, not the 16 observed, and the difference is the result of independent insertions of the same element into the same chromosomal division in different lineages. A few such independent insertions thus typically occur in the simulations, even when no homoplasies are apparent.
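The coefficient of 1,312,178 and the expectation of 19.4 sites can be reproduced numerically. The sketch below assumes the standard neutral coalescent expectation under infinite sites, in which each insertion type contributes 4Nµ multiplied by the sum of 1/i for i = 1 to n - 1. That form is our assumption, chosen because it reproduces the quoted figures for N = 2,000, n = 11, and 56 insertion types. The function name `expected_insertions` is hypothetical.

```python
from fractions import Fraction


def expected_insertions(n_sample, n_eff, mu, n_types):
    """Expected number of insertions on a neutral coalescent tree (infinite sites).

    Assumes each insertion type contributes 4 * N * mu * sum_{i=1}^{n-1} 1/i.
    """
    harmonic = sum(Fraction(1, i) for i in range(1, n_sample))  # exact partial harmonic sum
    return float(n_types * 4 * n_eff * harmonic) * mu


# Coefficient multiplying mu, then the expectation at the fitted rate from the text.
coefficient = expected_insertions(11, 2000, 1.0, 56)
expected_sites = expected_insertions(11, 2000, 1.48e-5, 56)
print(f"coefficient of mu: {coefficient:,.0f}")
print(f"expected sites at mu = 1.48e-5: {expected_sites:.1f}")
```

The gap between this expectation and the 16 observed sites is the quantity the text attributes to repeated insertions of the same type in different lineages.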

There is evidence that transposable elements have a weakly deleterious effect on their bearers, and this may be reflected in our data showing slightly fewer sites with nonunique frequencies than is expected under the background selection model. The mean number of differences between chromosomes is also reduced. It could be argued that retrotransposable element insertion sites are not a good choice of marker on account of their deleterious effects. However, the impact of weak selection will be to lower the frequencies of sites in the sample relative to the neutral expectation, and thus lower the proportion of internal sites and the mean number of differences between the chromosomes. This is expected by theory and confirmed by our simulations (results not shown here). Thus, as selection would make the data resemble those expected under selective sweeps, we can regard our use of neutral simulation, and the conclusion that background selection is occurring, as a conservative interpretation.
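The neutral expectation against which these comparisons are made can be illustrated with a small coalescent simulation. This is a minimal sketch under stated assumptions, not the simulation used in this study: it assumes strict infinite sites (so it ignores the 56-type limit and homoplasy rejection), measures time in units of 2N generations, and takes an internal site to be one carried by between 2 and n - 2 sampled chromosomes. All function names are hypothetical. The value of theta is 56 × 4Nµ, with N = 2,000 and µ = 1.48 × 10⁻⁵ as quoted for the background selection model.

```python
import random


def poisson_draw(rng, mean):
    """Draw a Poisson variate by counting unit-rate exponential arrivals before `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_site_counts(n, theta, rng):
    """Return, for each insertion on one neutral genealogy, how many sampled chromosomes carry it."""
    lineages = [1] * n  # number of sampled chromosomes descended from each lineage
    counts = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)  # waiting time to the next coalescence
        for size in lineages:
            # Insertions arise on each lineage at rate theta / 2 per unit time.
            counts.extend([size] * poisson_draw(rng, theta / 2.0 * t))
        i, j = rng.sample(range(k), 2)  # pick two lineages to merge
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)] + [merged]
    return counts


def summarize(counts, n):
    """Return (total sites, internal sites, mean pairwise differences)."""
    internal = sum(1 for c in counts if 2 <= c <= n - 2)
    pairs = n * (n - 1) / 2.0
    mean_diff = sum(c * (n - c) for c in counts) / pairs
    return len(counts), internal, mean_diff


rng = random.Random(1)
n = 11
theta = 56 * 4 * 2000 * 1.48e-5  # summed over the 56 insertion types
reps = [summarize(simulate_site_counts(n, theta, rng), n) for _ in range(2000)]
mean_sites = sum(r[0] for r in reps) / len(reps)
mean_internal = sum(r[1] for r in reps) / len(reps)
mean_diff = sum(r[2] for r in reps) / len(reps)
print(f"mean sites {mean_sites:.2f}, mean internal {mean_internal:.2f}, mean differences {mean_diff:.2f}")
```

Under these assumptions the mean number of sites should approach theta multiplied by the sum of 1/i for i = 1 to 10, and the mean number of pairwise differences should approach theta. Weak selection against insertions would be expected to pull both the internal-site count and the mean differences below these neutral values.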

The weakly deleterious effect of insertions is also probably why, despite most chromosomes in the sample having at least one element insertion, it appears that the ancestral chromosome possessed no elements. A chromosome free of harmful elements is more likely to end up being ancestral. If a deletion process exists, it is possible that an internal site could be derived from a site present in the ancestor, but simulations with ancestral sites and a deletion process yield probability distributions for internal sites (fig. 5) that are very similar to those generated under an insertion process alone. This similarity is probably because if deletions occurred early, or more than once, the tree would have a high chance of showing homoplasy, and thus of being excluded from the analysis. It is possible that a hybrid selective sweep model could be produced. This model would include sites in the original chromosome spread by a selective sweep, followed by weak selection for one or more variants which had lost the ancestral site(s). This hybrid model might give a higher number of internal sites than is calculated under these neutral models. However, such a complex model would succeed only because a series of selective sweeps was being postulated, the last of which had not reached fixation, so that the resulting genealogy mimicked that expected under neutrality. As selection can theoretically take any form, it is always possible to produce selective models which resemble the predictions of neutrality.

Our data are consistent with background selection, but are also consistent with a reasonably ancient selective sweep. The one published study of variation on the fourth chromosome (Berry, Ajioka, and Kreitman 1991) revealed no variation at all, and thus, were one to explain these data with a selective sweep, the maximum likelihood estimate of the time of the sweep would be the present. However, these authors interpreted their data as suggesting a sweep 0.28N generations ago. For our N, this corresponds to 140,000 generations ago. A sweep at this time is even more consistent with our number of internal sites than background selection. However, this estimate of 0.28N generations is based on an assumption of a flat prior probability distribution of the timing of the most recent selective sweep, and can be regarded as inflated to an entirely arbitrary degree. Indeed, another analysis of the same results estimated a much more recent sweep (Charlesworth 1992).

This discussion has focused on whether the data support the background selection model or the selective sweep model, as if these are simple alternatives and exactly one will be correct. However, both processes could be acting simultaneously, and even if a selective sweep were primarily responsible for the reduction in fourth-chromosome variability, it seems very probable that background selection (which is an inevitable consequence of deleterious mutation) must also be playing a minor role. Furthermore, other evolutionary models can also predict the correlation between recombination rate and genetic variability. In particular, the TIM model, created by Takahata and co-workers (Takahata, Iishi, and Matsuda 1975; Takahata and Kimura 1979) and discussed by Gillespie (1994), is expected to create datasets which look quite like ours. This model postulates selection coefficients which vary randomly with time. It predicts a reduction in diversity in low-recombination regions, but with a negative Tajima's D of around -1 (Gillespie 1994). However, there are also some problems with this model. The time-dependent changes in selection intensity seem intrinsically unlikely. Furthermore, Takahata and Kimura (1979) suggest that if sufficient selected sites are linked together, the potential for temporally varying selection to lower variability will be reduced. With 83 genes on the fourth chromosome, it seems likely that the number of sites under selection would be too high for the TIM model to predict a major loss of variability.


    Acknowledgements
 
We thank Brian Charlesworth and Kevin O'Hare for supplying transposable element plasmids, and two anonymous referees for their comments on an earlier draft of this manuscript. M.C. was supported by the Natural Environment Research Council (U.K.) studentship GT4/96/220/T.


    Footnotes
 
Jeffrey Long, Reviewing Editor

Keywords: retrotransposons, background selection, Drosophila melanogaster, fourth chromosome

Address for correspondence and reprints: John Brookfield, Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, United Kingdom. john.brookfield@nottingham.ac.uk.


    References
 

    Aquadro C. F., D. J. Begun. 1993. Evidence for and implications of genetic hitchhiking in the Drosophila genome. Pp. 159–178 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Sinauer, Sunderland, Mass.

    Barton N. H. 1998. The effect of hitch-hiking on neutral genealogies. Genet. Res. Camb. 72:123–133.

    Begun D. J., C. F. Aquadro. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in Drosophila melanogaster. Nature 356:519–520.

    Berry A. J., J. W. Ajioka, M. Kreitman. 1991. Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129:1111–1117.

    Braverman J. M., R. R. Hudson, N. L. Kaplan, C. H. Langley, W. Stephan. 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics 140:783–796.

    Charlesworth B. 1992. Evolutionary biology: new genes sweep clean. Nature 356:475–476.

    Charlesworth B. 1994. The effect of background selection against deleterious mutations on weakly selected linked variants. Genet. Res. Camb. 63:213–227.

    Charlesworth D., B. Charlesworth, M. T. Morgan. 1995. The pattern of neutral molecular variation under the background selection model. Genetics 141:1619–1632.

    Charlesworth B., P. Jarne, S. Assimacopoulos. 1994. The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. III. Element abundances in heterochromatin. Genet. Res. Camb. 64:183–197.

    Charlesworth B., A. Lapid, D. Canada. 1992. The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. II. Inferences on the nature of selection against elements. Genet. Res. Camb. 60:115–130.

    Charlesworth B., M. T. Morgan, D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.

    Charlesworth B., P. Sniegowski, W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220.

    Gillespie J. H. 1994. Alternatives to the neutral theory. Pp. 1–17 in B. Golding, ed. Non-neutral evolution: theories and molecular data. Chapman and Hall, New York.

    Grell R. F. 1971. Heat induced exchange in the fourth chromosome of diploid females of Drosophila melanogaster. Genetics 69:523–527.

    Hochman B. 1976. The fourth chromosome of Drosophila melanogaster. Pp. 903–928 in M. Ashburner and E. Novitski, eds. The genetics and biology of Drosophila, Vol. 1b. Academic Press, New York.

    Hudson R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. 7th edition. Oxford University Press, Oxford.

    Kaplan N. L., R. R. Hudson, C. H. Langley. 1989. The "hitchhiking effect" revisited. Genetics 123:887–899.

    Lefevre G. 1976. A photographic representation of the polytene chromosomes of Drosophila melanogaster salivary glands. Pp. 31–36 in M. Ashburner and E. Novitski, eds. Genetics and biology of Drosophila, Vol. 1a. Academic Press, London.

    Locke J., L. Podemski, K. Roy, D. Pilgrim, R. Hodgetts. 1999. Analysis of two cosmid clones from chromosome 4 of Drosophila melanogaster reveals two new genes amid unusual arrangement of repeated sequences. Genome Res. 9:137–149.

    Maynard Smith J., J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. Camb. 23:23–35.

    Murata S., N. Takasaki, M. Saitoh, H. Tachida, N. Okada. 1996. Details of retropositional genome dynamics that provide a rationale for a generic division: the distinct branching of all the pacific salmon and trout (Oncorhynchus) from the Atlantic salmon and trout (Salmo). Genetics 142:915–926.

    Nurminsky D. I., M. V. Nurminskaya, D. De Aguiar, D. L. Hartl. 1998. Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396:572–575.

    Petrov D. A., D. L. Hartl. 1998. High rate of DNA loss in the Drosophila melanogaster and Drosophila virilis species groups. Mol. Biol. Evol. 15:293–302.

    Petrov D. A., E. R. Lozovskaya, D. L. Hartl. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.

    Sandler I., P. Szauter. 1978. The effect of recombination-defective meiotic mutants on the fourth-chromosome crossing over in Drosophila melanogaster. Genetics 90:699–712.

    Stephan W., T. H. E. Wiehe, M. W. Lenz. 1992. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41:237–254.

    Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.

    Takahata N., K. Iishi, H. Matsuda. 1975. Effect of temporal fluctuations of selection coefficient on gene frequency in a population. Proc. Natl. Acad. Sci. USA 72:4541–4545.

    Takahata N., M. Kimura. 1979. Genetic variability maintained in a finite population under mutation and autocorrelated random fluctuation in selection intensity. Proc. Natl. Acad. Sci. USA 76:5813–5817.

    Williamson J. H., D. R. Parker, W. G. Manchester. 1970. X-ray induced recombination in the fourth chromosome of Drosophila melanogaster females. Mutat. Res. 9:299–306.

Accepted for publication August 21, 2001.