Department of Ecology and Evolutionary Biology, University of California, Irvine
Correspondence: E-mail: sjm{at}uci.edu.
Abstract
Key Words: regulatory DNA, enhancer, conservation, selection, population structure
Introduction
Genes of the Enhancer of split Complex (E(spl)-C) act at the end of the Notch signaling pathway, and in D. melanogaster, the complex harbors 12 transcription units: the E(spl)bHLH genes, mδ, mγ, mβ, m3, m5, m7, and m8, are seven Notch-responsive basic helix-loop-helix (bHLH) transcription factors that act to repress neural cell fate (Delidakis and Artavanis-Tsakonas 1992; Knust et al. 1992; Jennings et al. 1994), and the E(spl)Brd genes, mα, m2, m4, and m6, are four Bearded family genes, overexpression of which antagonizes Notch signaling activity (Lai et al. 2000; Lai, Bodner, and Posakony 2000). The single gene m1 appears unrelated to the Notch pathway and likely encodes a protease inhibitor of the Kazal family (Wurmbach, Wech, and Preiss 1999; Lai, Bodner, and Posakony 2000). The different transcripts show marked differences in imaginal disc gene expression patterns (de Celis et al. 1996; Singson et al. 1994), implying the genes are functionally separable. This is further evidenced by the preservation of E(spl)-C gene number and organization in D. hydei (Maier et al. 1993), a species that diverged from D. melanogaster around 60 MYA.
Upstream of each of the E(spl)-C genes, an array of activator/repressor protein-binding sites has been identified (Tietze, Oellers, and Knust 1992; Kramatschek and Campos-Ortega 1994; Eastman et al. 1997; Nellesen, Lai, and Posakony 1999; Lai et al. 2000; Lai, Bodner, and Posakony 2000). The variety of imaginal expression patterns exhibited by the endogenous E(spl)-C genes can be replicated in vivo by these short enhancer regions alone; that is, transcription is independently regulated for each gene, and transcriptional control is primarily local (Bailey and Posakony 1995; Nellesen, Lai, and Posakony 1999; Cooper et al. 2000).
In addition to cis-regulatory elements, several classes of 3' UTR regulatory motifs are also known from E(spl)-C genes (Leviten, Lai, and Posakony 1997; Lai, Burks, and Posakony 1998; Nellesen, Lai, and Posakony 1999; Lai et al. 2000; Lai, Bodner, and Posakony 2000; Lai 2002). These have been shown to negatively regulate transcript accumulation and elicit phenotypic changes in the number of adult bristles (Lai and Posakony 1997; Lai, Burks, and Posakony 1998). The motifs are perfectly complementary to a subset of microRNAs, and it is postulated that this posttranscriptional regulation may be mediated by the formation of RNA duplexes (Lai and Posakony 1998; Lai 2002).
Many of the identified cis-regulatory and 3' UTR motifs have been tested in functional assays, and the E(spl)-C locus is among a handful of loci, such as even-skipped (Ludwig et al. 2000), that are highly characterized with respect to regulatory domains in Drosophila. Given the demonstrable effects of E(spl)-C regulatory motifs on gene expression and on adult bristles, effects on natural phenotypic variation attributable to E(spl)-C may be caused by regulatory substitutions rather than changes in coding regions.
If phenotypic variation is largely caused by regulatory variants, then the relatively small proportion of noncoding DNA harboring functional regulatory variants must be distinguished from the much larger amount of nonfunctional noncoding sequence in any given genome. Using population-genetic and molecular-evolutionary approaches, regions of DNA visible to selection can be identified. Although not all phenotypic variation will be associated with regions showing evidence of past selection, we hypothesize that identified regions will be enriched for functional elements. This is because some fraction of functional regulatory regions may show footprints of past selection, whereas nonfunctional regions can never show such footprints.
The extensive annotation available for E(spl)-C provides an ideal opportunity to examine the pattern of molecular variation across functionally separable domains and determine whether known regulatory regions show evidence of selection. Because members of E(spl)-C likely influence variation in bristle number in adult flies (Long et al. 1995; Norga et al. 2003; Nuzhdin, Dilda, and Mackay 1999; Dilda and Mackay 2002), and because bristle number has been shown to be subject to stabilizing selection (García-Dorado and González 1996), nonneutrally evolving regions in E(spl)-C are more likely to harbor variants affecting bristle number than are neutrally evolving regions. Because any region of E(spl)-C deviating from neutral expectation harbored variants affecting gene function in the past, we speculate that such regions are more likely to harbor segregating functional variants that contribute to current standing variation than are tracts of noncoding DNA showing no evidence for past selection.
We examine the molecular evolution of the E(spl)-C complex in Drosophila at several levels to identify regions experiencing nonneutral evolution. First, we assess conservation of the locus, particularly regulatory sequences, between the diverged species D. melanogaster and D. pseudoobscura. Second, we compare nucleotide diversity within and between the sibling species D. melanogaster and D. simulans, focusing on differences between regulatory and adjacent nonregulatory sequence. Third, we examine variation in the level of population structure exhibited by different regions of the locus using the FST statistic. Fourth, we examine E(spl)-C within a single population for evidence of nonneutral evolution, including the action of positive selection. Sites located in regions identified by these population genetics methods are more likely to represent bristle number QTN (quantitative trait nucleotides) than sites in regions showing no evidence for a departure from neutrality, and this hypothesis can be tested in subsequent functional genetics studies of bristle number. Thus, this system holds promise for eventually characterizing the phenotypic effect of interesting DNA variants.
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under the accession numbers AY779906 to AY779995.
Materials and Methods
Fifty-eight overlapping 1-kb PCR amplicons were developed to cover the entire approximately 47-kb region using the software PCR-Overlap (Rieder et al. 1998; all primer sequences available from http://cstern.bio.uci.edu/pubs.htm). Addition of 18-nt tails to the 5'-end of each oligo allowed sequencing of all fragments using just two common primers (universal forward: 21M13, 5'-TGTAAAACGACGGCCAGT-3' and universal reverse: M13reverse, 5'-CAGGAAACAGCTATGACC-3' [Rieder et al. 1998]). All amplicons were directly sequenced using ABI Big Dye terminator chemistry on an ABI377 automated sequencer. Sequence traces for each allele were assembled using the program SeqManII version 5.01 (DNASTAR, Inc.), and the contigs manually aligned using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html).
We also sequenced nine of the 12 E(spl)-C coding regions in a single inbred line of D. yakuba (yakTAI27, AY779924 to AY779932).
Long Haplotype Analysis
The resequencing data showed two of the 16 alleles to be identical at all but one site across the entire E(spl)-C locus. To localize the ends of this potentially long haplotype, we sequenced approximately 700-bp PCR amplicons approximately 200 kb upstream (esU.200kb.F, 5'-TGGCAGCAATAAATGGATCA-3', esU.200kb.R, 5'-AATCCCAAACAAGCTGGATG-3'; GenBank accession numbers AY779964 to AY779979), 200 kb downstream (esD.200kb.F, 5'-ACTGAGCCAAAGCTGACGTT-3', esD.200kb.R, 5'-AATGTCTTCGGCTGGAATTG-3'; GenBank accession numbers AY779948 to AY779963), and 500 kb downstream (esD.500kb.F, 5'-CGTTGCATTAAGAGCAGCAA-3', esD.500kb.R, 5'-GGACGGAAACGAGAAACAAA-3'; GenBank accession numbers AY779980 to AY779995) of E(spl)-C.
To ascertain whether the long haplotype is associated with the nearby polymorphic cosmopolitan inversion In(3R)Payne (Bridges and Bridges 1938), we crossed each of the 15 extant lines to Canton-S and examined F1 salivary gland polytene chromosomes using standard protocols.
Finally, to assess whether the presence of a pair of sequences differing by a single site in a set of 16 sequences harboring 1,013 segregating sites deviates from neutral expectation, we employed the haplotype test of Hudson et al. (1994). This procedure involves generating random samples of 16 sequences under a neutral coalescent model, each with 1,013 polymorphic sites, with a specified level of recombination. Each replicate sample is then tested for the presence of a pair of sequences that differ by one site or are identical, and the fraction of samples containing such a pair is an estimate of the P value for the observation.
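The logic of this test can be illustrated with a minimal Python sketch. It covers only the no-recombination case (one end of the range of recombination values considered), and it is not the program used for the analysis. The function names are hypothetical. Each replicate builds a standard neutral coalescent genealogy, places a fixed number of mutations on branches with probability proportional to branch length, and then checks for a pair of sequences differing at no more than one site.

```python
import random
from itertools import combinations

def simulate_sample(n, s, rng):
    """Simulate n haplotypes with exactly s segregating sites (no recombination)."""
    lineages = [frozenset([i]) for i in range(n)]   # each lineage = set of sampled descendants
    start = {lin: 0.0 for lin in lineages}          # time at which each lineage arose
    branches = []                                   # (descendant set, branch length)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)       # waiting time to the next coalescence
        a, b = rng.sample(lineages, 2)
        for lin in (a, b):
            branches.append((lin, t - start[lin]))
            lineages.remove(lin)
        merged = a | b
        start[merged] = t
        lineages.append(merged)
    # Condition on s mutations: each falls on a branch with probability
    # proportional to that branch's length.
    hit = rng.choices([d for d, _ in branches],
                      weights=[length for _, length in branches], k=s)
    haps = [[0] * s for _ in range(n)]
    for site, descendants in enumerate(hit):
        for i in descendants:
            haps[i][site] = 1
    return haps

def has_near_identical_pair(haps, max_diff=1):
    """True if any two haplotypes differ at max_diff sites or fewer."""
    return any(sum(x != y for x, y in zip(h1, h2)) <= max_diff
               for h1, h2 in combinations(haps, 2))

def haplotype_test_p(n, s, replicates, seed=0):
    """Fraction of neutral replicates containing a near-identical pair."""
    rng = random.Random(seed)
    hits = sum(has_near_identical_pair(simulate_sample(n, s, rng))
               for _ in range(replicates))
    return hits / replicates
```

A call such as `haplotype_test_p(16, 1013, 1000)` mirrors the sample size and number of polymorphic sites described above; adding recombination would require a full ancestral recombination graph simulator such as "ms".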
Assessing E(spl)-C Conservation Using Blast
We used a sliding-window approach to Blast consecutive overlapping small subsections of a 47,677-bp D. melanogaster E(spl)-C consensus sequence derived from an alignment of the 16 sequenced alleles, against the homologous region of D. pseudoobscura (GenBank accession number AADE01000136, positions 561 to 58129) using the bl2seq utility. For each 31-bp D. melanogaster query sequence, we recorded the position, orientation, and score of the highest Blast hit in D. pseudoobscura, sliding through the region in 15-bp steps. Only Blast hits with scores above 45 were considered in further analyses.
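The windowing scheme described above can be sketched in a few lines of Python. This is only an illustration of how 31-bp queries at 15-bp steps would be generated and how hits would be filtered by score; it does not call bl2seq, and the function names and the dictionary layout of a hit are assumptions made for the example.

```python
def sliding_windows(seq, size=31, step=15):
    """Yield (1-based start position, subsequence) for overlapping windows."""
    for start in range(0, len(seq) - size + 1, step):
        yield start + 1, seq[start:start + size]

def keep_strong_hits(hits, min_score=45):
    """Retain only hits whose score exceeds the threshold (score > 45 in the text)."""
    return [hit for hit in hits if hit["score"] > min_score]
```

Because the step is roughly half the window length, each base outside the ends of the sequence is covered by at least two query windows.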
Population Genetic Analyses
Estimates of nucleotide diversity and Tajima's D statistic (Tajima 1989) were obtained using DnaSP version 4.0 (Rozas et al. 2003). MK G-tests (McDonald and Kreitman 1991) and tests analogous to the MK and HKA tests (Hudson, Kreitman, and Aguadé 1987) for regulatory sequences were performed on the counts of polymorphisms and fixed differences using a custom routine in the statistical programming language R (www.R-project.org). The probabilities of obtaining the observed values for Tajima's D statistic were determined by simulating neutral genealogies (Hudson 1990) using the program "ms" (Hudson 2002; http://home.uchicago.edu/~rhudson1/source/mksamples.html). Simulations were replicated 10,000 times, conditional on the empirical sample size, the observed number of segregating sites, and the alignment length in bp, with the population recombination rate parameter, ρ (or 4N₀r), set to the values 0, 1, 10, and 100. For the sliding-window analysis of Tajima's D statistic, we employed the Perl script SCANMS (Ardell 2004; http://www.lcb.uu.se/~dave/SCANMS), using a window of 2 kb with 200-bp steps, which uses the coalescence simulator "ms" to generate probabilities while accounting for multiple testing.
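For readers unfamiliar with the statistic, the following Python sketch computes Tajima's D from a set of aligned sequences using the standard formulas of Tajima (1989). It is an illustration only, not the DnaSP implementation; the function name is hypothetical, and the sketch assumes equal-length sequences with no missing data or gaps.

```python
import math
from collections import Counter

def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (no missing data)."""
    n = len(seqs)
    pairs = n * (n - 1) / 2
    s = 0        # number of segregating sites
    k = 0.0      # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            s += 1
            # pairs that differ at this site, divided by the total number of pairs
            k += (n * n - sum(c * c for c in counts.values())) / 2 / pairs
    if s == 0:
        return float("nan")   # D is undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants makes the pairwise-difference estimate smaller than the estimate based on the number of segregating sites, giving negative D; an excess of intermediate-frequency variants gives positive D.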
We estimated the value of ρ (denoted by ρ_W00) for the effectively haploid sequence data in a sliding-window framework based on the number of haplotypes and the minimum number of recombination events, as described in Wall et al. (2003). Only those biallelic SNPs and InDels identified by resequencing that had no missing data and showed the minor allele in at least two of the 16 sequenced alleles were used in the analysis (total = 477 sites, window size = 20 sites). We also estimated ρ (denoted by ρ_H01d) from unphased diploid genotype data (below) using the program RECSLIDER (Wall et al. 2003; http://genapps.uchicago.edu/labweb/index.html), with a sliding window of 20 segregating sites and an initial estimate of
= 0.01.
Genotyping
A subset of the polymorphisms identified by resequencing were genotyped in four outbred population samples of D. melanogaster using an oligonucleotide ligation assay approach (described in Genissel et al. 2004). Genotyped sites were concentrated in and around the 12 E(spl)-C transcripts (40 in exons, eight in 5' UTRs, 22 in 3' UTRs, 12 in enhancers, and 35 in intergenic regions), and the set was selected such that the members showed minimal linkage disequilibrium (LD) with each other in the resequenced alleles. Population samples were (a) Napa Valley, Calif. (N = 60), (b) Southern France (N = 46), (c) Madang, Papua New Guinea (N = 60), and (d) Benin, West Africa (N = 60). The flies from Napa Valley were directly sampled from nature in 2001, whereas the other three samples were harvested from large populations maintained in laboratory cages with overlapping generations since inception from wild-caught individuals in 1999 (Southern France), 1998 (Madang), and 1970 (Benin).
Results
Figure 1 highlights the similarity at E(spl)-C between D. melanogaster and D. pseudoobscura, species thought to have diverged 25 MYA (Russo, Takezaki, and Nei 1995). Overall the conservation is strong, although the region around m1/m2 and the region from m4 to m8 both show fewer high Blast hits (score > 45) than does the rest of the locus, suggesting they have undergone greater evolutionary change. The D. pseudoobscura locus is slightly expanded relative to D. melanogaster, but otherwise, rearrangements appear to be absent, and virtually all the Blast hits are in the same direction (i.e., there are no inversions). Only 13 of the 433 hits with Blast scores above 45 were most similar to reverse-complemented D. pseudoobscura sequences (nine of these 13 reverse-complement hits exist in three coincident triplets, so only seven lines can be easily seen in figure 1). None of these represent identical-by-descent inversions: 12 are hits between sites in the bHLH areas of different, and oppositely transcribed, E(spl)bHLH genes, and one is between positions centered on Suppressor of Hairless-binding sites upstream of different E(spl)Brd genes (these binding sites can exist in either orientation).
Pattern of Nucleotide Diversity Across E(spl)-C
The extensive annotation of E(spl)-C allows us to parse the locus into separate categories and estimate population genetic parameters within each. Table 1 documents nucleotide diversity for various regions of the E(spl)-C locus using a 48,512-bp alignment of 16 D. melanogaster and two D. simulans alleles. For the coding regions, nucleotide diversity, π, within D. melanogaster is 0.0032, a value consistent with those observed for other autosomal loci (Moriyama and Powell's table 1 [1996]). However, π = 0.0047 for the nonregulatory intergenic portions of the locus, which, although greater than the value for coding regions, is perhaps not as high as expected: Moriyama and Powell (1996) estimate π = 0.0118 averaged over noncoding regions for autosomal loci. Nucleotide divergence between D. melanogaster and D. simulans is also much lower for E(spl)-C (K = 0.0379) than values observed for X and 3R chromosome loci (Begun and Whitley 2000). A sliding-window analysis of nucleotide diversity shows that polymorphism is not uniformly low in intergenic regions (fig. 2), and occasional peaks of within-species or between-species diversity exist. For instance, 5' to m
, there is a peak of intraspecific nucleotide diversity, while just 3' to m
, there is a peak of between-species diversity. There are also wide peaks of both within-species and between-species variation 5' to m3.
Under neutrality, the ratio of polymorphisms to fixed differences should be identical for both synonymous and nonsynonymous sites, and deviation from equality can be assessed using the MK test (McDonald and Kreitman 1991). Levels of diversity are generally too low for this test to be applied satisfactorily on a per-gene basis, and grouping the E(spl)bHLH and E(spl)Brd genes shows no significant difference from neutral evolution (table 2). Nonetheless, m2 seems to exhibit a significant excess of fixed synonymous sites in the D. melanogaster-D. simulans comparison (MK G-test, P = 0.017) and approaches significance in the D. melanogaster-D. yakuba comparison (MK G-test, P = 0.061), which is suggestive of some form of selection acting on this gene. However, we note that neither test retains significance after correcting for multiple comparisons over genes. Over all the E(spl)-C genes, dS is highest at m2 (m2 dS = 0.145, whereas for the other 11 genes, 0.035 < dS < 0.130), although the dN/dS ratio at m2 is within the range observed for the other genes. Because m2 does not markedly differ in codon usage from the other genes in any of the tested species (data not shown), the significant MK test at m2 does not appear to be related to a change in optimal codon usage at this gene.
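As an illustration of the kind of G-test applied to such 2 × 2 tables of counts, the Python sketch below computes the G statistic for independence and its P value from the chi-square distribution with one degree of freedom. It is a generic sketch, not the custom R routine used for the analyses; the function name is hypothetical, no continuity correction is applied (whether one was used is not stated in the text), and the counts in the usage note are invented for the example.

```python
import math

def g_test_2x2(table):
    """G-test of independence for a 2x2 table [[a, b], [c, d]].

    Returns (G, P), with P taken from the chi-square distribution (1 df).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:              # a zero cell contributes nothing to G
                g += observed * math.log(observed / expected)
    g *= 2
    g = max(g, 0.0)                       # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g / 2))       # chi-square survival function for 1 df
    return g, p
```

For an MK-style comparison, the rows could hold polymorphisms and fixed differences and the columns synonymous and nonsynonymous changes, for example `g_test_2x2([[20, 5], [5, 20]])` with made-up counts.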
Table 1 shows that as a class, the 3' UTR motifs exhibit similar, although marginally lower, intraspecific diversity to the remaining, nonmotif, portions of the 3' UTRs, as well as reduced interspecific nucleotide divergence. This same pattern holds when comparing binding sites and nonbinding sites within the upstream enhancer modules. Considering only those annotated regulatory elements conserved between D. melanogaster and D. pseudoobscura, the disparity in both intraspecific and interspecific nucleotide diversity between regulatory and nonregulatory sequence increases, particularly for the enhancer module regions.
To test whether nucleotide diversity differs between regulatory and adjacent nonregulatory sequence, we compared the ratio of the number of segregating sites (table 1, S) to nonsegregating sites (table 1, Length − S) between these regions within D. melanogaster. The test is not significant for the 3' UTR sequences (all motifs, G-test, P = 0.085; conserved motifs, G-test, P = 0.083) but is significant for the conserved enhancer-binding sites (all binding sites, G-test, P = 0.333; conserved binding sites, G-test, P < 10⁻⁴), in the direction of lower diversity in binding sites. This difference could be explained either by selective constraint on enhancer-binding sites or by a lower mutation rate around these regulatory sites.
We adapted the MK and HKA tests (McDonald and Kreitman 1991; Hudson, Kreitman and Aguadé 1987) to test whether the ratio of fixed changes to polymorphisms is the same within regulatory and nonregulatory sequence sharing similar evolutionary history, as expected under neutrality (table 3). For the 3' UTR, there is no significant difference between regulatory and nonregulatory sequence (G-test, P = 0.260), whereas for the enhancer modules, there is a significant difference (G-test, P = 0.004), suggesting some form of selection is acting on the enhancer regions of E(spl)-C genes. Comparing conserved regulatory and nonregulatory sequence in the upstream enhancer modules eliminates the significant G-test (table 3), perhaps because of the low level of nucleotide diversity within conserved binding site sequences.
Using the two tests described above, we also examined whether the enhancer regions of the E(spl)-C genes could be distinguished from intergenic sequence. Nucleotide diversity did not differ significantly between the enhancers and intergenic regions, but there was a significant difference in the ratio of fixed changes to polymorphisms (G-test, P = 0.007). This difference appears to be almost entirely related to the binding sites in the enhancers, as P = 0.098 for intergenic versus enhancer-nonbinding regions, and P = 0.0002 for intergenic versus enhancer-binding regions.
The polymorphism spectrum, as summarized by Tajima's D statistic, also appears to differ between regulatory and nonregulatory DNA: D is positive for 3' UTR motifs, whereas for nonmotifs D is negative, although this comparison may not be reliable because of the very low polymorphism exhibited by 3' UTR motif regions (table 1). The enhancer-module sequences show a negative value of D, with binding-site regions showing considerably lower D than nonbinding-site regions (binding sites, D = −1.029; nonbinding sites, D = −0.345). This difference is increased when comparing conserved binding sites to nonbinding sites. These data suggest that polymorphisms present in, or close to, binding sites for regulatory transcription factors are rarer than those present in the surrounding nonbinding-site regions of the enhancers.
Population Subdivision at E(spl)-C
To examine the pattern of population subdivision at E(spl)-C, we genotyped 117 polymorphisms in samples of D. melanogaster from four continents: Napa Valley (North America), southern France (Europe), Madang (Australia), and Benin (Africa). Ancestral African populations of D. melanogaster are thought to have colonized Europe after the last ice age and more recently been introduced by man to North America from Europe and to Australia from African and European populations (David and Capy 1988).
Figure 3 (lower panel) shows the frequency of polymorphisms in the samples from Southern France, Madang, and Benin, as a deviation from the frequency in Napa Valley. The majority of sites show similar frequencies in the southern France and Napa Valley samples, likely reflecting their recent shared ancestry, whereas a number of sites show large frequency differences in the samples from Benin, and particularly, Madang.
Although the genotyped sites can be separated by the region in which they reside (e.g., exon, UTR, enhancer, and so on), and it is possible to look at differentiation across the different functional regions, the variance in FST within each category is too high for any meaningful interpretation (data not shown).
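To make the FST statistic concrete, the Python sketch below computes it for a single biallelic site from per-population allele frequencies, using Wright's definition based on expected heterozygosity, FST = (HT − HS) / HT. The text does not state which estimator was used for the analyses, so this is an assumption chosen for simplicity; the function name is hypothetical, populations are weighted equally, and no correction for sample size is made.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic site from per-population allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)            # heterozygosity of the pooled population
    if h_total == 0:
        return float("nan")                      # site is monomorphic overall
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total
```

Sites with identical frequencies in every population give FST = 0, and sites fixed for alternative alleles in different populations give FST = 1.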
Haplotype Structure Around E(spl)-C
Our resequencing effort demonstrated the presence of a long haplotype across E(spl)-C: two of the 16 lines were identical at all but one of 1,013 biallelic SNPs and simple InDels. The discrepant site, an A/G polymorphism at position 19662 in the D. melanogaster alignment, exhibits the minor G-allele in one of the two lines showing the long haplotype and the major A-allele in all other lines. Sequencing approximately 700 bp from regions 200 kb upstream, 200 kb downstream, and 500 kb downstream of E(spl)-C, showed that 12/16, 33/33, and 21/33 biallelic polymorphisms, respectively, had the same allele in the two lines with the long haplotype. Hence, the haplotype breaks down 0 to 200 kb upstream, and 200 to 500 kb downstream of E(spl)-C. Because E(spl)-C is present at cytological position 96F9-10, and the breakpoint of the common inversion polymorphism In(3R)Payne is thought to be 96A18-19 (Bridges and Bridges 1938), a distance of approximately 1,300 kb, we were concerned that this inversion may be present in the pair of lines showing the long haplotype. However, cytological analysis of the 15 extant sequenced strains, including the pair showing the long haplotype, revealed no inversions close to the cytological position of E(spl)-C.
We used the haplotype test put forward by Hudson et al. (1994) to determine whether the presence of a pair of sequences differing at just one of the 1,013 polymorphic sites across E(spl)-C deviates from neutral expectation. Using values of the population recombination rate from ρ = 0 to 2000, the P value for the test was 0.528 > P > 0.118, suggesting the observation is consistent with neutrality. Without polymorphism data from the remainder of the region covered by the long haplotype (greater than 200 kb of sequence), we are unable to assess whether the existence of the full haplotype is also consistent with neutral expectation.
Finally, we note that six SNPs defining the sequenced haplotype were genotyped in the larger outbred population samples, yet none of the 226 individuals show genotypes consistent with the existence of this haplotype, and we conclude that the haplotype must be rare.
Indications of Positive Selection at E(spl)-C
We compared the within-species polymorphism (π), the frequency spectrum of observed polymorphisms as summarized by Tajima's D statistic, and the population recombination rate (ρ) per base pair across E(spl)-C. Because the locus is almost 50 kb, encompassing various coding and regulatory regions, a sliding-window approach is likely to be more informative than simply examining the summary statistics for the entire region. Figure 4 shows the pattern of π, D, and ρ across the 47,677-bp D. melanogaster E(spl)-C alignment. Spatial variation in π and D is correlated: π and D were calculated over all 86 independent 500-bp windows of E(spl)-C, showing a Pearson correlation coefficient of r = 0.527 (P < 10⁻⁶) and a 95% confidence interval of 0.355 to 0.665. Recombination across E(spl)-C varies threefold to fourfold, showing three pronounced peaks of recombination at approximately 8.9 kb, approximately 14.2 kb, and approximately 19.1 kb and two marked dips. These two zones of very low recombination, about m1/m2 and m7/m8, correspond to regions of reduced nucleotide diversity and strongly negative D (fig. 4).
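The reported confidence interval is what one obtains from the standard Fisher z-transformation applied to r = 0.527 with 86 windows. The text does not say how the interval was computed, so the Python sketch below is only a plausible reconstruction; the function name is hypothetical.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)                 # Fisher transformation of r
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = pearson_ci(0.527, 86)
print(round(low, 3), round(high, 3))  # 0.355 0.665
```

The interval is asymmetric about r because the transformation is nonlinear; it excludes zero, in keeping with the small P value reported.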
Discussion
Deep Phylogenetic Divergence
Numerous regulatory motifs have been localized to the 3' UTRs of E(spl)-C genes, and many transcription factor-binding sites have been identified in enhancer regions. We show that 134 of the 182 functional elements initially identified in D. melanogaster are conserved in D. pseudoobscura, suggesting that these 134 elements maintain a similar function in the two species. We also observed that flanking sequences were often conserved along with the regulatory element, and this raises two possibilities: genomic context may be an important determinant of the binding efficiency at a given site, and/or the species may be insufficiently distant for all nonfunctional sequences to have diverged. This issue may be resolved using phylogenetic shadowing (Boffelli et al. 2003) at E(spl)-C across several species of Drosophila. Because mutations should accumulate randomly in nonfunctional DNA within each of the species, only truly important regions will be conserved among all of the tested species. This method has been successful in identifying functional regions of the yeast genome (Kellis et al. 2003).
Overall, we were unable to detect 26% of the regulatory motifs identified in D. melanogaster in D. pseudoobscura, and given that many of these elements have not been functionally assayed, the simplest explanation is that these nonconserved elements do not have regulatory function and are, therefore, unconstrained and free to evolve. However, some or all may represent real species differences.
It seems unlikely that the species differ in transcript expression because of differences in the binding-site complement of the upstream enhancer modules, as the m gene enhancer modules from the diverged species D. melanogaster and D. hydei drive nearly identical patterns of gene expression in D. melanogaster larval wing discs (Nellesen, Lai, and Posakony 1999). Another possibility might be that expression patterns across species are maintained despite changes in the DNA sequence of enhancer regions. This hypothesis predicts that known transcription factor-binding sites in D. melanogaster that are not conserved in D. pseudoobscura should be functionally substituted with other (perhaps unrecognized) binding sites to achieve identical transcript expression. This model of functional compensation during enhancer evolution has been used to explain the maintenance of even-skipped stripe 2 embryonic expression in D. melanogaster and D. pseudoobscura, despite few of the D. melanogaster binding sites being conserved in D. pseudoobscura at this locus (Ludwig et al. 2000).
Shallow Divergence and Polymorphism
The E(spl)-C shows a lower level of nucleotide diversity than the average observed for other autosomal loci (see Moriyama and Powell [1996]). It is possible that the entire region is particularly constrained, although this seems unlikely, given that UTR and intergenic sequences (regions thought to experience different levels of evolutionary functional constraint) exhibit similar diversity (table 1). Andolfatto, Depaulis, and Navarro (2001) showed that loci up to 1,000 kb away from an inversion breakpoint can be subject to a reduction in diversity, and because E(spl)-C is positioned approximately 1,300 kb downstream of the breakpoint of the common inversion polymorphism In(3R)Payne, it appears unlikely that the inversion has had a major role in shaping variability at E(spl)-C. Nevertheless, data from Andolfatto, Depaulis, and Navarro (2001) are based only on 10 genes, and the breakpoints of In(3R)Payne have not been precisely mapped. Systematic sequencing of regions various distances from the breakpoint would allow the extent of diversity suppression to be measured more precisely.
We note that any effect of In(3R)Payne on the diversity at E(spl)-C would be limited to D. melanogaster; between-species diversity would be unaffected. Thus, the reduction in divergence between D. melanogaster and D. simulans at E(spl)-C can be used to reject the hypothesis that an inversion influences E(spl)-C diversity within D. melanogaster.
Our observed conservation of regulatory sequences across species suggests that these elements are largely under purifying selection. Using our polymorphism data, we show that in the E(spl)-C enhancer modules, binding sites show much lower levels of diversity than do nonbinding sites, and more of the polymorphisms are rare, suggesting a similar process of purifying selection. This effect is particularly apparent when we consider only those binding sites conserved between D. melanogaster and D. pseudoobscura, suggesting that conserved regulatory binding sites are less likely than nonconserved binding sites to contribute to standing phenotypic variation or microevolution. Therefore, strategies to identify regulatory regions based on sequence conservation across two or more evolutionarily diverged species (Boffelli et al. 2003; Kellis et al. 2003) may in fact be less likely to detect elements influencing complex trait variation within a species.
We also demonstrated, using a test similar to the MK and HKA tests (McDonald and Kreitman 1991; Hudson, Kreitman, and Aguadé 1987), that the ratio of fixed changes to polymorphisms differs between binding sites and nonbinding sites in upstream enhancer regions of E(spl)-C. Phinchongsakuldit, MacArthur, and Brookfield (2003) have previously reported a similar result for the bx-32.8 enhancer of Ubx. Unfortunately, the tests are difficult to interpret, and it is unclear how to polarize any deviation from neutrality. The data from E(spl)-C enhancer regions and from Phinchongsakuldit, MacArthur, and Brookfield (2003) are compatible with too few fixed regulatory sites, too many polymorphic regulatory sites, too few polymorphic nonregulatory sites, or too many fixed nonregulatory sites. These explanations are not mutually exclusive.
The observation of too few fixed changes within transcription factor-binding sites suggests conservation of the binding sites across species. In contrast, a greater number of polymorphisms within binding sites could indicate the maintenance of balanced polymorphisms but could also be explained if mutations in binding sites are slightly deleterious, as they will then contribute to within-species heterozygosity, but are unlikely to become fixed (Nachman et al. 1996). The greater number of rare polymorphisms observed in enhancer-binding sites also provides support for the idea that mutations here are mildly deleterious, as such polymorphisms are less likely to attain high frequency compared with neutral substitutions. An excess of fixed nonregulatory changes is possible if there is selection on nonbinding sites, which could imply that some of the "nonregulatory" portions of the enhancer modules are actually functional.
Local Adaptation
Another way to detect selection is to examine the level of population differentiation. It is known that D. melanogaster is not a panmictic population, as extensive among-population variation has been demonstrated for several loci, including the mitochondrial DNA (Hale and Singh 1991; Begun and Aquadro 1993), but under neutral evolution, all loci are expected to show the same level of differentiation among subpopulations. Natural selection can alter the apparent level of subdivision at variants that are favored in some populations; thus, by examining geographic variation in allele frequency, one can identify targets of local adaptation (e.g., the Duffy blood group locus in humans [Hamblin and Di Rienzo 2000], clinal variation at the Adh locus in D. melanogaster [Berry and Kreitman 1993]). We have shown that across the E(spl)-C locus, there is variation in the degree of observed population structure (fig. 3). In particular, regions of elevated FST around the genes m and m6 are possibly indicative of functional population differentiation, and because some of the sites are in noncoding regions, may show differential regulatory activity across populations. However, in general, sites in regulatory regions did not show a level of population differentiation different from sites in other nonregulatory regions.
We note that the level of population differentiation can also be elevated by background selection against deleterious alleles, as demonstrated by Nordborg (1997) using coalescent simulation. However, in terms of our goal of identifying regions under selection, detecting either background selection or diversifying selection implicates a region as having function.
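To make the differentiation measure concrete, the sketch below computes Wright's FST for a single biallelic site from subpopulation allele frequencies, as the proportional reduction in expected heterozygosity within subpopulations relative to the pooled population. This is a simplified, equal-weight illustration with a hypothetical function name and made-up frequencies; it is not the Weir and Cockerham (1984) estimator, which also corrects for sample size.

```python
def fst_biallelic(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic site.

    freqs: frequency of one allele in each subpopulation.
    Subpopulations are weighted equally (a simplifying assumption).
    """
    if not freqs:
        raise ValueError("at least one subpopulation frequency is required")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        # Site is monomorphic everywhere; F_ST is undefined, and this
        # sketch returns 0.0 as a convention (an assumption).
        return 0.0
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in three populations (illustration only).
print(fst_biallelic([0.10, 0.15, 0.80]))
```

Applied site by site or in windows across a locus, values well above the locus-wide background are the kind of signal interpreted above as possible local adaptation.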
Within-Population Selection
Two regions (around m1/m2 and m7/m8 [fig. 4]) exhibit patterns (high LD, low diversity, and a skewed polymorphism-frequency distribution) indicative of past positive selection (Kim and Stephan 2000; Andolfatto and Przeworski 2001). The observation of a significant excess of fixed synonymous mutations at m2 is also consistent with a scenario of past selection in or around this gene. However, the evidence is insufficient to lend strong support to a hypothesis of positive selection. The two putatively selected regions have negative values of Tajima's D, but these are not significant after correction for multiple testing, and although nucleotide diversity is reduced in these zones, other regions also exhibit low levels of heterozygosity. These difficulties highlight a concern that as we collect population-genetic data sets encompassing very large genomic regions, effects will need to be much more pronounced to be found significant using sliding-window-type approaches.
There are several other reasons why selective sweeps are difficult to detect. The power of Tajima's D statistic to detect a selective sweep is strongly dependent not only on the number of sequenced alleles but also on the selective strength of the event and on the number of generations since it occurred (Simonsen, Churchill, and Aquadro 1995). Old sweeps will be obscured by the accumulation of neutral mutations, whereas weak sweeps will reduce heterozygosity less efficiently. Thus, it is possible that the two putative cases of hitchhiking we outline represent selection of weakly advantageous mutations or are perhaps very distant events and, hence, do not achieve significance. Also, as demonstrated by Kim and Stephan (2002), individual realizations of a simulated selective sweep vary in the size of the area of reduced heterozygosity, the extent of the reduction, and the position of the valley relative to the selected polymorphism.
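For reference, Tajima's D (Tajima 1989) compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum. The sketch below follows the standard published formula; the function name and toy alignments are hypothetical, and treating every character (including any gap symbol) as an allele is a simplifying assumption.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is not defined.
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0   # S: number of segregating sites
    pi = 0.0        # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            differing = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing / n_pairs
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignments (illustration only, not data from this study).
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA"]   # all singletons
balanced = ["AA", "AA", "TT", "TT"]                # intermediate frequency
print(tajimas_d(rare_variants), tajimas_d(balanced))
```

An excess of rare variants, as expected after a recent sweep, pulls D negative, whereas intermediate-frequency variants push it positive. Because the statistic depends on sample size and on the number of segregating sites in each window, short windows with few sites have limited power, which is consistent with the caveats discussed above.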
However, other models can account for aspects of the data we observe without implicating a selective sweep. Background, or purifying, selection also eliminates variation (Charlesworth, Morgan, and Charlesworth 1993), although this model does not predict a skew in the polymorphism frequency spectrum (Kim and Stephan 2000; Andolfatto and Przeworski 2001).
Conclusion
We have described nucleotide diversity across different functional regions of the bristle number candidate locus E(spl)-C, showing that previously identified regulatory elements are visible to selection. We also highlight other regions exhibiting signatures of nonneutral evolution, implying they are also of functional importance in regulating E(spl)-C genes. Because E(spl)-C is likely to have a role in the genetic control of natural variation in bristle number (Dilda and Mackay 2002; Long et al. 1995; Norga et al. 2003; Nuzhdin, Dilda, and Mackay 1999), a trait under stabilizing selection (García-Dorado and González 1996), sites within these candidate functional regions are more likely to be QTN for bristle number than are sites in regions showing neutral evolution. Regions showing no departure from neutrality may still harbor functional sites, but natural selection has not acted in a detectable manner on these regions in the recent past.
In general, enriching association-mapping studies for sites more likely to contribute to phenotypic variation will streamline the process of detecting genetic variants underlying natural variation in complex traits. We suggest that together with other motivating factors, selecting sites for genotyping in association studies should be informed by the results of sequence analysis methods that detect the action of natural selection.
References
Andolfatto, P., F. Depaulis, and A. Navarro. 2001. Inversion polymorphisms and nucleotide variability in Drosophila. Genet. Res. 77:1–8.
Andolfatto, P., and M. Przeworski. 2001. Regions of lower crossing over harbor more rare variants in African populations of Drosophila melanogaster. Genetics 158:657–665.
Ardell, D. H. 2004. SCANMS: adjusting for multiple comparisons in sliding window neutrality tests. Bioinformatics 20:1986–1988.
Bailey, A. M., and J. W. Posakony. 1995. Suppressor of Hairless directly activates transcription of Enhancer of split complex genes in response to Notch receptor activity. Genes Dev. 9:2609–2622.
Begun, D. J., and C. F. Aquadro. 1993. African and North American populations of Drosophila melanogaster are very different at the DNA level. Nature 365:548–550.
Begun, D. J., and P. Whitley. 2000. Reduced X-linked nucleotide polymorphism in Drosophila simulans. Proc. Natl. Acad. Sci. USA 97:5960–5965.
Berry, A., and M. Kreitman. 1993. Molecular analysis of an allozyme cline: alcohol dehydrogenase in Drosophila melanogaster on the East Coast of North America. Genetics 134:869–893.
Boffelli, D., J. McAuliffe, D. Ovcharenko, K. D. Lewis, I. Ovcharenko, L. Pachter, and E. M. Rubin. 2003. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299:1391–1394.
Braverman, J. M., R. R. Hudson, N. L. Kaplan, C. H. Langley, and W. Stephan. 1995. The hitchhiking effect on the site frequency spectrum of DNA polymorphism. Genetics 140:783–796.
Bridges, C. B., and P. N. Bridges. 1938. Salivary analysis of inversion-3R-Payne in the "venation" stock of Drosophila melanogaster. Genetics 23:111–114.
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
Cooper, M. T. D., D. M. Tyler, M. Furriols, A. Chalkiadaki, C. Delidakis, and S. Bray. 2000. Spatially restricted factors cooperate with Notch in the regulation of Enhancer of split genes. Dev. Biol. 221:390–403.
David, J. R., and P. Capy. 1988. Genetic variation of Drosophila melanogaster natural populations. Trends Genet. 4:106–111.
de Celis, J. F., J. de Celis, P. Ligoxygakis, A. Preiss, C. Delidakis, and S. Bray. 1996. Functional relationships between Notch, Su(H) and the bHLH genes of the E(spl) complex: the E(spl) genes mediate only a subset of Notch activities during imaginal development. Development 122:2719–2728.
Delidakis, C., and S. Artavanis-Tsakonas. 1992. The Enhancer of split [E(spl)] locus of Drosophila encodes seven independent helix-loop-helix proteins. Proc. Natl. Acad. Sci. USA 89:8731–8735.
Dilda, C. L., and T. F. C. Mackay. 2002. The genetic architecture of Drosophila sensory bristle number. Genetics 162:1655–1674.
Eastman, D. S., R. Slee, E. Skoufos, L. Bangalore, S. Bray, and C. Delidakis. 1997. Synergy between Suppressor of Hairless and Notch in regulation of Enhancer of split mγ and mδ expression. Mol. Cell. Biol. 17:5620–5628.
Fay, J. C., and C-I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413.
García-Dorado, A., and J. A. González. 1996. Stabilizing selection detected for bristle number in Drosophila melanogaster. Evolution 50:1573–1578.
Genissel, A., T. Pastinen, A. Dowell, T. F. C. Mackay, and A. D. Long. 2004. No evidence for an association between common nonsynonymous polymorphisms in Delta and bristle number variation in natural and laboratory populations of Drosophila melanogaster. Genetics 166:291–306.
Hale, L. R., and R. S. Singh. 1991. A comprehensive study of genic variation in natural populations of Drosophila melanogaster. IV. Mitochondrial DNA variation and the role of history vs. selection in the genetic structure of geographic populations. Genetics 129:103–117.
Hamblin, M. T., and A. Di Rienzo. 2000. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669–1679.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1–44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Oxford University Press, Oxford.
Hudson, R. R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338.
Hudson, R. R., K. Bailey, D. Skarecky, J. Kwiatowski, and F. J. Ayala. 1994. Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136:1329–1340.
Hudson, R. R., M. Kreitman, and M. Aguadé. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
Jan, Y. N., and L. Y. Jan. 1994. Genetic control of cell fate specification in Drosophila peripheral nervous system. Annu. Rev. Genet. 28:373–393.
Jennings, B., A. Preiss, C. Delidakis, and S. Bray. 1994. The Notch signaling pathway is required for Enhancer of split bHLH protein expression during neurogenesis in the Drosophila embryo. Development 120:3537–3548.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–120 in H. M. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kaplan, N. L., R. R. Hudson, and C. H. Langley. 1989. The "hitchhiking effect" revisited. Genetics 123:887–899.
Kellis, M., N. Patterson, M. Endrizzi, B. Birren, and E. S. Lander. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254.
Kim, Y., and W. Stephan. 2000. Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415–1427.
Kim, Y., and W. Stephan. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765–777.
Knust, E., H. Schrons, F. Grawe, and J. A. Campos-Ortega. 1992. Seven genes of the Enhancer of split complex of Drosophila melanogaster encode helix-loop-helix proteins. Genetics 132:505–518.
Kramatschek, B., and J. A. Campos-Ortega. 1994. Neuroectodermal transcription of the Drosophila neurogenic genes E(spl) and HLH-m5 is regulated by proneural genes. Development 120:815–826.
Lai, E. C. 2002. Micro RNAs are complementary to 3' UTR sequence motifs that mediate negative post-transcriptional regulation. Nat. Genet. 30:363–364.
Lai, E. C., R. Bodner, J. Kavaler, G. Freschi, and J. W. Posakony. 2000. Antagonism of Notch signaling activity by members of a novel protein family encoded by the Bearded and Enhancer of split gene complexes. Development 127:291–306.
Lai, E. C., R. Bodner, and J. W. Posakony. 2000. The Enhancer of split complex of Drosophila includes four Notch-regulated members of the Bearded gene family. Development 127:3441–3455.
Lai, E. C., C. Burks, and J. W. Posakony. 1998. The K box, a conserved 3' UTR sequence motif, negatively regulates accumulation of Enhancer of split complex transcripts. Development 125:4077–4088.
Lai, E. C., and J. W. Posakony. 1997. The Bearded box, a novel 3' UTR sequence motif, mediates negative post-transcriptional regulation of Bearded and Enhancer of split complex gene expression. Development 124:4847–4856.
Lai, E. C., and J. W. Posakony. 1998. Regulation of Drosophila neurogenesis by RNA:RNA duplexes? Cell 93:1103–1104.
Leviten, M. W., E. C. Lai, and J. W. Posakony. 1997. The Drosophila gene Bearded encodes a novel small protein and shares 3' UTR sequence motifs with multiple Enhancer of split complex genes. Development 124:4039–4051.
Long, A. D., S. L. Mullaney, L. A. Reid, J. D. Fry, C. H. Langley, and T. F. C. Mackay. 1995. High resolution mapping of genetic factors affecting bristle number in Drosophila melanogaster. Genetics 139:1273–1291.
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564–567.
Mackay, T. F. C. 1995. The genetic basis of quantitative variation: numbers of sensory bristles of Drosophila melanogaster as a model system. Trends Genet. 11:464–470.
Maier, D., B. M. Marte, W. Schäfer, Y. Yu, and A. Preiss. 1993. Drosophila evolution challenges postulated redundancy in the E(spl) gene complex. Proc. Natl. Acad. Sci. USA 90:5464–5468.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
Moriyama, E. N., and J. R. Powell. 1996. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261–277.
Nachman, M. W., W. M. Brown, M. Stoneking, and C. F. Aquadro. 1996. Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142:953–963.
Nellesen, D. T., E. C. Lai, and J. W. Posakony. 1999. Discrete enhancer elements mediate selective responsiveness of Enhancer of split complex genes to common transcriptional activators. Dev. Biol. 213:33–53.
Nordborg, M. 1997. Structured coalescent processes on different time scales. Genetics 146:1501–1514.
Norga, K. K., M. C. Gurganus, C. L. Dilda, A. Yamamoto, R. F. Lyman, P. H. Patel, G. M. Rubin, R. A. Hoskins, T. F. C. Mackay, and H. J. Bellen. 2003. Quantitative analysis of bristle number in Drosophila mutants identifies genes involved in neural development. Curr. Biol. 13:1388–1397.
Nuzhdin, S. V., C. L. Dilda, and T. F. C. Mackay. 1999. The genetic architecture of selection response: inferences from fine-scale mapping of bristle number quantitative trait loci in Drosophila melanogaster. Genetics 153:1317–1331.
Phinchongsakuldit, J., S. MacArthur, and J. F. Y. Brookfield. 2004. Evolution of developmental genes: molecular microevolution of enhancer sequences at the Ubx locus in Drosophila and its impact on developmental phenotypes. Mol. Biol. Evol. 21:348–363.
Rieder, M. J., S. L. Taylor, V. O. Tobe, and D. A. Nickerson. 1998. Automating the identification of DNA variations using quality-based fluorescence re-sequencing: analysis of the human mitochondrial genome. Nucleic Acids Res. 26:967–973.
Robin, C., R. F. Lyman, A. D. Long, C. H. Langley, and T. F. C. Mackay. 2002. hairy: a quantitative trait locus for Drosophila sensory bristle number. Genetics 162:155–164.
Rozas, J., J. C. Sánchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP: DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497.
Russo, C. A., N. Takezaki, and M. Nei. 1995. Molecular phylogeny and divergence times of drosophilid species. Mol. Biol. Evol. 12:391–404.
Simonsen, K. L., G. A. Churchill, and C. F. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413–429.
Singson, A., M. W. Leviten, A. G. Bang, X. H. Hua, and J. W. Posakony. 1994. Direct downstream targets of proneural activators in the imaginal disc include genes involved in lateral inhibitory signaling. Genes Dev. 8:2058–2071.
Stern, D. L. 1998. A role of Ultrabithorax in morphological differences between Drosophila species. Nature 396:463–466.
Stern, D. L. 2000. Evolutionary developmental biology and the problem of variation. Evolution 54:1079–1091.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Tietze, K., N. Oellers, and E. Knust. 1992. Enhancer of splitD, a dominant mutation of Drosophila, and its use in the study of functional domains of a helix-loop-helix protein. Proc. Natl. Acad. Sci. USA 89:6152–6156.
Wall, J. D., L. A. Frisse, R. R. Hudson, and A. Di Rienzo. 2003. Comparative linkage-disequilibrium analysis of the β-globin hotspot in primates. Am. J. Hum. Genet. 73:1330–1340.
Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.
Wittkopp, P. J., K. Vaccaro, and S. B. Carroll. 2002. Evolution of yellow gene regulation and pigmentation in Drosophila. Curr. Biol. 12:1547–1556.
Wray, G. A., M. W. Hahn, E. Abouheif, J. P. Balhoff, M. Pizer, M. V. Rockman, and L. A. Romano. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20:1377–1419.
Wurmbach, E., I. Wech, and A. Preiss. 1999. The Enhancer of split complex of Drosophila melanogaster harbors three classes of Notch responsive genes. Mech. Dev. 80:171–180.