DNA Sequence Variation at a Duplicated Gene: Excess of Replacement Polymorphism and Extensive Haplotype Structure in the Drosophila melanogaster bicoid Region

John F. Baines*,1, Ying Chen*{dagger},1, Aparup Das* and Wolfgang Stephan*

*Department Biologie II—Evolutionsbiologie, Universität München, Germany;
{dagger}Department of Biology, University of Rochester, New York


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
The bicoid (bcd) gene of Drosophila has played an important role in understanding the system of developmental regulatory genes that controls segmentation in the fruit fly. Several studies in Drosophila and closely related insects suggest that bcd may be the result of a gene duplication in the Dipteran lineage. In addition, the presence of a large, conserved secondary structure in the 3' untranslated region (UTR) makes the bcd gene a good candidate for studying compensatory evolution and the relationship between RNA secondary structure and patterns of standing variation in natural populations. Despite these interesting aspects, a population-level analysis has until now not been performed on bcd. In this study, DNA sequence variation was examined for a 4-kb region of the bcd gene, including a portion of the 5' UTR, the entire coding region, and the 3' UTR, for 25 Drosophila melanogaster isofemale lines from Zimbabwe and one allele from D. simulans. Statistical tests revealed a significant excess of replacement polymorphisms in the D. melanogaster lineage that are clustered in two putative linker regions of the Bicoid protein. This result is consistent with a relaxation of selective constraints in these regions. In addition, we found a distinct haplotype structure and a significantly smaller number of haplotypes than predicted by the standard neutral model. It is unlikely that the haplotype structure is maintained by epistatic selection acting on the secondary structure in the 3' UTR or by the association of the bcd gene with polymorphic inversions. Instead, our two main observations, namely the occurrence of a haplotype structure and the excess of replacement polymorphisms, may indicate that the selective history of this gene is rather complex, involving both the relaxation of purifying selection in some parts of the protein and the action of positive selection in other parts of the gene region.


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
The bicoid (bcd) gene of Drosophila has played an important role in understanding the system of developmental genes that regulate pattern formation in the early fly embryo. bcd mRNA is maternally transcribed and localized to the anterior pole of the embryo (Berleth et al. 1988Citation ). The cis-acting sequences necessary for mRNA localization fall within a large (~700 nucleotides), phylogenetically conserved, and well-characterized secondary structural element in the 3' untranslated region (UTR) (Macdonald and Struhl 1988Citation ; Macdonald 1990Citation ). Translation of this localized transcript gives rise to a gradient of Bicoid protein that controls development of the head and thorax (Frohnhöfer, Lehmann, and Nüsslein-Volhard 1986Citation ; Driever and Nüsslein-Volhard 1988Citation ). This is achieved by two important functions of Bicoid. First, it transcriptionally activates zygotic segmentation genes in a threshold-dependent fashion. Second, it functions as a translational repressor of uniformly distributed caudal mRNA by binding to its 3' end, thus resulting in a gradient of Caudal protein in the opposite direction (Dubnau and Struhl 1996Citation ).

In addition to its crucial role in the determination of anterior-posterior polarity in the fruit fly, two aspects of bcd make it an interesting candidate for evolutionary analysis. First, recent studies in Drosophila and closely related insects suggest that bcd, a derived Hox3 homologue, may be unique in insect developmental systems in terms of function and evolutionary history (Stauber, Jäckle, and Schmidt-Ott 1999Citation ). Numerous laboratories have consistently failed in attempts to isolate bcd from insects other than Cyclorrhaphan flies, despite the usual ease in cloning homologues of other developmental genes from distantly related species (Stauber, Jäckle, and Schmidt-Ott 1999Citation ). One possibility is that bcd is a rapidly evolving homeobox gene, and failure to clone it is the result of technical difficulties (Schröder and Sander 1993Citation ; Patel 2000Citation ). However, when bcd was cloned from a basal Cyclorrhaphan fly (Megaselia abdita), it was found to be most closely related to the Megaselia zerknüllt (zen) gene, suggesting that bcd may be the result of a gene duplication and diversification, leading to a novel regulatory protein (Stauber, Jäckle, and Schmidt-Ott 1999Citation ). Most recently, a single Hox3 homologue (more similar to D. melanogaster zen than bcd) having expression patterns characteristic of both zen and bcd was identified in three non-Cyclorrhaphan flies (Stauber, Prell, and Schmidt-Ott 2002Citation ). Finally, Schaeffer et al. (2000)Citation found functional redundancy between Bicoid and the terminal system's role in thorax development, supporting the recent evolution of an anterior morphogenetic center comprised of both Bicoid and the terminal system. The second interesting feature of bcd is the presence of a large, conserved secondary structure in the 3' UTR. This makes the bcd gene a good candidate for studying compensatory evolution and the relationship between RNA secondary structure and patterns of standing variation in natural populations (Chen et al. 1999Citation ).

Despite these intriguing observations, a population-level analysis has until now not been performed on bcd. In this study, DNA sequence variation was examined for a 4-kb region of the bcd gene, including a portion of the 5' UTR, the entire coding region, and the 3' UTR, for 25 D. melanogaster isofemale lines from Lake Kariba, Zimbabwe. This population was chosen because it has previously been shown to harbor more than twice the amount of genetic variation and lower levels of linkage disequilibrium than non-African populations of D. melanogaster (Begun and Aquadro 1993Citation , 1995a,Citation 1995bCitation ). Thus, Zimbabwe likely represents an ancestral population closer to mutation-drift equilibrium, enabling the selective forces determining DNA sequence variation to be more easily elucidated (David and Capy 1988Citation ; Begun and Aquadro 1995bCitation ). The goals of this study are (1) to test whether there is evidence for natural selection attributable to the relatively recent origin of bcd in the Dipteran lineage, and (2) to test whether the pattern of variation in the 3' UTR is consistent with the presence of a large conserved mRNA secondary structure.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
Drosophila Strains
A total of 25 D. melanogaster isofemale lines collected from Lake Kariba, Zimbabwe (kindly provided by C. Aquadro) were used in this survey. To isolate individual bcd alleles, a male from each of the isofemale lines was crossed to virgin females from a Df(3R)MAP117 pp/TM3, Sb1 line (obtained from the Bloomington Fly Stock Center), which contains a deleted bcd region. A single male offspring with wild-type bristles was again crossed to Df(3R)MAP117 pp/TM3, Sb1 virgin females, and offsprings with wild-type bristles were used for genomic DNA extraction. For the interspecific comparison, one D. simulans stock collected in Davis, California was used (kindly provided by H. A. Orr). All flies were raised and crossed at room temperature and on standard media.

DNA Extraction, PCR Amplification, and Direct Sequencing of the bcd Alleles
Genomic DNA was extracted from homozygous whole flies with the DNeasy tissue kit (Qiagen). Oligonucleotides for amplification and direct sequencing were designed based on a previously published D. melanogaster bcd sequence (GenBank accession number X07870). These primers were used in PCR reactions to amplify a 4-kb region of bcd, comprising 450 bp of 5' flanking region, the entire coding region, and 1 kb of 3' flanking region. PCR products were purified with QIAquick columns (Qiagen), and both strands were subsequently sequenced using primers spaced ~400 bp apart. Sequencing was performed on an ABI377 automated sequencer with the Dye Terminator chemistry (Perkin-Elmer). The homologous region of D. simulans was amplified using the PCR primers designed for D. melanogaster, and new D. simulans primers were designed based on the available D. simulans sequence and used if the D. melanogaster primers failed in the sequencing reactions because of mismatches. The D. melanogaster sequences are deposited in GenBank as a population set with accession numbers AF466621–45, and the accession number of the D. simulans sequence is AF465792. The coordinates according to the reference sequence (GenBank accession X07870) are used throughout this paper.

Sequence Analysis
Sequences were assembled and aligned with the SeqEd program (Perkin-Elmer), and all variable sites were checked manually and verified in both strands. The bcd gene of D. melanogaster is alternatively spliced at intron 2 (positions 2216–2270, 55 bp; or positions 2216–2255, 40 bp). Because smaller introns (<51 bp) are usually spliced less efficiently (Mount et al. 1992Citation ), the assignment of coding and noncoding regions are according to the major transcript; i.e., positions 2256–2270 are regarded as noncoding region. The homologous region of D. simulans was aligned to the D. melanogaster bcd sequence, and gaps in the alignment were not used in the sequence analysis. After the sequence alignment, the coding and noncoding regions of D. simulans bcd were assigned according to the D. melanogaster sequence. The DnaSP program version 3.50 (Rozas J and Rozas R 1999Citation ) was used for most intraspecific and interspecific analyses. Nucleotide diversity, {theta}, was estimated according to Watterson (1975)Citation and {pi} according to Nei (1987, p. 256)Citation . Nucleotide divergence, {kappa}, between D. melanogaster and both D. simulans and D. pseudoobscura (GenBank accession number X55735) was estimated according to Nei (1987, p. 65)Citation .

The following neutrality tests were performed using the program DnaSP (Rozas J and Rozas R 1999Citation ): The HKA test (Hudson, Kreitman, and Aguadé 1987Citation ), Tajima's (1989)Citation test, and the McDonald and Kreitman (1991)Citation test. The probabilities for the McDonald and Kreitman (1991)Citation test were obtained by both the two-tailed Fisher's exact test and the G-test. To detect heterogeneity in the ratio of polymorphism to divergence in the region surveyed, the program DNA slider (McDonald 1996Citation , 1998Citation ) was used.

Coalescent simulations for obtaining the probabilities of the number of haplotypes and haplotype diversity (Depaulis and Veuille 1998Citation ) were performed using the program DnaSP (Rozas J and Rozas R 1999Citation ), and the haplotype test of Hudson et al. (1994)Citation was performed using a program written by J. Braverman (kindly provided by J. Parsch). The test of Hudson et al. (1994)Citation determines the probability of observing a subset of alleles of size i with j or fewer segregating sites, given an overall sample of n alleles with S segregating sites. The recombination parameter, R, was estimated by three methods, including two based on polymorphism data (Hudson 1987Citation ; Hey and Wakeley 1997Citation ) and one determined from experimental laboratory crosses (Comeron, Kreitman, and Aguadé 1999Citation ). The program of Comeron, Kreitman, and Aguadé (1999)Citation (kindly provided by J. Comeron) estimates recombination rates in D. melanogaster based on cytological map position using polynomial curves (Kliman and Hey 1993Citation ) as a function of the amount of DNA in each chromosomal division versus the change in cytological map position (Sorsa 1988Citation ). For the coalescent simulations, 10,000 replicates were performed for each test.

Inversion Analysis
Larvae were grown in yeast-rich corn meal-sugar media at 18°C without larval crowding. Late third instar larvae were dissected in a drop of insect Ringer's solution. Both salivary glands were placed under a coverslip with a drop of lacto-aceto-orcein stain. After 5 min, the glands were squashed, and polytene chromosomes were analyzed under a phase contrast microscope at 100x. Inversion break points were determined according to the photographic map of Lefevre (1976)Citation .


    Results
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
A total of 25 D. melanogaster bcd alleles derived from isofemale lines from Lake Kariba, Zimbabwe, and one D. simulans allele were sequenced for a 4,019-bp region (coordinates 971–4,990, according to GenBank accession X07870; position 3599 of the reference sequence is deleted in our 25 lines), spanning the 3' portion of the 5' UTR and nearly the entire transcript length. A total of 40 nucleotide and 5-length polymorphisms were detected in the 13 D. melanogaster haplotypes found in this sample (fig. 1 ). Of the 40 nucleotide polymorphisms, 7 occurred in the coding region and 33 in noncoding regions (5' flanking region, intron 1, intron 3, 3' UTR, and 3' flanking region). Interestingly, out of the seven polymorphic sites in the coding region, one was synonymous compared with six nonsynonymous polymorphisms. A large-length polymorphism (a 7-bp insertion [AGGGAAG] followed by a perfect 138-bp duplication of positions 3427–3564 at position 3565) in intron 3 was found in four alleles. A second complex change in intron 1 is only found in one allele and involves multiple indels: ATATGAATTGTGGGGCAAC in the reference sequence at positions 1828–1846 are replaced by TACGTTATTGTTATAATTGTTAA in line 377. Interspecific comparison with D. simulans revealed 32 synonymous and 6 nonsynonymous differences in the coding region, with divergence at silent and noncoding sites falling within the range of several previously surveyed genes (Eanes, Kirchner, and Yoon 1993Citation ).



View larger version (28K):
[in this window]
[in a new window]
 
Fig. 1.—DNA polymorphisms of the bcd gene region found in 25 lines of D. melanogaster. The line number of each strain is indicated on the far left side, and the halotype classes they belong to are shown as H1 to H4. The nucleotides of the consensus sequence are shown along the top. The coordinates above the sequence represent the positions of each segregating site according to the reference sequence (GenBank accession X07870). The five length polymorphisms are noted as follows: a, deletion of G at position 1141; b, deletion of GTA at positions 1811–1813; c, complex changes (multiple indels) at positions 1828–1846; d, T4 in place of T3 at positions 2133–2135; e, a 7-nt insertion (AGGGAAG) followed by a perfect 138 nt duplication from 3427 to 3564 at position 3565

 
Silent Polymorphism and Divergence
Estimates of {pi} (Nei 1987, p. 256Citation ) and {theta} (Watterson 1975Citation ) are based on the number of equivalent silent sites according to Rozas J and Rozas R (1999)Citation (table 1 ). The nucleotide diversity for noncoding regions ({theta} = 0.0035, {pi} = 0.0038) is lower than average values for D. melanogaster (~0.011, Moriyama and Powell 1996Citation ), probably because of the lower than average recombination rate of the bcd region. In particular, the level of polymorphism at synonymous sites in the coding region ({theta} = 0.0008, {pi} = 0.0002) was substantially lower than the average values of {theta} = 0.014 and {pi} = 0.013 (Moriyama and Powell 1996Citation ). On the other hand, the estimates of nucleotide diversity are two- to threefold higher than the values found in regions of restricted recombination rates at the tip and base of the X chromosome in a Zimbabwe population (Begun and Aquadro 1995aCitation ). Divergence estimates for both introns ({kappa} = 0.102) and the coding region ({kappa} = 0.094) are typical for D. melanogaster and D. simulans (Moriyama and Powell 1996Citation ), whereas the 3' UTR, which contains an important secondary structural element involved in mRNA localization (Macdonald 1990Citation ), was more conserved ({kappa} = 0.040).


View this table:
[in this window]
[in a new window]
 
Table 1 Polymorphism and Divergence in the bcd gene

 
Two tests of neutrality were performed on the silent polymorphism of this data set. First, Tajima's (1989)Citation test was applied to test whether the frequency spectrum of silent polymorphism significantly deviates from the neutral expectation. The value of the D statistic (D = -0.073) did not significantly deviate from zero; thus, the null hypothesis that the silent polymorphisms are selectively neutral cannot be rejected.

A second test of neutrality, the HKA test (Hudson, Kreitman, and Aguadé 1987Citation ), examines the prediction of the neutral mutation hypothesis that levels of intraspecific polymorphism are positively correlated with levels of interspecific divergence. We tested six possible pairwise comparisons using 5' UTR, intron 1, intron 3, and the 3' UTR. Though none of these tests were significant, comparisons between the 5' and 3' UTR (P = 0.065) and between intron 3 and the 3' UTR (P = 0.053) approach significance (table 2 ). Comparisons between both the 5' UTR and intron 3 with intron 1 also give small P values of around 0.1. These may suggest lower levels of polymorphism in 5' UTR and intron 3 when compared with the two other reference noncoding loci and heterogeneity in the ratio of polymorphism to divergence along the bcd DNA sequence. However, the HKA test was applied somewhat post hoc, so the results should be interpreted with caution.


View this table:
[in this window]
[in a new window]
 
Table 2 Pairwise HKA Tests Between Various Regions of bcd

 
Replacement Polymorphism and Divergence
A surprising observation was that six out of seven polymorphic sites in the coding region are replacement polymorphisms. Under the hypothesis of neutral protein evolution, the ratio of replacement to synonymous fixed differences between species should be the same as the ratio of replacement to synonymous polymorphisms within species (McDonald and Kreitman 1991Citation ). The comparison between D. melanogaster and D. simulans revealed 32 synonymous and six replacement fixed differences. Using the statistical test developed by McDonald and Kreitman (1991)Citation , we found a highly significant difference between these two ratios (P = 0.0007 by Fisher's exact test; see table 3 ). This is not expected under the standard neutral model and may be interpreted as either an excess of replacement or a deficiency of synonymous polymorphism within the D. melanogaster population. To distinguish between these two possibilities, we used the method of Tavaré (1984)Citation and Hudson (1990)Citation . Our observation of one synonymous polymorphism is not inconsistent with a value of {theta} = 0.0033 for the coding region (P = 0.115). This suggests that if our estimate of {theta} = 0.0033 obtained for the entire bcd region is representative for the coding region as well, the significant result of the McDonald-Kreitman test is mainly caused by an excess of replacement polymor-phism.


View this table:
[in this window]
[in a new window]
 
Table 3 Result of the McDonald–Kreitman test

 
To further investigate the forces determining the population dynamics of these polymorphisms, we analyzed their frequency spectrum. All six replacement polymorphisms are at low frequency (<=4/25), three of which are singletons. Though Tajima's (1989)Citation test applied specifically to these six replacement polymorphisms did not significantly deviate from zero (D = -1.184), its negative value is indicative of an excess of low-frequency variants, consistent with the hypothesis that they are slightly deleterious.

Another noteworthy observation is the distribution of both replacement fixed differences and polymorphisms along the Bicoid protein (fig. 2 ). All six replacement polymorphisms and five of six replacement fixed differences cluster within one of two regions of the protein. The first region (amino acids 249–332, corresponding to positions 2765–3025 of the reference sequence) contains a cluster of three polymorphisms and four fixed differences and overlaps opa-like repeats and portions of the protein with no known function (i.e., linker or hinge regions) (Seeger and Kaufman 1990Citation ). The second region (amino acids 433–455, corresponding to positions 3839–3907 of the reference sequence) contains a cluster of three polymorphisms and one fixed difference and overlaps a region that may contain an RNA recognition motif (Seeger and Kaufman 1990Citation ). In an interspecific comparison of the D. melanogaster sequence with D. pseudoobscura, a sliding window analysis of divergence at replacement sites revealed a peak of divergence in this region (fig. 3 ), and the ratio of replacement to silent substitutions, {kappa}a/{kappa}s, was nearly four times greater ({kappa}a/{kappa}s = 0.554) than the value for the entire coding region ({kappa}a/{kappa}s = 0.139).



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 2.—Sliding window plot of silent nucleotide diversity ({pi}) within D. melanogaster and silent nucleotide divergence ({kappa}) between D. melanogaster and D. simulans. Note that different scales for {pi} and {kappa} are used. The size of the sliding window is 100 silent sites, and step size is 25 sites. The structure of the bcd gene is represented below the plot. Exons are shown as bars, and translated regions are shaded. The vertical lines above the gene structure indicate the positions of amino acid–replacement polymorphisms within D. melanogaster; the vertical lines below the gene structure indicate the positions of fixed amino acid differences between species. The nucleotide numbering is according to the reference sequence (GenBank accession X07870)

 


View larger version (19K):
[in this window]
[in a new window]
 
Fig. 3.—Sliding window plot of the divergence of replacement sites between the D. melanogaster and D. pseudoobscura coding regions. Note that only the coding portion is shown because of difficulties with the alignment of the noncoding regions. The coordinates at the bottom refer to nucleotide positions in the coding region (not to the reference sequence, GenBank accession X07870). The size of the sliding window is 100 nt, and step size is 25 nt

 
Haplotype Structure
On an observation of the 13 haplotypes in our data set, the pattern of linkage disequilibrium easily lends itself to a classification of haplotypes, which we designate H1 through H4 (fig. 1 ). Classes H1, H3, and H4 are monomorphic or show few polymorphisms within class but are distinct from one another at at least six sites. The remaining haplotypes comprise the H2 class. Linkage disequilibrium is extensive and includes nearly the entire 4,019-bp region. Out of a total of 300 pairwise comparisons made between 25 polymorphic sites (only informative sites are considered), 102 are significant by Fisher's exact test, 24 of which are significant after a Bonferroni correction (fig. 4 ); and 119 are significant by the {chi}2 test, with 40 significant after a Bonferroni correction. Some of the most highly significant (P < 0.001 after Bonferroni correction) comparisons are between sites 3-kb apart. Though the average correlation over all pairwise comparisons, ZnS (Kelly 1997Citation ), did not significantly deviate from the neutral expectation, linkage disequilibrium does not decay with distance, as expected under neutrality.



View larger version (66K):
[in this window]
[in a new window]
 
Fig. 4.—Linkage disequilibria between polymorphic sites in the bcd gene from 25 D. melanogaster lines. Only the informative sites were used in Fisher's exact test. Squared, lined, and dotted boxes indicate the significance levels at 0.001, 0.01, and 0.05, respectively. Black boxes indicate significant linkage disequilibria after Bonferroni correction. The comparisons indicated by the white boxes are not significant

 
We applied several statistical tests to determine whether the apparent structuring of haplotypes deviated from the neutral expectation. The recombination parameter, R, was estimated by three methods (see Materials and Methods). The method of Hudson (1987)Citation yielded an estimate of R = 25, the method of Hey and Wakeley (1997)Citation 43, and that of Comeron, Kreitman, and Aguadé (1999)Citation 39. Using the more conservative estimate of R = 25, we first performed coalescent simulations using the computer program DnaSP (Rozas J and Rozas R 1999Citation ). We found a significant reduction in both the number of haplotypes (P = 0.032) and haplotype diversity (P = 0.021) (Depaulis and Veuille 1998Citation ) in the observed data when compared with the results of the coalescent simulations. Significantly low values indicate a structuring of polymorphic sites into a small number of haplotypes (Depaulis and Veuille 1998Citation ). Second, the haplotype test of Hudson et al. (1994)Citation indicates a significantly lower number of segregating sites within haplotype 13 (H4 class) than expected under a neutral equilibrium model (P = 0.039).

Given the pattern of linkage disequilibrium and significant structuring of haplotypes, we further investigated our data in the context of two possible explanations: associations with polymorphic inversions or RNA secondary structure (or both). First, because D. melanogaster is known to be polymorphic for well over 300 inversions (Das and Singh 1991Citation ; Lemeunier and Aulard 1992Citation ), four of which are cosmopolitan and reach appreciable frequencies, we investigated whether certain haplotype classes (i.e., H1 through H4) may be associated with chromosomal inversion types. Though the cosmopolitan inversion In(3R)P (breakpoints 89C to 96A) was found segregating at approximately 14%, there is no association between this inversion and haplotype classes, and no inversions associated with the cytological interval containing bcd (84A5) were detected.

Second, we analyzed linkage disequilibrium with respect to the large, conserved secondary structural element in the 3' UTR (Macdonald 1990Citation ). This structure plays an essential role in the localization of bcd mRNA and has been well characterized by both mutational (Ferrandon et al. 1997Citation ; Macdonald and Kerr 1998Citation ) and phylogenetic analyses (Parsch, Braverman, and Stephan 2000Citation ). However, there seems to be no obvious association between the observed haplotype structure and the predicted mRNA secondary structure in the 3' UTR (further discussed subsequently).


    Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
Main Observations
We surveyed nucleotide variation of the bcd transcriptional unit (from the 3' portion of the 5' UTR to the end of the 3' UTR) of a population of D. melanogaster from Zimbabwe. Our survey produced several salient features that will each be discussed in turn. The first and most striking observation is the significant excess of replacement polymorphism found by the McDonald-Kreitman test. Second, extensive linkage disequilibria across the region and a structuring of variants at polymorphic sites into haplotype classes are observed. Third, and less significant, evidence for heterogeneity in the ratio of polymorphism to divergence ({pi}/{kappa}) along the gene region is detected. In the following, we discuss the possible evolutionary forces underlying these patterns of variation.

Evidence for Relaxed Purifying Selection
A significant excess of replacement polymorphisms is usually interpreted as a relaxation of selective constraints. The effect of purifying selection may be weaker on some amino acid replacement mutations than others, and slightly deleterious polymorphisms may persist at low frequencies within a population for a period of time because of genetic drift but are unlikely to either rise in frequency or become fixed (Kimura 1983, p. 44Citation ; Ohta 1992Citation ). Under this scenario, slightly deleterious mutations contribute more to intraspecific polymorphism than to interspecific fixed differences (Kimura 1983Citation ; Ohta 1992Citation ). Several other studies have reported similar cases of an excess of intraspecific replacement polymorphism in mitochondrial DNA (Ballard and Kreitman 1994Citation ; Nachman, Boyer, and Aquadro 1994Citation ; Nachman et al. 1996Citation ; Rand and Kann 1996Citation ; Wise, Sraml, and Easteal 1998Citation ) and in one case a nuclear gene, Pgm (Verrelli and Eanes 2000Citation , 2001Citation ), and interpretation has ranged from that of slightly deleterious mutations to positive selection.

At least some of our observed amino acid polymorphisms at the bcd locus appear to be slightly deleterious. All the polymorphisms are at relatively low frequency (4%–16%), three of which are singletons; all of them are derived. Three of these polymorphisms also cause drastic changes in amino acid property (table 4 ). It is usually hard to imagine that replacement polymorphisms, especially those that drastically change encoded amino acid properties, are only slightly deleterious. Therefore, we investigated the protein regions where the replacement polymorphisms are found. The three found in exon 3 are located within an opa-like repeat region and regions with no known function (possibly linker or hinge regions in the polypeptide chain) (Seeger and Kaufman 1990Citation ). Therefore, despite the changes in amino acid property, their effect could be minimal because of the functional insignificance of their location. The other three replacement polymorphisms are located in exon 4, within a region containing a putative, but ill-characterized RNA recognition motif (Seeger and Kaufman 1990Citation ). Sliding window analysis of divergence between D. melanogaster and D. pseudoobscura shows that this region overlaps with a peak in both divergence and the replacement to silent substitution ratio ({kappa}a/{kappa}s = 0.554) (fig. 3 ). {kappa}a/{kappa}s ratios are usually kept low by purifying selection ({kappa}a/{kappa}s = 0.139 for the entire coding region). The rise in {kappa}a/{kappa}s ratio may suggest that this region is under less constraint than other parts of the molecule, and the replacement polymorphisms (sites 3839, 3899, and 3905) found here are caused by a relaxation of purifying selection.


View this table:
[in this window]
[in a new window]
 
Table 4 Polymorphic Replacement Changes in the bcd gene

 
It is possible that selection on parts of bcd is relaxed because of its apparent evolutionary history as a duplicated gene. Homologues of bcd have been identified only in Drosophila and some of its close relatives (higher Dipterans). It was originally thought that "rapid" evolution of the homeoprotein led to a subsequent deterioration of homology. However, caudal genes, as well as some other homeobox genes of the Drosophila segmentation gene cascade have been found to be well conserved in evolution (Patel 2000Citation ). This poses the question of how bcd has evolved and adopted its function in anterior patterning. Recent studies suggest that bcd may have originated from a duplication of zen, which is downstream of bcd in the hox gene cluster (Dearden and Akam 1999Citation ; Stauber, Jäckle, and Schmidt-Ott 1999Citation ; Patel 2000Citation ).

If bcd arose through tandem duplication from zen, then it would initially have a function very similar to that of zen. This would allow deleterious mutations to rise in frequency without being eliminated by purifying selection. At the same time, positively selected mutations may also arise, go to fixation, and thus give bcd new functions. However, the functions of bcd and zen may still be similar enough that adaptive sweeps and relaxed selection in localized regions of the gene occur simultaneously. In the following, we discuss the evidence that, in addition to the relaxed purifying selection, positive selection has also occurred at bcd or at near sites. This evidence is for the most part based on the other two observations, namely the extensive haplotype structure and the apparent heterogeneity in the polymorphism-to-divergence ratio along the gene.

Evidence for Positive Selection
First, we observed extensive linkage disequilibria and distinct haplotype structure in our sample of bcd alleles, and positive natural selection is likely involved in maintaining this structure. We found a significantly smaller number of haplotypes than the neutral expectation (P = 0.032) by the Depaulis and Veuille (1998)Citation test of haplotypes. Coalescent simulations with the most conservative value of the recombination parameter, R, gave the range for the number of haplotypes as [13, 22], and we observed 13 haplotypes in our sample. It appears that the variants at the polymorphic sites structure into a few haplotypes. Because the Zimbabwe population typically exhibits less linkage disequilibria than non-African populations and is thought to be a panmictic population close to mutation-drift equilibrium (Begun and Aquadro 1993Citation ), demographic and bottleneck effects should be minimal. Thus, it seems likely that balancing selection or partial selective sweeps of haplotypes are contributing to this pattern. In the latter model, variants are positively selected for, but fail to go to fixation because of "traffic" with haplotypes where selection is acting on other sites (Kirby and Stephan 1996Citation ).

In particular, the H4 haplotype appears to be the target of positive selection operating at or near the bcd locus. This haplotype has a frequency of 28% and has no within-class variation. By applying a statistical test for high-frequency haplotypes (Hudson et al. 1994Citation ), our results show that there is too little variation within the H4 class, given the level of variation in the rest of the sample, and cannot be explained by a neutral equilibrium model of mutation and drift. This pattern suggests that this haplotype has arisen recently and is being pulled to high frequency because of directional selection at or near the region we surveyed.

A second phenomenon possibly attributable to positive selection is the heterogeneity in the ratio of polymorphism to divergence along the bcd region. The {pi}/{kappa} ratio is high in the intron 1 region (0.0949), whereas it is low in the coding region (0.0025) (table 1 ), which is suggestive of balancing selection on sites within intron 1 or a selective sweep (or both) that occurred in the region of exon 2 to exon 4. Comparisons between gene regions involving intron 1 by the HKA test approach significance (table 2 ). In addition, the run's test (McDonald 1996Citation , 1998Citation ), a measure of heterogeneity, produces a marginally significant result (P {approx} 0.05). However, the results of these tests alone do not put forth convincing evidence for such underlying selective mechanisms. It is difficult to distinguish this pattern of heterogeneity from merely neutral fluctuations of polymorphisms along the gene region (Kim and Stephan 2002Citation ).

Haplotype Structure and mRNA Secondary Structure in the bcd 3' UTR
The cis-acting sequences necessary for the localization of bcd mRNA fall within a large, phylogenetically conserved and well-characterized secondary structural element in the 3' UTR (Macdonald and Struhl 1988Citation ; Macdonald 1990Citation ; Seeger and Kaufman 1990Citation ; Ferrandon et al. 1997Citation ; Macdonald and Kerr 1998Citation ; Parsch, Braverman, and Stephan 2000Citation ). We investigated whether the observed linkage disequilibrium and haplotype structure is related to the maintenance of the mRNA secondary structure in the 3' UTR.

Of the 40 nucleotide polymorphisms observed within the entire bcd gene region, nine fall within the region of the localization signal in the 3' UTR, and only two (A4350C and G4612A) are located within the pairing regions supported by both phylogenetic study (Parsch, Braverman, and Stephan 2000Citation ) and mutational analysis (Ferrandon et al. 1997Citation ; Macdonald and Kerr 1998Citation ). Both these substitutions cause mismatches in the original Watson-Crick pair of the mRNA secondary structure. No covariations (compensatory mutations) with the pairing regions were observed in our sample of the Zimbabwe population. This is in qualitative agreement with the theoretical results developed under a two-locus, two-allele, reversible mutation-compensatory model (Innan and Stephan 2001Citation ). According to the predictions of this model, the populations spend most of the time in the first stage of waiting for a successful double mutant to appear in the population. The second stage of fixing the successful double mutant in the population is much shorter than the first stage (Innan and Stephan 2001Citation ). Their simulations showed that almost no linkage disequilibrium caused by compensatory interactions is expected during the first stage, so it is unlikely to observe much linkage disequilibrium or covariations caused by epistatic selection on mRNA secondary structures. Thus, the strong haplotype pattern and linkage disequilibria of our data set cannot be explained by epistatic selection on the bcd 3' UTR.


    Conclusion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
We examined DNA sequence variation for a 4-kb region of the bicoid gene of 25 D. melanogaster isofemale lines from Zimbabwe and one allele from D. simulans. Statistical tests revealed a significant excess of replacement polymorphisms in the D. melanogaster lineage that are clustered in two putative linker regions of the Bicoid protein. These are likely caused by relaxed purifying selection on functionally unimportant regions of bcd. In addition, we found extensive linkage disequilibria across the region and a significantly smaller number of haplotypes than the neutral expectation, which may suggest that positive selection has also played a role in the history of this duplicated gene. On the other hand, we did not find any relationship between the strong haplotype pattern of our data set and epistatic selection maintaining the mRNA secondary structure in the bcd 3' UTR.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 
We would like to thank Yuseob Kim and John Parsch for helpful discussions and comments on the manuscript and Jake Russell for assistance in sequencing and fly crosses. This work was supported in part by National Institutes of Health grant GM-58405 and funds from the University of Munich to W.S. and an Ernst Caspari fellowship from the University of Rochester to Y.C.


    Footnotes
 
David Rand, Reviewing Editor

1 Contributed equally to this work Back

Keywords: replacement polymorphism haplotype structure gene duplication biciod Drosophila melanogaster Back

Address for correspondence and reprints: Wolfgang Stephan, Department Biologie II—Evolutionsbiologie, Universität München, Luisenstr. 14, 80333 München, Germany. stephan{at}zi.biologie.uni-muenchen.de Back


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results
 Discussion
 Conclusion
 Acknowledgements
 References
 

    Ballard J. W., M. Kreitman, 1994 Unraveling selection in the mitochondrial genome of Drosophila Genetics 138:757-772[Abstract/Free Full Text]

    Begun D. J., C. F. Aquadro, 1993 African and North American populations of Drosophila melanogaster are very different at the DNA level Nature 365:548-550[ISI][Medline]

    ———. 1995a. Evolution at the tip and base of the X chromosome in an African population of Drosophila melanogaster Mol. Biol. Evol 12:382-390[Abstract]

    ———. 1995b. Molecular variation at the vermilion locus in geographically diverse populations of Drosophila melanogaster and D. simulans Genetics 140:1019-1032[Abstract/Free Full Text]

    Berleth T., M. Burri, G. Thoma, D. Bopp, S. Richstein, G. Frigerio, M. Noll, C. Nüsslein-Volhard, 1988 The role of localization of bicoid RNA in organizing the anterior pattern of the Drosophila embryo EMBO J 7:1749-1756[Abstract]

    Chen Y., D. B. Carlini, J. F. Baines, J. Parsch, J. M. Braverman, S. Tanda, W. Stephan, 1999 RNA secondary structure and compensatory evolution Genes Genet. Syst 74:271-286[ISI][Medline]

    Comeron J. M., M. Kreitman, M. Aguadé, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila Genetics 151:239-249[Abstract/Free Full Text]

    Das A., B. N. Singh, 1991 Chromosomal polymorphism in Indian natural populations of Drosophila melanogaster Korean J. Genet 13:97-112

    David J. R., P. Capy, 1988 Genetic variation of Drosophila melanogaster natural populations Trends Genet 4:106-111[ISI][Medline]

    Dearden P., M. Akam, 1999 Developmental evolution: axial patterning in insects Curr. Biol 9:R591-R594[ISI][Medline]

    Depaulis F., M. Veuille, 1998 Neutrality tests based on the distribution of haplotypes under an infinite-site model Mol. Biol. Evol 15:1788-1790[Free Full Text]

    Driever W., C. Nüsslein-Volhard, 1988 The bicoid protein determines position in the Drosophila embryo in a concentration-dependent manner Cell 54:95-104[ISI][Medline]

    Dubnau J., G. Struhl, 1996 RNA recognition and translational regulation by a homeodomain protein Nature 379:694-699[ISI][Medline]

    Eanes W. F., M. Kirchner, J. Yoon, 1993 Evidence for adaptive evolution of the G6pd gene in the Drosophila melanogaster and Drosophila simulans lineages Proc. Natl. Acad. Sci. USA 90:7475-7479[Abstract/Free Full Text]

    Ferrandon D., I. Koch, E. Westhof, C. Nüsslein-Volhard, 1997 RNA-RNA interaction is required for the formation of specific bicoid mRNA 3' UTR-STAUFEN ribonucleoprotein particles EMBO J 16:1751-1758[Abstract/Free Full Text]

    Frohnhöfer H. G., R. Lehmann, C. Nüsslein-Volhard, 1986 Manipulating the anteroposterior pattern of the Drosophila embryo J. Embryol. Exp. Morphol 97: (Suppl.) 169-179

    Hey J., J. Wakeley, 1997 A coalescent estimator of the population recombination rate Genetics 145:833-846[Abstract/Free Full Text]

    Hudson R. R., 1987 Estimating the recombination parameter of a finite population model without selection Genet. Res 50:245-250[ISI][Medline]

    ———. 1990 Gene genealogies and the coalescent process Oxf. Surv. Evol. Biol 7:1-44

    Hudson R. R., K. Bailey, D. Skarecky, J. Kwiatowski, F. J. Ayala, 1994 Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster Genetics 136:1329-1340[Abstract/Free Full Text]

    Hudson R. R., M. Kreitman, M. Aguadé, 1987 A test of neutral molecular evolution based on nucleotide data Genetics 116:153-159[Abstract/Free Full Text]

    Innan H., W. Stephan, 2001 Selection intensity against deleterious mutations in RNA secondary structures and rate of compensatory nucleotide substitutions Genetics 159:389-399[Abstract/Free Full Text]

    Kelly J. K., 1997 A test of neutrality based on interlocus associations Genetics 146:1197-1206[Abstract/Free Full Text]

    Kim Y., W. Stephan, 2002 Detecting a local signature of genetic hitchhiking on a recombining chromosome Genetics 160:765-777[Abstract/Free Full Text]

    Kimura M., 1983 The neutral theory of molecular evolution Cambridge University Press, Cambridge, Mass

    Kirby D. A., W. Stephan, 1996 Multi-locus selection and the structure of variation at the white gene of Drosophila melanogaster Genetics 144:635-645[Abstract/Free Full Text]

    Kliman R. M., J. Hey, 1993 Reduced natural selection associated with low recombination in Drosophila melanogaster Mol. Biol. Evol 10:1239-1258[Abstract]

    Lefevre G., 1976 A photographic representation and interpretation of the polytene chromosomes of Drosophila melanogaster salivary glands Pp. 31–66 in M. Ashburner and E. Novitski, eds. The genetics and biology of Drosophila. Academic Press, New York

    Lemeunier F., S. Aulard, 1992 Inversion polymorphism in Drosophila melanogaster Pp. 339–405 in C. B. Krimbas and J. R. Powell, eds. Drosophila inversion polymorphism. CRC Press, Boca Raton, Fla

    Macdonald P. M., 1990 bicoid mRNA localization signal: phylogenetic conservation of function and RNA secondary structure Development 110:161-171[Abstract]

    Macdonald P. M., K. Kerr, 1998 Mutational analysis of an RNA recognition element that mediates localization of bicoid mRNA Mol. Cell. Biol 18:3788-3795[Abstract/Free Full Text]

    Macdonald P. M., G. Struhl, 1988 cis-acting sequences responsible for anterior localization of bicoid mRNA in Drosophila embryos Nature 336:595-598[ISI][Medline]

    McDonald J. H., 1996 Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence Mol. Biol. Evol 13:253-260[Abstract]

    ———. 1998 Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence Mol. Biol. Evol 15:377-384[Abstract]

    McDonald J. H., M. Kreitman, 1991 Adaptive protein evolution at the Adh locus in Drosophila Nature 351:652-654[ISI][Medline]

    Moriyama E. N., J. R. Powell, 1996 Intraspecific nuclear DNA variation in Drosophila Mol. Biol. Evol 13:261-277[Abstract]

    Mount S. M., C. Burks, G. Hertz, G. D. Stormo, O. White, C. Fields, 1992 Splicing signals in Drosophila: intron size, information content, and consensus sequences Nucleic Acids Res 20:4255-4262[Abstract]

    Nachman M. W., S. N. Boyer, C. F. Aquadro, 1994 Nonneutral evolution at the mitochondrial NADH dehydrogenase subunit 3 gene in mice Proc. Natl. Acad. Sci. USA 91:6364-6368[Abstract]

    Nachman M. W., W. M. Brown, M. Stoneking, C. F. Aquadro, 1996 Nonneutral mitochondrial DNA variation in humans and chimpanzees Genetics 142:953-963[Abstract/Free Full Text]

    Nei M., 1987 Molecular evolutionary genetics Columbia University Press, New York

    Ohta T., 1992 Theoretical study of near neutrality. II. Effect of subdivided population structure with local extinction and recolonization Genetics 130:917-923[Abstract/Free Full Text]

    Parsch J., J. M. Braverman, W. Stephan, 2000 Comparative sequence analysis and patterns of covariation in RNA secondary structures Genetics 154:909-921[Abstract/Free Full Text]

    Patel N. H., 2000 It's a bug's life Proc. Natl. Acad. Sci. USA 97:4442-4444[Abstract/Free Full Text]

    Rand D. M., L. M. Kann, 1996 Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes from Drosophila, mice, and humans Mol. Biol. Evol 13:735-748[Abstract]

    Rozas J., R. Rozas, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis Bioinformatics 15:174-175[Abstract/Free Full Text]

    Schaeffer V., D. Killian, C. Desplan, E. A. Wimmer, 2000 High bicoid levels render the terminal system dispensable for Drosophila head development Development 127:3993-3999[Abstract/Free Full Text]

    Schröder R., K. Sander, 1993 A comparison of transplantable bicoid activity and partial bicoid homeobox sequences in several Drosophila and blowfly species (Calliphoridae) Roux's Arch. Dev. Biol 203:34-43[ISI]

    Seeger M. A., T. C. Kaufman, 1990 Molecular analysis of the bicoid gene from Drosophila pseudoobscura: identification of conserved domains within coding and noncoding regions of the bicoid mRNA EMBO J 9:2977-2987[Abstract]

    Sorsa V., 1988 Chromosome maps of Drosophila CRC Press, Boca Raton, Fla

    Stauber M., H. Jäckle, U. Schmidt-Ott, 1999 The anterior determinant bicoid of Drosophila is a derived Hox class 3 gene Proc. Natl. Acad. Sci. USA 96:3786-3789[Abstract/Free Full Text]

    Stauber M., A. Prell, U. Schmidt-Ott, 2002 A single Hox3 gene with composite bicoid and zerknüllt expression characteristics in non-Cyclorrhaphan flies Proc. Natl. Acad. Sci. USA 99:274-279[Abstract/Free Full Text]

    Tajima F., 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism Genetics 123:585-595[Abstract/Free Full Text]

    Tavaré S., 1984 Line-of-descent and genealogical processes, and their applications in population genetics models Theor. Popul. Biol 26:119-164[ISI][Medline]

    Verrelli B. C., W. F. Eanes, 2000 Extensive amino acid polymorphism at the pgm locus is consistent with adaptive protein evolution in Drosophila melanogaster Genetics 156:1737-1752[Abstract/Free Full Text]

    ———. 2001 Clinal variation for amino acid polymorphisms at the Pgm locus in Drosophila melanogaster Genetics 157:1649-1663[Abstract/Free Full Text]

    Watterson G. A., 1975 On the number of segregating sites in genetical models without recombination Theor. Popul. Biol 7:256-276[ISI][Medline]

    Wise C. A., M. Sraml, S. Easteal, 1998 Departure from neutrality at the mitochondrial NADH dehydrogenase subunit 2 gene in humans, but not in chimpanzees Genetics 148:409-421[Abstract/Free Full Text]

Accepted for publication January 8, 2002.