* Institute of Molecular Evolutionary Genetics, Department of Biology, Pennsylvania State University
Department of Ecology and Evolution, University of Chicago
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: Drosophila regulatory regions transcription factor regulatory evolution binding site divergence
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Understanding the evolutionary processes that regulatory sequences undergo will substantially improve our understanding of their functional constraints. Numerous comparative methods (termed "phylogenetic footprinting") have been developed that use sequence conservation to infer function (reviewed in Hardison et al. 1997 and Hardison 2000; Miller 2001). It may be problematic, however, to assume that all functional sequences are conserved and all nonfunctional sequences have diverged. For instance, sequence comparisons of closely related species produce many false positive results, and comparison of distantly related species has low power to detect functional elements in the species compared (Dermitzakis and Clark 2002). Moreover, it has been shown that regulatory sequences can maintain regulatory function despite structural reorganization as a result of species-specific loss and gain of transcription factor binding sites (Piano et al. 1999; Ludwig et al. 2000; Cuadrado, Sacristan, and Antequera 2001).
Given that the binding site is one of the fundamental units of regulatory structure, it is also likely to be one of the fundamental units of regulatory evolution. Therefore, methods that can predict the binding site composition of a sequence should provide a powerful means for quantifying regulatory sequence divergence in comparative analyses. Application of predictive tools such as probability weight matrices (PWMs) has been useful in the annotation of regulatory regions (Stormo 2000; Berman et al. 2002), and such tools have been shown to be useful for predicting the evolution of regulatory regions as well (Chuzhanova et al. 2000; Liu, Wu, and He 2000; Ludwig et al. 2000). Despite the preliminary reports, more research is necessary to determine which approaches to binding site prediction will be most useful for understanding the process of binding site turnover in regulatory regions.
Most methods for binding site prediction consider the background sequence as "null" and aim to identify a short sequence that has an outstanding match to the weight matrix (Berg and von Hippel 1987; Durbin et al. 1998; Eddy 1998). One of the main problems with such an approach is that the criteria for distinguishing a binding site from the background sequence are not well defined. For example, if the model uses nucleotide composition to generate the null hypothesis, and both the weight matrix and the sequence under study have the same nucleotide composition bias (e.g., it is A-T rich), it will be difficult to obtain a significant match to the weight matrix, because such a motif will have a high probability to occur by chance in the given sequence. Comparison of short sequences with the background sequence is also not biologically meaningful, because binding affinity does not depend on how different a binding site is from the background sequence, and the same sequence segment can be functional in many different sequence environments. Therefore, weighing the significance of predicted binding sites relative to the properties of the background sequence may not be the most powerful approach to binding site prediction.
In this report we quantitatively describe the evolutionary changes that occur in regulatory sequences of several well-studied early development genes in Drosophila: the proximal promoter of engrailed, the even-skipped stripe-2 enhancer, the "zebra element" of fushi tarazu, the proximal promoter of hunchback, and the enhancer of knirps. These five genes are embedded in a regulatory network that defines the anteroposterior pattern of the Drosophila early embryo. Because of the spatial and temporal sensitivity of the system, gene regulation is crucial for proper embryonic development (Driever et al. 1989; Struhl 1989; Rivera-Pomar and Jackle 1996). We initially present a description of the pattern of polymorphism and divergence in these sequences. We then introduce an approach for binding site prediction that does not take into account the background sequence, using PWMs and bootstrap sampling to define significance levels. This approach appears to be more biologically meaningful than several methods described previously, and it is more flexible for evolutionary analysis. We use this approach to obtain quantitative measures of divergence of the above regulatory regions and to dissect the process of binding site turnover in regulatory sequences.
![]() |
Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Sequences were submitted to the GenBank under accession numbers AY184070AY184106. We also used data for the regulatory regions of eve and hb already available in GenBank (see Ludwig and Kreitman 1995, Ludwig, Patel, and Kreitman 1998; Tautz and Nigro 1998). Table 1 summarizes the regions and the lengths of the sequence used, and figure 1 shows the functional D. melanogaster binding site sequences retrieved from the TRANSFAC database (Wingender et al. 2000) analyzed in this article, and their alignment to orthologous sequences in other Drosophila species.
|
|
Derivation of Ancestral Sequences
Ancestral sequences were derived using the maximum likelihood program PAML (Yang 1997) based on the approach of Yang, Kumar, and Nei (1995). This approach is not reliable for alignment gaps, so we derived the ancestral sequences of alignment gaps manually, applying parsimony criteria. For this approach D. yakuba and D. orena/D. erecta were considered equally distant from D. melanogaster, because our divergence analysis did not resolve reliably the relationships of D. melanogaster, D. yakuba, and D. orena/D. erecta (see also Jeffs, Holmes, and Ashburner 1994).
Binding Site Data and PWM Estimation
To obtain a reliable PWM that will represent the specificity of the transcription factor of interest, we have used a large sample of binding sites collected from the primary literature and personal communications. The two transcription factors studied here are the products of the genes bicoid (Bcd) and hunchback (Hb), for which we have used 51 and 97 binding sites, respectively. These data were initially reported in Ludwig et al. (2000) and have subsequently been published in full by Berman et al. (2002). Sequences, genomic locations, and primary references for the binding sites used in this study can be found at http://www.fruitfly.org/~cbergman/ or http://www.fruitfly.org/cis-analyst/. Alignments were performed with the Web-based motif extraction method YEBIS (Yada et al. 1998), which uses a hidden Markov model to obtain the optimal alignment and extract a motif from a given set of sequences. The length of the consensus Bcd binding site is 8 nucleotides, and the length of the consensus Hb binding site is 14 nucleotides.
Binding Site Prediction
We have devised a novel method for the prediction of transcription factor binding sites within a query sequence. In contrast to the usual way of setting up such a test, the null hypothesis of this test is that a short sequence is a binding site, and we reject the hypothesis that sequences are binding sites for a given factor when they deviate from a pattern consistent with the properties of the observed set of binding sites. The rationale behind the construction of this model is explained in the Results.
Initially, we align a set of binding sites for a given transcription factor and generate a PWM, which describes the probability of each of the four nucleotides at each position of the binding site sequence (see Supplementary Data) as expressed by their observed frequency in the respective position in the alignment of functional sites. We then use this matrix to generate a value L that reflects how likely it is that a given sequence with the same length as the binding site is compatible with the matrix. The value L is calculated as:
|
To obtain the critical threshold, we use the L values that the individual sites used to generate the matrix acquire (Schneider 1997). Because a PWM is derived from a sample of binding sites, the confidence we have in the weight matrix is dependent on the size of the sample of sites. To determine how sensitive the weight matrix is to the particular sample of binding sites used to generate it, we drew 1,000 bootstrap samples of the binding sites (sampling with replacement), and from each sample we generated a corresponding weight matrix. Because this collection of weight matrices represents the sampling variability in our confidence in a determination of L, the distribution of L values across these 1,000 matrices is taken to be the null distribution of L (fig. 2). We used the bootstrap to take into account the heterogeneity of the sample of binding sites when setting the critical threshold. Any sequence that obtains an L value below the 5% cut-off of the distribution will be considered "rejected" as a binding site of the respective transcription factor, because fewer than 5% of the weight matrices accept it as such. To obtain the predicted sites within a query sequence, we calculate the L values by taking a sliding window of sequence equal to the length of the binding site and stepping one nucleotide at a time from the 5' end to the 3' end of the query sequence. We also scan the query sequence with the matrix in its reverse orientation to identify binding sites in the opposite strand.
|
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Comparison of the spatial distribution of divergence and polymorphism along the sequence may provide some clue about the similarity in the selective constraints within and between species. We compared the patterns of polymorphism within D. melanogaster and substitution between D. melanogaster and D. simulans/D. sechellia using a modification of the randomization test described in Dermitzakis and Clark (2001). Under neutrality the expectation is that the two patterns will be random and not significantly different from each other. We applied the test in four of the five genes surveyed here for which polymorphism data were available. In no case did the spatial patterns of polymorphism and substitution differ significantly. We noticed, however, that the P values generated from our random permutations tended to be close to 1. We decided therefore to consider the opposite null hypothesis and use the other tail of the permuted null distribution to test the hypothesis that the polymorphic sites are excessively close to the interspecific substitutions. Using this hypothesis, one of the four genes (eve) showed significant proximity (clustering) of substitutions and polymorphisms. We subsequently applied Fisher's combined probability (Sokal and Rohlf 1997) for all four genes, and the result was significant with P = 0.0319. Note that the test results for the other two genes reflected a degree of clustering that was close to significant (en, ftz, proximal hb).
The significance of the clustering of polymorphic and divergent sites can be explained by the fact that these sequences contain many functional regulatory elements. If the vast majority of the sequence is constrained as a consequence of functional importance, then the remainder will be only a small fraction of the sequence, and that is where there is freedom for the accumulation of mutations and substitutions. When this pattern is compared to a completely random pattern assuming no constraint (as our permutations assume) it will show significant clustering of the two types of nucleotide change. The pattern of polymorphism and divergence suggests that the regulatory fraction of the sequence is higher than estimated by the binding site sequences characterized so far.
Many binding sites have been previously identified in these regions in D. melanogaster (TRANSFAC; Wingender et al. 2000), some of which contain polymorphisms and substitutions or indels in the other species (see figure 1 for binding site alignments). The distribution of divergence values in these binding sites between D. melanogaster and other species is shown in figure 3. In some cases the mutational events are so radical that the binding sites are most likely not functional in other species. For instance, one of the caudal sites in the ftz zebra element has obtained a two-base insertion in D. simulans, leading to a probable loss of this essential site. Application of the modified McDonald-Kreitman test for regulatory regions (Jenkins, Ortori, and Brookfield 1995; Ludwig and Kreitman 1995) did not show any significant deviation from neutral expectation. This result is consistent with the observation that polymorphic sites and substitutions tend to cluster.
|
A Novel Binding Site Prediction Method
To capture the process of binding site evolution, we use a binding site prediction model as an indicator of potentially functional regulatory sequence components. Our binding site prediction approach relies on the initial assumption that a short sequence is a binding site. We generate a distribution of a test statistic (L), which quantifies the functional binding potential of a sequence by the transcription factor of interest. Our method, described in Methods, tests whether the observed L value falls within the 95% confidence interval of known functional sites (fig. 2) of the same transcription factor. The rationale behind the null hypothesis of this approach follows:
We generated models for Bcd and Hb because these transcription factors play an important role in the regulation of the genes examined here. Many experimentally verified binding sites of these two transcription factors have been previously identified in their regulatory regions. As a first step, we wanted to obtain an estimate of the power of our method to predict transcription factor binding sites. We approached this by performing a jackknife, each time excluding one of the Bcd or Hb sites, generating the matrix and the thresholds from the remaining sites, and testing whether the excluded site was predicted as functional. This method predicted as functional 92.7% of the observed Bcd sites (47 of 51) and 94.8% of the observed Hb sites (92 of 97).
These power calculations are not typical because our test is not designed to reject the hypothesis that a sequence is not a site, but rather it is designed to reject the null hypothesis that a sequence is a binding site (see Methods for justification). In essence, the above calculations are for type-I error, the probability that we will reject the hypothesis that a site is functional when in fact it is. The reassuring result is that these calculations produce values very close to 95%, which is what we expect given the way the test is performed.
Another issue is the frequency of false positive results generated by our prediction method. This is the real value of the test, because it shows how many times we include a nonfunctional site in the functional rangei.e., how frequently we fail to reject that a site is functional when it is not functional. This is a difficult issue to address because we do not have a clear picture of the structure of a nonfunctional Bcd or Hb binding site. There have been no systematic experiments showing what are the types of sequences that Bcd or Hb do not bind and therefore we cannot generate models assuming nonfunctionality.
To obtain an idea of how high our false positive rate is, we compared our results to those obtained using a binding site prediction method that tests the null hypothesis that a sequence is not a binding site. We used the method developed by Yada et al. (1998) implemented in the Web-based software YEBIS (http://www.btls.jst.go.jp/yebis). We use the D. melanogaster sequences and the Bcd and Hb weight matrices obtained from our binding site data. Two LOD score thresholds were used for the binding site prediction with YEBIS. When the cut-off LOD score was set to 3, YEBIS predicted 26% fewer Bcd binding sites and 35% more Hb binding sites than our method, but all the YEBIS Bcd sites were also predicted with our method. For a cut-off LOD score of 2, the YEBIS results were almost identical to our method, with slight variations across genes for Bcd, but nearly twice as many more Hb sites were predicted relative to our method (see table 2 for summary of these results). This demonstrates one of the useful properties of our method. The weight matrix for Hb is more degenerate than that of Bcd, and we see that a conventional method predicts many more Hb sites than our method, even with a strict threshold, indicating the increased power of our method for degenerate weight matrices.
|
Predicting the Pattern of Binding Site Evolution
Using the matrices and cutoff determined above, we searched the regulatory regions in the different species for putative Bcd and Hb binding sites. The objective of this analysis was to determine whether the nucleotide substitutions that have occurred within the experimentally verified Bcd or Hb binding sites have potentially eliminated or altered function. Additionally, we sought to evaluate whether nucleotide changes outside the known binding sites provide any evidence of gain and loss of Bcd and Hb binding sites. In figure 4 we show the pattern of predicted Bcd binding sites in regions that are regulated by Bcd, in all five Drosophila species studied. An important aspect of these data is that all but one of the known functional Bcd binding sites (Bcd_3 in eve stripe-2 enhancer in D. yakuba and D. erecta), and all but one of the Hb binding sites (Hb_1 in eve stripe-2 enhancer in D. erecta) are predicted as functional in all species. This suggests that within the range of divergence between D. melanogaster and D. yakuba/D. orena/D. erecta, purifying selection was strong enough to maintain most sites that are essential for gene regulation.
|
|
|
Simulation of the Evolutionary Potential of Regulatory Regions
The basis of this study is that binding sites within regulatory regions undergo turnover, which results in the frequent gain and loss of functional elements but maintains the function of the regulatory region as a whole on the expression pattern of the gene (Hancock et al. 1999; Piano et al. 1999; Ludwig et al. 2000; Cuadrado, Sacristan, and Antequera 2001; McGregor et al. 2001). If this is true, then we expect to find traces of intermediate steps of this turnover process within these regions. These traces could either be lost binding sites affected by mutational events or sites that are close to becoming functional. The latter possibility is likely because the binding site sequences are short, and the probability of obtaining a new functional binding site by chance is relatively high, given sufficient time for the accumulation of mutations. It is also possible that some regulatory regions have properties such as nucleotide composition bias or repetitive sequences that allow the emergence of new binding sites for some transcription factors with high probability.
We simulated a range of divergence rates with or without mutation bias (see Methods; Stone and Wray 2001) in the regulatory regions of knirps enhancer and hunchback proximal promoter and asked whether there are regions where a new Bcd site, not predicted in the original D. melanogaster sequence, appears consistently at the same position in the simulated diverged sequences. We did 1,000 simulations of divergence of the D. melanogaster sequences terminating with an overall divergence of 10%, 20%, 30%, and 40% for each case. We then searched the simulated diverged sequences with the weight matrix for putative Bcd binding sites and obtained positions that produce an L value higher than the threshold for Bcd. We then counted how many of the 1,000 diverged sequences had a new Bcd site predicted in the position that was not predicted in the original D. melanogaster sequence.
In numerous cases a Bcd site was consistently generated at the same position with occurrence between 10% and 30% (fig. 6). This indicates that there are sequences within the regulatory regions that are close enough to becoming a functional site that a few substitutions can make then potentially functional. The original sequences (before divergence) can initially obtain substitutions and become weak sites, which may then be favorably selected if another functional Bcd site is lost or if the addition of the new binding site is advantageous. If favorable selection for new binding sites operates, the rate at which some of these are fixed will be higher. We also observed that, from the known functional sites, some are more robust to substitutions than others, and therefore the rate of loss of function is different for different sites. This is a result of the degeneracy of Bcd binding requirements and the sequence evolutionary starting point. Another interesting observation is that, at 10% divergence, the occurrence of a new site at the same position was lower than at 20% divergence, but then the occurrence reached a plateau at higher divergence values. This suggests that with 10% divergence there were not enough substitutions to generate the site, but above 20% divergence the number of substitutions was high enough to reach the maximum probability of gaining a site.
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Many studies have demonstrated high sequence conservation and conserved function of regulatory regions across distant Drosophila species (e.g., Langeland and Carroll 1993; Lukowitz et al. 1994). However, recent studies indicate that the system is more complicated and fluid, with regulatory regions having an underlying pattern of evolution not directly visible from simple sequence comparisons (Hancock et al. 1999; Piano et al. 1999; Ludwig et al. 2000; see Tautz 2000 for review). One of the reasons for such a contradiction is that the rules that govern the evolution of regulatory elements are not well known. It is therefore difficult to determine which changes are contributing to functional differences. One of the problems is that we do not know exactly how selection operates on regulatory elements. Furthermore, it is difficult to quantify the effect of a nucleotide change just with sequence comparisons, even within known functional sequences, because different changes will alter the properties of a binding site in different ways.
In this study we devised a binding site prediction method that uses PWMs generated from known binding site data and that uses the distribution of a similarity statistic (L) to obtain reliable thresholds that determine whether a tested sequence is likely to be functional. The use of such matrices allows for discrimination of nucleotide changes that affect the binding of the transcription factor from changes that are not likely to alter the binding affinity of the protein. This method is a special extension of the comparison of rates of synonymous versus nonsynonymous changes in coding sequences, but with two important distinctions: (1) it is not a binary value (synonymousnonsynonymous) but rather it is a measure of the extent of deviation as described by the difference in L values in the same sequence in two different species; (2) it accounts for compensatory changes because the same likelihood value can be obtained even if several nucleotide changes have occurred.
This quantification of divergence of regulatory elements allows for analysis of the dynamics and potential of regulatory regions from the standpoint of the relevant structural unit, the binding site. We were able to identify putative binding sites for Bcd and Hb in these regions that are predicted with high enough L values that they are likely functional or potentially functional. What we do not know is how essential these putative binding sites are for the regulation of the gene. It is possible that these sites are used only in a subset of the tissues in which Bcd and Hb regulate these genes, and that is the reason why they are not identified in functional studies. They may also have little contribution to the regulation of the gene but could serve as a reserve in cases of evolutionary transitions. Finally, because Bcd and Hb are morphogens, these may be weak sites that contribute to a finer sensing of the nuclear regulatory environment by the gene so that downstream pathways are activated or suppressed in a more accurate way (Burz et al. 1998). This possibility is relevant because these segmentation genes are usually expressed in a striped fashion, and thus an accurate sensing of the regulatory environment is necessary to define the precise boundaries of the expression domains.
Theoretical studies have predicted that the early development system of genes in Drosophila is highly fluid and allows for many evolutionary changes without serious distortion of the embryonic patterning (Gibson 1996; von Dassow et al. 2000). Transgenic experiments have shown that excessive expression of bicoid initially distorts the anterior-posterior pattern but is later corrected by fusion and cell death (Namba et al. 1997). Our analysis shows that there is possibly underlying fluidity in the system, not readily observed, as indicated by the properties of nucleotide sequence within these regulatory regions. This is also supported by a recent study that looked at the co-evolution of components of the same system (Shaw et al. 2002).We have shown with simulations that there are short sequences within these regions that have a high potential of becoming Bcd binding sites by many different combinations of substitutions. This potential could derive from "dead" sites that were lost not long ago, or because the probability of emergence of a binding site is increased by a skewed nucleotide composition or features such as mononucleotide runs of As or Ts that may provide good background sequence for Bcd and Hb sites to eventually emerge. It is finally important to mention that gene regulation is not carried out solely through the interaction of transcription factors with their binding sites. Several other aspects such as chromatin structure, flanking sequences, and other properties of the sequence environment are critical for the efficient interaction of proteins with regulatory DNA.
We have shown that regulatory regions have the potential to undergo turnover. However, the main development in this study is that we have devised a way to follow this evolutionary process and explain how these transitions can occur. Based on our results, it is not surprising that recent studies have demonstrated extensive gain and loss of binding sites (Cuadrado, Sacristan, and Antequera 2001; Dermitzakis and Clark 2002). We have shown that there is a finer organization in these regulatory regions that, together with the redundancy of binding site requirements, allow the reassignment of roles to some regulatory elements and the occasional emergence of new regulatory properties. This process is achieved at a slow rate and possibly involves radical transitions rather than a continuous evolutionary process.
![]() |
Footnotes |
---|
2 Present address: Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory.
3 Present address: Molecular Biology and Genetics, Cornell University.
E-mail: emmanouil.dermitzakis{at}medecine.unige.ch.
![]() |
Literature Cited |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Berg, O. G., and P. H. von Hippel. 1987. Selection of DNA binding sites by regulatory proteins. J. Mol. Biol. 193:723-750.[ISI][Medline]
Berman, B. P., Y. Nibu, B. D. Pfeiffer, P. Tomancak, S. E. Celniker, M. Levine, G. M. Rubin, and M. B. Eisen. 2002. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. USA 99:757-762.
Burz, D. S., R. Rivera-Pomar, H. Jackle, and S. D. Hanes. 1998. Cooperative DNA-binding by Bicoid provides a mechanism for threshold-dependent gene activation in the Drosophila embryo. EMBO J. 17:5998-6009.
Chuzhanova, N. A., M. Krawczak, L. A. Nemytikova, V. A. Gusev, and D. N. Cooper. 2000. Promoter shuffling has occurred during the evolution of the vertebrate growth hormone gene. Gene 254:9-18.[CrossRef][ISI][Medline]
Cuadrado, M., M. Sacristan, and F. Antequera. 2001. Species-specific organization of CpG island promoters at mammalian homologous genes. EMBO Rep. 2:586-592.
Dermitzakis, E. T., and A. G. Clark. 2001. Differential selection after duplication in mammalian developmental genes. Mol. Biol. Evol. 18:557-562.
Dermitzakis, E. T., and A. G. Clark. 2002. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol. Biol. Evol. 19:1114-1121.
Driever, W., G. Thoma, and C. Nusslein-Volhard. 1989. Determination of spatial domains of zygotic gene expression in the Drosophila embryo by the affinity of binding sites for the bicoid morphogen. Nature 340:363-367.[CrossRef][ISI][Medline]
Dunn K. A, J. P. Bielawski, and Z. Yang. 2001. Substitution rates in Drosophila nuclear genes: implications for translational selection. Genetics 157:295-305.
Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. 1998. Biological sequence analysis. Cambridge University Press, Cambridge.
Eddy, S. 1998. Profile hidden Markov models. Bioinformatics 14:755-763.[Abstract]
Gibson, G. 1996. Epistasis and pleiotropy as natural properties of transcriptional regulation. Theor. Pop. Biol. 49:58-89.[CrossRef][ISI][Medline]
Hancock, J. M., P. J. Shaw, F. Bonneton, and G. A. Dover. 1999. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects. Mol. Biol. Evol. 16:253-265.[Abstract]
Hardison, R. C. 2000. Conserved non-coding sequences are reliable guides to regulatory elements. Trends Genet. 16:369-372.[CrossRef][ISI][Medline]
Hardison, R. C., J. Oeltjen, and W. Miller. 1997. Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res. 7:959-966.
Jeffs, P. S., E. C. Holmes, and M. Ashburner. 1994. The molecular evolution of the alcohol dehydrogenase and alcohol dehydrogenaserelated genes in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 11:287-304.[Abstract]
Jenkins, D. L., C. A. Ortori, and J. F. Brookfield. 1995. A test for adaptive change in DNA sequences controlling transcription. Proc. R. Soc. Lond. Ser. B Biol. Sci. 261:203-207.[ISI][Medline]
Langeland, J. A., and S. B. Carroll. 1993. Conservation of regulatory elements controlling hairy pair-rule stripe formation. Development 117:585-596.
Leung, J. Y., F. E. McKenzie, A. M. Uglialoro, P. O. Flores-Villanueva, and B. C. Sorkin. 2000. Identification of phylogenetic footprints in primate tumor necrosis factoralpha promoters. Proc. Natl. Acad. Sci. USA 97:6614-6618.
Liu, T., J. Wu, and F. He. 2000. Evolution of cis-acting elements in 5' flanking regions of vertebrate actin genes. J. Mol. Evol. 50:22-30.[ISI][Medline]
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in eukaryotic enhancer element. Nature 403:564-567.[CrossRef][ISI][Medline]
Ludwig, M. Z., and M. Kreitman. 1995. Evolutionary dynamics of the enhancer region of even-skipped in Drosophila. Mol. Biol. Evol. 12:1002-1011.[Abstract]
Ludwig, M. Z., N. Patel, and M. Kreitman. 1998. Functional analysis of the eve-stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development 125:949-958.
Lukowitz, W., C. Schroder, G. Glaser, M. Hulskamp, and D. Tautz. 1994. Regulatory and coding regions of the segmentation gene hunchback are functionally conserved between Drosophila virilis and Drosophila melanogaster. Mech. Dev. 45:105-115.[ISI][Medline]
McGregor, A. P., P. J. Shaw, J. M. Hancock, D. Bopp, M. Hediger, N. S. Wratten, and G. A. Dover. 2001. Rapid restructuring of bicoid-dependent hunchback promoters within and between Dipteran species: implications for molecular coevolution. Evol Dev. 3:397-407.[CrossRef][ISI][Medline]
Miller, W. 2001. Comparison of genomic DNA sequences: solved and unsolved problems. Bioinformatics 17:391-397.[Abstract]
Moriyama, E. N., and J. R. Powell. 1996. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261.[Abstract]
Namba, R., T. M. Pazdera, R. L. Cerrone, and J. S. Minden. 1997. Drosophila embryonic pattern repair: how embryos respond to bicoid dosage alteration. Development 124:1393-1403.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Piano, F., M. J. Parisi, R. Karess, and M. P. Kambysellis. 1999. Evidence for redundancy but not trans factor-cis element coevolution in the regulation of Drosophila Yp genes. Genetics. 152:605-616.
Rivera-Pomar, R., and H. Jackle. 1996. From gradients to stripes in Drosophila embryogenesis: filling the gaps. Trends Genet. 12:478-483.[CrossRef][ISI][Medline]
Schneider, T. D. 1997. Information content of individual genetic sequences. J. Theor. Biol. 189:427-441.[CrossRef][ISI][Medline]
Shaw, P. J., N. S. Wratten, A. P. McGregor, and G. A. Dover. 2002. Coevolution in bicoid-dependent promoters and the inception of regulatory incompatibilities among species of higher Diptera. Evolution 4:265-277.[CrossRef][ISI]
Small, S., R. Kraut, T. Hoey, R. Warrior, and M. Levine. 1991. Transcriptional regulation of a pair-rule stripe in Drosophila. Genes Dev. 5:827-839.[Abstract]
Sokal, R. R., and F. J. Rohlf. 1997. Biometry: the principles and practice of statistics in biological research. W. H. Freeman, New York.
Stone, J. R., and G. A. Wray. 2001. Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18:1764-1770.
Stormo, G. D. 2000. DNA binding sites: representation and discovery. Bioinformatics 16:16-23.[Abstract]
Struhl, G. 1989. Differing strategies for organizing anterior and posterior body pattern in Drosophila embryos. Nature 338:741-744.[CrossRef][ISI][Medline]
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Tautz, D. 2000. Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10:575-579.[CrossRef][ISI][Medline]
Tautz, D., and L. Nigro. 1998. Microevolutionary divergence pattern of the segmentation gene hunchback in Drosophila. Mol. Biol. Evol. 15:1403-1411.
von Dassow G., E. Meir, E. M. Munro, and G. M. Odell. 2000. The segment polarity network is a robust developmental module. Nature 406:188-192.[CrossRef][ISI][Medline]
Wasserman, W., M. Palumbo, W. Thompson, J. W. Fickett, and C. E. Lawrence. 2000. Human-mouse genome comparisons to locate regulatory sites. Nat. Genet. 26:225-228.[CrossRef][ISI][Medline]
Wingender, E., X. Chen, R. Hehl, H. Karas, I. Liebich, V. Matys, T. Meinhardt, T. Pruss, I. Reuter, and F. Schacherer. 2000. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 28:316-319.
Yada, T., Y. Totoki, M. Ishikawa, K. Asai, and K. Nakai. 1998. Automatic extraction of motifs represented in the hidden Markov model from a number of DNA sequences. Bioinformatics 14:317-325.[Abstract]
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.[Medline]
Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:1641-1650.
Zhao, C., V. Dave, F. Yang, T. Scarborough, and J. Ma. 2000. Target selectivity of Bicoid is dependent on nonconsensus sites recognition and protein-protein interaction. Mol. Cell. Biol. 20:8112-8123.