Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham, United Kingdom
Correspondence: E-mail: john.brookfield{at}nottingham.ac.uk.
Key Words: enhancers, evolution, Drosophila
Introduction
Evolutionary changes in the cis control of gene expression can occur through a number of mechanisms recently reviewed by Tautz (2000), Ludwig (2002), and Wray et al. (2003). These reviews give many examples of how enhancers evolve in various species and in a variety of regulatory systems.
Many studies have found within-species variation in enhancer sequences and expression patterns, providing the raw material for natural selection. For example, a large proportion of human cis-regulatory polymorphisms have a greater than twofold effect on transcription levels (Rockman and Wray 2002). It has also been shown that enhancer variation is acted upon by selection (Crawford et al. 1999; Segal et al. 1999).
There does not appear to be a single model to fit the evolution of all enhancers, but many systems have features in common. For example, a number of studies have shown how enhancer function can be maintained without maintaining sequence. The best-studied example of this is the even-skipped stripe 2 enhancer system, where stabilizing selection is thought to be maintaining the expression pattern but the sequence is changing, possibly due to weakly deleterious changes followed by compensatory mutations (Ludwig et al. 1998; Ludwig et al. 2000).
A previous simulation study examined how long it would be expected to take for new binding sites to arise in a regulatory sequence via point mutations (Stone and Wray 2001). Here the authors considered a sequence of DNA evolving in a random neutral walk by point mutation, and observed how long it would take to find a particular binding site for a transcription factor. They showed, for example, for Drosophila, that to find two 6-bp transcription factor binding sites in a 200-bp sequence would take around 54,000 years, which seems reasonable given the rate at which Drosophila may evolve its gene expression patterns. However, the corresponding time for humans, with their longer generation times, was over 13 million years, which seems too slow a rate to be likely to be useful in evolutionary change. In addition, each of these times was unrealistically low, given the neutral random walk model, in that the effect of the assumed population size of a million (in each species) was incorporated inappropriately. The authors, in assuming an effective population size of a million, converted the expected time to find binding sites for a single lineage to an expected time for a population of a million by dividing the expected time by two million (with the two representing diploidy). This procedure assumes that the two million copies of the gene in the population are carrying out independent random walks, and the time being calculated is the time until the first of these random walks finds the binding sites that are required. In reality, the two million gene copies will not all be expected to be independently evolving, but will be evolving in concert, since they will all be expected to share a common ancestry (under neutrality) around four million generations ago. Simply dividing the expected time by two million gives a value that is far too small. 
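The "around four million generations" figure is the standard neutral-coalescent expectation. As a minimal sketch (our illustration, not part of Stone and Wray's method), the expected time back to the common ancestor of n gene copies in a diploid population of effective size N is 4N(1 - 1/n) generations, which for all 2N copies is essentially 4N. The function names are hypothetical.

```python
def expected_tmrca(n_copies: int, n_eff: int) -> float:
    """Expected generations back to the most recent common ancestor of
    n_copies gene copies in a neutral diploid population of effective size
    n_eff, from the standard coalescent: 4N(1 - 1/n)."""
    if n_copies < 2:
        raise ValueError("need at least two gene copies")
    return 4 * n_eff * (1 - 1 / n_copies)


def naive_population_time(single_lineage_time: float, n_eff: int) -> float:
    """The correction criticized in the text: treat all 2N copies as
    independent random walkers and divide the single-lineage time by 2N."""
    return single_lineage_time / (2 * n_eff)


N = 1_000_000
print(f"{expected_tmrca(2 * N, N):,.0f}")  # 3,999,998 generations
```

The 2N copies therefore share ancestry only about 4N generations back, so they cannot be treated as 2N independent searches.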
In addition, the time that is calculated is the expected time until the required sequence arises in one of the two million gene copies existing in a diploid population. However, unless the selection favoring this evolved sequence is very strong, the sequence will probably be lost by drift soon after its creation. Its probability of fixation is only approximately 2s, where s is its selective advantage (or approximately 1/(2N) in a diploid population of effective size N, if this is larger). Once these problems with the incorporation of the effective population size are appreciated, the appropriate conclusion from such a model is that a random neutral walk would not be an effective way to evolve multiple new binding sites in organisms with long generation times and small population sizes (such as metazoans).
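Both approximations are limits of Kimura's (1962) diffusion result for a single new mutant copy under additive selection. The sketch below illustrates that standard formula; the function name is ours.

```python
import math


def fixation_probability(s: float, n_eff: int) -> float:
    """Kimura's (1962) probability that one new copy (initial frequency
    1/(2N)) of an allele with additive advantage s >= 0 eventually fixes in
    a diploid population of effective size N:
    (1 - e^(-2s)) / (1 - e^(-4Ns))."""
    if s == 0:
        return 1 / (2 * n_eff)  # neutral limit
    # -expm1(-x) == 1 - e^(-x), computed accurately for small x
    return -math.expm1(-2 * s) / -math.expm1(-4 * n_eff * s)


N = 1_000_000
for s in (1e-2, 1e-4, 1e-8, 0.0):
    # exact value next to the rule of thumb max(2s, 1/(2N))
    print(s, fixation_probability(s, N), max(2 * s, 1 / (2 * N)))
```

For 4Ns well above 1 the result is close to 2s; as s shrinks toward zero it approaches the neutral value 1/(2N).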
Another previous approach to the evolution of enhancers was that of Carter and Wagner (2002), who explicitly imagined enhancer evolution occurring through pairs of individually deleterious but compensatory mutations. They considered a population genetics model in which, in large populations, evolution of enhancer sequences through compensatory mutations may be quite rapid. In this, they differ from our model described below, in which only neutral or advantageous mutations may spread. Gerland and Hwa (2002) studied the problem of the evolution of the interaction between a transcription factor and a particular binding site in terms of how strongly selection must favor binding in order to preserve a site in the face of inactivating mutations.
The Model
In this study, we consider a situation in which a population of organisms is in a new environment where the expression of an existing protein in a new tissue or at a new time would be advantageous. The evolution of a new expression pattern for one of a pair of genes following a gene duplication provides another example of a situation in which the evolution of a new expression pattern could be favored by selection.
The model we present is one of a population, not of individual alleles. Mutations are introduced into a population and may go to fixation. If a particular base change that is a neutral mutation is occurring at rate µ, then the rate at which the population will be expected to change by incorporating this mutation is still simply µ. However, a particular base change that creates a selective advantage s will also be arising at rate µ in an individual and thus at a rate 2Nµ in a diploid population of effective size N. If this advantageous mutation has a probability of fixation of 2s, the evolutionary rate for this mutation will be 4Nsµ. Thus, the likelihood of a particular selected mutation arising and spreading in the population relative to the rate for a particular neutral mutation is 4Ns. Using this approach, in each step of the simulations, a mutation arises and is fixed in the population. The relative likelihoods of all possible base changes in evolution of the sequence are calculated, and used to weight the probabilities of each possible change. The particular evolutionary change occurring is chosen, using a pseudorandom number generator, from these probabilities. The expected time for each of these mutational/evolutionary steps can also be calculated from the proportion of changes that will be selectively driven and their individual Ns values.
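One step of this scheme can be sketched as follows. This is a minimal illustration, not the authors' Perl program. It assumes each candidate point mutation has already been assigned a selective coefficient s (0 for neutral, positive for advantageous; deleterious changes are dropped, as in the model) and that µ is the rate of each particular base change. `substitution_step` and the candidate labels are hypothetical.

```python
import random


def substitution_step(candidates, n_eff, mu, rng=random):
    """Pick the next fixed substitution and its expected waiting time.

    candidates: list of (change, s) pairs, one per possible point mutation.
    Neutral changes get relative weight 1, advantageous ones 4Ns;
    deleterious changes (s < 0) are excluded.
    """
    moves, weights = [], []
    for change, s in candidates:
        if s < 0:
            continue
        moves.append(change)
        weights.append(1.0 if s == 0 else 4 * n_eff * s)
    total_rate = mu * sum(weights)  # population-level substitution rate
    change = rng.choices(moves, weights=weights, k=1)[0]
    return change, 1.0 / total_rate  # expected generations for this step


candidates = [
    ("pos12 A>G", 0.0),     # neutral
    ("pos40 C>T", 2.5e-3),  # advantageous
    ("pos41 G>A", -1e-3),   # deleterious, excluded
]
change, wait = substitution_step(candidates, n_eff=10**6, mu=1e-9,
                                 rng=random.Random(1))
print(change, round(wait))  # the chosen change, then 99990
```

Because the waiting time is the reciprocal of a sum dominated by the 4Ns terms, the time per selective step scales roughly as 1/s when advantageous changes are available.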
We consider the situations of de novo enhancer formation from nonfunctional DNA, and those in which an already existing enhancer is modified to make it respond to a transcription factor that it has previously ignored. Using real enhancer sequences, and randomized sequences, and specific Drosophila transcription factors, we consider the effect of variation in selection and population size on the time taken to reach a specified transcriptional output.
Model Description
DNase I footprints for a transcription factor are lined up, but, rather than a consensus sequence being created, the frequency of each base at each position is recorded in a matrix. Table 1 shows an example of such a position weight matrix (PWM), for the Drosophila Bicoid transcription factor.
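Building such a matrix from aligned footprints amounts to counting bases column by column. A minimal sketch, assuming the footprints are already aligned to equal length. The toy sites below were invented for illustration around the TAATCC motif and are not the Table 1 data.

```python
from collections import Counter


def frequency_matrix(aligned_sites):
    """Per-position base frequencies from equal-length, aligned sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("sites must be aligned to the same length")
    matrix = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        total = sum(counts[b] for b in "ACGT")
        matrix.append({b: counts[b] / total for b in "ACGT"})
    return matrix


sites = ["TAATCC", "TAATCT", "TAATCC", "GAATCC"]  # toy footprints
pwm = frequency_matrix(sites)
print(pwm[0])  # {'A': 0.0, 'C': 0.0, 'G': 0.25, 'T': 0.75}
```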
By comparing the matrix similarity scores (for a given transcription factor) of all sequences to which the factor is known to bind, we can establish a level of matrix similarity that is required to create a functional binding site. Known binding sites can be compared to predicted binding sites at different levels of matrix similarity. The exact level used was calculated following the method of Papatsenko et al. (2002). Here the matrix similarity, used as the cutoff between functional and nonfunctional sequences, is chosen as the level that gives the maximum p value from the following equation.
In order to include selection in the model, we need to calculate the effect on fitness of any given change in the DNA sequence. We achieved this by assigning an "output" value to a sequence based on its similarity to the matrix. If a point mutation increases the total output of the sequence, then it can be selected for, as this implies an increase in transcriptional activation and therefore an increase in protein expression.
A maximum output level of two units is set for a sequence that has a matrix similarity score of 1. This is the maximum level of output a single binding site can contribute on its own. Each position in the matrix is assigned a proportion of this output based on its Ci level. The equation for this is shown as follows.
This way of defining output makes it possible to weight base changes based on the importance of each position in the binding site. Positions that are in the core-binding region contribute more to the output of the binding site than do positions that are less important.
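The equations themselves are not reproduced here. As a hedged sketch only, the code below follows the matrix-similarity scheme of MatInspector (Quandt et al. 1995). Each position gets a conservation weight Ci, the similarity is the Ci-weighted score rescaled between the worst and best possible matches, and a site above the cutoff contributes 2 × similarity units of output (the factor of 2 is the maximum site output stated above). The four-letter form of Ci used here is an assumption and may differ from the exact formula used in the paper.

```python
import math


def ci_vector(freqs):
    """Per-position conservation weight, from 0 (uniform column) to 100
    (a single base). Four-letter variant of the MatInspector Ci; assumed."""
    ci = []
    for column in freqs:
        info = sum(p * math.log(4 * p) for p in column.values() if p > 0)
        ci.append(100 * info / math.log(4))
    return ci


def matrix_similarity(seq, freqs, ci):
    """Ci-weighted match score rescaled so the worst possible sequence
    scores 0 and the best scores 1."""
    score = sum(c * col[b] for c, col, b in zip(ci, freqs, seq))
    best = sum(c * max(col.values()) for c, col in zip(ci, freqs))
    worst = sum(c * min(col.values()) for c, col in zip(ci, freqs))
    return (score - worst) / (best - worst)


def site_output(seq, freqs, ci, cutoff, max_output=2.0):
    """max_output * similarity if the site reaches the cutoff, else 0."""
    sim = matrix_similarity(seq, freqs, ci)
    return max_output * sim if sim >= cutoff else 0.0


# Toy three-position matrix (invented for illustration)
freqs = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
]
ci = ci_vector(freqs)                             # about [100, 0, 50]
print(site_output("AAA", freqs, ci, cutoff=0.8))  # 2.0
print(site_output("CAA", freqs, ci, cutoff=0.8))  # 0.0 (similarity 0.2)
```

Because each column's contribution is scaled by its Ci, a mismatch at a conserved core position costs far more than one at a degenerate position.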
From these equations, it is possible to compare any sequence to the matrix, calculate its matrix similarity and therefore, by comparison to the threshold, determine whether it is an active binding site. If it is an active binding site then its output level can be determined.
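Checking a whole sequence means sliding a window of the matrix width along it and testing both strands. A minimal sketch, assuming a scoring function, such as the site-output calculation described above, is passed in as `output_fn` (a hypothetical parameter).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, width, output_fn):
    """Return (start, strand, output) for every window, on either strand,
    whose output is nonzero, i.e. that counts as an active binding site."""
    sites = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, kmer in (("+", window), ("-", reverse_complement(window))):
            out = output_fn(kmer)
            if out > 0:
                sites.append((start, strand, out))
    return sites


# Toy scorer: only an exact TAATCC counts, with the maximum output of 2
sites = find_sites("CTAATCCAGGATTAC", 6, lambda k: 2.0 if k == "TAATCC" else 0.0)
print(sites)  # [(1, '+', 2.0), (8, '-', 2.0)]
```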
The output level is calculated for individual binding sites, and the total output of the element is the sum of these individual outputs. However, the model also incorporates position effects. Binding sites which are close together and pairs of sites that are on the same side of the DNA stimulate transcription more than do those which are further apart and on opposite sides of the DNA (Mao et al. 1994; Liu and Little 1998). The model simulates this effect by adding what is called an "output bonus," shown as follows.
This graph is created using the formula below, using a range, r, of 100 bp.
As figure 1 shows, the further a site is from an existing binding site, the lower the bonus awarded. In addition, if a binding site is on the opposite face of the DNA helix, i.e., not a multiple of 10 bp away, then it also receives a lower bonus. The output bonus is added to the output that is calculated as the sum of all sites' matrix similarities, multiplied by 2. Each binding site receives a bonus for every binding site within range, so output is increased by the sum of the output bonuses from all pairwise interactions. In this way, clustering of sites can greatly increase the total output of the sequence, as is seen in real enhancers.
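The pairwise structure of the output calculation can be sketched as below. The shape of the bonus is NOT the paper's formula. `example_bonus` is a hypothetical stand-in (linear decay to zero at r = 100 bp, halved when the spacing is not a multiple of 10 bp) chosen only to show where such a function plugs in, and each unordered pair of sites is assumed to contribute one bonus.

```python
from itertools import combinations


def example_bonus(distance, r=100, max_bonus=1.0):
    """HYPOTHETICAL bonus shape: linear decay to zero at range r, halved
    when the spacing is not a whole number of 10-bp helical turns."""
    if distance >= r:
        return 0.0
    bonus = max_bonus * (1 - distance / r)
    return bonus if distance % 10 == 0 else bonus / 2


def total_output(sites, bonus_fn=example_bonus, max_output=2.0):
    """sites: list of (position, matrix similarity) for active sites.
    Output = 2 x sum of similarities + sum of pairwise distance bonuses."""
    base = sum(max_output * sim for _, sim in sites)
    bonus = sum(bonus_fn(abs(p1 - p2))
                for (p1, _), (p2, _) in combinations(sites, 2))
    return base + bonus


# Two perfect sites 20 bp apart plus a weaker, distant one
print(total_output([(0, 1.0), (20, 1.0), (250, 0.5)]))  # about 5.8
```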
Simulations were started from real enhancer sequences and continued until a required level of output was reached. Unless otherwise stated, this output level is 30 units, with a single site perfectly matching the matrix providing an output of 2 units, and the distance bonus given by the formula above. The time in generations taken to reach this output level was recorded. Simulations to examine the effects of the distance model used a target of twenty binding sites rather than a particular output level. This allowed the ratio of the binding sites' individual outputs to the distance output bonus to be examined. In some simulations, the sequence was allowed to evolve further once the output had been reached, as a way of modeling the turnover in binding sites possible for a sequence with constant function.
Randomized sequences of the same length as the original sequence were constructed by sampling bases, with replacement, according to the nucleotide composition of the original sequence. Where random sequences are used, a different random sequence is used for each individual run.
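Sampling with replacement from the original composition can be written in a few lines. A minimal sketch; `randomized_sequence` is a hypothetical name.

```python
import random
from collections import Counter


def randomized_sequence(original, rng=random):
    """Same length as `original`; each base drawn independently, with
    replacement, in proportion to its count in the original sequence."""
    counts = Counter(original)
    bases = sorted(counts)
    return "".join(rng.choices(bases, weights=[counts[b] for b in bases],
                               k=len(original)))


shuffled = randomized_sequence("AAAT" * 10, random.Random(0))
print(len(shuffled))  # 40
```

Unlike a shuffle, this keeps the expected composition rather than the exact base counts, so each randomized copy differs slightly in GC content.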
Sequences and binding site matrices were chosen in order to cover a range of lengths and types of sequence. Transcription factor matrices used were those for Bicoid (M00140), Krüppel (M00021), and Abd-B (M00090). Accession numbers are for the TFMATRIX databank from the TRANSFAC database (Wingender et al. 2001).
Sequences used were the Krüppel Kr730 enhancer (Hoch et al. 1991; Hoch et al. 1992), the even-skipped stripe 2 element (S2E) (AF042709), and the alcohol dehydrogenase intron 1 (AF201423). The computer program was written in ActivePerl 5.6.1.633 and run on a Compaq Tru64 UNIX system. Twenty replicates of each simulation were carried out unless otherwise stated.
Harmonic means are used to express the averages over runs unless otherwise stated. This form of mean was chosen because some runs take very long times to reach the target, and these would dominate an arithmetic mean. Because of nonnormality in the distributions of times to threshold outputs being reached, significance tests on the results were conducted using the Mann-Whitney U test (Sokal and Rohlf 1997).
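Why the harmonic mean is less sensitive to slow runs, and what the Mann-Whitney U statistic counts, can be shown briefly. The run times below are invented for illustration. In practice a library routine such as `scipy.stats.mannwhitneyu` would also supply the p value.

```python
from statistics import harmonic_mean, mean


def mann_whitney_u(x, y):
    """U statistic for sample x: number of (x, y) pairs with x > y,
    counting ties as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


times = [1200, 1500, 1800, 90000]  # hypothetical generations to target
print(mean(times), round(harmonic_mean(times)))  # one slow run inflates the mean
print(mann_whitney_u([1, 2, 3], [2, 4]))          # 1.5
```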
The interpretation of the results involved a measure of a kind of cryptic simplicity in the sequence, defined by the numbers of homonucleotide runs in the sequence. Our measure of nucleotide simplicity was calculated by counting the number of repeats of each base in a sequence (homonucleotide repeats of two to five bases) and comparing it with the mean number of repeats in 100 randomized versions of the same sequence, which was used as the expected value. The level of simplicity was then calculated as ((observed/expected) - 1) × 100%. Hereafter, this measure is referred to as the nucleotide simplicity score.
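A sketch of the simplicity score follows. The exact counting rule is an assumption. Here every overlapping occurrence of AA through AAAAA (and likewise for C, G, and T) is counted, and the 100 randomized versions are taken to be composition-preserving shuffles. Both choices, and the function names, are hypothetical.

```python
import random


def homonucleotide_count(seq, min_len=2, max_len=5):
    """HYPOTHETICAL counting rule: overlapping occurrences of each
    homonucleotide word of length min_len..max_len (AA, AAA, ..., TTTTT)."""
    total = 0
    for base in "ACGT":
        for k in range(min_len, max_len + 1):
            word = base * k
            total += sum(1 for i in range(len(seq) - k + 1)
                         if seq.startswith(word, i))
    return total


def simplicity_score(seq, n_random=100, rng=None):
    """((observed / expected) - 1) * 100, with the expected count taken as
    the mean over n_random shuffled copies of seq."""
    rng = rng or random.Random()
    expected = 0.0
    for _ in range(n_random):
        letters = list(seq)
        rng.shuffle(letters)
        expected += homonucleotide_count("".join(letters))
    expected /= n_random
    return (homonucleotide_count(seq) / expected - 1) * 100


print(homonucleotide_count("AAA"))  # 3 (two AA plus one AAA)
print(simplicity_score("AAAAATTTTTCCCCCGGGGG" * 3, rng=random.Random(0)) > 0)
```

A sequence whose shuffles contain no repeats at all would give an expected count of zero; such very short or perfectly alternating inputs need special handling.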
Analysis of the nucleotide simplicity of transcription factor binding sites was carried out by comparing every possible sequence of the same length as a matrix to the factor's matrix. Every sequence that matched the matrix with a similarity score equal to or greater than that of the cutoff value was stored. The nucleotide simplicity score and base compositions of these sequences could then be analyzed.
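For short matrices this exhaustive enumeration is straightforward. A minimal sketch, assuming a predicate `passes(kmer)` that applies the matrix-similarity cutoff (hypothetical; any scoring rule can be supplied). With 4^L candidates, pure Python is practical for an 8-bp matrix but becomes slow for a 14-bp one.

```python
from itertools import product


def qualifying_sequences(width, passes):
    """All sequences of the given width over A, C, G, T for which
    passes(kmer) is True."""
    kmers = ("".join(letters) for letters in product("ACGT", repeat=width))
    return [k for k in kmers if passes(k)]


hits = qualifying_sequences(3, lambda k: k.count("A") >= 2)  # toy predicate
print(len(hits), len(hits) / 4 ** 3)  # 10 0.15625
```

The stored hits can then be passed to the simplicity and base-composition analyses, and the fraction of qualifying sequences feeds the expected-site calculations in the Discussion.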
Results
The effect of the GC ratio on the time taken to evolve new binding sites can be seen clearly in figure 3. Here sequences with AT:GC ratios that vary from 10:90 to 90:10 were used to evolve binding sites for three transcription factors.
Distance Model
An interesting question to ask about the evolution of binding sites is what causes the clustering of binding sites seen in real enhancer sequences. To address this, various ratios of binding site output to distance bonus were explored. The results of these simulations are shown in figure 4.
Enhancer Versus Intron Sequences
We can examine how real sequences evolve compared with randomized versions of themselves. One such comparison is the evolution of Abdominal-B binding sites in the Adh intron, the Kr730 enhancer, and the eve stripe 2 enhancer (S2E). None of these sequences is thought to have any binding sites for Abd-B, so all sites that arise in the sequence must be newly evolved sites.
Of the three sequences, Kr730 reaches the target fastest and the Adh intron the slowest. Our analysis of the nucleotide simplicity scores of the sequences shows that Kr730 has a very high level of simplicity, and that the Abd-B binding site is itself simple (Table 3).
The Adh intron has a low level of simplicity, close to the expected simplicity of zero for randomized sequences, and it takes a similarly long time to reach the target output. Although the Adh intron does not take a significantly larger number of generations to reach the target than do the random sequences, the difference seen in the harmonic means may be accounted for by its lack of presites, which means that it has to reach binding sites by a random walk. In contrast, some of the random sequences will, by chance, contain sites or presites for Abd-B, which will reduce their time to reach the required output. The randomized sequences contain an average of 0.65 sites at the start, compared with 0 for the real Adh intron.
In order to determine how existing binding sites in a sequence affect its evolution, simulations were run in which known binding sites were conserved (table 2). Here, known binding sites for any transcription factor could not be mutated. However, the positions within existing sites could still be used as parts of new, overlapping binding sites. This was designed to determine whether the presites in enhancer sequences were the result of existing binding sites with similar motifs.
Length
As mentioned earlier, all three transcription factors in this study bind to sequences of similar AT:GC ratios (Bicoid 64:36, Krüppel 53:47, and Abdominal-B 56:44). However, in many cases, such as that shown in figure 3, the Krüppel matrix finds binding sites faster than do the other two matrices. This is despite the fact that the Bicoid matrix is only 8 bases long, and any particular 8-base sequence should theoretically occur more frequently in a sequence than will a 14-bp sequence, such as that of the Abd-B binding site. However, owing to the composition of the different matrices, and their cutoff values, a much smaller proportion of sequences of the appropriate length match the Bicoid matrix than match the Krüppel and Abdominal-B matrices.
Figure 5 shows that it is a feature of matrices that, on average, longer matrices have a lower mean Ci value per position than do shorter matrices. As matrices are calculated from empirical data, we can say that transcription factors with longer binding sites are less specific and will bind to a larger proportion of sequences of their target length than would be predicted by extrapolation from shorter binding sites. This result shows that transcription factors with shorter binding sites will not necessarily evolve targets faster than will those with longer binding sites and that presites for longer binding sites will not necessarily be rare in a random sequence.
Discussion
The results imply that it is possible for there to be rapid evolution of enhancer sequences in their capacity to bind new transcription factors. The rapidity and ease with which evolution can perform this task arise because binding sites show a low complexity and because sites which are not perfect versions of the binding site consensus can nevertheless produce some activational output, and thus can be subject to selection. The rapidity of adaptation, however, will depend on the particular value chosen for the strength of selection, and the true value is unknown. However, the reciprocal relationship between the time taken and the strength of selection, in examples such as those in figure 2, when the process is driven by selective changes, means that, were we to doubt the value chosen, we can infer directly the impact of differing values on the time taken.
One interesting result is that sequences with high simplicity are more likely to find binding sites for new transcription factors than are sequences with lower simplicity. Cryptic simplicity (defined slightly differently from our simplicity measure) has been reported in a number of species, and is thought to be the result of DNA slippage (Tautz et al. 1986; Schlotterer and Tautz 1992). Cryptic simplicity may be high in enhancers and, given that binding sites often include repeated bases, this would tend to make the finding of binding sites more rapid. It may be that enhancers have, in the past, contained binding sites for numerous transcription factors that they can no longer bind. The similarities between these binding sites and those forming the targets of the new transcription factor may mean that sites can be reactivated by a few base substitutions.
The ability to find new binding sites increases, as expected, with the length of the sequence examined. Also as expected, the time depends strongly on the relationship between the GC content of the binding site and that of the enhancer sequence that is evolving. However, the time taken to find binding sites does not always increase with the length of the binding site: if the matrix is such that selective rewards are not produced until there is a close match with the consensus, a short binding site could be harder for evolution to find than a longer one that does not have this property.
The simulations take much longer when there is a lower output threshold, with selection operating only above this threshold. Table 4 shows the large increase in the harmonic mean time required to cross this lower threshold when an output of eight is required (corresponding to more than four binding sites), compared with situations in which outputs of zero or four are required. It is possible to calculate the probabilities of there being any given number of binding sites in a random sequence of 730 bp, the length used here. This depends on the proportion of possible sequences of the appropriate length that would act as binding sites. Thus, 0.0458% of all possible 8-bp sequences qualify as Bicoid sites. This means that, considering both strands of the 730-bp sequence, the expected number of Bicoid-binding sites will be 0.6623, and the actual number will be approximately Poisson distributed with this expectation, leading to a probability that a random sequence would contain two sites of 11.31%, but a probability that it would contain four sites of only 0.41%. (Corresponding figures for Abdominal-B and Krüppel sites are 15.02% and 0.86%, and 22.29% and 2.87%, respectively.) Since a single mutation in a presite would typically be capable of moving a sequence with two sites above a lower output threshold of four units, or a sequence with four sites above a lower output threshold of eight units, we should not be surprised that simulations with a lower output threshold of eight units are typically much slower than those with a lower output threshold of zero or four units.
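The Bicoid figures above can be reproduced directly. This sketch assumes, as the numbers imply, that a 730-bp sequence offers 730 - 8 + 1 = 723 windows per strand; the function names are ours.

```python
import math


def expected_sites(seq_len, width, fraction):
    """Expected number of qualifying windows, counting both strands."""
    windows_per_strand = seq_len - width + 1
    return 2 * windows_per_strand * fraction


def poisson_pmf(k, lam):
    """Probability of exactly k sites under a Poisson(lam) count."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = expected_sites(730, 8, 0.000458)  # Bicoid values from the text
print(round(lam, 4), round(poisson_pmf(2, lam), 4), round(poisson_pmf(4, lam), 4))
# 0.6623 0.1131 0.0041
```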
Some simulations (data not shown) were continued after the output was reached. Output was prevented from dropping below the target level by selection, but was free to change above this target. Once above the target output, sites can be lost and gained, creating a kind of "turnover" of binding sites, although base changes in the binding sites still occur at a slower rate than the rate of neutral change in the sequence. However, such a process of stochastic loss and gain of sites, made possible by the number of sites having climbed above the number necessary, may not be the true cause of the turnover sometimes seen in binding sites (Ludwig et al. 2000). Weakly deleterious changes that cause the loss of binding sites (which are disallowed in our model) may create alleles that persist at low frequency in populations and may then be compensated by the appearance of new binding sites in the same haplotype (Carter and Wagner 2002). Apart from their influence on this turnover process, we do not know whether the inclusion of weakly deleterious changes would greatly affect the outcome of the model. It would be possible to allow weakly deleterious changes, which would have an S value (the relative probability of spread to fixation) that is nonzero but less than the value of 1 used for a neutral change. Following Kimura (1962), the S value corresponding to a weakly deleterious mutation would be 2N(e^(2s) - 1)/(e^(4Ns) - 1), where s is now the selective disadvantage of the mutation. Such weakly deleterious mutations might allow access to advantageous mutations that strictly neutral changes might not.
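The S value for a weakly deleterious change can be computed as follows. A minimal sketch; for very large 4Ns the exponential overflows (Python raises `OverflowError`), but S is then effectively zero.

```python
import math


def relative_fixation_deleterious(s, n_eff):
    """S value of a mutation with selective disadvantage s > 0, relative to
    a neutral mutation: 2N(e^(2s) - 1) / (e^(4Ns) - 1) (Kimura 1962)."""
    return 2 * n_eff * math.expm1(2 * s) / math.expm1(4 * n_eff * s)


N = 10**6
for s in (1e-12, 1e-7, 1e-6, 1e-5):
    print(s, relative_fixation_deleterious(s, N))  # falls from about 1 toward 0
```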
We have assumed that the population can be considered as a single genotype at all times. New sites appear in this genotype, with selectively advantageous changes having a 4Ns-fold higher chance of being fixed than have neutral changes. (In a more realistic diploid situation, s should be seen as the selective advantage of the changed allele in the heterozygous state, since this is what will determine the probability of its eventual fixation.) It is thus assumed that mutations that will be fixed spread rapidly, relative to the separation in time between successive fixed mutations. This assumption will be inaccurate if there are many opportunities for selective change in the sequence and if the product of the population size and the mutation rate is not many orders of magnitude less than 1. With the rapid evolution seen in some of our simulations, it seems likely that a mutation may not have been fixed in the population before the next adaptive change in the sequence arrives. This issue is considered by Gerrish and Lenski (1998) in the context of bacterial evolution. There, clonality requires one advantageous mutation to spread before the next can be incorporated in the same clone, with the effect that an effective speed limit is imposed on evolution. In our short sequence, recombination during the spread of an advantageous allele will be so rare that the evolutionary process can also be seen as the successive replacement of one nonrecombining haplotype by another. The first mutation might thus be at a low frequency, p, when the next one arrives in the population, in which case the second mutation would have to arise in an allele that had the first mutation. If we imagine a mutation that creates an increase in fitness of s, the expected time taken for this mutation to spread until its frequency in the population, p, reaches one half is almost exactly ln(4Ns)/s generations.
However, at the population level, the rate of production of mutations that subsequently spread to fixation by selection will be 4Nsµ_s, where µ_s is the rate of mutation in the sequence as a whole to advantageous mutations. Thus, the expected time between successive advantageous fixations is 1/(4Nsµ_s) generations. For mutations to spread to fixation rapidly, before the next one would be expected, 1/(4Nsµ_s) must therefore be much greater than ln(4Ns)/s. Multiplying both sides by s, this implies that 1/(4Nµ_s) >> ln(4Ns). If s is set to the most commonly used value, 2.5 × 10^-3, and N is 10^6, this means that µ_s << 2.71 × 10^-8. Since we have a mutation rate per base of 10^-9, there cannot be very many possible selectively advantageous mutations in the sequence if each is to be able to spread to fixation before the next arises. (It should be remembered that, since typically at most only one of the three possible base substitutions at a given base will be advantageous, the mutation rate for any one of these three will be 10^-9/3. This means that µ_s << 2.71 × 10^-8 implies that the number of bases in the sequence that can selectively change must be much less than around 80, which seems probable given the length of the sequences being considered.) Increasing the selective advantage, s, will reduce the time between successive fixations. However, such increases will also reduce the time taken for individual mutations to spread. The result is that, overall, s does not have a strong influence on whether successive substitutions are effectively independent.
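The arithmetic behind this argument can be checked in a few lines, using the values stated in the text (N = 10^6, s = 2.5 × 10^-3, a per-base mutation rate of 10^-9). The function names are ours.

```python
import math


def spread_time(n_eff, s):
    """Generations for an advantageous mutation to reach frequency 1/2:
    ln(4Ns)/s."""
    return math.log(4 * n_eff * s) / s


def mu_s_scale(n_eff, s):
    """Value that mu_s must lie far below, from 1/(4 N mu_s) >> ln(4Ns)."""
    return 1 / (4 * n_eff * math.log(4 * n_eff * s))


N, s = 10**6, 2.5e-3
bound = mu_s_scale(N, s)
print(round(spread_time(N, s)))   # 3684 generations to reach p = 1/2
print(f"{bound:.3g}")             # 2.71e-08
print(round(bound / (1e-9 / 3)))  # 81 selectively changeable bases
```

The last line divides the bound by the rate of one specific substitution, 10^-9/3, giving the "around 80" bases quoted in the text.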
If mutations are arising before the last one has fixed, this will tend to slow down the rate of evolution, since mutations will have to arise in the subset of the population that received the previous mutation (assuming that the sequence is so small that recombination is not significant on this time scale). The situation is not simple, however. A mutation of advantage s that arises in a subset of the population that is already increasing as a result of an earlier mutation will have a probability higher than 2s of spreading to fixation. In general, our simulations correspond to situations where heterozygosity at the base pair level is low, which would exist when Nµ << 1; this would typically be seen in multicellular organisms, but not in some viruses, such as RNA viruses (Berg, Lässig, and Radic 2003).
Another factor involving the population genetics of the situation is that we have not included in our simulations the reduction in the expected time required to find the first of the advantageous mutations in the process. This first advantageous mutation will be found more quickly since the starting population will be genetically heterogeneous and many mutations will exist at low frequency in the population. The quantitative effect of this genetic variation would depend on the frequency distribution of genetic variants in the population, and so would be impossible to model without the assumption of a neutral equilibrium. However, since the population genetic variation will affect only the time taken to find the first of what might be very many selectively driven changes as the sequence evolves, its effect on the overall time for the evolutionary process will be comparatively small.
Literature Cited
Arnone, M. I., and E. H. Davidson. 1997. The hardwiring of development: Organization and function of genomic regulatory systems. Development 124:1851-1864.
Berg, J., M. Lässig, and S. Radic. 2003. On the evolution of gene regulation. http://xxx.lanl.gov/PS_cache/cond-mat/pdf/0301/0301574.pdf.
Berg, O. G., and P. H. von Hippel. 1987. Selection of DNA binding sites by regulatory proteins. J. Mol. Biol. 193:723-750.
Carter, A. J. R., and G. P. Wagner. 2002. Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proc. R. Soc. Lond. Ser. B-Biol. Sci. 269:953-960.
Crawford, D. L., J. A. Segal, and J. L. Barnett. 1999. Evolutionary analysis of TATA-less proximal promoter function. Mol. Biol. Evol. 16:194-207.
Doebley, J., A. Stec, and L. Hubbard. 1997. The evolution of apical dominance in maize. Nature 386:485-488.
Frech, K., G. Herrmann, and T. Werner. 1993. Computer-assisted prediction, classification, and delimitation of protein-binding sites in nucleic-acids. Nucl. Acids Res. 21:1655-1664.
Gerland, U., and T. Hwa. 2002. On the selection and evolution of regulatory DNA motifs. J. Mol. Evol. 55:386-400.
Gerrish, P. J., and R. E. Lenski. 1998. The fate of competing beneficial mutations in an asexual population. Genetica 103:127-144.
Hanes, S. D., G. Riddihough, D. Ish-Horowicz, and R. Brent. 1994. Specific DNA recognition and intersite spacing are critical for action of the bicoid morphogen. Mol. Cell. Biol. 14:3364-3375.
Hoch, M., N. Gerwin, H. Taubert, and H. Jackle. 1992. Competition for overlapping sites in the regulatory region of the Drosophila gene Kruppel. Science 256:94-97.
Hoch, M., E. Seifert, and H. Jackle. 1991. Gene-expression mediated by cis-Acting sequences of the Kruppel gene in response to the Drosophila morphogens bicoid and hunchback. EMBO J. 10:2267-2278.
Kimura, M. 1962. On the probability of fixation of mutant genes in a population. Genetics 47:713-719.
Liu, Z., and J. W. Little. 1998. The spacing between binding sites controls the mode of cooperative DNA-protein interactions: implications for evolution of regulatory circuitry. J. Mol. Biol. 278:331-338.
Ludwig, M. Z. 2002. Functional evolution of noncoding DNA. Curr. Opin. Genet. Dev. 12:634-639.
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564-567.
Ludwig, M. Z., N. H. Patel, and M. Kreitman. 1998. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development 125:949-958.
Mao, C. H., N. G. Carlson, and J. W. Little. 1994. Cooperative DNA-protein interactions: effects of changing the spacing between adjacent binding-sites. J. Mol. Biol. 235:532-544.
Papatsenko, D. A., V. J. Makeev, A. P. Lifanov, M. Regnier, A. G. Nazina, and C. Desplan. 2002. Extraction of functional binding sites from unique regulatory regions: the Drosophila early developmental enhancers. Genome Res. 12:470-481.
Quandt, K., K. Frech, H. Karas, E. Wingender, and T. Werner. 1995. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucl. Acids Res. 23:4878-4884.
Rockman, M. V., and G. A. Wray. 2002. Abundant raw material for cis-regulatory evolution in humans. Mol. Biol. Evol. 19:1991-2004.
Schlotterer, C., and D. Tautz. 1992. Slippage synthesis of simple sequence DNA. Nucl. Acids Res. 20:211-215.
Schulte, P. M., M. Gomez-Chiarri, and D. A. Powers. 1997. Structural and functional differences in the promoter and 5' flanking region of Ldh-B within and between populations of the teleost Fundulus heteroclitus. Genetics 145:759-769.
Segal, J. A., J. L. Barnett, and D. L. Crawford. 1999. Functional analyses of natural variation in Sp1 binding sites of a TATA-less promoter. J. Mol. Evol. 49:736-749.
Sokal, R. R., and F. J. Rohlf. 1994. Biometry: the principles and practice of statistics in biological research. Pp. 440-447. 3rd edition. W. H. Freeman, New York.
Stone, J. R., and G. A. Wray. 2001. Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18:1764-1770.
Stormo, G. D., and D. S. Fields. 1998. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci. 23:109-113.
Tautz, D. 2000. Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10:575-579.
Tautz, D., M. Trick, and G. A. Dover. 1986. Cryptic simplicity in DNA is a major source of genetic-variation. Nature 322:652-656.
Taylor, H. S. 1998. A regulatory element of the empty spiracles homeobox gene is composed of three distinct conserved regions that bind regulatory proteins. Mol. Reprod. Dev. 49:246-253.
Wingender, E., X. Chen, and E. Fricke, et al. (14 coauthors). 2001. The TRANSFAC system on gene expression regulation. Nucl. Acids Res. 29:281-283.
Wray, G. A., M. W. Hahn, E. Abouheif, J. P. Balhoff, M. Pizer, M. V. Rockman, and L. A. Romano. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20:1377-1419.