Department of Ecology and Evolution, State University of New York at Stony Brook
Abstract
Introduction
The genetic basis for many of these evolutionary changes probably resides within the cis-regulatory regions that control transcription (Arnone and Davidson 1997; Davidson 2001). (We use the term promoter to denote the entire cis-regulatory apparatus.) Yet promoter evolution remains poorly understood, for several reasons. First, little information is available concerning promoter sequence variation and its consequences within and among species. The general organization of some promoter sequences has been maintained for 10⁷ years (e.g., Damjanovski et al. 1998; Ludwig, Patel, and Kreitman 1998), although functional differences can evolve over comparable or even shorter intervals (e.g., Franks et al. 1988; Ross, Fong, and Cavener 1994; Wang et al. 1999). Sequence comparisons indicate that single transcription factor binding sites can appear and disappear among relatively closely related species (e.g., Gonzalez et al. 1995; Damjanovski et al. 1998) and even within populations (e.g., Tournamille et al. 1995; Segal, Barnett, and Crawford 1999). However, no obvious relationship between the degree of sequence divergence and the degree of change in gene expression has emerged (e.g., Maduro and Pilgrim 1996). Second, a comprehensive understanding of promoter evolution is confounded by the complex encoding of regulatory information within genomes (Yuh, Bolouri, and Davidson 1998; Davidson 2001). The functional consequences of particular mutations within coding regions can often be predicted from sequence data; examples include nonsynonymous substitutions, stop codons, and frameshifts (Gillespie 1991; Li 1997). In contrast, sequence data alone provide little direct information about conservation or change of promoter function, so experimental tests in the form of expression assays are required instead. Finally, no conceptual framework exists for understanding promoter evolution and guiding empirical studies.
As a first step toward such a framework, we considered the following question. Assuming neutral evolution, how long would it take for new transcription factor binding sites to evolve (i.e., appear and become fixed within populations) as a consequence of local point mutations within promoters? Because individual binding sites are the functional elements of promoters, the answer should shed light on two things: the origin of genetic variation relevant to promoter function, and the rate at which transcriptional regulation evolves. Of course, promoters actually consist of a few to more than 50 such sites (Arnone and Davidson 1997; Latchman 1999). We acknowledge that genomic rearrangements can plausibly transfer existing regulatory regions between locations within genomes and thereby constitute an important component of promoter evolution. Our approach, however, explicitly tests whether local point mutations can also contribute significantly to promoter evolution by rapidly establishing binding sites. Evidence is accumulating that promoter variants differing by as little as a single binding site may be subject to selection (e.g., Tournamille et al. 1995; Segal, Barnett, and Crawford 1999; Ludwig et al. 2000). To address the evolutionary origin of individual binding sites, we simulated promoter evolution with a computer program. The program implements standard mutation models and scans iteratively for the appearance of particular binding sites. We then used population genetic theory to calculate the likelihood that these binding sites would become fixed. The results indicate that binding sites capable of altering gene expression can evolve via local point mutations on short timescales without invoking selection.
Materials and Methods
Sequences
Eight DNA sequences were obtained from GenBank to represent segments flanking genes that might contain cis-regulatory elements (accession numbers L13454; M1022, X00479; M36469; Z4824; U04269; AE001274, AC003011, AC002552, U60409, AF008205, AC002134, AF008206, U70253, AC002305, AF008207, AC003679, AC004018; M99054; X06157). The first 200 and 2,000 bp of these sequences were chosen for analysis, without knowledge of whether they corresponded to exons, introns, or intergenic segments. These 200- and 2,000-bp flanking sequences ("regions") represent single modules (or enhancers) and entire promoters, respectively. Nine short sequences were also chosen for analysis. The lengths (5–9 bp) and base compositions of these short sequences ("binding sites") matched those of actual transcription factor binding sites within well-characterized regions:

- GAGAG: eukaryote GAGA site.
- TATAA: eukaryote TATA box.
- AGGATT: Endo16 Otx binding site.
- TCCCCG: Endo16 GCF1 binding site.
- ACCAAAA: Endo16 P binding site.
- ATCAAAG: Endo16 CG4 binding site.
- AAGTGATTA: Endo16 Z binding site, excluding the final A.
- TTTTTAAGA: even-skipped stripe 2 enhancer hb binding site 9.
- TTCCCCGAA: even-skipped stripe 2 enhancer DSB2 binding site.

(Arnone and Davidson 1997; Yuh, Bolouri, and Davidson 1998; Ludwig and Kreitman 1995.)
Computer Simulation
A computer program (available at http://www.zoo.utoronto.ca/stone/PPE/ppe.htm) was developed that modified regions according to standard mutation models and scanned for the appearance of specific new binding sites. Each of the 16 regions was paired with each of the 9 binding sites, and each of the resulting 144 region–binding-site pairs was entered into the program. The program iteratively introduced base changes ("mutations") into the regions according to one of three mutation models of nucleotide substitution:

- the one-parameter model (Jukes and Cantor 1969);
- the two-parameter model (Kimura 1980);
- the standing-distribution model (Felsenstein 1981).

In each iteration ("generation"), a pseudorandom number generator determined whether a mutation would occur (mutation rate = 10⁻⁹ per base per generation; Li 1997). If so, pseudorandom numbers determined the location of the mutation and, according to the chosen mutation model, the resulting base. In each generation, allowance was made for the possibility of a second mutation.
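The three mutation models and the per-generation step can be sketched as follows. This is a minimal illustration, not the authors' program. Several details are assumptions made for the example:

- the function names;
- parameterizing the two-parameter model by the fraction of mutations that are transitions (`ts_fraction`);
- the base-frequency dictionary used for the standing-distribution model;
- the rule that a second mutation occurs with the same per-generation probability as the first.

```python
import random

BASES = "ACGT"
# Transition partners: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def mutate_jc(base: str, rng: random.Random) -> str:
    """One-parameter (Jukes-Cantor) model: the three other bases are equally likely."""
    return rng.choice([b for b in BASES if b != base])


def mutate_k2p(base: str, rng: random.Random, ts_fraction: float) -> str:
    """Two-parameter (Kimura) model, parameterized here (an assumption) by the
    fraction of mutations that are transitions; ts_fraction = 1/3 reduces to
    the Jukes-Cantor case. Otherwise one of the two transversions is chosen."""
    if rng.random() < ts_fraction:
        return TRANSITION[base]
    return rng.choice([b for b in BASES if b != base and b != TRANSITION[base]])


def mutate_f81(base: str, rng: random.Random, freqs: dict) -> str:
    """Standing-distribution (Felsenstein 1981-style) model: the new base is
    drawn in proportion to its equilibrium frequency, excluding the current base."""
    others = [b for b in BASES if b != base]
    return rng.choices(others, weights=[freqs[b] for b in others])[0]


def one_generation(seq: list, rate: float, rng: random.Random, mutate) -> int:
    """Apply up to two mutations to `seq` in place and return how many occurred.
    `rate` is per base per generation, so the chance of a mutation somewhere in
    the region is approximately rate * len(seq). Assumption: a second mutation
    is attempted only after a first one, with the same probability."""
    p = rate * len(seq)
    n = 0
    while n < 2 and rng.random() < p:
        i = rng.randrange(len(seq))  # uniformly chosen position
        seq[i] = mutate(seq[i], rng)
        n += 1
    return n
```

To apply the two-parameter model with half of all mutations being transitions, for example, one could pass `lambda b, r: mutate_k2p(b, r, 0.5)` as the `mutate` argument.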
Scanning for the appearance of binding sites was performed in only one direction, before every generation, until binding sites were established. Because binding sites within regions are typically position-independent with respect to the basal promoter (Latchman 1999; Davidson 2001), new binding sites were allowed to appear anywhere within the regions. At the end of each run, the program returned four values:

- the location of the binding site along the region ("match site");
- the number of generations;
- the number of mutations;
- the minimum possible number of mutations required to establish the binding site ("shortest path at match site").

More formally, the shortest path at a particular site is the minimum number of changes required to establish a "substring" at that position within a "string," given a finite set of symbols (here, the letters A, C, G, and T). For example, consider the segment ACGT at position 4 within the 10-base string GGGACGTCCC. The minimum number of changes required to establish the substring ACAT at that position is 1. Many other, longer paths could establish ACAT at that position, most involving multiple substitutions at the same site or reversals. The shortest path at a particular site cannot exceed the length of the substring.

One thousand replicates were performed for each of the 144 region–binding-site pairs using the one-parameter model of nucleotide substitution. Additional simulations were conducted for some region–binding-site pairs using either the two-parameter or the standing-distribution model.
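A single replicate run, with the "shortest path at match site" bookkeeping, might look like the sketch below. It assumes the one-parameter model, a 1-based match site, and scanning of a single strand. It also takes one shortcut the text does not describe: rather than testing every generation individually, it skips ahead to the next mutation-bearing generation by drawing a geometric waiting time. For simplicity it also ignores second mutations within a generation. The function names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def shortest_path(string: str, substring: str, position: int) -> int:
    """Minimum number of substitutions needed to create `substring` at the
    1-based `position` of `string`: the Hamming distance between the window
    and the substring, which can never exceed len(substring)."""
    window = string[position - 1: position - 1 + len(substring)]
    if len(window) != len(substring):
        raise ValueError("substring does not fit at this position")
    return sum(a != b for a, b in zip(window, substring))


def waiting_time(region: str, site: str, rate: float, rng: random.Random,
                 max_mutations: int = 10**7):
    """Mutate `region` under a one-parameter model until `site` appears on the
    scanned strand. Returns (generations, mutations, match_site,
    shortest_path_at_match_site), where match_site is 1-based and the shortest
    path is measured against the ORIGINAL region."""
    seq = list(region)
    p = rate * len(seq)  # approximate chance of a mutation per generation
    if not 0.0 < p < 1.0:
        raise ValueError("rate * len(region) must lie strictly between 0 and 1")
    generations = mutations = 0
    while mutations <= max_mutations:
        idx = "".join(seq).find(site)  # scan before each generation
        if idx != -1:
            return generations, mutations, idx + 1, shortest_path(region, site, idx + 1)
        # Geometric draw: generations up to and including the next mutation.
        generations += int(math.log(1.0 - rng.random()) / math.log1p(-p)) + 1
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
        mutations += 1
    raise RuntimeError("site not established within max_mutations")
```

Two properties make a useful sanity check. First, `shortest_path("GGGACGTCCC", "ACAT", 4)` reproduces the worked example in the text. Second, every run satisfies generations ≥ mutations ≥ shortest path, because each differing position must be mutated at least once. Repeating a run over many seeds and taking the median would mirror the replicate design described above. Because mutation-free generations are skipped, the cost of a run scales with the number of mutations rather than with the number of generations.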
Results
Two additional assumptions involved in our computer simulation might have affected waiting times. First, we assumed that all binding sites within a region could mutate without disrupting promoter function, whereas real promoters contain functional binding sites that are preserved by selection (Segal, Barnett, and Crawford 1999; Ludwig et al. 2000). Relaxing this assumption would effectively reduce the number of positions within a cis-regulatory region that could vary, and would thereby increase waiting times. Because binding sites make up a small proportion of nucleotides within regions (approximately 2%–15% in 5' flanking sequences; Arnone and Davidson 1997), this increase in waiting times should be modest. Second, we assumed that we could neglect the effects of nonhomologous recombination, because it probably occurs much less frequently than point mutation. In any case, recombination should introduce no systematic bias in waiting times.
Fixation Times Within Populations
To estimate fixation times for real-world cases, the numbers of generations derived from the simulations can be combined with realistic parameter values. For example, with an effective population size of 10⁶, a new 6-bp binding site will appear approximately every 2,250 generations, in one individual, somewhere within the region extending 2 kb 5' of a given gene. (The average median value for 6-bp binding sites was 4,506,870,000 generations. We then took into account an effective population size of 10⁶, the presence of diploidy, and the possibility of the binding site appearing on either DNA strand. On that basis, the estimated fixation time was 4,506,870,000 generations/(10⁶ individuals × 2 DNA strands) ≈ 2,253 generations.) The same estimate will typically apply to each gene in a genome and to every 6-bp binding site.
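The conversion in the parenthetical calculation amounts to one division, followed by multiplication by a generation time to express the result in years. The sketch below follows the text's formula, which divides by the effective population size times 2 DNA strands. The function names are hypothetical.

```python
def fixation_generations(median_waiting: float, ne: float, strands: int = 2) -> float:
    """Divide a simulated median waiting time (generations, one lineage,
    one scanned strand) by Ne times the number of strands, following the
    formula given in the text."""
    return median_waiting / (ne * strands)


def to_years(generations: float, generation_time_years: float) -> float:
    """Express a number of generations in years for a given generation time."""
    return generations * generation_time_years


gens = fixation_generations(4_506_870_000, 1e6)
print(round(gens, 1))  # 2253.4
```

Multiplying `gens` by an organism's generation time (in years) with `to_years` gives the kind of estimate tabulated per organism.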
Using realistic but conservative generation times, estimated fixation times can be calculated for a variety of organisms (table 1). Most of these fixation times are less than a millennium, and all are less than 600,000 years. Parameter values specific to particular systems yield more specific estimates. For example, consider the evolution of a new hunchback protein binding site within the 600-bp even-skipped stripe 2 enhancer in Drosophila. The relevant values are an effective population size of 10⁶, the presence of diploidy, a 6-bp binding site, the possibility of the binding site appearing on either DNA strand, and a generation time of approximately 5 weeks. Under these values, the estimated fixation time is approximately 75 years. Because hunchback can bind to several variants of its consensus binding site, the actual waiting time would be shorter. Thus, we conclude that the evolution of new transcription factor binding sites is a continuous process, occurring on microevolutionary timescales.
Waiting and Fixation Times: Binding-Site Pairs
Given the importance of interactions among transcription factors bound to promoters (Latchman 1999; Davidson 2001), it is instructive to calculate estimated fixation times for regions containing particular combinations of binding sites. We therefore simulated the simultaneous appearance of two binding sites within 200- and 2,000-bp regions via local point mutations, again without invoking selection, and converted waiting times into estimated fixation times. We conservatively assumed that the two binding sites must reside on the same DNA strand and that selection would act only after both binding sites had been established. Some of the waiting times, and therefore fixation times (last two columns in table 1), were short on macroevolutionary timescales. Consider, for example, two 6-bp binding sites within a 200-bp region in Drosophila. Assume an effective population size of 10⁶, the presence of diploidy, the possibility of the binding-site pair appearing on either DNA strand, and a generation time of approximately 5 weeks. The estimated fixation time is then approximately 55,000 years (third row, third column in table 1). Thus, even simple combinations of new transcription factor binding sites can evolve on microevolutionary timescales without invoking selection.
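The pairwise simulation can be sketched by changing the stopping rule of a single-site run: the run stops at the first generation in which both sites are present on the scanned strand. The following assumptions are made for this illustration only, since the text does not specify them:

- the one-parameter model;
- overlapping occurrences are accepted;
- mutation-free generations are skipped with a geometric draw;
- the function name `pair_waiting_time`.

```python
import math
import random

BASES = "ACGT"


def pair_waiting_time(region: str, site_a: str, site_b: str, rate: float,
                      rng: random.Random, max_mutations: int = 10**7):
    """Return (generations, mutations) until site_a and site_b are both present
    on the scanned strand of `region`, mutating under a one-parameter model."""
    seq = list(region)
    p = rate * len(seq)  # approximate chance of a mutation per generation
    if not 0.0 < p < 1.0:
        raise ValueError("rate * len(region) must lie strictly between 0 and 1")
    generations = mutations = 0
    while mutations <= max_mutations:
        s = "".join(seq)
        if site_a in s and site_b in s:  # both sites on the same strand
            return generations, mutations
        # Geometric draw: generations up to and including the next mutation.
        generations += int(math.log(1.0 - rng.random()) / math.log1p(-p)) + 1
        i = rng.randrange(len(seq))
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
        mutations += 1
    raise RuntimeError("pair not established within max_mutations")
```

Because a site that has already appeared can be destroyed by a later mutation before its partner arises, waiting times for pairs grow much faster than twice the single-site waiting time.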
Discussion
The promoter regions of eukaryotic genes are complex and include approximately a dozen to several dozen transcription factor binding sites (Arnone and Davidson 1997). The likelihood of a dozen binding sites evolving simultaneously without selection is infinitesimally small, as can easily be estimated by extrapolating the trend apparent in table 1. We envision instead that complex regulatory systems result from long and complex evolutionary histories involving stepwise assembly and turnover of binding sites. Local point mutations, transposition, and recombination all likely play important roles in this process. For example, changes in transcription could result from either of two events:

- the gain (or loss) of a key binding site through local point mutations;
- the insertion (or deletion) of several new binding sites following transposition (or recombination).

There is ample empirical evidence for both processes. Evidence for the former comes from sequence comparisons among species that reveal gains and losses of single binding sites (e.g., Gonzalez et al. 1995; Ludwig and Kreitman 1995; Tournamille et al. 1995; Margarit et al. 1998). Evidence for the latter comes from sequences indicating transpositional origins for binding sites (e.g., Britten 1997; Kidwell and Lisch 1997).
In contrast to transposition and recombination, the establishment of new binding sites via local point mutations would be accomplished incrementally, requiring from one to several independent point mutations. Nevertheless, the results of our computer simulation suggest that the evolution of specific new binding sites in flanking sequences adjacent to each gene in a genome is virtually inevitable in populations of realistic sizes over timescales of months to millions of years, depending on generation time. Although the assembly of simple combinations of new transcription factor binding sites will take much longer (approximately 250-fold longer for two 6-bp binding sites than for a single 6-bp binding site; table 1), even the waiting times associated with such pairwise combinations are usually short intervals in macroevolutionary terms.
It is important to note that these results concern the evolution of only one or two particular 6-bp binding sites within a 200- or 2,000-bp region 5' of a single gene. Typical metaphyte or metazoan genomes contain many features that could expedite the evolution of single binding sites and combinations of binding sites:

- more than 10⁴ genes;
- dozens of different sequences that could serve as binding sites;
- sequence heterogeneity within populations;
- other regions where binding sites could function (within introns, 3' of the gene, and farther 5').

For example, our results suggest that new 6-bp binding sites for real transcription factors will evolve somewhere within 2 kb of the transcription start site of any gene in humans. The predicted rate is 0.013 per genome per generation. Within a population of 10⁶ individuals, this translates to approximately 12,500 new 6-bp sites during each generation.

This estimate is highly approximate, for two reasons. The fraction of the 4,096 possible 6-bp sequences that correspond to the consensus binding sites of real transcription factors is not known for any organism, and our simulation involved several simplifying assumptions. The estimate is nonetheless conservative, because our simulations assumed an initially isogenic population, whereas real populations harbor significant genetic heterogeneity in noncoding DNA.

For comparison, an empirically based estimate of the neutral mutation rate (and therefore the neutral substitution rate) for point mutations in humans is approximately 175 per genome per generation (Nachman and Crowell 2000). That corresponds to approximately 175 × 10⁶ new single-nucleotide polymorphisms (SNPs) per generation in a population of the same size. The two studies assume different mutation rates (10⁻⁹ vs. 2.5 × 10⁻⁸ per base per generation). After correcting for this difference, Nachman and Crowell's (2000) estimated rate of SNP origins is approximately 270-fold higher than our predicted rate of 6-bp binding site origins. Given that the origin of most 6-bp sites requires more than one point mutation (fig. 2B), this difference seems reasonable. The fundamental conclusion is that local point mutations should continuously generate considerable genetic variation within natural populations that is capable of altering transcription.
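The approximately 270-fold comparison can be reproduced with a short calculation, but only under two assumptions that the text does not state explicitly. First, the 175 figure is taken to be per diploid genome, which is how we read Nachman and Crowell (2000). Second, the 0.013 figure is taken to be per haploid genome. With both assumptions, the SNP rate is halved and rescaled to the simulated mutation rate before taking the ratio. The function name and parameter names are hypothetical.

```python
def snp_to_site_ratio(snps_per_diploid_genome: float = 175.0,
                      mu_snp: float = 2.5e-8,
                      mu_sim: float = 1e-9,
                      sites_per_haploid_genome: float = 0.013) -> float:
    """Ratio of SNP origins to 6-bp binding-site origins per haploid genome per
    generation, after rescaling the SNP rate to the simulated mutation rate."""
    # Assumption: 175 new mutations per DIPLOID genome, so halve it.
    snps_per_haploid = snps_per_diploid_genome / 2
    # Rescale from the empirical rate (2.5e-8) to the simulated rate (1e-9).
    corrected = snps_per_haploid * (mu_sim / mu_snp)
    # Assumption: 0.013 binding sites per HAPLOID genome per generation.
    return corrected / sites_per_haploid_genome


print(round(snp_to_site_ratio()))  # 269
```

Without the halving step, the same arithmetic gives a ratio roughly twice as large, so the agreement with the approximately 270-fold figure depends on the diploid/haploid reading above.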
Another application of the results involves deducing the evolutionary dynamics of cis-regulatory sequences. In our computer simulation, the number of local point mutations needed to convert the original sequence at the match site into the binding site was typically greater than the minimum possible (the shortest path at match site). This number scaled approximately exponentially with binding-site length (fig. 2A). As binding-site length increased, the skew of the distribution of shortest paths at match sites changed from positive (5-bp binding site), to zero (7-bp binding site), to negative (9-bp binding site) (fig. 2B). Thus, the efficiency of binding-site establishment (measured as the number of mutations that occurred) depended on binding-site length. Shorter binding sites typically involved substitutions at only a few positions, whereas longer binding sites typically involved complete turnover of bases. This accords with intuition. Intermediate steps toward a match can be undone by reverse mutations, and such reversals are more likely for longer binding sites, which require more steps.
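The sign of the skew can be computed from the shortest path recorded for each replicate run. A minimal sketch follows. It assumes the moment coefficient of skewness (g1 = m3 / m2^1.5) as the statistic, because the text does not say which skewness measure was used. The function name is hypothetical.

```python
def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, computed from
    population (divide-by-n) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5
```

As a toy illustration with invented values (not simulation output), `sample_skewness([0, 0, 0, 1, 5])` is positive, because most values are small and a few are large. By contrast, `sample_skewness([0, 4, 5, 5, 5])` is negative, because most values sit near the maximum. The second pattern is what complete turnover of a long binding site would produce.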
The short fixation times that we predict for binding sites in populations of realistic sizes suggest that functional differences in promoter sequences among species require neither extended divergence times nor genomic rearrangements. We encourage researchers interested in understanding the developmental genetic basis for phenotypic evolution to test these predictions with empirical studies of variation in promoter structure and function within populations and among species.
Acknowledgements
Footnotes
1 Present address: Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
2 Present address: Department of Biology, Duke University.
Keywords: computer simulation, enhancer, evolution of development, promoter, transcription factor
2 Address for correspondence and reprints: Gregory Wray, Department of Biology, Box 90338, Duke University, Durham, North Carolina 27708-0338. gwray{at}duke.edu
References
Arnone M. I., E. H. Davidson. 1997. The hardwiring of development: organization and function of genomic regulatory systems. Development 124:1851-1864.
Britten R. J. 1997. Mobile elements inserted in the distant past have taken on important functions. Gene 205:177-182.
Burcin M., R. Arnold, M. Lutz, B. Kaiser, D. Runge, F. Lottspeich, G. N. Filippova, V. V. Lobanenkov, R. Renkawitz. 1997. Negative protein 1, which is required for function of the chicken lysozyme gene silencer in conjunction with hormone receptors, is identical to the multivalent zinc finger repressor CTCF. Mol. Cell. Biol. 17:1281-1288.
Damjanovski S., M.-H. Huyah, K. Motamed, E. H. Sage, M. Ringuette. 1998. Regulation of SPARC expression during early Xenopus development: evolutionary divergence and conservation of DNA regulatory elements between amphibians and mammals. Dev. Genes Evol. 207:453-461.
Davidson E. H. 2001. Genomic regulatory systems: development and evolution. Academic Press, San Diego, Calif.
Felsenstein J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.
Franks R. R., B. R. Hough-Evans, R. J. Britten, E. H. Davidson. 1988. Spatially deranged though temporally correct expression of a Strongylocentrotus purpuratus actin gene fusion in transgenic embryos of a different sea urchin family. Genes Dev. 2:1-12.
Gerhart J., M. Kirschner. 1997. Cells, embryos, and evolution. Blackwell Science, Maldon, England.
Gillespie J. H. 1991. The causes of molecular evolution. Oxford University Press, New York.
Gonzalez P., P. V. Rao, S. B. Nunez, J. S. Zigler. 1995. Evidence for independent recruitment of Zeta-Crystallin/Quinone Reductase (CRYZ) as a crystallin in camelids and hystricomorph rodents. Mol. Biol. Evol. 12:773-781.
Grenier J. K., T. L. Garber, R. Warren, P. M. Whittington, S. Carroll. 1997. Evolution of the entire arthropod Hox gene set predated the origin and radiation of the onychophoran/arthropod clade. Curr. Biol. 7:547-553.
Jukes T. H., C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Keys D. N., D. L. Lewis, J. E. Selegue, B. J. Pearson, L. V. Goodrich, R. L. Johnson, J. Gates, M. P. Scott, S. B. Carroll. 1999. Recruitment of a hedgehog regulatory circuit in butterfly eyespot evolution. Science 283:532-534.
Kidwell M. G., D. Lisch. 1997. Transposable elements as sources of variation in animals and plants. Proc. Natl. Acad. Sci. USA 94:7704-7711.
Kim H. K., G. Siu. 1998. The notch pathway intermediate HES-1 silences CD4 gene expression. Mol. Cell. Biol. 18:7166-7175.
Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Kleffe J., E. Grau. 1993. The joint distribution of patterns in random sequences with applications to the RC-measures for expressivity. CABIOS 9:275-283.
Kleffe J., U. Langbecker. 1990. Exact computation of pattern probabilities in random sequences generated by Markov chains. CABIOS 6:347-353.
Latchman D. 1999. Eukaryotic transcription factors. 3rd edition. Academic Press, San Diego, Calif.
Leclerc S., W. Eskild, S. L. Guerin. 1997. The rat growth hormone and human cellular retinol binding protein 1 genes share homologous NF1-like binding sites that exert either positive or negative influences on gene expression in vitro. DNA Cell Biol. 17:951-967.
Li Q. L., C. A. Blau, C. H. Clegg, A. Rohde, G. Stamatoyannopoulos. 1998. Multiple epsilon-promoter elements participate in the developmental control of epsilon-globin genes in transgenic mice. J. Biol. Chem. 273:17361-17367.
Li W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.
Lowe C. J., G. A. Wray. 1997. Radical alterations in the roles of homeobox genes during echinoderm evolution. Nature 389:718-721.
Ludwig M. Z., C. Bergman, N. H. Patel, M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564-567.
Ludwig M., M. Kreitman. 1995. Evolutionary dynamics of the enhancer region of even-skipped in Drosophila. Mol. Biol. Evol. 12:1002-1011.
Ludwig M. Z., N. H. Patel, M. Kreitman. 1998. Functional analysis of eve stripe 2 enhancer evolution in Drosophila: rules governing conservation and change. Development 125:949-958.
Maduro M., D. Pilgrim. 1996. Conservation of function and expression of unc-119 from two Caenorhabditis species despite divergence of non-coding DNA. Gene 183:77-85.
Margarit E., A. Guillen, C. Rebordosa, J. Vidal-Tabadoada, M. Sanchez, F. Ballesta, R. Oliva. 1998. Identification of conserved potentially regulatory sequences of the SRY gene from 10 different species of mammals. Biochem. Biophys. Res. Comm. 245:370-377.
Nachman M. W., S. W. Crowell. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.
Patel N. H., E. Martin-Blanco, K. G. Coleman, S. J. Poole, M. C. Ellis, T. B. Kornberg, C. S. Goodman. 1989. Expression of engrailed proteins in arthropods, annelids, and chordates. Cell 58:955-968.
Raff R. A. 1996. The shape of life. University of Chicago Press, Chicago.
Raff R. A., T. C. Kauffman. 1983. Embryos, genes, and evolution. Indiana University Press, Bloomington.
Ross J. L., P. P. Fong, D. R. Cavener. 1994. Correlated evolution of the cis-acting regulatory elements and developmental expression of the Drosophila GLD gene in 7 species from the subgroup Melanogaster. Dev. Genet. 15:38-50.
Segal J. A., J. L. Barnett, D. L. Crawford. 1999. Functional analysis of natural variation in Sp1 binding sites of a TATA-less promoter. J. Mol. Evol. 49:736-749.
Tournamille C., Y. Colin, J. P. Cartron, C. Le Van Kim. 1995. Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nat. Genet. 10:224-228.
Wang R.-L., A. Stec, J. Hey, L. Lukens, J. Doebley. 1999. The limits of selection during maize domestication. Nature 398:236-238.
Yuh C.-H., H. Bolouri, E. H. Davidson. 1998. Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science 279:1896-1902.