* Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut; Bioinformatik, Institut für Informatik, Universität Leipzig, Leipzig, Germany; and Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Wien, Austria
Correspondence: E-mail: gunter.wagner{at}yale.edu
Abstract |
---|
Introduction |
---|
The quantitative analysis of the dynamics of footprint loss and acquisition, however, is complicated by the fact that we cannot independently observe individual regulatory DNA regions. Instead, phylogenetic footprinting always detects regulatory elements in pairs of sequences. As a consequence, even very simple models of footprint loss lead to rather sophisticated inference and test methods, as we shall see in this contribution. We will focus on two questions: (1) How can we detect rate differences in footprint modification between two lineages? (2) Can we identify periods in evolution with exceptionally high or low rates of footprint modification?
Data Acquisition |
---|
Conserved non-coding sequences are detected using the tracker program (Prohaska et al. 2004a). Briefly, this approach uses BLAST (Altschul et al. 1990) for an initial search of all pairs of input sequences, restricted to homologous intergenic regions. The resulting list of pairwise sequence alignments is then assembled into groups of partially overlapping regions, which are passed through several filtering steps and finally aligned using the segment-based multiple alignment tool DIALIGN2 (Morgenstern 1999). The final output of the program is the list of these aligned "footprint cliques" (see Supplementary Material online: URL: http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/04-007/).
The alignments of all footprint cliques are concatenated and padded with gap characters where data are missing, i.e., where a footprint detected between some sequences does not have a counterpart in others. Consequently, all gap characters are treated as unknown nucleotides rather than as deletions. Conserved positions between groups of sequences are counted as specified in eq. (1) below. To take unknown nucleotides into account, we discount columns with gaps in the relevant sequences by a factor of 1/4 for each gap; data are summarized in tables 1 and 2.
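The gap discount described above can be illustrated with a minimal sketch. It is not the counting rule of eq. (1), which is not reproduced here; it assumes that a column counts as conserved when all observed (non-gap) nucleotides in the relevant sequences agree, and it weights such a column by 1/4 for each gap. The function name `weighted_conserved_count` and the gap symbol `-` are hypothetical choices made for illustration.

```python
from typing import Sequence

GAP = "-"  # assumed gap / missing-data character


def weighted_conserved_count(alignment: Sequence[str], rows: Sequence[int]) -> float:
    """Count columns conserved among the selected rows, discounting gaps.

    Assumption (not taken from the paper's eq. (1)): a column is conserved if
    every non-gap character in the selected rows is the same nucleotide.
    Each gap in those rows multiplies the column's weight by 1/4, because an
    unknown nucleotide matches by chance with probability 1/4.
    """
    if not alignment:
        return 0.0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    total = 0.0
    for col in range(length):
        chars = [alignment[r][col].upper() for r in rows]
        observed = {c for c in chars if c != GAP}
        n_gaps = sum(1 for c in chars if c == GAP)
        # Columns with no observed nucleotide carry no information and are skipped.
        if len(observed) == 1:
            total += 0.25 ** n_gaps
    return total


if __name__ == "__main__":
    toy = ["ACGT-", "ACCT-", "AC-TA"]  # made-up alignment, for illustration only
    print(weighted_conserved_count(toy, [0, 1, 2]))
    print(weighted_conserved_count(toy, [0, 1]))
```

In the toy alignment, the last column has two gaps and one observed nucleotide, so it contributes (1/4)² rather than a full count.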
A Model |
---|
![]() | (1) |
![]() | (2) |
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
We are interested in the variance σ² of the difference of the loss rates along PA and PB, which equals twice the variance of the exponential process along one of the lineages. Thus
![]() | (7) |
![]() | (8) |
Equation (8) gives a test statistic which assumes that the loss of conservation at each nucleotide position is stochastically independent. This assumption is not plausible, however, if the elementary event in the evolution of an enhancer is the loss or gain of a transcription factor binding site. Typically, transcription factor binding sites are between 5 and 20 nucleotide positions long, but they have various degrees of degeneracy. Evolutionary changes in the number and kind of transcription factor binding sites thus induce a stochastic dependency among the nucleotide positions compared here. To account for this dependency, we scale the predicted sampling variance with the average length of contiguous CNCN sequence elements in our data. This value is typically between … and thus at the same scale as many known transcription factor binding motifs. The resulting test statistic then is
![]() | (9) |
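The scaling step can be sketched as follows. The sketch does not reproduce equations (8) or (9). It assumes that "scaling the variance" means multiplying the sampling variance by the mean length of contiguous conserved runs, so that a z-type statistic is divided by the square root of that length. The names `mean_run_length` and `scale_statistic` are hypothetical.

```python
import math
from typing import Sequence


def mean_run_length(conserved: Sequence[bool]) -> float:
    """Average length of contiguous runs of conserved positions in a mask."""
    runs = []
    current = 0
    for flag in conserved:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # close a run that extends to the end of the mask
        runs.append(current)
    if not runs:
        raise ValueError("mask contains no conserved positions")
    return sum(runs) / len(runs)


def scale_statistic(z: float, mean_len: float) -> float:
    """Rescale a z-type statistic for blockwise dependence.

    Assumption: the variance is multiplied by mean_len, so the statistic
    (difference divided by standard deviation) shrinks by sqrt(mean_len).
    """
    if mean_len <= 0:
        raise ValueError("mean_len must be positive")
    return z / math.sqrt(mean_len)


if __name__ == "__main__":
    # Made-up mask with conserved runs of length 3, 1, and 5.
    mask = [True] * 3 + [False] + [True] + [False] * 2 + [True] * 5
    length = mean_run_length(mask)
    print(length, scale_statistic(3.0, length))
```

Treating each conserved block, rather than each nucleotide, as the effective unit of observation makes the test more conservative.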
Estimating Footprint Loss Rates |
---|
One problem with the raw estimates obtained from solving the equations of the model for the parameters T is that the CNCN sequences detected in the comparison between the two outgroup species, O and X, contain spurious CNCN positions. These are nucleotide positions which are identical between the sequences of O and X but are only identical by chance rather than as a consequence of purifying selection. While tracker and other alignment procedures are designed to identify significant stretches of conserved sequence, there is a possibility that at the borders of conserved sequence blocks spurious sites are included in the count of CNCN sites. There is no objective way to eliminate them from the sequence alignment, but it is possible to determine their influence on the estimates of the rate parameters.
Let o(T) be the rate parameter observed from a comparison in which the most recent common ancestor of A and B lived T years before the present, and let us assume the loss of CNCN positions is time homogeneous. Then this estimated rate is determined by the true rate c, as well as by the number of spurious CNCN positions. The true CNCN positions evolve at a rate c, but the spurious sites randomize much more quickly than the true CNCN sites. Over the timescales we consider in this article, these spurious positions randomize instantaneously, and thus they contribute an additive term to the true rate to give the observed rate
![]() | (10) |
![]() | (11) |
The HoxA Clusters of Gnathostomes |
---|
The comparison of the Xenopus sequence with the three mammalian data sets shows that the rate of modification of CNCN positions is higher in the Xenopus lineage than in the mammalian lineages. The Xenopus lineage retains about 33% of the CNCN positions detected in the comparison of shark and bichir sequences, whereas the mammalian lineages retain about 35%. The comparison between Xenopus and human is significant at the 0.011 level and the comparison with mouse at the 0.039 level, whereas the comparison with rat is only marginally significant at the 0.067 level. Hence it seems that the Xenopus lineage experiences a higher rate of modification of CNCN positions than the mammalian lineage.
The results from the new test were compared to those of the Tajima relative rate test (Tajima 1993), which can also be applied to the kind of data analyzed here (see Appendix). Table 2 summarizes the results of the Tajima test for the same data as in table 1. The results are consistent with those from the z' statistic (table 1), confirming that the Xenopus lineage evolves faster than the mammalian lineage. None of the comparisons among mammalian HoxA clusters is significant, but all the comparisons between Xenopus and the mammals are significant at least at the 5% level.
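For orientation, the standard form of the Tajima (1993) relative rate statistic can be sketched as below. The mapping of CNCN data onto the two counts is an assumption made here for illustration and is not the Appendix derivation: `m_a` is taken to be the number of positions changed only in lineage A and `m_b` the number changed only in lineage B. The function name is hypothetical.

```python
import math
from typing import Tuple


def tajima_relative_rate(m_a: int, m_b: int) -> Tuple[float, float]:
    """Tajima's relative rate test for two lineage-specific change counts.

    Returns (chi2, p_value). Under equal rates in both lineages,
    chi2 = (m_a - m_b)^2 / (m_a + m_b) follows approximately a chi-square
    distribution with 1 degree of freedom.
    """
    if m_a < 0 or m_b < 0:
        raise ValueError("counts must be non-negative")
    if m_a + m_b == 0:
        raise ValueError("at least one lineage-specific change is required")
    chi2 = (m_a - m_b) ** 2 / (m_a + m_b)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    chi2, p = tajima_relative_rate(30, 10)
    print(f"chi2 = {chi2:.3f}, p = {p:.4g}")
```

The test uses only positions that changed in exactly one of the two lineages, so it needs no rate model and no divergence times.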
The rate parameter estimated from the model for the three mammalian lineages varies depending on the outgroup taxa used. The estimates are consistently smaller the more distant the most recent common ancestor of the compared taxa. This effect is anticipated from the arguments put forward above (see under Estimating Footprint Loss Rates). The problem is that the comparison of the two outgroup species O and X will identify a number of spurious CNCN positions, which are identical in O and X by chance. These positions then enter the estimation of the rate of evolution since the most recent common ancestor of A and B and inflate the rate estimate.
To correct for this effect, we performed a linear regression of the rate estimates on the inverse of the time T since the most recent common ancestor of A and B. First we analyzed the rate estimates for the mammalian data with all possible combinations of outgroup species. The intercept was 0.218, but the plot of the residuals over 1/T revealed a deviation from linearity. The regressions were thus repeated using either only the more distant common ancestors (360 and 112 Myr) or only the more recent ones (112 and 40.7 Myr). The rate estimates are 0.153 ± 0.071 (events per 10⁹ years) for the more recent time points and 0.378 ± 0.067 for the more distant time points. These results suggest that there is systematic rate variation in the evolution of mammalian lineages, such that the rate of modification of CNCN positions is considerably higher in the stem lineage of amniotes and mammals than among placental mammals.
The slope of the regression equation (11) over 1/T allows an estimate of the fraction of spurious CNCN positions. These values suggest that only between 2% and 5% of the CNCN positions entering these calculations are spurious and thus do not greatly affect the variance used in calculating the z' statistic for the relative rate test.
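The regression step can be sketched with ordinary least squares. The sketch assumes only what the text states: the observed rate is the true rate plus an additive term that scales with 1/T, so the intercept of a fit over 1/T estimates the true rate. How the slope translates into the fraction of spurious positions depends on equations (10) and (11) and is not modeled here. The function name and the data are hypothetical; the numbers are synthetic and are not the values from the tables.

```python
from typing import Sequence, Tuple


def fit_rate_vs_inverse_time(
    times: Sequence[float], rates: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of rate = intercept + slope * (1 / T).

    Returns (intercept, slope). The intercept is the rate extrapolated to
    1/T -> 0, i.e. the estimate with the contribution of rapidly randomizing
    spurious positions removed.
    """
    if len(times) != len(rates) or len(times) < 2:
        raise ValueError("need at least two (T, rate) pairs of equal length")
    x = [1.0 / t for t in times]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(rates) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("all divergence times are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic example generated from rate = 0.2 + 0.5 / T (arbitrary units).
    times = [1.0, 2.0, 4.0]
    rates = [0.2 + 0.5 / t for t in times]
    intercept, slope = fit_rate_vs_inverse_time(times, rates)
    print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Plotting the residuals of such a fit against 1/T is the kind of check that, in the text, revealed the deviation from linearity.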
Discussion |
---|
The method might be useful for testing the following hypothesis. It is plausible that the adaptation of a gene to a new function is not limited to the coding region of the gene but also affects the cis-regulatory elements determining the location, timing, and level of expression. Although it is relatively routine to detect selection in coding regions (Liberles et al. 2001), adaptive evolution of cis-regulatory elements is hard to detect in general [but see Kohn et al. (2004)]. The following hypothesis, however, is testable: if the coding region of a group of genes is under directional selection in one lineage (say, B) but not in another lineage (say, A), then the cis-regulatory elements will also evolve more quickly in lineage B than in lineage A. This hypothesis could be tested by comparing the rate of modification of CNCN sequences in these two lineages.
Another hypothesis testable by the proposed approach is that cis-regulatory elements of duplicated genes diverge asymmetrically, i.e., that one of the duplicates diverges faster than the other. This has been shown to be the case for coding sequences [e.g., Wagner (2002), Conant and Wagner (2003), Kondrashov et al. (2002), Zhang et al. (2003), and Kellis et al. (2004)], but to our knowledge it has not been demonstrated for cis-regulatory elements. A further hypothesis is that putative cis-regulatory elements evolve faster when the expression patterns of the genes in the same genomic region undergo evolution. A limited result along these lines has been presented for the example data set analyzed in this article, i.e., the HoxA cluster sequences of gnathostomes. The results suggest that, in the stem lineage of mammals and amniotes, the rate of CNCN sequence evolution is more than twice as high as among the placental mammals human, mouse, and rat. This result is preliminary because of limited taxon sampling, but it is consistent with the idea that body-plan evolution involves major rewiring of transcriptional regulation of developmental genes (Davidson 2001).
The usefulness of the proposed method strongly depends on the extent of taxon sampling. The example data set analyzed for this article consists of the complete sequences of HoxA clusters from six species. The continuing efforts to sequence the genomes from representatives of major clades will certainly increase the number of taxa that can be included in a comparative study of their non-coding sequences. Data sets from many different species will have considerable statistical and cladistic power if analyzed with appropriate statistical tools.
Appendix |
---|
![]() | (12) |
![]() | (13) |
Acknowledgements |
---|
Footnotes |
---|
References |
---|
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Arnone, M. I., and E. H. Davidson. 1997. The hardwiring of development: organization and function of genomic regulatory systems. Development 124:1851–1864.
Blanchette, M., and M. Tompa. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12:739–748.
Carroll, S. B., J. K. Grenier, and S. D. Weatherbee. 2001. From DNA to Diversity. Blackwell Science, Malden, Mass.
Carter, A. J., and G. P. Wagner. 2002. Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proc. R. Soc. Lond. B Biol. Sci. 269:953–960.
Chiu, C.-H., C. Amemiya, K. Dewar, C.-B. Kim, F. H. Ruddle, and G. P. Wagner. 2002. Molecular evolution of the HoxA cluster in the three major gnathostome lineages. Proc. Natl. Acad. Sci. USA 99:5492–5497.
Chiu, C.-H., K. Dewar, G. P. Wagner, K. Takahashi, F. Ruddle, C. Ledje, P. Bartsch, J.-L. Scemama, E. Stellwag, C. Fried et al. 2004. Bichir HoxA cluster sequence reveals surprising trends in ray-finned fish genomic evolution. Genome Res. 14:11–17.
Conant, G. C., and A. Wagner. 2003. Asymmetric sequence divergence of duplicate genes. Genome Res. 13:2052–2058.
Davidson, E. 2001. Genomic Regulatory Systems. Academic Press, San Diego, Calif.
Dermitzakis, E. T., C. M. Bergman, and A. G. Clark. 2003. Tracing the evolutionary history of Drosophila regulatory regions with models that identify transcription factor binding sites. Mol. Biol. Evol. 20:703–714.
Duret, L., and P. Bucher. 1997. Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7:399–406.
Fickett, J. W., and W. W. Wasserman. 2000. Discovery and modeling of transcriptional regulatory regions. Curr. Opin. Biotechnol. 11:19–24.
Ghanem, N., O. Jarinova, A. Amores, L. Qiaoming, G. Hatch, B. K. Park, J. L. R. Rubenstein, and M. Ekker. 2003. Regulatory roles of conserved intergenic domains in vertebrate Dlx bigene clusters. Genome Res. 13:533–543.
Kellis, M., B. W. Birren, and E. S. Lander. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624.
Kim, C. B., C. Amemiya, W. Bailey, K. Kawasaki, J. Mezey, W. Miller, S. Minoshima, N. Shimizu, G. P. Wagner, and F. Ruddle. 2000. Hox cluster genomics in the horn shark, Heterodontus francisci. Proc. Natl. Acad. Sci. USA 97:1655–1660.
Kohn, M. H., S. Fang, and C. I. Wu. 2004. Inference of positive and negative selection on the 5' regulatory regions of Drosophila genes. Mol. Biol. Evol. 21:374–383.
Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol. 3:RESEARCH0008.
Kumar, S., and B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917–920.
Leung, J. Y., F. E. McKenzie, A. M. Uglialoro, P. O. Flores-Villanueva, B. C. Sorkin, E. J. Yunis, D. L. Hartl, and A. E. Goldfeld. 2000. Identification of phylogenetic footprints in primate tumor necrosis factor-α promoters. Proc. Natl. Acad. Sci. USA 97:6614–6618.
Liberles, D. A., D. R. Schreiber, S. Govindarajan, S. Chamberlin, and S. A. Benner. 2001. The adaptive evolution database (TAED). Genome Biol. 2:RESEARCH0028.
Ludwig, M. Z. 2002. Functional evolution of noncoding DNA. Curr. Opin. Genet. Dev. 12:634–639.
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564–567.
Manen, J., V. Savolainen, and P. Simon. 1994. The atpB and rbcL promoters in plastid DNAs of a wide dicot range. J. Mol. Evol. 38:577–582.
Morgenstern, B. 1999. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211–218.
Phinchongsakuldit, J., S. MacArthur, and J. F. Y. Brookfield. 2004. Evolution of developmental genes: molecular microevolution of enhancer sequences at the Ubx locus in Drosophila and its impact on developmental phenotypes. Mol. Biol. Evol. 21:348–363.
Prohaska, S., C. Fried, C. Flamm, G. Wagner, and P. F. Stadler. 2004a. Surveying phylogenetic footprints in large gene clusters: applications to Hox cluster duplications. Mol. Phylogenet. Evol. 31:581–604.
Prohaska, S. J., C. Fried, C. T. Amemiya, F. H. Ruddle, G. P. Wagner, and P. F. Stadler. 2004b. The shark HoxN cluster is homologous to the human HoxD cluster. J. Mol. Evol. 58:212–217.
Santini, S., J. L. Boore, and A. Meyer. 2003. Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters. Genome Res. 13:1111–1122.
Shashikant, C., C. B. Kim, M. A. Borbley, W. C. Wang, and F. H. Ruddle. 1998. Comparative studies on mammalian Hoxc8 early enhancer sequence reveal a baleen whale-specific deletion of a cis-acting element. Proc. Natl. Acad. Sci. USA 95:15446–15451.
Stern, D. L. 2000. Evolutionary developmental biology and the problem of variation. Evolution 54:1079–1091.
Tagle, D. A., B. F. Koop, M. Goodman, J. L. Slightom, D. L. Hess, and R. T. Jones. 1988. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203:439–455.
Tajima, F. 1993. Simple methods for testing molecular clock hypothesis. Genetics 135:599–607.
Tautz, D. 2000. Evolution of transcriptional regulation. Curr. Opin. Genet. Dev. 10:575–579.
Wagner, A. 2002. Asymmetric functional divergence of duplicate genes in yeast. Mol. Biol. Evol. 19:1760–1768.
Wray, G. A., M. W. Hahn, E. Abouheif, J. P. Balhoff, M. Pizer, M. V. Rockman, and L. A. Romano. 2003. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol. 20:1377–1419.
Zhang, P., Z. Gu, and W. H. Li. 2003. Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol. 4:R56.