* Biology Department, University of Maryland, College Park
Department of Organismic and Evolutionary Biology, Harvard University
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: phylogenetic footprinting intron noncoding sequence conservation gene regulation interspersed repeats
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Twenty-five percent of the human genome consists of introns, transcribed noncoding spacer sequences interleaved between amino acid coding exons within genes (Venter et al. 2001). Transcription qualitatively distinguishes introns from other noncoding sequences in the genome, but among transcripts, introns are also quantitatively abundant. On average in humans, 95% of the primary transcript length consists of intron RNA that is subsequently spliced out (Venter et al. 2001). If transcription is a prerequisite for certain kinds of regulatory function such as trans-acting RNA modifiers, then the abundance of introns make them a likely source of these transcripts (Mattick and Gagen 2001). To date, however, the only general patterns of sequence constraint documented for introns occur in small terminal splice junctions and 3' end branch sites (Mount 1982; Lewin 1997), facultatively coding intron sequence in genes that are alternatively spliced, and the first (furthest 5') intron of genes (Venter et al. 2001). Early analyses of intron sequence evolution suggested that, on average, introns evolve at comparable rates to pseudogenes and synonymous sites in coding sequences, sites where selective forces are believed to be absent or very low (Hughes and Yeager 1997; Li 1997). Assumptions of intron neutrality based on these generalizations have motivated increased use of intron variation for historical analyses (Palumbi 1996; Friesen 2000; Chen and Li 2001; Hare 2001).
Rapid progress on the genomic sequencing of model taxa and recent improvement of sequence alignment algorithms have made it possible to make large-scale comparisons of homologous intron sequences between distantly related pairs of taxa (Drosophila spp.: Bergman and Kreitman 2001; human/mouse: Wasserman et al. 2000; Jareborg, Birney, and Durbin 1999; Levy, Hannenhalli, and Workman 2001; Dermitzakis et al. 2002; Caenorhabditis spp.: Shabalina and Kondrashov 1999). Using a variety of alignment methods, these studies have found 12% to 28% of intron sequence to be highly conserved. Conserved blocks of noncoding sequence are presumed to reflect functional constraints when the comparison involves species so divergent that random nucleotide substitutions and insertion/deletion (indel) events should have erased most sequence similarity under a neutral model of evolution (Jareborg, Birney, and Durbin 1999). Human and mouse, for example, have diverged for 80 Myr (Kumar and Hedges 1998). Given a substitution rate of 0.0037 substitutions per site per Myr (Li 1997), neutral sequence divergence of 60% between human and mouse is expected (40% without correction for multiple hits), close to the 75% divergence expected for nonhomologous sequences.
Known transcription factor (TF) binding sites were shown to be overrepresented in conserved blocks of human/mouse intron sequence (Levy, Hannenhalli, and Workman 2001), and this ability of sequence comparisons to reveal conserved binding sites has been called "phylogenetic footprinting" (Tagle et al. 1988). However, even in well-studied genes, known TF binding sites represent a small proportion of the noncoding sequence conserved between human and mouse (Wasserman et al. 2000), implying the existence of many additional TF binding sites, the presence of novel regulatory features, or chance stretches of sequence similarity. Exploring the extent and pattern of noncoding sequence conservation and testing it against null models of neutral evolution is a critical step toward understanding the functional role these sequences play, if any, beyond TF binding.
There are several sources of potential ambiguity in the results from noncoding sequence comparisons. First, the observed level and pattern of conservation can be sensitive to the alignment methods used (Bergman and Krietman 2001). Second, any process that generates substitution rate heterogeneity among nucleotide sites, such as mutational "cold spots," can produce blocks of conserved sequence across large divergence times (Clark 2001). Finally, the proportion of sequence blocks conserved by chance will be greater with comparison of less divergent sequences (Duret and Bucher 1997). Comparison of both more divergent sequences and more taxa can strengthen inferences of functional constraint from sequence conservation. In the example above, for example, the binomial probability of humanmouse identity (0.60) across 20 bases is . If a third independent lineage is included in the comparison, the probability falls to almost 10-9. Inferences of functional constraint are also strengthened when there is conservation of secondary sequence features such as spacing between conserved blocks of sequence (Ludwig et al. 2000; Kim 2001; Webb et al. 2002) or RNA folding free energy (Stephan and Kirby 1993; Kirby, Muse, and Stephan 1995; Leicht et al. 1995; Clark, Leicht, and Muse 1996).
Unfortunately, lineage-specific indel events in noncoding sequences generally prohibit useful multiple sequence alignments across large divergence times. Previous efforts to test for noncoding sequence constraints among three taxa have been based on multiple pairwise alignments using mismatch and gap penalty parameters (Dubchak et al. 2000; Dermitzakis et al. 2002). Here, we apply a powerful Bayesian alignment algorithm to generate and compare local gapless pairwise alignments of intron sequences from three species representing different mammalian orders. We show striking patterns of sequence conservation and conserved block spacing over and above random expectations, even assuming a moderate level of nucleotide substitution rate heterogeneity. The extent and distribution of this intron sequence conservation strongly suggests that these introns harbor novel functional domains that are likely to play a role in the regulation of gene expression.
![]() |
Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
One of the benefits of this Bayesian alignment procedure is that the marginal posterior probabilities associated with each base reflect the uncertainty in k, the total number of conserved blocks. However, defining block boundaries based on the posterior probabilities is somewhat arbitrary. The default block definition implemented in the BBA is to start at the marginal posterior probability maximum (Pmax) and expand the block to each side until a position is reached whose marginal posterior probability is < 0.25(Pmax). This procedure is repeated for the remaining Pmax until k blocks have been defined. In practice, this generated some very short blocks (38 bp) at some loci. To be conservative, we only analyzed blocks with Pmax 0.5. The shortest such block was 7 bp.
All pairwise sequence alignments were colinear, that is, blocks were sequential in nucleotide position for each taxon and were nonoverlapping. However, using the human sequence as a positional reference for comparing the H/W, H/S, and S/W pairwise results (e.g., fig. 2), nontransitive conflicts occurred when, for example, the same segment of seal sequence aligned with two positionally different segments in whale and human. The noncolinearity produced by these conflicts in the three-taxon comparison was corrected by removing whichever conflicting block had a lower Pmax or by trimming a block if the conflict involved 4 bp. Less than 1% of the aligned sequence was removed for this reason.
|
Neutral sequence evolution without insertions or deletions was simulated for 1,000 bp using the HKY substitution model (Hasegawa, Kishino, and Yano 1985) assuming 0.0037 substitutions per site per MY, transition/transversion , 80 Myr divergence, and 39% GC content (empirical average for the 12 loci examined here) (Seq-Gen version 1.04, Rambaut and Grassly 1997). Gamma-distributed substitution rate heterogeneity among sites was incorporated in some simulations.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
|
The proportion of nucleotides contained in three-taxon blocks did not differ significantly among taxa but showed significant heterogeneity among the twelve loci (analysis of variance [ANOVA] ,
). The heterogeneity among loci was not correlated with either GC content or total intron size in humans (Spearman,
; table 1), but was correlated with intron position (P
0.05). In addition, the mean GC content of three-taxon blocks across genes was not significantly different than for the total sequence analyzed (paired t-test,
).
Simulations
We simulated sequence evolution along one kb of DNA to test whether the observed extent of intron sequence conservation is consistent with a simple pattern of neutral evolution. The model of evolution was parameterized using empirical averages and did not include insertions or deletions. After applying a homogeneous rate of substitution among sites to simulate the divergence of three sequences from a common ancestor, BBA analysis revealed only one three-taxon block 18 nucleotides long (0.02% of total length), far below the observed level of intron sequence conservation. Because mutational processes can produce substitution rate heterogeneity and increase the number and extent of conserved blocks, we also simulated neutral evolution with a level of substitution rate heterogeneity estimated previously for vertebrate noncoding DNA (gamma shape parameter of 2.0, Lum et al. 2000). A total of 10 three-taxon blocks (16% of total sequence length) resulted from BBA analysis, with a mean block length of 46 bp (range 8 to 109 bp). The three pairwise comparisons showed 26% to 33% of simulated sequence in conserved blocks. Thus, the total amount of sequence conservation produced by this simulation was roughly half that observed between human, whale, and seal.
Spatial Patterns of Conservation
We tested for spatial patterns in the degree of sequence conservation by calculating a conservation index at each nucleotide site across the three pairwise alignments. At each site the index ranged from 0 to 6, depending on (1) whether the site was part of a gapless block and (2) whether the site was identical for each of the three two-taxon comparisons. For six of nine genes where we have sufficient data, there was no significant difference in levels of conservation between 330 bp at the 3' and 5' intron ends (paired t-test, ). We then combined data from the 3' and 5' ends, as well as from the 5' end of ACTA2 (the 3' end is alternatively spliced). In these combined data, the 165-bp intron edge segments were significantly more conserved than the adjacent interior 165-bp segments (paired t-test, two-tailed
). Although there was a great deal of scatter, the conservation index scores decreased with increasing distance from the closest exon (fig. 4). Linearity of the function could not be rejected (
; Zar 1984, p. 278), and the linear function was significant even though conservation varied considerably among loci (
,
,
).
|
|
|
To identify known repetitive elements in the intron sequences we used RepeatMasker (version 07/16/2000 using the "other mammal" subgroup of Repbase version 03/30/2000). In the intron sequences we analyzed, this search identified 30 interspersed repeats (SINEs, LINEs, and MaLR elements), 12 simple or low complexity repeats, and one tRNA. For the purpose of calculating the proportion of conserved sequence, GMREs found in only one species by RepeatMasker were considered present at orthologous positions in other species if a conserved block was found within any part of the element by the BBA. The total length of all interspersed repetitive elements was 1,910 bp, and that of orthologous GMREs was 1,527 bp, 20% and 16% of the analyzed human intron sequence length, respectively (fig. 1). Interspersed repeats found in only one species included one Alu element in human (fig. 1), one MAR1- SINE and one tRNA in seal, and one bovine-specific SINE element in whale. Among the five loci that contained GMREs, two results indicate that the level of conservation was similar in GMREs and the surrounding unique sequence. The proportion of three-taxon identities contained within repetitive elements ranged from 10% to 56% per locus, and was correlated with the total proportion of GMRE nucleotide sites at each locus (Spearman ,
). Also, the average proportion of three-taxon identities in nonrepetitive intron sequence (total sequence analyzed minus all interspersed elements) was 29% (
) for three-taxon blocks and 50% (
) for pairwise blocks, no different than results using the total sequence.
To test the genomic uniqueness of conserved sequences, human sequence from conserved three-taxon blocks outside of GMREs was used to query human GenBank with MEGABLAST. The few matches found outside homologous genes generally had probabilities 0.05 of occurring by chance, indicating that nonrepetitive conserved intron sequences are unique in the genome, not represented at moderate frequency as expected if motifs were important for coregulation of several genes.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Spatial Patterns of Conservation
Two kinds of spatial regularity were found in the pattern of sequence conservation across three mammalian taxa. First, the 5' and 3' ends of introns, adjacent to exons, were more highly conserved than the adjacent interior portion of introns. This pattern was observed on a sequence length scale too large to be explained solely by intron splice junctions that extend only 6 to 10 bp into the 5' and 3' ends of introns, or by the 3' branch site that involves 7 bp at a distance of 1840 bp from the 3' end (Lewin 1997). Also, stronger conservation of intron edges in this sample was not caused by repetitive elements, because levels of conservation were similar in repetitive and unique sequence (see also Jareborg, Birney, and Durbin 1999). These results corroborate a similar trend reported for humanmouse intron comparisons by Jareborg, Birney, and Durbin (1999).
The second pattern of spatial conservation found in this study involved the spacing between conserved gapless blocks. Experimental studies have demonstrated that transcription factor binding sites are often co-localized within 5' regulatory regions (Wingender et al. 2001) and position effects occur between regulatory elements and transcriptional units (Willoughby, Vilalta, and Oshima 2000). If indel positions are random, then over time, interblock intervals should diverge in size among taxa, particularly if indel rates are lineage-specific (Petrov et al. 2000). Longer interblock intervals are expected to diverge faster than short intervals for a given indel rate. Chance convergence on similar interblock spacing, perhaps as a result of indels that "cancel out," is much less likely in three-taxon comparisons than in two-taxon comparisons. Thus, the high degree of three-taxon interblock length conservation in ACTA2 and FGG suggests spatially coordinated functions among blocks. Alternative explanations are possible, but the paucity of data on rates and patterns of indel evolution make it difficult to construct meaningful tests (Saitou and Ueda 1994; Ludwig et al. 2000; Kim 2001).
Interspersed Repetitive Elements
Assuming that the interspersed nature of repetitive elements prevents homogenization by concerted evolution, most orthologous GMRE sequences are pseudogenes that should diverge between taxa at a neutral rate (Dermitzakis et al. 2002). Phylogenetic footprinting provides a test of the null hypothesis that the degree of sequence conservation, and perhaps functional constraint, is equivalent in unique and GMRE noncoding DNA. This hypothesis could not be rejected for the introns studied here. Unique and GMRE sequences in introns evolve similarly; either they are both evolving under selective constraints or they are both affected by neutral mechanisms preserving sequence similarity.
The Link Between Sequence Conservation and Function
Our simulations demonstrated that the amount of pairwise noncoding sequence conservation observed in most studies is expected to result from a random pattern of nucleotide substitution with moderate rate heterogeneity. Mutation rates are known to vary locally and be dependent on sequence context, such as the tendency for C to mutate to T in CpG dinucleotides (Nachman and Crowell 2000; Chen and Li 2001). Spatial variation in selective constraints can also cause substitution rate heterogeneity, however. Distinguishing mutational and selective causes of rate heterogeneity will be much more challenging than measuring the degree and pattern of conservation. Progress will require more informed models of mutationally induced rate heterogeneity for substitutions and indels, perhaps achievable through studies of experimental evolution or pseudogene variation (Petrov 2002). Ultimately, although computational methods can be used to demonstrate that conserved noncoding DNA is enriched for known transcription factor binding sites (Wasserman et al. 2000; Levy, Hannenhalli, and Workman 2001), experimental approaches will be required to confirm the functional significance of particular blocks of conservation.
What functional roles are conserved introns likely to be playing? Recent estimates of the frequency of alternative splicing in the human genome indicate that more than 50% of genes have two or more alternatively spliced transcripts (International Human Genome Sequencing Consortium 2001), although most of this variation is probably due to exon skipping rather than facultative coding of intron sequences in different tissues (Croft et al. 2000). Evidence for alternative splicing exists for only one intron studied here, ACTA2 (fig. 1). The strictly noncoding portion of the ACTA2 intron had 37% of sites in three-taxon blocks, whereas the facultatively coding portion was 49% conserved. This contrast suggests that if alternative splicing were responsible for much of the overall intron conservation, then the higher conservation of intron edge sequences would be more pronounced.
Despite the historical allusion to sites of DNA binding made by the term phylogenetic "footprinting," transcription factor binding sites constitute only the most familiar possibility for regulatory function that can be revealed through sequence comparisons. For example, genes encoding nucleolar proteins contain various small nucleolar RNAs transcribed from their introns (Rebane et al. 1998; Rebane and Metspalu 1999). The fact that introns are co-transcribed with their home exons make these noncoding sequences the most likely candidates for coordinated expression of trans-acting RNA factors that could modulate transcription or facilitate gene-to-gene interactions (Mattick and Gagen 2001). Several studies have found similar levels of sequence conservation in intergenic versus intron sequence, however, indicating that the link between transcription and noncoding sequence function may be weak (Jareborg, Birney, and Durbin 1999; Shabalina and Kondrashov 1999; Bergman and Krietman 2001).
Comparison with Other Studies
Several factors make it difficult to compare results from this sample of introns with previously reported patterns of noncoding sequence conservation. Most important, the various sequence alignment methods used can affect results (Bergman and Kreitman 2001), and only Wasserman and colleagues (2000) used the powerful but computationally demanding Bayesian Block Aligner that we used here. Second, some previous studies examined a nonrandom sample, focusing on noncoding regions in genes with known tissue-specific regulation or with experimentally determined cis-regulatory activity (Wasserman et al. 2000; Bergman and Kreitman 2001). Third, our small sample of introns may not be genomically representative because of limited chromosome sampling or other inadvertent biases. For example, the intron examined in CD40L is the first (furthest 5') intron in that gene, where a higher density of regulatory sequences often reside (Levy, Hannenhalli, and Workman 2001; Venter et al. 2001). After removal of CD40L from this data set, the amount of sequence conservation was still very high, with proportions of identity in pairwise and three-taxon blocks averaging 45% () and 25% (
) of intron sequence length, respectively. Regardless of whether this small sample of introns proves to be genomically representative, our results indicate an extraordinary level of sequence conservation and presumed functional constraint in otherwise unremarkable and unstudied introns.
Implications and Conclusions
Evidence for widespread but variable evolutionary constraints in noncoding sequence has immediate repercussions for the use of these loci in population genetic and historical inferences made under the assumption of neutral evolution. Introns have been particularly attractive targets for analyses based on neutral theory because their positional homology within genes is phylogenetically conserved (Stoltzfus et al. 1997; Davidson et al. 2000) and flanking exons support the design of PCR primers (Palumbi 1996; Hare 2001). If 30% of intron sequence positions are not free to vary on average, then substitution rate estimates may be altered severalfold (Palumbi 1989). Moreover, substitution rates will vary dramatically from intron to intron, depending on the proportion of constrained sequence. Noncoding sequence constraints imply that some nucleotide substitutions in these regions will be slightly deleterious. Background selection against these variants is expected to reduce intraspecific intron variability below that expected for neutral nuclear sequences, disrupting coalescent predictions of the three-times rule (Palumbi, Cipriano, and Hare 2001).
When seeking neutral variants for historical and evolutionary analyses, the most predictable source may not be in noncoding DNA, but rather in protein-coding regions. Synonymous sites are relatively neutral and are expected at a maximum frequency of 25% within protein-coding sequences (Li 1997). Although introns are desirable markers because they have a higher density of silent sites on average (this study), the location of those sites is unpredictable without phylogenetic comparisons. Also, general patterns of nucleotide and indel substitution are relatively uncharacterized for noncoding DNA. Nonetheless, it may prove desirable to use phylogenetic footprinting to select noncoding DNA markers with relatively low selective constraints for use in historical analyses. Comparisons made here indicate that, for mammals, the seventh introns in ESD and G6PD are relatively free of selective constraints.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
E-mail: matt.hare{at}umail.umd.edu.
![]() |
Literature Cited |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Bergman, C. M., and M. Kreitman. 2001. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11:1335-1345.
Chen, F., and W. Li. 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet. 68:444-456.[CrossRef][ISI][Medline]
Clark, A. G. 2001. The search for meaning in noncoding DNA. Genome Res. 11:1319-1320.
Clark, A. G., B. G. Leicht, and S. V. Muse. 1996. Length variation and secondary structure of introns in the Mlc1 gene in six species of Drosophila. Mol. Biol. Evol. 13:471-482.[Abstract]
Croft, L., S. Schandorff, F. Clark, K. Burrage, P. Arctander, and J. Mattick. 2000. ISIS, the intron information system, reveals the high frequency of alternative splicing in the human genome. Nat. Genet. 24:340-341.[CrossRef][ISI][Medline]
Davidson, H., M. S. Taylor, A. Doherty, A. C. Boyd, and D. J. Porteous. 2000. Genomic sequence analysis of Fugu rubripes CFTR and flanking genes in a 60 kb region conserving synteny with 800 kb of human chromosome 7. Genome Res. 10:1194-1203.
Dermitzakis, E. T., A. Reymond, and R. Lyle, et al. (11 co-authors). 2002. Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature 420:578-582.[CrossRef][ISI][Medline]
Dubchak, I., M. Brudno, G. Loots, L. Pachter, C. Mayor, E. Rubin, and K. Frazer. 2000. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 10:1304-1306.
Duret, L., and P. Bucher. 1997. Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol. 7:399-406.[CrossRef][ISI][Medline]
Friesen, V. L. 2000. Introns. Pp. 274294 in A. J. Baker, ed. Molecular methods in ecology. Blackwell Science, Oxford.
Gilbert, N., and D. Labuda. 1999. CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs. Proc. Natl. Acad. Sci. USA 96:2869-2874.
Hare, M. P. 2001. Prospects for nuclear phylogeography. Trends Ecol. Evol. 16:700-706.[CrossRef][ISI]
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.[ISI][Medline]
Hughes, A. L., and M. Yeager. 1997. Comparative evolutionary rates of introns and exons in murine rodents. J. Mol. Evol. 45:125-130.[ISI][Medline]
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.[CrossRef][ISI][Medline]
Jareborg, N., E. Birney, and R. Durbin. 1999. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9:815-824.
Kim, J. 2001. Macro-evolution of the hairy enhancer in Drosophila species. Mol. Dev. Evol. 291:175-185.
Kirby, D. A., S. V. Muse, and W. Stephan. 1995. Maintenance of pre-mRNA secondary structure by epistatic selection. Proc. Natl. Acad. Sci. USA 92:9047-9051.[Abstract]
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.[CrossRef][ISI][Medline]
Leicht, B. G., S. V. Muse, M. Hanczyc, and A. G. Clark. 1995. Constraints on intron evolution in the gene encoding the myosin alkali light chain in Drosophila. Genetics 139:299-308.
Levy, S., S. Hannenhalli, and C. Workman. 2001. Enrichment of regulatory signals in conserved non-coding genomic sequence. Bioinformatics 17:871-877.
Lewin, B. 1997. Genes VI. Oxford University Press, New York.
Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564-567.[CrossRef][ISI][Medline]
Lum, J. K., M. Nikaido, M. Shimamura, H. Shiimodaira, A. M. Shedlock, N. Okada, and M. Hasegawa. 2000. Consistency of SINE insertion topology and flanking sequence tree: quantifying relationships among cetartiodactyls. Mol. Biol. Evol. 17:1417-1424.
Lyons, L. A., T. F. Laughlin, N. G. Copeland, N. A. Jenkins, J. E. Womack, and S. J. O'Brien. 1997. Comparative anchor tagged sequences (CATS) for integrative mapping of mammalian genomes. Nat. Genet. 15:47-56.[ISI][Medline]
Mattick, J., and M. Gagen. 2001. The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol. 18:1611-1630.
Mount, S. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10:459-472.[Abstract]
Nachman, M. W., and S. L. Crowell. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.
Palumbi, S. R. 1989. Rates of molecular evolution and the fraction of nucleotide positions free to vary. J. Mol. Evol. 29:180-187.[ISI][Medline]
Palumbi, S. R. 1996. Nucleic acids II: the polymerase chain reaction. Pp. 205247 in D. Hillis, C. Moritz, and B. Mable, eds. Molecular systematics. Sinauer Associates, Sunderland, Mass.
Palumbi, S. R., F. Cipriano, and M. P. Hare. 2001. Predicting nuclear gene coalescence from mitochondrial data: The three-times rule. Evolution 55:859-868.[ISI][Medline]
Petrov, D. A. 2002. DNA loss and evolution of genome size in Drosophila. Genetica 115:81-91.[CrossRef][ISI][Medline]
Petrov, D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, and K. L. Shaw. 2000. Evidence for DNA loss as a determinant of genome size. Science 287:1060-1062.
Rambaut, A., and N. C. Grassly. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235-238.[Abstract]
Rebane, A., and A. Metspalu. 1999. U82, a novel snoRNA identified from the fifth intron of human and mouse nucleolin gene. Biochim. Biophys. ActaGene Struct. Expr. 1446:426-430.[CrossRef][ISI][Medline]
Rebane, A., R. Tamme, M. Laan, I. Pata, and A. Metspalu. 1998. A novel snoRNA (U73) is encoded within the introns of human and mouse ribosomal protein S3a genes. Gene 210:255-263.[CrossRef][ISI][Medline]
Saitou, N., and S. Ueda. 1994. Evolutionary rates of insertion and deletion in noncoding nucleotide sequences of primates. Mol. Biol. Evol. 11:504-512.[Abstract]
Sankoff, D. 1972. Matching sequences under deletion/insertion constraints. Proc. Natl. Acad. Sci. USA 69:4-6.[Abstract]
Shabalina, S., and A. Kondrashov. 1999. Pattern of selective constraint in C. elegans and C. briggsae genomes. Genet. Res. Camb. 74:23-30.[CrossRef][ISI][Medline]
Smit, A. F. A., and A. D. Riggs. 1995. MIRs are classic, tRNA-derived SINEs that amplified before the mammalian radiation. Nucleic Acids Res. 23:98-102.[Abstract]
Stephan, W., and D. A. Kirby. 1993. RNA folding in Drosophila shows a distance effect for compensatory fitness interactions. Genetics 135:97-103.
Stoltzfus, A., J. J. M. Longsdon, J. D. Palmer, and W. F. Doolittle. 1997. Intron "sliding" and the diversity of intron positions. Proc. Natl. Acad. Sci. USA 94:10739-10744.
Tagle, D. A., B. F. Koop, M. Goodman, J. L. Slightom, D. L. Hess, and R. T. Jones. 1988. Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus): nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203:439-455.[ISI][Medline]
Venter, J. C., M. D. Adams, and E. W. Myers, et al. (271 co-authors). 2001. The sequence of the human genome. Science 291:1304-1351.
Wasserman, W. W., M. Palumbo, W. Thompson, J. W. Fickett, and C. E. Lawrence. 2000. Human-mouse genome comparisons to locate regulatory sites. Nat. Genet. 26:225-228.[CrossRef][ISI][Medline]
Webb, C. T., S. A. Shabalina, A. Y. Ogurtsov, and A. S. Kondrashov. 2002. Analysis of similarity within 142 pairs of orthologous intergenic regions of Caenorhabditis elegans and Caenorhabditis briggsae. Nucleic Acids Res. 30:1233-1239.
Willoughby, D. A., A. Vilalta, and R. G. Oshima. 2000. An Alu element from the K18 gene confers position-independent expression in transgenic mice. J. Biol. Chem. 275:759-768.
Wingender, E., X. Chen, and E. Fricke, et al. 2001. The TRANSFAC system on gene expression regulation. Nucleic Acids Res. 29:281-283.
Zar, J. H. 1984. Biostatistical analysis. Prentice-Hall, Englewood Cliffs, N.J.
Zhu, J., J. S. Liu, and C. E. Lawrence. 1998. Bayesian adaptive sequence alignment algorithms. Bioinformatics 14:25-39.[Abstract]