Center for Evolutionary Functional Genomics and Department of Biology, Arizona State University
Abstract
Key Words: transition/transversion bias; substitution patterns; mammalian genomes
Introduction
Transition/transversion mutation rate biases are often assumed to vary considerably among genes and genomic segments (Jukes 1987; Wakeley 1996; Yang and Yoder 1999). This perception is commonly based on analyses of individual genes, as the extent of transition bias among genomic regions and among species has not been characterized. Therefore, we have examined transition bias in the introns of 51 gene pairs and three orthologous intergenic regions from human, chimpanzee, and baboon, and in 4,347 protein-coding genes (11,428 sequences) from seven mammal species. To use as much data as possible, we have taken a pairwise species approach to estimating the instantaneous transition rate bias ($\kappa$) for genes, genomic regions, and species.
Materials and Methods
Variation in coding and noncoding regions was examined through analysis of fully concatenated exon and concatenated intron sequences from 51 gene pairs from human–chimpanzee (26 pairs) and human–baboon (25 pairs). Orthologous intergenic regions from human, chimpanzee, and baboon were culled from five large contigs (human: AC002066, AC002080; chimpanzee: AC087253, AC087512; baboon: AC084730). Repeat elements were identified using RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker) and removed prior to sequence alignment. Using human genome annotations, all coding genes and 5 kb of flanking sequence upstream and downstream of each gene (to exclude potential regulatory regions) were removed from the contigs.
Estimating Transition Bias
There are numerous approaches to estimating the transition bias, $\kappa$ (Kimura 1980; Wakeley 1994, 1996; Pollock and Goldstein 1995; Ina 1998; Yang and Yoder 1999). These approaches fall into two basic categories: those based on paired sequences and those based on greater numbers of sequences and phylogenetic trees. We use a paired-sequences approach because it allows us to use substantially more genes and sequences than would be available if we restricted ourselves to phylogeny-based estimation. In any case, efficient estimators of $\kappa$ can be constructed in pairwise sequence analysis under the Hasegawa, Kishino, and Yano (HKY) (1985) model. Under this model, the frequency of change is dependent not only on transition and transversion substitution rates ($\alpha$ and $\beta$, respectively), but also on equilibrium nucleotide frequencies ($\pi_A$, $\pi_C$, $\pi_G$, $\pi_T$). The instantaneous rate of change of a nucleotide $i$ to nucleotide $j$ is $\alpha\pi_j$ for transitional substitutions and $\beta\pi_j$ for transversional substitutions. The transition bias is $\kappa = \alpha/\beta$. This is the instantaneous ratio of transitional change to transversional change and differs from the ratio of the actual expected numbers of transitions and transversions. For the HKY model, the ratio of the expected number of transitions to the expected number of transversions is

$$\frac{E(\text{transitions})}{E(\text{transversions})} = \frac{\kappa\,(\pi_A\pi_G + \pi_T\pi_C)}{(\pi_A + \pi_G)(\pi_T + \pi_C)} \qquad (1)$$
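The distinction between the instantaneous bias and the expected transition/transversion ratio under HKY can be illustrated with a minimal Python sketch. The function names and the uniform-frequency example are assumptions made here for illustration; this is the model relation only, not the estimation procedure used in the study.

```python
def expected_ts_tv_ratio(kappa, pi):
    """Expected transitions / expected transversions under HKY.

    kappa: instantaneous transition/transversion rate ratio (alpha / beta).
    pi: dict of equilibrium frequencies keyed by 'A', 'C', 'G', 'T'.
    """
    if abs(sum(pi.values()) - 1.0) > 1e-9:
        raise ValueError("equilibrium frequencies must sum to 1")
    purines = pi["A"] + pi["G"]
    pyrimidines = pi["C"] + pi["T"]
    # Transitions occur within purines (A<->G) or within pyrimidines (C<->T).
    transition_term = pi["A"] * pi["G"] + pi["C"] * pi["T"]
    return kappa * transition_term / (purines * pyrimidines)


def kappa_from_ratio(ratio, pi):
    """Invert the relation: recover kappa from an expected ts/tv ratio."""
    return ratio / expected_ts_tv_ratio(1.0, pi)


# Hypothetical example: equal base frequencies.
uniform = {base: 0.25 for base in "ACGT"}
print(expected_ts_tv_ratio(2.0, uniform))
```

With equal base frequencies the expected ratio is half the instantaneous bias, because each nucleotide has one possible transitional change but two possible transversional changes.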
For a set of genes from a specific species pair, we calculated an overall estimate of $\kappa$ from the weighted average of the natural logarithms of the individual $\kappa$ estimates, where the weight for each estimate is equal to the sequence length ($n$) of the associated sequences, i.e.,

$$\ln \bar{\kappa} = \frac{\sum_i n_i \ln \kappa_i}{\sum_i n_i} \qquad (2)$$
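A minimal Python sketch of this length-weighted averaging on the log scale follows. The function name is hypothetical, and the final back-transformation with `exp` (to report the result on the original scale) is an assumption of this sketch rather than something stated in the text.

```python
import math


def weighted_log_mean_kappa(kappas, lengths):
    """Length-weighted mean of ln(kappa), back-transformed to the kappa scale.

    kappas: per-gene transition bias estimates (all must be > 0).
    lengths: sequence length n used as the weight for each estimate.
    """
    if not kappas or len(kappas) != len(lengths):
        raise ValueError("need equally sized, non-empty inputs")
    if any(k <= 0 for k in kappas):
        raise ValueError("kappa estimates must be positive to take logs")
    total_length = sum(lengths)
    mean_log = sum(n * math.log(k) for k, n in zip(kappas, lengths)) / total_length
    return math.exp(mean_log)  # assumption: report on the original scale


# Hypothetical estimates for two genes of equal length.
estimates = [2.0, 8.0]
weights = [100, 100]
log_scale_mean = weighted_log_mean_kappa(estimates, weights)
arithmetic_mean = sum(k * n for k, n in zip(estimates, weights)) / sum(weights)
print(log_scale_mean, arithmetic_mean)
```

Averaging on the log scale gives a weighted geometric mean, which can never exceed the corresponding weighted arithmetic mean, so a few very large individual estimates pull the overall value up less strongly.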
Spatial Patterns of Transition Bias
The spatial pattern of $\kappa$ estimates in the human genome was determined from a correlogram (Sokal and Oden 1978; Cliff and Ord 1981) of 3,023 human–mouse homologous genes. Chromosomal locations for each gene were as in the human genome. Physical distance between two genes was measured as the number of nucleotides between the end of one gene and the beginning of the next (overlapping genes were given distances of zero). The autocorrelation coefficient, Moran's I (Moran 1950), was determined for pairs of genes located on the same chromosome within specific distances of each other; successive distance classes represented 500-kb windows. Spatial analyses were performed using PASSAGE (Rosenberg 2001).
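The distance-class calculation described above can be sketched in Python as follows. This is the textbook form of Moran's I with binary weights (1 for same-chromosome pairs whose gap falls in the distance class, 0 otherwise); the function names, the tuple layout for genes, and the `n_classes` default are assumptions of the sketch, and PASSAGE may differ in its details.

```python
def gap(a, b):
    """Distance between two genes given as (chromosome, start, end).

    Returns None for genes on different chromosomes and 0 for overlapping genes.
    """
    if a[0] != b[0]:
        return None
    first, second = (a, b) if a[1] <= b[1] else (b, a)
    return max(0, second[1] - first[2])


def morans_i(values, genes, lo, hi):
    """Moran's I for gene pairs whose gap d satisfies lo <= d < hi."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num, weight = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = gap(genes[i], genes[j])
            if d is not None and lo <= d < hi:
                num += 2 * dev[i] * dev[j]  # counts both (i, j) and (j, i)
                weight += 2
    if weight == 0 or denom == 0:
        return None  # no pairs in this class, or no variation in the values
    return (n / weight) * num / denom


def correlogram(values, genes, width=500_000, n_classes=10):
    """Moran's I for successive distance classes of the given width."""
    return [morans_i(values, genes, k * width, (k + 1) * width)
            for k in range(n_classes)]
```

Values near zero in every distance class would indicate no spatial patterning, while positive values in the shortest classes would indicate that neighboring genes have similar estimates.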
Results
The average $\kappa$ values estimated from species pairs of the simulated data were quite close to the true simulated value (e.g., for the mouse–rat and cow–pig pairs) when we use equations (1) and (2). Use of the arithmetic mean rather than equation (2) leads to overestimation of $\kappa$ (again for both the mouse–rat and cow–pig pairs). Furthermore, figure 4 shows the simulated distribution of $\kappa$ estimates for mouse–rat paired genes (we chose these species as being the most closely related species with a large number of observed gene pairs). The observed (fig. 3b) and simulated (fig. 4) distributions have similar ranges and shapes. The spread (variance) of the observed estimates is larger than that of the simulated data, indicating that the relatively simplistic simulation cannot capture the full stochastic variation present in the observed data.
The second approach was to calculate a correlogram of the $\kappa$ estimates for 3,023 human genes (fig. 5). This analysis is motivated by the idea that if there is regional genomic variation in the fundamental mutation pattern (Matassi, Sharp, and Gautier 1999; Williams and Hurst 2000; Lercher, Williams, and Hurst 2001), genes located near each other on a chromosome might have more similar transition biases than those located farther apart. This correlogram reveals no spatial patterning; genes located near each other on a chromosome show variation similar to that among those located farther apart. Furthermore, the average $\kappa$ was essentially the same whether taken across the entire genome or determined for individual chromosomes (results not shown).
Discussion
Other factors may have large effects on transition bias. Rate variation among sites can complicate the estimate of $\kappa$ (Wakeley 1994); however, use of only fourfold-degenerate (neutral) sites should yield a subset of sites with relatively constant rates of substitution (Kumar and Subramanian 2002). Using 5 taxa and 14 genes from Springer et al. (2003), we found the gamma parameter (using ML) for fourfold-degenerate sites to be approximately 2. Use of this value in the computation of $\kappa$ produces essentially the same values already reported. Codon usage bias will often have large effects on substitution pattern and transition bias because not all mutations at fourfold-degenerate sites need be neutral. In mammals, however, codon usage bias is extremely weak (Akashi 2001; Urrutia and Hurst 2001) and is likely to have no effect on our estimates.
The results we have shown are for the neutral mutation transition bias. Sites under selection pressure are expected to show widely varying transition biases. Within coding sequences, transitional changes are often synonymous when transversional changes are not; furthermore, when both transitions and transversions lead to a change in the protein sequence, the transitional change is often less severe with respect to the chemical properties of the original and mutant amino acids (Grantham 1974; Zhang 2000). Furthermore, regional effects on transition bias cannot be completely discounted. We have already shown the effect of CpG sites (and thus, the correlated effect of GC content) on transition bias. There may easily be additional local variation as yet uncharacterized. More extensive analyses of mutational patterns among the genes of closely related species are needed, and it is important to exercise caution in using the universal average we report for analyses of individual genes.
Knowledge of the mutational transition/transversion rate bias allows a general prediction of time to saturation of substitutions at fourfold-degenerate sites (fig. 6). Given these observed mutational parameters, transversions become more common than transitions after 250 Myr, about the time at which transitions become saturated (at 25% of sites). Transversions become saturated much more slowly, beginning to approach their asymptote of 50% after about 750 Myr. We find that the observed number of transitional substitutions accumulates approximately linearly for about 100 Myr, while the transversional substitutions accumulate linearly for about 250 Myr.
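The saturation behavior described here (transitional differences leveling off at 25% of sites and transversional differences at 50%) can be reproduced qualitatively with a short Python sketch. It assumes equal base frequencies, under which HKY reduces to the Kimura (1980) two-parameter model, and it uses placeholder values for the total substitution rate `mu` and for `kappa`; neither the function name nor these defaults come from the text, so the crossing times it produces should not be read as the values in figure 6.

```python
import math


def k2p_differences(t_years, mu=2.2e-9, kappa=4.0):
    """Expected proportions of observed differences between two sequences.

    Assumes the Kimura two-parameter model (equal base frequencies).
    t_years: time since the two lineages diverged.
    mu: total substitution rate per site per year (placeholder value).
    kappa: instantaneous transition/transversion bias, alpha / beta (placeholder).
    Returns (p, q): proportions of sites differing by a transition / transversion.
    """
    beta = mu / (kappa + 2.0)  # rate of each of the two transversion types
    alpha = kappa * beta       # rate of the single transition type
    p = (0.25
         - 0.5 * math.exp(-4.0 * (alpha + beta) * t_years)
         + 0.25 * math.exp(-8.0 * beta * t_years))
    q = 0.5 - 0.5 * math.exp(-8.0 * beta * t_years)
    return p, q


# Tabulate the curves at a few divergence times (in Myr).
for myr in (50, 100, 250, 500, 750, 1000):
    p, q = k2p_differences(myr * 1e6)
    print(f"{myr:5d} Myr  transitions={p:.3f}  transversions={q:.3f}")
```

Because transitions occur at the higher rate, their curve rises quickly and then flattens near 0.25, whereas the transversion curve rises more slowly toward 0.5 and eventually overtakes it.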
Acknowledgements
Footnotes
Mark Springer, Associate Editor
Literature Cited
Akashi, H. 2001. Gene expression and molecular evolution. Curr. Opin. Genet. Dev. 11:660-666.
Bird, A. P. 1980. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8:1499-1504.
Blanck, H. M., P. E. Tolbert, and J. A. Hoppin. 1999. Patterns of genetic alterations in pancreatic cancer: a pooled analysis. Environ. Mol. Mutagen. 33:111-122.
Cliff, A. D., and J. K. Ord. 1981. Spatial Processes. Pion, London.
Duret, L., D. Mouchiroud, and M. Gouy. 1994. Hovergen: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360-2365.
Giannelli, F., T. Anagnostopoulos, and P. M. Green. 1999. Mutation rates in humans. II. Sporadic mutation-specific rates and rate of detrimental human mutations inferred from hemophilia B. Am. J. Hum. Genet. 65:1580-1587.
Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862-864.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Hollstein, M., D. Sidransky, B. Vogelstein, and C. C. Harris. 1991. p53 mutations in human cancers. Science 253:49-53.
Ina, Y. 1998. Estimation of the transition/transversion ratio. J. Mol. Evol. 46:521-533.
Jukes, T. H. 1987. Transitions, transversions, and the molecular evolutionary clock. J. Mol. Evol. 26:87-98.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Krawczak, M., E. V. Ball, and D. N. Cooper. 1998. Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am. J. Hum. Genet. 63:474-488.
Kumar, S., and S. Subramanian. 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. USA 99:803-808.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
Lercher, M. J., E. J. B. Williams, and L. D. Hurst. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of male mutation bias. Mol. Biol. Evol. 18:2032-2039.
Light, R. J., and D. B. Pillemer. 1984. Summing up: the science of reviewing research. Harvard University Press, Cambridge.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Martínez-Arias, R., E. Mateu, J. Bertranpetit, and F. Calafell. 2001. Profiles of accepted mutation: from neutrality in a pseudogene to disease-causing mutation on its homologous gene. Hum. Genet. 109:7-10.
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791.
Moran, P. A. P. 1950. Notes on continuous stochastic phenomena. Biometrika 37:17-23.
Pollock, D. D., and D. B. Goldstein. 1995. A comparison of two methods for constructing evolutionary distances from a weighted contribution of transition and transversion differences. Mol. Biol. Evol. 12:713-717.
Rosenberg, M. S. 2001. PASSAGE: pattern analysis, spatial statistics, and geographic exegesis. Version 1.0. Department of Biology, Arizona State University, Tempe, AZ.
Rosenberg, M. S., and S. Kumar. 2001. Incomplete taxon sampling is not a problem for phylogenetic inference. Proc. Natl. Acad. Sci. USA 98:10751-10756.
Sokal, R. R., and N. L. Oden. 1978. Spatial autocorrelation in biology 1. Methodology. Biol. J. Linn. Soc. 10:199-228.
Springer, M. S., W. J. Murphy, E. Eizirik, and S. J. O'Brien. 2003. Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. USA 100:1056-1061.
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526.
Topal, M. D., and J. R. Fresco. 1976a. Base pairing and fidelity in codon-anticodon interaction. Nature 263:289-293.
Topal, M. D., and J. R. Fresco. 1976b. Complementary base pairing and the origin of substitution mutations. Nature 263:285-289.
Urrutia, A. O., and L. D. Hurst. 2001. Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159:1191-1199.
Wakeley, J. 1994. Substitution-rate variation among sites and the estimation of transition bias. Mol. Biol. Evol. 11:436-442.
Wakeley, J. 1996. The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. Trends Ecol. Evol. 11:158-163.
Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900-903.
Yang, Z., and A. D. Yoder. 1999. Estimation of the transition/transversion rate bias and species sampling. J. Mol. Evol. 48:274-283.
Zhang, J. 2000. Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes. J. Mol. Evol. 50:56-68.