Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
Abstract |
---|
Key Words: doublet mutations, tandem substitutions, regional variation, sequence context effects, synonymous-nonsynonymous correlation
Introduction |
---|
The importance of quantifying the doublet mutation rate becomes evident if we consider the consequences of incorrectly assuming that all point mutations occur at single nucleotides. Fundamentally, the pattern of neutral evolution is determined by the pattern of mutation, so unless we understand mutational processes we may falsely reject neutrality. For example, the classic test of the neutral theory using the index of dispersion of substitutions (see Gillespie 1991) will be biased by doublet mutations. The neutral prediction, that the index of dispersion is one, is based on the Poisson model assumptions that mutations are random, independent, and single. Thus doublet mutations will generate overdispersion of the molecular clock: if all mutations are doublets then, relative to singleton mutations, the mean number of substitutions will be doubled but the variance in the number of substitutions will be quadrupled; hence the index of dispersion will be two rather than one. More generally, the explicit models of molecular evolution required for substitution rate estimation and phylogenetic inference usually involve single nucleotide changes (e.g., Li 1997). If doublet mutations are common then such models, and the results they generate, may be biased.
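The overdispersion argument can be checked with a few lines of arithmetic. The sketch below is an illustration rather than part of the original analysis: it assumes mutation events arrive as a Poisson process and that a fraction `doublet_fraction` (a hypothetical parameter name) of events change two sites while the rest change one. For such a compound Poisson process the variance-to-mean ratio of the substitution count is E[X²]/E[X], where X is the number of sites changed per event.

```python
def dispersion_index(doublet_fraction):
    """Index of dispersion (variance / mean) of substitution counts.

    Assumes Poisson-distributed mutation events, each changing one site
    (probability 1 - p) or two adjacent sites (probability p).
    For a compound Poisson sum, variance / mean = E[X^2] / E[X].
    """
    p = doublet_fraction
    mean_size = (1 - p) * 1 + p * 2      # E[X]   = 1 + p
    mean_sq_size = (1 - p) * 1 + p * 4   # E[X^2] = 1 + 3p
    return mean_sq_size / mean_size


if __name__ == "__main__":
    for p in (0.0, 0.003, 0.02, 1.0):
        print(f"doublet fraction {p:>5}: index of dispersion {dispersion_index(p):.4f}")
```

With no doublets the index is one, and with only doublets it is two, matching the two limiting cases described in the text.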
Finally, estimating the level of doublet mutations should help resolve the debate concerning the causes of the correlation between synonymous and nonsynonymous substitution rates in mammals (Wolfe and Sharp 1993; Smith and Hurst 1999). Doublets provide a mutational (i.e., neutral) explanation for why nonsynonymous changes in protein-coding genes (those changes in DNA sequence which affect the amino acid sequence) covary with synonymous changes (those changes in DNA sequence which do not affect the amino acid sequence owing to the degeneracy of the genetic code). For example, changes at the second codon position are always nonsynonymous while changes at the third codon position are mostly synonymous: thus a doublet mutation can simultaneously generate both a synonymous and a nonsynonymous mutation. Note, however, that there is also a methodological debate over whether there really is a synonymous-nonsynonymous correlation in mammals (Bielawski, Dunn, and Yang 2000).
Materials and Methods |
---|
The positions of alignments in human contigs were determined by BLAST searches against the human genome, and comparison to the contig annotation files available at NCBI allowed the masking of coding regions within alignments. Repetitive sequence elements were masked with RepeatMasker (A.F.A. Smit and P. Green, unpublished). Repetitive elements were masked because they are known to evolve at higher rates than nonrepetitive sequence (Chen and Li 2001), and so may generate strong regional variation and sequence context effects. Microsatellites were masked using the program Sputnik (Abajian, unpublished, http://abajian.net/sputnik/) because microsatellites cause alignment problems and show unusual substitution patterns.
Lineage-specific substitutions were classified using parsimony (e.g., if the human-chimp-baboon sequences are A-C-C, then a C to A change is inferred down the human lineage). For both pairwise and lineage-specific substitutions we ignored singletons that may be due to the hypermutability of methylated CG dinucleotides (CG to TG and CG to CA). To nullify the effect of CG mutation on doublet estimates, all potential CG-mediated tandems (CH to TG and DG to CA, where H is A/C/T and D is A/G/T) were removed, as were the corresponding near-neighbors. Tandems are pairs of adjacent differences, whereas near-neighbors are pairs of differences separated by one nucleotide. The corresponding near-neighbors had to be removed because not all putative CG tandems are generated by CG hypermutation.
Given that there are several alternative strategies for counting different types of substitutions, here we present our algorithms in greater detail. Slight alterations to the algorithms made little difference to estimates of the doublet mutation rate (results not shown). All alignments were analyzed separately after masking for genes and repeats. The following classes of sites were defined for both the human-chimp and human-baboon pairwise comparisons and the human and chimp lineages: substitutions (s), conserved sites (c) and masked sites (m). In addition, some sites were undefined (u) in the human and chimp lineages (i.e., parsimony uninformative). Then potential CpGs were masked (s → m) as described above (note that CpG masking differs between the pairwise comparisons and lineages because in the latter case the direction of substitution is known). Then types of substitutions were counted. The total number of substitutions was found by counting all the "s"-sites; tandems were found by counting all the adjacent pairs of "s"-sites (including overlapping pairs, e.g., "sss" contains two tandems), and near-neighbors were identified by counting all pairs of "s"-sites separated by one site (which could belong to any class, and again overlapping pairs were allowed, e.g., "scsms" contains two near-neighbors). Finally, the effective length of the alignment, required for calculating the Averof et al. (2000) measure of the doublet mutation rate, was found by counting all the "s"-sites and "c"-sites.
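The counting rules above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes an alignment has already been reduced to a string of site classes ("s", "c", "m", "u") with CpG masking applied, and the function name `count_site_classes` is hypothetical.

```python
def count_site_classes(sites):
    """Count substitutions, tandems, near-neighbors and effective length.

    `sites` is a string over the classes s (substitution), c (conserved),
    m (masked) and u (undefined), one character per alignment column.
    """
    n = len(sites)
    substitutions = sites.count("s")
    # Tandems: adjacent pairs of "s"-sites, overlapping pairs allowed.
    tandems = sum(1 for i in range(n - 1)
                  if sites[i] == "s" and sites[i + 1] == "s")
    # Near-neighbors: pairs of "s"-sites separated by one site of any class.
    near_neighbors = sum(1 for i in range(n - 2)
                         if sites[i] == "s" and sites[i + 2] == "s")
    # Effective length: only "s"-sites and "c"-sites contribute.
    effective_length = substitutions + sites.count("c")
    return {
        "substitutions": substitutions,
        "tandems": tandems,
        "near_neighbors": near_neighbors,
        "effective_length": effective_length,
    }


if __name__ == "__main__":
    for example in ("sss", "scsms"):
        print(example, count_site_classes(example))
```

Note that under these rules "sss" contributes one near-neighbor (its first and third sites) in addition to its two tandems.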
Polymorphism Data
We downloaded the tenth release of The SNP Consortium (TSC) database, consisting of 1,255,326 mapped single-nucleotide polymorphisms (SNPs) from http://snp.cshl.org/index.html. Tandems and near-neighbors could be identified because the polymorphism data files gave the base positions of each SNP along the chromosomes as well as positions within human contigs. All SNPs identified as coding by BLAST searches of surrounding sequence against the June 2001 version of the human mRNA RefSeq database (Pruitt et al. 2000) were removed. Dawson et al. (2001) found no evidence for polymorphism tandems to occur preferentially in repetitive sequences, and so the polymorphism data were not masked for repeat sequences. Potential CG-mediated singletons, tandems, and near-neighbors were also removed as for substitutions. In the case of our polymorphism data, for which both the direction of mutation and the linkage patterns were unknown, the complementary removal of near-neighbors is particularly important because many of the putative CG tandems (identified as Y(R/K/S) and (M/S/Y)R where Y is C/T, R is A/G, K is G/T, S is C/G, and M is A/C) will not have been caused by CG hypermutation. Runs of three or more adjacent polymorphisms were also masked as these might be considered likely to result from sequencing errors; this procedure made no qualitative difference to our results (data not shown). Tandem polymorphisms were counted as all instances of pairs of polymorphisms one base pair apart; near-neighbor polymorphisms were counted as all instances of pairs of polymorphisms two base pairs apart.
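For position-based SNP data the same counts can be taken directly from chromosome coordinates. The sketch below is illustrative only: it assumes a list of integer base positions on a single chromosome, omits the CG-related filtering, and assumes that runs of three or more adjacent SNPs are dropped before any pairs are counted. The function name `count_snp_pairs` is hypothetical.

```python
def count_snp_pairs(positions):
    """Return (tandems, near_neighbors) for SNP positions on one chromosome.

    Runs of three or more adjacent positions are masked first (assumed to
    be removed entirely), then pairs 1 bp and 2 bp apart are counted.
    """
    pos = sorted(set(positions))
    kept = []
    i = 0
    while i < len(pos):
        j = i
        # Extend the run while the next SNP is at the adjacent base.
        while j + 1 < len(pos) and pos[j + 1] == pos[j] + 1:
            j += 1
        if j - i + 1 < 3:          # keep singletons and pairs only
            kept.extend(pos[i:j + 1])
        i = j + 1
    kept_set = set(kept)
    tandems = sum(1 for p in kept if p + 1 in kept_set)
    near_neighbors = sum(1 for p in kept if p + 2 in kept_set)
    return tandems, near_neighbors


if __name__ == "__main__":
    demo = [10, 11, 20, 22, 30, 31, 32, 40]
    print(count_snp_pairs(demo))
```

In the demo list the run 30-32 is masked, leaving one tandem (10, 11) and one near-neighbor (20, 22).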
Results |
---|
The assumption of no rate variation between sites does not seem generally applicable (Yang 1996), however, and so it is worth considering the effect of rate variation between sites. We define two types of rate variation: sequence context and regional variation. By sequence context we refer to the dependence of mutation rates on the identities of nearby bases (Templeton et al. 2000; Zavolan and Kepler 2001), and by regional variation we refer to all larger scale variation in mutation rates.
Regional variation is suggested by KS variation in mammals, between genes throughout the genome (Wolfe, Sharp, and Li 1989; Matassi, Sharp, and Gautier 1999; Lercher, Williams, and Hurst 2001) as well as within genes (Tsunoyama, Bellgard, and Gojobori 2001), and by variation in noncoding substitution rates in primates (Smith, Webster, and Ellegren 2002). If there is regional variation in mutation rates, then it is easy to see that the expected number of tandems will be underestimated. For instance, consider 10 kb compared between two samples with a mean distance of 10%; if there is no variation in substitution rates, we expect 100 tandems (length of sequence times the square of the distance). Now imagine that there is rate variation with 9 kb at 8% and 1 kb at 28%, in which case although the mean distance is the same the expected number of tandems rises to 136. So, unless the regional variation is accounted for, 36 doublet mutations will be falsely inferred.
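The arithmetic in this example is easy to reproduce. The following sketch uses the rule stated in the text (expected tandems equal sequence length times the square of the distance) and sums it over regions; the function name and the `(length, distance)` tuple representation are assumptions made for illustration.

```python
def expected_tandems(regions):
    """Expected number of tandems when mutations are independent.

    `regions` is a list of (length_in_bp, distance) tuples; each region
    contributes length * distance**2 expected tandems.
    """
    return sum(length * distance ** 2 for length, distance in regions)


def mean_distance(regions):
    """Length-weighted mean distance across regions."""
    total = sum(length for length, _ in regions)
    return sum(length * distance for length, distance in regions) / total


if __name__ == "__main__":
    uniform = [(10_000, 0.10)]
    variable = [(9_000, 0.08), (1_000, 0.28)]
    for name, regions in (("uniform", uniform), ("variable", variable)):
        print(name, round(mean_distance(regions), 4),
              round(expected_tandems(regions), 1))
```

Both scenarios have the same mean distance, yet the heterogeneous one yields 36 more expected tandems, which is the excess that would be misread as doublet mutations.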
Sequence context occurs on a much finer scale than regional variation, but it can generate tandems in a similar way. For example, it has been shown that, with an adjacent 5' G, the mutagenic guanine product dG-AF induces mutations at a much higher frequency with an adjacent 3' C than with an adjacent 3' G (Shibutani et al. 2001). Thus if a GGG trinucleotide undergoes a primary mutation to GGC, then the chance of a secondary mutation affecting the middle nucleotide is greatly increased. Thus sequence context can make the generation of a tandem more likely than expected if mutations were independent.
Here we present a simple method to address the problem of regional variation. Instead of comparing the observed number of tandems to the number of tandems expected under a given model, we compare the observed number of tandems (To), pairs of differences one base apart, with the observed number of what we refer to as near-neighbors (NNo), pairs of differences two bases apart, e.g., as in AGGTT compared to ATGAT. Just as many additional near-neighbors will be generated by regional variation as tandems, so the difference gives the number of doublet mutations inferred from the data. The doublet mutation rate, Dtn, is then quantified relative to the number of singleton mutations as for equation 1:
|
The formula for Dtn also corrects for sequence context effects to some extent, because sequence context effects will generate near-neighbors as well as tandems. The correction for sequence context effects, however, will certainly not be perfect because such effects appear to be much stronger at a distance of one base than at two bases (Krawczak, Ball, and Cooper 1998). The strongest known case of sequence context, the hypermutability of CG dinucleotides, can be explicitly accounted for by ignoring potentially affected differences (see Materials and Methods). Such an approach is not possible if weak sequence context effects are common, and so our approach for dealing with sequence context effects is to consider sequence differences along lineages of different lengths. Short lineages provide few primary mutations to affect secondary mutations, and sequence context effects should thus be weak.
Simulations
We performed some simulations to confirm that both the Doe and Dtn methods are robust to two notable features of DNA sequence evolution: (i) variation in the rates at which different types of DNA mutations occur and (ii) rate variation between sites (assuming that such rate variation is random with respect to genomic position; i.e., there is no systematic regional variation). Intuitive reasoning suggests that neither factor should affect the methods because they do not affect the relative positions at which mutations appear. We performed simulations of DNA sequence evolution using the evolver program in the PAML package (Yang 1997). In all cases, we generated three sequences of 1 million base pairs each according to a tree with distances similar to those in the human-chimpanzee-baboon tree, ((human:0.005, chimpanzee:0.005):0.04, baboon:0.045), and we analyzed the simulated sequences using the same methods employed for the human-chimpanzee-baboon alignments (see Materials and Methods; for simplicity we did not apply CpG masking to the simulated data). Four sets of simulated data were generated using different DNA substitution models: (1) JC69 model (Jukes and Cantor 1969), (2) HKY85 model (Hasegawa, Kishino, and Yano 1985) with equal base frequencies and a transition/transversion ratio of 5, (3) as in (2), but with unequal base frequencies C = G = 0.4 and A = T = 0.1, and (4) as in (3), but with rate variation between sites corresponding to a "discretized" gamma distribution (Yang 1994) with shape parameter 0.5 and eight rate categories.
The results of the simulations are shown in table 1, in which each simulation data set is analyzed in three comparisons of different lineage lengths corresponding to those performed on the human-chimpanzee-baboon data described below. In none of the 12 sets of analyses is there a significant difference between the observed and expected numbers of tandems (To and Te) or between the observed numbers of tandems and near-neighbors (To and NNo). These results suggest that neither the Doe nor the Dtn method is likely to generate biases through a failure to provide an explicit model of DNA evolution. The simulated "human-baboon" comparisons also suggest that the use of parsimony, or more specifically the failure to account for multiple substitutions at the same site, does not bias the Doe or Dtn methods. As one moves from simulation (1) to simulation (4), the failure to account for multiple hits leads to increasing underestimation of substitution rates: thus So, To, and NNo all decrease. However, this bias does not seem to affect the difference between To and Te or that between To and NNo. Therefore, although both the Doe and Dtn methods are somewhat ad hoc, the simulations indicate that they are not biased by multiple hits, nonregional rate variation between sites, or variation in the rates of different types of DNA mutations. Thus, although it may ultimately be desirable to address the issue of doublet mutations through explicit DNA models combined with likelihood or Bayesian approaches, the development of such methods does not appear to be a pressing concern.
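The logic of this check can be illustrated with a far simpler stand-in for the evolver simulations. The sketch below does not simulate any substitution model or tree: it assumes every site differs independently with the same probability, then compares observed tandems with the expectation (length times squared distance) and with observed near-neighbors. The function name, sequence length, distance and seed are all arbitrary choices for illustration.

```python
import random


def simulate_independent_differences(length, distance, seed=1):
    """Simulate independent per-site differences and count pair classes.

    Returns (S_o, T_o, NN_o, T_e): observed substitutions, observed
    tandems, observed near-neighbors and expected tandems.
    """
    rng = random.Random(seed)
    diff = [rng.random() < distance for _ in range(length)]
    s_o = sum(diff)
    t_o = sum(1 for i in range(length - 1) if diff[i] and diff[i + 1])
    nn_o = sum(1 for i in range(length - 2) if diff[i] and diff[i + 2])
    # Expected tandems: length times the square of the observed distance.
    t_e = length * (s_o / length) ** 2
    return s_o, t_o, nn_o, t_e


if __name__ == "__main__":
    s_o, t_o, nn_o, t_e = simulate_independent_differences(200_000, 0.02)
    print(f"S_o={s_o} T_o={t_o} NN_o={nn_o} T_e={t_e:.1f}")
```

When mutations really are independent and single, the three quantities T_o, NN_o and T_e should agree to within sampling error, which is the pattern reported for the simulated data sets.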
For all three substitution comparisons, both Doe and Dtn were calculated, and table 2 shows that in each case Dtn is much lower than Doe (human-baboon pairwise Dtn = 0.95% and Doe = 1.74%, human-chimpanzee pairwise Dtn = 0.53% and Doe = 1.14%, human and chimpanzee lineage-specific Dtn = 0.35% and Doe = 0.80%). Because Doe assumes no rate heterogeneity, the difference between Dtn and Doe indicates mainly the effect of regional variation within the 43 alignments ranging in length from 12 kb to 107 kb (but not variation between alignments because expected numbers of tandems were calculated separately for each alignment). The magnitude of the difference shows the necessity of accounting for regional variation in substitution rates.
Polymorphisms
Given the effect of lineage length on estimates of the doublet mutation rate, the obvious step after considering substitutions among primates is to turn to human polymorphisms. Not only is the human genome the subject of intensive efforts to determine polymorphisms (The International SNP Map Working Group 2001), but also humans have a lower effective population size than other primates (Kaessmann et al. 2001), and so offer the shortest possible lineage length among the primates, and thus the weakest sequence context effects. Because the average age, measured in generations, of neutral polymorphisms is four times the effective population size (Kimura 1983), estimates of a human effective population size of 10,000 (Jorde, Watkins, and Bamshad 2001) and a generation time of 25 years (Keightley and Eyre-Walker 2000) indicate that human polymorphisms have an average age of roughly 1 million years (possibly lower if ancestral generation times and effective population sizes were lower). Thus the lineage length of human polymorphisms is roughly five times shorter than the lineage length of human and chimpanzee lineage-specific substitutions (see table 2).
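The age estimate follows from a one-line calculation, sketched here with the values quoted in the text (the function name is hypothetical):

```python
def mean_polymorphism_age_years(effective_size, generation_years):
    """Average age of a neutral polymorphism: 4 * Ne generations, in years."""
    return 4 * effective_size * generation_years


if __name__ == "__main__":
    print(mean_polymorphism_age_years(10_000, 25))
```

Smaller ancestral effective sizes or shorter generation times would lower this figure proportionally.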
An excess of tandem SNP's relative to the number expected assuming no polymorphism level heterogeneity has previously been observed on human chromosome 22 (Dawson et al. 2001). Applying the Doe method to the singleton and doublet data given by Dawson et al. generates a doublet mutation rate of 0.70%. This value, however, is likely to be a serious overestimate because CG hypermutation mediated tandems (see Materials and Methods) were not removed from their analysis (indeed Dawson et al. (2001) found TG↔CA tandems to be the most common), and because there is extreme regional variation in levels of polymorphism (see Dawson et al., Figure 1).
The estimate of the doublet mutation rate Dtn using TSC SNP data, 0.27%, appears reasonable, although the results should be viewed with caution because the sampling of polymorphisms over the human genome is clearly patchy. The doublet mutation rate will only be underestimated if the sampling process is biased toward looking for polymorphisms at single nonadjacent sites, a bias which seems unlikely. Just as with the substitution comparisons, Dtn is much less than Doe. The large difference between the two doublet mutation rate estimates and the 25-fold excess of observed versus expected tandems probably reflect the fact that both polymorphism levels (see Figure 2b and supplementary material of The International SNP Map Working Group [2001]) and sampling intensity vary greatly across human chromosomes.
Discussion |
---|
It is desirable to consider the various possible types of doublet mutations to see if the results of mutagenesis studies match with the results of sequence comparisons. For example, it is interesting to see whether CC to TT mutations, which have been found in the p53 gene as the result of UV exposure (Nakazawa et al. 1994), are revealed by our sequence comparisons. The most appropriate data for this purpose are the human and chimpanzee lineage-specific substitutions, because the use of the baboon as an outgroup allows the direction of substitution to be inferred.
Because mutations affecting both strands need to be considered, we compare tandems and near-neighbors for GG to AA changes as well as CC to TT changes (GNG to ANA and CNC to TNT for near-neighbors). There are six tandems and two near-neighbors in our sequence comparisons, and this excess of tandems is consistent with mutagenesis studies, although the difference is not significant because of the small numbers of differences. The lack of data in the present study precludes a full analysis of all possible types of doublets, so the generation of additional SNP data should help such studies considerably. Linkage information will allow a more discriminating test of doublet mutations: apart from rare recombination events doublet SNP's should be in perfect linkage disequilibrium. Outgroup sequences, such as those from chimpanzee, will allow the determination of the direction of mutation in polymorphism studies, and thus the more accurate analysis of mutation processes.
What is the significance of our finding of a doublet mutation rate of 0.3% as opposed to the estimate of 2% obtained by Averof et al. (2000)? Does this discrepancy in doublet mutation rates result from the differences between the Doe and Dtn methods or perhaps from the differences between the data sets? We can discount the latter possibility in two ways. First, we reanalyzed the data set of Averof et al. (2000). Their alignment of primate pseudo eta globin sequences, masked for CpG mutations, was kindly provided by M. Averof. We used the program baseml in the PAML package (Yang 1997) to reconstruct the ancestral sequences and we analyzed the changes down the lineage leading to rhesus monkey (this is the longest lineage, with the most data, and it provides the strongest signal of doublet mutations). Using this method, slightly different from that employed by Averof et al. (2000), we identified 232 substitutions, 17 tandems, and 14 near-neighbors and calculated the expected number of tandems as 8.29. Thus the Doe method yields a doublet mutation rate of 3.8%, whereas the Dtn method gives 1.3%, suggesting that the two methods give different results in both data sets. Second, we can partition our human and chimp lineage-specific results according to alignment-specific substitution rates. This partitioning enables a test of the idea that our human-chimpanzee-baboon data set may contain selectively constrained regions in which doublet mutations may be particularly constrained (unlike the pseudogene analyzed by Averof et al. (2000), which is almost certainly free of constraint). If so, we would predict the doublet mutation rate to be lower in regions with low substitution rates. However, this prediction does not hold true (Dtn is 0.58% in low substitution rate alignments and 0.18% in high substitution rate alignments, varying as expected given that the doublet mutation rate is conditioned on the singleton mutation rate).
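The reanalysis figures quoted above can be checked numerically. The defining equations for Doe and Dtn are not reproduced in this text, so the sketch below rests on an assumption: that each rate is the inferred number of doublets (To − Te for Doe, To − NNo for Dtn) divided by the observed number of substitutions So. That normalization reproduces the rounded percentages reported for the pseudo eta globin data, but the authors' exact denominator may differ.

```python
def doublet_rates(s_o, t_o, nn_o, t_e):
    """Return (D_oe, D_tn) as fractions.

    ASSUMPTION: both rates are normalized by the observed number of
    substitutions S_o; the source text does not show the exact formula.
    """
    d_oe = (t_o - t_e) / s_o    # observed vs. expected tandems
    d_tn = (t_o - nn_o) / s_o   # tandems vs. near-neighbors
    return d_oe, d_tn


if __name__ == "__main__":
    d_oe, d_tn = doublet_rates(s_o=232, t_o=17, nn_o=14, t_e=8.29)
    print(f"D_oe = {100 * d_oe:.1f}%  D_tn = {100 * d_tn:.1f}%")
```

The gap between the two values comes entirely from replacing the model-based expectation Te with the empirical near-neighbor count NNo.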
Given that we can be confident that the Doe and Dtn methods yield different results when applied to real sequence data, an important conclusion of this study is the strength of regional variation and sequence context, effects we consider responsible for most of the excess of observed tandems relative to the expectation with no rate heterogeneity. The results of our study of primate genomic sequences are thus consistent with regional variation in synonymous and noncoding substitution rates across mammalian nuclear genomes (Lercher, Williams, and Hurst 2001; Tsunoyama, Bellgard, and Gojobori 2001; Smith, Webster, and Ellegren 2002), sequence context in chloroplasts (Morton, Oberholzer, and Clegg 1997), sequence context in mitochondria (Howell and Smejkal 2000), and the utility of the auto-discrete-gamma model to describe sequence evolution in primate mitochondria (Yang 1995).
The direct effect of a low doublet mutation rate is obviously a diminished belief in the importance of doublet mutations. Such a low rate suggests that doublet mutations cannot fully explain the strong correlation between synonymous and nonsynonymous substitution rates, KS and KA, observed in mammals (Wolfe and Sharp 1993). In the comparison presented by Wolfe and Sharp (1993), of 363 genes in mouse and rat, there were on average 28 nonsynonymous and 56 synonymous differences per gene after correction for multiple hits. A doublet mutation rate of 0.3% means that the expected number of doublets is only 0.25: thus over three quarters of genes will not be affected by doublet mutation. The effect of doublets on the KA-KS correlation can be investigated by simulating substitutions according to Poisson distributions based on the data of Wolfe and Sharp (1993), assuming that all genes have independent synonymous and nonsynonymous singleton substitution rates based on the mean numbers of substitutions and numbers of sites (hence the expected KA-KS correlation coefficient in the absence of doublet mutations is zero). If all doublet mutations are considered to generate one synonymous and one nonsynonymous mutation (a conservative assumption for our purposes), the expected increase in the KA-KS correlation coefficient arising from a doublet mutation rate of 0.3% is just 0.005. Applying the Averof et al. (2000) doublet mutation rate of 2% means 1.7 doublets on average, which increases the expected KA-KS correlation coefficient by 0.042. Thus it seems highly unlikely that doublet mutations are solely responsible for the observed increase in the KA-KS correlation coefficient above the neutral expectation (Ohta 1995), given that this has been quantified as an increase of 0.141 in the mouse-rat study of Smith and Hurst (1999).
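The per-gene doublet expectations quoted above follow from simple arithmetic, sketched below. The sketch assumes that the number of doublets per gene is Poisson distributed, in keeping with the Poisson simulations described in the text; it does not attempt to reproduce the correlation coefficients, and the function names are hypothetical.

```python
import math


def expected_doublets(mean_substitutions, doublet_rate):
    """Expected doublets per gene: rate times mean substitutions per gene."""
    return mean_substitutions * doublet_rate


def fraction_unaffected(mean_doublets):
    """Poisson probability that a gene carries no doublet at all."""
    return math.exp(-mean_doublets)


if __name__ == "__main__":
    subs_per_gene = 28 + 56   # nonsynonymous + synonymous (Wolfe and Sharp 1993)
    for rate in (0.003, 0.02):
        m = expected_doublets(subs_per_gene, rate)
        print(f"rate {rate:.1%}: {m:.2f} doublets per gene, "
              f"{fraction_unaffected(m):.0%} of genes unaffected")
```

At a rate of 0.3% the zero class exceeds three quarters of genes, whereas at 2% most genes would be expected to carry at least one doublet.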
Literature Cited |
---|
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.
Averof, M., A. Rokas, K. H. Wolfe, and P. M. Sharp. 2000. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science 287:1283-1286.
Bielawski, J. P., K. A. Dunn, and Z. H. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: Approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.
Chen, F. C., and W. H. Li. 2001. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am. J. Hum. Genet 68:444-456.
Dawson, E., Y. Chen, S. Hunt, et al. 2001. A SNP resource for human chromosome 22: extracting dense clusters of SNPs from the genomic sequence. Genome Res 11:170-178.
Gillespie, J. H. 1991. The causes of molecular evolution. Oxford University Press, Oxford.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol 22:160-174.
Howell, N., and C. B. Smejkal. 2000. Persistent heteroplasmy of a mutation in the human mtDNA control region: hypermutation as an apparent consequence of simple-repeat expansion/contraction. Am. J. Hum. Genet 66:1589-1598.
Jorde, L. B., W. S. Watkins, and M. J. Bamshad. 2001. Population genomics: A bridge from evolutionary history to genetic medicine. Hum. Mol. Genet 10:2199-2207.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-123 in H. N. Munro, ed., Mammalian protein metabolism, Academic Press, New York.
Kaessmann, H., V. Wiebe, G. Weiss, and S. Paabo. 2001. Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nat. Genet 27:155-156.
Keightley, P. D., and A. Eyre-Walker. 2000. Deleterious mutations and the evolution of sex. Science 290:331-333.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
Krawczak, M., E. V. Ball, and D. N. Cooper. 1998. Neighboring-nucleotide effects on the rates of germ-line single-base-pair substitution in human genes. Am. J. Hum. Genet 63:474-488.
Lercher, M. J., E. J. B. Williams, and L. D. Hurst. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol. Biol. Evol 18:2032-2039.
Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol 9:786-791.
Morton, B. R., V. M. Oberholzer, and M. T. Clegg. 1997. The influence of specific neighboring bases on substitution bias in noncoding regions of the plant chloroplast genome. J. Mol. Evol 45:227-231.
Nakazawa, H., D. English, P. L. Randell, K. Nakazawa, N. Martel, B. K. Armstrong, and H. Yamasaki. 1994. UV and skin-cancer-specific P53 gene mutation in normal skin as a biologically relevant exposure measurement. Proc. Natl. Acad. Sci. USA 91:360-364.
Ohta, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol 40:56-63.
Pruitt, K. D., K. S. Katz, H. Sicotte, and D. R. Maglott. 2000. Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet 16:44-47.
Purvis, A. 1995. A composite estimate of primate phylogeny. Phil. Trans. R. Soc. Lond. B 348:405-421.
Shibutani, S., N. Suzuki, X. Z. Tan, F. Johnson, and A. P. Grollman. 2001. Influence of flanking sequence context on the mutagenicity of acetylaminofluorene-derived DNA adducts in mammalian cells. Biochemistry 40:3717-3722.
Smith, N. G. C., and L. D. Hurst. 1999. The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics 153:1395-1402.
Smith, N. G. C., M. T. Webster, and H. Ellegren. 2002. Deterministic mutation rate variation in the human genome. Genome Res 12:1350-1356.
Templeton, A. R., A. G. Clark, K. M. Weiss, D. A. Nickerson, E. Boerwinkle, and C. F. Sing. 2000. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet 66:69-83.
The International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673-4680.
Tsunoyama, K., M. I. Bellgard, and T. Gojobori. 2001. Intragenic variation of synonymous substitution rates is caused by nonrandom mutations at methylated CpG. J. Mol. Evol 53:456-464.
Wolfe, K. H., and P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol 37:441-456.
Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. J. Mol. Evol 39:306-314.
Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. Genetics 139:993-1005.
Yang, Z. 1996. The among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol 11:367-372.
Yang, Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci 13:555-556.
Zavolan, M., and T. B. Kepler. 2001. Statistical inference of sequence-dependent mutation rates. Curr. Opin. Genet. Dev 11:612-615.