How Important Is DNA Replication for Mutagenesis?

Gavin A. Huttley2,*, Ingrid B. Jakobsen*,†, Susan R. Wilson‡ and Simon Easteal*

*John Curtin School of Medical Research, Australian National University, Canberra, Australia;
†Institute of Molecular Evolutionary Genetics, Pennsylvania State University; and
‡Centre for Mathematics and its Applications, Australian National University, Canberra, Australia


    Abstract
 
Rates of mutation and substitution in mammals are generally greater in the germ lines of males. This is usually explained as resulting from the larger number of germ cell divisions during spermatogenesis compared with oogenesis, with the assumption made that mutations occur primarily during DNA replication. However, the rate of cell division is not the only difference between male and female germ lines, and mechanisms are known that can give rise to mutations independently of DNA replication. We investigate the possibility that there are other causes of male-biased mutation. First, we show that patterns of variation at ~5,200 short tandem repeat (STR) loci indicate a higher mutation rate in males. We estimate a ratio of male-to-female mutation rates of ~1.9. This is significantly greater than 1 and supports a greater rate of mutation in males, affecting the evolution of these loci. Second, we show that there are chromosome-specific patterns of nucleotide and dinucleotide composition in mammals that have been shaped by mutation at CpG dinucleotides. Comparable patterns occur in birds. In mammals, male germ lines are more methylated than female germ lines, and these patterns indicate that differential methylation has played a role in male-biased vertebrate evolution. However, estimates of male mutation bias obtained from both classes of mutation are substantially lower than estimates of cell division bias from anatomical data. This discrepancy, along with published data indicating slipped-strand mispairing arising at STR loci in nonreplicating DNA, suggests that a substantial percentage of mutation may occur in nonreplicating DNA.


    Introduction
 
Mutation in the germ line is a significant contributor to human disease and is also the ultimate source of genetic novelty, and thus of evolutionary potential. The factors that determine the rate of mutation are therefore important determinants of both patterns of disease and evolutionary processes. One factor influencing the rate of germ line mutation appears to be sex. It is now clear that mutation rates are greater in male than in female germ lines. This was first suggested by Haldane (1935, 1946, 1948) and has been confirmed by recent comparisons of the sequences of genes on X and Y chromosomes and autosomes (Miyata et al. 1987; Shimmin, Chang, and Li 1993; Chang et al. 1994; Agulnik et al. 1997; Ellegren and Fridolfsson 1997; Huang et al. 1997; McVean and Hurst 1997). In humans, this may have considerable medical and public health implications (Crow 1995, 1997; Thomas 1996), and it is important to understand the underlying causes. The mutation rate difference is usually interpreted as resulting from a sex-specific difference in DNA replication frequency originating from different numbers of germ cell divisions (Vogel and Rathenberg 1975), although this interpretation has been questioned and involvement of other factors has been suggested (Easteal, Collet, and Betty 1995; Sommer and Ketterling 1996; Easteal and Herbert 1997; Hurst and Ellegren 1998).

In the comparative sequence studies referred to above, the ratio of substitution rates in Y- and X-linked sequences (Y and X, respectively) was used to estimate α_υ (the ratio of male to female mutation rates) with the equation Y/X = 3α_υ/(2 + α_υ). Similarly, the ratio of X and A (autosome-linked sequences) was used to predict α_υ with the equation X/A = (2/3)(2 + α_υ)/(1 + α_υ) (Miyata et al. 1987; Chang et al. 1994). This approach has been applied to an assortment of genes showing that substitution rates occur according to the general inequality X < A < Y (Miyata et al. 1987), resulting in α_υ estimates ranging from 2 to ∞ in mammals (Miyata et al. 1987; Shimmin, Chang, and Li 1993; Chang et al. 1994; Agulnik et al. 1997; Ellegren and Fridolfsson 1997; Huang et al. 1997; McVean and Hurst 1997).
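The two equations above can be inverted to recover the male-to-female ratio from an observed rate ratio. The following is a minimal sketch of that algebra only; the function names are hypothetical, and the handling of ratios at or beyond the limiting values (Y/X = 3, X/A = 2/3) is an assumption made for illustration.

```python
import math


def alpha_from_y_over_x(ratio):
    """Invert Y/X = 3a / (2 + a) for a, giving a = 2R / (3 - R)."""
    if ratio >= 3.0:
        # Y/X approaches 3 as a grows without bound, so no finite solution exists.
        return math.inf
    return 2.0 * ratio / (3.0 - ratio)


def alpha_from_x_over_a(ratio):
    """Invert X/A = (2/3)(2 + a) / (1 + a) for a, giving a = (4 - 3R) / (3R - 2)."""
    if ratio <= 2.0 / 3.0:
        # X/A approaches 2/3 from above as a grows without bound.
        return math.inf
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)
```

Because Y/X saturates at 3 and X/A at 2/3, small changes in an observed ratio near those limits translate into very large changes in the inferred ratio, which is consistent with published estimates extending to infinity.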

These studies show that a difference in DNA replication frequency is a plausible cause of sex-specific differences in mutation rate, but they do not exclude other possible causes. The simple interpretation that DNA mutates more rapidly in male germ lines because it replicates more often there ignores the complexities of germ cell biology. Other differences between male and female germ cells that might affect mutation rates include the differential expression of genes encoding DNA repair enzymes (Allen et al. 1995; Blackshear et al. 1998) and differential DNA methylation (Driscoll and Migeon 1990; Bestor 1998). Furthermore, the dependence of mutation on replication is incomplete. Perhaps the best demonstration of this comes from studies of mutation in reporter genes in transgenic mice. These studies show that the rates of mutation are not significantly different between rapidly and slowly dividing cells (Lee et al. 1994; Nishino et al. 1995). Additionally, DNA repair mechanisms that underlie replication-independent single-base mutagenesis are known (Friedberg 1985; Francino et al. 1996). If these processes account for a substantial proportion of total mutations, then, assuming they are sexually unbiased, α_υ will be less than the ratio of male to female germ cell divisions (α).
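The last point can be illustrated with a simple additive model, which is an assumption introduced here for illustration and not a model fitted in this study. Suppose each sex's mutation rate is a replication-dependent part proportional to its number of germ cell divisions plus a replication-independent part that is the same in both sexes, and let p be the replication-independent share of the female rate. The expected ratio of male to female mutation rates is then α(1 - p) + p, which is below α whenever p > 0. Function names are hypothetical.

```python
def predicted_mutation_ratio(division_ratio, independent_share):
    """Male/female mutation ratio under the additive model.

    division_ratio: male/female ratio of germ cell divisions (alpha).
    independent_share: fraction p of the female mutation rate that is
        replication independent and assumed equal in both sexes.
    """
    return division_ratio * (1.0 - independent_share) + independent_share


def replication_independent_share(division_ratio, mutation_ratio):
    """Solve alpha*(1 - p) + p = mutation_ratio for p (requires alpha != 1)."""
    return (division_ratio - mutation_ratio) / (division_ratio - 1.0)
```

For example, under this model a division ratio of 10 combined with p = 0.5 predicts a mutation ratio of 5.5, roughly half of what the replication-only interpretation would predict.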

In addition to differences in frequency of DNA replication, other distinctions between the germ lines of males and females may also contribute to male-biased evolution. The germ lines of male mammals are more methylated than those of females (Driscoll and Migeon 1990). The transition mutation C->T occurs with much higher frequency at methylated than at unmethylated C nucleotides (Coulondre et al. 1978; Duncan and Miller 1980), and these changes have been shown to occur at high frequencies in human pedigrees (Cooper and Youssoufian 1988; Sommer and Ketterling 1996). Thus, the difference in levels of methylation may account for at least some of the male bias in mutation and evolutionary rates.

If methylation of CpG dinucleotides has a major effect on mutation, then genes will exhibit patterns of nucleotide and dinucleotide composition that are determined in part by the amount of time spent in male germ lines. An enhanced mutation rate due to higher methylation levels in the male germ line suggests the following explicit predictions: (1) the CpG dinucleotide and the G and C nucleotide contents will occur in a chromosome-specific pattern of X > A > Y, and (2) there will be an elevated rate of C->T transitions for Y-linked or autosomal genes relative to their X-linked homologs.

A male bias in mutation rate is also indicated for short tandem repeat (STR) loci (Weber and Wong 1993; Brinkmann et al. 1998), although this indication is based on a small sample of loci. A reduced level of variation for X-linked STR loci (Weissenbach et al. 1992) is consistent with this. Slipped-strand mispairing, which results in a stepwise change in the number of tandem repeats, has been hypothesized to cause the high frequency of mutations at these loci (Levinson and Gutman 1987; Weber and Wong 1993). It is generally thought that slipped-strand mispairing occurs during DNA replication, although Levinson and Gutman (1987) showed that it could, theoretically, occur in either replicating or nonreplicating DNA, and Hentschel (1982) showed experimentally that it could occur in nonreplicating DNA. Mutation arising through slipped-strand mispairing is consistent with the stepwise mutation model (SMM; Ohta and Kimura 1973; Shriver et al. 1993; Brinkmann et al. 1998; Di Rienzo et al. 1998). The substantial body of population genetic theory based on this model can thus be applied to STR loci and used to investigate the role of male-biased mutation in shaping population levels, and genomic patterns, of variation.

Here we investigate factors affecting the rates of mutation at both STR loci and individual nucleotides. We address two specific questions: (1) Does the chromosomal distribution of genetic variation at STR loci support the male bias in mutation predicted by the mutation-through-DNA-replication hypothesis? (2) Do the chromosomal distributions of nucleotides and dinucleotides indicate an effect of differential methylation on the male bias in mutation rates at individual nucleotides?


    Materials and Methods
 
STR Analysis
Data
Genotype data for 5,054 autosomal and 214 X-linked STR loci resolved by the GÉNÉTHON gene mapping project using the European Utah and Amish CEPH families 1331, 1332, 1347, 1362, 1413, 1416, and 884 were obtained from http://www.genethon.fr/genethon_en.html (Weissenbach et al. 1992; Gyapay et al. 1994; Dib et al. 1996). Only genotypes from the grandparents, half of whom were male, were used (54 haploid genomes). An analysis of STR loci supports the treatment of these families as representing a single population (Huttley et al. 1999).

Theory
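The relationship between average homozygosity and mutation rate under the SMM is

$$h = \frac{1}{\sqrt{1 + 8N\upsilon}},$$

where h is the average homozygosity, N is the effective population size, and υ is the mutation rate per chromosome (Ohta and Kimura 1973). Note that the effective population sizes for X-linked (N_x) and autosomal (N_a) genes have the simple relationship

$$N_x = \tfrac{3}{4} N_a.$$

In males, the mutation rates for X- and autosome-linked genes are equal, and in females the same applies. Denoting the autosomal sex-averaged mutation rate per locus as υ_a, the mutation rate per X chromosome locus as υ_x, the male mutation rate as υ_m, and the female mutation rate as υ_f,

$$\upsilon_a = \frac{\upsilon_m + \upsilon_f}{2}, \qquad \upsilon_x = \frac{\upsilon_m + 2\upsilon_f}{3}.$$

Solving these two equations in two unknowns gives

$$\upsilon_m = 4\upsilon_a - 3\upsilon_x, \qquad \upsilon_f = 3\upsilon_x - 2\upsilon_a.$$

From these equations, it can be shown that the ratio of male and female mutation rates is independent of N and can be obtained thus:

$$\alpha_\upsilon = \frac{\upsilon_m}{\upsilon_f} = \frac{4\upsilon_a - 3\upsilon_x}{3\upsilon_x - 2\upsilon_a} = \frac{2\left(h_a^{-2} - h_x^{-2}\right)}{2h_x^{-2} - h_a^{-2} - 1},$$

where h_a and h_x are the average homozygosities of autosomal and X-linked loci, respectively.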
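As a worked illustration of this calculation, the sketch below assumes the standard SMM relationship h = 1/sqrt(1 + 8Nυ), N_x = (3/4)N_a, υ_a = (υ_m + υ_f)/2, and υ_x = (υ_m + 2υ_f)/3. The function names are hypothetical. The inputs are the rounded mean homozygosities reported in the Results (0.299 and 0.357), so the output is expected to be close to, but not identical with, the estimate obtained from the unrounded data.

```python
def n_upsilon(h):
    """N * upsilon implied by average homozygosity h, from h = 1/sqrt(1 + 8*N*upsilon)."""
    return (1.0 / h ** 2 - 1.0) / 8.0


def alpha_from_homozygosity(h_a, h_x):
    """Ratio of male to female mutation rates from autosomal and X-linked homozygosity."""
    theta_a = n_upsilon(h_a)              # N_a * upsilon_a
    theta_x = n_upsilon(h_x) * 4.0 / 3.0  # N_x * upsilon_x rescaled to N_a * upsilon_x
    male = 4.0 * theta_a - 3.0 * theta_x    # proportional to upsilon_m
    female = 3.0 * theta_x - 2.0 * theta_a  # proportional to upsilon_f
    return male / female                    # the common factor N_a cancels


estimate = alpha_from_homozygosity(0.299, 0.357)
```

With these rounded inputs the ratio comes out at approximately 1.9. The effective population size never has to be specified, because it appears as a common factor in the numerator and denominator.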


Two types of sampling variation affect the estimate of α_υ: (1) variation in homozygosities arising from the individuals sampled and (2) variation in homozygosities arising from the loci sampled. In our analysis, we sampled a large number of loci (~5,200) from a relatively small number of individuals (~27). The first source of sampling variance is thus likely to be more important. We therefore estimated the 95% confidence interval by resampling over individuals. Specifically, the 95% confidence interval of α_υ was estimated from 1,000 randomizations of the data, where each randomization involved drawing with replacement samples of n haploid genomes and re-estimating homozygosities for each locus. The value of n was constrained to equal the number of X chromosomes sampled, since part of the sampled data were for males. α_υ was re-estimated from these randomized data sets.
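The resampling scheme can be sketched as follows. This is an illustrative simplification, not the code used in the study: the genotypes are simulated from hypothetical allele frequencies, homozygosity is taken as the plain sum of squared sample allele frequencies, autosomal and X-linked genomes are resampled independently, the number of X chromosomes (40) is a placeholder, and only 200 replicates are run to keep the example fast.

```python
import random
from collections import Counter


def homozygosity(alleles):
    """Sum of squared allele frequencies in a sample of haploid genomes."""
    n = len(alleles)
    return sum((count / n) ** 2 for count in Counter(alleles).values())


def mean_homozygosity(loci, indices):
    """Mean homozygosity over loci, using only the genomes listed in indices."""
    return sum(homozygosity([locus[i] for i in indices]) for locus in loci) / len(loci)


def alpha_estimate(auto_loci, x_loci, idx_a, idx_x):
    a = 1.0 / mean_homozygosity(auto_loci, idx_a) ** 2 - 1.0  # 8 * N_a * upsilon_a
    x = 1.0 / mean_homozygosity(x_loci, idx_x) ** 2 - 1.0     # 6 * N_a * upsilon_x
    return 2.0 * (a - x) / (2.0 * x - a)


def bootstrap_ci(auto_loci, x_loci, n, reps=1000, seed=0):
    """Percentile 95% interval from resampling n haploid genomes with replacement."""
    rng = random.Random(seed)
    n_auto, n_x = len(auto_loci[0]), len(x_loci[0])
    values = []
    for _ in range(reps):
        idx_a = [rng.randrange(n_auto) for _ in range(n)]
        idx_x = [rng.randrange(n_x) for _ in range(n)]
        values.append(alpha_estimate(auto_loci, x_loci, idx_a, idx_x))
    values.sort()
    return values[int(0.025 * reps)], values[int(0.975 * reps) - 1]


def simulate_loci(rng, n_loci, n_genomes, freqs):
    """Hypothetical data: each locus is a list of allele labels, one per haploid genome."""
    labels = list(range(len(freqs)))
    return [rng.choices(labels, weights=freqs, k=n_genomes) for _ in range(n_loci)]


rng = random.Random(1)
autosomal = simulate_loci(rng, 150, 54, [0.45, 0.25, 0.15, 0.10, 0.05])
x_linked = simulate_loci(rng, 50, 40, [0.52, 0.25, 0.13, 0.07, 0.03])
lo, hi = bootstrap_ci(autosomal, x_linked, n=40, reps=200)
```

Resampling genomes rather than loci keeps the same set of individuals across all loci within a replicate, which is the source of variation the interval is meant to capture.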

Sequence Analysis
Data
To assess the hypothesis that the proportion of CpG dinucleotides decreases in frequency at homologous loci according to the inequality X > A > Y, we used genes with homologs either on an autosome and the X chromosome or on the X and Y chromosomes. For the X- and Y-linked comparison (X : Y), we used the zinc finger Zfx/Zfy and selected mouse cDNA Smcx/Smcy genes. For the equivalent avian W- versus Z-linked comparison (W : Z), we used the genes encoding chromo-helicase DNA-binding proteins Chdw/Chdz. For X chromosome versus autosome comparisons (X : A), we used the genes phosphoglycerate kinase (EC 2.7.2.3) Pgk-1 (X-linked)/Pgk-2 (autosomal) and pyruvate dehydrogenase (EC 1.2.4.1) E1α subunit Pdha-1 (X-linked)/Pdha-2 (autosomal). Accession numbers for the sequences used are as follows: Zfx/Zfy for Homo sapiens (humans; AF022232, X58926) and Mus musculus (mice; X58927, X14382); Smcx/Smcy for humans (Z29650, U52365) and mice (Z29651, Z29652); Pgk-1/Pgk-2 for humans (V00572, X05246) and mice (X55309, X55310); Pdha-1/Pdha-2 for humans (L13318, M86808) and mice (M76727, M76728); and Chdw/Chdz for Ficedula albicollis (Y12929, Y12933), Hirundo rustica (Y12930, Y12934), Luscinia svecica (Y12937, Y12938), Phylloscopus sibilatrix (Y12931, Y12935), and Phylloscopus trochilus (Y12932, Y12936). Sequences were aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1994), and only homologous regions were used in the analyses.

Statistics
If methylation has little bearing on sex-specific differences in mutation rate, the incidence of CpG dinucleotides should be influenced primarily by nucleotide composition and not chromosomal origin. Since the data can be treated as a multidimensional contingency table, we tested this null hypothesis using log-linear modeling as implemented in the program GLIM (generalized linear interactive modeling). To evaluate the importance of sex-biased methylation and mutation of CpG dinucleotides, we constructed three models, distinguished by their interaction terms. Each model was fitted to the data using maximum likelihood, assuming a Poisson error distribution. To assess whether the data support a more complex model, the fits of the simpler and more complex models can be compared. Model fit is measured as the deviance (likelihood ratio test statistic) D = -2 ln(Lc/Lf), where Lc is the likelihood estimated under the model of interest and Lf is the likelihood estimated under the full model (for further details see Healy 1988; McCullagh and Nelder 1989; Francis, Green, and Payne 1993). To illustrate our application of log-linear modeling, consider a simplified example with the following factors: chromosome, dinucleotide base 1, and dinucleotide base 2. The numbers of levels for the factors are as follows: chromosome, 2; base 1, 4; and base 2, 4. Under the most saturated log-linear model employed, the log of the expected frequency f_ijk for dinucleotide jk on chromosome i can be expressed as

$$\ln f_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \kappa_y,$$

where μ represents the intercept (i.e., common to all dinucleotide frequencies); α_i, the contribution to the frequency of being on chromosome class i; β_j, the contribution to the frequency of being base 1 nucleotide j; and γ_k, the contribution to the frequency of being base 2 nucleotide k. The interaction terms (αβ)_ij, (αγ)_ik, and (βγ)_jk express the degree of nonindependence between any of the factors (see below). The effect of methylation on dinucleotide y is represented by the term κ_y.

Table 1 presents the degrees of freedom and a complete list of terms for each model used. First, we fit model I, the baseline, incorporating interactions between dinucleotide position and chromosome class but no interactions between the two bases of the dinucleotides. Next, model II was fitted. Model II specifies a unique parameter for each dinucleotide combination and assumes no interaction between the dinucleotide combination and the chromosome class. If the two bases composing the dinucleotides occur nonrandomly, the (βγ)_jk term will significantly improve the fit to the data. Significance of the (βγ)_jk contribution is assessed by comparing the difference between the D values of models I and II with a χ² distribution with nine degrees of freedom (the difference in degrees of freedom between model II and model I). Finally, we assessed whether model III, which incorporates an additional methylation parameter and has 23 degrees of freedom, gave a significant improvement in the fit to the data. Because methylation occurs only at CpG dinucleotides and is argued to occur in a chromosome-biased fashion, the methylation term applies specifically to CpG dinucleotides that occur on either the Y chromosome or the autosomes. The result is an additional classificatory factor for the table with two levels. The significance of this term is assessed by comparing the difference between the D values of model II and model III with a χ² distribution with one degree of freedom.
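The nested comparison of models I, II, and III can be sketched without GLIM. The example below is a hedged illustration only: it fits the three log-linear models by iterative proportional fitting, a standard way of obtaining maximum likelihood fits for models defined by margins, and it uses hypothetical dinucleotide counts for a single X : Y comparison rather than data from this study. All names are placeholders.

```python
import math

BASES = "ACGT"

# Hypothetical counts (rows: base 1, columns: base 2); not data from the paper.
TOY = {
    "X": [[30, 20, 25, 25], [28, 22, 10, 24], [26, 21, 24, 19], [18, 22, 27, 23]],
    "Y": [[33, 19, 24, 27], [29, 20, 4, 26], [25, 20, 22, 20], [20, 21, 30, 25]],
}
counts = {(c, b1, b2): float(TOY[c][j][k])
          for c in TOY for j, b1 in enumerate(BASES) for k, b2 in enumerate(BASES)}


def fit(counts, margins, sweeps=500):
    """Rescale fitted values until every listed margin matches the observed margin."""
    fitted = dict.fromkeys(counts, 1.0)
    for _ in range(sweeps):
        for key in margins:
            observed, expected = {}, {}
            for cell, n in counts.items():
                observed[key(cell)] = observed.get(key(cell), 0.0) + n
                expected[key(cell)] = expected.get(key(cell), 0.0) + fitted[cell]
            for cell in fitted:
                fitted[cell] *= observed[key(cell)] / expected[key(cell)]
    return fitted


def deviance(counts, fitted):
    """Poisson deviance D = 2 * sum(n * ln(n / f) - (n - f))."""
    return 2.0 * sum((n * math.log(n / fitted[c]) if n > 0 else 0.0) - (n - fitted[c])
                     for c, n in counts.items())


def chrom_by_base1(cell):
    return cell[0], cell[1]


def chrom_by_base2(cell):
    return cell[0], cell[2]


def base1_by_base2(cell):
    return cell[1], cell[2]


def methylated(cell):
    """Two-level factor: CpG on the male-biased chromosome versus everything else."""
    return cell == ("Y", "C", "G")


model_1 = [chrom_by_base1, chrom_by_base2]
model_2 = model_1 + [base1_by_base2]
model_3 = model_2 + [methylated]
d1, d2, d3 = (deviance(counts, fit(counts, m)) for m in (model_1, model_2, model_3))
```

In this sketch, d1 - d2 would be compared with the 5% critical value of a χ² distribution with nine degrees of freedom (16.92), and d2 - d3 with that for one degree of freedom (3.84).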


Table 1 Nesting of Terms in the Log-Linear Models

 

    Results
 
STR Loci
Figure 1A shows the estimates of Nυ obtained using mean homozygosity estimates of autosomal (h_a ≈ 0.299) and X-linked (h_x ≈ 0.357) STR loci. The normalized distributions from which these estimates were obtained are presented in figure 1B. Using the mean homozygosity estimates from X- and autosome-linked STR loci, the ratio of male/female mutation rates was estimated as α_υ ≈ 1.866. A significant elevation in male over female mutation rates is apparent, as the 95% confidence interval for α_υ does not contain the value 1 (1.34–2.44; fig. 1C). Levels of homozygosity at closely linked loci may be correlated due to linkage disequilibrium (LD). We therefore re-estimated α_υ using only loci that were unlikely to be in LD (Huttley et al. 1999). This reduced data set of 3,019 autosomal and 133 X-linked loci also supports a significantly elevated male mutation rate (α_υ ≈ 1.931; 95% confidence interval 1.37–2.77). Naive resampling of individuals in order to estimate the confidence interval of α_υ results in biased estimates of homozygosity. This bias in homozygosity arises because the observed sample defines the maximum number of observable alleles and, thus, a lower limit to homozygosity. However, the estimates of α_υ are unaffected by this bias. This was determined by repeating the operation, naively resampling the resampled populations (results not shown). Consequently, the distribution of homozygosity at STR loci affirms a male bias in the rate of mutation at these loci, consistent with the results from family studies (Weber and Wong 1993; Brinkmann et al. 1998). The extent of the male-biased mutation, however, is substantially less than would be expected if all mutations were replication-dependent.



Fig. 1. Estimation of α_υ from homozygosity distributions. A, Relationship between homozygosity (h) and N under the stepwise mutation model. B, Distributions of homozygosity for autosomal and X-linked genes. C, Distribution of the ratio of male to female mutation rates (α_υ). The large filled vertical arrow indicates the observed estimate, while the smaller unfilled arrows indicate previous estimates of α_υ (see table 3). A and X refer to autosomal and X-linked genes, respectively.

 
Nucleotide Sequences
Although there are different levels of methylation between the sexes, the frequency of CpG dinucleotides may differ between the X chromosome and the autosomes due to different GC contents. In order to disentangle the effect of methylation at CpG dinucleotides from base composition differences and thus appraise the likely involvement of methylation in the male mutation bias, we performed a log-linear analysis as described in the Materials and Methods section. In cases where two or more species were used, the model I baseline incorporated interaction terms between species and nucleotides, while the additional terms added to models II and III were as described in the Materials and Methods section. The results from analysis of mammalian genes are summarized in table 2 and figure 2. For three genes (Pgk, Zf, and Smc), a significant chromosome effect on base composition is apparent, with the autosomal and Y-linked genes exhibiting lower GC contents than their X-linked counterparts, as predicted by the mutation-of-methylated-dinucleotides hypothesis (fig. 3A). A very strong interaction between neighboring nucleotides (base 1 by base 2 interaction) is apparent for all genes. This effect also occurs in intronic sequences from the human Zf (result not shown) and bird Chd genes, indicating that the nonrandom occurrence of dinucleotides does not stem from codon composition. Incorporating methylation into the model resulted in a significantly improved fit to the data for both X : Y comparisons, with the Zf result being significant after a correction for multiple tests.


Table 2 Comparison of Log-Linear Models Testing for Chromosomal Patterns in Nucleotide and Dinucleotide Composition

 


Fig. 2. Distributions of methylation-influenced dinucleotides: (A) between X-linked genes and their autosomal homologs and (B) between X-linked genes and their Y-linked homologs. The Y-axis represents the observed number of CpG dinucleotides. C, Distribution of residuals for TpG dinucleotides between Z-linked genes and their W-linked homologs in birds. The Y-axis represents the Pearson residual (Francis, Green, and Payne 1993) of the difference between the observed and expected numbers of TpG dinucleotides calculated for each species separately.

 


Fig. 3. Difference in GC content (Δ%GC) between chromosome homologs. Presented are the differences in %GC (Δ%GC) of the autosome or the male chromosome (Y in mammals, Z in birds) genes from their female chromosome (X in mammals, W in birds) homologs. A, Mammalian genes. B, Avian gene. Species abbreviations are as follows: Hsa, Homo sapiens; Mmu, Mus musculus; Hru, Hirundo rustica; Fal, Ficedula albicollis; Lsv, Luscinia svecica; Ptr, Phylloscopus trochilus; Psi, Phylloscopus sibilatrix.

 
Strictly speaking, the sampled dinucleotides are not statistically independent, and the assumed Poisson error distribution may therefore be incorrect. We assessed whether violation of this assumption contributed to the significance of the methylation parameter by reanalyzing the Zf data using only strictly independent (nonoverlapping) dinucleotides. The reanalysis of this smaller data set affirmed the importance of methylation in the evolution of the Zf genes (P = 0.017), indicating that assuming the Poisson error distribution is reasonable.
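The distinction between overlapping and strictly independent (nonoverlapping) dinucleotides can be made concrete with a short sketch. The function names and the toy sequence are hypothetical, and starting the nonoverlapping frame at the first base is an assumption; the reading frame used in the reanalysis is not stated here.

```python
from collections import Counter


def dinucleotide_counts(seq, overlapping=True):
    """Count dinucleotides; adjacent overlapping pairs share a base, nonoverlapping ones do not."""
    seq = seq.upper()
    step = 1 if overlapping else 2
    return Counter(seq[i:i + 2] for i in range(0, len(seq) - 1, step))


def gc_percent(seq):
    """Percentage of G and C nucleotides in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


toy = "ACGCGT"
overlapping = dinucleotide_counts(toy)            # AC, CG, GC, CG, GT
independent = dinucleotide_counts(toy, False)     # AC, GC, GT
```

Nonoverlapping sampling roughly halves the number of observations, which is why the reanalysis uses a smaller data set.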

The mutation-through-DNA-replication hypothesis predicts that in birds, there will be higher male mutation and substitution rates, even though in birds females are the heterogametic sex (Miyata et al. 1987). Evidence has recently been presented that supports this prediction (Ellegren and Fridolfsson 1997). However, a male germ line methylation bias in birds would also generate these observations. We therefore analyzed the intron of Chd (Ellegren and Fridolfsson 1997). Although the sequences are short (~230 bp) and have extremely low GC contents (~25%), we detected significant nonrandomness in the occurrence of dinucleotides (table 2). The detected departure arises in part from an almost complete deficit of CpG (3 of 10 genes have a total of 6 CpG dinucleotides). Additionally, four of five species exhibit an excess of TpG dinucleotides (the dinucleotide that results from a C->T transition at a CpG dinucleotide) over the expected amount (fig. 2C), and in no case was the observed frequency of TpG greater for the female W chromosome. We detect no significant chromosome interaction effects for either dimer abundance or base composition. However, the reduced GC content for the Z-linked genes relative to their W counterparts exhibited by four of the five avian species is consistent with the methylation model (table 2 and fig. 3B).


    Discussion
 
Our analyses suggest that sex-specific mutation rates are affecting levels of variation at STR loci in humans. We estimate α_υ ≈ 1.87 from the significantly different homozygosity distributions of autosomal and X-linked STR loci. This estimate is significantly greater than 1, but like estimates from other studies, it is substantially lower than what is predicted by the mutation-through-DNA-replication hypothesis. Anatomical studies of sectioned testes in humans (Vogel and Rathenberg 1975) have shown that ~30 cell divisions have occurred by puberty and that divisions then continue at the rate of ~23 per year, reaching totals of ~380 at age 28 and ~540 at age 35. In contrast, a total of ~24 cell divisions occur during oogenesis (Vogel and Rathenberg 1975). Thus, the estimated α_υ is substantially lower than α, since α increases from ~8 by age 20 to ~23 by age 35.
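The arithmetic behind these values of α can be checked with a few lines. The sketch below uses only the figures quoted above (~380 male divisions at age 28, ~23 further divisions per year, ~24 female divisions) and extrapolates linearly from age 28; it is meant for post-pubertal ages only, and the constant and function names are hypothetical.

```python
FEMALE_DIVISIONS = 24         # approximate total during oogenesis
MALE_DIVISIONS_AT_28 = 380    # approximate cumulative total at age 28
MALE_DIVISIONS_PER_YEAR = 23  # approximate rate after puberty


def male_divisions(age):
    """Approximate cumulative male germ cell divisions at a post-pubertal age."""
    return MALE_DIVISIONS_AT_28 + MALE_DIVISIONS_PER_YEAR * (age - 28)


def division_ratio(age):
    """Ratio of male to female germ cell divisions (alpha) at the given paternal age."""
    return male_divisions(age) / FEMALE_DIVISIONS
```

Evaluating `division_ratio` at ages 20 and 35 gives values that round to 8 and 23, matching the range quoted in the text.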

The accuracy of the estimated α_υ is dependent on the assumptions of the SMM. The SMM may be violated in two general ways. First, mutations may not all involve a symmetric one-step process occurring in the same fashion at all loci. A modification to the SMM has been described that incorporates a parameter permitting asymmetry in the direction of allele size change (whether repeat numbers increase or decrease; Kimmel and Chakraborty 1996; Kimmel et al. 1996). While the formulation of Kimmel et al. (1996) is in terms of the variance of repeat length and not homozygosity, it can be shown that the ratio α_υ is independent of this mutation parameter in the same way that it is independent of N (result not shown).

Second, the SMM may be violated if mutation rates are locus-specific. In particular, the positive correlation between repeat number and heterozygosity (Weber 1990; Brinkmann et al. 1998) suggests that the mutation rate increases with an increasing number of repeats. However, the distributions of repeat numbers from autosomal and X-linked genes are inconsistent with this hypothesis. The mean numbers of repeats in the alleles of X-linked loci are marginally greater than those for autosomal loci (data not shown). Moreover, the linear relationship between the mutation mean square and repeat number variance measures of Di Rienzo et al. (1998) suggests approximate uniformity of mutation rates among loci.

Another factor that may have influenced the estimated α_υ is a difference in N for males and females. A lower male N would lead to autosomal variation that was lower than expected when compared with X-linked variation. A human tendency for a mildly polygamous mating system, suggested by data from hunter-gatherer bands (Diamond 1992, p. 71), would result in a smaller male N. However, this is unlikely to have occurred to an extent that would explain the discrepancy between the estimated values for α_υ and α.

We propose that the discrepancy between the estimated α_υ and the value of α_υ predicted by the mutation-through-DNA-replication hypothesis (α) reflects substantial replication independence of mutagenesis by slipped-strand mispairing.

Replication-independent mutagenesis of single nucleotides is also known to occur (see MacPhee 1995). This may account for similar discrepancies between observed and expected values of α_υ for individual nucleotides obtained from molecular evolutionary studies. A summary of previous estimates of α_υ derived from comparison of nucleotide sequences is presented in table 3. As pointed out by Huang et al. (1997), the 95% confidence intervals for these estimates must be interpreted with caution, since they incorrectly assume a normal distribution for the Y/X and X/A ratios.


Table 3 Estimates of the Ratio of Male to Female Mutation (α_υ) Based on Individual Nucleotide Substitutions

 
The estimates of α_υ for individual nucleotides are nearly all substantially greater than the estimate we obtained for STR loci. This might suggest a greater role for DNA replication in individual nucleotide mutagenesis. However, chromosome differences in mutation rate could be caused by factors other than differences in DNA replication rates. We have shown that one of these factors, differential DNA methylation, has contributed to the male bias in mutation rate.

In our analysis, a nonrandom occurrence of dinucleotides is apparent for all genes, including the introns of the Zf and Chd genes. The contribution of mutation at methylated CpG dinucleotides predominantly occurring in males is strongly supported for the Zf and Smc genes. Although a significant chromosome effect on nucleotide composition is observed for Pgk, methylation effects are not apparent for any X : A comparisons. Several factors reduce the power to detect an effect for these genes. First, the expected difference between the autosomes and the X chromosome is less than that between the X and Y chromosomes. Second, gene conversion between homologs on different chromosomes, which will act to homogenize the genes, has been reported for several of the genes used in this study (Shimmin et al. 1993; Fitzgerald et al. 1996). Third, most of the sequences compared are coding sequences, which experience selective constraints on changes in sequence composition.

An interesting dimension to the investigation of male-biased mutation has been the prediction that genes on the W chromosome that occur only in the heterogametic females will exhibit reduced rates of mutation and substitution (Miyata et al. 1987). Evidence supporting this prediction has recently been presented (Ellegren and Fridolfsson 1997), but these results might also be explained by a methylation bias between male and female birds. The nonrandom occurrence of dinucleotides indicated by our analysis supports a role for methylation in bird molecular evolution. Additionally, four of the five species exhibit the predicted reduced GC content for the male Z chromosome and an excess (over expectation) of mutation-derived TpG dinucleotides. However, the small sequence length, combined with the very low GC content, reduces the power of this analysis. Nevertheless, the prediction that male avian germ lines are more heavily methylated than female ones warrants empirical evaluation.

DNA methylation is only one of a number of factors other than DNA replication rate that could cause chromosomal differences in genetic variation. Others include differences in effective population size and the associated differences in selection against slightly deleterious mutations (Charlesworth, Morgan, and Charlesworth 1993), adaptive differences in mutation rates (McVean and Hurst 1997), and differences in the expression of genes involved in DNA repair. The first two seem unlikely. An effect of chromosome-specific N is inconsistent with the data from birds, where the substitution rate is highest on the Z chromosome, whereas N is lower for the W chromosome. It also leads to the prediction that the rate of substitution for genes on the X chromosome will be intermediate between those on the Y chromosome and those on the autosomes. A similar inequality of substitution rates should hold if mutation rates have been optimized by selection, since the lowest mutation rates would be expected for Y-linked genes, as they are always hemizygous (McVean and Hurst 1997). Furthermore, the latter hypothesis would predict a substitution rate inequality of W > Z in birds. These predictions are clearly not supported by the data.

Differences in the expression of genes involved in DNA repair between male and female germ lines are known (Allen et al. 1995; Blackshear et al. 1998), so there is a distinct possibility that these differences contribute to differences in mutation rate. However, this possibility remains to be investigated.

Our analyses support a germ line bias in DNA methylation patterns as a factor contributing to a male bias in human mutation rates and in the molecular evolutionary rates of mammals and, plausibly, birds (but not insects, for which DNA methylation does not occur; Tweedie et al. 1997). When this factor is taken into consideration, the effect of DNA replication frequency appears weaker than previously indicated by molecular evolutionary analysis. The paternal age effect observed for achondroplasia, which occurs primarily due to mutations at a single CpG residue (Rousseau et al. 1994; Shiang et al. 1994), emphasizes the importance of mutation at methylated CpG dinucleotides. Further compelling evidence for the role of methylation in male-biased mutation rates comes from detailed analysis of Factor IX mutations within families. The overall ratio of male to female mutations is ~3.5, while the ratio at CpG dinucleotides is ~6 (Sommer and Ketterling 1996). This work also suggests that DNA replication is not a major determinant of mutation rate, since there is a maternal, but no paternal, age effect for single-base substitutions in the Factor IX gene (Sommer and Ketterling 1996).

The age effect on mutation is pertinent to issues of public health and clinical genetics. The contribution of parental sex to age effects appears to depend on the class of mutation. It probably also depends on a complex array of factors other than DNA replication, including a general decreased efficiency of DNA repair with age (Lee et al. 1994). Thus, if DNA replication is of relatively minor importance in determining mutation rates at both single nucleotides and STR loci, then, after adjusting for both mutations at CpG dinucleotides and the impact from a general decline in repair with parental age, a lack of paternal age effect is expected for diseases resulting from both kinds of mutations. Consequently, the concerns that Crow (1995, 1997) and others have raised about the adverse public health impact of men reproducing later in life may have been overstated, at least for these classes of mutations.


    Acknowledgements
 
We thank the anonymous reviewers, R. Attenborough, J. Cummins, P. Donnelly, P. Hall, J. Long, M. Martin, M. Nei, and C. Schlötterer for their constructive comments.


    Footnotes
 
Jeffrey C. Long, Reviewing Editor

1 Keywords: male-biased mutation, short tandem repeat loci, CpG methylation, DNA replication, mammals, mutation

2 Address for correspondence and reprints: Gavin A. Huttley, Human Genetics Group, John Curtin School of Medical Research, Australian National University, Canberra, ACT 0200, Australia. E-mail: gavin.huttley@anu.edu.au


    literature cited

    Agulnik, A. I., C. E. Bishop, J. L. Lerner, S. I. Agulnik, and V. V. Solovyev. 1997. Analysis of mutation rates in the Smcy/Smcx genes shows that mammalian evolution is male driven. Mamm. Genome 8:134–138.

    Allen, J. W., U. H. Ehling, M. M. Moore, and S. E. Lewis. 1995. Germ line specific factors in chemical mutagenesis. Mutat. Res. 330:219–231.

    Bestor, T. H. 1998. Cytosine methylation and the unequal developmental potentials of the oocyte and sperm genomes. Am. J. Hum. Genet. 62:1269–1273.

    Blackshear, P. E., S. M. Goldsworthy, J. F. Foley, K. A. McAllister, L. M. Bennett, N. K. Collins, D. O. Bunch, P. Brown, R. W. Wiseman, and B. J. Davis. 1998. Brca1 and Brca2 expression patterns in mitotic and meiotic cells of mice. Oncogene 16:61–68.

    Brinkmann, B., M. Klintschar, F. Neuhuber, J. Huhne, and B. Rolf. 1998. Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62:1408–1415.

    Chang, B. H., L. C. Shimmin, S. K. Shyue, D. Hewett-Emmett, and W. H. Li. 1994. Weak male-driven molecular evolution in rodents. Proc. Natl. Acad. Sci. USA 91:827–831.

    Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.

    Cooper, D. N., and H. Youssoufian. 1988. The CpG dinucleotide and human genetic disease. Hum. Genet. 78:151–155.

    Coulondre, C., J. H. Miller, P. J. Farabaugh, and W. Gilbert. 1978. Molecular basis of base substitution hotspots in Escherichia coli. Nature 274:775–780.

    Crow, J. F. 1995. Spontaneous mutation as a risk factor. Exp. Clin. Immunogenet. 12:121–128.

    ———. 1997. The high spontaneous mutation rate: is it a health risk? Proc. Natl. Acad. Sci. USA 94:8380–8386.

    Di Rienzo, A., P. Donnelly, C. Toomajian, B. Sisk, A. Hill, M. L. Petzl-Erler, G. K. Haines, and D. H. Barch. 1998. Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148:1269–1284.

    Diamond, J. 1992. The third chimpanzee: the evolution and future of the human animal. HarperCollins, New York.

    Dib, C., S. Faure, C. Fizames et al. (14 co-authors). 1996. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380:152–154.

    Driscoll, D. J., and B. R. Migeon. 1990. Sex difference in methylation of single-copy genes in human meiotic germ cells: implications for X chromosome inactivation, parental imprinting, and origin of CpG mutations. Somat. Cell Mol. Genet. 16:267–282.

    Duncan, B. K., and J. H. Miller. 1980. Mutagenic deamination of cytosine residues in DNA. Nature 287:560–561.

    Easteal, S., C. Collet, and D. Betty. 1995. The mammalian molecular clock. R. G. Landes Company, Austin, Texas.

    Easteal, S., and G. Herbert. 1997. Molecular evidence from the nuclear genome for the time frame of human evolution. J. Mol. Evol. 44(Suppl. 1):S121–S132.

    Ellegren, H., and A. K. Fridolfsson. 1997. Male-driven evolution of DNA sequences in birds. Nat. Genet. 17:182–184.

    Fitzgerald, J., H. H. Dahl, I. B. Jakobsen, and S. Easteal. 1996. Evolution of mammalian X-linked and autosomal Pgk and Pdh E1α subunit genes. Mol. Biol. Evol. 13:1023–1031.

    Francino, M. P., L. Chao, M. A. Riley, and H. Ochman. 1996. Asymmetries generated by transcription-coupled repair in enterobacterial genes. Science 272:107–109.

    Francis, B., M. Green, and C. Payne. 1993. The GLIM system: release 4 manual. Oxford University Press, Oxford, England.

    Friedberg, E. C. 1985. DNA repair. Freeman, New York.

    Gyapay, G., J. Morissette, A. Vignal, C. Dib, C. Fizames, P. Millasseau, S. Marc, G. Bernardi, M. Lathrop, and J. Weissenbach. 1994. The 1993–94 Généthon human genetic linkage map. Nat. Genet. 7:246–339.

    Haldane, J. B. S. 1935. The rate of spontaneous mutation of a human gene. J. Genet. 31:317–326.

    ———. 1946. The mutation rate of the gene for hemophilia, and its segregation ratios in males and females. Ann. Hum. Genet. 3131:262–272.

    ———. 1948. The formal genetics of man. Proc. R. Soc. Lond. B Biol. Sci. 135:147–170.

    Healy, M. J. R. 1988. GLIM: an introduction. Oxford University Press, Oxford, England.

    Hentschel, C. C. 1982. Homocopolymer sequences in the spacer of a sea urchin histone gene repeat are sensitive to S1 nuclease. Nature 295:714–716.

    Huang, W., B. H. J. Chang, X. Gu, D. Hewett-Emmett, and W. Li. 1997. Sex differences in mutation rate in higher primates estimated from Amg intron sequences. J. Mol. Evol. 44:463–465.

    Hurst, L. D., and H. Ellegren. 1998. Sex biases in the mutation rate. Trends Genet. 14:446–452.

    Huttley, G. A., M. W. Smith, M. Carrington, and S. J. O'Brien. 1999. A scan for linkage disequilibrium across the human genome. Genetics 152:1711–1722.

    Kimmel, M., and R. Chakraborty. 1996. Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50:345–367.

    Kimmel, M., R. Chakraborty, D. N. Stivers, and R. Deka. 1996. Dynamics of repeat polymorphisms under a forward-backward mutation model: within- and between-population variability at microsatellite loci. Genetics 143:549–555.

    Lee, A. T., C. DeSimone, A. Cerami, and R. Bucala. 1994. Comparative analysis of DNA mutations in lacI transgenic mice with age. FASEB J. 8:545–550.

    Levinson, G., and G. A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203–221.

    McCullagh, P., and J. A. Nelder. 1989. Generalized linear models. Chapman and Hall, New York.

    MacPhee, D. G. 1995. Mismatch repair, somatic mutations, and the origins of cancer. Cancer Res. 55:5489–5492.

    McVean, G. T., and L. D. Hurst. 1997. Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386:388–392.

    Miyata, T., H. Hayashida, K. Kuma, K. Mitsuyasu, and T. Yasunaga. 1987. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harb. Symp. Quant. Biol. 52:863–867.

    Nishino, H., A. Knoll, V. L. Buettner, C. S. Frisk, Y. Maruta, J. Haavik, and S. S. Sommer. 1995. p53 wild-type and p53 nullizygous Big Blue transgenic mice have similar frequencies and patterns of observed mutation in liver, spleen and brain. Oncogene 11:263–270.

    Ohta, T., and M. Kimura. 1973. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. Camb. 22:201–204.

    Rousseau, F., J. Bonaventure, L. Legeai-Mallet, A. Pelet, J. M. Rozet, P. Maroteaux, M. Le Merrer, and A. Munnich. 1994. Mutations in the gene encoding fibroblast growth factor receptor-3 in achondroplasia. Nature 371:252–254.

    Shiang, R., L. M. Thompson, Y. Z. Zhu, D. M. Church, T. J. Fielder, M. Bocian, S. T. Winokur, and J. J. Wasmuth. 1994. Mutations in the transmembrane domain of FGFR3 cause the most common genetic form of dwarfism, achondroplasia. Cell 78:335–342.

    Shimmin, L. C., B. H. Chang, D. Hewett-Emmett, and W. H. Li. 1993. Potential problems in estimating the male-to-female mutation rate ratio from DNA sequence data. J. Mol. Evol. 37:160–166.

    Shimmin, L. C., B. H. Chang, and W. H. Li. 1993. Male-driven evolution of DNA sequences. Nature 362:745–747.

    Shriver, M. D., L. Jin, R. Chakraborty, and E. Boerwinkle. 1993. VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach. Genetics 134:983–993.

    Sommer, S. S., and R. P. Ketterling. 1996. The factor IX gene as a model for analysis of human germline mutations: an update. Hum. Mol. Genet. 5:1505–1514.

    Thomas, G. H. 1996. High male : female ratio of germ-line mutations: an alternative explanation for postulated gestational lethality in males in X-linked dominant disorders. Am. J. Hum. Genet. 58:1364–1368.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Tweedie, S., J. Charlton, V. Clark, and A. Bird. 1997. Methylation of genomes and genes at the invertebrate-vertebrate boundary. Mol. Cell. Biol. 17:1469–1475.

    Vogel, F., and R. Rathenberg. 1975. Spontaneous mutation in man. Pp. 356–359 in H. Harris and K. Hirschhorn, eds. Advances in human genetics. Vol. 5. Plenum Press, New York.

    Weber, J. L. 1990. Informativeness of human (dC-dA)n.(dG-dT)n polymorphisms. Genomics 7:524–530.

    Weber, J. L., and C. Wong. 1993. Mutation of human short tandem repeats. Hum. Mol. Genet. 2:1123–1128.

    Weissenbach, J., G. Gyapay, C. Dib, A. Vignal, J. Morissette, P. Millasseau, G. Vaysseix, and M. Lathrop. 1992. A second-generation linkage map of the human genome. Nature 359:794–801.

Accepted for publication February 14, 2000.