Local Similarity in Evolutionary Rates Extends over Whole Chromosomes in Human-Rodent and Mouse-Rat Comparisons: Implications for Understanding the Mechanistic Basis of the Male Mutation Bias

Martin J. Lercher, Elizabeth J. B. Williams and Laurence D. Hurst

Department of Biology and Biochemistry, University of Bath, Bath, England


    Abstract
The sex chromosomes and autosomes spend different amounts of time in the germ lines of the two sexes. If cell division is mutagenic and if the sexes differ in the number of germ cell divisions, then sequences on the X chromosome, the Y chromosome, and autosomes should mutate at different rates. Tests of this hypothesis in several mammalian species have led to conflicting results. At the same time, recent evidence suggests that the chromosomal location of genes on autosomes affects their rate of evolution at synonymous sites. This points to a mutagenic source other than germ cell replication. To interpret previous estimates of the male mutation bias correctly, it is crucial to understand the strength and range of this local similarity. With a carefully chosen randomization protocol, local similarity in synonymous rates of evolution can be detected in both human-rodent and mouse-rat comparisons. However, synonymous-site similarity in the mouse-rat comparison remains weak. Simulations suggest that this difference between the human-rodent and mouse-rat comparisons is not artifactual, and hence that humans and rodents differ in the local patterns of mutation or selection acting on synonymous sites. Conversely, we show that the previously reported absence of local similarity in nonsynonymous rates of evolution in the human-rodent comparison was a methodological artifact. We also show that linkage effects have a long-range component: not one in a million randomized genomes shows the observed level of autosomal heterogeneity. This heterogeneity is so great that more autosomes than expected by chance have rates of synonymous evolution comparable to that of the X chromosome. As autosomal heterogeneity cannot be due to different times spent in the germ line, this demonstrates that the dominant determinant of synonymous rates of evolution is not, as has been conjectured, the time spent in the male germ line.


    Introduction
If cell division is mutagenic and if the number of germ cell divisions is larger in males than in females (as is likely in most mammals), then we expect males to be the dominant source of point mutations. As Miyata et al. (1987) noted, if synonymous mutations are neutral, we can estimate the extent of this male bias, expressed as the ratio of male to female mutation rates (α), by comparing the rates of synonymous evolution on the X chromosome, the Y chromosome, and autosomes, as these spend different amounts of time in the germ lines of the two sexes. This method has been applied to flies (Bauer and Aquadro 1997), rodents (Chang et al. 1994; Chang and Li 1995; Li et al. 1996; McVean and Hurst 1997; Smith and Hurst 1999a), cats (Slattery and O'Brien 1998), birds (Ellegren and Fridolfsson 1997), and primates (Shimmin, Chang, and Li 1993, 1994; Chang, Hewett-Emmett, and Li 1996; Li et al. 1996; Huang et al. 1997; Nachman and Crowell 2000).
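
The logic of Miyata's method can be made concrete with a short sketch. Assuming neutral synonymous sites, equal numbers of males and females, and a per-generation male:female mutation-rate ratio α, an X chromosome spends one-third of its time in males, an autosome one-half, and a Y chromosome all of it, which gives the expected rate ratios below. The function names are illustrative, and this is not code from any of the studies cited.

```python
"""Expected X/A and Y/X synonymous-rate ratios under a pure germ-line model.

Assumes neutral synonymous sites and a male:female mutation-rate ratio alpha;
function names are illustrative, not taken from the original studies.
"""
import math


def x_to_autosome_ratio(alpha: float) -> float:
    # X spends 1/3 of its time in males, autosomes 1/2: (alpha+2)/3 vs (alpha+1)/2
    return 2.0 * (alpha + 2.0) / (3.0 * (alpha + 1.0))


def y_to_x_ratio(alpha: float) -> float:
    # Y is always in males (rate alpha) vs the X's (alpha+2)/3
    return 3.0 * alpha / (alpha + 2.0)


def alpha_from_x_to_autosome(ratio: float) -> float:
    # Invert X/A = 2(alpha+2)/(3(alpha+1)); meaningful for 2/3 < ratio <= 4/3.
    # A ratio at or below 2/3 is consistent only with an infinite alpha.
    if ratio <= 2.0 / 3.0:
        return math.inf
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


def alpha_from_y_to_x(ratio: float) -> float:
    # Invert Y/X = 3 alpha / (alpha + 2); meaningful for 0 <= ratio < 3
    return 2.0 * ratio / (3.0 - ratio)
```

For instance, an X/autosome ratio of 8/9 corresponds to α = 2, while ratios approaching 2/3 drive α toward infinity, which is why modest shifts in the X/autosome ratio can yield very large estimates of α.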

It is regularly claimed (Chang et al. 1994; Chang, Hewett-Emmett, and Li 1996; Crow 1997, 2000) that the figures so obtained for the extent of the male bias correspond to the differences in the number of germ cell divisions. These claims are, however, controversial (Hurst and Ellegren 1998), because (1) we are uncertain of the expected ratio of germ cell divisions in most lineages, not least because estimates are highly sensitive to assumptions about the age of male reproduction (Hurst and Ellegren 1998); (2) some estimates of α from primates fall outside the range of even the lowest estimates of this ratio (Bohossian, Skaletsky, and Page 2000); and (3) in rodents, the figure appears to depend on the sequence comparison: while the X-Y comparison (Chang et al. 1994; Chang and Li 1995; Li et al. 1996) gives α = 2, comparison of X with autosomes (McVean and Hurst 1997; Smith and Hurst 1999a) suggests α ≫ 2, possibly even infinity. It therefore seems important to find other ways of determining whether figures for α derived using Miyata's method provide unbiased estimates of the male mutation bias and of the ratio of germ cell divisions.

The critical assumption of Miyata et al. (1987) is that any difference in evolutionary rate between the X chromosome, the Y chromosome, and autosomes is attributable to different times spent in the germ lines of the two sexes. However, regional differences in rates of synonymous evolution have also been reported along autosomes (Casane et al. 1997; Matassi, Sharp, and Gautier 1999; Williams and Hurst 2000). These within-autosome effects cannot result from differences in the time spent in the male germ line. If regional effects were associated with considerable heterogeneity of autosomal rates, this would cast serious doubt on the validity of the method. We therefore ask how large the domain of local similarity is. Such information should also help resolve the causes of the regionality of rates of evolution. Furthermore, if autosomal heterogeneity is great compared with the difference between the X chromosome and autosomes, we can be confident that a potent force other than germ cell replication affects synonymous substitution rates. We estimate the extent of this effect below. To the same end, we examine how the X chromosome's rate of synonymous evolution compares with the rates of the slowest-evolving autosomes. If the X chromosome is not an outlier, we cannot be confident that the figures for α predominantly reflect the relative numbers of germ cell divisions in the two sexes.

However, before establishing these patterns, it is important to clarify the method, not least to understand the basis of discrepancies between previous analyses. In a recent paper on orthologous genes in the mouse and the rat, Williams and Hurst (2000) reported that linked genes show significantly similar nonsynonymous rates of evolution (Ka). While this local similarity was unexpected for nonsynonymous rates, it had long been argued that mutation rates might vary along chromosomes (Sueoka 1962; Filipski 1987; Casane et al. 1997; Nachman and Crowell 2000). Assuming that synonymous sites are not under selective pressure in mammals (Wolfe, Sharp, and Li 1989; Eyre-Walker 1991; see, however, Eyre-Walker 1999), this can be tested by searching for local similarity in the synonymous rate of evolution (Ks). However, Williams and Hurst (2000) found only marginally significant similarity in Ks in the mouse-rat comparison. In contrast, Matassi, Sharp, and Gautier (1999) found strong local similarity of synonymous rates of evolution for human-rodent orthologs but failed to detect significant similarity of nonsynonymous rates. Thus, the nature and extent of local similarities in rates of evolution are at present unclear and appear to depend heavily on the species comparison employed.

However, these discrepancies may simply reflect methodological artifacts rather than biologically important differences. Most notably, the statistical protocols used in the recent literature on linkage effects may not be optimally suited to detecting local similarities. Matassi, Sharp, and Gautier (1999) used a test function that summed over all pairs of genes situated within 1 cM of each other, allowing some genes to contribute multiple times. As the number of gene pairs in a linked cluster increases quadratically with cluster size, this protocol gives more weight to genes within larger clusters. A few large clusters of genes can thus dominate the test function, reducing the effective sample size and obscuring weak local similarities. Williams and Hurst (2000) circumvented this problem by pairing each gene with at most two near neighbors. However, this pairing may be arbitrary, and part of the available information is disregarded. In the present study, we employ a method that avoids both problems: each gene that has linked neighbors contributes to the test function exactly once, and its evolutionary rate is compared with the mean rate of all its neighbors, thereby using all available information.
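
The weighting problem can be illustrated with a toy calculation, assuming that every gene in a cluster is linked to every other gene in that cluster (a simplification; genes on a real map need not be fully connected). The cluster sizes are hypothetical, and the functions are illustrative only.

```python
def pair_count(cluster_size: int) -> int:
    # Number of unordered gene pairs inside a fully linked cluster
    return cluster_size * (cluster_size - 1) // 2


def share_of_test_function(cluster_sizes):
    """Fraction of all terms contributed by each cluster under two schemes.

    Pairwise scheme: one term per linked pair.
    Per-gene scheme: one term per gene that has at least one linked neighbour.
    """
    pairs = [pair_count(n) for n in cluster_sizes]
    genes = [n if n >= 2 else 0 for n in cluster_sizes]
    total_pairs, total_genes = sum(pairs), sum(genes)
    return [p / total_pairs for p in pairs], [g / total_genes for g in genes]


# Hypothetical genome: ten linked pairs plus one cluster of 20 linked genes
pair_shares, gene_shares = share_of_test_function([2] * 10 + [20])
```

Here the large cluster supplies 190 of 200 pair terms (95%) but only 20 of 40 per-gene terms (50%), which is the imbalance the per-gene statistic is meant to remove.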


    Materials and Methods
The Human-Rodent Data Set
Orthologous human and murid gene pairs were taken from the data set compiled by Duret and Mouchiroud (2000), which is accessible at http://pbil.univ-lyon1.fr/datasets/Duret_Mouchiroud_1999/data.html. Coding sequences were aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1994). The numbers of substitutions per site at synonymous sites (Ks) and nonsynonymous sites (Ka) were computed with Li's (1993) method, with correction for multiple hits according to Kimura's (1980) two-parameter model. We also estimated evolutionary distances with the maximum-likelihood method introduced by Goldman and Yang (1994), implemented in the PAML package (Yang 1997). We refer to the results obtained with Li's (1993) protocol as Ka and Ks, and to the distances calculated with the maximum-likelihood method as KaML and KsML. We found mean evolutionary distances (±SD) of Ka = 0.073 ± 0.071 and Ks = 0.50 ± 0.14. As the evolutionary distances Ka and Ks are proportional to the corresponding evolutionary rates, we use the terms "evolutionary rate" and "evolutionary distance" interchangeably.
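
For reference, the multiple-hit correction of Kimura's (1980) two-parameter model can be sketched as below. Li's (1993) method applies such corrections separately to sites of different degeneracy classes and then combines them into Ka and Ks; that bookkeeping is omitted here, so this function computes only the two-parameter distance for one set of aligned sites. It is not a reimplementation of Li's method or of PAML, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    p = transitions / compared
    q = transversions / compared
    # Multiple-hit correction; undefined if either log argument is <= 0
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, a single transition in ten aligned sites gives d = -0.5 ln(0.8) ≈ 0.112, slightly more than the raw proportion of 0.1.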

The gene positions on the mouse genetic map were retrieved from LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink/). Duplicate genes on mouse chromosomes were identified by BLAST analysis with the default parameters for pairwise BLAST at NCBI (http://www.ncbi.nlm.nih.gov/gorf/bl2.html, score ≥ 39), under the assumption that genes duplicated after the divergence of rodents and primates would show significant sequence similarity detectable by BLAST. We eliminated all but one copy of multicopy genes on the same chromosome. This resulted in a final data set of 1,311 autosomal and 67 X-linked human-rodent orthologs with known positions on the mouse genetic map.

We also obtained physical positions from the October 7, 2000, build of the human genome (http://genome.ucsc.edu). (On average, 1.3 cM on human chromosomes corresponds to 1 Mb; Yu et al. 2001.) This resulted in a data set of 1,849 autosomal and 80 X-linked human-rodent orthologs with known positions on the human physical map.

Expression profiles of the genes in the data set were obtained by matching expressed sequence tag (EST) data to the coding sequences (Duret and Mouchiroud 2000). For the analysis excluding immune-specific genes, we used only genes with known expression in at least one nonimmune tissue. This reduced the sample sizes to 929 orthologs on mouse autosomes and 1,545 orthologs on human autosomes.

The Mouse-Rat Data Set
A data set of mouse-rat orthologs was collected by scrutinizing entries in Hovergen (the Homologous Vertebrate Gene Database, available at http://www.hgmp.mrc.ac.uk; Duret et al. 1994). Genes were considered orthologs if the gene family tree contained no internal nonrodent branch between the mouse and rat sequence branches and if at least one nonrodent sequence appeared as an outgroup to the mouse and rat sequences. This resulted in a data set of over 500 gene pairs.

Each mouse gene was inspected at LocusLink (www.ncbi.nlm.nih.gov/LocusLink/) via its accession number to establish mouse chromosomal location. These chromosomal locations were identical to those described at Mouse Genome Informatics (www.informatics.jax.org; Mouse Genome Informatics—The Jackson Laboratory 1996). Those without locations specified to the centimorgan and those on the X chromosome were eliminated from the data set. Pairwise BLAST searching was used to eliminate tandem duplicates from the data set. This resulted in a data set of 475 autosomal genes.

GENETRANS (GCG program suite at HGMP, http://hgmp.mrc.ac.uk) was used to automatically extract complete coding sequences. DNA alignments were carried out with PILEUP (also part of GCG) using the default setting. The alignments were checked by eye and modified if necessary. Substitutions per site were estimated as described for the human-rodent data set. We found mean evolutionary distances (±SD) of Ka = 0.036 ± 0.038 and Ks = 0.17 ± 0.05.

Statistics
For each gene, we calculated the absolute difference between its Ka (Ks) and the mean Ka (Ks) of all its neighbors within a given distance range, and averaged these absolute differences over all genes with at least one neighbor. We then generated a null distribution of 100,000 mean absolute differences by randomly permuting the Ka (Ks) values among genes. To test for within-chromosome local similarity, we permuted values only among genes on the same chromosome. To test whether local similarity was caused by covariation of the rates with local GC content, we permuted values only among genes within classes of similar GC content at third codon positions (GC3). Each GC3 class contained 10% of the full data set. We defined a measure of local similarity in Ks (and analogously in Ka) as the ratio of two mean absolute differences in Ks: the observed difference between linked genes and the difference expected without linkage effects (from randomization):
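
$$\rho_s = \frac{\langle |\Delta K_s| \rangle_{\mathrm{observed}}}{\langle |\Delta K_s| \rangle_{\mathrm{randomized}}},$$

where $\Delta K_s$ is the difference between a gene's Ks and the mean Ks of its linked neighbors, and the angle brackets denote averages over all genes with at least one neighbor (and, for the denominator, over all randomized data sets).
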
Thus, a value of ρs = 0.85 means that, on average, the difference between the synonymous rates of linked genes is only 85% of the difference expected by chance.
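
The per-gene statistic and its randomization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: genes are given as parallel lists of chromosome labels, map positions, and rates; the function names, the default of 1,000 permutations (the study used 100,000), and the +1 correction in the P value are our own choices.

```python
"""Per-gene local-similarity statistic with a permutation null (illustrative sketch)."""
import random
from collections import defaultdict


def neighbour_lists(chroms, positions, max_dist):
    """For each gene, indices of other genes on the same chromosome within max_dist."""
    by_chrom = defaultdict(list)
    for i, c in enumerate(chroms):
        by_chrom[c].append(i)
    neighbours = [[] for _ in chroms]
    for members in by_chrom.values():
        for i in members:
            neighbours[i] = [j for j in members
                             if j != i and abs(positions[i] - positions[j]) <= max_dist]
    return neighbours


def mean_abs_difference(rates, neighbours):
    """Average over genes with neighbours of |own rate - mean rate of neighbours|."""
    diffs = [abs(rates[i] - sum(rates[j] for j in nb) / len(nb))
             for i, nb in enumerate(neighbours) if nb]
    return sum(diffs) / len(diffs)


def permute_within(rates, groups, rng):
    """Return a copy of rates shuffled only among genes sharing a group label."""
    shuffled = list(rates)
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    for idx in members.values():
        values = [rates[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            shuffled[i] = v
    return shuffled


def local_similarity_test(rates, chroms, positions, max_dist=1.0,
                          groups=None, n_perm=1000, seed=0):
    """Return (rho, one-sided P) for local similarity of rates among linked genes.

    groups=None permutes genome-wide; groups=chroms keeps rates on their own
    chromosome; GC3 class labels control for base composition.
    """
    rng = random.Random(seed)
    if groups is None:
        groups = [0] * len(rates)
    nb = neighbour_lists(chroms, positions, max_dist)  # map positions stay fixed
    observed = mean_abs_difference(rates, nb)
    null = [mean_abs_difference(permute_within(rates, groups, rng), nb)
            for _ in range(n_perm)]
    rho = observed / (sum(null) / n_perm)
    # Share of permutations at least as similar as observed; the +1 terms are
    # our choice so that P is never exactly zero.
    p = (1 + sum(d <= observed for d in null)) / (n_perm + 1)
    return rho, p
```

Passing chromosome labels as `groups` corresponds to the within-chromosome test, and passing GC3 decile labels corresponds to the GC-controlled test; only the rates move between permutations, while each gene's set of neighbors stays fixed.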

To compare our results with those obtained by Matassi, Sharp, and Gautier (1999), we defined a second test function as the mean squared Ka (Ks) difference over all linked pairs. This measure is equivalent to the I statistic employed by Matassi, Sharp, and Gautier (1999). The contribution of each gene to this test function is proportional to its number of neighbors. A corresponding random distribution was created as above.
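
For comparison, a minimal sketch of the pairwise statistic follows; it also counts how many pairs each gene enters, which makes the uneven weighting explicit. The data layout and the function name are illustrative assumptions, not the original implementation.

```python
from itertools import combinations


def mean_squared_pair_difference(rates, chroms, positions, max_dist=1.0):
    """Mean squared rate difference over all linked pairs, plus per-gene pair counts."""
    sq_diffs = []
    pairs_per_gene = [0] * len(rates)
    for i, j in combinations(range(len(rates)), 2):
        # A pair is linked if both genes share a chromosome and lie within max_dist
        if chroms[i] == chroms[j] and abs(positions[i] - positions[j]) <= max_dist:
            sq_diffs.append((rates[i] - rates[j]) ** 2)
            pairs_per_gene[i] += 1
            pairs_per_gene[j] += 1
    return sum(sq_diffs) / len(sq_diffs), pairs_per_gene
```

In a cluster of four mutually linked genes, each gene enters three pairs, so its squared differences are counted three times, whereas an isolated linked pair contributes once per gene.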

Chromosomal heterogeneity was measured with the test function

$$\chi^2 = \sum_i \frac{(K_i - \bar{K})^2}{v_i},$$

where Ki is the mean Ks for chromosome i, $\bar{K}$ is the mean Ks of the full data set, and vi is the expected variance of Ki in the absence of heterogeneity, vi = v/Ni (v is the variance of Ks in the full data set, and Ni is the number of genes on chromosome i). We tested for heterogeneity by creating 1,000,000 randomized data sets and asking how many had χ² values greater than that of the real data set. In each randomization run, genes were randomly reassigned to chromosomes, keeping only the total number of genes per chromosome intact. The distribution created by this randomization procedure is approximately χ²-distributed (data not shown).
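
A minimal sketch of the heterogeneity statistic and its randomization follows. It uses the overall mean Ks as the reference value and the sample variance for v; the source does not specify the variance estimator, so that choice is an assumption, as are the function names. The default of 10,000 randomizations is far fewer than the 1,000,000 used in the study.

```python
"""Between-chromosome heterogeneity statistic with a reassignment null (sketch)."""
import random
from collections import defaultdict


def chromosome_chi2(rates, chroms):
    """Sum over chromosomes of (K_i - K_bar)^2 / v_i, with v_i = v / N_i."""
    n = len(rates)
    grand_mean = sum(rates) / n
    # Sample variance of all rates; the source does not name the estimator
    v = sum((k - grand_mean) ** 2 for k in rates) / (n - 1)
    groups = defaultdict(list)
    for k, c in zip(rates, chroms):
        groups[c].append(k)
    return sum((sum(ks) / len(ks) - grand_mean) ** 2 / (v / len(ks))
               for ks in groups.values())


def heterogeneity_test(rates, chroms, n_rand=10_000, seed=0):
    """Return (observed chi2, fraction of randomized genomes with chi2 >= observed)."""
    rng = random.Random(seed)
    observed = chromosome_chi2(rates, chroms)
    labels = list(chroms)
    exceed = 0
    for _ in range(n_rand):
        rng.shuffle(labels)  # shuffling labels keeps every chromosome's gene count N_i
        if chromosome_chi2(rates, labels) >= observed:
            exceed += 1
    return observed, exceed / n_rand
```

Shuffling the chromosome labels, rather than the rates, is one simple way to reassign genes at random while leaving each Ni unchanged, as the protocol requires.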


    Results
Linked Genes Have Similar Ka Values
Linked genes were defined as those within 1 cM of each other on the mouse genetic map. Employing the same statistical protocol as Matassi, Sharp, and Gautier (1999), we confirmed that this method detects no significant local Ka similarity for human-rodent orthologs (P = 0.90). However, the protocol amplifies the influence of localized clusters of genes, which increases the variance of the randomized data sets. Weighting genes more evenly by using the mean Ka difference as defined in Materials and Methods, we found highly significant Ka similarity for linked genes (ρa = 0.917, P = 0.00010). This result was robust to the removal of immune-specific genes (P = 0.0026). In the mouse-rat comparison, linked genes also had significantly similar Ka values when analyzed with this protocol (ρa = 0.860, P = 0.0023). When linked genes on human autosomes were defined as those within 1 Mb of each other, we also found significant local similarity in Ka values (ρa = 0.946, P = 0.0027).

Linked Genes Have Similar Ks Values
Comparing the mean Ks difference of the human-rodent data set with randomized genomes, we found highly significant similarity for linked genes on mouse autosomes (ρs = 0.859, P < 10⁻⁵). The same significance was obtained after removal of immune-specific genes. On human autosomes, we found highly significant similarity for linked genes within 1 Mb of each other (ρs = 0.857, P < 10⁻⁵). In the mouse-rat comparison, Ks similarity was not significant (ρs = 0.941, P = 0.087). However, when we included more gene comparisons by extending the definition of linked genes to those within 5 cM of each other, local similarity reached significance (ρs = 0.935, P = 0.024). Nonetheless, the strength of local Ks similarity appears to differ between the two species comparisons.

Rate Similarities Extend over Whole Chromosomes
With this wider linkage definition of d = 5 cM, Ks similarity in the human-rodent comparison remained highly significant. What is the range of this local similarity? For every linkage radius examined (d = 1, 2, 5, 20, and 200 cM), we found highly significant "local" Ks similarity (P < 10⁻⁵). A range of d = 200 cM includes all genes residing on the same mouse chromosome. Thus, the Ks similarity of linked genes extends over all genetic length scales, from 1 cM up to whole chromosomes. Comparing mean Ks values on human autosomes, we also found very extensive heterogeneity between autosomes. As seen in table 1 and figures 1 and 2, not one in a million random human or mouse genomes has more chromosomal Ks heterogeneity than the real data. The local Ks similarity in the mouse-rat comparison was much weaker and was not detectable at all length scales. However, when we tested for chromosomal heterogeneity, we again found that mouse chromosomes had significantly different mean Ks values (table 1).


Table 1 Chromosomal Heterogeneity of Mouse Autosomes in Evolutionary Rates, Calculated According to Li's (1993) Method (Ka, Ks) and with Maximum Likelihood (KaML, KsML)

 


Fig. 1. Mean rate of evolution at synonymous sites (Ks) for the 22 human autosomes and the X chromosome. Eight autosomes (shown as black dots) show significantly high or low rates of evolution under a null model in which all autosomes evolve on average at the same rate (P < 0.05 from randomization data). The dotted line shows mean Ks on autosomes (0.490 ± 0.003).

 
Does such heterogeneity also exist for nonsynonymous rates of evolution? In both species comparisons, we found significant heterogeneity of mean autosomal Ka, which for the human-rodent comparison was evident on both the mouse map and the human map (table 1). Thus, genes positioned on the same autosome have significantly similar rates of both nonsynonymous and synonymous evolution in both the human-rodent and the mouse-rat comparisons. In the more distant human-rodent comparison, the chromosomal effect for Ka was much weaker than that for Ks.

Because of its importance for understanding previous conflicting results on the male bias in the mutation rate, we further analyzed Ks heterogeneity among human autosomes. It was robust to restricting the analysis to genes with known expression in nonimmune tissue (χ² = 113.9, P < 10⁻⁶). It was also robust to removing codons involved in doublet substitutions, which reduces the covariance of synonymous and nonsynonymous rates (Smith and Hurst 1999b; Duret and Mouchiroud 2000) (χ² = 142.8, P < 10⁻⁶), showing that the effect is not due to heterogeneity in rates of nonsynonymous evolution (such heterogeneity was only marginally significant; table 1). Eight of the human autosomes (numbers 4, 13, 14, 15, 16, 17, 19, and 21) showed significant deviation of mean Ks from null expectations (P < 0.05 from 1,000,000 random permutations; fig. 1), four of which (numbers 4, 14, 17, and 19) remained significant after Bonferroni correction. Similarly, seven mouse autosomes (numbers 2, 4, 5, 8, 10, 11, and 12) showed significant deviation from null expectations (fig. 2); four of these (numbers 4, 8, 10, and 11) remained significant after Bonferroni correction.
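
Assuming the Bonferroni family consists of all autosomes tested (22 in humans, 19 in mice; the text does not state the family size explicitly), the per-chromosome thresholds work out as follows. The function name is illustrative.

```python
def bonferroni_threshold(family_level: float, n_tests: int) -> float:
    """Per-test level keeping the family-wise error rate at or below family_level."""
    return family_level / n_tests


human = bonferroni_threshold(0.05, 22)  # assumed family: 22 human autosomes
mouse = bonferroni_threshold(0.05, 19)  # assumed family: 19 mouse autosomes
print(f"human: {human:.5f}, mouse: {mouse:.5f}")  # human: 0.00227, mouse: 0.00263
```

With 1,000,000 permutations per chromosome, randomization P values are resolved far more finely than these thresholds, so the correction is not limited by the number of permutations.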



Fig. 2. Mean rate of evolution at synonymous sites (Ks) from the human-rodent comparison for the 19 mouse autosomes and the X chromosome. Seven autosomes (shown as black dots) show significantly high or low rates of evolution. The dotted line shows mean Ks on autosomes (0.496 ± 0.004).

 
Because genes have not resided on the same autosome throughout the evolution separating rodents and humans, our analysis is conservative: rearrangements should act as a randomizing process, tending to homogenize rates of evolution between autosomes. The observed heterogeneity therefore constitutes evidence against germ cell mutations as the dominant determinant of substitution rates: the germ cell division model of the male mutation bias does not predict between-autosome heterogeneity, as all autosomes spend the same time in the male germ line.

Autosomal Heterogeneity Is So Great That the Human X Chromosome Is Not an Outlier
If time spent in the male germ line were the dominant predictor of Ks, then the X chromosome should appear as an outlier. However, the human X chromosome is not an outlier. While the X chromosome has the lowest mean Ks, two autosomes (with a total of 184 genes) have Ks values almost as low (fig. 1). In simulations, not one out of a million randomly rearranged genomes had at least 184 genes on autosomes with error bars overlapping those of the X chromosome (P < 10⁻⁶). Thus, while time spent in the male germ line may well contribute to the variance in Ks, it fails to explain most of it.

Within-Chromosome Local Similarity Exists Independent of Chromosomal Effects
In our randomization protocol, we can control for between-chromosome heterogeneity by swapping rate values only between genes on the same chromosome. We find that in addition to chromosomal effects, there is local (within-chromosome) Ks similarity, although this is significant only in the human-rodent comparison and not in the mouse-rat comparison. Table 2 shows the results of measuring Ka and Ks similarities for human-rodent orthologs in successive rings of increasing distance (i.e., among genes 0–2 cM apart, 2–4 cM apart, etc.). Within-chromosome Ks similarity in the human-rodent comparison persists up to about 6 cM on the mouse map. As table 2 shows, there is also local within-chromosome similarity in Ka. However, this similarity is short-ranged: in neither species comparison could significant within-chromosome Ka similarity be detected for genes more than 2 cM apart. On the human map, we could not detect significant local similarity in Ka or Ks beyond a distance of 2 Mb (distance 0–2 Mb: ρa = 0.949, P = 0.00068; ρs = 0.896, P < 10⁻⁵; distance 2–4 Mb: ρa = 0.984, P = 0.23; ρs = 0.987, P = 0.25).
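
The distance rings can be sketched as a small variation on a neighbor search. The half-open convention (a gene exactly 2 cM away falls in the 2–4 cM ring) and the function name are assumptions; the source does not say how boundary cases were assigned. Combining these rings with permutation only within chromosomes would give the within-chromosome test summarized in Table 2.

```python
from collections import defaultdict


def ring_neighbours(chroms, positions, inner, outer):
    """For each gene, same-chromosome genes at map distance d with inner <= d < outer.

    The half-open interval is an assumed convention for genes on a ring boundary.
    """
    by_chrom = defaultdict(list)
    for i, c in enumerate(chroms):
        by_chrom[c].append(i)
    rings = [[] for _ in chroms]
    for members in by_chrom.values():
        for i in members:
            rings[i] = [j for j in members
                        if j != i and inner <= abs(positions[i] - positions[j]) < outer]
    return rings


# Hypothetical map: three genes on chromosome "1" and one on chromosome "2"
first_ring = ring_neighbours(["1", "1", "1", "2"], [0.0, 1.0, 3.0, 0.5], 0.0, 2.0)
```

Because each ring excludes closer genes, a significant result in an outer ring cannot simply be inherited from strong similarity among very close neighbors.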


Table 2 Ratios of Observed and Randomized Rate Differences (ρ) and Corresponding P Values for Within-Chromosome Local Similarity (i.e., Controlling for Between-Chromosome Effects)

 
GC Content Does Not Explain Local Similarity
The genomes of vertebrates have been described as mosaics of long (>300 kb) DNA segments that are homogeneous in base composition, termed isochores (Bernardi 1995). While the existence of distinct isochores has recently been questioned, there are strong local similarities in GC content (International Human Genome Sequencing Consortium 2001). Evolutionary rates are also known to be influenced by GC content, although the exact form of this dependence is still debated (Wolfe, Sharp, and Li 1989; Bernardi, Mouchiroud, and Gautier 1993; Smith and Hurst 1999b; Bielawski, Dunn, and Yang 2000; Hurst and Williams 2000). One can therefore hypothesize that it is similarity in GC content, and not linkage as such, that leads to local rate similarities. This hypothesis can be tested by a randomization protocol that permutes genes only within classes of similar GC3 (Matassi, Sharp, and Gautier 1999). We still find highly significant similarity in Ks (ρs = 0.860, P < 10⁻⁵; and ρs = 0.861, P < 10⁻⁵) and in Ka (ρa = 0.917, P = 0.00021; and ρa = 0.953, P = 0.0092) for human-rodent orthologs within 1 cM on the mouse map and within 1 Mb on the human map, respectively. Thus, the local similarity in evolutionary rates is not a consequence of similarity in GC content.

Maximum-Likelihood Estimates Confirm the Local Rate Similarities
The protocol used to estimate evolutionary rates might influence our ability to detect the similarities discussed above. To test this possibility, we repeated our calculations using rates obtained with the maximum-likelihood protocol introduced by Goldman and Yang (1994). In accordance with the above results, we found highly significant similarity for genes within 1 cM on the mouse map, both in KaML (human-rodent: ρa = 0.914, P = 0.00004; mouse-rat: ρa = 0.850, P = 0.0032) and in KsML (human-rodent: ρs = 0.776, P < 10⁻⁵). As before, the similarity in KsML for the mouse-rat comparison was not significant within the chosen range of 1 cM (ρs = 0.936, P = 0.085). Again, both similarities extended over whole chromosomes, leading to chromosomal heterogeneity in KaML and in KsML (see table 1). However, the heterogeneity in KaML was now just below significance for human autosomes, and heterogeneity in KsML was not significant for the mouse-rat comparison. The range of the within-chromosome similarity on the mouse map was unchanged from that shown in table 2. Because it allows for biased base composition, the maximum-likelihood estimate KsML depends much more strongly on GC3 (Smith and Hurst 1999b) than does the Ks value obtained with Li's (1993) protocol. However, when permuting human-rodent orthologs within classes of similar GC3, we still found significant local similarity on the mouse map (KsML: ρs = 0.802, P < 10⁻⁵; KaML: ρa = 0.915, P = 0.00030) and on the human map (KsML: ρs = 0.87, P < 10⁻⁵; KaML: ρa = 0.95, P = 0.0053).

Low Ks Similarity in Rodents Is Not Due to Small Sample Size or Evolutionary Distance
The relative strengths of the Ka and Ks similarities differed markedly between the two species comparisons. Whereas regionality in Ks was very strong in the human-rodent comparison (see also Matassi, Sharp, and Gautier 1999), it was weak in the mouse-rat comparison (see also Williams and Hurst 2000). For local Ka similarity, the situation was reversed: it was strong in the mouse-rat comparison but could be detected in the human-rodent comparison only by carefully weighting gene clusters; indeed, it was not reported by Matassi, Sharp, and Gautier (1999). We have shown the latter discrepancy to be an artifact of methodology. However, the weak Ks similarity in the mouse-rat comparison is not so obviously artifactual. Repeatedly drawing 475 random genes from the human-rodent data set, we found higher local Ks similarity (within 1 cM) than in the mouse-rat comparison (P < 0.09) in 978 out of 1,000 draws. This is significant evidence that sample size does not fully explain the different strengths in the two species comparisons (P = 0.022). What, then, causes this difference?
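
The subsampling comparison can be sketched generically. Here `statistic` stands for any function that recomputes ρs on a subsample (including re-deriving neighbors among the drawn genes); it and the other names are hypothetical. A draw counts as "weaker" when its ρ is at least the mouse-rat reference value, so the returned fraction plays the role of the P value: with 978 of 1,000 draws showing stronger similarity, it equals 22/1,000 = 0.022.

```python
import random


def subsample_exceedance(items, statistic, sample_size, reference, n_draws=1000, seed=0):
    """Fraction of random subsamples whose similarity is NOT stronger than `reference`.

    Stronger similarity means a smaller rho, so a draw counts against the
    sample-size explanation only if statistic(draw) >= reference.
    """
    rng = random.Random(seed)
    weaker = 0
    for _ in range(n_draws):
        draw = rng.sample(items, sample_size)  # genes drawn without replacement
        if statistic(draw) >= reference:
            weaker += 1
    return weaker / n_draws
```

In the setting described above, `items` would be the human-rodent genes with their map positions, `sample_size` would be 475, and `reference` would be the mouse-rat ρs.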

It has been conjectured that selection is less efficient in rodents (Bernardi 1995), e.g., because of small effective population sizes in structured subpopulations or increased mutation rates. If spatial structure in Ks is maintained by selection (e.g., on codon usage, GC content, or modifiers of the mutation rate), reduced local similarity in rodents would then be expected. The reduced heterogeneity of local GC content in murids compared with other mammals has been cited as evidence for such a reduction in selective pressure (Bernardi 2000).

An alternative hypothesis for the discrepancy in local Ks similarities assumes no difference in the evolutionary mechanisms acting in humans and in rodents, but attributes the discrepancy to the difference in divergence time. Two species that diverged as recently as the mouse and the rat have accumulated few mutations. The variances in Ks and Ka are thus small, but the sampling variance of Ks estimates depends on gene size and may be proportionally large. For Ks, this sampling variance may drown any linkage effects that would otherwise be visible. By contrast, because of varying selective pressures, nonsynonymous rates of evolution have much higher underlying variances (as a percentage of the mean; see the figures given in Materials and Methods) than synonymous rates. Here, the sampling variance can be considered relatively small and does not obscure local effects. Following this line of argument, we would expect the relative standard deviation of Ks (as a percentage of the mean) to be higher in the mouse-rat comparison. However, we find the same relative standard deviations for our Ks estimates in both species comparisons (human-rodent, 28%; mouse-rat, 29%; this is unchanged when immune-specific genes are excluded).
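
The relative standard deviations quoted above follow directly from the means and standard deviations reported in Materials and Methods; the short calculation below also shows how much more variable Ka is than Ks. Because the inputs are the rounded published values, the percentages are approximate.

```python
# Mean and standard deviation of each distance, as reported in Materials and Methods
reported = {
    "human-rodent Ks": (0.50, 0.14),
    "mouse-rat Ks": (0.17, 0.05),
    "human-rodent Ka": (0.073, 0.071),
    "mouse-rat Ka": (0.036, 0.038),
}

# Relative standard deviation (coefficient of variation) as a percentage of the mean
cv = {label: 100 * sd / mean for label, (mean, sd) in reported.items()}

for label, value in cv.items():
    print(f"{label}: {value:.0f}%")
# human-rodent Ks: 28%
# mouse-rat Ks: 29%
# human-rodent Ka: 97%
# mouse-rat Ka: 106%
```

Ks varies by under a third of its mean in both comparisons, whereas Ka varies by roughly its own mean, which is the contrast the argument relies on.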

We performed a set of Monte Carlo simulations to distinguish between the two alternative explanations for the different strengths of local Ks similarity. We characterized each human-rodent ortholog by its number of synonymous sites (as determined with the maximum-likelihood method) and by its mutation rate, which we approximated by KsML. All sequences were "evolved" repeatedly with a Poisson process (i.e., under a strictly neutral model, with independent substitutions and no substitutional bias). The combined effects of Poisson noise (due to short evolutionary distance) and small sample size appear insufficient to explain the observed discrepancy in local Ks similarity between the two species comparisons.
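
A simplified version of such a simulation can be sketched as follows. The per-gene inputs (synonymous-site counts and KsML) mirror the description above, but the rescaling factor for the shorter mouse-rat divergence (for example, the ratio of mean Ks values, 0.17/0.50 = 0.34) and all function names are our assumptions, and the simulated values are raw proportions without a multiple-hit correction. The standard library is used to keep the sketch self-contained; in practice `numpy.random.Generator.poisson` would be the usual choice.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for means up to a few hundred."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_ks(syn_sites, rates, scale=1.0, seed=0):
    """One neutral 'evolution' of every gene with Poisson substitution counts.

    syn_sites: number of synonymous sites per gene; rates: expected substitutions
    per site for each gene (here KsML); scale: assumed factor for a shorter
    divergence. Returns simulated substitutions per synonymous site.
    """
    rng = random.Random(seed)
    return [poisson(rng, n * k * scale) / n for n, k in zip(syn_sites, rates)]
```

The simulated values would then be passed through the same local-similarity test as the real data, so that any loss of similarity can be attributed to Poisson noise alone.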


    Discussion
Genes within a few centimorgans of each other have similar rates of synonymous and nonsynonymous evolution. This similarity is found in comparisons of closely related (mouse-rat), as well as more distant (human-rodent), mammalian species. To detect the similarity optimally in randomization protocols, one has to carefully consider the statistical treatment of linked clusters.

We confirmed that the local similarity in synonymous rate (Ks) was much weaker in the mouse-rat comparison than in the human-rodent comparison. The smaller sample size and the shorter evolutionary time over which mutational processes acted in the mouse-rat comparison should both add relative noise. However, our simulations show that their combined effect is highly unlikely to obscure the local Ks similarity to the extent seen in our data. We must therefore conclude that there are real underlying differences in the spatial patterns of mutation or selection on synonymous sites between humans and rodents. Corresponding differences are known to exist in compositional genome organization and have been attributed to weakened selection in the rodent genome (Bernardi 2000).

Local similarities of evolutionary rates are detectable on two different genetic length scales: within a few centimorgans and across whole chromosomes. What light do these results shed on previous estimates of the male mutation bias from comparisons of rates on the X and Y chromosomes and autosomes? Here we found (1) unexpectedly large variance between autosomes in the rate of synonymous evolution and (2) that X-linked genes evolve at a rate comparable to that of genes on some autosomes. Neither finding can be explained as a sampling artifact, because our randomization tests are robust to such problems. As location on the X chromosome is highly conserved among species (all X-linked genes in our study were X-linked in both humans and rodents), translocation of sequences from the X chromosome to autosomes cannot be responsible either. These results are not consistent with the hypothesis that time spent in the male germ line is the dominant determiner of synonymous rates of evolution. The synonymous rate can plausibly be taken as a measure of the mutation rate, as is consistent with the equality of synonymous and intronic rates of evolution (Smith and Hurst 1998) and with patterns of codon usage bias (Eyre-Walker 1991). If it is, then this suggests that germ cell division is not the dominant cause of mutation. In summary, comparing rates of synonymous evolution on the X and Y chromosomes and autosomes cannot be assumed to be an unbiased method for determining the male mutation bias and the relative numbers of germ cell divisions in the two germ lines.
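A generic permutation test for between-chromosome heterogeneity can be sketched in Python. The statistic is the between-chromosome sum of squares, and the null distribution comes from shuffling rates among genes while keeping chromosome sizes fixed. This is an illustrative assumption, not the authors' protocol, and the function names are hypothetical. Because the sketch shuffles individual genes, it treats them as independent. Where local similarity exists, clustered neighbours alone can inflate between-chromosome variance, so a more conservative null would shuffle whole linked blocks. The sketch does not attempt that.

```python
import random
import statistics


def between_chromosome_ss(rates, chromosomes):
    """Between-chromosome sum of squares: sum over c of n_c * (mean_c - grand_mean)**2."""
    grand_mean = statistics.fmean(rates)
    groups = {}
    for rate, chromosome in zip(rates, chromosomes):
        groups.setdefault(chromosome, []).append(rate)
    return sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2
               for g in groups.values())


def heterogeneity_test(rates, chromosomes, permutations=10_000, seed=0):
    """One-sided permutation test for excess between-chromosome rate variance.

    Genes are shuffled individually across chromosomes, so chromosome sizes
    are preserved but all linkage between neighbouring genes is broken.
    Returns (observed statistic, p-value with the +1 correction).
    """
    observed = between_chromosome_ss(rates, chromosomes)
    shuffled = list(rates)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if between_chromosome_ss(shuffled, chromosomes) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical data: two autosomes whose genes evolve at different rates.
    rates = ([0.10 + 0.001 * i for i in range(50)]
             + [0.30 + 0.001 * i for i in range(50)])
    chromosomes = ["chr1"] * 50 + ["chr2"] * 50
    stat, p = heterogeneity_test(rates, chromosomes, permutations=999)
    print(f"between-chromosome SS = {stat:.3f}, p = {p:.4f}")
```

With 10,000 permutations, the smallest attainable p-value is about 1e-4. A claim that "not one in a million" random genomes matches the observed heterogeneity therefore requires at least a million permutations.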

These results may also help to interpret some of the previous discrepant estimates of the male mutation bias. Suppose there is some other mutagenic force (e.g., recombination) whose effects differ within and between autosomes, and also between the X chromosome, the Y chromosome, and autosomes. Then X-Y comparisons and X-autosome comparisons need not provide the same estimate of α, the ratio of male to female mutation rates. Such a lack of concordance has been observed in rodents (McVean and Hurst 1997; Smith and Hurst 1999a). Furthermore, regional heterogeneity is so great that sampling from a single region is likely to give biased estimates. The recent, unusually low estimate for primates (α = 1.7; Bohossian, Skaletsky, and Page 2000) came from analysis of only one block of sequence. The discrepancy between this and prior estimates was conjectured to reflect a difference between coding and noncoding regions. This now appears unlikely, because a sample of pseudogenes (Nachman and Crowell 2000) gives a higher estimate (α ≈ 4). We suggest that the most likely cause of this discrepancy is a biased estimate owing to the genomic regionality in rates of evolution described here.
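The Python sketch below shows how sensitive an X-autosome estimate of α can be to the autosomal region used as a reference. It uses the standard relations of the male-driven evolution model (Miyata et al. 1987) and assumes that each chromosome's substitution rate is proportional to the mean mutation rate over the generations it spends in each sex. Under that model, Y/X = 3α/(2 + α) and X/A = 2(2 + α)/(3(1 + α)). The sketch inverts these relations. The input numbers and the function names are hypothetical, and the model ignores every source of rate variation other than time spent in each germ line.

```python
def expected_ratios(alpha):
    """Expected (Y/X, X/A) substitution-rate ratios for male:female ratio alpha.

    Y spends all of its time in males, X one third, autosomes one half.
    """
    y_over_x = 3 * alpha / (2 + alpha)
    x_over_a = 2 * (2 + alpha) / (3 * (1 + alpha))
    return y_over_x, x_over_a


def alpha_from_y_over_x(y_over_x):
    """Invert Y/X = 3a / (2 + a), giving a = 2(Y/X) / (3 - Y/X)."""
    if not 0 <= y_over_x < 3:
        raise ValueError("Y/X must lie in [0, 3)")
    return 2 * y_over_x / (3 - y_over_x)


def alpha_from_x_over_a(x_over_a):
    """Invert X/A = 2(2 + a) / (3(1 + a)), giving a = (4 - 3 X/A) / (3 X/A - 2)."""
    if not 2 / 3 < x_over_a <= 4 / 3:
        raise ValueError("X/A must lie in (2/3, 4/3]")
    return (4 - 3 * x_over_a) / (3 * x_over_a - 2)


if __name__ == "__main__":
    true_alpha = 4.0
    y_over_x, x_over_a = expected_ratios(true_alpha)
    print("alpha from Y/X:", alpha_from_y_over_x(y_over_x))
    # Hypothetical autosomal reference regions evolving 10% slower or faster
    # than the genome-wide autosomal average.
    for regional_factor in (0.9, 1.0, 1.1):
        estimate = alpha_from_x_over_a(x_over_a / regional_factor)
        print(f"autosomal region x{regional_factor}: alpha from X/A = {estimate:.1f}")
```

With a true α of 4, a reference region only 10% slower than average yields an X-autosome estimate of 2.0, and one 10% faster yields 10.0. The Y/X estimate does not use autosomes and is unchanged in this example. Modest regional differences can therefore shift α substantially in either direction.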

What causes the local similarities described in this paper? Similarity in GC content, which has been put forward as a possible explanation, can be rejected as a likely cause. Genes in the pseudoautosomal region have a high rate of synonymous evolution (Perry and Ashworth 1999), and this region has an unusually high recombination rate. One might therefore predict that part of the variation is due to mutations induced by recombination. This would be in line with evidence from yeast (Strathern et al. 1995) and from mammalian somatic hypermutation (Papavasiliou and Schatz 2000) suggesting that repair of double-strand breaks, possibly during recombination, is mutagenic. The hypothesis does not obviously agree, however, with the finding that in fruit flies, in which males do not undergo recombination, point mutations appear to be derived from males as commonly as from females (Bauer and Aquadro 1997). Further analysis of this issue in mammals will require adequate recombination maps from which ancestral recombination rates can be estimated; we leave this to future work. It would also be very valuable to know which types of point mutation are typically induced by faulty repair of double-strand breaks.

The range of the local similarity in nonsynonymous rates of evolution appears to be much smaller than that for synonymous rates. Nevertheless, we found significant chromosomal heterogeneity in Ka, which cannot easily be explained by biases of the repair machinery. Synonymous and nonsynonymous rates are essentially independent when calculated with the maximum-likelihood method, so the local similarity in Ka does not appear to be due to the underlying mutation rate (significant chromosomal heterogeneity was also found for Ka/Ks and KaML/KsML; data not shown). To explain the local similarity in nonsynonymous rates, we must therefore invoke selection. It has recently been suggested that clusters of similarly expressed genes, termed "expression modules," are responsible for the local Ka similarity (Hurst and Eyre-Walker 2000).


    Footnotes
 
Adam Eyre-Walker, Reviewing Editor

1 Keywords: evolutionary rate, linkage, chromosomal heterogeneity, male mutation bias

2 Address for correspondence and reprints: Martin Lercher, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom. m.j.lercher@bath.ac.uk


    References
 

    Bauer V. L., C. F. Aquadro. 1997. Rates of DNA sequence evolution are not sex-biased in Drosophila melanogaster and D. simulans. Mol. Biol. Evol. 14:1252-1257.

    Bernardi G. 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.

    ———. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.

    Bernardi G., D. Mouchiroud, C. Gautier. 1993. Silent substitutions in mammalian genomes and their evolutionary implications. J. Mol. Evol. 37:583-589.

    Bielawski J. P., K. A. Dunn, Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.

    Bohossian H. B., H. Skaletsky, D. C. Page. 2000. Unexpectedly similar rates of nucleotide substitution found in male and female hominids. Nature 406:622-625.

    Casane D., S. Boissinot, B. H. J. Chang, L. C. Shimmin, W. H. Li. 1997. Mutation pattern variation among regions of the primate genome. J. Mol. Evol. 45:216-226.

    Chang B. H. J., D. Hewett-Emmett, W.-H. Li. 1996. Male-to-female ratios of mutation rate in higher primates estimated from intron sequences. Zool. Stud. 35:36-48.

    Chang B. H. J., W.-H. Li. 1995. Estimating the intensity of male-driven evolution in rodents by using X-linked and Y-linked Ube-1 genes and pseudogenes. J. Mol. Evol. 40:70-77.

    Chang B. H. J., L. C. Shimmin, S. K. Shyue, D. Hewett-Emmett, W.-H. Li. 1994. Weak male-driven molecular evolution in rodents. Proc. Natl. Acad. Sci. USA 91:827-831.

    Crow J. F. 1997. The high spontaneous mutation rate: is it a health risk? Proc. Natl. Acad. Sci. USA 94:8380-8386.

    ———. 2000. The origins, patterns and implications of human spontaneous mutation. Nat. Rev. Genet. 1:40-47.

    Duret L., D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68-74.

    Duret L., D. Mouchiroud, M. Gouy. 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360-2365.

    Ellegren H., A. K. Fridolfsson. 1997. Male-driven evolution of DNA sequences in birds. Nat. Genet. 17:182-184.

    Eyre-Walker A. C. 1991. An analysis of codon usage bias in mammals: selection or mutation bias? J. Mol. Evol. 33:442-449.

    ———. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.

    Filipski J. 1987. Correlation between molecular clock ticking, codon usage, fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett. 217:184-186.

    Goldman N., Z. H. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

    Huang W., B. H. J. Chang, X. Gu, D. Hewett-Emmett, W. H. Li. 1997. Sex differences in mutation rate in higher primates estimated from AMG intron sequences. J. Mol. Evol. 44:463-465.

    Hurst L. D., H. Ellegren. 1998. Sex biases in the mutation rate. Trends Genet. 14:446-452.

    Hurst L. D., A. Eyre-Walker. 2000. Evolutionary genomics: reading the bands. BioEssays 22:105-107.

    Hurst L. D., E. J. B. Williams. 2000. Covariation of GC content and the silent site substitution rate in rodents: implications for methodology and for the evolution of isochores. Gene 261:107-114.

    International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

    Kimura M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

    Li W.-H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.

    Li W.-H., D. L. Ellsworth, J. Krushkal, B. H. J. Chang, D. Hewett-Emmett. 1996. Rates of nucleotide substitution in primates and rodents and the generation time effect hypothesis. Mol. Phylogenet. Evol. 5:182-187.

    McVean G. T., L. D. Hurst. 1997. Evidence for a selectively favourable reduction in the mutation rate of the X chromosome. Nature 386:388-392.

    Matassi G., P. M. Sharp, C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791.

    Miyata T., H. Hayashida, K. Kuma, K. Mitsuyasu, T. Yasunaga. 1987. Male-driven molecular evolution: a model and nucleotide sequence analysis. Cold Spring Harb. Symp. Quant. Biol. 52:863-867.

    Nachman M. W., S. L. Crowell. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.

    Papavasiliou F. N., D. G. Schatz. 2000. Cell-cycle-regulated DNA double-strand breaks in somatic hypermutation of immunoglobulin genes. Nature 408:216-221.

    Perry J., A. Ashworth. 1999. Evolutionary rate of a gene affected by chromosomal position. Curr. Biol. 9:987-989.

    Shimmin L. C., B. H. Chang, W.-H. Li. 1993. Male-driven evolution of DNA sequences. Nature 362:745-747.

    ———. 1994. Contrasting rates of nucleotide substitution in the X-linked and Y-linked zinc-finger genes. J. Mol. Evol. 39:569-578.

    Slattery J. P., S. J. O'Brien. 1998. Patterns of Y and X chromosome DNA sequence divergence during the Felidae radiation. Genetics 148:1245-1255.

    Smith N. G. C., L. D. Hurst. 1998. Sensitivity of patterns of molecular evolution to alterations in methodology: a critique of Hughes and Yeager. J. Mol. Evol. 47:493-500.

    ———. 1999a. The causes of synonymous rate variation in the rodent genome: can substitution rates be used to estimate the sex bias in mutation rate? Genetics 152:661-673.

    ———. 1999b. The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics 153:1395-1402.

    Strathern J. N., B. K. Shafer, C. B. McGill. 1995. DNA synthesis errors associated with double-strand-break repair. Genetics 140:965-972.

    Sueoka N. 1962. On the genetic basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. USA 48:582-592.

    Thompson J. D., D. G. Higgins, T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Williams E. J. B., L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900-903.

    Wolfe K. H., P. M. Sharp, W.-H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

    Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.

    Yu A., C. Zhao, Y. Fan, et al. (11 co-authors). 2001. Comparison of human genetic and sequence-based physical maps. Nature 409:951-953.

Accepted for publication July 13, 2001.