Does the Recombination Rate Affect the Efficiency of Purifying Selection? The Yeast Genome Provides a Partial Answer

Csaba Pál, Balázs Papp and Laurence D. Hurst

Collegium Budapest, Institute for Advanced Study, Budapest, Hungary;
Department of Plant Taxonomy and Ecology, Eötvös Loránd University, Budapest, Hungary;
Department of Biology and Biochemistry, University of Bath, Bath, England

Much attention has been focused on the effect of recombination on preservation of favorable variants and on reducing the mutational load (Muller 1964; Kondrashov 1993; Barton 1995; Hurst and Peck 1996; West, Lively, and Read 1999). As most amino acid changes are slightly deleterious (Li 1997), their accumulation is expected to depend on the efficiency of purifying selection. Theoretical models have revealed that recombination might prevent the accumulation of such slightly deleterious mutations. This is potentially of importance because it suggests that purifying selection is less effective in genomic regions of low recombination (Hill and Robertson 1966; Carvalho and Clark 1999; Comeron, Kreitman, and Aguade 1999; Comeron and Kreitman 2000; Williams and Hurst 2000).

If purifying selection were more effective in regions of high recombination, then, looking at within-genome variation, the rate of protein evolution should be lower in these regions, as mutations harmful to protein function would be removed more efficiently. Furthermore, codon usage bias, which is at least partly influenced by selection, should be more pronounced in these genomic regions. Previous tests have employed this logic and shown that codon usage is slightly more biased under high recombination rates in Drosophila melanogaster (Comeron, Kreitman, and Aguade 1999). Two other studies (Carvalho and Clark 1999; Comeron and Kreitman 2000) have confirmed a negative relationship between intron size and recombination rate in the same organism (although the two groups give different interpretations of their results). One possible shortcoming of these studies is that they did not control for variation in gene expression level. Some experimental work on yeast suggests that genomic regions with high transcription rates are also more prone to the initiation of meiotic double-strand break (DSB) events (Gerton et al. 2000), and there are also some hints that this might be a general feature of eukaryotic organisms (e.g., Jones et al. 1997). As codon usage bias (e.g., Duret and Mouchiroud 1999) and intron size (Vinogradov 2001) are affected by differences in gene expression level in numerous model organisms, the conclusions drawn from the association of these parameters with recombination rate should be treated with caution.

Yeast is currently a unique tool with which to address all of these issues owing to the availability of recombination (Gerton et al. 2000) and large-scale microarray expression data (Holstege et al. 1998) from Saccharomyces cerevisiae, along with extensive sequence data from S. cerevisiae (Ball et al. 2001) and the closely related Candida albicans. Here we show that recombination frequency has only weak, albeit significant, effects on protein evolution, on codon usage, and on protein size. Whether these relationships are owing to variation in the efficiency of purifying selection is unclear. We show that gene expression rates are correlated with recombination rates, and when we control for expression rates, the effect of recombination on the above parameters becomes only marginally significant.

Complete Saccharomyces protein and DNA sequences were obtained from the Saccharomyces Genome Database (Ball et al. 2001), while sequence data for C. albicans were obtained from the Stanford DNA Sequencing and Technology Center website at http://www-sequence.stanford.edu/group/candida (contig version 6 was used for this purpose). We carried out BLASTP 2.0 amino acid sequence similarity searches of the S. cerevisiae proteins against all available complete C. albicans proteins and vice versa using BLOSUM62 substitution matrices and the SEG filter for low-complexity regions (Altschul et al. 1997). The criteria used to define orthologs were (1) a BLASTP significance (E) value of < 10^-20 and (2) the occurrence of the highest BLAST score when the putative orthologs were searched against each other. It is also worth noting that numerous sequences in S. cerevisiae belong to large, diverged protein families, but the duplication events most likely occurred before the divergence of the Candida and Saccharomyces lineages. As these sequences did not match our orthology criteria (see above), they were omitted. Gene pairs with ambiguous sequence alignments were also omitted from the analysis. With the resulting 3,347 gene pairs, protein sequence alignments were carried out with CLUSTAL W, version 1.8, using the default settings (Thompson, Higgins, and Gibson 1994). Protein distances (dA) were calculated using PHYLIP (Felsenstein 1989), employing PAM substitution matrices and default settings. Only gene pairs with moderate protein distances (dA < 1.5) were retained.

To estimate the recombination rate adjacent to each open reading frame (ORF), a published data set was used (Gerton et al. 2000). Gerton et al. (2000) used seven DNA microarrays to estimate variation in the level of nearby meiotic DSBs along the whole yeast genome. The median value for each ORF was used as an approximation of local recombination rates, leading to a final data set of 2,758 gene pairs. In figure 1, three groups of genes are defined: those with low, medium, and high recombination rates. These groups were obtained by dividing the data set into three subsets of equal size. Whole-genome transcription data from Holstege et al. (1998) were used as estimates for gene expression levels. Using microarray technology, Holstege et al. (1998) estimated the number of mRNA molecules per cell and the mRNA half-life for a large collection of genes in S. cerevisiae under conditions of growth to mid-log phase in YPD media.
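The two orthology criteria described above (an E value below 10^-20 and the highest score in both search directions) amount to a reciprocal-best-hit filter. The following is a minimal sketch of that idea, not the procedure actually used in the study: the `Hit` record, the function names, and the toy identifiers are all hypothetical, and the hits are assumed to have been parsed already from BLASTP output.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One BLASTP hit (hypothetical record; fields assumed pre-parsed)."""
    query: str
    subject: str
    evalue: float
    score: float


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-20) -> Dict[str, str]:
    """Map each query to its highest-scoring subject with E < max_evalue."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # criterion 1: E value must be below the threshold
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(
    forward: Iterable[Hit],
    backward: Iterable[Hit],
    max_evalue: float = 1e-20,
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs that are each other's best hit (criterion 2)."""
    fwd = best_hits(forward, max_evalue)
    bwd = best_hits(backward, max_evalue)
    return {(a, b) for a, b in fwd.items() if bwd.get(b) == a}


# Toy data with made-up identifiers, for illustration only.
forward_hits = [
    Hit("ScA", "CaA", 1e-50, 200.0),
    Hit("ScA", "CaB", 1e-30, 120.0),
    Hit("ScB", "CaB", 1e-10, 60.0),   # fails the E-value threshold
]
backward_hits = [
    Hit("CaA", "ScA", 1e-48, 198.0),
    Hit("CaB", "ScA", 1e-30, 120.0),  # CaB's best hit is ScA, but not vice versa
]
orthologs = reciprocal_best_hits(forward_hits, backward_hits)
```

In this toy case only the pair (ScA, CaA) survives, since CaB is not the best hit of ScA and ScB has no hit below the threshold. Members of large protein families tend to fail the reciprocity test in the same way, which matches the omission described in the text.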



Fig. 1. Association between the rate of protein evolution (measured by dA) and recombination frequency. Recombination rate is measured by relative frequency of double-strand break events. For the raw data, r = -0.117 and P < 10^-9.

 
To analyze the effects of recombination on rates of evolution, a reasonably large data set was compiled including the protein distances of Candida-Saccharomyces orthologs (see above). Although recombination data come only from S. cerevisiae, whereas protein evolution depends also on what has happened during evolution in the Candida lineage, we still found a highly significant association between the absolute rate of protein evolution (dA) and recombination frequency (r = -0.117, P < 10^-9; see fig. 1). There are some further clues suggesting a relationship between the efficiency of selection and recombination. In agreement with previous findings for Drosophila (Comeron, Kreitman, and Aguade 1999), we found a positive association between codon adaptation index (which measures departure from random codon usage in favor of optimal codons) and recombination frequency (r = 0.100, P < 10^-6).
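A correlation as small as r = -0.117 can still be highly significant when the sample is large. The sketch below illustrates this with the standard t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2). It assumes that the correlation was computed over all 2,758 gene pairs of the final data set, which the text does not state explicitly, and it uses a normal approximation to the t distribution, which is reasonable only because n is large. The function names are illustrative.

```python
import math


def pearson_t(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r from n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def two_sided_p_normal(t: float) -> float:
    """Two-sided P value under a normal approximation (adequate for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Assumption: n = 2758 is the sample size behind the reported r = -0.117.
t_stat = pearson_t(-0.117, 2758)
p_value = two_sided_p_normal(t_stat)
print(f"t = {t_stat:.2f}, approximate two-sided P = {p_value:.1e}")
```

Under these assumptions the t statistic is about -6.2, and the approximate P value falls below 10^-9, in line with the reported significance. The same r explains only about 1.4% of the variance in dA (r^2 = 0.0137), which is why the association is described as weak.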

Protein size distribution is also associated with recombination pattern: genes coding for long proteins tend to occur in regions of low recombination (r = -0.161, P < 10^-6). This result can be explained if we assume that selection acts on gene length in order to reduce the expense of transcription or translation. Because large proteins are energetically expensive to make, such proteins are disadvantageous if shorter proteins can perform the same function (Moriyama and Powell 1998). Hence, longer proteins are tolerated only in genomic regions in which purifying selection is sufficiently weak. However, we cannot exclude the possibility that recombination induces deletions, and hence that the pattern reflects a mutation bias rather than a result of selection (see also Comeron and Kreitman 2000).

The analyses above did not exclude the possibility that recombination per se was not the key variable. It has recently become apparent that transcriptionally active genomic regions are more prone to the initiation of meiotic recombination events (Nicolas 1998; Gerton et al. 2000). This we can confirm. Using a public database (Holstege et al. 1998), we found a significant association between transcriptional frequency and recombination rate (r = 0.104, P < 10^-6). It has also previously been shown that highly expressed genes in yeast evolve slowly (Pál, Papp, and Hurst 2001).

Although the interpretation of this latter result is uncertain, one might argue that the negative association between recombination frequency and rate of protein evolution may simply reflect the covariation of gene expression level and recombination rate. In a similar vein, codon bias is especially pronounced in highly expressed genes in yeast, most likely reflecting selection for enhanced efficiency of translation (e.g., Coghlan and Wolfe 2000). Hence, the association between codon adaptation and recombination might be explained by the covariation of recombination rate and gene expression level. A similar explanation may hold for protein size distribution. Highly expressed proteins are expected to pose a higher energetic cost for the organism. Hence, it is conceivable that selection pressure for reduced protein size is especially pronounced in highly expressed genes (Moriyama and Powell 1998; Coghlan and Wolfe 2000).

We found that the nonindependence of recombination and gene expression level can explain in large part the patterns found above. Using multiple-regression analysis (Sokal and Rohlf 1995) to control for the covariation of gene expression and recombination rate, we found that the correlation between recombination rate and protein evolution (dA) became only marginally significant (table 1). Thus, only a very small fraction of the variation in protein evolution is attributable to local differences in genomic recombination rates when expression level is controlled for. In contrast, the correlation between protein size and recombination rate remains largely unaffected.
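The logic of controlling for a covariate can be illustrated with a first-order partial correlation, a close relative of the multiple regression used in the study but not the same procedure. In the sketch below, x, y, and z stand for recombination rate, the dependent variable (for example dA), and expression level, respectively. The function names and the toy numbers are hypothetical and are not taken from the data.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_from_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation computed from raw data."""
    return partial_corr_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Toy example: x and y are correlated only because both track z.
z = [-1.0, -1.0, 1.0, 1.0]
x = [-2.0, 0.0, 0.0, 2.0]   # z plus a component unrelated to y
y = [-2.0, 0.0, 2.0, 0.0]   # z plus a component unrelated to x
raw = pearson(x, y)                 # clearly positive
controlled = partial_corr(x, y, z)  # essentially zero
```

Here the raw correlation between x and y is 0.5, yet it vanishes once z is held constant. This is the pattern the text describes for recombination rate and dA when expression level is controlled for, although there the association is reduced to marginal significance rather than eliminated.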


Table 1 Results from Multiple-Regression Analysis of Recombination Rate and Gene Expression Level with Different Dependent Variables

 
Can we be certain that the high rate of evolution of genes in regions of low recombination and low gene expression is due to weaker purifying selection, or could it be that regions with a low recombination rate or a low gene expression level may also have a higher mutation rate? Experimental facts do not seem to support this hypothesis. In yeast, increased transcription rates are associated with increased mutation rates (Datta and Jinks-Robertson 1995; Morey, Greene, and Jinks-Robertson 2000). Furthermore, recombinational repair of DSBs in yeast, traditionally believed to be an error-free DNA repair pathway, was recently shown to substantially increase the frequency of point mutations in a nearby interval (Strathern, Shafer, and McGill 1995; Holbeck and Strathern 1997). These are the opposite of the patterns one would expect to explain the negative correlation between recombination and rate of protein evolution. Therefore, the association between recombination and rate of protein evolution is unlikely to result from differences in mutational patterns alone.

What about the association between recombination rate and codon usage bias? There is a general relationship between recombination rate and genomic G+C content in numerous eukaryotes, including yeast (see Gerton et al. 2000). This possibly reflects mutational bias toward G and C in regions of high recombination. If optimal codons are GC-rich, then mutational bias could alone produce a positive correlation between recombination rate and codon usage. This mechanism is most likely to work in D. melanogaster and Caenorhabditis elegans (Marais, Mouchiroud, and Duret 2001). However, this neutral scenario is unlikely to hold for yeast. First, in contrast to D. melanogaster and C. elegans, optimal codons in yeast do not preferentially end in G or C (Sharp and Cowe 1991). Second, codon usage is largely influenced by gene expression rather than by recombination rate (table 1). Third, there is no significant negative association between codon adaptation index and recombination rate when only AT-ending codons are considered (unpublished data). This is in contrast to the prediction of the neutral scenario (Marais, Mouchiroud, and Duret 2001).

For protein size, however, we cannot exclude the possibility that recombination induces deletions. Hence, the association between recombination rate and protein size may be the result of mutation bias rather than selection. It is worth noting that protein size was the only parameter in our study for which the association with recombination remained largely unaffected after controlling for gene expression level.

From these results, we conclude that when gene expression level is controlled for, genomic variation in recombination rates has only a very weak influence on the efficiency of purifying selection. This conclusion is drawn from the analysis of the unicellular yeast, in which meiotic recombination events and transcription occur in the same cell. What are the implications of our finding for multicellular organisms, where meiosis is restricted to germ cells? Clearly, in this case expression level in somatic tissues does not matter: only transcription in germ cells can influence meiotic recombination rate. Indeed, in humans, male meiotic recombination is specifically associated with transcription in testis tissue (Jones et al. 1997). If generally true, this suggests that the examination of covariance with recombination is a problematic method of studying variation in the intensity of selection, and previous results from the use of this method (e.g., Carvalho and Clark 1999; Comeron, Kreitman, and Aguade 1999) need to be treated with caution until covariance with germ cell expression level is eliminated as causative.

Footnotes

William Martin, Reviewing Editor

Address for correspondence and reprints: Csaba Pál, Collegium Budapest, Institute for Advanced Study, Szentháromság 2, Budapest, H-1014, Hungary. cspal@colbud.hu

References

    Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402

    Ball C. A., H. Jin, G. Sherlock, et al. (11 co-authors) 2001 Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data Nucleic Acids Res 29:80-81

    Barton N. H., 1995 A general model for the evolution of recombination Genet. Res 64:123-145

    Carvalho A. B., A. G. Clark, 1999 Intron size and natural selection Nature 401:344

    Coghlan A., K. H. Wolfe, 2000 Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae Yeast 16:1131-1145

    Comeron J. M., M. Kreitman, 2000 The correlation between intron length and recombination in Drosophila: dynamic equilibrium between mutational and selective forces Genetics 156:1175-1190

    Comeron J. M., M. Kreitman, M. Aguade, 1999 Natural selection on synonymous site is correlated with gene length and recombination in Drosophila Genetics 151:239-249

    Datta A., S. Jinks-Robertson, 1995 Association of increased spontaneous mutation rates with high levels of transcription in yeast Science 268:1616-1619

    Duret L., D. Mouchiroud, 1999 Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis Proc. Natl. Acad. Sci. USA 96:4482-4487

    Felsenstein J., 1989 PHYLIP (phylogeny inference package). Version 3.2 Cladistics 5:164-166

    Gerton J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, T. D. Petes, 2000 Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae Proc. Natl. Acad. Sci. USA 97:11383-11390

    Hill W. G., A. Robertson, 1966 The effect of linkage on limits to artificial selection Genet. Res 8:269-294

    Holbeck S. L., J. N. Strathern, 1997 A role for REV3 in mutagenesis during double-strand break repair in Saccharomyces cerevisiae Genetics 147:1017-1024

    Holstege F. C., E. G. Jennings, J. J. Wyrick, T. I. Lee, C. J. Hengartner, M. R. Green, T. R. Golub, E. S. Lander, R. A. Young, 1998 Dissecting the regulatory circuitry of a eukaryotic genome Cell 95:717-728

    Hurst L. D., J. R. Peck, 1996 Recent advances in understanding of the evolution and maintenance of sex Trends Ecol. Evol 11:46-52

    Jones M. H., Y. Zhang, K. N. Tirosvoutis, P. M. Davey, A. R. Webster, D. Walsh, N. K. Spurr, N. A. Affara, 1997 Chromosomal assignment of 311 sequences transcribed in human adult testis Genomics 40:155-167

    Kondrashov A. S., 1993 Classification of hypotheses on the advantage of amphimixis J. Hered 84:372-387

    Li W. H., 1997 Molecular evolution Sinauer, Sunderland, Mass

    Marais G., D. Mouchiroud, L. Duret, 2001 Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes Proc. Natl. Acad. Sci. USA 98:5688-5692

    Morey N. J., C. N. Greene, S. Jinks-Robertson, 2000 Genetic analysis of transcription-associated mutation in Saccharomyces cerevisiae Genetics 154:109-120

    Moriyama E. N., J. R. Powell, 1998 Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli Nucleic Acids Res 26:3188-3193

    Muller H. J., 1964 The relation of recombination to mutational advance Mutat. Res 1:2-9

    Nicolas A., 1998 Relationship between transcription and initiation of meiotic recombination: toward chromatin accessibility Proc. Natl. Acad. Sci. USA 95:87-89

    Pál C., B. Papp, L. D. Hurst, 2001 Highly expressed genes in yeast evolve slowly Genetics 158:927-931

    Sharp P. M., E. Cowe, 1991 Synonymous codon usage in Saccharomyces cerevisiae Yeast 7:657-678

    Sokal R., M. Rohlf, 1995 Biometry W. H. Freeman

    Strathern J. N., B. K. Shafer, C. B. McGill, 1995 DNA synthesis errors associated with double-strand-break repair Genetics 140:965-972

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Vinogradov A. E., 2001 Intron length and codon usage J. Mol. Evol 52:2-5

    West S. A., C. M. Lively, A. F. Read, 1999 A pluralist approach to sex and recombination J. Evol. Biol 12:1003-1012

    Williams E. J. B., L. D. Hurst, 2000 The proteins of linked genes evolve at similar rate Nature 407:900-903

Accepted for publication July 28, 2001.