Hill-Robertson Interference is a Minor Determinant of Variations in Codon Bias Across Drosophila melanogaster and Caenorhabditis elegans Genomes

Gabriel Marais* and Gwenaël Piganeau†,1

*Laboratoire "Biométrie et biologie évolutive," UMR CNRS 5558, Université Claude Bernard Lyon 1, Villeurbanne, France;
†Center for the Study of Evolution, University of Sussex, Brighton


    Abstract
 
According to population genetics models, genomic regions with lower crossing-over rates are expected to experience less effective selection because of Hill-Robertson interference (HRi). The effect of genetic linkage is thought to be particularly important for weak selection, such as selection on codon usage. Consistent with this model, codon bias correlates positively with recombination rate in Drosophila melanogaster and Caenorhabditis elegans. However, in these species, the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination, which suggests that mutation patterns and recombination are associated. To remove this effect of mutation patterns on codon bias, we used the synonymous sites of lowly expressed genes, which are expected to be effectively neutral. We measured the differences between the codon biases of highly expressed genes and those of their lowly expressed neighbors. In D. melanogaster we find that HRi weakly reduces selection on codon usage of genes located in regions of very low recombination, but these genes comprise only 4% of the total. In C. elegans we do not find any evidence for an effect of recombination on selection for codon bias. Computer simulations indicate that HRi has little effect on codon bias when the local recombination rate is greater than the mutation rate. This prediction of the model is consistent with our data and with the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Our results suggest that HRi is a minor determinant of variations in codon bias across the genome.


    Introduction
 
Synonymous codon usage bias commonly observed in living forms is usually assumed to be under a selection-mutation-drift balance (Bulmer 1991). In many unicellular organisms, invertebrates, and plants, codon bias is thought to be mainly the result of small selective effects (Sharp et al. 1993; Hartl, Moriyama, and Sawyer 1994; Akashi 1995; Chiapello et al. 1998). In these species, including Drosophila melanogaster and Caenorhabditis elegans, highly expressed genes preferentially use optimal codons, corresponding to the most abundant tRNAs in cells, because of weak selection for translational efficiency (Shields et al. 1988; Stenico, Lloyd, and Sharp 1994; Moriyama and Powell 1997; Duret and Mouchiroud 1999; Duret 2000). But in such species, it is also recognized that mutation pressure can be partly responsible for variations in codon bias across the genome (Kliman and Hey 1994; Akashi, Kliman, and Eyre-Walker 1998). Codon bias is positively correlated with recombination rate in D. melanogaster (Kliman and Hey 1993; Comeron, Kreitman, and Aguadé 1999; Marais, Mouchiroud, and Duret 2001) and C. elegans (Marais, Mouchiroud, and Duret 2001). Two models have been proposed to explain this observation.

The first model proposes that the positive correlation between codon bias and recombination rate is caused by Hill-Robertson interference (HRi) (Kliman and Hey 1993; Comeron, Kreitman, and Aguadé 1999; McVean and Charlesworth 2000). HRi leads to a decrease of selection efficacy. This is because the linkage disequilibrium between alleles at selected loci, generated by the stochastic nature of mutation and sampling in a finite population, interferes with the action of selection at other loci (Hill and Robertson 1966; Felsenstein 1974). Simulation studies suggest that the effect of genetic linkage should be particularly damaging in the case of weak selection, such as selection acting on codon usage (Li 1987; Comeron, Kreitman, and Aguadé 1999; McVean and Charlesworth 2000).

The second model proposes that the positive correlation between codon bias and recombination rate is a byproduct of mutational bias variations (MBV) associated with recombination (Marais, Mouchiroud, and Duret 2001). Consistent with this model, in D. melanogaster and C. elegans the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination rate (Marais, Mouchiroud, and Duret 2001). In the D. melanogaster subgroup, local changes in crossing-over frequencies between species are correlated with changes in MBV (Takano-Shimizu 2001). Because most of the optimal codons end in G or C in both D. melanogaster and C. elegans (Shields et al. 1988; Stenico, Lloyd, and Sharp 1994; Duret and Mouchiroud 1999), the high frequency of optimal codons observed in regions of high recombination may be the result of MBV associated with recombination (Marais, Mouchiroud, and Duret 2001). A positive correlation between G+C content and recombination has also been observed in other organisms, such as yeast (Baudat and Nicolas 1997; Gerton et al. 2000), mouse (Perry and Ashworth 1999), and human (Eyre-Walker 1993; Eisenbarth et al. 2000; Fullerton, Bernardo Carvalho, and Clark 2001; Yu et al. 2001). In such eukaryotic organisms, the recombination machinery induces gene conversion between parental chromosomes during meiosis (Smith and Nicolas 1998). Experimental evidence in mammals suggests that gene conversion associated with recombination favors the copy of the most GC-rich sequence over the other (Brown and Jiricny 1988; Bill et al. 1998). Biased gene conversion might explain why MBV are associated with recombination in many organisms (Galtier et al. 2001).

Recently, both models have been tested in C. elegans and D. melanogaster by considering separately codons ending in G or C and codons ending in A or U (Marais, Mouchiroud, and Duret 2001). In both invertebrates, the frequency of GC-ending codons correlates positively with recombination rate, and the frequency of AU-ending codons correlates negatively with recombination rate, in agreement with the MBV model but not with the HRi model. Thus, the positive correlation between codon bias and recombination rate is mainly caused by MBV in C. elegans and D. melanogaster (Marais, Mouchiroud, and Duret 2001). An important question remains: is it possible to detect HRi on codon usage in C. elegans and D. melanogaster once the effect of MBV has been accounted for?

Introns are often considered good indicators of mutation patterns (Kliman and Hey 1993, 1994; Akashi, Kliman, and Eyre-Walker 1998). Thus, in our previous work, we used introns as indicators of MBV, but we failed to detect any HRi on codon usage (Marais, Mouchiroud, and Duret 2001). However, introns may be poor indicators of MBV affecting synonymous sites in such compact genomes as D. melanogaster and C. elegans. Because selection on codon usage is not expected to act on lowly expressed genes, we used the synonymous sites of lowly expressed genes to account for the effect of MBV on codon bias. We measured the differences between codon biases of highly expressed genes and their lowly expressed neighbors. This measure of codon bias should therefore be independent of the MBV occurring at synonymous sites. In D. melanogaster we find that HRi probably affects selection on codon usage of genes located in regions of very low recombination (<1 cM/Mb). Under the assumption that highly expressed genes are representative of the genes experiencing selection on codon usage, only 4% of genes are affected by less effective selection on codon usage because of HRi in this species. In C. elegans we do not find any evidence for the effect of recombination on selection for codon bias. We suggest that the correlation between codon bias and recombination rate is a consequence of MBV in this species. Computer simulations indicate that HRi only affects selection on codon usage when the local recombination rate is below the mutation rate. This prediction of the model is consistent with our data and the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Finally, our results suggest that HRi is a minor determinant of variations in codon bias across the genome.


    Materials and Methods
 
Genomic Data Analysis
The sequence data are from the complete genomes of C. elegans (Release May 2, 1999, downloaded by FTP from ftp://ncbi.nlm.nih.gov/genbank/genomes/C_elegans/; The C. elegans Sequencing Consortium 1998) and D. melanogaster (Release October 2, 2000, downloaded from http://www.fruitfly.org/sequence/download.html; Adams et al. 2000). The expression level was determined with a method based on counting expressed sequence tags (ESTs) (Duret and Mouchiroud 1999). Genes were classified according to the number of matching ESTs. The top 30% of genes with ESTs were classified as highly expressed genes (for C. elegans, n = 1,768 with more than 17 detected ESTs; for D. melanogaster, n = 2,399 with more than 12 detected ESTs). Genes without ESTs were classified as lowly expressed genes (for C. elegans, n = 9,392; for D. melanogaster, n = 5,219). Other genes were considered moderately expressed (for C. elegans, n = 4,034; for D. melanogaster, n = 6,132). Recombination rate (cM/Mb) was estimated with a previously described procedure (Kliman and Hey 1993; Barnes et al. 1995; Comeron, Kreitman, and Aguadé 1999; Marais, Mouchiroud, and Duret 2001). For each chromosome (or chromosome arm in D. melanogaster), we chose the lowest-order polynomial that fits the data with R² ≥ 0.99. Recombination rate as a function of chromosomal location is estimated by taking the derivative of the polynomial curve. For C. elegans, 780 markers localized both in genetic maps and in whole genome sequences have been used (available at http://wormbase.sanger.ac.uk), and third- to fifth-order polynomial curves are used to model the relationship between genetic positions and physical positions. The wild populations of C. elegans are mainly self-fertile hermaphrodites and should have less recombination than the laboratory strains used to construct the genetic maps. In this species, recombination rates should therefore be accurate relative to each other but not in absolute value. For D. melanogaster, 898 markers have been used to estimate recombination rate (available at http://flybase.bio.indiana.edu), and fourth-order polynomial curves are used to model the relationship between genetic positions and physical positions (except for the left arm of the third chromosome, where we retained the second-order polynomial because the use of a higher-order polynomial did not increase R², which is 0.98). The recombination rate on the fourth chromosome was considered to be zero. Genes are classified into 10 classes of recombination rate with nearly 10% of the total number of genes per class for both C. elegans and D. melanogaster (except for the 0–0.25 and 0.25–0.5 classes in D. melanogaster, 5% each). We measured codon bias by the frequency of optimal codons (Fop): Fop ranges from 0.33, when codon usage is uniform, to 1, when genes use only optimal codons (Stenico, Lloyd, and Sharp 1994; Duret and Mouchiroud 1999).
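
The recombination-rate estimate described above (lowest-order polynomial with R² ≥ 0.99 relating genetic to physical position, then its derivative) can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the plain normal-equation solver, and the `max_degree` cap are assumptions made for the illustration, and marker positions are taken to be in Mb (physical) and cM (genetic).

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c with y ~ sum(c[i] * x**i)."""
    n = degree + 1
    # Normal equations A c = b built from powers of x.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted polynomial (genetic position, cM) at x (Mb)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    """Coefficient of determination of the fit on the marker data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def least_order_fit(xs, ys, threshold=0.99, max_degree=6):
    """Lowest polynomial degree whose fit reaches the R^2 threshold."""
    for degree in range(1, max_degree + 1):
        coeffs = fit_polynomial(xs, ys, degree)
        if r_squared(coeffs, xs, ys) >= threshold:
            return degree, coeffs
    raise ValueError("no polynomial up to max_degree reaches the threshold")


def recombination_rate(coeffs, x):
    """Derivative of the fitted curve at x, i.e. the local rate in cM/Mb."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)
```

For a chromosome arm, one would call `least_order_fit` on the marker coordinates and then evaluate `recombination_rate` at the physical position of each gene. Normal equations are adequate for the low orders used here; a production analysis would more likely rely on a numerical library.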

Random Sampling of the Data Set
To resolve the problem of the covariations of gene length and recombination rate, we forced the distribution of gene length to be the same for the different classes of recombination rate for both lowly and highly expressed genes. We chose the distribution of gene length of the recombination rate class with the smallest sample size among lowly and highly expressed genes to be the reference distribution of gene length for all other recombination rate classes for both lowly and highly expressed genes. For C. elegans, this distribution corresponds to 23% of genes with coding sequence (CDS) length <1,000 nucleotides, 18% of genes with CDS length of 1,000–1,750 nucleotides, and 59% of genes with CDS length >1,750 nucleotides. For D. melanogaster this distribution corresponds to 18% of genes with CDS length <800 nucleotides, 21% of genes with CDS length of 800–1,550 nucleotides, and 61% of genes with CDS length >1,550 nucleotides. We generated 10 new data sets by random sampling of genes in each class of sequence length for each class of recombination for both lowly and highly expressed genes. In D. melanogaster, n = 4,159 for each data set corrected for gene length variations; in C. elegans, n = 3,100 for each data set corrected for gene length variations.
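
The length-matched resampling described above can be sketched as follows. This is an illustrative Python sketch rather than the original procedure: the function names are hypothetical, genes are represented as `(gene_id, cds_length)` pairs, and the cut points are chosen here so that a CDS of exactly 1,750 nucleotides falls in the middle class, which the text does not specify.

```python
import random


def length_class(cds_length, cut_points):
    """Index of the CDS-length class; cut_points are exclusive upper bounds."""
    for index, bound in enumerate(cut_points):
        if cds_length < bound:
            return index
    return len(cut_points)


def sample_matched(genes, cut_points, proportions, n_total, rng):
    """Draw genes from one recombination class so that the length classes
    follow the reference proportions (for example 23%, 18%, 59%)."""
    by_class = {}
    for gene_id, cds_length in genes:
        by_class.setdefault(length_class(cds_length, cut_points), []).append(gene_id)
    sample = []
    for index, proportion in enumerate(proportions):
        wanted = round(proportion * n_total)
        # rng.sample raises ValueError if a class holds fewer genes than wanted.
        sample.extend(rng.sample(by_class.get(index, []), wanted))
    return sample
```

Repeating the call with ten different seeds for every recombination class and expression level would give ten data sets analogous to those described in the text, for instance `sample_matched(genes, (1000, 1751), (0.23, 0.18, 0.59), n_total, random.Random(seed))` for the C. elegans reference distribution.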

Computer Simulations
The simulation process is close to that of previous simulation studies of HRi (Li 1987; Comeron, Kreitman, and Aguadé 1999; McVean and Charlesworth 2000): we assumed that each individual is represented by L biallelic sites (e.g., optimal and nonoptimal codons). The haploid population size is N. If not specified, the mutation rate from nonoptimal toward optimal codons is u, the reverse mutation rate is v = 2u, leading to an equilibrium value of 0.33 without selection (Fop = 0.33 when codon usage is uniform), and the global mutation rate (number of mutations per site per generation) is m = u(1 - Fop) + vFop. The numbers of mutations toward optimal and toward nonoptimal codons follow Poisson distributions of mean NLu and NLv, respectively. The number of crossing-overs per generation also follows a Poisson distribution of mean NLr, where r is the recombination rate (number of crossing-overs per site per generation). The N individuals of the next generation are randomly chosen by multinomial sampling among the N individuals of the present generation, given their relative fitness in the population. The absolute fitness of a sequence with i optimal sites is given by (1 + s)^i, which is equivalent to negative selection on nonoptimal codons, given a simple transformation of the selection coefficient s (Piganeau et al. 2001). The process is run for 4/(u + v) generations to reach equilibrium. The mean and variance of the equilibrium optimal codon frequency are calculated from 100 values checked every 2N generations, and each simulation is run at least four times. Without linkage between selected sites, selection efficiency is known to depend on the scaled mutation rates Nu and Nv and the scaled selection coefficient Ns (Li 1987). In the rest of the text, the Fop value expected without linkage between selected sites is referred to as Fop-max. Under complete linkage, the selection efficiency depends on the scaled mutation rates NLu and NLv and on Ns (McVean and Charlesworth 2000).
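
One generation of the process described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' optimized program: the function names are hypothetical, the order of events within a generation (sampling, then crossing-over, then mutation) is our choice, each crossing-over swaps the tails of two randomly chosen sequences, and a mutation event only takes effect if the site it hits is in the appropriate state.

```python
import math
import random


def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (small means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def next_generation(pop, u, v, r, s, rng):
    """pop is a list of N sequences, each a list of L alleles (1 = optimal)."""
    n, length = len(pop), len(pop[0])
    # Selection and drift: sample N parents with fitness weights (1 + s)^i.
    weights = [(1.0 + s) ** sum(ind) for ind in pop]
    new_pop = [list(ind) for ind in rng.choices(pop, weights=weights, k=n)]
    # Crossing-over: Poisson(N L r) events, each swapping two sequence tails.
    for _ in range(poisson(rng, n * length * r)):
        i, j = rng.sample(range(n), 2)
        cut = rng.randrange(1, length)
        new_pop[i][cut:], new_pop[j][cut:] = new_pop[j][cut:], new_pop[i][cut:]
    # Mutation: Poisson(N L u) hits change nonoptimal sites (0 -> 1) and
    # Poisson(N L v) hits change optimal sites (1 -> 0).
    for rate, source in ((u, 0), (v, 1)):
        for _ in range(poisson(rng, n * length * rate)):
            ind = new_pop[rng.randrange(n)]
            site = rng.randrange(length)
            if ind[site] == source:
                ind[site] = 1 - source
    return new_pop


def mean_fop(pop):
    """Frequency of optimal alleles over all sites and individuals."""
    return sum(map(sum, pop)) / (len(pop) * len(pop[0]))


def simulate(n, length, u, v, r, s, generations, seed=0):
    """Start at the neutral expectation u / (u + v) and iterate."""
    rng = random.Random(seed)
    start = u / (u + v)
    pop = [[int(rng.random() < start) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        pop = next_generation(pop, u, v, r, s, rng)
    return pop
```

To follow the text, `generations` would be set to about 4/(u + v) before `mean_fop` is recorded every 2N generations. Pure Python is slow for realistic values of N and L, so this sketch is only meant to make the model explicit.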


    Results and Discussion
 
In D. melanogaster and C. elegans we have previously shown that the G+C content of noncoding DNA positively correlates with the recombination rate, suggesting that MBV varies with recombination in those organisms (Marais, Mouchiroud, and Duret 2001). Here we confirm these results with another estimate of recombination rate (see table 1). Note that the positive correlation between the G+C content of noncoding DNA and the recombination rate is weak but statistically significant and comparable to the positive correlation between the frequency of optimal codons (Fop) and the recombination rate (in D. melanogaster: Rs = 0.058 with P < 10⁻⁴, n = 13,750; in C. elegans: Rs = 0.105 with P < 10⁻⁴, n = 15,194). In D. melanogaster the association between the G+C content of noncoding DNA and the recombination rate remains unchanged when subtelomeric regions with controversial recombination rate estimates (Hey and Kliman 2002) are excluded (see table 1).


 
Table 1 The Correlation Between the G + C Content of Noncoding DNA and the Recombination Rate in D. melanogaster and C. elegans

 
Therefore, HRi on selection on codon usage can only be detected once MBV have been accounted for. Introns are often considered good indicators of mutation patterns because their evolution is assumed to be neutral (Kliman and Hey 1993, 1994; Akashi, Kliman, and Eyre-Walker 1998). On the basis of this assumption, we used introns as indicators of MBV; thus, we computed the residuals of the regression between intron G+C content and codon bias, but we failed to detect any HRi on these residuals (Marais, Mouchiroud, and Duret 2001). However, most introns are short both in C. elegans (The C. elegans Sequencing Consortium 1998) and D. melanogaster (Adams et al. 2000). The base composition of these short introns is constrained by the presence of elements required for the splicing reaction (Fields 1990; Mount et al. 1992). Intergenic regions and introns at the first position in genes often contain regulatory elements of gene expression (Maroni 1994; Duret and Bucher 1997). Thus, most of the noncoding DNA may not really be neutral in either C. elegans or D. melanogaster. Large introns that are not at the first position in genes are probably neutral, but these introns often contain transposable elements, which are often AT-rich (Shields and Sharp 1989; Lerat, Biémont, and Capy 2000; Lerat, Capy, and Biémont 2002). Thus, the G+C content of introns and intergenic regions may be a poor indicator of MBV affecting synonymous sites (Duret and Hurst 2001). Genes with a low expression level are not expected to undergo selection on codon usage. Consistent with this, these genes have a weak codon bias (Shields et al. 1988; Stenico, Lloyd, and Sharp 1994; Duret and Mouchiroud 1999) and a high number of synonymous substitutions (Shields et al. 1988; Sharp and Li 1989; Powell and Moriyama 1997; but see Dunn, Bielawski, and Yang 2001). Thus, the codon bias of lowly expressed genes should solely reflect mutation patterns. We can therefore estimate the effect of MBV with the codon bias of lowly expressed genes. In contrast, genes with a very high expression level are expected to undergo selection on codon usage. Consistent with this, these genes have a highly biased codon usage (Shields et al. 1988; Stenico, Lloyd, and Sharp 1994; Duret and Mouchiroud 1999) and a low number of synonymous substitutions (Shields et al. 1988; Sharp and Li 1989; Powell and Moriyama 1997; but see Dunn, Bielawski, and Yang 2001). Thus, the codon bias of highly expressed genes should be affected by both selection and mutation patterns. For these genes, the correlation between codon bias and recombination rate should be a consequence of both MBV and HRi, if any. HRi can be brought to the fore by comparing the codon biases of highly expressed genes (HRi + MBV) with those of lowly expressed genes (MBV) for different recombination rates.

We studied the complete genomes of C. elegans (The C. elegans Sequencing Consortium 1998) and D. melanogaster (Adams et al. 2000). We measured codon bias by the frequency of optimal codons (Fop) (Stenico, Lloyd, and Sharp 1994; Duret and Mouchiroud 1999). For each highly expressed gene, we measured the average difference between its Fop and the Fop of its lowly expressed neighbors over an interval of 100 kb centered on the midpoint of the highly expressed gene. In this way, we removed the local effect of MBV on Fop of highly expressed genes. In figure 1, we show the residuals of Fop after the removal of the MBV effect on codon usage (noted Fop-MBV for Fop corrected for MBV) according to recombination rate. The overall relationship between Fop-MBV and recombination rate is clearly not linear (see fig. 1). In D. melanogaster we observed a weak but significant increase of Fop-MBV with recombination rate for highly expressed genes located in regions of recombination rate of 0–1 cM/Mb (Spearman's rank correlation coefficient Rs = 0.129 with P = 0.0033) and no relationship between Fop-MBV and recombination rate for the other highly expressed genes (1 to > 3.9 cM/Mb, Rs = -0.019 with P = 0.32). This observation suggests that codon usage of highly expressed genes located in regions with recombination rate under ~1 cM/Mb in D. melanogaster probably experiences HRi. The same is found for moderately expressed genes, although variations in Fop-MBV induced by HRi tend to be weaker (see fig. 1). For these genes, variations in Fop-MBV in regions of recombination rate of 0–1 cM/Mb are not significant (Rs = 0.021 with P = 0.46). Thus, we do not consider them in the rest of the analysis. In C. elegans the relationship between Fop-MBV and recombination rate for highly expressed genes is not convincing, although there is a global correlation between the two parameters (Rs = 0.064 with P < 0.0075). For moderately expressed genes, the relationship is not convincing, and there is no global correlation (see fig. 1).
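
The Fop-MBV correction described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the original implementation: the function name and the tuple layout `(chromosome, midpoint_bp, fop)` are hypothetical, lowly expressed neighbors are assigned to the window by their own midpoint (the text does not say how neighbors are positioned), and highly expressed genes without any neighbor in the window are skipped.

```python
def fop_mbv(high_genes, low_genes, window=100_000):
    """Fop of each highly expressed gene minus the mean Fop of the lowly
    expressed genes lying within a window centered on its midpoint.

    Both inputs are lists of (chromosome, midpoint_bp, fop) tuples.
    Returns a list of (chromosome, midpoint_bp, fop_mbv) tuples.
    """
    half = window / 2
    corrected = []
    for chrom, midpoint, fop in high_genes:
        neighbors = [
            low_fop
            for low_chrom, low_mid, low_fop in low_genes
            if low_chrom == chrom and abs(low_mid - midpoint) <= half
        ]
        if not neighbors:
            continue  # no local estimate of the mutational background
        background = sum(neighbors) / len(neighbors)
        corrected.append((chrom, midpoint, fop - background))
    return corrected
```

Changing `window` to values between 50,000 and 500,000 corresponds to the range of interval sizes mentioned in the legend of figure 1.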



 
Fig. 1.—Relationship between codon bias corrected for MBV and recombination in (A) D. melanogaster and (B) C. elegans. To remove the local effect of MBV on Fop of each highly expressed gene, we measured the average difference between its Fop and the Fop of its lowly expressed neighbors over an interval of 100 kb centered on the midpoint of the highly expressed gene. The results are unchanged by using intervals ranging from 50 to 500 kb, by using other codon bias indices, and by excluding subtelomeric regions with controversial recombination rates (Hey and Kliman 2002). The residuals of Fop after removal of the MBV effect on codon usage are noted Fop-MBV (for Fop corrected for MBV). The same approach has been applied to moderately expressed genes. Error bars correspond to the 95% interval.

 
In D. melanogaster and C. elegans, codon bias is strongly correlated with gene length (Moriyama and Powell 1998; Comeron, Kreitman, and Aguadé 1999; Duret and Mouchiroud 1999; Marais and Duret 2001). Because the distribution of gene length is not random with respect to recombination rate in either genome (see fig. 2), we forced the distribution of gene length to be the same in each class of recombination rate. We generated 10 new data sets by random sampling of genes. In figure 3, we show the reevaluation of the relationship between Fop-MBV and recombination rate for D. melanogaster and C. elegans for 10 data sets corrected for gene length variations. We still observed a significant linear relationship between Fop-MBV and recombination rate for highly expressed genes located in regions of recombination rate of 0–1 cM/Mb, and no relationship for the other highly expressed genes in D. melanogaster. We did not observe any significant relationship between Fop-MBV and recombination rate for all highly expressed genes in C. elegans. Thus, the relationship between Fop-MBV and recombination rate for highly expressed genes initially detected in C. elegans (see fig. 1) is a byproduct of the variations of gene lengths along the genome. We have no evidence for the effect of recombination on selection for codon bias in this species. However, we detected HRi on codon usage in D. melanogaster. Note that variations in codon bias induced by HRi are only ~5% for highly expressed genes (and ~1.5% for moderately expressed genes, see fig. 1). Moreover, HRi influences codon usage only in regions of low recombination rate (<1 cM/Mb). These regions contain 20.1% of the total number of genes in D. melanogaster: 7.2% are lowly expressed genes, 9.1% are moderately expressed genes, and 3.8% are highly expressed genes. However, lowly expressed genes can be excluded because they probably do not undergo selection on codon usage. Moderately expressed genes can also be excluded because of the limited impact of HRi on codon bias of these genes. Thus, few genes (~4%) are affected by less effective selection on codon usage in this species. These genes are located in genomic regions corresponding mainly to the fourth chromosome and to the subtelomeric and pericentromeric regions of the other chromosomes (as defined in Kliman and Hey 1993). Thus, HRi is a minor determinant of variations in codon bias across the genome.



 
Fig. 2.—Relationship between gene length (bp) and recombination in (A) D. melanogaster and (B) C. elegans. Error bars correspond to the 95% interval

 


 
Fig. 3.—Relationship between codon bias corrected for MBV and recombination in (A) D. melanogaster and (B) C. elegans. The distributions of gene lengths are the same for the different classes of recombination rate for all the 10 data sets generated by random sampling. For D. melanogaster we found a significant linear relationship between Fop-MBV and recombination rate for highly expressed genes located in regions of recombination rate of 0–1 cM/Mb and no relationship for the other highly expressed genes for 7 of the 10 sampled data sets. For C. elegans we found no significant relationship between Fop-MBV and recombination rate for highly expressed genes for all the sampled data sets

 
The relationship between selection efficiency on codon usage and recombination was also investigated by simulation. In figure 4, we show the nonlinear relationship between HRi and the ratio of the recombination rate (r) over the mutation rate (m). For a lower recombination rate (r < 4m, from our simulations), recombination increases selection efficiency. For a higher recombination rate, associations between the alleles are broken down sufficiently fast that they behave as if they were independent: the mutation-selection equilibrium optimal codon frequency reaches 95% of the value with independent codons (Fop-max). Thus, we show through computer simulations that selection efficiency depends on recombination for a reduced range of recombination rates, depending on the r/m ratio. Furthermore, our simulations enable us to estimate that 95% of Fop-max is reached if r ≥ 4m. This does not change for different Ns and N(u + v)L (see fig. 4). In D. melanogaster the recombination rate above which Fop no longer increases with recombination is 1 cM/Mb = 10⁻⁸ recombinations per site per generation (see fig. 1). Using the formula r = 4m, this gives a mean mutation rate equal to 2.5 × 10⁻⁹ mutations per site per generation, which is consistent with current estimates of the mean mutation rate in this species (Drake et al. 1998; Keightley and Eyre-Walker 1999). In C. elegans the current estimates of the mean mutation rate are known to be lower (Drake et al. 1998). Accordingly, HRi should have an effect in regions where the recombination rate is lower than 1 cM/Mb in C. elegans, which is consistent with our failure to detect genes experiencing the effect of recombination on codon bias in this species. Although the sex ratio in the wild populations of C. elegans is largely unknown, they are thought to be mainly self-fertile hermaphrodites and should have less recombination than the laboratory strains used to construct genetic maps. Recombination rates should be accurate relative to each other but are probably overestimated (Nordborg 2000). Thus, in C. elegans the expected limit r = 4m cannot be easily compared with our recombination data; hence, the absence of a relationship between Fop-MBV and recombination rates reflects either that the whole genome experiences HRi or that the whole genome does not experience HRi. As the selection strengths necessary to have a high codon bias in a genome composed of totally linked sites would have to be very strong, it is more likely that the genome of C. elegans does not experience HRi; but only a comparison with a closely related, cross-fertilizing nematode species such as Caenorhabditis remanei (Haag and Kimble 2000) may allow discrimination between these two hypotheses. Unfortunately, the sequence data available for this species are poor. Thus, our analysis suggests that in C. elegans genes could be concentrated in regions of relatively low recombination rate without suffering from HRi, as observed in the pericentromeric regions (Barnes et al. 1995; The C. elegans Sequencing Consortium 1998). In conclusion, our results suggest that HRi is a minor determinant of variations in codon bias across the genome because it has small effects and influences few genes in D. melanogaster, and it entails no detectable variation in C. elegans.
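
The unit conversion behind the D. melanogaster threshold discussed above (1 cM/Mb, combined with r = 4m) can be checked with a few lines of Python. The function and constant names are hypothetical and only restate the arithmetic of the text: 1 cM is taken as an expected 0.01 crossing-overs per generation and 1 Mb as 10⁶ sites.

```python
SITES_PER_MB = 1_000_000
CROSSOVERS_PER_CM = 0.01  # 1 cM = 1% expected crossing-over frequency


def per_site_recombination(cm_per_mb):
    """Convert a rate in cM/Mb to crossing-overs per site per generation."""
    return cm_per_mb * CROSSOVERS_PER_CM / SITES_PER_MB


def implied_mutation_rate(cm_per_mb, ratio=4.0):
    """Mutation rate m at which the given recombination rate equals ratio * m."""
    return per_site_recombination(cm_per_mb) / ratio
```

With these definitions, `per_site_recombination(1.0)` is about 1e-8 and `implied_mutation_rate(1.0)` is about 2.5e-9, matching the values quoted for D. melanogaster.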



 
Fig. 4.—Simulation study of HRi on selection on codon usage. We observed that the selection efficiency depends on the ratio of the recombination rate (r) over the mutation rate (m). The r/m ratio above which 95% of Fop-max (see text) is reached is obtained for r ≥ 4m. This did not change with (A) different Ns (with Nm = 0.01, N(u + v)L = 320, u/(u + v) = 0.33) and (B) different N(u + v)L (with Nm = 0.01, Ns = 1, u/(u + v) = 0.4). See Materials and Methods for the correspondence between u, v, and m. Fop values for r/m > 4 with N(u + v)L = 3,200 are not available because of impractical simulation times. Data on synonymous polymorphism in Drosophila species gave Ns = 2 (Akashi 1995), although these values may be underestimated (Andolfatto and Przeworski 2000; Andolfatto 2001). A rough estimate of NmL in C. elegans is 20 (number of highly expressed genes per chromosome = 1,768/6, mean length of highly expressed genes = 688 codons, and Nm = 0.0001 derived from Koch et al. 2000) and 3,375 in D. melanogaster (number of highly expressed genes per chromosome = 2,443/4, mean length of highly expressed genes = 551 codons, and Nm = 0.01 [McVean and Charlesworth 2000]).
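
The rough NmL estimates quoted in the legend of figure 4 are products of three numbers given there, which the following Python sketch reproduces. The function name is hypothetical, and L is taken as the number of codons in the highly expressed genes of one chromosome, as the legend implies.

```python
def scaled_mutation_input(n_genes, n_chromosomes, mean_codons, nm):
    """NmL = (genes per chromosome) * (mean codons per gene) * Nm."""
    return n_genes / n_chromosomes * mean_codons * nm


nml_c_elegans = scaled_mutation_input(1768, 6, 688, 0.0001)
nml_d_melanogaster = scaled_mutation_input(2443, 4, 551, 0.01)
```

The first product is about 20.3, in line with the value of 20 in the legend. The second is about 3,365, close to the 3,375 quoted; the small difference presumably reflects rounding or a slightly different gene count in the original calculation.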

 


    Acknowledgements
 
Special thanks to Laurent Duret and Adam Eyre-Walker for helpful comments on the manuscript and friendly support. We also thank Vincent Daubin, Christian Gautier, Dominique Mouchiroud, and Stephen Wright for stimulating discussions. We are grateful to Roland Westrelin and Bernard Tourancheau for their help in the optimization of the simulation program. This work is supported by the Ministere de la Recherche et de l'Enseignement Superieur and the French bioinformatics programme.


    Footnotes
 
Wolfgang Stephan, Reviewing Editor

1 Both authors contributed equally to this work Back

Abbreviations: Fop, frequency of optimal codons; Rs, Spearman's rank correlation coefficient; MBV, mutational bias variations; HRi, Hill-Robertson interference; Fop-MBV, Fop corrected for MBV; r, recombination rate; m, mutation rate; N, effective population size; L, number of selected sites; s, selection coefficient, CDS, coding sequence; Fop-max, Fop value expected with independent selected sites; EST, expressed sequence tag. Back

Keywords: codon usage; recombination; mutation patterns; Hill-Robertson interference; Drosophila; Caenorhabditis

Address for correspondence and reprints: Gabriel Marais, Laboratoire "Biométrie et biologie évolutive," UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 Bvd du 11 novembre 1918, 69622 Villeurbanne, France. E-mail: marais{at}biomserv.univ-lyon1.fr


    References
 

    Adams M. D., S. E. Celniker, R. A. Holt, et al. (95 co-authors) 2000 The genome sequence of Drosophila melanogaster Science 287:2185-2195

    Akashi H., 1995 Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA Genetics 139:1067-1076

    Akashi H., R. M. Kliman, A. Eyre-Walker, 1998 Mutation pressure, natural selection, and the evolution of base composition in Drosophila Genetica 102/103:49-60

    Andolfatto P., 2001 Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans Mol. Biol. Evol 18:279-290

    Andolfatto P., M. Przeworski, 2000 A genome-wide departure from the standard neutral model in natural populations of Drosophila Genetics 156:257-268

    Barnes T. M., Y. Kohara, A. Coulson, S. Hekimi, 1995 Meiotic recombination, noncoding DNA and genomic organization in Caenorhabditis elegans Genetics 141:159-179

    Baudat F., A. Nicolas, 1997 Clustering of meiotic double-strand breaks on yeast chromosome III Proc. Natl. Acad. Sci. USA 94:5213-5218

    Bill C. A., W. A. Duran, N. R. Miselis, J. A. Nickoloff, 1998 Efficient repair of all types of single-base mismatches in recombination intermediates in Chinese Hamster ovary cells. Competition between long-patch and G-T glycosylase-mediated repair of G-T mismatches Genetics 149:1935-1943

    Brown T. C., J. Jiricny, 1988 Different base/base mispairs are corrected with different efficiencies and specificities in monkey kidney cells Cell 54:705-711

    Bulmer M., 1991 The selection-mutation-drift theory of synonymous codon usage Genetics 129:897-907

    The C. elegans Sequencing Consortium. 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology Science 282:2012-2018

    Chiapello H., F. Lisacek, M. Caboche, A. Henaut, 1998 Codon usage and gene function are related in sequences of Arabidopsis thaliana Gene 209:GC1-GC38

    Comeron J. M., M. Kreitman, M. Aguadé, 1999 Natural selection on synonymous sites is correlated with gene length and recombination in Drosophila Genetics 151:239-249

    Drake J. W., B. Charlesworth, D. Charlesworth, J. Crow, 1998 Rates of spontaneous mutation Genetics 148:1667-1686

    Dunn K. A., J. P. Bielawski, Z. Yang, 2001 Substitution rates in Drosophila nuclear genes: implications for translational selection Genetics 157:295-305

    Duret L., 2000 tRNA gene number and codon usage in C. elegans genome are co-adapted for optimal translation of highly expressed genes Trends Genet 16:287-289

    Duret L., P. Bucher, 1997 Searching for regulatory elements in human noncoding sequences Curr. Opin. Struct. Biol 7:399-406

    Duret L., L. D. Hurst, 2001 The elevated GC content at exonic third sites is not evidence against neutralist models of isochore evolution Mol. Biol. Evol 18:757-762

    Duret L., D. Mouchiroud, 1999 Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis Proc. Natl. Acad. Sci. USA 96:4482-4487

    Eisenbarth I., G. Vogel, W. Krone, W. Vogel, G. Assum, 2000 An isochore transition in the NF1 gene region coincides with a switch in the extent of linkage disequilibrium Am. J. Hum. Genet 67:873-880

    Eyre-Walker A., 1993 Recombination and mammalian genome evolution Proc. R. Soc. Lond. B 252:237-243

    Felsenstein J., 1974 The evolutionary advantage of recombination Genetics 78:737-756

    Fields C., 1990 Information content of Caenorhabditis elegans splice site sequences varies with intron length Nucleic Acids Res 18:1509-1512

    Fullerton S. M., A. Bernardo Carvalho, A. G. Clark, 2001 Local rates of recombination are positively correlated with GC content in the human genome Mol. Biol. Evol 18:1139-1142

    Galtier N., G. Piganeau, D. Mouchiroud, L. Duret, 2001 GC-content evolution in mammalian genomes: the biased gene conversion hypothesis Genetics 159:907-911

    Gerton J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, T. D. Petes, 2000 Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae Proc. Natl. Acad. Sci. USA 97:11383-11390

    Haag E. S., J. Kimble, 2000 Regulatory elements required for development of Caenorhabditis elegans hermaphrodites are conserved in the tra-2 homologue of C. remanei a male/female sister species Genetics 155:105-116

    Hartl D. L., E. N. Moriyama, S. A. Sawyer, 1994 Selection intensity for codon bias Genetics 138:227-234

    Hey J., R. M. Kliman, 2002 Interactions between natural selection, recombination and gene density in the genes of Drosophila Genetics 160:595-608

    Hill W. G., A. Robertson, 1966 The effect of linkage on limits to artificial selection Genet. Res 8:269-294

    Keightley P. D., A. Eyre-Walker, 1999 Deleterious mutations and the evolution of sex Science 290:331-333

    Kliman R. M., J. Hey, 1993 Reduced natural selection associated with low recombination in Drosophila melanogaster Mol. Biol. Evol 10:1239-1258

    ———. 1994 The effects of mutation and natural selection on codon bias in the genes of Drosophila Genetics 137:1049-1056

    Koch R., H. G. van Luenen, M. van der Horst, K. L. Thijssen, R. H. Plasterk, 2000 Single nucleotide polymorphisms in wild isolates of Caenorhabditis elegans Genome Res 10:1690-1696

    Lerat E., C. Biémont, P. Capy, 2000 Codon usage and the origin of P elements Mol. Biol. Evol 17:467-468

    Lerat E., P. Capy, C. Biémont, 2002 Codon usage by transposable elements and their host genes in five species J. Mol. Evol 54:625-637

    Li W.-H., 1987 Models of nearly neutral mutations with particular implications for nonrandom usage of synonymous codons J. Mol. Evol 24:337-345

    Marais G., L. Duret, 2001 Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans J. Mol. Evol 52:275-280

    Marais G., D. Mouchiroud, L. Duret, 2001 Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes Proc. Natl. Acad. Sci. USA 98:5688-5692

    Maroni G., 1994 The organization of Drosophila genes DNA Seq 4:347-354

    McVean G. A. T., B. Charlesworth, 2000 The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation Genetics 155:929-944

    Moriyama E. N., J. R. Powell, 1997 Codon usage bias and tRNA abundance in Drosophila J. Mol. Evol 45:514-523

    ———. 1998 Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli Nucleic Acids Res 26:3188-3193

    Mount S. M., C. Burks, G. Hertz, G. D. Stormo, O. White, C. Fields, 1992 Splicing signals in Drosophila: intron size, information content, and consensus sequences Nucleic Acids Res 20:4255-4262

    Nordborg M., 2000 Linkage disequilibrium, gene trees and selfing: an ancestral recombination graph with partial self-fertilization Genetics 154:923-929

    Perry J., A. Ashworth, 1999 Evolutionary rate of a gene affected by chromosomal position Curr. Biol 9:987-989

    Piganeau G., R. Westrelin, B. Tourancheau, C. Gautier, 2001 Multiplicative versus additive selection in relation to genome evolution: a simulation study Genet. Res 78:171-175

    Powell J. R., E. N. Moriyama, 1997 Evolution of codon usage bias in Drosophila Proc. Natl. Acad. Sci. USA 94:7784-7790

    Sharp P. M., W. H. Li, 1989 On the rate of DNA sequence evolution in Drosophila J. Mol. Evol 28:398-402

    Sharp P. M., M. Stenico, J. F. Peden, A. T. Lloyd, 1993 Codon usage: mutational bias, translational selection, or both? Biochem. Soc. Trans 21:835-841

    Shields D. C., P. M. Sharp, 1989 Evidence that mutation patterns vary among Drosophila transposable elements J. Mol. Biol 207:843-846

    Shields D. C., P. M. Sharp, D. G. Higgins, F. Wright, 1988 "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons Mol. Biol. Evol 5:704-716

    Smith K. N., A. Nicolas, 1998 Recombination at work for meiosis Curr. Opin. Genet. Dev 8:200-211

    Stenico M., A. T. Lloyd, P. M. Sharp, 1994 Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases Nucleic Acids Res 22:2437-2446

    Takano-Shimizu T., 2001 Local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes Mol. Biol. Evol 18:606-619

    Yu A., C. Zhao, Y. Fan, et al. (11 co-authors) 2001 Comparison of human genetic and sequence-based physical maps Nature 409:951-953

Accepted for publication March 14, 2002.