Translational Selection on Codon Usage in Xenopus laevis

Héctor Musto, Stéphane Cruveiller, Giuseppe D'Onofrio, Héctor Romero and Giorgio Bernardi

Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy;
Laboratorio de Organización y Evolución del Genoma, Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay;
Departamento de Genética, Facultad de Medicina, Montevideo, Uruguay


    Abstract
 
A correspondence analysis of codon usage in Xenopus laevis revealed that the first axis is strongly correlated with the base composition at third codon positions. The second axis discriminates between putatively highly expressed genes and the other coding sequences, with expression levels being confirmed by the analysis of expressed sequence tag (EST) frequencies. The comparison of codon usage of the sequences displaying the extreme values on the second axis indicates that several codons are statistically more frequent among the highly expressed (mainly housekeeping) genes. Translational selection appears, therefore, to influence synonymous codon usage in Xenopus.


    Introduction
 
With very few exceptions, the genetic code is the same in all living organisms. All amino acids except for methionine and tryptophan are encoded by more than one codon. Synonymous codons are, however, not randomly used (Grantham et al. 1980), and the factors governing synonymous codon preferences are not the same in different organisms.

Indeed, in the unicellular organisms Escherichia coli and Saccharomyces cerevisiae, synonymous codon choices appear to be positively correlated with the relative abundances of tRNAs, with the correlation being very strong for highly expressed genes (Ikemura 1981, 1982, 1985; Bennetzen and Hall 1982; Gouy and Gautier 1982; Sharp and Li 1986; Bulmer 1988, 1991; Kanaya et al. 1999; for reviews, see Sharp and Matassi 1994; Sharp et al. 1995; Akashi and Eyre-Walker 1998).

In multicellular organisms, different patterns can be found. For example, in Caenorhabditis elegans and Drosophila melanogaster, which are characterized by extensive variation in codon usage, the factors governing the choices have been attributed to an equilibrium between mutational biases and translational selection (Shields et al. 1988; Sharp and Li 1989; Moriyama and Gojobori 1992; Carulli et al. 1993; Akashi 1994, 1997; Stenico, Lloyd, and Sharp 1994; Moriyama and Powell 1997; Powell and Moriyama 1997). Translational selection at silent sites has also been reported to be the main factor shaping codon usage in Zea mays (Fennoy and Bailey-Serres 1993) and Arabidopsis thaliana (Chiapello et al. 1998).

Compositionally compartmentalized genomes, like those of vertebrates and, in particular, those of warm-blooded vertebrates, show multiple codon usages. The compositional properties of those genomes and, more precisely, the compositional correlations existing between coding sequences (and their different codon positions) and isochores (see Bernardi [2000] for a review) affect codon usage. The situation is strikingly different for genes located in GC-poor and GC-rich isochores (Bernardi and Bernardi 1985; Bernardi et al. 1985; D'Onofrio et al. 1991; Cruveiller, D'Onofrio, and Bernardi 2000). This point is best illustrated by the example of α- and β-globin genes, which show very different codon usages because they are located in isochores with very different levels of GC, in spite of both being very highly expressed and at nearly equimolar amounts in the same cells (Bernardi et al. 1985).

As expected, therefore, when applied to mammalian sequences, multivariate statistical analysis reveals a single major trend that is strongly correlated with the GC level at third codon positions (GC3) of each gene. Moreover, the first axis does not discriminate aspects of gene function such as regulation during development, tissue specificity, constitutive expression, intracellular localization of the protein product, etc. (Sharp et al. 1988, 1995; Sharp and Matassi 1994).

Along another line, no correlation was found between the rate of synonymous substitutions (Ks) and either the expression level or the tissue specificity of genes in a mouse/rat comparison (Wolfe and Sharp 1993). The conclusion that expression levels do not influence the codon usage pattern in mammals was also drawn by analyzing expressed sequence tags (ESTs) in different tissues (Duret and Mouchiroud 2000).

Since GC3-rich genes represent roughly half of human genes (see Bernardi [2000] for a review), and since the multivariate analysis was carried out on human genes regardless of their GC3 levels (Sharp et al. 1988), one might think, however, that even if a translational selection effect exists in mammals, it could be swamped out by the much stronger compositional constraints. We decided, therefore, to apply multivariate analysis to the coding sequences of Xenopus laevis, which are characterized by a much narrower GC3 distribution, with very scarce high GC3 values.


    Materials and Methods
 
Complete coding sequences (CDS) from X. laevis were retrieved from GenBank (release 114.0, December 1999) using ACNUC (Gouy et al. 1985). Redundancies were removed by CLEANUP (Grillo et al. 1996). The final data set included 1,303 genes. Codon usage, correspondence analysis (COA), GC3 (the frequency of codons ending in C or G, excluding Met, Trp, and stop codons), the "effective number of codons" (Nc) (Wright 1990), the relative synonymous codon usage (RSCU) (Sharp and Li 1986), and the codon adaptation index (CAI) (Sharp and Li 1987) were calculated using the program CodonW 1.3 (J. Peden; http://molbiol.ox.ac.uk/Win95.codonW.zip). Expression levels were estimated by retrieving from the TIGR database (http://www.tigr.org/tdb/xgi/) the ESTs that matched our CDS sample.
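
As an illustration of two of the indices defined above, the following Python sketch computes GC3 and RSCU for a single in-frame coding sequence under the standard genetic code. It is not the CodonW implementation: the function names (codons, gc3, rscu) and the choice to report an RSCU of 0 for every codon of an amino acid that is absent from the sequence are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into upper-case DNA codons."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3(seq):
    """Fraction of codons ending in G or C, excluding Met, Trp and stop codons."""
    used = [c for c in codons(seq) if CODE.get(c, "*") not in "MW*"]
    if not used:
        return float("nan")
    return sum(c[2] in "GC" for c in used) / len(used)


def rscu(seq):
    """RSCU: observed codon count divided by the count expected if all
    synonymous codons for that amino acid were used equally."""
    counts = Counter(c for c in codons(seq) if c in CODE)
    families = {}
    for codon, aa in CODE.items():
        if aa not in "MW*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Assumption: amino acids absent from the sequence get RSCU 0
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


if __name__ == "__main__":
    example = "ATGGCTGCTGCCGCGAAATAA"  # made-up sequence, not a Xenopus gene
    print(gc3(example), rscu(example)["GCT"])
```

The RSCU dictionary has 59 entries (64 codons minus ATG, TGG, and the three stop codons), which is the kind of per-gene vector a correspondence analysis of RSCU values would take as input.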


    Results and Discussion
 
The mean GC3 content of the nuclear genes of Xenopus is 48.8%, with a standard deviation of 11%. The distribution of GC3 is much narrower than that for human genes and does not show the abundant GC3-rich values of human sequences (see Bernardi 2000). Indeed, genes with GC3 values higher than 60% represent only 19% of all genes investigated in the Xenopus genome, whereas they represent 55% in the reference data set from the human genome. The Nc values (Nc is a measure of the bias in codon usage of the genes, and highly expressed sequences usually display lower values than lowly expressed sequences) show, however, a relatively broad range, from 30.8 to 61.0 (not shown). These features suggest that there is some variation in codon usage among the sequences. In order to understand the causes of this variation, we conducted a COA of the RSCU values for all of the genes available from Xenopus. The position of each sequence on the plane defined by the first two axes is displayed in figure 1.



 
Fig. 1.—Distribution of Xenopus genes on the plane defined by the two main axes of the correspondence analysis

 
The proportions of the total variance accounted for by these principal axes were 20.3% and 6.5%, respectively. The analysis therefore detects a single major source of variation, which is strongly correlated with the GC3 level of each gene (R = 0.98). This axis does not discriminate any other biological feature of the genes, such as gene size, housekeeping or tissue-specific pattern of expression, or intracellular localization of the protein product. Moreover, no correlation was found (R = 0.047; NS) with gene expression levels, that is, with the EST frequencies, even excluding from the analysis those genes with undetected expression levels. Similar results were previously reported for a human data set by Sharp et al. (1988).
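
For readers who wish to reproduce correlations of this kind, the sketch below computes a correlation coefficient between two per-gene variables (for example, axis coordinate and GC3) together with the associated t statistic on n - 2 degrees of freedom. It assumes that R denotes the product-moment (Pearson) coefficient, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom.
    Only defined for |r| < 1."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

With samples of the order of a thousand genes, even modest coefficients correspond to large t values, which is why weak correlations can still be highly significant.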

When genes were sorted according to their positions along the second axis, a significant correlation was found with the pyrimidine (Y) content of the genes at the third codon positions (R = -0.37; P < 0.0001). A striking result was, however, that constitutively highly expressed housekeeping genes, such as ribosomal proteins, histones, elongation factors, tubulins, and several enzymes from the intermediary metabolism, were clustered in the top 10% of the distribution. For example, there are 24 sequenced genes coding for ribosomal proteins, and 63% of them are placed in the first third of the distribution along axis 2. Furthermore, highly expressed, tissue-specific coding sequences, such as several actins, α- and β-globin, troponin, cytokeratin, etc., were also located in the same group. Regulatory sequences such as zinc finger proteins, oncogenes, homeobox genes, growth factors, etc. were located at the other 10% end of the distribution, which did not comprise any highly expressed housekeeping sequences. Therefore, it seems clear that axis 2 of COA is related to the expression level of each gene.

In order to confirm this interpretation and to obtain an approximate quantitative estimation of the expression levels of the 1,303 genes studied in this paper, we counted the number of matching ESTs for each sequence and their distribution along axis 2. The result of this analysis is shown in figure 2. In spite of the biased nature of the libraries and the fact that 45% of the genes did not match any EST, the general pattern clearly confirms that there is a gradient of expression from the left (where the majority of ribosomal proteins and other highly expressed housekeeping genes are placed) to the right of the distribution along axis 2. Furthermore, it should be stressed that a significant correlation holds between the position of each sequence on the second axis and the number of corresponding ESTs (R = -0.23; P < 0.0001). This quantitative analysis demonstrates that the sequences with the most negative values along axis 2 are more highly expressed. In other words, axis 2 does indeed discriminate expression levels.



 
Fig. 2.—Histogram of the distribution of expressed sequence tags (ESTs) along axis 2. The axis was divided into 10 parts, each of them containing an equal number of genes. In each part, the total number of ESTs was calculated
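
The binning behind this histogram can be sketched as follows: genes are sorted by their axis 2 coordinate, split into groups of (nearly) equal size, and the matching ESTs are summed within each group. The function name, its arguments, and the handling of gene counts that are not a multiple of the number of groups are assumptions made here; the example values are invented and are not data from this study.

```python
def est_totals_by_bin(axis2, est_counts, n_bins=10):
    """Total EST count in each of n_bins equal-sized groups of genes,
    with genes ordered from the most negative to the most positive
    axis 2 coordinate."""
    if len(axis2) != len(est_counts):
        raise ValueError("one EST count is needed per gene")
    order = sorted(range(len(axis2)), key=lambda i: axis2[i])
    n = len(order)
    totals = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder across the groups
        start = b * n // n_bins
        stop = (b + 1) * n // n_bins
        totals.append(sum(est_counts[i] for i in order[start:stop]))
    return totals


if __name__ == "__main__":
    coords = [0.3, -0.5, 0.1, -0.2]   # invented axis 2 coordinates
    ests = [1, 40, 5, 12]             # invented EST counts
    print(est_totals_by_bin(coords, ests, n_bins=2))
```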

 
Our next step was to calculate the CAI value for each gene taking as a reference only the genes coding for ribosomal proteins, which are very highly expressed. In this case, we found that the CAI values were strongly correlated with the positions of the genes along the second axis after removal of the ribosomal proteins (R = -0.62; P < 0.0001). A very similar result (R = -0.53; P < 0.0001) was obtained when the reference set comprised the genes that matched more than 20 ESTs. Finally, the CAI values are significantly correlated with the corresponding numbers of matching ESTs (R = 0.21; P < 0.0001). These results confirm that axis 2 is strongly correlated with the expression level of each sequence in Xenopus. Interestingly, the mean Nc value of the genes located at the 5% end of the second axis, where putatively highly expressed sequences were placed, was 46.5, while at the other end it was 52.2, as expected (see above).
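
The CAI calculation with a user-defined reference set can be illustrated with the minimal sketch below, which follows the general definition of Sharp and Li (1987): each codon receives a relative adaptiveness w equal to its count in the reference set divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of w over its codons. This is not the CodonW code; the function names are hypothetical, and replacing zero counts by 0.5 is a common convention adopted here as an assumption.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop codon)
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into upper-case DNA codons."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs):
    """w value of each codon, estimated from a reference set of genes
    (e.g. ribosomal protein genes). Met, Trp and stop codons are skipped."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in codons(seq) if c in CODE)
    families = {}
    for codon, aa in CODE.items():
        if aa not in "MW*":
            families.setdefault(aa, []).append(codon)
    w = {}
    for family in families.values():
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: unseen codons are given a count of 0.5
            w[c] = max(counts[c], 0.5) / top
    return w


def cai(seq, w):
    """Geometric mean of the w values of the codons of seq that have one."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Because the w values depend entirely on the reference set, changing that set (ribosomal protein genes versus genes matching more than 20 ESTs) changes every CAI value, which is why the two correlations reported above are computed separately.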

Similar results with regard to codon usage, using multivariate analyses, have been widely reported for unicellular species. They have usually been interpreted in terms of natural selection acting at the level of translation (for reviews, see Sharp and Matassi 1994; Sharp et al. 1995; Akashi and Eyre-Walker 1998). Remarkably, similar results were found not only among microorganisms, but also in multicellular species, such as C. elegans (Stenico, Lloyd, and Sharp 1994).

It should be stressed, however, that there are two main differences between the nematode results and the Xenopus results. These differences concern the source of variation which discriminates expression levels, and the amount of variation which is accounted for by the axis correlated with expression levels. Indeed, while in the nematode it is the first axis which is correlated with expression (and which, by definition, accounts for the largest share of the variance), in Xenopus the axis related to that feature is the second, which accounts for a lower proportion of the total variability in codon usage. Accordingly, the differences appear to be more quantitative than qualitative, and hence we conclude that in Xenopus, translational selection indeed influences synonymous codon usage, even if it does so to a lesser extent than in C. elegans.

Our final step was to identify the translationally preferred codons in Xenopus. The codon usage patterns of the sequences displaying the extreme values at both ends of the second axis (100 genes each) were compared, and the differences were tested with a χ² test. The result of this analysis (table 1) shows that there are 22 putative preferred codons corresponding to 17 amino acids (the only amino acid with no preferred codon is Tyr), and 50% of the codons are T-ending. Among stop codons, TAA is by far the most frequently used in highly expressed sequences, while an opposite trend was found for TAG. Remarkably, 82% of the preferred triplets are Y-ending, a point which explains the negative correlation previously described between the positions of sequences along the second axis and the corresponding Y levels in the third codon positions.
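
One way such a comparison could be set up is sketched below. For a given codon, a 2 x 2 table is built from its count and the count of its synonymous alternatives in the high-expression and low-expression groups, and a χ² statistic with one degree of freedom is computed. The exact table layout and the significance threshold used in this study are not stated in the text, so the layout, the function names, and the default alpha are assumptions.

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square statistic (1 df, no continuity correction) and P value
    for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a row or column is empty: no evidence either way
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def is_preferred(high_codon, high_other, low_codon, low_other, alpha=0.01):
    """True if the codon is significantly more frequent, relative to its
    synonyms, in the high-expression group. alpha is a hypothetical default."""
    stat, p = chi2_2x2(high_codon, high_other, low_codon, low_other)
    more_frequent = (high_codon * (low_codon + low_other)
                     > low_codon * (high_codon + high_other))
    return more_frequent and p < alpha
```

Testing the direction of the difference separately matters because the χ² statistic itself is symmetric: a codon that is significantly depleted among highly expressed genes yields the same statistic as one that is enriched to the same degree.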


 
Table 1 Putative Preferred Codons in Xenopus laevis

 
Some general rules emerge from the analysis of the preferred codons: (1) for all quartets (including those belonging to sextets), the T-ending codons are always preferred in highly expressed genes; among these codons, if there is a second favored triplet, it is C-ending; (2) the G-ending codons are the chosen ones for the purine-ending duets; (3) the duets of sextets are never significantly incremented among the highly expressed sequences; (4) the A-ending codons are never preferred; and (5) the NCG codons are always the least frequent, and their usage is further decreased among highly expressed genes.


    Conclusions
 
Codon usage in Xenopus, although determined mainly by compositional constraints, is influenced by translational selection. Indeed, a COA on RSCU values detects a trend, independent of GC3, that discriminates putatively highly expressed (mainly housekeeping, but also tissue-specific) genes from the other sequences. This is confirmed by the significant correlations found between the position of each gene along the second axis and both the number of matching ESTs and the CAI values.


    Acknowledgements
 
We thank the referees for their very useful suggestions. S.C. thanks the European Community for contract ERBFMRXCT980221 in the frame of the Training and Mobility of Researchers Programme.


    Footnotes
 
Adam Eyre-Walker, Reviewing Editor

1 Keywords: codon usage, isochores, translational selection, vertebrates, correspondence analysis

2 Address for correspondence and reprints: Giorgio Bernardi, Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121, Naples, Italy. bernardi@alpha.szn.it


    References
 

    Akashi H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.

    Akashi H. 1997. Codon bias evolution in Drosophila. Population genetics of mutation-selection drift. Gene 205:269-278.

    Akashi H., A. Eyre-Walker. 1998. Translational selection and molecular evolution. Curr. Opin. Genet. Dev. 8:688-693.

    Bennetzen J. L., B. D. Hall. 1982. Codon selection in yeast. J. Biol. Chem. 257:3026-3031.

    Bernardi G. 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.

    Bernardi G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.

    Bernardi G., G. Bernardi. 1985. Codon usage and genome composition. J. Mol. Evol. 22:363-365.

    Bernardi G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953-958.

    Bulmer M. 1988. Codon usage and intragenic position. J. Theor. Biol. 133:67-71.

    Bulmer M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907.

    Carulli J. P., D. E. Krane, D. L. Hartl, H. Ochman. 1993. Compositional heterogeneity and patterns of molecular evolution in the Drosophila genome. Genetics 134:837-845.

    Chiapello H., F. Lisacek, M. Caboche, A. Henaut. 1998. Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209:GC1-GC38.

    Cruveiller S., G. D'Onofrio, G. Bernardi. 2000. The compositional transition between the genomes of cold- and warm-blooded vertebrates: codon frequencies in orthologous genes. Gene 261:71-83.

    D'Onofrio G., D. Mouchiroud, B. Assani, C. Gautier, G. Bernardi. 1991. Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J. Mol. Evol. 32:504-510.

    Duret L., D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68-74.

    Fennoy S. L., J. Bailey-Serres. 1993. Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons. Nucleic Acids Res. 21:5294-5300.

    Gouy M., C. Gautier. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074.

    Gouy M., C. Gautier, M. Attimonelli, C. Lanave, G. Di Paola. 1985. ACNUC–a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1:167-172.

    Grantham R., C. Gautier, M. Gouy, R. Mercier, A. Pave. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:r49-r62.

    Grillo G., M. Attimonelli, S. Liuni, G. Pesole. 1996. CLEANUP: a fast computer programme for removing redundancies from nucleotide sequence databank. CABIOS 12:1-8.

    Ikemura T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389-409.

    Ikemura T. 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158:573-597.

    Ikemura T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13-34.

    Kanaya S., Y. Yamada, Y. Kudo, T. Ikemura. 1999. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143-155.

    Moriyama E. N., T. Gojobori. 1992. Rates of synonymous substitution and base composition of nuclear genes in Drosophila. Genetics 130:855-864.

    Moriyama E. N., J. R. Powell. 1997. Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45:378-391.

    Powell J. R., E. N. Moriyama. 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790.

    Sharp P. M., M. Averof, A. T. Lloyd, G. Matassi, J. F. Peden. 1995. DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:241-247.

    Sharp P. M., E. Cowe, D. G. Higgins, D. C. Shields, K. H. Wolfe, F. Wright. 1988. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16:8207-8211.

    Sharp P. M., W. H. Li. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38.

    Sharp P. M., W. H. Li. 1987. The codon adaptation index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281-1295.

    Sharp P. M., W. H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402.

    Sharp P. M., G. Matassi. 1994. Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4:851-860.

    Shields D. C., P. M. Sharp, D. G. Higgins, F. Wright. 1988. ‘Silent’ sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716.

    Stenico M., A. T. Lloyd, P. M. Sharp. 1994. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22:2437-2446.

    Wolfe K. H., P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.

    Wright F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23-29.

Accepted for publication May 11, 2001.