Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Naples, Italy;
Laboratorio de Organización y Evolución del Genoma, Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay;
Departamento de Genética, Facultad de Medicina, Montevideo, Uruguay
Abstract |
---|
Introduction |
---|
Indeed, in the unicellular organisms Escherichia coli and Saccharomyces cerevisiae, synonymous codon choices appear to be positively correlated with the relative abundances of tRNAs, with the correlation being very strong for highly expressed genes (Ikemura 1981, 1982, 1985; Bennetzen and Hall 1982; Gouy and Gautier 1982; Sharp and Li 1986; Bulmer 1988, 1991; Kanaya et al. 1999; for reviews, see Sharp and Matassi 1994; Sharp et al. 1995; Akashi and Eyre-Walker 1998).
In multicellular organisms, different patterns can be found. For example, in Caenorhabditis elegans and Drosophila melanogaster, which are characterized by extensive variation in codon usage, the factors governing the choices have been attributed to an equilibrium between mutational biases and translational selection (Shields et al. 1988; Sharp and Li 1989; Moriyama and Gojobori 1992; Carulli et al. 1993; Akashi 1994, 1997; Stenico, Lloyd, and Sharp 1994; Moriyama and Powell 1997; Powell and Moriyama 1997). Translational selection at silent sites has also been reported to be the main factor shaping codon usage in Zea mays (Fennoy and Bailey-Serres 1993) and Arabidopsis thaliana (Chiapello et al. 1998).
Compositionally compartmentalized genomes, like those of vertebrates and, in particular, those of warm-blooded vertebrates, show multiple codon usages. The compositional properties of those genomes and, more precisely, the compositional correlations existing between coding sequences (and their different codon positions) and isochores (see Bernardi [2000] for a review) affect codon usage. The situation is strikingly different for genes located in GC-poor and GC-rich isochores (Bernardi and Bernardi 1985; Bernardi et al. 1985; D'Onofrio et al. 1991; Cruveiller, D'Onofrio, and Bernardi 2000). This point is best illustrated by the example of α- and β-globin genes, which show very different codon usages because they are located in isochores with very different levels of GC, in spite of both being very highly expressed and at nearly equimolar amounts in the same cells (Bernardi et al. 1985).
Expectedly, therefore, when applied to mammalian sequences, multivariate statistical analysis reveals a single major trend that is strongly correlated with the GC level at third codon positions (GC3) of each gene. Moreover, the first axis does not discriminate aspects of gene function such as regulation during development, tissue specificity, constitutive expression, intracellular localization of the protein product, etc. (Sharp et al. 1988, 1995; Sharp and Matassi 1994).
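The GC3 measure referred to above can be made concrete with a minimal sketch. It assumes an in-frame coding sequence given as a plain string; the function name `gc3` is hypothetical, and whether the original analysis counted the stop codon is not stated in the text, so here every complete codon is counted.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; a trailing
    partial codon, if any, is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # length covered by complete codons
    third = cds[2:usable:3]            # every third base, starting at position 3
    if not third:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in third) / len(third)


# Toy sequence (not from the data set): codons ATG GCC AAA TAA,
# third positions G, C, A, A.
example_gc3 = gc3("ATGGCCAAATAA")
```

In this toy case two of the four third positions are G or C, so the function returns 0.5.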
Along another line, no correlation was found between the rate of synonymous substitutions (Ks) and either the expression level or the tissue specificity of genes in a mouse/rat comparison (Wolfe and Sharp 1993). The conclusion that expression levels do not influence the codon usage pattern in mammals was also drawn by analyzing expressed sequence tags (ESTs) in different tissues (Duret and Mouchiroud 2000).
Since GC3-rich genes represent roughly half of human genes (see Bernardi [2000] for a review), and since the multivariate analysis was carried out on human genes regardless of their GC3 levels (Sharp et al. 1988), one might think, however, that even if a translational selection effect exists in mammals, it could be swamped out by the much stronger compositional constraints. We decided, therefore, to apply multivariate analysis to the coding sequences of Xenopus laevis, which are characterized by a much narrower GC3 distribution, with very scarce high GC3 values.
Materials and Methods |
---|
Results and Discussion |
---|
When genes were sorted according to their positions along the second axis, a significant correlation was found with the pyrimidine (Y) content of the genes at the third codon positions (R = -0.37; P < 0.0001). A striking result was, however, that constitutively highly expressed housekeeping genes, such as ribosomal proteins, histones, elongation factors, tubulins, and several enzymes from the intermediary metabolism, were clustered in the top 10% of the distribution. For example, there are 24 sequenced genes coding for ribosomal proteins, and 63% of them are placed in the first third of the distribution along axis 2. Furthermore, highly expressed, tissue-specific coding sequences, such as several actins, α- and β-globin, troponin, cytokeratin, etc., were also located in the same group. Regulatory sequences such as zinc finger proteins, oncogenes, homeobox genes, growth factors, etc. were located at the other 10% end of the distribution, which did not comprise any highly expressed housekeeping sequences. Therefore, it seems clear that axis 2 of COA is related to the expression level of each gene.
In order to confirm this interpretation and to obtain an approximate quantitative estimation of the expression levels of the 1,303 genes studied in this paper, we counted the number of matching ESTs for each sequence and examined their distribution along axis 2. The result of this analysis is shown in figure 2. In spite of the biased nature of the libraries and the fact that 45% of the genes did not match any EST, the general pattern clearly confirms that there is a gradient of expression from the left (where the majority of ribosomal proteins and other highly expressed housekeeping genes are placed) to the right of the distribution along axis 2. Furthermore, it should be stressed that a significant correlation holds between the position of each sequence on the second axis and the number of corresponding ESTs (R = -0.23; P < 0.0001). This quantitative analysis demonstrates that the sequences with the most negative values along axis 2 are more highly expressed. In other words, axis 2 does indeed discriminate expression levels.
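The kind of correlation reported above can be sketched as follows. The text does not state which coefficient was used, so a plain Pearson coefficient is assumed here; the function name `pearson_r` and the toy values are hypothetical and are not taken from the data set.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: axis-2 coordinate and matching-EST count per gene.
axis2_position = [-0.4, -0.2, 0.0, 0.1, 0.3]
est_count = [12, 7, 3, 0, 1]
r_toy = pearson_r(axis2_position, est_count)
```

With these toy values the coefficient is negative, mirroring the direction of the reported trend: genes with more negative axis-2 coordinates match more ESTs.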
Similar results with regard to codon usage, using multivariate analyses, have been widely reported for unicellular species. They have usually been interpreted in terms of natural selection acting at the level of translation (for reviews, see Sharp and Matassi 1994; Sharp et al. 1995; Akashi and Eyre-Walker 1998). Remarkably, similar results were found not only among microorganisms, but also in multicellular species, such as C. elegans (Stenico, Lloyd, and Sharp 1994).
It should be stressed, however, that there are two main differences between the nematode results and the Xenopus results. These differences concern the source of variation which discriminates expression levels, and the amount of variation which is accounted for by the axis correlated with expression levels. Indeed, while in the nematode it is the first axis which is correlated with expression (and which, by definition, accounts for the majority of the variance), in Xenopus the axis related to that feature is the second, which accounts for a lower proportion of the total variability in codon usage. Accordingly, the differences appear to be more quantitative than qualitative, and hence we conclude that in Xenopus, translational selection indeed influences synonymous codon usage, even if it does so to a lesser extent than in C. elegans.
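To make the notions of "axis" and "proportion of variability accounted for" concrete, the following is a minimal sketch of correspondence analysis on a genes-by-codons count table, computed through a singular value decomposition with NumPy. It is a textbook formulation, not necessarily the procedure used in this study: the exact input (raw codon counts or relative frequencies) and the software are not given in the text above, and the function name and toy counts are hypothetical.

```python
import numpy as np


def correspondence_analysis(counts):
    """Return (row coordinates, share of inertia per axis) for a count table.

    Assumption: every row (gene) and every column (codon) has a
    non-zero total.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (genes)
    c = P.sum(axis=0)                    # column masses (codons)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sigma) / np.sqrt(r)[:, None]   # principal coordinates
    inertia = sigma ** 2                 # variability carried by each axis
    return row_coords, inertia / inertia.sum()


# Hypothetical toy table: 4 genes (rows) x 3 synonymous codons (columns).
toy_counts = [[10, 2, 1],
              [8, 3, 2],
              [1, 9, 4],
              [2, 7, 6]]
coords, share = correspondence_analysis(toy_counts)
```

Axes come out ordered by decreasing share of inertia, so the first column of `coords` is the first axis and the second column is the second axis; `share` gives the fraction of total variability each one accounts for.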
Our final step was to identify the translationally preferred codons in Xenopus. The codon usage patterns of the sequences displaying the extreme values at both ends of the second axis (100 genes each) were compared, and the differences were tested with a χ² test. The result of this analysis (table 1) shows that there are 22 putative preferred codons corresponding to 17 amino acids (the only amino acid with no preferred codon is Tyr), and 50% of the codons are T-ending. Among stop codons, TAA is by far the most frequently used in highly expressed sequences, while an opposite trend was found for TAG. Remarkably, 82% of the preferred triplets are Y-ending, a point which explains the negative correlation previously described between the positions of sequences along the second axis and the corresponding Y levels in the third codon positions.
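A comparison of this kind can be sketched for a single codon as a 2 x 2 contingency test. The layout assumed here (occurrences of the codon versus occurrences of its synonymous alternatives, in the two groups of genes) is a common choice but is not spelled out in the text; the function name and the counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p-value). For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one codon of a two-codon amino acid:
#                         this codon   synonymous alternative
# high-expression group      120              80
# low-expression group        60             140
stat_toy, p_toy = chi2_2x2(120, 80, 60, 140)
```

Under these assumptions a codon would be called putatively preferred when it is significantly more frequent in the highly expressed group; repeating the test for every codon calls for some caution about multiple comparisons.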
Conclusions |
---|
Acknowledgements |
---|
Footnotes |
---|
1 Keywords: codon usage, isochores, translational selection, vertebrates, correspondence analysis
2 Address for correspondence and reprints: Giorgio Bernardi, Laboratorio di Evoluzione Molecolare, Stazione Zoologica Anton Dohrn, Villa Comunale, I-80121, Naples, Italy. bernardi{at}alpha.szn.it
References |
---|
Akashi H., 1994 Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935
Akashi H., 1997 Codon bias evolution in Drosophila. Population genetics of mutation-selection drift. Gene 205:269-278
Akashi H., A. Eyre-Walker, 1998 Translational selection and molecular evolution. Curr. Opin. Genet. Dev. 8:688-693
Bennetzen J. L., B. D. Hall, 1982 Codon selection in yeast. J. Biol. Chem. 257:3026-3031
Bernardi G., 1995 The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476
Bernardi G., 2000 Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17
Bernardi G., G. Bernardi, 1985 Codon usage and genome composition. J. Mol. Evol. 22:363-365
Bernardi G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, F. Rodier, 1985 The mosaic genome of warm-blooded vertebrates. Science 228:953-958
Bulmer M., 1988 Codon usage and intragenic position. J. Theor. Biol. 133:67-71
Bulmer M., 1991 The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897-907
Carulli J. P., D. E. Krane, D. L. Hartl, H. Ochman, 1993 Compositional heterogeneity and patterns of molecular evolution in the Drosophila genome. Genetics 134:837-845
Chiapello H., F. Lisacek, M. Caboche, A. Henaut, 1998 Codon usage and gene function are related in sequences of Arabidopsis thaliana. Gene 209:GC1-GC38
Cruveiller S., G. D'Onofrio, G. Bernardi, 2000 The compositional transition between the genomes of cold- and warm-blooded vertebrates: codon frequencies in orthologous genes. Gene 261:71-83
D'Onofrio G., D. Mouchiroud, B. Assani, C. Gautier, G. Bernardi, 1991 Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J. Mol. Evol. 32:504-510
Duret L., D. Mouchiroud, 2000 Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68-74
Fennoy S. L., J. Bailey-Serres, 1993 Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons. Nucleic Acids Res. 21:5294-5300
Gouy M., C. Gautier, 1982 Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074
Gouy M., C. Gautier, M. Attimonelli, C. Lanave, G. Di Paola, 1985 ACNUC: a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1:167-172
Grantham R., C. Gautier, M. Gouy, R. Mercier, A. Pave, 1980 Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:r49-r62
Grillo G., M. Attimonelli, S. Liuni, G. Pesole, 1996 CLEANUP: a fast computer programme for removing redundancies from nucleotide sequence databank. CABIOS 12:1-8
Ikemura T., 1981 Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J. Mol. Biol. 151:389-409
Ikemura T., 1982 Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158:573-597
Ikemura T., 1985 Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13-34
Kanaya S., Y. Yamada, Y. Kudo, T. Ikemura, 1999 Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143-155
Moriyama E. N., T. Gojobori, 1992 Rates of synonymous substitution and base composition of nuclear genes in Drosophila. Genetics 130:855-864
Moriyama E. N., J. R. Powell, 1997 Synonymous substitution rates in Drosophila: mitochondrial versus nuclear genes. J. Mol. Evol. 45:378-391
Powell J. R., E. N. Moriyama, 1997 Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790
Sharp P. M., M. Averof, A. T. Lloyd, G. Matassi, J. F. Peden, 1995 DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:241-247
Sharp P. M., E. Cowe, D. G. Higgins, D. C. Shields, K. H. Wolfe, F. Wright, 1988 Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16:8207-8211
Sharp P. M., W. H. Li, 1986 An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38
Sharp P. M., W. H. Li, 1987 The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281-1295
Sharp P. M., W. H. Li, 1989 On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398-402
Sharp P. M., G. Matassi, 1994 Codon usage and genome evolution. Curr. Opin. Genet. Dev. 4:851-860
Shields D. C., P. M. Sharp, D. G. Higgins, F. Wright, 1988 Silent sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704-716
Stenico M., A. T. Lloyd, P. M. Sharp, 1994 Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 22:2437-2446
Wolfe K. H., P. M. Sharp, 1993 Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456
Wright F., 1990 The 'effective number of codons' used in a gene. Gene 87:23-29