* Unité GGB, Institut Pasteur, Paris, France
Atelier de BioInformatique, Université Pierre et Marie Curie, Paris, France
HKU-Pasteur Research Centre, Hong Kong
Correspondence: E-mail: erocha{at}abi.snv.jussieu.fr.
Abstract
Key Words: protein evolution, substitution rates, eukaryotes, essentiality, expression levels, functional categories
Introduction
Because high expression levels bias codon usage toward the use of optimal codons under exponential growth conditions, the calculation of some measure of the codon usage bias in proteins has been extensively used to account for protein expression levels in bacteria. Using such measures, recently coupled to transcriptome data analysis, higher expression levels were found to correlate with lower rates of protein evolution in Escherichia coli (Sharp 1991) and Saccharomyces cerevisiae (Pal, Papp, and Hurst 2001). The functional category of a protein may also constrain its rate of evolution, if only because different categories imply different physicochemical constraints and different cellular localizations. For example, "housekeeping" functions are under strong selection for optimizing their function in the normal habitat of the cell (usually making them fast and accurate under exponential growth conditions), whereas outer membrane proteins are often under selection for diversification (to explore a variety of substrates or to evade the immune system of hosts) (Finlay and Falkow 1997). Recently, it was found that genes presenting a large codon usage bias in both E. coli and Bacillus subtilis code for a set of metabolically less costly amino acids (Akashi and Gojobori 2002). This was interpreted as advantageous in proteins translated at a high rate because it lowers the metabolic cost of translation. As a consequence, such proteins should be more evolutionarily constrained and thus evolve more slowly.
In parallel to these studies, several conflicting reports have tried to take into account the role of essentiality in the rate of protein evolution. Early work proposed that proteins subject to the same type and level of functional constraints, but differing in terms of dispensability, should evolve at different rates because purifying selection would be more efficient on essential proteins (Wilson, Carlsson, and White 1977). However, a seminal analysis exploring a large set of mouse knock-out mutants showed that, when the tissue specificity of a gene is taken into account, there is no significant difference between the rates of evolution of essential and nonessential genes (Hurst and Smith 1999). Hirsh and Fraser (2001) further observed that essential genes do not evolve more slowly than nonessential genes in yeast. Quantification of the decrease in fitness associated with the inactivation of a gene is difficult because few systematic experimental data are available and because genes have different fitness effects in different environmental conditions. However, when dispensability was defined according to the loss of fitness associated with gene loss under exponential growth conditions, essentiality was reported to show a small but significant correlation with the rate of protein evolution (Hirsh and Fraser 2001).
Recently, two reports have further investigated the yeast data, strikingly reaching opposite conclusions. Pal, Papp, and Hurst (2003) analyzed protein substitution rates using three close relatives of S. cerevisiae and fitness data from whole-genome transcriptome data. The regression of dispensability on amino acid substitution rates, when controlled for expression level, was nonsignificant, suggesting that expression level is responsible for the small effect of dispensability on protein substitution rates. In contrast, Hirsh and Fraser (Hirsh 2003), using a different methodology to identify orthologs and a different set of transcriptome data, confirmed their previous conclusions, showing a significant correlation between dispensability and the nonsynonymous substitution rates, even when controlled for expression levels. In bacteria, the analysis of E. coli genes has shown that essential genes are more conserved than the bulk of the genes (Jordan et al. 2002). Further, when these authors inferred essentiality by homology in Helicobacter pylori and Neisseria meningitidis, the differences were also found to be significant, although smaller. Taken together, these observations suggest that essentiality is a determinant of amino acid substitution rates in bacterial proteins but not in higher eukaryotes. Yeast would be an intermediate case, where essentiality is not important but differential dispensability might be.
The differences between bacteria and yeast and the contradictory results obtained for the yeast data prompted us to revisit the determinants of amino acid substitution rates of bacterial proteins, using newly available data on essentiality and including all variables previously considered as separate factors. This was done in a multivariate analysis. The definition of the essential character of a gene is contingent on the set of experimental procedures used to determine it. Unfortunately, few genomes have been fully characterized in relation to essentiality: S. cerevisiae (Giaever et al. 2002), B. subtilis (Kobayashi et al. 2003), and C. elegans (Kamath et al. 2003). In the PEC database, information about E. coli gene essentiality has been extracted from the literature. This compilation is most useful, but, resting on evidence based on highly variable experimental setups, it is not immune to the biases created by the lack of coordination of experiments developed with E. coli, especially because the large majority of these experiments were not specifically designed for uncovering lethal phenotypes. In both bacteria, the level of gene expression can be inferred from the bias resulting in optimization of codon usage for optimal growth under fast growth conditions (Ikemura 1981; Sharp and Li 1986; Andersson and Kurland 1990). Using the complete set of data available for gene inactivation in B. subtilis and the compilation available for E. coli, we use the complete sequences of related genomes to analyze the roles of the different factors on protein evolution, in particular with respect to essentiality and expressiveness.
Materials and Methods
Functional Classification
The information on the functional classification of genes in B. subtilis and E. coli K12 was taken from the corresponding genome sequencing papers (Blattner et al. 1997; Kunst et al. 1997). Proteins were classed into "envelope," "metabolism," "information," and "others & UFO (unknown function ORF)." In B. subtilis, the class "envelope" included every category under "cell envelope and cellular processes," excluding "sporulation," "germination," and "competence." The class "metabolism" included everything under the category "intermediary metabolism." The class "information" included everything under the category "information pathways." The remaining elements and the unknown function CDSs (UFO) were classified as "others & UFO." We made small changes to the categories of the E. coli sequencing paper to render them as comparable with the ones of B. subtilis as possible. Thus, "metabolism" and "energy metabolism" were merged; "cell structure," "membrane proteins," and "transport proteins" were merged into "envelope"; and the "information" class was built from merging of "replication," "transcription," and "translation." The remaining elements and the UFO were classified as "others & UFO."
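The merging of E. coli categories described above can be written as a lookup table. This is a minimal sketch: the category strings are those quoted in the text, while the names `ECOLI_TO_CLASS` and `functional_class` are hypothetical and introduced only for illustration.

```python
# Hypothetical helper: maps an E. coli category label (as quoted in the text)
# to one of the four classes used in the analysis.
ECOLI_TO_CLASS = {
    "metabolism": "metabolism",
    "energy metabolism": "metabolism",
    "cell structure": "envelope",
    "membrane proteins": "envelope",
    "transport proteins": "envelope",
    "replication": "information",
    "transcription": "information",
    "translation": "information",
}


def functional_class(category):
    """Return the merged class; anything not listed falls into 'others & UFO'."""
    return ECOLI_TO_CLASS.get(category.strip().lower(), "others & UFO")
```

Any category absent from the table, including genes of unknown function, defaults to "others & UFO", which mirrors the catch-all class of the text.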
Phylogenetic Analysis
It is known from the literature that among the three enterobacteria, E. coli and Salmonella are monophyletic relative to Yersinia (Neidhardt et al. 1996). Because we failed to find in the literature the phylogenetic relation between the two Bacillus sp. and O. iheyensis, we derived it using the translation elongation factor EFTu as a phylogenetic marker. We used S. aureus as an outgroup to be able to root the tree and precisely determine which pair is monophyletic relative to the third genome. The EFTu protein sequences were aligned and then back-translated into DNA to build a matrix of distances using maximum likelihood under the HKY85 model (Hasegawa, Kishino, and Yano 1985) in Tree-Puzzle (Schmidt et al. 2002), using the Gamma correction. A tree was then built using the distance matrix and the BIONJ program (Gascuel 1997). The robustness of branches was tested by 1,000 bootstrap experiments, using SEQBOOT and CONSENSE, from the PHYLIP package (Felsenstein 1993). This analysis indicated with large confidence (999/1,000 bootstraps) that the two Bacillus sp. are monophyletic relative to O. iheyensis. The two-way comparison of similarity between all orthologs of B. subtilis with the two other genomes confirmed the topology of the tree, because orthologs in B. subtilis and B. halodurans show systematically higher similarity between themselves than with O. iheyensis (paired t-test, P < 0.001).
Analysis of Orthology
Orthologs were identified as reciprocal best hits (Tatusov and Koonin 1997) (using a global alignment where the gaps on the edges of the largest sequence are ignored) with at least 50% similarity in amino acid sequence and less than 20% difference in protein length. We identified the orthologs between every pair of each of the two triplets (i.e., between each pair of the three Bacillus/Oceanobacillus genomes and each pair of the E. coli/Salmonella/Yersinia genomes). Then, in each triplet, we rejected all orthologs that were not simultaneously present in the three genomes or that gave different correspondences in different comparisons (e.g., the ortholog resulting from the comparison between B. subtilis with B. halodurans and B. subtilis with O. iheyensis was not the same as the one obtained by the comparison of B. halodurans with O. iheyensis).
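The reciprocal-best-hit procedure and the triplet consistency check can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the input format (a dictionary of pairwise hits with precomputed similarity and lengths), the function names, and the choice to express the length difference relative to the longer protein are all hypothetical.

```python
def best_hits(hits, min_similarity=50.0, max_length_diff=0.20):
    """Best subject per query among hits passing both thresholds.

    hits maps (query, subject) -> (similarity_percent, query_len, subject_len).
    Assumption: length difference is taken relative to the longer protein.
    """
    best = {}
    for (query, subject), (sim, len_q, len_s) in hits.items():
        length_diff = abs(len_q - len_s) / max(len_q, len_s)
        if sim < min_similarity or length_diff >= max_length_diff:
            continue
        if query not in best or sim > best[query][1]:
            best[query] = (subject, sim)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if backward.get(b) == a}


def consistent_triplets(ab, ac, bc):
    """Keep triplets present in all three genomes with matching correspondences."""
    triplets = []
    for a, b in ab.items():
        c = ac.get(a)
        if c is not None and bc.get(b) == c:
            triplets.append((a, b, c))
    return triplets
```

A triplet is kept only when the ortholog of `a` in genome C is the same gene as the ortholog of `b` in genome C, which is the consistency requirement stated in the text.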
Because we know the phylogenetic tree describing the evolutionary history of these bacteria, we made a further analysis to remove potential false orthologs that could arise from horizontal transfer or differential gene deletion. Consider the three genomes A, B, and C, where A and B are monophyletic (i.e., B. subtilis and B. halodurans in one group and E. coli and Salmonella in the other group). Consider $\alpha$, $\beta$, and $\gamma$ a triplet of orthologs of these genomes. One would expect the similarity between orthologs in the three genomes ($S_{\alpha,\beta}$, $S_{\alpha,\gamma}$, $S_{\beta,\gamma}$) to obey the relationship $S_{\alpha,\beta} > S_{\alpha,\gamma} \approx S_{\beta,\gamma}$, and we use this information to further filter our data. We allow for a small interval of tolerance (5%) in the first inequality, and thus we eliminate all triplets that violate it by more than this tolerance.
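A filter of this kind might look like the sketch below. The exact elimination criterion is not reproduced here, so the test used (the similarity within the monophyletic pair must reach at least 95% of the larger of the two similarities to the third genome) is an assumption made for illustration, as is the function name.

```python
def passes_phylogeny_filter(s_ab, s_ac, s_bc, tolerance=0.05):
    """Check the expected ordering of similarities for a triplet of orthologs.

    s_ab is the similarity within the monophyletic pair; s_ac and s_bc are the
    similarities to the third genome. Assumed criterion (not from the source):
    s_ab may fall short of max(s_ac, s_bc) by at most `tolerance`, relatively.
    """
    return s_ab >= (1.0 - tolerance) * max(s_ac, s_bc)
```

Triplets for which the function returns `False` would be the candidates for horizontal transfer or differential gene loss that the text describes removing.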
Substitution Rates
The rates of synonymous (dS and Ks) and nonsynonymous (dN and Ka) substitutions were computed following Yang's definition (Yang and Nielsen 2000) (dN and dS) using PAML and following Li's definitions (Li 1993) (Ka and Ks) using Jadis (Gonçalves et al. 1999). All results presented in the article refer to dN and dS. The values of Ka and Ks were only used for verification.
CAI Calculations
Codon adaptation index (CAI) values were computed using the EMBOSS package (http://www.uk.embnet.org/Software/EMBOSS). The reference values of codon usage in highly expressed genes were computed using the ribosomal proteins as markers of the translation machinery that is the largest processing machinery under exponential growth conditions. When possible, we preferred to take into consideration the quantitative measure of CAI, but when a qualitative variable seemed useful, we classed genes as highly expressed and nonhighly expressed. A gene was regarded as potentially highly expressed if its CAI was among the 20% highest values of the genome. Variations around the value of 20% (from 10% to 30%) did not significantly alter the results (data not shown).
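To make the calculation concrete, here is a minimal sketch of a codon adaptation index in the usual form (geometric mean of relative adaptiveness values derived from a reference set), followed by the top-20% classification. It is not the EMBOSS implementation: the function names are hypothetical, and the pseudocount of 0.5 for codons absent from the reference set is an assumption.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """w for each codon: its count divided by the count of the most used synonym."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:  # skip stops, Met and Trp
            continue
        top = max(counts[c] for c in family)
        if top == 0:  # amino acid absent from the reference set
            continue
        for c in family:
            w[c] = max(counts[c], 0.5) / top  # 0.5 pseudocount is an assumption
    return w


def cai(gene, w):
    """Geometric mean of w over the codons of a gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


def highly_expressed(cai_by_gene, top_fraction=0.20):
    """Genes whose CAI is among the highest `top_fraction` of the genome."""
    ranked = sorted(cai_by_gene, key=cai_by_gene.get, reverse=True)
    return set(ranked[:int(len(ranked) * top_fraction)])
```

In the setting described above, `reference_genes` would hold the ribosomal protein genes, and `top_fraction` could be varied between 0.10 and 0.30 to repeat the robustness check.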
Metabolic Cost of Amino Acids
The metabolic cost of each amino acid was computed using publicly available data (Akashi and Gojobori 2002). This cost takes into account the metabolic pathways leading to amino acid biosynthesis both in E. coli and B. subtilis. Energetic costs are converted to a single currency of high-energy phosphate bonds (~P), based on a proportion of two ~P for one H (in NADH, NADPH, and FADH2). Each protein was then associated with an average amino acid cost per residue.
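The conversion and the per-protein average can be illustrated with a short sketch. The function names are hypothetical, and the cost table used in the usage note below holds placeholder numbers chosen for easy arithmetic, not the published values.

```python
def cost_in_phosphate_bonds(n_phosphate, n_hydrogen):
    """Single-currency cost: each H (NADH, NADPH, FADH2) counts as two ~P."""
    return n_phosphate + 2 * n_hydrogen


def average_cost_per_residue(protein, cost_by_residue):
    """Mean cost over the residues of a protein that appear in the cost table."""
    costs = [cost_by_residue[aa] for aa in protein.upper() if aa in cost_by_residue]
    if not costs:
        raise ValueError("no residue of the protein is present in the cost table")
    return sum(costs) / len(costs)
```

With a placeholder table such as `{"G": 10.0, "W": 70.0}`, the peptide `"GGW"` averages (10 + 10 + 70) / 3 = 30 ~P per residue.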
Given the sequence similarities between the genomes of the two groups, we have computed dN and Ka values between B. subtilis and B. halodurans (henceforth named the B. subtilis group) and dN, Ka, and dS and Ks values between E. coli K12 and S. enterica serovar Typhimurium (henceforth named the E. coli group). The orthologs of the first group were classed into functional categories, CAI, and essentiality according to the information available for B. subtilis. The orthologs of the second group were classed according to the information available for E. coli. Not all E. coli genes are yet classed as essential or nonessential. To simplify the analysis, we removed all these genes. Thus, the B. subtilis data set includes 1,258 orthologs, and the E. coli data set includes 1,364 orthologs.
Results
Then we made forward and backward stepwise regressions to inspect whether all variables provide significant information. Both methods led to similar results, and thus only the results of the forward stepwise regression method will be shown here. In a forward stepwise regression, one begins with the smallest possible regression model, the one including the variable showing the highest R2. The model is then built up by successively adding the most significant variables (Draper and Smith 1998). This analysis confirmed that functional categories are better analyzed if pooled together into two groups, one with "information" and "metabolism" and the other with "envelope" and "others & UFO" (data not shown). The other combinations of categories provided no significant contributions to the regression (even at P < 0.2). As a consequence, functional categories were pooled into those two superclasses. We tested whether interaction terms might be significant. In all cases, these analyses indicated that such terms provided no significant contribution to the regression fit (data not shown). At this stage we could define the model to be tested. It has the form:
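From the variables named in the text (log-transformed dN, CAI, essentiality, the two pooled functional superclasses, and the average metabolic cost of amino acids), a model of this kind can be written, as a sketch, in the following way. The symbols $\beta_i$, $E$, $F$, $C$, and $\varepsilon$ are notational assumptions, with $E$ and $F$ standing for indicator variables of essentiality and functional superclass.

```latex
\log(d_N) = \beta_0 + \beta_1\,\mathrm{CAI} + \beta_2\,E + \beta_3\,F + \beta_4\,C + \varepsilon
```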
Multiple Regression of the E. coli Data
As for the Bacillus group, the preliminary analysis of the E. coli data suggested a log transformation of dN. We also merged together the four functional classes into only two categories, one related with "information" and "metabolism" and the other including the remaining genes. The simple linear regressions, the stepwise regression, and the full multiple regression all revealed that E. coli shares the major characteristics of Bacillus. In the stepwise regression, the introduction of CAI leads to 94% of the total R2 (which is 0.307, similar to Bacillus). The major difference between this analysis and the one of B. subtilis concerns the smaller role of the variable functional category, which only ranks fourth (R2 of 0.033 versus 0.084 in the Bacillus). This may be a consequence of two factors. First, the classification schemes used in E. coli are, despite our best efforts, slightly different from the ones of B. subtilis. Second, the genes that are not classified regarding their essentiality are unevenly distributed among functional classes (fig. 3).
The simple regressions, used to test whether the larger importance of CAI indicated by the stepwise regression is merely an effect of the correlation between the variables, confirm the strong correlation of CAI with the nonsynonymous substitution rate. The regression of CAI shows an R2 of 0.287, whereas the regressions of the other variables never exceed an R2 of 0.08 (table 1). The metabolic cost of amino acids is the third variable to enter the stepwise regression, to which it contributes very little (1%). The simple regression of this variable with dN shows an R2 of 0.002, indicating that the metabolic cost of amino acids explains at most 0.2% of the total variance. Finally, the regression of the full model shows the same trends as that of Bacillus (table 1).
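The logic of a forward stepwise regression, and of reading the share of R2 contributed by each variable, can be sketched in plain Python. This is a simplified illustration: variables enter while they raise R2 by at least `min_gain`, which stands in for the significance test of a real stepwise procedure, and all names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R2 of the least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(predictors, y, min_gain=0.01):
    """Add, one at a time, the variable giving the largest R2; stop on small gains."""
    selected, history, current = [], [], 0.0
    remaining = dict(predictors)
    while remaining:
        trial = {name: r_squared([predictors[s] for s in selected] + [col], y)
                 for name, col in remaining.items()}
        best = max(trial, key=trial.get)
        if trial[best] - current < min_gain:
            break
        current = trial[best]
        selected.append(best)
        history.append((best, current))
        del remaining[best]
    return history
```

The returned history lists each variable with the cumulative R2 at its entry, so the share attributed to the first variable is its R2 divided by the final R2, which is how a figure such as 0.287 / 0.307, about 94%, is obtained.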
Including dS in the Analysis of the E. coli Group
Although the comparison of the E. coli group is hampered by the lack of an exhaustive study of essential phenotypes, the divergence of these genomes allows for the analysis of the rates of synonymous substitution (dS) between E. coli and S. enterica (Sharp 1991; Berg 1999). Thus, our first concern was to quantify the correlations between CAI, dN, dS (both properly transformed), and the metabolic cost of amino acids. CAI shows similar values of pairwise correlation with log dN and log dS (table 2). This suggests that the correlation between dN and CAI is not simply the result of the correlation of both variables with dS. We thus computed the partial correlations between the variables; that is, the correlation between each pair of variables while holding constant the value of the other variables (Zar 1996). All partial correlations between CAI, log dN, and log dS are statistically significant, confirming that dN correlates with CAI independently of dS (table 2). The metabolic cost of amino acids correlates poorly with both dS and dN and negatively with CAI.
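The idea of a partial correlation can be illustrated with the first-order case, where a single variable is held constant. This is a simplification of the analysis described above, which controls for all remaining variables at once; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y while holding z constant (first order)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

In the context of the text, `x`, `y`, and `z` would be CAI, log dN, and log dS: if the correlation between CAI and log dN were entirely due to dS, `partial_correlation` would be close to zero.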
The regressions show that the introduction of dS renders essentiality uninformative, whereas CAI still contributes significantly to the model (table 1). Thus, although there is a significant correlation between dN and dS, the introduction of dS in the model confirms the main conclusion of our work: that essentiality, functional categories, and metabolic cost are, at most, minor determinants of the rate of protein evolution. Also, it shows that CAI and dN correlate in an important way and independently of dS.
Discussion
Genes with high codon usage biases tend to use metabolically less expensive amino acids (Akashi and Gojobori 2002). The simple regressions of this cost on dN only explain 4.8% (B. subtilis) and 0.2% (E. coli) of the total variance. Such contribution becomes insignificant when the other variables are included. Therefore, the putative selection pressure for metabolic efficiency does not change the rate of nonsynonymous substitutions. This is also true for the rate of synonymous substitutions in E. coli because the partial correlation between the two variables is very low (<0.02). This is unexpected, because a selective pressure acting on the usage of less expensive amino acids should lead to a smaller probability of fixation of synonymous and nonsynonymous substitutions at these sites. Further work will be necessary to establish whether such selection pressure exists.
Expression Levels and the Rate of Protein Evolution
CAI is a good measure for the level of gene expression under exponential growth in fast-growing organisms, as evidenced by the correlation of the frequency of optimal codons with the concentration of the cognate tRNA (Dong, Nilsson, and Kurland 1996) and of the values of CAI with the corresponding mRNA (Coghlan and Wolfe 2000) and protein (Futcher et al. 1999) concentrations. In fact, codon usage bias relates intimately with high expression levels because genes are under selection pressure for the use, for a given amino acid, of the codon(s) corresponding to the most abundant tRNAs (Ikemura 1981). As a result, higher codon usage bias resulting from the selection of optimal codons leads to a higher number of deleterious synonymous substitutions and thus to lower synonymous substitution rates (Sharp 1991). Translation accuracy may also play a role in the establishment of codon usage biases (Akashi 1994). Under the accuracy hypothesis, codon usage also reflects selection for the use of codons that favor lower levels of mistranslation. Proteins for which a larger fraction of substitutions lead to a significant decrease of function (thus lower nonsynonymous substitution rates) would also strongly select "accurate" codons (thus lower synonymous substitution rates and higher CAI). This could justify the correlation found between the rates of synonymous and nonsynonymous substitutions (Li, Wu, and Luo 1985; Mouchiroud, Gautier, and Bernardi 1995).
Lobry and Gautier (1994) suggested a different cause for the correlation of nonsynonymous substitution rates with CAI. They found that highly expressed genes have an amino acid composition that matches the most abundant tRNAs. Under these circumstances, they suggest that highly expressed genes reduce the diversity of amino acid choices to increase translation efficiency. The major difference between these hypotheses lies in the direction of the relationship between the rate of protein evolution and codon usage bias. The accuracy hypothesis indicates that pressure for protein sequence conservation leads to codon usage bias, whereas the latter hypothesis suggests the inverse.
Double mutations, because they typically involve a synonymous and a nonsynonymous substitution (Averof et al. 2000), could lead to a correlation between dN, dS, and CAI. However, several recent works indicate a negligible role for double mutations in genomes (Smith and Hurst 1999; Smith, Webster, and Ellegren 2003). Further, the removal of double substitutions does not eliminate the correlation between synonymous and nonsynonymous substitution rates (Mouchiroud, Gautier, and Bernardi 1995) nor between nonsynonymous substitution rates and the expression level (Pal, Papp, and Hurst 2001). Finally, our results indicate that CAI correlates with nonsynonymous substitution rates almost as strongly as with synonymous substitution rates. This suggests a direct link between expression level and the rate of protein evolution.
One could also suppose that protein expression levels relate directly to the rate of protein evolution because substitutions will be more deleterious in proteins that have a larger impact on fitness and that such an impact is likely to correlate with the protein's expression level. As an example, let us consider two proteins with metabolic nonessential functions and very different expression levels, for which one deleterious mutation renders their biochemical function 5% less efficient. The impact of this efficiency loss on the cell's fitness will be the product of the loss of their biochemical efficiency by the relative weight of the corresponding reactions in the cell metabolism (Hartl, Dykhuizen, and Dean 1985). Under these circumstances, one would expect the same relative efficiency loss to have a larger impact on the cell's metabolism, and thus on the cell fitness, for highly expressed genes. Naturally, many other factors interfere with the rate of protein evolution. Some functions have an importance to the cell's fitness that does not correspond to the expression levels of the corresponding genes. DNA replication, for example, is likely to be under important purifying selection despite the typically low expression levels of the DNA polymerase. Also, expression levels depend on physiological conditions, and some proteins are highly expressed under some conditions and not expressed at all under other conditions. This problem concerns all the hypotheses so far envisaged.
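The two-protein example can be worked through numerically. In the sketch below, $\delta$ is the 5% efficiency loss from the text, while the relative weights $\phi$ of the two reactions in cell metabolism are purely hypothetical values chosen to differ by a factor of 100.

```latex
\Delta w \approx \delta \times \phi, \qquad \delta = 0.05
\\
\phi_{\text{high}} = 10^{-2} \;\Rightarrow\; \Delta w_{\text{high}} \approx 5 \times 10^{-4}
\\
\phi_{\text{low}} = 10^{-4} \;\Rightarrow\; \Delta w_{\text{low}} \approx 5 \times 10^{-6}
```

Under these assumed weights, the same 5% loss of biochemical efficiency costs the cell 100 times more fitness in the highly expressed protein, which is the sense in which purifying selection would act more strongly on it.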
The Role of Essentiality
It has been suggested that essentiality explains a significant part of the overall variance of nonsynonymous substitution rates in bacteria (Jordan et al. 2002). However, this analysis did not control for the effects of expression levels. When this effect is taken into account, essentiality explains very little of the remaining variance. When the regression of the rate of nonsynonymous substitutions as a function of essentiality is controlled for expression levels, we find very low R2 (0.007 for B. subtilis and 0.012 for E. coli). Thus, under these circumstances, essentiality explains approximately 1% of the variance in both bacteria. This strongly suggests that essentiality and nonsynonymous substitution rates are related fundamentally via the correlation of both variables with CAI, and that the lack of a role for essentiality in the rate of protein evolution in bacteria is the same as in eukaryotes. We have pointed out above the contradiction between the different analyses of the yeast data (Hirsh 2003; Pal, Papp, and Hurst 2003). Hirsh and Fraser's argument is based on the standard population genetics reasoning that the probability of fixation of a deleterious substitution is expected to be high only for proteins whose effect on fitness is small (Hirsh and Fraser 2001). According to this view, one could expect essential proteins to contain a larger number of sites under strong selection, which would result in lower nonsynonymous substitution rates. Our observations indicate otherwise. The observation that essentiality, after controlling for expression, explains 1% of the overall variance is very similar to the one found in yeast by Pal, Papp, and Hurst (2003), using dispensability, and considerably smaller than the one found by Hirsh and Fraser (Hirsh 2003). Unfortunately, one cannot precisely compare our results with the yeast data, because systematic dispensability data are currently unavailable in bacteria. Yet, the coincidence between our results and the ones of Pal, Papp, and Hurst (2003) suggests that the role of essentiality is minor both in bacteria and in eukaryotes.
Literature Cited
Akashi, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.
Akashi, H., and T. Gojobori. 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. USA 99:3695-3700.
Andersson, S. G. E., and C. G. Kurland. 1990. Codon preferences in free-living microorganisms. Microbiol. Rev. 54:198-210.
Averof, M., A. Rokas, K. H. Wolfe, and P. M. Sharp. 2000. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science 287:1283-1286.
Berg, O. G. 1999. Synonymous nucleotide divergence and saturation: effects of site-specific variations in codon bias and mutation rates. J. Mol. Evol. 48:398-407.
Blattner, F. R., G. Plunkett, III, and C. A. Bloch, et al. (17 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1461.
Coghlan, A., and K. H. Wolfe. 2000. Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16:1131-1145.
Dickerson, R. E. 1971. The structure of cytochrome c and the rates of molecular evolution. J. Mol. Evol. 1:26-45.
Dong, H., L. Nilsson, and C. G. Kurland. 1996. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260:649-663.
Draper, N. R., and H. Smith. 1998. Applied regression analysis. John Wiley & Sons, New York.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Version 3.6a. Distributed by the author, Department of Genetics, University of Washington, Seattle.
Finlay, B. B., and S. Falkow. 1997. Common themes in microbial pathogenicity revisited. Microbiol. Mol. Biol. Rev. 61:136-169.
Futcher, B., G. I. Latter, P. Monardo, C. S. McLaughlin, and J. I. Garrels. 1999. A sampling of the yeast proteome. Mol. Cell Biol. 19:7357-7368.
Gascuel, O. 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14:685-695.
Giaever, G., A. M. Chu, and C. Connelly, et al. (73 co-authors). 2002. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418:387-391.
Gonçalves, I., M. Robinson, G. Perriere, and D. Mouchiroud. 1999. JaDis: computing distances between nucleic acid sequences. Bioinformatics 15:424-425.
Hartl, D. L., D. E. Dykhuizen, and A. M. Dean. 1985. Limits of adaptation: the evolution of selective neutrality. Genetics 111:655-674.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Hirsh, A. E. 2003. Rate of evolution and gene dispensability: reply. Nature 421:497-498.
Hirsh, A. E., and H. B. Fraser. 2001. Protein dispensability and rate of evolution. Nature 411:1046-1049.
Hurst, L. D., and N. G. Smith. 1999. Do essential genes evolve slowly? Curr. Biol. 9:747-750.
Ikemura, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146:1-21.
Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96:3801-3806.
Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962-968.
Kamath, R. S., A. G. Fraser, and Y. Dong, et al. (13 co-authors). 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421:231-237.
Kobayashi, K., S. D. Ehrlich, and A. Albertini, et al. (99 co-authors). 2003. Essential Bacillus subtilis genes. Proc. Natl. Acad. Sci. USA 100:4678-4683.
Kunst, F., N. Ogasawara, and I. Moszer, et al. (151 co-authors). 1997. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390:249-256.
Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J. Mol. Evol. 36:96-99.
Li, W.-H., C.-I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide codon changes. Mol. Biol. Evol. 2:150-174.
Lobry, J. R., and C. Gautier. 1994. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res. 22:3174-3180.
Mouchiroud, D., C. Gautier, and G. Bernardi. 1995. Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of non-synonymous substitutions. J. Mol. Evol. 40:107-113.
Nei, M. 2000. Molecular phylogenetics and evolution. Sinauer Press, Sunderland, Mass.
Neidhardt, F., R. Curtiss, J. L. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. S. Reznikoff, M. Riley, M. Schaechter, and H. E. Umbarger. 1996. Escherichia coli and Salmonella: cellular and molecular biology. ASM Press, Washington, DC.
Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927-931.
Pal, C., B. Papp, and L. D. Hurst. 2003. Rate of evolution and gene dispensability. Nature 421:496-497.
Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.
Sharp, P. M. 1991. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position and concerted evolution. J. Mol. Evol. 33:23-33.
Sharp, P. M., and W.-H. Li. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38.
Smith, N. G., and L. D. Hurst. 1999. The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics 153:1395-1402.
Smith, N. G., M. T. Webster, and H. Ellegren. 2003. A low rate of synonymous double-nucleotide mutations in Primates. Mol. Biol. Evol. 20:47-53.
Sokal, R. R. 1981. Biometry. W. H. Freeman, New York.
Tatusov, R. L., and E. V. Koonin. 1997. A genomic perspective of protein families. Science 278:631-637.
Wilson, A. C., S. S. Carlsson, and T. J. White. 1977. Biochemical evolution. Annu. Rev. Biochem. 46:573-639.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.
Zar, J. H. 1996. Biostatistical analysis. Prentice Hall, New Jersey.