Department of Ecology and Evolutionary Biology, University of Michigan
Correspondence: E-mail: jianzhi@umich.edu.
Key Words: evolutionary rate, dispensability, yeast, fitness, gene expression
Introduction
The availability of large gene-knockout data sets from functional genomic studies has made it possible to test whether protein dispensability and evolutionary rate are indeed correlated at the genome-wide level. This was first attempted by Hurst and Smith (1999). They measured the rate of protein evolution by the ratio of the nonsynonymous nucleotide substitution rate (dN) to the synonymous rate (dS) between orthologous genes of the mouse and rat, and measured protein dispensability using knockout phenotypes of 175 mouse genes. They found that nonessential genes evolve more rapidly than essential genes. Here, essential genes are those whose knockout leads to lethal or sterile phenotypes, and nonessential genes are all other genes. However, after they excluded 34 nonessential immunity genes, which are likely under positive selection, nonessential genes no longer evolved faster than essential genes. They thus concluded that there is no difference in evolutionary rate between essential and nonessential proteins. Hirsh and Fraser (2001) analyzed the fitness effect caused by gene deletion in the yeast Saccharomyces cerevisiae and estimated the rate of protein evolution by comparing orthologous genes of the yeast and the nematode Caenorhabditis elegans. They found a significant trend that genes with smaller fitness effects evolve faster. They also argued, based on a population genetic model, that the protein evolutionary rate is correlated with the fitness effect only when the fitness effect is weak (<0.5), and they attributed Hurst and Smith's failure to detect a correlation to the inclusion of genes with strong fitness effects, such as essential genes. It is known that lowly expressed genes evolve faster than highly expressed genes in yeast, although the exact cause of this relationship is unclear (Pal, Papp, and Hurst 2001).
In a reanalysis of the yeast data, Pal, Papp, and Hurst (2003) found that the correlation between evolutionary rate and fitness effect is no longer significant when gene expression level is controlled for, suggesting that the correlation observed by Hirsh and Fraser (2001) is due to covariation with gene expression. In a response to Pal, Papp, and Hurst (2003), Hirsh and Fraser (2003) claimed that, with a larger data set and an improved method, the correlation between evolutionary rate and fitness effect remained significant even after gene expression was controlled for. However, they did not publish evidence supporting this assertion. Yang, Gu, and Li (2003) also reanalyzed the yeast data. Instead of the S. cerevisiae–C. elegans comparison used by Hirsh and Fraser (2001), they estimated evolutionary rates from the S. cerevisiae–Candida albicans comparison, because the two species of the latter pair are evolutionarily much closer to each other. Interestingly, Yang, Gu, and Li (2003) found that the correlation between evolutionary rate and fitness effect is limited to duplicate genes and is nonexistent among singleton genes. They did not, however, control for gene expression in their study. Castillo-Davis and Hartl (2003) compared C. elegans genes showing embryonic lethality in RNA interference (RNAi) experiments with those without RNAi phenotypes. They found that the former group of genes evolves significantly more slowly than the latter and that both duplicate and singleton genes exhibit this difference. In this analysis, they estimated evolutionary rates of C. elegans genes by comparison with Caenorhabditis briggsae orthologs, but gene expression was again not controlled for. In addition to these eukaryotic studies, the correlation between protein dispensability and evolutionary rate has been examined in prokaryotes.
While the initial finding strongly supported the existence of such a correlation in prokaryotes (Jordan et al. 2002), the correlation was found to be no longer significant after gene expression was controlled for (Rocha and Danchin 2004).
Despite intensive investigation in the past few years, it remains unclear whether protein dispensability and evolutionary rate are correlated, particularly among singletons and after gene expression is controlled for. Because only a limited number of genome sequences were available, most previous studies used relatively divergent species to estimate protein evolutionary rates, and this practice may have contributed to the inconsistent results obtained by different researchers. Protein dispensability is measured in one species, whereas evolutionary rate is estimated by comparing two species and is therefore an average over the evolutionary time separating them. Thus, if dispensability does affect evolutionary rate and the rate of a protein changes over time, using closely related species should increase the power to detect this effect. Recently, the genomes of over a dozen yeast species have been sequenced, and these species form a convenient gradient of evolutionary distances from S. cerevisiae (Wolfe 2004). By analyzing these data, here we show that protein dispensability does affect evolutionary rate, even after we control for gene expression and exclude duplicate genes. However, the effect declines with evolutionary time, and protein dispensability measured in one species does not predict the evolutionary rate of the protein in distantly related species.
Materials and Methods
Data Analyses
To identify orthologs, genome-wide all-against-all BlastP (Altschul et al. 1990) searches (E-value = 10^-10) were carried out between S. cerevisiae and each of the nine other yeasts, as well as C. elegans. A hit was considered valid if the alignable region covered more than 80% of the longer of the two matched proteins. Reciprocal best hits were defined as orthologs. Transposable elements and mitochondrial genes were excluded from the analysis. A list of orthologous genes was obtained between S. cerevisiae and each of the nine yeasts, and the S. cerevisiae genes that appeared in all nine lists were then identified. These S. cerevisiae genes and their orthologs in the nine yeasts were used for the analysis involving only orthologs shared across the 10 yeasts. A gene was defined as a singleton if it had no duplicate copy in the genome; operationally, a singleton has no nonself-hits in a genome-wide all-against-all BlastP search (E-value = 0.1). Conservatively, a gene was defined as a duplicate gene if it had at least one nonself-hit in a genome-wide all-against-all BlastP search (E-value = 10^-20). Genes meeting neither criterion therefore fall into neither class.
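The ortholog and paralog criteria above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: it assumes BlastP output has already been parsed into records holding query, subject, E-value, bit score, alignment length, and both protein lengths (the `Hit` type and every function name are hypothetical), it ranks hits by bit score to choose the "best" hit (an assumption, since the ranking criterion is not stated), and it applies the 80% coverage rule only to the ortholog searches.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set


class Hit(NamedTuple):
    """One parsed BlastP hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    aln_len: int  # length of the aligned region
    qlen: int     # query protein length
    slen: int     # subject protein length


def is_valid(hit: Hit, max_evalue: float = 1e-10, min_cov: float = 0.8) -> bool:
    # A hit counts only if it passes the E-value cutoff and the alignment
    # covers more than 80% of the longer of the two proteins.
    return hit.evalue <= max_evalue and hit.aln_len > min_cov * max(hit.qlen, hit.slen)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its valid subject with the highest bit score (assumed ranking)."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if is_valid(h) and (h.query not in best or h.bitscore > best[h.query].bitscore):
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Orthologs: gene a in genome A and gene b in genome B that are each other's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return {a: b for a, b in ab.items() if ba.get(b) == a}


def shared_orthologs(ortholog_maps: List[Dict[str, str]]) -> Set[str]:
    """S. cerevisiae genes that have an ortholog in every species' list."""
    common = set(ortholog_maps[0])
    for m in ortholog_maps[1:]:
        common &= set(m)
    return common


def classify(genes: Iterable[str], self_hits: Iterable[Hit],
             singleton_e: float = 0.1, duplicate_e: float = 1e-20) -> Dict[str, Optional[str]]:
    """Label genes using an all-against-all search of one genome against itself."""
    min_e = {g: float("inf") for g in genes}  # best E-value of any nonself hit
    for h in self_hits:
        if h.query != h.subject and h.query in min_e:
            min_e[h.query] = min(min_e[h.query], h.evalue)
    labels: Dict[str, Optional[str]] = {}
    for g, e in min_e.items():
        if e > singleton_e:
            labels[g] = "singleton"  # no nonself hit at E <= 0.1
        elif e <= duplicate_e:
            labels[g] = "duplicate"  # at least one nonself hit at E <= 1e-20
        else:
            labels[g] = None         # meets neither criterion
    return labels
```

Calling `reciprocal_best_hits` once per species pair and passing the nine resulting dictionaries to `shared_orthologs` would yield the shared S. cerevisiae gene set; the `None` label makes explicit that the two paralog cutoffs leave a gap between them.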
Homologous proteins were aligned with Clustal (Thompson, Higgins, and Gibson 1994), and the corresponding DNA sequences were then aligned according to the protein alignment. The number of nonsynonymous substitutions per nonsynonymous site (dN) between two sequences was estimated by the maximum likelihood method implemented in PAML (Yang 1997). Rank correlations and partial rank correlations were computed as described in Sokal and Rohlf (1995).
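Rank and partial rank correlations of the kind used here can be sketched as below. The sketch assumes Spearman's coefficient with average ranks for ties, combined through the standard first-order partial correlation formula r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)); P values are not computed, and the function names are illustrative. It is not claimed to reproduce the exact computations used in the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..n; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Rank correlation of x and y with z held constant (first-order partial)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

In the terms of this study, x could be the fitness effect of deletion, y the dN of each gene, and z its expression level. The formula is undefined when z is perfectly rank-correlated with x or y, or when any variable is constant.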
Results
Protein Dispensability Measured in One Species Does Not Predict Protein Evolutionary Rate in Distantly Related Species
In the above analysis, we used the S. cerevisiae–S. paradoxus comparison to estimate the protein evolutionary rate, which is actually the average rate during the divergence of these two closely related species. We repeated the rate estimation using comparisons between S. cerevisiae and each of eight more divergent yeast species (fig. 1) and examined the influence of fitness effect on evolutionary rate. For each of these eight comparisons, protein dispensability, as measured by the fitness effect of gene deletion, has a small yet statistically significant impact on the rate of protein evolution (dN), even after controlling for gene expression (table 1). Both the partial correlation between dN and gene expression after controlling for fitness effect and the partial correlation between fitness effect and gene expression after controlling for dN remain significant for each of the eight species considered (table 1).
To investigate how the level of species divergence affects the degree to which protein dispensability impacts the average rate of protein evolution, we plotted 1 and 2 against the mean dN between the species pairs for which the average evolutionary rates were estimated. The mean dN was computed using all orthologous genes (singleton and duplicate) between a species pair. Figure 3 shows that 1 and 2 are both higher than 1 for all species considered. More importantly, there is a clear trend that both 1 and 2 decline as the mean dN between species increases, indicating that the impact of protein dispensability on evolutionary rate diminishes with evolutionary time. While the dispensability data from S. cerevisiae may predict the average evolutionary rate between S. cerevisiae and S. paradoxus quite well, they predict the average rate between S. cerevisiae and Y. lipolytica less well. This is likely due to changes in protein function, dispensability, and evolutionary rate over long evolutionary times, even for orthologous genes.
Discussion
Our findings imply that protein dispensability measured in S. cerevisiae does not predict the rate of protein evolution in nematodes or other species distantly related to yeast, contrary to what Hirsh and Fraser (2001) claimed. Their results were based on a small set of genes (119), and it is possible that the correlation they observed was accidental, as suggested by Pal, Papp, and Hurst (2003). Furthermore, in contrast to what Hirsh and Fraser (2001) hypothesized, we found that the correlation between protein dispensability and evolutionary rate can be demonstrated without removing genes with large fitness effects. For instance, when closely related species are compared, the average evolutionary rate of proteins whose deletion is nonlethal is 40% greater than that of proteins whose deletion is lethal. Following Hirsh and Fraser (2001), we also analyzed a subset of genes with fitness effects lower than 0.5, but found that the correlation between protein dispensability and evolutionary rate is weaker for this subset than for the entire data set. For example, when the S. cerevisiae–S. paradoxus comparison was used to estimate evolutionary rate, the rank correlation between fitness effect and evolutionary rate was 0.19 (P = 2 × 10^-35) for the entire data set but only 0.10 (P = 5 × 10^-9) for the subset of genes with low fitness effects. When we controlled for gene expression, the partial rank correlation between fitness effect and evolutionary rate decreased from 0.10 (P = 1 × 10^-11) for the entire data set to 0.06 (P = 6 × 10^-4) for the subset. When a divergent species such as C. albicans is compared with S. cerevisiae, this partial rank correlation is no longer statistically significant for the subset (r = 0.02, P = 0.42), although it remains significant for the entire data set (r = 0.08, P = 4 × 10^-4).
Thus, contrary to what Hirsh and Fraser (2001) proposed, our results show that the effect of protein dispensability on evolutionary rate is less evident when only genes with low fitness effects are considered. Use of this subset of genes instead of the entire data set was likely the reason why Pal, Papp, and Hurst (2003) could not detect a significant impact of protein dispensability on evolutionary rate when gene expression was controlled for. From these considerations, we believe that the findings of Hirsh and Fraser (2001) arose by chance and that the evolutionary model they proposed to explain them is either unrealistic or irrelevant. Their explanation of why Hurst and Smith (1999) failed to detect a correlation between protein dispensability and evolutionary rate in rodents is probably incorrect as well. We believe that the correlation will be found in rodents when a larger data set is used, unless what we demonstrated in yeasts does not apply to mammals, which seems unlikely.
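The two manipulations discussed above, expressing the difference in mean rate between nonlethal and lethal genes as a percentage and restricting the analysis to genes with fitness effects below 0.5, amount to simple arithmetic and filtering. The sketch below assumes that fitness effect is scaled from 0 (no effect) to 1 (lethal) and that each gene is a hypothetical `(fitness_effect, dN)` pair; the numbers in it are invented for illustration and are not data from this study.

```python
from statistics import mean


def percent_greater(dn_nonlethal, dn_lethal):
    """Percentage by which the mean dN of nonlethal genes exceeds that of lethal genes."""
    return 100.0 * (mean(dn_nonlethal) / mean(dn_lethal) - 1.0)


def split_by_lethality(genes, lethal_effect=1.0):
    """Split (fitness_effect, dN) pairs into nonlethal and lethal dN lists.

    Assumes (hypothetically) that a lethal deletion is coded as fitness effect 1.0.
    """
    nonlethal = [dn for f, dn in genes if f < lethal_effect]
    lethal = [dn for f, dn in genes if f >= lethal_effect]
    return nonlethal, lethal


def low_effect_subset(genes, threshold=0.5):
    """Keep only genes whose deletion has a fitness effect below the threshold."""
    return [(f, dn) for f, dn in genes if f < threshold]


# Invented example values, for illustration only.
genes = [(0.0, 0.08), (0.2, 0.06), (0.7, 0.07), (1.0, 0.05), (1.0, 0.05)]
nonlethal, lethal = split_by_lethality(genes)
print(round(percent_greater(nonlethal, lethal), 1))  # 40.0
print(low_effect_subset(genes))  # [(0.0, 0.08), (0.2, 0.06)]
```

The filtered subset could then be passed to a rank-correlation routine. The point made above is that this filtering discards the genes with the largest fitness effects, which shrinks the range of the predictor and can weaken the correlation.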
We detected a significant impact of protein dispensability on evolutionary rate for both duplicate and singleton genes. The impact is greater for duplicates than for singletons, as also observed by Yang, Gu, and Li (2003) in yeasts and by Castillo-Davis and Hartl (2003) in nematodes. The cause of this phenomenon is unclear. Yang, Gu, and Li (2003) suggested that duplicates are to some extent functionally redundant, and that both the fitness effect and the evolutionary rate of a duplicate gene are affected by the degree of functional redundancy it shares with its duplicate copy, generating a correlation between fitness effect and evolutionary rate. However, functional redundancy can also occur between nonparalogous genes. Furthermore, duplicate genes change their functions and rates more rapidly than singletons, which would be expected to weaken rather than strengthen the link between present-day dispensability and evolutionary rate. Thus, it is puzzling why the impact of protein dispensability on evolutionary rate is greater for duplicates than for singletons.
As found by Pal, Papp, and Hurst (2001), our analysis showed that highly expressed genes evolve slowly. This correlation is much stronger than that between fitness effect and evolutionary rate, although the former cannot fully explain the latter. Low evolutionary rates of highly expressed genes have also been documented in bacteria (Rocha and Danchin 2004), plants (Wright et al. 2004), and animals (e.g., Duret and Mouchiroud 2000; Subramanian and Kumar 2004), but the underlying cause remains unclear. As shown in table 1, functional importance explains only a small fraction of the correlation between expression level and evolutionary rate. If different amino acids are synthesized at different costs, or are incorporated into peptides with different rates and accuracies during translation, one may hypothesize that certain amino acids are preferentially used in highly expressed genes (Akashi and Gojobori 2002; Akashi 2003). This would generate an amino acid usage bias analogous to the frequently observed codon usage bias. Just as codon usage bias reduces the synonymous substitution rate (Sharp and Li 1987), amino acid usage bias could reduce the rate of amino acid substitution. Consistent with this hypothesis, biased amino acid usage has been reported in highly expressed genes (Akashi and Gojobori 2002; Akashi 2003; Urrutia and Hurst 2003; Comeron 2004; Rocha and Danchin 2004). However, the level of this bias does not seem to fully explain the strong correlation between expression level and evolutionary rate (Rocha and Danchin 2004). Another hypothesis is that highly expressed genes may have low mutation rates because of transcription-coupled repair (Svejstrup 2002). This would reduce substitution rates at synonymous, nonsynonymous, and intron sites alike.
However, Duret and Mouchiroud (2000) found no reduction in mutation rate in genes expressed in the germ line, contrary to the prediction of this hypothesis. It is likely that the correlation between protein evolutionary rate and expression level has multiple causes, but the major cause has yet to be identified. Another interesting question is whether the impact of expression level on evolutionary rate is transient, as observed for the impact of protein dispensability on evolutionary rate. Given the rapid evolution of gene expression patterns (Khaitovich et al. 2004; Yanai, Graur, and Ophir 2004), such transience seems plausible. We are currently testing this and other hypotheses in an attempt to understand the strong impact of gene expression on the rate of protein evolution.
References
Akashi, H. 2003. Translational selection and yeast proteome evolution. Genetics 164:1291–1303.
Akashi, H., and T. Gojobori. 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. USA 99:3695–3700.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Castillo-Davis, C. I., and D. L. Hartl. 2003. Conservation, relocation and duplication in genome evolution. Trends Genet. 19:593–597.
Comeron, J. M. 2004. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics 167:1293–1304.
Duret, L., and D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68–74.
Gibbs, R. A., G. M. Weinstock, M. L. Metzker et al. (229 co-authors). 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428:493–521.
Gu, Z., L. M. Steinmetz, X. Gu, C. Scharfe, R. W. Davis, and W.-H. Li. 2003. Role of duplicate genes in genetic robustness against null mutations. Nature 421:63–66.
Hirsh, A. E., and H. B. Fraser. 2001. Protein dispensability and rate of evolution. Nature 411:1046–1049.
Hirsh, A. E., and H. B. Fraser. 2003. Genomic function: rate of evolution and gene dispensability (Response). Nature 421:497–498.
Holstege, F. C., E. G. Jennings, J. J. Wyrick, T. I. Lee, C. J. Hengartner, M. R. Green, T. R. Golub, E. S. Lander, and R. A. Young. 1998. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95:717–728.
Hurst, L. D., and N. G. Smith. 1999. Do essential genes evolve slowly? Curr. Biol. 9:747–750.
Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962–968.
Kamath, R. S., A. G. Fraser, Y. Dong et al. (13 co-authors). 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421:231–237.
Khaitovich, P., G. Weiss, M. Lachmann, I. Hellmann, W. Enard, B. Muetzel, U. Wirkner, W. Ansorge, and S. Paabo. 2004. A neutral model of transcriptome evolution. PLoS Biol. 2:682–689.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, New York.
Kimura, M., and T. Ohta. 1974. On some principles governing molecular evolution. Proc. Natl. Acad. Sci. USA 71:2848–2852.
Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931.
Pal, C., B. Papp, and L. D. Hurst. 2003. Genomic function: rate of evolution and gene dispensability. Nature 421:496–497.
Rocha, E. P., and A. Danchin. 2004. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol. Biol. Evol. 21:108–116.
Sharp, P. M., and W.-H. Li. 1987. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222–230.
Sokal, R. R., and F. J. Rohlf. 1995. Biometry. Freeman and Company, New York.
Steinmetz, L. M., C. Scharfe, A. M. Deutschbauer et al. (11 co-authors). 2002. Systematic screen for human disease genes in yeast. Nat. Genet. 31:400–404.
Subramanian, S., and S. Kumar. 2004. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168:373–381.
Svejstrup, J. Q. 2002. Mechanisms of transcription-coupled DNA repair. Nat. Rev. Mol. Cell Biol. 3:21–29.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Urrutia, A. O., and L. D. Hurst. 2003. The signature of selection mediated by expression on human genes. Genome Res. 13:2260–2264.
Wilson, A. C., S. S. Carlson, and T. J. White. 1977. Biochemical evolution. Annu. Rev. Biochem. 46:573–639.
Winzeler, E. A., D. D. Shoemaker, A. Astromoff et al. (21 co-authors). 1999. Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285:901–906.
Wolfe, K. 2004. Evolutionary genomics: yeasts accelerate beyond BLAST. Curr. Biol. 14:R392–R394.
Wright, S. I., C. B. Yau, M. Looseley, and B. C. Meyers. 2004. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol. Biol. Evol. 21:1719–1726.
Yanai, I., D. Graur, and R. Ophir. 2004. Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. OMICS 8:15–24.
Yang, J., Z. Gu, and W.-H. Li. 2003. Rate of protein evolution versus fitness effect of gene deletion. Mol. Biol. Evol. 20:772–774.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Zhang, J. 2003. Evolution by gene duplication: an update. Trends Ecol. Evol. 18:292–298.