* Bioinformatics Research Center, Medical College of Wisconsin, Milwaukee
Department of Ecology and Evolution, University of Chicago
Abstract
Key Words: Darwinian selection, purifying selection, disease gene, oncogene, synonymous and nonsynonymous substitutions
Introduction
Nucleotide substitutions in protein-coding regions are of two types: synonymous and nonsynonymous. Synonymous (or silent) substitutions do not change the amino acid encoded by the mutated codon; nonsynonymous substitutions change the encoded amino acid, often with profound implications for the resulting protein (Nei and Kumar 2000). Typically, substitutions in the third codon position are silent, whereas substitutions in the first and second codon positions change the amino acid; highly parameterized maximum-likelihood and other methods take this and related factors (such as transition/transversion bias) into account when estimating substitution rates (Nei and Kumar 2000; Yang and Nielsen 2000). The rate of nonsynonymous substitutions per nonsynonymous site (KA) varies greatly from gene to gene because of differing intensities of purifying selection (among other factors). The rate of synonymous substitutions per synonymous site (KS) reflects µ, the average mutation rate for the genome as a whole, and should be similar among genes (Kumar and Subramanian 2002). The ratio of nonsynonymous to synonymous rates (KA/KS, or $\omega$) is a measure of accepted substitutions normalized for opportunity (Liberles 2001) and can indicate whether selection is occurring, as well as its type and intensity (Nei and Kumar 2000; Yang and Nielsen 2000). Under neutral evolution, $K_A = K_S$ ($\omega = 1$); deviation of KA from KS may reflect positive Darwinian selection when $\omega > 1$ or purifying (stabilizing) selection when $\omega < 1$. For some genes experiencing positive Darwinian selection, the mean $\omega$ for the entire gene may be <1.0, while portions of the gene (areas not under protein-coding selective constraints, for example) have $\omega > 1$ (Liberles 2001).
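As a minimal sketch of the interpretation above (not a significance test), the following Python functions form $\omega$ from KA and KS and label the regime implied by $\omega < 1$, $\omega = 1$, or $\omega > 1$. The function names and the optional `tol` band are hypothetical, and the rates in the example are invented for illustration; in practice, departure from $\omega = 1$ is assessed statistically, for example with likelihood-ratio tests (Yang and Bielawski 2000).

```python
def omega(ka: float, ks: float) -> float:
    """Return the nonsynonymous/synonymous rate ratio KA/KS."""
    if ks <= 0:
        raise ValueError("KS must be positive to form KA/KS")
    return ka / ks


def selection_regime(ka: float, ks: float, tol: float = 0.0) -> str:
    """Label omega as purifying (< 1), neutral (= 1), or positive (> 1).

    `tol` is a hypothetical tolerance band around 1; real analyses use a
    statistical test rather than a fixed cutoff.
    """
    w = omega(ka, ks)
    if w > 1.0 + tol:
        return "positive"
    if w < 1.0 - tol:
        return "purifying"
    return "neutral"


# Hypothetical rates for illustration only (not values from this study).
print(selection_regime(0.05, 0.60))  # omega ~ 0.08 -> "purifying"
print(selection_regime(0.90, 0.60))  # omega = 1.5  -> "positive"
```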
Previous researchers have used relative substitution rates to measure the strength of purifying selection (selection against amino acid changes) on individual disease genes (Hurst and Pal 2001), but no study has shown whether strong purifying selection is a general characteristic of disease genes or of any subset of disease genes. This information would complement other research on the functional classification of disease genes (Jimenez-Sanchez, Childs, and Valle 2001) and contribute to our understanding of disease.
We identified 311 human genes implicated in disease and divided them into six functional classes: oncogenes (including tumor suppressor genes), immune system genes, metabolic genes, muscle and bone genes, nervous system genes, and transport genes. We then measured the substitution rates experienced by these genes relative to their rodent homologs and compared these rates with those of an appropriate set of nondisease genes. We hypothesized that, because mutations in them have different implications for fitness, these classes of disease-related genes would have experienced different selective pressures than genes not typically involved in disease.
Materials and Methods
Sequence comparisons between rodents and humans assume that these lineages have experienced similar selection and fitness effects, despite having diverged 90 to 109 MYA (Kumar and Hedges 1998; Nei, Xu, and Glazko 2001; Kumar and Subramanian 2002). These effects are, of course, difficult to measure. Adding a third mammalian group, such as an ungulate, would greatly increase the robustness of this analysis once data for such a three-way comparison become available.
For comparison, we used the data set of Nekrutenko, Makova, and Li (2002) (153 human genes with known exonic structure and mouse homologs [table 1]). Because it consists of fully curated, well-understood genes, it provides an appropriate test that our 3,035 genes are not unusual. Another data set (Makalowski and Boguski 1998) contains many more homologous gene pairs than the Nekrutenko data set (1,880 versus 153), but its rates were estimated using a method (Ina 1995) that is, in certain circumstances, less accurate than the maximum-likelihood method (Bielawski, Dunn, and Yang 2000; Yang and Bielawski 2000; Yang and Nielsen 2000) used in this study.
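To make "per synonymous site" and "per nonsynonymous site" concrete, the sketch below implements a simplified version of the site-counting step of the approximate method of Nei and Gojobori (1986). It is not the maximum-likelihood method used in this study. It ignores transition/transversion bias and, as a simplifying assumption, counts changes to stop codons as nonsynonymous. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in one sense codon, in the spirit of Nei-Gojobori.

    Each codon position contributes (synonymous single-base changes) / 3.
    Simplification: changes to stop codons count as nonsynonymous.
    """
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        raise ValueError("stop codons have no defined sites")
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # skip the unmutated base
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                sites += 1 / 3
    return sites


def count_sites(seq: str) -> tuple[float, float]:
    """Return (synonymous, nonsynonymous) site totals for an in-frame sequence."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    s = sum(synonymous_sites(seq[i:i + 3]) for i in range(0, len(seq), 3))
    return s, len(seq) - s
```

For example, `count_sites("ATGGGG")` gives one synonymous site and five nonsynonymous sites: every single-base change in ATG (Met) alters the amino acid, whereas all three third-position changes in GGG keep glycine.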
Results
The average $\omega$ values within these sets were consistent with the previously supported hypothesis that mammalian genes are under strong purifying selection ($\omega < 1$ [see table 1]) (Makalowski and Boguski 1998; Nekrutenko, Makova, and Li 2002). The value of $\omega$ for disease genes as a group (311 genes) was not significantly different from that of the set of 2,724 nondisease genes (table 2), according to a Mann-Whitney test (P = … [see below]). This remained the case when the class of fast-evolving immune system genes was removed from the analysis; immune genes typically experience less intense purifying selection and could bias the results (Hurst and Smith 1999).
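The text does not say how the Mann-Whitney test was computed. As a hedged sketch only, the following pure-Python version ranks the pooled $\omega$ values (ties receive their mean rank), computes U for the first sample, and converts it to a two-sided P with the large-sample normal approximation. It applies no tie or continuity correction, so it may differ from an exact test or from statistical software on small or heavily tied samples. The function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U for sample x, approximate two-sided P).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])  # rank sum of sample x
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return u1, p


# Hypothetical omega values for two small gene sets (illustration only).
low_omega_set = [0.02, 0.05, 0.04, 0.08, 0.03]
other_set = [0.09, 0.12, 0.07, 0.15, 0.11, 0.06]
u, p = mann_whitney(low_omega_set, other_set)
```

A rank test suits this comparison because per-gene $\omega$ values are typically skewed, and a rank test does not assume normality.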
When cancer-related disease genes were compared with genes outside that class (the 3,035 genes minus all cancer-related genes), they had significantly lower $\omega$ (P = …), whereas the other disease-related classes had marginally higher $\omega$ in three cases (immune, metabolic, and transport disease genes; P = …, 0.050, and 0.056, respectively) and showed no difference in two cases (muscle/bone and nervous system disease genes; P = … and 0.988, respectively). The same pattern held when each disease gene class was compared with a more general set of genes (all nondisease genes).
Comparisons between disease and nondisease genes within the five noncancer functional classes (immune, metabolism, muscle/bone, nervous, and transport) revealed a significant difference only for transport genes (P = …). Within that class, $\omega$ for transport disease genes was marginally, but not significantly, higher than for nondisease genes of all classes (see above), whereas nondisease transport genes had marginally lower $\omega$ (P = …) than nondisease genes of all classes. These results indicate that, with respect to the intensity of purifying selection, noncancer disease genes (with the exception of transport genes) differ little from their nondisease counterparts and from human genes in general.
Cancer genes, however, appear to differ from other disease-related genes and from human genes in general. Only 39 of the 121 cancer-related genes had $\omega$ values higher than the average value (0.1) for all other genes, compared with 1,188 of the 3,035 genes overall. Mean $\omega$ is significantly lower for cancer-related genes than for the 2,914 noncancer genes (P = …), and mean KA is also significantly lower (KA = … for cancer genes and 0.063 for noncancer genes, P = … [see table 2]), whereas mean KS does not differ significantly. The decrease in KA for cancer genes, larger than the KA changes in the other gene classes we examined, accounts for the overall decrease in $\omega$ and represents a marked reduction in the rate of nonsynonymous substitution experienced by these genes as a whole. An earlier study (Gojobori and Yokoyama 1987) showed that substitution rates distinguish cancer genes from other genes, but its small sample (six cellular oncogenes) and less precise method of estimating substitution rates (Nei and Gojobori 1986) made it difficult to conclude that more intense purifying selection is a general characteristic of cancer genes.
Discussion
The results of this study suggest that cancer-related genes experience significantly stronger purifying selection than other disease genes and nondisease genes. This difference may be important for understanding the etiology of cancer. Why these genes experience stronger purifying selection is unknown. One can imagine a scenario in which increased purifying selection prevents the multigenic interactions implicated in certain cancers. However, this scenario does not explain why these genes are under different selective pressures than other disease-related genes; one might logically predict, for example, that genes underlying other (noncancer) genetic diseases would also be under much more intense purifying selection for the same reason.
The nature of cancer genes hints at an intriguing possibility: we found that these genes were overrepresented in a collection of "essential" genes, classified as lethal in mouse knockout experiments. Our 3,035-gene set contained 104 genes homologous to mouse essential genes; of these, 11 were cancer genes, significantly more than expected by chance alone (P = … in a $\chi^2$ test). This overlap between cancer genes and homologs of mouse essential genes provides a potentially important clue about the kinds of genes that lend themselves to cancer-causing substitutions. It is well established that many oncogenes and tumor suppressor genes act as transcription factors and have other conserved biological functions that also characterize essential genes (Hirsh and Fraser 2001), so this overlap is not necessarily surprising. For cancer-related essential genes, noncancer-related essential genes, and all essential genes, mean $\omega$ is 0.055, 0.075, and 0.066, respectively.
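The overlap test can be written as a 2×2 contingency table. Under independence, about 104 × 121 / 3,035 ≈ 4.1 cancer genes would be expected among the 104 essential-gene homologs, against the 11 observed. The sketch below is hedged. It derives the remaining cells from the counts quoted above, assuming that the 121 cancer genes and the 104 essential-gene homologs are drawn from the same 3,035-gene set. It computes a Pearson $\chi^2$ statistic without Yates' correction. The text does not state how the authors set up their test, so this sketch is not guaranteed to reproduce their P value. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no Yates correction).

    Returns (statistic, expected count for cell a under independence).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, rows[0] * cols[0] / n


def p_value_df1(stat):
    """Upper-tail P for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Cells derived from the quoted counts (assumes one shared 3,035-gene set).
a = 11              # essential-gene homologs that are cancer genes
b = 104 - a         # essential-gene homologs that are not cancer genes
c = 121 - a         # cancer genes that are not essential-gene homologs
d = 3035 - a - b - c  # neither
stat, expected_a = chi_square_2x2(a, b, c, d)
p = p_value_df1(stat)
```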
Genes with a smaller overall effect on an organism's fitness will not experience purifying selection as intense as that acting on genes with a greater overall effect on fitness. In light of our findings, this suggests that substitutions in cancer genes are more detrimental to fitness than those in other disease or nondisease genes. The implications of this finding will not be fully known and appreciated until the results are confirmed through the analysis of much larger disease gene sets (which will become possible as the genome projects mature) and experiments are conducted to explore causal mechanisms. The immediate utility of this finding may be in predicting the identity of other cancer genes when a list of candidate genes (in a quantitative trait locus, for example) is investigated.
Acknowledgements
Footnotes
1 Present address: Department of Biological Sciences, Idaho State University.
2 Present address: Department of Biochemistry and Molecular Biology, The Pennsylvania State University.
Literature Cited
Bielawski, J. P., K. A. Dunn, and Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.
Fearon, E. R. 1997. Human cancer syndromes: clues to the origin and nature of cancer. Science 278:1043-1050.
Fortini, M. E., M. P. Skupski, M. S. Boguski, and I. K. Hariharan. 2000. A survey of human disease gene counterparts in the Drosophila genome. J. Cell Biol. 150:F23-F30.
Gentleman, R., and R. Ihaka. 1996. R: a language for data analysis and graphics. J. Comp. Graph. Stat. 5:299-314.
Gojobori, T., and S. Yokoyama. 1987. Molecular evolutionary rates of oncogenes. J. Mol. Evol. 26:148-156.
Hirsh, A. E., and H. B. Fraser. 2001. Protein dispensability and rate of evolution. Nature 411:1046-1049.
Hurst, L. D., and C. Pal. 2001. Evidence for purifying selection acting on silent sites in BRCA1. Trends Genet. 17:62-65.
Hurst, L. D., and N. G. Smith. 1999. Do essential genes evolve slowly? Curr. Biol. 9:747-750.
Ina, Y. 1995. New methods for estimating the numbers of synonymous and nonsynonymous substitutions. J. Mol. Evol. 40:190-226.
Jimenez-Sanchez, G., B. Childs, and D. Valle. 2001. Human disease genes. Nature 409:853-855.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.
Kumar, S., and S. Subramanian. 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. USA 99:803-808.
Liberles, D. A. 2001. Evaluation of methods for determination of a reconstructed history of gene sequence evolution. Mol. Biol. Evol. 18:2040-2047.
Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407-9412.
Mann, H., and D. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist. 18:50-60.
Mushegian, A. R., D. E. Bassett, Jr., M. S. Boguski, P. Bork, and E. V. Koonin. 1997. Positionally cloned human disease genes: patterns of evolutionary conservation and functional motifs. Proc. Natl. Acad. Sci. USA 94:5831-5836.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, New York.
Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms. Proc. Natl. Acad. Sci. USA 98:2497-2502.
Nekrutenko, A., K. D. Makova, and W. H. Li. 2002. The KA/KS ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. 12:198-202.
Nesse, R. M. 2001. How is Darwinian medicine useful? West. J. Med. 174:358-360.
Reiter, L. T., L. Potocki, S. Chien, M. Gribskov, and E. Bier. 2001. A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome Res. 11:1114-1125.
Rubin, G. M., M. D. Yandell, J. R. Wortman, et al. (50 co-authors). 2000. Comparative genomics of the eukaryotes. Science 287:2204-2215.
Stearns, S. C., and D. Ebert. 2001. Evolution in health and disease: work in progress. Q. Rev. Biol. 76:417-432.
Yang, Z., and J. P. Bielawski. 2000. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15:496-503.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32-43.
Zar, J. H. 1999. Biostatistical analysis. Prentice Hall, Upper Saddle River, N.J.