Laboratoire de Biométrie, Génétique et Biologie des Populations, Unité Mixte de Recherche Centre National de la Recherche Scientifique 5558, Université Claude Bernard, Villeurbanne, France
Abstract |
---|
Introduction |
---|
The first studies on protein evolution (Dickerson 1971) revealed that the rate of amino acid substitution varies considerably among proteins (Li and Graur 1991; Bernardi, Mouchiroud, and Gautier 1993; Wolfe and Sharp 1993). This variation is thought to reflect mainly differences in functional constraints, i.e., in the proportion of the sequence that is critical to the function of the protein. Recently, analyses of a few vertebrate gene families have shown (1) that the degree of sequence conservation varies according to the tissue in which proteins are expressed (Kuma, Iwabe, and Miyata 1995; Hughes 1997) and (2) that broadly expressed proteins tend to be more conserved than tissue-specific ones (Hastings 1996). Both observations were interpreted as resulting from stronger functional constraints on proteins expressed in more diverse cellular environments.
In mammals, the rate of synonymous substitution also varies significantly among genes (Bernardi, Mouchiroud, and Gautier 1993; Wolfe and Sharp 1993; Mouchiroud, Gautier, and Bernardi 1995). It is, however, not yet clear whether this variation reflects variability in mutation rates along genomes or differences in selective pressure on silent sites. Many authors consider that silent sites are neutral because average substitution rates at synonymous sites are very close to substitution rates in pseudogenes or in the genome as a whole (Li and Graur 1991; Wolfe and Sharp 1993). However, there is evidence for selection on codon usage in mouse histone genes (Debry and Marzluff 1994), and comparisons of synonymous and nonsynonymous substitution rates suggest that silent positions may be to some extent under selective constraints (Mouchiroud, Gautier, and Bernardi 1995; Ohta and Ina 1995; Alvarez-Valin, Jabbari, and Bernardi 1998). Selection on synonymous codon usage has been demonstrated in many species, not only in bacteria but also in eukaryotes (including some invertebrates and plants; for a review, see Sharp et al. 1995). In all cases, the intensity of selection is positively correlated with gene expression level (Gouy and Gautier 1982; Sharp and Li 1986; Duret and Mouchiroud 1999). Thus, if such selection operates in mammals, one should also expect a correlation between synonymous substitution rate and gene expression level. It has also been proposed that the mutation rate might vary with gene expression pattern (Sullivan 1995). Indeed, it has been shown that nucleotide excision repair, one of the major DNA repair systems, is more efficient in transcribed DNA than in nontranscribed DNA (reviewed in Sullivan 1995). Thus, genes expressed in the germ line should be more efficiently repaired and hence evolve more slowly than others (Sullivan 1995).
In this paper, we studied the relationships between substitution rates and tissue distribution of gene expression to try to determine whether gene expression patterns affect mutation rates and/or selection intensity at different sites: synonymous and nonsynonymous codon positions and noncoding regions. We analyzed a large data set of 2,400 human/rodent orthologous genes and 834 pairs of mouse/rat orthologs. The tissue distribution of human genes was estimated by comparing their protein-coding sequences (CDSs) to a database of expressed sequence tags (ESTs) representing 19 tissues from three developmental stages. These 19 tissues are expected to be representative of the whole organism. Hereafter, genes that are expressed in at least 16 tissues will be considered ubiquitous, whereas those that are detected in 0–3 tissues will be considered tissue-specific. Ubiquitous and tissue-specific genes make up, respectively, 4% and 50% of the data set (table 1). Our analysis provided no evidence for variation of the mutation rate according to gene expression pattern and no evidence for selection on synonymous sites but revealed a remarkable relationship between selective pressure on functional sites (in both coding and noncoding DNA) and tissue distribution of gene expression.
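The two expression classes defined above can be written down as a short Python sketch. The thresholds (19 tissues, at least 16 for ubiquitous, 0–3 for tissue-specific) come from the text; the function name and the "intermediate" label for the remaining genes are assumptions made for illustration only.

```python
N_TISSUES = 19            # tissues represented in the EST data set
UBIQUITOUS_MIN = 16       # "expressed in at least 16 tissues"
TISSUE_SPECIFIC_MAX = 3   # "detected in 0-3 tissues"


def classify_breadth(n_tissues: int) -> str:
    """Map a tissue count to an expression class (hypothetical helper)."""
    if not 0 <= n_tissues <= N_TISSUES:
        raise ValueError(f"tissue count must be between 0 and {N_TISSUES}")
    if n_tissues >= UBIQUITOUS_MIN:
        return "ubiquitous"
    if n_tissues <= TISSUE_SPECIFIC_MAX:
        return "tissue-specific"
    # Label assumed here; the text does not name the in-between class.
    return "intermediate"
```

Under these definitions, a gene detected in no tissue at all falls in the tissue-specific class, as the 0–3 range in the text implies.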
Materials and Methods |
---|
Sequence Data
Expression Profiles
We selected from GenBank (release 110, December 1998; Benson et al. 1998) 679,286 human ESTs from 19 tissues: placenta, liver (fetal, adult), fetal heart, lung (fetal, adult), brain (fetal, infant, adult), breast, colon, testis, retina, uterus, lymphocyte, muscle, prostate, pancreas, and neuron. cDNA libraries from cell culture, tumors, pooled organs, or unidentified tissues were excluded. To limit stochastic variations in expression measures, we retained only cDNA libraries that had been sampled with at least 10,000 ESTs. Expression profiles of human CDSs were determined by counting the numbers of tissues in which they were represented by at least one EST. CDSs were first filtered with the XBLAST program (Claverie and States 1993) to mask repetitive elements (Alu, L1, MIR, microsatellites, etc.). CDSs were then compared with the EST data set using BLASTN2 (Altschul et al. 1997). BLASTN2 alignments showing at least 95% identity over 100 nt or more were counted as sequence matches. This criterion was chosen to be low enough to allow the detection of most ESTs despite sequencing error (the average sequence accuracy of ESTs is about 97%) (Hillier et al. 1996) but stringent enough to distinguish, in most cases, different members of highly conserved gene families (e.g., for β- and γ-actins, proteins are 98% identical and CDSs are 91% identical; for cardiac and skeletal α-actins, proteins are 99% identical and CDSs are 85% identical; for histones H3.3A and H3.3B, proteins are 100% identical and CDSs are 79% identical). The list of selected genes and their expression patterns is available at http://pbil.univ-lyon1.fr/datasets/Duret_Mouchiroud_1999/data.html.
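As an illustration of the matching criterion and tissue count described above, the following Python sketch filters CDS/EST alignments and counts the distinct tissues in which each CDS has at least one match. It is not the pipeline used in the study: the `Hit` record layout, the function names, and the example values are all hypothetical, and the alignments are assumed to have already been produced by a BLAST-like search on repeat-masked CDSs.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One CDS-versus-EST alignment (hypothetical record layout)."""
    cds_id: str
    tissue: str               # tissue category of the EST library
    percent_identity: float
    alignment_length: int     # aligned length in nucleotides


MIN_IDENTITY = 95.0   # "at least 95% identity"
MIN_LENGTH = 100      # "over 100 nt or more"


def is_match(hit: Hit) -> bool:
    """Apply the identity and length thresholds stated in the text."""
    return (hit.percent_identity >= MIN_IDENTITY
            and hit.alignment_length >= MIN_LENGTH)


def expression_breadth(hits: Iterable[Hit]) -> Dict[str, int]:
    """Count, for each CDS, the distinct tissues with at least one match."""
    tissues: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        if is_match(hit):
            tissues[hit.cds_id].add(hit.tissue)
    return {cds_id: len(found) for cds_id, found in tissues.items()}


# Made-up alignments, for illustration only.
example_hits = [
    Hit("geneA", "adult liver", 98.5, 240),
    Hit("geneA", "adult liver", 99.0, 310),   # same tissue, counted once
    Hit("geneA", "placenta", 96.2, 150),
    Hit("geneA", "testis", 93.0, 400),        # below the identity threshold
    Hit("geneB", "retina", 99.1, 80),         # alignment too short
]
breadth = expression_breadth(example_hits)
```

In this sketch a CDS without any qualifying alignment is simply absent from the result, so callers would treat a missing key as a breadth of zero.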
Results |
---|
Substitution Rate in Coding Regions and Tissue-Distribution Breadth
Analysis of Ka values in human/rodent orthologs according to gene expression patterns revealed a sharp negative correlation between Ka and tissue distribution breadth (fig. 1). On average, tissue-specific proteins evolve almost three times as fast as ubiquitous ones (table 2). If this variation is due to differences in mutation rate, Ks should vary accordingly. However, the Ka/Ks ratio shows exactly the same variation as Ka (fig. 1). Thus, the decrease in Ka demonstrates an increase in selective pressure on the amino acid sequence. The analysis of mouse/rat orthologs revealed exactly the same trend (fig. 1). There are, of course, some slowly evolving tissue-specific proteins. However, analysis of the distribution of Ka values clearly shows an overall shift toward high values in tissue-specific genes compared with ubiquitous ones (fig. 2).
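The reasoning above can be made concrete with a small Python sketch: if Ka differed between expression classes only because of mutation rate, Ks would differ in the same way and Ka/Ks would stay roughly flat, whereas a Ka/Ks ratio that tracks Ka points to selection. The function name and the numbers below are invented for illustration and are not values from the study; averaging per-gene ratios (rather than taking a ratio of averages) is also an assumption of this sketch.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (expression class, Ka, Ks) for one gene; hypothetical record layout.
Gene = Tuple[str, float, float]


def summarize(genes: List[Gene]) -> Dict[str, Dict[str, float]]:
    """Mean Ka, mean Ks, and mean per-gene Ka/Ks for each expression class."""
    by_class: Dict[str, List[Tuple[float, float]]] = {}
    for cls, ka, ks in genes:
        by_class.setdefault(cls, []).append((ka, ks))
    summary: Dict[str, Dict[str, float]] = {}
    for cls, pairs in by_class.items():
        # Genes with Ks == 0 are skipped to avoid division by zero.
        ratios = [ka / ks for ka, ks in pairs if ks > 0]
        summary[cls] = {
            "mean_Ka": mean(ka for ka, _ in pairs),
            "mean_Ks": mean(ks for _, ks in pairs),
            "mean_Ka_over_Ks": mean(ratios) if ratios else float("nan"),
        }
    return summary


# Toy values only: Ks is the same in both classes, Ka is not.
toy_genes: List[Gene] = [
    ("tissue-specific", 0.12, 0.5),
    ("tissue-specific", 0.18, 0.6),
    ("ubiquitous", 0.04, 0.5),
    ("ubiquitous", 0.06, 0.6),
]
stats = summarize(toy_genes)
```

With these toy numbers, mean Ks is identical in the two classes while both mean Ka and mean Ka/Ks are lower in the ubiquitous class, which is the pattern the text interprets as stronger selective pressure rather than a lower mutation rate.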
Substitution Rates in 3' UTRs and 5' UTRs
Analyses of human/rodent orthologs have shown that substitution rates in coding and 5' and 3' noncoding regions are correlated (Ogata, Fujibuchi, and Kanehisa 1996; Makalowski and Boguski 1998). Obviously, these correlations cannot be attributed to the neighboring effects responsible for the Ka/Ks correlation. Interestingly, in human/rodent orthologs, the substitution rate within 3' UTR (K 3'UTR) shows exactly the same relationship with the expression pattern as Ka: (1) K 3'UTR decreases steadily with increasing expression breadth (fig. 5 and table 2), and (2) liver-specific genes have significantly higher K 3'UTR values than brain-specific genes (table 3). The same trend is observed with 5' UTRs (tables 2 and 3). However, the differences are not statistically significant, probably because of the small sample size. This finding confirms that 5' and 3' UTRs do not evolve as selectively neutral sequences but, instead, are functionally constrained (Duret, Dorkeld, and Gautier 1993) and suggests that, as for coding sites, the selective pressure on UTRs is dependent on tissue distribution.
Discussion |
---|
Gene Expression and Selection on Silent Sites
As mentioned in the introduction, in all species in which selection affects synonymous codon usage, the intensity of selection is positively correlated with the gene expression level (Gouy and Gautier 1982; Sharp and Li 1986; Duret and Mouchiroud 1999). As a consequence of this stronger purifying selection, lower Ks values are expected in highly expressed genes compared with weakly expressed genes. Indeed, it has been shown both in bacteria and in Drosophila that synonymous substitution rates are lower in genes with a strong codon usage bias (highly expressed) than in other genes (Sharp and Li 1987, 1989; Shields et al. 1988; Powell and Moriyama 1997). The fact that we did not observe any correlation between Ks and gene expression pattern in our data set thus suggests that silent sites are not constrained by selection in mammals. Indeed, we did not find any relationship between synonymous codon usage and gene expression among the 2,400 human genes in our data set (data not shown).
Gene Expression and Intensity of Selection
We found a remarkable negative correlation between Ka (and Ka/Ks) and tissue distribution breadth in both human/rodent and mouse/rat orthologs (fig. 1). This indicates that the selective pressure on nonsynonymous sites depends on the number of tissues in which genes are expressed. Since gene-specific nonsynonymous substitution rates are highly conserved in different mammalian lineages (Mouchiroud, Gautier, and Bernardi 1995), it is likely that this observation stands for all mammals. Indeed, we observed exactly the same effect in 482 human/bovine orthologous genes (data not shown). A similar trend has already been reported for vertebrates by Hastings (1996), who compared the amino acid substitution rates of tissue-specific and broadly expressed protein isoforms. Hastings (1996) proposed that the increase in selective pressure might result from the more diverse biochemical environments to which broadly expressed proteins are exposed. Broadly expressed proteins may interact with a greater variety of molecules and may have to function under a wider range of physical/chemical conditions (e.g., pH) than narrowly expressed proteins. Hence, more sites would be constrained by protein function.
Although this model probably explains a part of the variability in Ka, we do not think that variations in biochemical environments between different tissues are sufficient to account for the threefold decrease in Ka in ubiquitous versus tissue-specific genes. We propose an additional explanation to account for that observation. To simplify, let us consider two protein isoforms that have exactly the same function in the cell: X1, which is broadly expressed, and X2, which has a restricted tissue distribution. Assume that the biochemical environment is constant in all tissues and, finally, consider a mutation that reduces the activity of that protein. This mutation is likely to have a greater phenotypic effect (and hence a stronger impact on the fitness of the organism) in X1 than in X2 simply because it will affect more tissues or developmental stages. Thus, a slightly or mildly deleterious mutation is more likely to be counterselected when it occurs in a broadly expressed gene than when it occurs in a tissue-specific gene. Of course, the sequences of several tissue-specific genes that are crucial for the organism are highly constrained. However, on average, genes contain many sites at which mutations are not highly deleterious, and for all of those sites, the efficiency of selection will depend on the number of tissues in which genes are expressed. It is likely that this effect accounts for at least a part of the steady decrease in Ka with increasing tissue distribution breadth.
This effect should affect not only protein-coding sites, but also all other elements required for gene function. We have previously shown that many mammalian genes contain long regulatory elements within their 3' UTRs, most of which are probably involved in posttranscriptional regulation of gene expression (Duret, Dorkeld, and Gautier 1993). Interestingly, we noted that such elements are 2.5-fold more frequent in widely expressed genes than in tissue-specific genes (Duret, Dorkeld, and Gautier 1993). Indeed, as for Ka, there is a negative correlation between substitution rate within 3' UTR (K 3'UTR) and tissue distribution breadth (fig. 5). Thus, it seems that, as for coding sites, the efficiency of selection on regulatory elements increases with increasing tissue distribution.
Our results also show that the substitution rates of tissue-specific proteins vary considerably according to the tissue (fig. 4) and confirm the strong selective pressure on brain-specific proteins (Kuma, Iwabe, and Miyata 1995; Hughes 1997). Again, this variation in Ka had been interpreted in terms of functional constraints on protein sequence. It had been proposed that the stronger selective pressure in brain-specific proteins is a consequence of a higher complexity of biochemical networks in the brain compared with those of other tissues (Kuma, Iwabe, and Miyata 1995). Conversely, many of the lymphocyte-specific proteins are involved in the immune response. Thus, the higher average Ka values in those proteins might reflect in part the positive selection for sequence diversity in response to environmental changes (Hughes 1997).
However, the differences in K 3'UTR (table 3) obviously cannot be explained by such factors. One could argue that brain-specific genes contain more 3' UTR regulatory elements than liver- or lymphocyte-specific genes. Indeed, it is possible that posttranscriptional regulation plays a more important role in tuning the expression level of brain-specific genes than in tuning those of liver- or lymphocyte-specific genes. However, the correlation between Ka and K 3'UTR suggests that both observations result from the same factor. Seemingly, mutations in coding regions or in regulatory elements both have, on average, higher impacts on fitness in brain-specific genes than in liver-specific genes. This observation probably reflects the central role of the brain compared to peripheral organs.
In summary, the phenotypic impact of a mutation in a gene functional element (protein-coding, regulatory region, etc.) depends not only on its direct effect on the biochemical activity of this gene (or its product), but also on the number and the nature of tissues in which this gene is expressed.
Acknowledgements |
---|
Footnotes |
---|
1 Abbreviations: aa, amino acid; CDS, protein-coding sequence; EST, expressed sequence tag; Ka, number of nonsynonymous substitutions per site; Ks, number of synonymous substitutions per site.
2 Keywords: mammals, gene expression, substitution rate, noncoding regions, codon usage, DNA repair.
3 Address for correspondence and reprints: Laurent Duret, Laboratoire de Biométrie, Génétique et Biologie des Populations, Unité Mixte de Recherche Centre National de la Recherche Scientifique 5558, Université Claude Bernard, 43 Boulevard du 11 Novembre 1918, 69622 Villeurbanne cedex, France. E-mail: duret{at}biomserv.univ-lyon1.fr
References |
---|
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
Alvarez-Valin, F., K. Jabbari, and G. Bernardi. 1998. Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlation. J. Mol. Evol. 46:37–44.
Bains, W. 1992. Local sequence dependence of rate of base replacement in mammals. Mutat. Res. 267:43–54.
Benson, D. A., M. S. Boguski, D. J. Lipman, J. Ostell, and B. F. F. Ouellette. 1998. GenBank. Nucleic Acids Res. 26:1–7.
Bernardi, G., D. Mouchiroud, and C. Gautier. 1993. Silent substitutions in mammalian genomes and their evolutionary implications. J. Mol. Evol. 37:583–589.
Bohr, V. A., C. A. Smith, D. S. Okumoto, and P. C. Hanawalt. 1985. DNA repair in an active gene: removal of pyrimidine dimers from the DHFR gene of CHO cells is much more efficient than in the genome overall. Cell 40:359–369.
Boulikas, T. 1992. Evolutionary consequences of nonrandom damage and repair of chromatin domains. J. Mol. Evol. 35:156–180.
Claverie, J.-M., and D. J. States. 1993. Information enhancement methods for large scale sequence analysis. Comput. Chem. 17:191–201.
Debry, R. W., and W. F. Marzluff. 1994. Selection on silent sites in the rodent H3 histone gene family. Genetics 138:191–202.
Dickerson, R. E. 1971. The structure of cytochrome c and the rates of molecular evolution. J. Mol. Evol. 1:26–45.
Duret, L., F. Dorkeld, and C. Gautier. 1993. Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res. 21:2315–2322.
Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482–4487.
Duret, L., D. Mouchiroud, and M. Gouy. 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360–2365.
Gouy, M., and C. Gautier. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055–7074.
Gouy, M., C. Gautier, M. Attimonelli, C. Lanave, and G. Di-Paola. 1985. ACNUC: a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput. Appl. Biosci. 1:167–172.
Hastings, K. E. M. 1996. Strong evolutionary conservation of broadly expressed protein isoforms in the troponin I gene family and other vertebrate gene families. J. Mol. Evol. 42:631–640.
Hess, S. T., J. D. Blake, and R. D. Blake. 1994. Wide variations in neighbor-dependent substitution rates. J. Mol. Biol. 236:1022–1033.
Hillier, L., G. Lennon, M. Becker et al. (26 co-authors). 1996. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6:807–828.
Hughes, A. L. 1997. Rapid evolution of immunoglobulin superfamily C2 domains expressed in immune system cells. Mol. Biol. Evol. 14:1–5.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.
Kuma, K., N. Iwabe, and T. Miyata. 1995. Functional constraints against variations on molecules from the tissue level: slowly evolving brain-specific genes demonstrated by protein kinase and immunoglobulin supergene families. Mol. Biol. Evol. 12:123–130.
Li, W. H. 1993. Unbiased estimation of the rates of synonymous and nonsynonymous substitutions. J. Mol. Evol. 36:96–99.
Li, W. H., and D. Graur. 1991. Fundamentals of molecular evolution. Sinauer, Sunderland, Mass.
Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407–9412.
Mellon, I., V. A. Bohr, C. A. Smith, and P. C. Hanawalt. 1986. Preferential DNA repair of an active gene in human cells. Proc. Natl. Acad. Sci. USA 83:8878–8882.
Mouchiroud, D., C. Gautier, and G. Bernardi. 1995. Frequencies of synonymous substitution in mammals are gene-specific and correlated with frequencies of non-synonymous substitutions. J. Mol. Evol. 40:107–113.
Ogata, H., W. Fujibuchi, and M. Kanehisa. 1996. The size differences among mammalian introns are due to the accumulation of small deletions. FEBS Lett. 390:99–103.
Ohta, T., and Y. Ina. 1995. Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717–720.
Powell, J. R., and E. N. Moriyama. 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784–7790.
Schaeffer, L., R. Roy, S. Humbert, V. Moncollin, W. Vermeulen, J. H. Hoeijmakers, P. Chambon, and J. M. Egly. 1993. DNA repair helicase: a component of BTF2 (TFIIH) basic transcription factor. Science 260:58–63.
Sharp, P. M., M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. 1995. DNA sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:241–247.
Sharp, P. M., and W. H. Li. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28–38.
Sharp, P. M., and W. H. Li. 1987. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222–230.
Sharp, P. M., and W. H. Li. 1989. On the rate of DNA sequence evolution in Drosophila. J. Mol. Evol. 28:398–402.
Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.
Sullivan, D. T. 1995. DNA excision repair and transcription: implications for genome evolution. Curr. Opin. Genet. Dev. 5:786–791.
Svejstrup, J. Q., Z. Wang, W. J. Feaver, X. Wu, D. A. Bushnell, T. F. Donahue, E. C. Friedberg, and R. D. Kornberg. 1995. Different forms of TFIIH for transcription and DNA repair: holo-TFIIH and a nucleotide excision repairosome. Cell 80:21–28.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Turker, M. S., G. E. Cooper, and P. L. Bishop. 1993. Region-specific rates of molecular evolution: a fourfold reduction in the rate of accumulation of "silent" mutations in transcribed versus nontranscribed regions of homologous DNA fragments derived from two closely related mouse species. J. Mol. Evol. 36:31–40.
Wolfe, K. H., and P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441–456.