From the Theoretical Systems Biology, Institute of Molecular Biotechnology, 07745 Jena, Germany; and ¶ Department of Biochemistry, National University of Ireland, Galway, Ireland
ABSTRACT
In this article, we explore the premise that protein levels are mainly determined by the corresponding mRNA levels, and we show to what extent translational regulation and selective degradation obliterate a perfect correlation between mRNA and protein abundance. Our focus lies on post-transcriptional regulation of protein amounts measured under standard log-growth conditions. For different compartments and functional modules, we investigate to what extent protein levels are determined by three factors: mRNA concentration, translation rate (ribosome density and ribosome occupancy), and protein-specific degradation. Throughout, "compartment" denotes a spatial subcellular structure, while "module" refers to a set of functionally related genes and proteins. The value of analyzing protein-mRNA correlations for different functional modules and pathways has been noted previously (7–9). We demonstrate that the quality of protein-mRNA correlations varies among cellular compartments and functional modules, and we quantify the contribution of post-transcriptional steps, including protein turnover, to the observed expression regulation of proteins. In addition, this study illustrates how large-scale transcriptomics and proteomics data can be combined to gain new insights into cellular regulation.
MATERIALS AND METHODS
To correct the MSS for saturation effects, we compared the MSS values with values obtained by serial analysis of gene expression (SAGE) (16). Based on that comparison, we derived a correction for MSS values larger than 13 copies per cell (see below).
More details on the data processing are given in the supplemental material. The complete mRNA reference dataset with the median, arithmetic mean, and different error estimates can be obtained from our web site (www.imb-jena.de/tsb/yeast_proteome).
Saturation Correction
Microarray measurements tend to underestimate high mRNA levels because of saturation during hybridization (5, 17). SAGE data, on the other hand, are inaccurate at low mRNA concentrations. In accordance with previous work (5, 7), we used SAGE measurements (16) to adjust the microarray measurements in the upper range. Plotting the two datasets against each other reveals a systematically increasing deviation between them (Fig. S1, supplemental material). Average expression levels start to deviate significantly above 13 molecules per cell. Hence, the correction $y = 0.053 \cdot \mathrm{MSS}^{2.098}$ (where $y$ is the corrected mRNA signal) is applied to all MSS values above 13 mRNA molecules per cell. This correction is based on a regression of the deviation for values larger than 13. Our approach has the advantage of using a signal-dependent correction, and adjusting the two datasets to each other in this way is similar to state-of-the-art normalization methods used for microarray analysis (15). Its main benefit is that large mRNA values (and all properties derived from them, such as the protein-to-mRNA abundance ratio (PRR)) become more realistic. Finally, taking the SAGE measurements into account allows us to include three additional mRNA expression values from SAGE for which no microarray measurement is available.
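The piecewise correction can be sketched as follows. This is a minimal illustration, assuming the MSS values are already expressed in mRNA copies per cell; the function and constant names are hypothetical, and only the coefficients (0.053, 2.098) and the threshold (13) are taken from the text.

```python
from typing import Iterable, List

# Coefficients of the power-law correction quoted in the text.
SCALE = 0.053
EXPONENT = 2.098
THRESHOLD = 13.0  # copies per cell; values at or below this are left unchanged


def correct_saturation(mss_values: Iterable[float]) -> List[float]:
    """Apply y = 0.053 * MSS**2.098 to every MSS value above the threshold.

    Hypothetical helper: assumes MSS values are given in mRNA copies per cell.
    """
    corrected = []
    for mss in mss_values:
        if mss > THRESHOLD:
            corrected.append(SCALE * mss ** EXPONENT)
        else:
            corrected.append(mss)
    return corrected


print(correct_saturation([2.0, 13.0, 50.0]))
```

Because the exponent is larger than 1, the correction expands the upper end of the scale, which is the direction needed to compensate for saturation.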
Grouping and Correlation Analysis
Proteins have been grouped according to their localization and function. Proteins were assigned to modules and compartments according to the MIPS classification (mips.gsf.de, ftp files from March 2004), using only the most general, first-level annotation. Protein groups are compared on the basis of median values. Correlations are quantified throughout with the Spearman rank correlation coefficient $r_s$. No correlation was calculated if fewer than 10 data points were available. Unless stated otherwise, the $r_s$ values reported in the text are statistically significant at the 1% level.
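The grouping rule can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: Spearman's $r_s$ is computed as the Pearson correlation of average ranks, groups with fewer than 10 members get no coefficient, and the group median shown is that of the protein levels (an assumption; any property can be summarized the same way). The function names are hypothetical. The 1% significance test is not reproduced; a library routine such as `scipy.stats.spearmanr`, which also returns a p-value, would normally be used for that step.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

MIN_POINTS = 10  # no correlation is computed for smaller groups


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman r_s as the Pearson correlation of the ranks (undefined for constant input)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def summarize_groups(
    groups: Dict[str, List[Tuple[float, float]]]
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Map each group of (mRNA, protein) pairs to (median protein, r_s or None)."""
    result = {}
    for name, pairs in groups.items():
        mrna = [m for m, _ in pairs]
        protein = [p for _, p in pairs]
        rs = spearman(mrna, protein) if len(pairs) >= MIN_POINTS else None
        result[name] = (median(protein), rs)
    return result
```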
Protein Half-life Descriptor (PHD)
By combining the models from (7, 18, 19) we set up the following differential equation:
$$\frac{d[P_i]}{dt} = k_p \, k_{\mathrm{transl},i} \, [\mathrm{mRNA}_i] - k_{d,i} \, [P_i] \qquad (1)$$
where $[P_i]$ and $[\mathrm{mRNA}_i]$ are the protein and mRNA concentrations of the $i$th ORF; $k_{\mathrm{transl},i}$ is the product of ribosome density and ribosome occupancy (fraction of mRNA bound to ribosomes) of the $i$th ORF (19); $k_p$ is a genome-wide translation constant (essentially, it quantifies the speed of elongation) (19); and $k_{d,i}$ is an ORF-specific degradation rate. Because average protein levels are constant at steady state ($d[P_i]/dt = 0$), we can calculate the half-life descriptor as $\mathrm{PHD}_i = k_p/k_{d,i} = [P_i]/([\mathrm{mRNA}_i] \cdot k_{\mathrm{transl},i})$. The half-life of a protein is $t_{1/2,i} = \ln 2/k_{d,i}$, so $\mathrm{PHD}_i = (k_p/\ln 2) \, t_{1/2,i}$; because $k_p$ is the same for all ORFs, the PHD is proportional to the in vivo half-life.
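A short numerical check of the steady-state argument: integrating Equation 1 until the protein level settles and then applying the PHD formula should recover $k_p/k_{d,i}$. This is a sketch under illustrative assumptions; the values of $k_p$, $k_d$, mRNA level, ribosome density, and occupancy below are arbitrary and not taken from the data, and the simple forward-Euler integrator is chosen only for clarity.

```python
import math


def phd(protein: float, mrna: float, ribosome_density: float,
        ribosome_occupancy: float) -> float:
    """PHD_i = [P_i] / ([mRNA_i] * k_transl,i), with k_transl,i = density * occupancy."""
    k_transl = ribosome_density * ribosome_occupancy
    return protein / (mrna * k_transl)


def simulate_protein(mrna: float, k_transl: float, k_p: float, k_d: float,
                     dt: float = 0.01, t_end: float = 200.0) -> float:
    """Forward-Euler integration of dP/dt = k_p*k_transl*mRNA - k_d*P from P = 0."""
    protein = 0.0
    for _ in range(int(t_end / dt)):
        protein += dt * (k_p * k_transl * mrna - k_d * protein)
    return protein


# Illustrative (hypothetical) parameters; k_transl = 0.6 * 0.5 = 0.3.
k_p, k_d = 2.0, 0.1
p_ss = simulate_protein(mrna=5.0, k_transl=0.3, k_p=k_p, k_d=k_d)
half_life = math.log(2) / k_d

print(phd(p_ss, 5.0, 0.6, 0.5))          # approximately k_p / k_d
print(k_p * half_life / math.log(2))     # the same quantity via the half-life
```

The second print makes the proportionality explicit: for a fixed $k_p$, doubling the half-life doubles the PHD.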
At the current stage, we refrain from expressing the PHD as a half-life in units of time, because the uncertainty of the underlying measurements is still too large to warrant its interpretation as an actual half-life. However, current data allow proteins to be classified into those with high and those with low stability. As the quality of protein abundance measurements improves, quantitative interpretation will also become feasible.
RESULTS
Protein abundance data are taken from Refs. 2 and 7. The protein abundances are much more uncertain than the reference mRNA levels, because they are based on fewer measurements and because the measurement techniques are less mature. To reduce the error of the protein abundances as well, we averaged the protein levels from Refs. 2 and 7 whenever possible (1,669 ORFs are contained in both datasets). Protein-versus-mRNA correlations are most significant when the averaged protein levels are used (see supplemental material), suggesting that the averaged protein concentrations are indeed less noisy. The availability of two measurements for some of the genes also gives at least a rough idea of the uncertainty of the protein abundance values and of the properties derived from them. See the supplemental material for a detailed comparison of the available datasets.
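The merging step can be sketched as follows. This is a minimal sketch assuming an arithmetic mean (the text does not state which mean was used) and dictionaries keyed by ORF name; the function names and ORF identifiers are hypothetical. The fold difference between the two measurements is one simple way to express the uncertainty that the overlapping ORFs reveal.

```python
from typing import Dict


def combine_protein_datasets(ref_a: Dict[str, float],
                             ref_b: Dict[str, float]) -> Dict[str, float]:
    """Average ORFs measured in both datasets; keep single values otherwise.

    Assumption: arithmetic mean of the two measurements.
    """
    combined = {}
    for orf in ref_a.keys() | ref_b.keys():
        values = [d[orf] for d in (ref_a, ref_b) if orf in d]
        combined[orf] = sum(values) / len(values)
    return combined


def fold_difference(ref_a: Dict[str, float],
                    ref_b: Dict[str, float]) -> Dict[str, float]:
    """Max/min ratio for ORFs present in both datasets (a rough uncertainty proxy)."""
    shared = ref_a.keys() & ref_b.keys()
    return {orf: max(ref_a[orf], ref_b[orf]) / min(ref_a[orf], ref_b[orf])
            for orf in shared}


# Hypothetical example values (molecules per cell).
ref_2 = {"orf1": 1000.0, "orf2": 200.0}
ref_7 = {"orf1": 3000.0, "orf3": 50.0}
print(combine_protein_datasets(ref_2, ref_7))
print(fold_difference(ref_2, ref_7))
```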
The efficiency of translation can be measured by the ribosome density on the mRNAs and by the fraction of mRNA bound to ribosomes (ribosome occupancy) (1, 19–21). Hence, the observed protein levels should be explained better if ribosome density is taken into account in addition to mRNA abundance. Previous calculations of ribosome density were based on ORF length (1, 20, 21). Using data from Refs. 1, 3, and 21, we calculated two different ribosome densities by dividing the number of ribosomes either by the ORF length (Ribosome Density 1) or by the transcript length (Ribosome Density 2). In both cases, the number of ribosomes was averaged over Refs. 1 and 21. We find that the correlations of protein levels with ribosome density are strongest for Ribosome Density 2 (Table I), suggesting that ribosome densities calculated on the basis of transcript length describe translational efficiency better.
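The two density variants differ only in the denominator, as the sketch below shows. It assumes densities are reported per 100 nucleotides (the unit used for HAC1 in the Discussion) and that the ribosome counts from Refs. 1 and 21 are combined with a simple mean; the function name and input values are hypothetical.

```python
from typing import Tuple


def ribosome_densities(ribosomes_ref1: float, ribosomes_ref21: float,
                       orf_length_nt: int,
                       transcript_length_nt: int) -> Tuple[float, float]:
    """Return (Ribosome Density 1, Ribosome Density 2) in ribosomes per 100 nt.

    Density 1 divides by the ORF length, Density 2 by the full transcript
    length (ORF plus UTRs), so Density 2 <= Density 1 for any transcript.
    """
    mean_ribosomes = (ribosomes_ref1 + ribosomes_ref21) / 2
    density_1 = 100 * mean_ribosomes / orf_length_nt
    density_2 = 100 * mean_ribosomes / transcript_length_nt
    return density_1, density_2


# Hypothetical transcript: 1,000-nt ORF within a 1,250-nt transcript.
print(ribosome_densities(4.0, 6.0, orf_length_nt=1000, transcript_length_nt=1250))
```

Transcripts with long UTRs are penalized more strongly by Density 2, which is how the transcript-length variant can partly capture UTR-mediated regulation.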
Protein-to-mRNA Ratio (PRR)
If there were no post-transcriptional regulation of protein levels, the PRR would be the same for all proteins. Thus, varying PRRs are indicative of post-transcriptional regulation.
The median PRR of all ORFs is 2,500 protein molecules per mRNA molecule, and the median values of modules and compartments vary by a factor of two around this cell-wide median (Fig. 1, c and d). The median PRR is smallest in the compartments "extracellular proteins" and "cell wall" and largest in the lipid particles. Among the functional modules, low PRRs appear in the module "protein synthesis," and the largest occur in the "energy" module. Large PRRs (i.e. efficient translation) of energy-related proteins are plausible, because many of these proteins are needed throughout the cell cycle and under all environmental conditions.
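The comparison in Fig. 1 rests on per-ORF PRRs summarized by medians. The sketch below computes each group's median PRR relative to the cell-wide median, so that a value of 2.0 or 0.5 corresponds to the factor-of-two spread described above. The data layout (ORF to (protein, mRNA) copies per cell, group to member ORFs) and all names and values are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple


def group_prr_fold(abundances: Dict[str, Tuple[float, float]],
                   groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Median PRR of each group divided by the cell-wide median PRR.

    abundances maps ORF -> (protein copies, mRNA copies); the cell-wide
    median is taken over all ORFs, not only those assigned to a group.
    """
    prrs = {orf: protein / mrna for orf, (protein, mrna) in abundances.items()}
    global_median = median(list(prrs.values()))
    return {name: median([prrs[orf] for orf in members]) / global_median
            for name, members in groups.items()}


# Hypothetical example with a cell-wide median PRR of 2,500.
abundances = {"a": (2000.0, 1.0), "b": (2500.0, 1.0), "c": (5000.0, 1.0)}
groups = {"energy": ["c"], "protein synthesis": ["a"]}
print(group_prr_fold(abundances, groups))
```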
The PRR is similar to the "enrichment" proposed previously by Greenbaum et al. (22), who used a smaller set of protein data. They came to similar conclusions with respect to enrichment in most modules. A significant difference between Greenbaum's study and ours occurs only for "protein synthesis," for which we find a PRR below the global average, whereas Greenbaum and colleagues observe an enrichment for these ORFs. The protein abundances used by Greenbaum and coauthors were obtained by gel-based techniques, which are known to be biased toward proteins with longer half-lives and higher abundances (4, 5). Thus, proteins with low PRRs were likely missed in previous studies. In addition, the two measures (median PRR, enrichment) are not identical, and differences in growth conditions cannot be ruled out as a potential cause of this inconsistency.
Protein Abundance Is Weakly Correlated to mRNA Abundance and Translational Activity
The number of proteins synthesized per unit time depends on the number of mRNA molecules coding for the protein and on the respective translation rate. Therefore, we define "translational activity" as the product of mRNA abundance and translation rate (i.e. ribosome density times ribosome occupancy (1, 19)) (Figs. 1 and 2). If there is little post-transcriptional regulation for most proteins of a certain compartment or functional module, protein levels will be correlated with the corresponding mRNA concentrations. A significantly improved correlation when the translation rate is additionally taken into account indicates strong translational regulation.
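The comparison described above can be sketched as two Spearman correlations against the same protein levels: one with mRNA abundance alone and one with translational activity (mRNA × ribosome density × occupancy). This is an illustrative sketch with hypothetical names and synthetic data in which protein levels are, by construction, driven by translational activity.

```python
from typing import Dict, List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for position, value in enumerate(ordered, start=1):
        first.setdefault(value, position)
        last[value] = position
    return [(first[v] + last[v]) / 2 for v in values]


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman r_s as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def compare_correlations(protein, mrna, density, occupancy) -> Tuple[float, float]:
    """Return (r_s protein vs mRNA, r_s protein vs translational activity)."""
    activity = [m * d * o for m, d, o in zip(mrna, density, occupancy)]
    return spearman(mrna, protein), spearman(activity, protein)


# Synthetic data: alternating high/low ribosome density.
mrna = [float(i) for i in range(1, 11)]
density = [10.0, 1.0] * 5
occupancy = [1.0] * 10
protein = [m * d * o for m, d, o in zip(mrna, density, occupancy)]
print(compare_correlations(protein, mrna, density, occupancy))
```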
In agreement with previous observations (20, 21), ribosome density and ribosome occupancy are positively correlated with mRNA abundance (Table I). Thus, our analysis suggests a general tendency to increase mRNA levels and ribosome density in concert ("homodirectional changes" (20)). The two ribosome densities discussed above also yield two variants of translational activity. We find slightly improved correlations with mRNA and protein abundance when the number of ribosomes per mRNA is divided by the transcript length rather than by the ORF length (Table I, supplemental material Fig. S4). A transcript that is long compared with its ORF is indicative of regulatory elements in the UTR (3). Such UTRs may be populated by ribosomes, e.g. if they contain upstream ORFs (12). Hence, regulatory elements in the UTR may reduce the effective ribosome density, which is partly accounted for by using the transcript length instead of the ORF length. In the remainder of this study, we restrict our analysis to ribosome densities and translational activities based on transcript length.
Interestingly, using translational activity instead of mRNA abundance does not consistently improve the correlations (Fig. 2). For the functional modules, the correlation with translational activity is mostly the same as, or slightly better than, the correlation with mRNA levels. There is, however, a strong, significant improvement of the correlation for the module "protein activity regulation," indicating that translational control of protein amounts is highly important for these proteins.
Evidence for Translation on Demand
When environmental signals require a quick cellular response, regulating protein expression by altering transcription may be too slow for urgently needed proteins. In such situations, the cell constitutively maintains a sufficient level of mRNA but blocks translation until the protein is actually needed (e.g. GCN4 regulation (23)). Most such proteins will be synthesized at low levels under standard conditions (i.e. without stressors), while their mRNA should be present in reasonable amounts to allow for "translation on demand." Translation on demand has been suggested for the yeast proteins GCN4, HAC1, and ICY2 (23–25). By analyzing the correlations, mRNA levels, and ribosome densities, we confirm this notion (Table II), and we identify new candidate genes that are potentially subject to translation on demand.
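The expected signature of translation on demand (reasonable mRNA levels, but little protein per mRNA under standard conditions) can be turned into a simple screen. The sketch below is hypothetical: the percentile cut-offs (mRNA at or above the median, PRR in the lowest quartile) and the nearest-rank percentile rule are illustrative assumptions, not the criteria used for Table II, and ribosome density could be added as a further filter in the same way.

```python
import math
from typing import Dict, List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[k - 1]


def translation_on_demand_candidates(
        abundances: Dict[str, Tuple[float, float]],
        mrna_pct: float = 50, prr_pct: float = 25) -> List[str]:
    """Hypothetical screen: ORFs with high mRNA but a low protein-to-mRNA ratio.

    abundances maps ORF -> (mRNA copies, protein copies); cut-offs are
    illustrative assumptions.
    """
    mrna_cut = nearest_rank_percentile([m for m, _ in abundances.values()], mrna_pct)
    prrs = {orf: p / m for orf, (m, p) in abundances.items()}
    prr_cut = nearest_rank_percentile(list(prrs.values()), prr_pct)
    return sorted(orf for orf, (m, _) in abundances.items()
                  if m >= mrna_cut and prrs[orf] <= prr_cut)


# Hypothetical data: orf5 has a low PRR but too little mRNA to qualify.
data = {
    "orf1": (10.0, 20000.0),
    "orf2": (10.0, 500.0),
    "orf3": (1.0, 3000.0),
    "orf4": (1.0, 4000.0),
    "orf5": (0.5, 10.0),
}
print(translation_on_demand_candidates(data))
```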
Other modules involved in fast responses to environmental stimuli ("cellular communication/signal transduction," "cell rescue/defense/virulence," and "interaction with cellular environment") show similar patterns with respect to correlations, mRNA levels, and ribosome densities (Figs. 1 and 2), suggesting that a subset of their ORFs is regulated via translation on demand.
Protein Degradation Significantly Affects Protein-mRNA Ratios
Although protein levels in some modules are explained better when translation rates and mRNA abundance are taken into account together, a large amount of scatter remains. While part of this scatter must be attributed to uncertainty and variability of the measurements, regulated protein turnover will also contribute. We calculated a protein half-life descriptor (PHD, see "Materials and Methods") for about 4,000 proteins. The PHDs are provided as supplemental data and can be downloaded from our web site.
The PHD quantifies the deviation from a perfect relationship between observed protein abundance and translational activity. Assuming that Equation 1 is a valid approximation of the real kinetics and neglecting noise in the data, the PHD values are proportional to the in vivo half-lives of the proteins (see "Materials and Methods"). Hence, small PHDs correspond to short half-lives and large PHDs to long half-lives. The PHD values lie between 0.04 (Rpl21p) and 5,000 (Pck1p). This range spans about 5 orders of magnitude, which is not unrealistic given that in vivo half-lives vary from a few seconds up to many days (26). However, more than 95% of the PHDs lie between 0.1 and 100, i.e. the vast majority of PHDs fall within just 3 orders of magnitude.
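The orders-of-magnitude statements follow from base-10 logarithms of the range limits, as the short check below shows. The helper names are hypothetical; the limits 0.04, 5,000, 0.1, and 100 are the values quoted in the text, while the list passed to `fraction_within` is a made-up example.

```python
import math
from typing import Sequence


def orders_of_magnitude(low: float, high: float) -> float:
    """Width of the interval [low, high] in decades (log10 of the ratio)."""
    return math.log10(high / low)


def fraction_within(values: Sequence[float], low: float, high: float) -> float:
    """Fraction of values lying in the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


print(round(orders_of_magnitude(0.04, 5000.0), 2))  # 5.1
print(round(orders_of_magnitude(0.1, 100.0), 2))    # 3.0
print(fraction_within([0.05, 1.0, 50.0, 200.0], 0.1, 100.0))
```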
Because the PHDs are based on five measured properties (protein and mRNA abundance, ribosome density and occupancy, and transcript length), we expect the calculated PHDs to carry a large uncertainty. If, for instance, a gene was expressed differently during the mRNA and the protein abundance measurements, the PHD derived from these values could substantially over- or underestimate the true value.2 However, qualitative agreement between in vivo half-lives and PHDs should be achievable for many proteins. Based on protein abundance data from Refs. 2 and 7 and on ribosome densities and occupancies from Refs. 1 and 21, we can estimate the uncertainty of the PHDs for the 1,554 proteins contained in all four datasets. Here we define the PHD uncertainty range as the ratio of the maximum to the minimum PHD obtained from all possible parameter combinations of the four datasets. PHDs vary by less than a factor of 2 for 186 proteins and by more than a factor of 10 for 453 proteins. Thus, about 30% of the PHDs are uncertain by more than one order of magnitude.
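The uncertainty range can be sketched as the max/min ratio of the PHD over all combinations of alternative measurements. This sketch assumes that ribosome density and occupancy from the same reference are used together as a pair (giving 2 × 2 = 4 combinations per protein) and that the mRNA reference value is fixed; the function names and example values are hypothetical. The final line reproduces the arithmetic behind "about 30%".

```python
from itertools import product
from typing import Sequence, Tuple


def phd_fold_range(protein_by_ref: Sequence[float],
                   ribosome_by_ref: Sequence[Tuple[float, float]],
                   mrna: float) -> float:
    """Max/min ratio of PHD = protein / (mRNA * density * occupancy).

    protein_by_ref: alternative protein abundances (e.g. Refs. 2 and 7).
    ribosome_by_ref: alternative (density, occupancy) pairs (e.g. Refs. 1 and 21).
    """
    phds = [protein / (mrna * density * occupancy)
            for protein, (density, occupancy) in product(protein_by_ref, ribosome_by_ref)]
    return max(phds) / min(phds)


def summarize(fold_ranges: Sequence[float]) -> Tuple[int, int, float]:
    """Counts below 2-fold and above 10-fold, plus the fraction above 10-fold."""
    below_2 = sum(r < 2 for r in fold_ranges)
    above_10 = sum(r > 10 for r in fold_ranges)
    return below_2, above_10, above_10 / len(fold_ranges)


# Hypothetical protein: two abundance values and two ribosome datasets.
print(phd_fold_range([1000.0, 4000.0], [(0.5, 0.8), (1.0, 0.8)], mrna=2.0))
print(f"{453 / 1554:.1%}")  # 29.2%
```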
Table III lists calculated PHDs along with measured protein half-lives taken from the literature. Large PHDs often coincide with long half-lives. As a rough rule, proteins with half-lives in the range of a few minutes up to an hour usually have PHDs below 3. Relative differences between the turnover rates of related proteins are often well reflected by the PHD (P1p versus P2p, Hmg1p versus Hmg2p, Rad51p versus Rad52p). Fig. 1, c and d, shows median PHDs for different compartments and functional modules. High PRRs are unlikely to coincide with low PHDs (bottom right quadrant): a low PHD means that the respective protein has a low stability, which renders high PRRs unlikely. The module "protein synthesis" has a median PHD significantly below the cell average and at the same time a very good protein-mRNA correlation (Fig. 2b), suggesting that both the transcription and the turnover of ribosomal proteins are strongly regulated (3, 8, 9). Our findings confirm previous suggestions that ribosomal protein amounts are regulated via degradation of excess proteins (6, 27–29).
DISCUSSION
We compared protein abundances from Ref. 2 with previous studies, which used different techniques and were performed on a smaller scale (7, 22). Although there is no obvious bias in the data (e.g. when the two datasets are plotted against each other), the measurements sometimes deviate by orders of magnitude (Fig. S2, supplemental material). A comparison of ribosomal proteins reveals the uncertainty inherent in the available protein abundance data. In our reference dataset, which is the average of the data from Refs. 2 and 7, the concentrations of ribosomal proteins range from 3,000 to 300,000 molecules per cell. When the protein datasets from Refs. 2 and 7 are considered separately, the ranges are 450–600,000 and 500–100,000, respectively. Whichever dataset is chosen, all ranges deviate significantly from the 1:1 stoichiometry assumed for ribosomal subunits.3 This is certainly of concern when looking at individual proteins. However, average results for compartments or modules are more robust against unbiased noise. A comparison of the module- and compartment-specific correlations (Fig. S3, Table S1, supplemental material) shows that the differences between the protein groups are largely independent of the dataset chosen. For example, the finding that the median PHD of ribosomal proteins is comparably low is independent of the range of the values and does not depend on the protein abundance dataset used. Even if we use only the protein abundances from Ref. 7, which are gel-based measurements, we find a median PHD significantly below the cell average (cell-wide median PHD, 6.4; median PHD for module "protein synthesis," 1.7). Although the current protein abundance data hardly allow quantitative predictions, future improvements in these data will yield substantially more detailed insights and even permit quantitative conclusions.
A growing number of studies use ribosome density as a measure of translational efficiency (1, 20, 21). It is therefore important to clarify whether ORF length or transcript length is more relevant for translational efficiency. This study provides statistical evidence that ribosome densities based on transcript length are the better descriptor of translational efficiency, possibly because transcript length correlates with the presence of regulatory motifs in UTRs (11, 30). A longer UTR has a higher probability of containing such a regulatory element (3), which might explain the weak but statistically significant negative correlation between protein abundance and UTR length (Table I).
The correlation of protein with mRNA levels in individual compartments is often weaker than the cell-wide correlation. This finding supports the assumption that there is no general correlation between mRNA and protein abundance, and it is consistent with previous studies analyzing specific biochemical pathways (9, 21). However, a more pronounced correlation between the two properties can be observed for certain functional modules (Fig. 2b). This is most likely an effect of co-expression and common translational regulation of these genes (8, 20, 21).
The fact that protein abundance is in some cases better correlated with translational activity than with mRNA copy numbers alone supports the hypothesis that regulation at the translational level can at least partly be described by ribosome occupancy and ribosome density (20, 21). However, this assumption may fail in some cases: ribosomes could bind to mRNA without actively translating the message (e.g. HAC1 (25)). As long as the number of ribosomes binding to such transcripts is low, conclusions are not significantly affected. In the case of HAC1, the ribosome density is comparably low (0.26 per 100 nt), and even under normal conditions a small amount of Hac1p is detected (Table II; this is most likely the unspliced, noninduced form Hac1pu (25)). Thus, in this example, a low ribosome density corresponds to a low protein abundance, in agreement with our assumptions. Future work should investigate in more detail the blocking of translation during the elongation step. If this mechanism of translational regulation turns out to be relevant for a significant number of proteins, this could also have severe implications for previous studies that rest on the same assumptions as this work (20, 21). As long as only a small number of proteins is regulated in this way, our general conclusions based on the analysis of protein groups would not change.
In general, protein abundance is only slightly better explained by translational activity than by mRNA abundance. The remaining scatter thus underlines the importance of other post-transcriptional control mechanisms. The large variability of the PHDs documents the importance of turnover for the regulation of protein levels (13, 26). The PHD values calculated for all available proteins vary over 5 orders of magnitude. Even if we assume that, for instance, two orders of magnitude were due to noisy data, a variability of 3 orders of magnitude would remain. If the large scatter is not completely random, protein turnover may be as important for protein abundance regulation as translational control, at least during vegetative growth.
Exploring the available data allows us to identify compartments and modules that are subject to translation on demand (e.g. "signal transduction") or that are regulated via protein turnover (such as "cell wall"). Combinations can also be observed, for example for ribosomal proteins, for which the data suggest joint transcriptional and post-transcriptional regulation. In agreement with previous findings (8, 9, 21), this study implies a significant primary response to environmental changes at the translational level, which would remain undetected if only mRNA levels were analyzed. In the case of the module "protein activity regulation," the relevance of translational control is particularly apparent, which should prompt more detailed analyses of the expression regulation of this group of proteins.
Finally, we have shown that a PHD can be calculated that relates the steady-state protein level to the synthesis rate (18). We had to combine measurements from different laboratories, whose growth conditions might not always be identical. To improve the precision of the PHDs, independent experimental verification of protein levels is particularly important. Recently, other means of measuring protein turnover rates on a larger scale have been suggested (14, 18, 31), but so far no half-life dataset of the size presented here has been published.
Our analysis helps to identify compartments where microarray experiments might be sufficient to predict protein level regulation as opposed to those where post-transcriptional regulation has to be taken into account. Future work should more precisely identify conditions under which a good correlation between gene transcription and protein abundance can be expected. Having this information available is crucial for correctly interpreting gene expression data, such as those obtained from microarray experiments.
ACKNOWLEDGMENTS
FOOTNOTES
1 The abbreviations used are: MSS, microarray-based standard signal; PHD, protein half-life descriptor; PRR, protein-to-mRNA abundance ratio; SAGE, serial analysis of gene expression.
2 In addition, the affinity tagging employed in Ref. 2 could potentially alter the in vivo half-lives of some proteins. According to Ref. 2, the large majority of proteins that are known to be short-lived is rapidly degraded even after tagging, "indicating that the tag is not inhibiting their proteolysis" (quoted from the supplemental material for Ref. 2). However, such an analysis does not exclude the possibility that the half-lives of stable proteins are shortened or even lengthened.
3 Apart from experimental errors, this deviation may also be due to additional functions of the proteins. For instance, RPL40 also encodes a ubiquitin protein. Similarly, other ribosomal proteins may have additional functions, which would partly explain the scatter of protein abundances.
* This work has been funded by the Federal Ministry of Education and Research, Germany. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this manuscript (available at http://www.mcponline.org) contains supplemental material.
Published, MCP Papers in Press, August 23, 2004, DOI 10.1074/mcp.M400099-MCP200
To whom correspondence should be addressed: Institute of Molecular Biotechnology, Beutenbergstr. 11, D-07745 Jena, Germany. Tel.: 49-3641-65-6331; Fax: 49-3641-65-6191; E-mail: beyer{at}imb-jena.de
REFERENCES