(Received for publication, October 10, 1995; and in revised form, December 28, 1995)
This work describes a rapid and sensitive technique for the
identification of Saccharomyces cerevisiae proteins on
two-dimensional gels based on the determination of their amino acid
ratios. Specific double labeling with ³H- and ¹⁴C- or ³⁵S-labeled
amino acids, chosen among those that are
specifically incorporated into proteins without interconversion,
allowed an accurate measurement of different amino acid ratios for 200
proteins. A computer program was developed to screen a yeast data base
containing 1700 protein sequences and to identify proteins matching the
measured Mᵣ, pI, and amino acid ratios. The method,
tested with 45 reference proteins, allowed 79 new identifications
corresponding to abundant proteins belonging to a few functional
families. Some protein spots correspond to homologs of mammalian
proteins or to uncharacterized open reading frames. Remarkably, among
identified proteins of similar abundance, the organellar proteins have
a markedly lower codon usage bias than the cytosolic ones. The double
labeling technique is particularly suited to the analysis, on a single
two-dimensional gel, of the influence of physiological or genetic
changes on yeast protein content.
Saccharomyces cerevisiae is a model organism to investigate the biology of the eukaryotic cell. The yeast genome is compact (13.5 megabases) and contains only 6000-7000 genes. More than 1000 of them have been characterized by traditional techniques. The systematic genome sequencing initiated by the European Community is progressing rapidly and should soon be completed. The most striking observation derived from sequencing data is that nearly 40% of putative proteins deduced from gene sequence are novel, without homology to previously known proteins, and thus without predicted functions. Moreover, the majority of these novel genes are non-essential. In this context, the functional analysis of the yeast genome is a real challenge for the coming years.
Two-dimensional gel electrophoresis (1) is a powerful tool for functional studies of the yeast genome. More than 1500 soluble yeast proteins are detectable and well separated on two-dimensional gels. This technique offers the opportunity to detect alterations in protein synthesis, protein modification, and protein degradation occurring in response to environmental or genetic changes. However, the two-dimensional gel approach suffers from the low number of proteins identified on the yeast protein map. Up to now, fewer than 80 spots have been unambiguously characterized (2, 3, 4, 5).
Yeast proteins separated on two-dimensional gels can be identified by
different techniques: coelectrophoresis with a pure protein, detection
with specific antibodies, microsequencing, or overexpression from genes
on high-copy plasmids (4, 5). Identifying large numbers of protein spots
by any of these methods is a daunting task. Other methods have been
proposed, based on the measurement of pI, Mᵣ, and a third structural
parameter, which allow the corresponding protein to be searched for in
data bases. Partial amino acid composition (6, 7, 8, 9, 10, 11) and
accurate Mᵣ measurement of peptide cleavage products by
matrix-assisted laser desorption/ionization mass spectroscopy (12, 13)
have been used as the third parameter. Although these methods remain
predictive, they are sensitive and useful as an integrated approach to
protein identification. This paper describes the simple and rapid amino
acid analysis of 200 yeast proteins by the selective incorporation of
several pairs of ³H- and ¹⁴C-labeled amino acids. Seventy-nine new
protein spots were identified based on their Mᵣ, pI, and amino acid
ratios, which doubles the number of yeast proteins presently identified
on two-dimensional gels. The identified proteins belong to a few
functional families. We noted that cytosolic and organellar proteins of
similar abundance have a markedly different codon bias.
For double labeling, 0.75-1 mCi of L-[4,5-³H]leucine was added with one of
the following L-U-¹⁴C-labeled amino acids: Ile, Tyr, Phe (200 µCi), Arg,
His, Lys (400 µCi), or with a ³⁵S-labeled amino acid, Cys or Met (400 µCi).
To avoid conversion of methionine to cysteine, 1 mM unlabeled cysteine
was added to the culture medium. In one case, leucine was supplied as the
¹⁴C-labeled amino acid (60 µCi) with L-[5-³H]tryptophan (1 mM).
³⁵S- and ³H-labeled amino acids were purchased from Amersham Corp.
U-¹⁴C-labeled amino acids were produced and purified in our laboratory.
All isotopes were of the highest available specific activity.
After labeling, cells were collected by
centrifugation, washed with 200 µl of H₂O, and frozen as
a pellet at -180 °C. To avoid problems of radiolysis, the
cell pellet should be analyzed within 1 month.
First dimension gels were reequilibrated 1 min before loading on the second dimension slab gel. The slab gels were 23 cm long, 21 cm high, and 1 mm thick and contained 12.5% acrylamide (A/B: 37.5 (w/v)) and 0.38 M Tris-HCl, pH 8.8. Electrophoresis was performed at 13-15 °C at 17 watts/gel for 5 h 30 min. Electrophoresis buffer was 25 mM Tris base, 192 mM glycine, and 0.2% SDS.
After electrophoresis, gels were stained with Coomassie Brilliant Blue R-250, dried, and processed for autoradiography by standard procedures.
¹⁴C/³H ratios were calculated by the double reinjection method using the
internal standard method (16). The spill of ¹⁴C or ³⁵S into the ³H window
was 11-13%, and the spill of ³H into the ¹⁴C window was 1-2%. The average
¹⁴C/³H ratio ranged from 0.12 to 0.5, depending on the double labeling
experiment. Each sample was counted for 15 min. When the count was below
1000 cpm in ³H or below 400 cpm in ¹⁴C, the measurement was discarded.
This detection limit corresponds to about 0.1 µg of protein.
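The spill-over figures above can be turned into a simple correction. The following is a minimal sketch, assuming a linear two-window crosstalk model; the function names, the mid-range spill values (12% and 1.5%), and the choice to apply the counting limits to the raw window counts are illustrative assumptions, not the procedure of reference 16.

```python
def unmix_counts(h_window_cpm, c_window_cpm, c_into_h=0.12, h_into_c=0.015):
    """Solve the 2x2 spill-over system for the 3H and 14C contributions.

    Assumed linear model (an illustration, not taken from the paper):
        h_window = H + c_into_h * C
        c_window = C + h_into_c * H
    """
    det = 1.0 - c_into_h * h_into_c
    h = (h_window_cpm - c_into_h * c_window_cpm) / det
    c = (c_window_cpm - h_into_c * h_window_cpm) / det
    return h, c


def corrected_isotope_ratio(h_window_cpm, c_window_cpm, min_h=1000.0, min_c=400.0):
    """Return the corrected 14C/3H ratio, or None when a window is below its limit.

    The limits (1000 cpm for 3H, 400 cpm for 14C) are those quoted in the text;
    applying them to the raw window counts is an assumption.
    """
    if h_window_cpm < min_h or c_window_cpm < min_c:
        return None
    h, c = unmix_counts(h_window_cpm, c_window_cpm)
    return c / h
```

With these assumed spill values, a spot counted at 2060 cpm in the ³H window and 530 cpm in the ¹⁴C window corresponds to 2000 cpm of ³H and 500 cpm of ¹⁴C, a ratio of 0.25, which lies within the reported 0.12-0.5 range.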
The His/Leu and Cys/Leu ratios of each protein were determined from duplicate experiments. The other double labelings were not duplicated.
The six proteins having the smallest distance were selected. The best candidate corresponded to the lowest distance (d1) and the five next candidates corresponded to the distances d2 to d6.
A limit distance (dL) was chosen as the sum of standard deviations of amino acid ratios. For example, for nine amino acid ratios, dL = 0.06 (Ile/Leu) + 0.08 (Phe/Leu) + 0.14 (His/Leu) + 0.15 (Arg/Leu) + 0.15 (Lys/Leu) + 0.17 (Tyr/Leu) + 0.19 (Trp/Leu) + 0.20 (Met/Leu) + 0.21 (Cys/Leu) = 1.35.
If d1 < dL, the best candidate was considered as identified. If d1 > dL, we considered that the data were not accurate enough or that the protein was not in the data base.
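The selection rule can be sketched as follows. The standard deviations and the rule "identified if d1 < dL" come from the text; the distance metric itself is an assumption (a sum of relative deviations, bounded so that a theoretical ratio of zero does not cause a division by zero), and the protein names and ratios in the usage example are hypothetical. The preliminary screen on Mᵣ and pI is omitted.

```python
# Standard deviations of the nine amino acid ratios, as listed in the text.
RATIO_SD = {
    "Ile/Leu": 0.06, "Phe/Leu": 0.08, "His/Leu": 0.14,
    "Arg/Leu": 0.15, "Lys/Leu": 0.15, "Tyr/Leu": 0.17,
    "Trp/Leu": 0.19, "Met/Leu": 0.20, "Cys/Leu": 0.21,
}


def limit_distance(measured):
    """dL: sum of the standard deviations of the ratios that were measured."""
    return sum(RATIO_SD[name] for name in measured)


def distance(measured, theoretical):
    """ASSUMED metric: sum of bounded relative deviations over measured ratios."""
    total = 0.0
    for name, value in measured.items():
        expected = theoretical[name]
        largest = max(value, expected)
        if largest > 0:
            total += abs(value - expected) / largest
    return total


def rank_candidates(measured, database, n_best=6):
    """Return (identified name or None, the n_best (distance, name) pairs, dL)."""
    scored = sorted(
        (distance(measured, ratios), name) for name, ratios in database.items()
    )
    best = scored[:n_best]
    d_limit = limit_distance(measured)
    identified = best[0][1] if best and best[0][0] < d_limit else None
    return identified, best, d_limit


# Hypothetical data, for illustration only.
database = {
    "PROT_A": {"Ile/Leu": 0.72, "His/Leu": 0.24},
    "PROT_B": {"Ile/Leu": 0.30, "His/Leu": 0.60},
}
identified, best, d_limit = rank_candidates(
    {"Ile/Leu": 0.70, "His/Leu": 0.25}, database
)
```

Summing all nine entries of `RATIO_SD` reproduces the dL of 1.35 given in the example above; with only two measured ratios, as in the usage lines, the cutoff shrinks to 0.20.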
Figure 1:
HPLC profiles of amino acid conversion tests. Cells were labeled for 30 min
with [¹⁴C]Leu, [¹⁴C]Ala, [¹⁴C]Thr, or [¹⁴C]Val. The proteins were
hydrolyzed, and the resulting amino acids were applied to an HPLC column
(see "Materials and Methods").
We have no direct information about the interconversion of
Asn, Gln, Cys, Met, and Trp, due to the degradation of these amino
acids during acid hydrolysis. However, protein labeling with [³H]Trp or
[³⁵S]Cys indicated little interconversion of these amino acids, as
inferred from the very low labeling intensity of reference proteins
known to be devoid of Trp (SOD1, HSP60, TIF1) or Cys (SSC1, PGI1). In
contrast, labeling with [³⁵S]Met revealed a notable conversion of Met to
Cys, since the only reference protein devoid of Met (TPI1) was labeled.
However, this interconversion disappeared when 1 mM Cys was
added to the culture medium.
Altogether, these results indicate that
14 amino acids can be used for in vivo labeling of proteins
under conditions yielding only a minimum amount of interconversion
(less than 15%). Amino acids chosen for protein labeling must also be
discriminant: less frequent amino acids have variable distribution
among proteins and are likely to give the most useful information. The
amino acids must also label proteins with high efficiency. For example,
the use of [¹⁴C]Pro was avoided because the proline transport system is
strongly repressed under our culture conditions (18). Similarly,
labelings with Ala, Val, or Thr were not performed because they required
the addition of other amino acids in excess, which limits the transport
of the marker and the labeling efficiency. Taking into account these
additional criteria, 10
amino acids were chosen for the present study: Trp, Cys, Met, His, Arg,
Tyr, Phe, Lys, Ile, and Leu.
Since no major difference in interconversions was observed between short or long term labeling (Table 1), we chose a labeling period of 5 h to obtain high levels of incorporation.
In practice, after double labeling, total yeast proteins were extracted
and separated on two-dimensional gels. The 200 most abundant proteins
were numbered, and individual spots were excised and counted for ³H and
¹⁴C or ³⁵S radioactivity under carefully controlled conditions. Among
these proteins, the identity and sequence of 45 protein spots had already
been unambiguously established by microsequencing or overexpression
methods (Table 2). These
reference proteins were used as internal standards. For each double
labeling, the experimental isotope ratios of the reference proteins
were plotted against their known amino acid ratios (Fig. 2). In
each case, the experimental values were in good agreement with the
theoretical amino acid ratios.
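The theoretical amino acid ratios used for this comparison follow directly from a protein sequence. A minimal sketch, assuming one-letter sequences and ratios expressed relative to leucine as in the text; the function name and the example sequence are hypothetical, and possible removal of the initiator methionine or other processing is ignored.

```python
from collections import Counter

# Labeled amino acids measured against leucine, with their one-letter codes.
LABELED = {
    "Ile": "I", "Phe": "F", "His": "H", "Arg": "R", "Lys": "K",
    "Tyr": "Y", "Trp": "W", "Met": "M", "Cys": "C",
}


def theoretical_ratios(sequence):
    """Return {"Xxx/Leu": ratio} computed from residue counts in the sequence."""
    counts = Counter(sequence.upper())
    leu = counts["L"]
    if leu == 0:
        raise ValueError("sequence contains no leucine; ratios are undefined")
    return {f"{name}/Leu": counts[code] / leu for name, code in LABELED.items()}


# Hypothetical peptide, for illustration only.
ratios = theoretical_ratios("MLLKHIL")
```

A protein devoid of a given amino acid, such as the Trp-free reference proteins mentioned earlier, simply yields a ratio of zero for that amino acid.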
Figure 2:
Measured ¹⁴C/³H or ³⁵S/³H ratio as a function of theoretical
amino acid ratios for reference proteins. The lines were drawn to
minimize the average error. The standard deviations were 6% for
Ile/Leu, 15% for His/Leu, 15% for Lys/Leu, 19% for Trp/Leu, 45% for the
first Met/Leu labeling, and 21% for the second Met/Leu experiment
supplemented with 1 mM Cys.
Two Met/Leu labelings were carried out with or without unlabeled Cys in the culture medium (Fig. 2). With added Cys, the average error decreased from 30 to 18% and the standard deviation from 45 to 21%. Therefore, the error observed was largely due to the interconversion of labeled Met to labeled Cys.
The curves minimizing the average error of these 45 reference proteins for each amino acid ratio are shown in Fig. 2. These calibration curves were used to determine the amino acid ratios for the other 155 proteins analyzed.
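The calibration step can be illustrated with a straight-line fit. This is a sketch under assumptions: an ordinary least-squares line is used here, whereas the text only states that the lines minimize the average error, and the numerical values in the usage lines are invented for illustration.

```python
def fit_line(aa_ratios, isotope_ratios):
    """Least-squares line: isotope_ratio = slope * aa_ratio + intercept."""
    n = len(aa_ratios)
    mean_x = sum(aa_ratios) / n
    mean_y = sum(isotope_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in aa_ratios)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(aa_ratios, isotope_ratios)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amino_acid_ratio(measured_isotope_ratio, slope, intercept):
    """Invert the calibration line to estimate an unknown protein's ratio."""
    return (measured_isotope_ratio - intercept) / slope


# Hypothetical reference proteins: theoretical ratio vs. measured isotope ratio.
slope, intercept = fit_line([0.2, 0.5, 0.8, 1.1], [0.10, 0.25, 0.40, 0.55])
estimate = amino_acid_ratio(0.30, slope, intercept)
```

In this invented data set the isotope ratio is exactly half the amino acid ratio, so a measured isotope ratio of 0.30 maps back to an amino acid ratio of 0.60.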
Using the experimental pI, Mᵣ, and amino acid ratios of the 45 reference
proteins, we tested the reliability of the search in the data base. The
results are reported in Table 2. The correct protein or a closely
homologous one (more than 90% identity) obtained the best score in 44
of 45 cases. The correct identification (d1 < dL)
occurred in 37 cases. In three cases, the identified protein was a very
homologous one. In four cases, no protein was identified (d1
> dL), but the correct protein ranked first each time. In
only one case, an incorrect protein was identified (PDC1 in the place
of ZWF1 which came third with a score of 1.37). An identification was
proposed in 90% of the searches (41 of 45 cases).
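As a quick consistency check, the outcome counts quoted above can be tallied. The dictionary keys below are descriptive labels chosen for this sketch; the counts are those given in the text.

```python
# Outcome of the data base search for the 45 reference proteins (from the text).
outcomes = {
    "correct protein identified": 37,
    "close homolog identified": 3,
    "correct protein ranked first, no identification": 4,
    "incorrect protein identified": 1,
}

total = sum(outcomes.values())
proposed = total - outcomes["correct protein ranked first, no identification"]
best_score_ok = total - outcomes["incorrect protein identified"]

print(f"{proposed}/{total} searches gave an identification ({proposed / total:.0%})")
print(f"{best_score_ok}/{total} searches ranked the right protein or a homolog first")
```

The four categories add up to the 45 reference proteins; 41 of 45 is about 91%, which the text rounds to 90%.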
It was interesting that the second, third, and next best scores, d2, d3, etc., were generally similar and markedly higher than d1. Moreover d2 was higher than dL in 40 of 45 cases, indicating that the dL value was well chosen. d2 represents the statistical best score when the right protein is absent from the data base. Consequently, in almost all cases, if the sequence of the correct protein or a very homologous one was not in the data base, no protein would be identified.
In numerous cases, the search distinguished correctly between proteins of the same family. Correct identifications were found even between proteins having more than 90% sequence identity like PDC1/PDC5, ENO1/ENO2, ADH1/ADH2, TDH2/TDH3, TIF51A/ANB1, SSA1/SSA2, SSB1/SSB2. Only three errors were observed: HSP82 was identified in the place of the very homologous protein HSC82 and SSB2 in the place of the two isoforms of SSC1. In these three cases, the correct protein ranked second with a very close score.
Figure 3:
Yeast protein two-dimensional map with
names of identified proteins (listed in Table 2 and Table 3). Strain S288C was grown on glucose as carbon source and
labeled with [³⁵S]Met at mid log phase. 4
10
cpm of protein extract were loaded on the first
dimensional gel. The gel was exposed 2 days for autoradiography.
Reference proteins are in bold characters and newly identified proteins
are in standard characters.
Some of the proposed identifications were confirmed by independent experiments. As expected, the spots identified as KAR2 and UBC4 (heat shock proteins) were induced by heat shock. The spots proposed as PDB1, ILV2, ILV5, MDH1, SOD2, ATP2, and COR1, which are known to be mitochondrial proteins, were enriched in mitochondrial extracts (data not shown). The intensity of the LEU1, LYS9, MET25, and CYS3 spots decreased when, respectively, leucine, lysine, and cysteine were added to the culture medium (data not shown). These observations underscore the reliability of the predictions.
The majority of the identified proteins of known localization were cytosolic (57 of 82), but we also found 10 mitochondrial proteins (ATP2, COR1, HSP60, ILV2, ILV5, MDH1, MRP8, PDB1, SOD2, SSC1), 6 nuclear proteins (EGD1, GSP1, HSP104, PRS4, PRS5, UBA1), 4 vacuolar proteins (APE1, PEP4, VMA1, VMA2), 4 proteins of the endoplasmic reticulum (CDC48, KAR2, SAR1, TRG1), and 1 vesicular protein (CLC1).
The 104 identified proteins are among the most abundant proteins of yeast. The large majority of these proteins (93) had a codon bias index (CBI) higher than 0.35, and the correlation observed by Bennetzen and Hall (17) between CBI and protein abundance was largely confirmed (Fig. 4). However, we identified 11 abundant proteins whose gene CBI was lower than 0.25 (APE1, CLC1, MRP8, PRS4, PRS5, RBK1, UBA1, ZWF1, L80003.20, YEL071W, and YKL117W). We noticed that among them, the six proteins of known localization were totally or partially localized in a subcellular compartment. Conversely, considering proteins of similar abundance (Fig. 4), the average CBI of cytosolic proteins was 0.68, whereas the average CBI of the 25 identified proteins known to be localized in organelles was significantly lower (0.43).
Figure 4:
CBI of identified cytosolic and organellar proteins as a function of
polypeptide abundance. CBI values were calculated by the method of
Bennetzen and Hall (17). The individual protein synthesis rates were
measured as the average of three independent [¹⁴C]Leu labelings divided
by the theoretical leucine proportion of each individual protein. A
similar diagram was obtained with CAI (codon adaptation index) instead of
CBI. The cytosolic and organellar proteins localized in the gray area
were used to measure the average CBI of each protein class.
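The abundance estimate and the class averages described in this legend can be sketched as follows. The function names, the abundance window bounds, and all numerical values in the usage lines are hypothetical; "leucine proportion" is taken here to mean the fraction of residues that are leucine, which is an assumption.

```python
from statistics import mean


def relative_synthesis_rate(leu_counts, leu_fraction):
    """Mean of replicate [14C]Leu counts divided by the leucine fraction."""
    if not 0.0 < leu_fraction <= 1.0:
        raise ValueError("leu_fraction must be in (0, 1]")
    return mean(leu_counts) / leu_fraction


def average_cbi_by_class(proteins, low, high):
    """Average CBI per localization class for proteins whose rate is in [low, high].

    proteins: iterable of (localization, synthesis_rate, cbi) tuples.
    """
    by_class = {}
    for localization, rate, cbi in proteins:
        if low <= rate <= high:
            by_class.setdefault(localization, []).append(cbi)
    return {loc: mean(values) for loc, values in by_class.items()}


# Hypothetical values, for illustration only.
rate = relative_synthesis_rate([900, 1000, 1100], 0.10)
averages = average_cbi_by_class(
    [
        ("cytosolic", 5000, 0.70),
        ("cytosolic", 6000, 0.60),
        ("organellar", 5500, 0.40),
        ("organellar", 50, 0.10),  # outside the abundance window, ignored
    ],
    low=1000,
    high=10000,
)
```

Restricting the comparison to a common abundance window, as the gray area does in the figure, keeps the known correlation between CBI and abundance from masking the difference between the two classes.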
In this work, we have identified 120 spots on two-dimensional
gels of yeast proteins, including 79 new identifications and 13
proteins homologous to mammalian proteins or corresponding to open
reading frames with unknown functions. The identification method relies
on screening a yeast protein data base for the proteins matching the
experimental Mᵣ, pI, and amino acid ratios
of the analyzed spots. The method, tested with 45 reference proteins,
allowed 41 precise identifications with only one incorrect prediction.
The high reliability of this identification method depended on accurate amino acid analysis. The prerequisite to the double labeling technique used here was the evaluation of the extent of amino acid interconversions in yeast. This essential information was not available in the literature. We showed that Leu, Ile, Phe, Tyr, Pro, Lys, and His were not metabolized before incorporation into protein and could be used in double labeling experiments. The 7 other amino acids tested (Gly, Val, Ala, Glu, Asp, Thr, and Ser) were converted in high proportions. The amino acid products observed were always in accordance with the amino acid biosynthetic pathways known in yeast (19). Interestingly, we noticed that amino acids that are not metabolized have the smallest cytosolic pools (20). Our interpretation is that amino acids with low pools are rapidly incorporated into proteins, so that their residence time is too short for them to be significantly converted or catabolized. We also observed that these amino acids have the highest energetic costs for their de novo synthesis. From an economic point of view, it seems logical that cells avoid wasting metabolites obtained at a high energetic expense.
The knowledge of amino acid interconversion in yeast allowed us to
develop an accurate technique of amino acid analysis. Garrels et al. (4)
also used a labeling approach to analyze the amino acid composition of
yeast proteins. Their method differed from our procedure in three
essential points: (a) the labelings were performed with different amino
acids, including Ser, Thr, and Val; (b) the major part of the analysis
was based on single labeling experiments requiring comparison of
two-dimensional gels; (c) in the other part of the work, double labeling
experiments were done with ¹⁴C- and ³⁵S-labeled amino acids and relied on
the decay of the ³⁵S radioisotope: after two-dimensional gel
electrophoresis, the contributions of the two radioisotopes were deduced
by comparison of several exposures spanning 4 months and quantified by
phosphor technology. By this approach, Garrels et al. identified 33 new
polypeptide spots, 14 of which were also analyzed in our work. While 10
spots were similarly identified, 4 predictions were conflicting: we
predicted VMA1, ILV2, TRP5, and an isoform of TKL1 in the place of NCPR1,
MLS1, RFA1, and APS, respectively. The procedure described by Garrels et
al. (4) probably introduced errors for at least two reasons: first, Ser,
Thr, and Val are highly interconverted amino acids (see Fig. 1 and
Table 1); second, the single labeling method is of limited accuracy due to
variations in cell culture, protein extraction, two-dimensional gel
electrophoresis, and quantification steps. These points could explain the
discrepancies observed between some of our identifications and the
results reported by Garrels et al. (4).
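To see why a decay-based separation of ³⁵S from ¹⁴C needs exposures spread over months, the remaining activity can be computed from the half-lives. This is an illustrative sketch: the half-life values (about 87.4 days for ³⁵S and about 5730 years for ¹⁴C) are commonly quoted figures, not data from this work, and 4 months is approximated as 120 days.

```python
S35_HALF_LIFE_DAYS = 87.4            # commonly quoted value (assumption)
C14_HALF_LIFE_DAYS = 5730 * 365.25   # about 5730 years (assumption)


def remaining_fraction(days, half_life_days):
    """Fraction of the initial activity left after the given number of days."""
    return 0.5 ** (days / half_life_days)


s35_left = remaining_fraction(120, S35_HALF_LIFE_DAYS)
c14_left = remaining_fraction(120, C14_HALF_LIFE_DAYS)
print(f"35S remaining after 120 days: {s35_left:.2f}")
print(f"14C remaining after 120 days: {c14_left:.5f}")
```

Under these assumptions roughly 39% of the ³⁵S signal remains after 120 days while the ¹⁴C signal is essentially unchanged, so the two contributions can only be separated by comparing exposures taken far apart in time. Double counting of ³H and ¹⁴C in separate energy windows avoids that delay.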
A non-radioactive method of amino acid analysis was developed by Eckerskorn et al. (21), Jungblut et al. (9), Shaw (10), and Hobohm et al. (11). After transfer of the protein spots to a blotting membrane, the proteins were extracted and hydrolyzed, and amino acid compositions were determined by conventional techniques. This method is reliable but cumbersome and needs large quantities of protein (usually 1 µg). The radioactive technique developed in this work is at least 10-fold more sensitive. Numerous spots invisible by Coomassie Blue staining were successfully analyzed by double labeling.
The method of protein identification based on amino acid
composition has two general limitations. The first limitation comes
from the fact that not all the protein spots are accessible to
identification. Hence, from a total of 200 spots, the program was
unable to make a prediction in 80 cases. These negative results can be
obtained for different reasons: (a) overlapping of protein
spots (if 2 or more proteins migrate at the same place on
two-dimensional gels); (b) unknown post-translational
modifications (phosphorylation, N-acetylation, glycosylation,
N- or C-terminal cleavage) or aberrant migrations which change the
apparent pI, Mᵣ, or even the amino acid ratios in
case of proteolytic cleavage; (c) absence of the corresponding
protein in the data base. Presently, our data base contains only 1700
protein sequences of CBI > 0.1. In all likelihood, by the end of the
year 1996, the complete sequence of the yeast genome will be available.
In the coming year, our data base will probably contain more than 2500
protein sequences of CBI > 0.1 and new identifications will be
possible. With a complete data base, it will be possible to simplify
the search: the best candidate could be systematically selected without
considering a cutoff limit. In that way, as seen with the reference
protein test, all the spots will be identified with minimum errors (1
error in 45 predictions).
The second limitation of the method lies in its predictive nature: it does not provide definitive identifications but only highly probable predictions. Our method must therefore be considered as a global, quick, and relatively low cost step in protein characterization. If one of the predicted proteins is of particular interest, it is possible to verify its identity unambiguously by microsequencing. It is also possible to confirm all these predictive identifications by another predictive technique such as laser desorption or electrospray mass spectroscopy (12). If two highly predictive and independent methods give the same identification, the result may become as reliable as genetic or sequencing methods.
Two-dimensional gel electrophoresis gives a global view of the abundant proteins of yeast. The identification of a large number of such proteins now raises the possibility of making some general statements on yeast proteins. Among the proteins of known localization, we found 57 cytosolic proteins and 25 organellar proteins. Remarkably, we noted that the proteins located in a subcellular compartment have a significantly lower CBI than cytosolic proteins of similar abundance. One possible explanation, consistent with the endosymbiotic theory, is that the organelles contain a high proportion of proteins encoded by genes of foreign origin with different codon usage. Another interesting possibility is that a lower codon bias reduces protein elongation rate and contributes to the generation of a time delay necessary to target and begin the translocation of the nascent protein into the organelle. Consistent with this hypothesis, the signal recognition particle, which binds the leader sequence of endoplasmic reticulum-localized proteins and targets the protein into the organelle, also functions to slow down or stop the elongation (for review, see (22)).
In conclusion, the amino acid analysis of polypeptide spots appears to be
an ideal first step for a global and rapid identification of proteins
on two-dimensional gels, especially in the case of systematically
sequenced genomes. This method, successfully tested in S. cerevisiae,
could easily be applied to other microorganisms whose genomes are under
extensive study, such as Escherichia coli, Bacillus subtilis,
Haemophilus influenzae, or Schizosaccharomyces pombe. This technique
can also be adapted to compare two cell populations in a single gel, for
example one labeled with [³H]Leu and the other with [¹⁴C]Leu. This
method, already used by Ludwig et al. (2) and Bataillé et al. (23),
circumvents the classical artifacts linked with the comparison of two
gels. Thus, the quantitative analysis of variations in protein
composition resulting from environmental or genetic changes will be
simplified. In a preliminary experiment testing this approach, we showed
that the presence of 500 µM Leu in minimal culture medium represses LEU1,
LEU2, ILV5, and GDH1, consistent with previous reports (19, 24), and
induces expression of several genes, including ARG1, CYS4, HIS4, ARO4,
YBR025C, and unidentified proteins.
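The single-gel comparison of two cell populations can be sketched as a per-spot isotope ratio normalized to the median spot. This is only an illustration under assumptions: the normalization to the median (which presumes that most proteins are unchanged between the two populations), the log2 scale, the function name, and the spot values are all hypothetical and are not taken from the cited studies.

```python
import math
from statistics import median


def relative_changes(spots):
    """Return log2 of each spot's 14C/3H ratio relative to the median ratio.

    spots: {name: (cpm_3H, cpm_14C)}, one population labeled with [3H]Leu
    and the other with [14C]Leu, mixed and run on the same gel.
    """
    ratios = {
        name: c14 / h3
        for name, (h3, c14) in spots.items()
        if h3 > 0 and c14 > 0
    }
    reference = median(ratios.values())
    return {name: math.log2(r / reference) for name, r in ratios.items()}


# Hypothetical counts, for illustration only.
changes = relative_changes({
    "SPOT_1": (1000.0, 500.0),
    "SPOT_2": (1000.0, 500.0),
    "SPOT_3": (1000.0, 125.0),
})
```

Here SPOT_3 has a value of -2, meaning a fourfold lower relative amount in the ¹⁴C-labeled population, while the other two spots are unchanged. Because both populations share one gel, differences in extraction, migration, and spot excision cancel out in the ratio.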