©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
Rapid Identification of Yeast Proteins on Two-dimensional Gels (*)

(Received for publication, October 10, 1995; and in revised form, December 28, 1995)

Isabelle Maillet, Gilles Lagniel, Michel Perrot (1), Helian Boucherie (1), Jean Labarre (§)

From the Service de Biochimie et Génétique Moléculaire, Bât 142, CEA-Saclay, F-91191 Gif-sur-Yvette Cedex; Institut de Biochimie et de Génétique Cellulaires, UPR CNRS 9026, 1 rue Camille Saint-Saëns, 33077 Bordeaux Cedex, France

ABSTRACT

This work describes a rapid and sensitive technique for the identification of Saccharomyces cerevisiae proteins on two-dimensional gels based on the determination of their amino acid ratios. Specific double labeling with ^3H- and ^14C- or ^35S-labeled amino acids, chosen among those that are specifically incorporated into proteins without interconversion, allowed an accurate measurement of different amino acid ratios for 200 proteins. A computer program was developed to screen a yeast data base containing 1700 protein sequences and to identify proteins matching the measured M(r), pI, and amino acid ratios. The method, tested with 45 reference proteins, allowed 79 new identifications corresponding to abundant proteins belonging to a few functional families. Some protein spots correspond to homologs of mammalian proteins or to uncharacterized open reading frames. Remarkably, among identified proteins of similar abundance, the organellar proteins have a markedly lower codon usage bias than the cytosolic ones. The double labeling technique is particularly suited to the analysis, on a single two-dimensional gel, of the influence of physiological or genetic changes on yeast protein content.


INTRODUCTION

Saccharomyces cerevisiae is a model organism for investigating the biology of the eukaryotic cell. The yeast genome is compact (13.5 megabases) and contains only 6000-7000 genes. More than 1000 of them have been characterized by traditional techniques. The systematic genome sequencing initiated by the European Community is progressing rapidly and should soon be completed. The most striking observation derived from the sequencing data is that nearly 40% of the putative proteins deduced from gene sequences are novel, with no homology to previously known proteins and thus no predicted function. Moreover, the majority of these novel genes are non-essential. In this context, the functional analysis of the yeast genome is a real challenge for the coming years.

Two-dimensional gel electrophoresis (1) is a powerful tool for functional studies of the yeast genome. More than 1500 soluble yeast proteins are detectable and well separated on two-dimensional gels. This technique makes it possible to detect alterations in protein synthesis, protein modification, and protein degradation occurring in response to environmental or genetic changes. However, the two-dimensional gel approach suffers from the small number of proteins identified on the yeast protein map. Up to now, fewer than 80 spots have been unambiguously characterized (2, 3, 4, 5).

Yeast proteins separated on two-dimensional gels can be identified by different techniques: coelectrophoresis with a pure protein, detection with specific antibodies, microsequencing, or overexpression from genes on high-copy plasmids (4, 5). Identifying large numbers of protein spots by any of these methods is a daunting task. Other methods have been proposed based on the measurement of pI, M(r), and a third structural parameter that allows the corresponding protein to be searched for in data bases. Partial amino acid composition (6, 7, 8, 9, 10, 11) and accurate M(r) measurement of peptide cleavage products by matrix-assisted laser desorption/ionization mass spectrometry (12, 13) have been used as the third parameter. Although these methods remain predictive, they are sensitive and useful as part of an integrated approach to protein identification. This paper describes the simple and rapid amino acid analysis of 200 yeast proteins by the selective incorporation of several pairs of ^3H- and ^14C-labeled amino acids. Seventy-nine new protein spots were identified based on their M(r), pI, and amino acid ratios, which doubles the number of yeast proteins presently identified on two-dimensional gels. The identified proteins belong to a few functional families. We also noted that cytosolic and organellar proteins of similar abundance have markedly different codon bias.


MATERIALS AND METHODS

Yeast Strain, Growth, and Labeling

This study was performed with strain S288C (MATα SUC2 mal mel gal2 CUP1; (14)), which is the reference strain for genome sequencing. Cells were grown and labeled at 30 °C in 2.5 ml of medium containing 0.67% yeast nitrogen base without amino acids (Difco) and 2% glucose, buffered at pH 5.8 with 1% succinate and 0.6% NaOH. A mid-log preculture was diluted to about 0.3 A/ml and labeled for 4-5 h to reach 0.7-0.8 A/ml. For remetabolization studies, cells were labeled with 200 µCi of one of the following 15 ^14C-labeled amino acids: Gly (final concentration 1 mM), Ala and Ser (0.53 mM), Asp, Glu, Thr, and Pro (0.4 mM), Leu, Ile, Val, His, Lys, and Arg (0.3 mM), Tyr and Phe (0.2 mM).

For double labeling, 0.75-1 mCi of L-[4,5-^3H]leucine was added with one of the following L-[U-^14C]-labeled amino acids: Ile, Tyr, Phe (200 µCi), Arg, His, Lys (400 µCi), or with a ^35S-labeled amino acid, Cys or Met (400 µCi). To avoid conversion of methionine to cysteine, 1 mM unlabeled cysteine was added to the culture medium. In one case, leucine was supplied as the ^14C-labeled amino acid (60 µCi) with L-[5-^3H]tryptophan (1 mM). ^35S- and ^3H-labeled amino acids were purchased from Amersham Corp. U-^14C-labeled amino acids were produced and purified in our laboratory. All isotopes were of the highest available specific activity.

After labeling, cells were collected by centrifugation, washed with 200 µl of H(2)O, and frozen as a pellet at -180 °C. To avoid problems of radiolysis, the cell pellet should be analyzed within 1 month.

Measurement of Amino Acid Conversion

Frozen cells were resuspended in 200 µl of H(2)O and disrupted by shaking in the presence of 50 mg of glass beads (0.45 mm) on a Mini BeadBeater (Biospec Products). The tubes were agitated twice for 30 s, with a 30-s interval on ice between shakings. Cells were then centrifuged, and proteins were precipitated twice with trichloroacetic acid. The proteins were then hydrolyzed at 100 °C for 12 h in 6 M HCl. After evaporation of HCl, samples were applied to an HPLC (^1) column (Adsorbosphere OPA HS 5 µm, 100 × 4.6 mm; Alltech) equipped for o-phthaldialdehyde derivatization essentially as described by Godel et al. (15). Radioactivity was detected postcolumn by collecting 0.5-ml fractions (three fractions/min) and scintillation counting of each fraction. Alternatively, samples were applied to cellulose thin layer chromatography in two different solvent systems (butanol-1/acetic acid/water (90/15/33) and isoamyl alcohol/acetic acid/pyridine/water (20/5/40/20)), and radiolabeled amino acids were quantified by phosphor imaging (PhosphorImager, Molecular Dynamics). Similar results were obtained by both methods.

Preparation of Cell Extracts for Electrophoresis

Frozen cells were disrupted with glass beads (see above) in 20 µl of a solution containing 0.5 M Tris-HCl, pH 7.0, 20 mM CaCl(2), 50 mM MgCl(2), RNase A (50 units/ml), and DNase I (500 units/ml). After 10 min on ice, 20 µl of lysis buffer (0.1% SDS; 4% Ampholines 3-10) were added. After vortexing for 5 s, 50 mg of urea and 20 µl of sample buffer (4.75 M urea, 4% CHAPS, 1% Ampholines 3-10, 5% β-mercaptoethanol) were added. Protein samples were then kept for 10 min at room temperature, gently vortexed, and centrifuged for 4 min at 12,000 × g. The supernatant was stored at -80 °C.

Two-dimensional Gel Electrophoresis

Two-dimensional gel electrophoresis was a modification of the procedure of O'Farrell (1). It was performed on a Millipore Investigator apparatus used according to the manufacturer's instructions. Proteins were separated in the first dimension by isoelectric focusing in gels containing 4% acrylamide (acrylamide/bisacrylamide ratio: 37.5 (w/v)), 9.5 M urea, 3.6% CHAPS, and 3.75% carrier ampholytes (25% Ampholines 3-10, 75% Ampholines 4-8). Gels were polymerized for 2 h by addition of 0.03% ammonium persulfate (w/v) and 0.03% TEMED (v/v). Gel tubes were 22 cm long with a 1-mm inside diameter. Samples of 10-20 µl containing 150-250 µg of protein were applied directly to the gel, without overlay buffer. Gels were run at room temperature for 17 h at 1000 V, then for 30 min at 2000 V. Anode and cathode solutions were 0.08 M phosphoric acid and 0.1 M NaOH, respectively. After focusing, the gel was extruded with a syringe as recommended by the manufacturer and equilibrated for 1 min in a buffer containing 0.375 M Tris, pH 9.2, 3% SDS, and 0.01% bromphenol blue. Gels were stored at -20 °C until use or loaded immediately.

First dimension gels were reequilibrated 1 min before loading on the second dimension slab gel. The slab gels were 23 cm long, 21 cm high, and 1 mm thick and contained 12.5% acrylamide (A/B: 37.5 (w/v)) and 0.38 M Tris-HCl, pH 8.8. Electrophoresis was performed at 13-15 °C at 17 watts/gel for 5 h 30 min. Electrophoresis buffer was 25 mM Tris base, 192 mM glycine, and 0.2% SDS.

After electrophoresis, gels were stained with Coomassie Brilliant Blue R-250, dried, and processed for autoradiography by standard procedures.

Extraction of Protein Spots

The spots were numbered on the autoradiographic film. Round sections (2.3 mm in diameter) of the dried gels were punched out at the location of each protein using a sharpened cannula. Most spots were stained enough to be visible on the gel; the others were located using the autoradiographic film as tracing paper. After extraction, the gel was autoradiographed again to check the precision of spot excision. The spots were placed in plastic scintillation vials, rehydrated with 50 µl of H(2)O, and digested for 3 h at 40 °C in 1 ml of NCS (Nuclear-Chicago solubilizer; Amersham). After addition of 4 ml of BCS-NA (biodegradable counting scintillant, non-aqueous; Amersham), the samples were kept for 7-10 days at room temperature in the dark for stabilization before counting.

Determination of Radioactivity

Radioactivity was measured with an LKB-Pharmacia 1211 mini beta counter using ^3H and ^14C channels that minimize reinjections. Background was determined as the average of 10-20 control samples. Depending on the gel, it ranged from 150 to 300 cpm for ^3H and from 50 to 100 cpm for ^14C.

^14C/^3H ratios were calculated by the double reinjection method using the internal standard method (16). Spillover of ^14C or ^35S into the ^3H window was 11-13%, and spillover of ^3H into the ^14C window was 1-2%. The average ^14C/^3H ratio ranged from 0.12 to 0.5, depending on the double labeling experiment.

Each sample was counted for 15 min. When the count was below 1000 cpm for ^3H or below 400 cpm for ^14C, the measurement was discarded. This detection limit corresponds to about 0.1 µg of protein.
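The counting corrections above can be sketched as a background subtraction followed by a 2 × 2 linear unmixing of the two windows. This is a minimal sketch, not the double reinjection/internal standard procedure of reference 16: it assumes a simple linear spillover model with constant fractions (defaults taken from the 11-13% and 1-2% values above), and it applies the 1000/400 cpm thresholds to raw counts, since the text does not say whether raw or net counts were used. The function name `correct_dual_label` is hypothetical. The resulting count ratio is not yet an amino acid ratio; it still has to be calibrated against reference proteins.

```python
def correct_dual_label(cpm_h_raw, cpm_c_raw, bg_h, bg_c,
                       spill_c_into_h=0.12, spill_h_into_c=0.015,
                       min_h=1000.0, min_c=400.0):
    """Return (net 3H cpm, net 14C cpm, 14C/3H count ratio), or None.

    spill_c_into_h: fraction of 14C (or 35S) counts seen in the 3H window.
    spill_h_into_c: fraction of 3H counts seen in the 14C window.
    """
    # Detection limit (assumed here to apply to raw counts).
    if cpm_h_raw < min_h or cpm_c_raw < min_c:
        return None

    # Subtract the per-gel background averaged over control samples.
    h = cpm_h_raw - bg_h
    c = cpm_c_raw - bg_c

    # Linear spillover model:  h = H + a*C,  c = C + b*H.
    # Solve this 2x2 system for the pure-isotope counts H and C.
    a, b = spill_c_into_h, spill_h_into_c
    det = 1.0 - a * b
    true_h = (h - a * c) / det
    true_c = (c - b * h) / det
    return true_h, true_c, true_c / true_h


# Example: one excised spot, with background values in the range quoted above.
print(correct_dual_label(10440, 2225, bg_h=200, bg_c=75))
```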

The His/Leu and Cys/Leu ratios of each protein were determined from duplicate experiments. The other double labelings were not duplicated.

Determination of pI and M(r)

The isoelectric point and molecular weight were deduced from protein migration as described previously(5) . The location of previously identified protein spots in the first dimension was plotted as a function of the calculated pI values of the corresponding polypeptides. In the same way, the location of each reference protein spot in the second dimension was plotted as a function of log of the calculated values of their M(r). In each case, the regression curve was drawn. M(r) and pI of each analyzed polypeptide spot were then determined using the standard curves. The uncertainty in predicting protein parameters from gel migration according to this procedure was shown to be less than 0.2 pH units for pI and less than 15% for M(r)(5) .
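A minimal sketch of the standard-curve step, assuming straight-line (ordinary least-squares) fits. For convenience, pI is regressed directly on first-dimension position and log10 M(r) on second-dimension position, which is easier to invert than the position-versus-parameter plots described above; the two give slightly different lines in practice. The real pH gradient may not be linear, in which case a higher-order curve would be needed. The class `GelCalibration`, its argument names, and the example positions are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


class GelCalibration:
    """Standard curves built from reference spots of known pI and M(r)."""

    def __init__(self, ref_x, ref_pi, ref_y, ref_mr):
        # First dimension: horizontal position -> calculated pI.
        self.pi_fit = fit_line(ref_x, ref_pi)
        # Second dimension: vertical position -> log10 of calculated M(r).
        self.mr_fit = fit_line(ref_y, [math.log10(m) for m in ref_mr])

    def pi(self, x):
        slope, intercept = self.pi_fit
        return slope * x + intercept

    def mr(self, y):
        slope, intercept = self.mr_fit
        return 10 ** (slope * y + intercept)


# Example with three reference spots (positions in arbitrary gel units).
cal = GelCalibration([0, 10, 20], [4.0, 5.0, 6.0], [0, 10, 20], [100000, 10000, 1000])
print(round(cal.pi(15), 2), round(cal.mr(5)))  # prints: 5.5 31623
```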

Searching for a Protein in the Data Base of Amino Acid Composition

A yeast protein data base containing 1700 sequences was constructed. It contains all the sequences of the yeast protein data base of Garrels (Release 3, 1995) with a codon bias index (CBI) higher than 0.1. Proteins of lower CBI are assumed to be weakly expressed (17) and absent from the 200 most abundant proteins of yeast. For each protein of the data base, the isoelectric point (pI), the molecular weight (M(r)), and the amino acid ratios (rA) were calculated, taking into account known post-translational modifications (N- or C-terminal cleavage and N-acetylation). The analysis of a polypeptide (m) consists of calculating a distance between the polypeptide and each protein (x) of the data base. This distance is the sum of a distance in isoelectric point (dI), a distance in molecular mass (dM), and 6-9 distances in amino acid composition (dA), one for each measured amino acid ratio A:

$$d(x) = d_I(x) + d_M(x) + \sum_{A} d_A(x)$$

$$d_I(x) = \begin{cases} 0 & \text{if } |\mathrm{pI}(m) - \mathrm{pI}(x)| < 0.2 \\ |\mathrm{pI}(m) - \mathrm{pI}(x)| - 0.2 & \text{if } |\mathrm{pI}(m) - \mathrm{pI}(x)| > 0.2 \end{cases}$$

$$d_M(x) = \begin{cases} 0 & \text{if } \left|1 - M_r(x)/M_r(m)\right| < 0.15 \\ \left|1 - M_r(x)/M_r(m)\right| - 0.15 & \text{if } \left|1 - M_r(x)/M_r(m)\right| > 0.15 \end{cases}$$

$$d_A(x) = \begin{cases} \left|1 - r_A(x)/r_A(m)\right| & \text{if } r_A(m) > 0.15 \\ \left|r_A(m) - r_A(x)\right|/0.15 & \text{if } r_A(m) < 0.15 \end{cases}$$
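The distance can be sketched directly from its definition. This sketch assumes that the pI, M(r), and amino acid differences are taken as absolute values (otherwise a candidate could score a negative distance) and that each protein is represented by a plain dictionary; the function name `distance`, the dictionary keys, and the example values are illustrative. At rA(m) = 0.15 the two branches of dA agree, so the choice of branch at the boundary does not matter.

```python
def distance(query, candidate, ratio_names):
    """d(x) = dI + dM + sum of dA between measured spot m and data-base entry x.

    query / candidate: dicts with keys "pI", "Mr" and one key per amino acid
    ratio (e.g. "Ile/Leu"). Differences are taken as absolute values.
    """
    # dI: no penalty within 0.2 pH unit (gel uncertainty), linear beyond.
    d_i = max(0.0, abs(query["pI"] - candidate["pI"]) - 0.2)
    # dM: no penalty within 15% relative M(r) error, linear beyond.
    d_m = max(0.0, abs(1.0 - candidate["Mr"] / query["Mr"]) - 0.15)
    # dA: relative error for large ratios; for small measured ratios the
    # difference is divided by 0.15 instead, so relative errors on tiny
    # ratios do not blow up.
    d_a = 0.0
    for name in ratio_names:
        r_m, r_x = query[name], candidate[name]
        if r_m > 0.15:
            d_a += abs(1.0 - r_x / r_m)
        else:
            d_a += abs(r_m - r_x) / 0.15
    return d_i + d_m + d_a


spot = {"pI": 5.0, "Mr": 50000.0, "Ile/Leu": 0.5, "His/Leu": 0.1}
entry = {"pI": 5.5, "Mr": 40000.0, "Ile/Leu": 0.45, "His/Leu": 0.13}
print(round(distance(spot, entry, ["Ile/Leu", "His/Leu"]), 2))  # prints 0.65
```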

The six proteins having the smallest distance were selected. The best candidate corresponded to the lowest distance (d1) and the five next candidates corresponded to the distances d2 to d6.

A limit distance (dL) was chosen as the sum of standard deviations of amino acid ratios. For example, for nine amino acid ratios, dL = 0.06 (Ile/Leu) + 0.08 (Phe/Leu) + 0.14 (His/Leu) + 0.15 (Arg/Leu) + 0.15 (Lys/Leu) + 0.17 (Tyr/Leu) + 0.19 (Trp/Leu) + 0.20 (Met/Leu) + 0.21 (Cys/Leu) = 1.35.

If d1 < dL, the best candidate was considered as identified. If d1 > dL, we considered that the data were not accurate enough or that the protein was not in the data base.
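The ranking and decision steps can be sketched as follows, using the standard deviations listed above to build dL from whichever ratios were actually measured for a given spot. The function names, the dictionary-of-distances input, and the protein names in the example are illustrative assumptions. The text does not say what happens when d1 equals dL exactly; this sketch treats that case as not identified.

```python
# Standard deviations of the measured ratios, as listed in the text.
SD_LIMITS = {"Ile/Leu": 0.06, "Phe/Leu": 0.08, "His/Leu": 0.14,
             "Arg/Leu": 0.15, "Lys/Leu": 0.15, "Tyr/Leu": 0.17,
             "Trp/Leu": 0.19, "Met/Leu": 0.20, "Cys/Leu": 0.21}


def limit_distance(ratio_names):
    """dL = sum of the standard deviations of the ratios actually measured."""
    return sum(SD_LIMITS[name] for name in ratio_names)


def identify(distances, ratio_names, n_best=6):
    """Rank data-base entries by distance and apply the dL cutoff.

    distances: dict mapping protein name -> d(x) for one analyzed spot.
    Returns (identified protein or None, list of the n_best (name, d) pairs).
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])[:n_best]
    best_name, d1 = ranked[0]
    # Identified only if d1 < dL; otherwise the data are too imprecise
    # or the protein is not in the data base.
    return (best_name if d1 < limit_distance(ratio_names) else None), ranked


all_ratios = list(SD_LIMITS)
print(round(limit_distance(all_ratios), 2))  # prints 1.35
print(identify({"protA": 0.42, "protB": 1.61, "protC": 1.90}, all_ratios)[0])  # prints protA
```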


RESULTS

Strategy and Choice of Marker Amino Acids

Our approach to protein identification was based on the determination of the M(r), pI, and amino acid ratios of the analyzed spots. Experimental M(r) and pI were measured using standard curves obtained with previously identified proteins (see ``Materials and Methods''). The determination of amino acid ratios relies on the incorporation of pairs of ^3H- and ^14C-labeled amino acids into the proteins and quantification of both radioisotopes for each spot analyzed. A prerequisite to this approach is that the labeled amino acids should not be interconverted into other amino acids during the in vivo labeling period. Because such information was not available in the literature, we examined the interconversion of 15 labeled amino acids under our experimental conditions. Yeast cells were labeled with one ^14C-labeled amino acid, and proteins were extracted, hydrolyzed, and analyzed for labeled amino acids (see ``Materials and Methods''). The results are reported in Fig. 1 and Table 1. Among the 15 amino acids tested, only seven (His, Ile, Leu, Lys, Phe, Pro, and Tyr) were specifically incorporated into proteins without detectable interconversion. Arg was weakly metabolized to Pro. The other seven amino acids (Ala, Asp, Glu, Gly, Ser, Thr, and Val) were converted in high proportion into other amino acids. However, we found that these metabolic interconversions could be greatly limited by adding an excess of the nonradioactive amino acid product to the medium (Table 1). For example, the presence of 1 mM unlabeled Leu in the culture medium blocked the interconversion of [^14C]Val to [^14C]Leu. Similarly, interconversions of Ala, Glu, and Thr into other amino acids were considerably reduced by addition of the appropriate non-radioactive amino acids.


Figure 1: HPLC profiles of amino acid conversion tests. Cells were labeled for 30 min with [^14C]Leu, [^14C]Ala, [^14C]Thr, or [^14C]Val. The proteins were hydrolyzed, and the resulting amino acids were applied to a HPLC column (see ``Materials and Methods'').





We have no direct information about the interconversion of Asn, Gln, Cys, Met, and Trp, because these amino acids are degraded during acid hydrolysis. However, protein labeling with [^3H]Trp or [^35S]Cys indicated poor interconversion of these amino acids, as inferred from the very low labeling intensity of reference proteins known to be devoid of Trp (SOD1, HSP60, TIF1) or Cys (SSC1, PGI1). In contrast, labeling with [^35S]Met revealed a notable conversion of Met to Cys, since the only reference protein devoid of Met (TPI1) was labeled. However, this interconversion disappeared when 1 mM Cys was added to the culture medium.

Altogether, these results indicate that 14 amino acids can be used for in vivo labeling of proteins under conditions yielding only minimal interconversion (less than 15%). Amino acids chosen for protein labeling must also be discriminant: less frequent amino acids vary more in their distribution among proteins and are likely to give the most useful information. The amino acids must also label proteins with high efficiency. For example, [^14C]Pro was not used because the proline transport system is strongly repressed under our culture conditions (18). Similarly, labelings with Ala, Val, or Thr were not performed because they required the addition of other amino acids in excess, which limits the transport of the marker and the labeling efficiency. Taking these additional criteria into account, 10 amino acids were chosen for the present study: Trp, Cys, Met, His, Arg, Tyr, Phe, Lys, Ile, and Leu.

Since no major difference in interconversion was observed between short- and long-term labeling (Table 1), we chose a labeling period of 5 h to obtain high levels of incorporation.

Determination of Amino Acid Ratios

The choice of a double labeling method, using one amino acid labeled with ^3H and the other with ^14C or ^35S, avoids comparing different two-dimensional gels and thus avoids artifacts due to differences in cell culture, protein extraction, proteolysis, gel electrophoresis, and quantification. It therefore circumvents the need for internal controls for protein recovery.

In practice, after double labeling, total yeast proteins were extracted and separated on two-dimensional gels. The 200 most abundant proteins were numbered, and individual spots were excised and counted for ^3H and ^14C or ^35S radioactivity under carefully controlled conditions. Among these proteins, 45 protein spots had already been unambiguously identified by microsequencing or overexpression methods (Table 2). These reference proteins were used as internal standards. For each double labeling, the experimental isotope ratios of the reference proteins were plotted against their known amino acid ratios (Fig. 2). In each case, the experimental values were in good agreement with the theoretical amino acid ratios.




Figure 2: Measured ^14C/^3H ratio or ^35S/^3H ratio as a function of theoretical amino acid ratios for reference proteins. The lines were drawn to minimize the average error. The standard deviations were 6% for Ile/Leu, 15% for His/Leu, 15% for Lys/Leu, 19% for Trp/Leu, 45% for the first Met/Leu labeling, and 21% for the second Met/Leu experiment, supplemented with 1 mM Cys.



Two Met/Leu labelings were carried out with or without unlabeled Cys in the culture medium (Fig. 2). With added Cys, the average error decreased from 30 to 18% and the standard deviation from 45 to 21%. Therefore, the error observed was largely due to the interconversion of labeled Met to labeled Cys.

The curves minimizing the average error of these 45 reference proteins for each amino acid ratio are shown in Fig. 2. These calibration curves were used to determine the amino acid ratios for the other 155 proteins analyzed.
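The calibration step can be sketched as a straight line through the origin relating the measured isotope ratio to the theoretical amino acid ratio, fitted by least squares over the reference proteins and then inverted for spots of unknown identity. This is an assumption: the legend of Fig. 2 only says the lines were drawn to minimize the average error, and neither the exact criterion nor whether an intercept was allowed is stated. Function names and the example values are hypothetical.

```python
def fit_through_origin(theoretical, measured):
    """Least-squares slope k for measured ~ k * theoretical (no intercept)."""
    num = sum(t * m for t, m in zip(theoretical, measured))
    den = sum(t * t for t in theoretical)
    return num / den


def isotope_to_aa_ratio(measured_ratio, k):
    """Invert the calibration line for a spot of unknown identity."""
    return measured_ratio / k


def relative_errors(theoretical, measured, k):
    """Relative error of each back-calculated reference ratio."""
    return [(m / k - t) / t for t, m in zip(theoretical, measured)]


# Hypothetical reference proteins: known Ile/Leu ratios vs. measured 14C/3H.
ref_theoretical = [0.5, 1.0, 1.5]
ref_measured = [0.1, 0.2, 0.3]
k = fit_through_origin(ref_theoretical, ref_measured)
print(round(isotope_to_aa_ratio(0.25, k), 2))  # prints 1.25
```

The spread of `relative_errors` over the reference set is one way to estimate the per-ratio standard deviations that enter the limit distance dL.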

Predictivity of the Amino Acid Analysis Method Using Previously Identified Proteins

A yeast protein data base containing 1700 protein sequences of CBI > 0.1 was constructed and a computer program was designed to search the data base for proteins that match experimental parameters, M(r), pI, and amino acid ratios. A distance was calculated (defined under ``Materials and Methods'') between the query protein and each protein of the data base. The best candidate had the lowest distance (d1). The best candidate was considered as identified if d1 < dL (dL being a limit distance as defined under ``Materials and Methods''). If d1 > dL, we considered that the data were not accurate enough or that the protein was not in the data base.

Using the experimental pI, M(r), and amino acid ratios of the 45 reference proteins, we tested the reliability of the search in the data base. The results are reported in Table 2. The correct protein or a closely homologous one (more than 90% identity) obtained the best score in 44 of 45 cases. A correct identification (d1 < dL) occurred in 37 cases. In three cases, the identified protein was a very homologous one. In four cases, no protein was identified (d1 > dL), but the correct protein ranked first each time. In only one case was an incorrect protein identified (PDC1 instead of ZWF1, which ranked third with a score of 1.37). An identification was therefore proposed in 41 of 45 searches (about 90%).

Interestingly, the second, third, and subsequent best scores (d2, d3, etc.) were generally similar to one another and markedly higher than d1. Moreover, d2 was higher than dL in 40 of 45 cases, indicating that the dL value was well chosen. d2 represents the best score expected when the right protein is absent from the data base. Consequently, in almost all cases, if the sequence of the correct protein or a very homologous one was not in the data base, no protein would be identified.

In numerous cases, the search correctly distinguished between proteins of the same family, even between proteins with more than 90% sequence identity, such as PDC1/PDC5, ENO1/ENO2, ADH1/ADH2, TDH2/TDH3, TIF51A/ANB1, SSA1/SSA2, and SSB1/SSB2. Only three errors were observed: HSP82 was identified instead of the very homologous protein HSC82, and SSB2 instead of the two isoforms of SSC1. In these three cases, the correct protein ranked second with a very close score.

Identification of New Proteins

The identification method was applied to 155 unknown protein spots. Sixty-nine new identifications were obtained (Table 3 and Fig. 3), and 10 proteins previously predicted by Garrels et al. (4) were confirmed. Among the proteins identified, six corresponded to the same gene products as reference proteins (ADH1, ENO2, FBA1, PDC1, TDH3, TPI1), and five proteins were predicted twice (ADE3, SHMT2, TKL1, TRG1, and TSA1). Taking these isoforms into account, the new identifications concern 66 different proteins. Interestingly, we identified 13 proteins known only as putative open reading frames. Seven of them were of unknown function, with no homology to any known protein, and the other six were similar to proteins of mammals or other organisms.




Figure 3: Yeast protein two-dimensional map with names of identified proteins (listed in Tables 2 and 3). Strain S288C was grown on glucose as carbon source and labeled with [^35S]Met at mid-log phase. 4 × 10^6 cpm of protein extract were loaded on the first dimension gel. The gel was exposed for 2 days for autoradiography. Reference proteins are in bold characters and newly identified proteins are in standard characters.



Some of the proposed identifications were confirmed by independent experiments: as expected, spots identified as KAR2 and UBC4 (heat shock proteins) were induced by heat shock. Spots proposed as PDB1, ILV2, ILV5, MDH1, SOD2, ATP2, and COR1 known to be mitochondrial proteins were enriched in mitochondrial extracts (data not shown). The intensity of LEU1, LYS9, MET25, and CYS3 spots decreased when, respectively, leucine, lysine, and cysteine were added in the culture medium (data not shown). These observations underscored the reliability of the predictions.

General Features of Identified Proteins

A total of 104 gene products were identified in this work. These proteins could be classified in different functional families: glycolysis or carbon metabolism enzymes (19 proteins), amino acid and purine biosynthesis enzymes (24 proteins), 13 heat shock proteins, 10 translation factors and ribosomal proteins, 6 proteolytic enzymes, 4 ATPase subunits, 4 structural proteins, 3 antioxidant enzymes, and a few other classes.

The majority of the identified proteins of known localization were cytosolic (57 among 79), but we found also 10 mitochondrial proteins (ATP2, COR1, HSP60, ILV2, ILV5, MDH1, MRP8, PDB1, SOD2, SSC1), 6 nuclear (EGD1, GSP1, HSP104, PRS4, PRS5, UBA1), 4 vacuolar (APE1, PEP4, VMA1, VMA2), 4 proteins of the endoplasmic reticulum (CDC48, KAR2, SAR1, TRG1), and 1 vesicular protein (CLC1).

The 104 identified proteins are among the most abundant proteins of yeast. The large majority of these proteins (93) had a CBI higher than 0.35, and the correlation between CBI and protein abundance observed by Bennetzen and Hall (17) was largely confirmed (Fig. 4). However, we identified 11 abundant proteins whose genes had a CBI lower than 0.25 (APE1, CLC1, MRP8, PRS4, PRS5, RBK1, UBA1, ZWF1, L80003.20, YEL071W, and YKL117W). Among them, the six proteins of known localization were totally or partially localized in a subcellular compartment. More generally, among proteins of similar abundance (Fig. 4), the average CBI of cytosolic proteins was 0.68, whereas the average CBI of the 25 identified proteins known to be localized in organelles was significantly lower (0.43).
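For reference, the CBI of Bennetzen and Hall (17) compares the number of preferred codons in a gene with the number expected if synonymous codons were used at random: CBI = (N_pfr - N_rand)/(N_tot - N_rand), where N_tot counts the codons of amino acids that have preferred codons. CBI is 1 when only preferred codons are used and close to 0 for random usage. The sketch below uses deliberately tiny, hypothetical codon tables (two amino acids only), not the yeast preferred-codon set of the original publication, and the function name is illustrative.

```python
# Hypothetical, deliberately tiny tables for illustration only; a real
# calculation needs the full genetic code and the yeast preferred-codon set.
CODON_TO_AA = {"GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
               "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly"}
PREFERRED = {"Ala": {"GCT", "GCC"}, "Gly": {"GGT"}}


def codon_bias_index(seq, codon_to_aa=CODON_TO_AA, preferred=PREFERRED):
    """CBI = (N_pfr - N_rand) / (N_tot - N_rand) for a coding sequence."""
    # Degeneracy = number of synonymous codons per amino acid.
    degeneracy = {}
    for aa in codon_to_aa.values():
        degeneracy[aa] = degeneracy.get(aa, 0) + 1

    seq = seq.upper()
    n_tot = n_pfr = 0
    n_rand = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = codon_to_aa.get(codon)
        if aa not in preferred:
            continue  # only amino acids with preferred codons are counted
        n_tot += 1
        if codon in preferred[aa]:
            n_pfr += 1
        # Expected preferred codons if synonymous codons were used at random.
        n_rand += len(preferred[aa]) / degeneracy[aa]
    if n_tot == 0:
        raise ValueError("no codons for amino acids with preferred codons")
    return (n_pfr - n_rand) / (n_tot - n_rand)


print(codon_bias_index("GCTGCCGGTGGT"))  # only preferred codons; prints 1.0
print(codon_bias_index("GCTGCAGGTGGA"))  # half preferred; prints 0.2
```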


Figure 4: CBI of identified cytosolic and organellar proteins as a function of polypeptide abundance. CBI values were calculated by the method of Bennetzen and Hall (17). Individual protein synthesis rates were measured as the average of three independent [^14C]Leu labelings divided by the theoretical leucine proportion of each protein. A similar diagram was obtained with the CAI (codon adaptation index) instead of the CBI. The cytosolic and organellar proteins located in the gray area were used to calculate the average CBI of each protein class. Symbols distinguish cytosolic proteins from organellar proteins (circles).
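The abundance normalization described in the legend, and the class averages computed within the gray area, can be sketched as follows. The bounds of the gray area are not given in the text, so `low` and `high` are hypothetical parameters, as are the function names, the per-protein dictionary fields, and the example values.

```python
from statistics import mean


def synthesis_rate(leu_counts, leu_fraction):
    """Mean [14C]Leu counts over replicate labelings / Leu fraction of the protein."""
    return mean(leu_counts) / leu_fraction


def mean_cbi_by_class(proteins, low, high):
    """Average CBI per localization class for proteins with low <= rate <= high.

    proteins: iterable of dicts with keys "loc", "rate" and "cbi".
    """
    groups = {}
    for p in proteins:
        if low <= p["rate"] <= high:
            groups.setdefault(p["loc"], []).append(p["cbi"])
    return {loc: mean(values) for loc, values in groups.items()}


# Hypothetical values, not data from this study.
proteins = [
    {"loc": "cytosol", "rate": synthesis_rate([900, 1000, 1100], 0.10), "cbi": 0.70},
    {"loc": "organelle", "rate": synthesis_rate([800, 950, 1050], 0.09), "cbi": 0.40},
]
print(mean_cbi_by_class(proteins, low=5000, high=20000))
```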




DISCUSSION

In this work, we have identified 120 spots on two-dimensional gels of yeast proteins, including 69 new identifications and 13 proteins homologous to mammalian proteins or corresponding to open reading frames with unknown functions. The identification method relies on the screen in a yeast protein data base of the proteins matching the experimental data of M(r), pI, and amino acid ratios of the analyzed spots. The method, tested with 45 reference proteins, allowed 41 precise identifications with only one incorrect prediction.

The high reliability of this identification method depended on accurate amino acid analysis. The prerequisite to the double labeling technique used here was the evaluation of the extent of amino acid interconversions in yeast, essential information that was not available in the literature. We showed that Leu, Ile, Phe, Tyr, Pro, Lys, and His were not metabolized before incorporation into protein and could be used in double labeling experiments. The seven other amino acids tested (Gly, Val, Ala, Glu, Asp, Thr, and Ser) were converted in high proportions. The amino acid products observed were always in accordance with the amino acid biosynthetic pathways known in yeast (19). Interestingly, the amino acids that are not metabolized have the smallest cytosolic pools (20). Our interpretation is that amino acids with small pools are rapidly incorporated into proteins, and their residence time is too short for them to be significantly converted or catabolized. We also observed that these amino acids have the highest energetic costs for de novo synthesis. From an economic point of view, it seems logical that cells avoid wasting metabolites obtained at a high energetic expense.

The knowledge of amino acid interconversion in yeast allowed us to develop an accurate technique of amino acid analysis. Garrels et al. (4) also used a labeling approach to analyze the amino acid composition of yeast proteins. Their method differed from our procedure in three essential points: (a) the labelings were performed with different amino acids, including Ser, Thr, and Val; (b) the major part of the analysis was based on single labeling experiments requiring comparison of two-dimensional gels; (c) in the other part of the work, double labeling experiments were done with ^14C- and ^35S-labeled amino acids and relied on the decay of the ^35S radioisotope: after two-dimensional gel electrophoresis, the contributions of the two radioisotopes were deduced by comparing several exposures spanning 4 months and quantified by phosphor imaging. By this approach, Garrels et al. identified 33 new polypeptide spots, 14 of which were also analyzed in our work. While 10 of these spots received the same identification, 4 predictions conflicted: we predicted VMA1, ILV2, TRP5, and an isoform of TKL1 instead of NCPR1, MLS1, RFA1, and APS, respectively. The procedure described by Garrels et al. (4) probably introduced errors for at least two reasons: first, Ser, Thr, and Val are highly interconverted amino acids (see Fig. 1 and Table 1); second, the single labeling method is of limited accuracy because of variations in cell culture, protein extraction, two-dimensional gel electrophoresis, and quantification steps. These points could explain the discrepancies between some of our identifications and those reported by Garrels et al. (4).

A non-radioactive method of amino acid analysis was developed by Eckerskorn et al. (21), Jungblut et al. (9), Shaw (10), and Hobohm et al. (11). After transfer of the protein spots to a blotting membrane, the proteins were extracted and hydrolyzed, and their amino acid compositions were determined by conventional techniques. This method is reliable but cumbersome and requires large quantities of protein (usually 1 µg). The radioactive technique developed in this work is at least 10-fold more sensitive: numerous spots invisible by Coomassie Blue staining were successfully analyzed by double labeling.

The method of protein identification based on amino acid composition has two general limitations. The first is that not all protein spots are accessible to identification: of a total of 200 spots, the program was unable to make a prediction in 80 cases. These negative results can arise for different reasons: (a) overlapping protein spots (two or more proteins migrating at the same place on two-dimensional gels); (b) unknown post-translational modifications (phosphorylation, N-acetylation, glycosylation, N- or C-terminal cleavage) or aberrant migration, which change the apparent pI or M(r), or even the amino acid ratios in the case of proteolytic cleavage; (c) absence of the corresponding protein from the data base. Presently, our data base contains only 1700 protein sequences of CBI > 0.1. In all likelihood, the complete sequence of the yeast genome will be available by the end of 1996. In the coming year, our data base will probably contain more than 2500 protein sequences of CBI > 0.1, and new identifications will be possible. With a complete data base, the search could be simplified: the best candidate could be systematically selected without applying a cutoff limit. In that way, as in the reference protein test (1 error in 45 predictions), the spots could be identified with few errors.

The second limitation of the method lies in its predictive nature: it does not provide definitive identifications, only highly probable predictions. Our method must therefore be considered a global, quick, and relatively low-cost step in protein characterization. If one of the predicted proteins is of particular interest, its identity can be verified unambiguously by microsequencing. It is also possible to confirm these predictive identifications by another predictive technique, such as laser desorption or electrospray mass spectrometry (12). If two highly predictive and independent methods give the same identification, the result may become as reliable as genetic or sequencing methods.

Two-dimensional gel electrophoresis gives a global view of the abundant proteins of yeast. The identification of a large number of these proteins now makes it possible to draw some general conclusions about yeast proteins. Among the proteins of known localization, we found 57 cytosolic and 25 organellar proteins. Remarkably, the proteins located in a subcellular compartment have a significantly lower CBI than cytosolic proteins of similar abundance. One possible explanation, consistent with the endosymbiotic theory, is that organelles contain a high proportion of proteins encoded by genes of foreign origin with a different codon usage. Another interesting possibility is that a lower codon bias reduces the rate of protein elongation and thereby provides the time delay needed to target the nascent protein to the organelle and begin its translocation. Consistent with this hypothesis, the signal recognition particle, which binds the leader sequence of proteins destined for the endoplasmic reticulum and targets them to that organelle, also slows down or arrests elongation (for review, see (22)).
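
The CBI comparison above relies on the codon bias index of Bennetzen and Hall (17). A minimal Python sketch of that index, as we understand its definition, follows: CBI = (N_pfr - N_rand)/(N_tot - N_rand), where N_tot counts the codons of amino acids that have preferred codons, N_pfr counts how many of those codons are preferred ones, and N_rand is the number of preferred codons expected if synonymous codons were used at random. The preferred-codon set is passed in as a parameter; the yeast preferred-codon table is not reproduced here, and the codons used in the example are an arbitrary toy choice. The function name codon_bias_index is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS = Counter(CODE.values())  # number of synonymous codons per amino acid


def codon_bias_index(cds: str, preferred: set) -> float:
    """Codon bias index: 1 = only preferred codons, 0 = random usage."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Amino acids considered, with their number of preferred codons.
    pref_per_aa = Counter(CODE[c] for c in preferred)
    used = [c for c in codons if CODE.get(c) in pref_per_aa]
    n_tot = len(used)
    n_pfr = sum(c in preferred for c in used)
    # Expected preferred codons if each synonymous codon were equally likely.
    aa_counts = Counter(CODE[c] for c in used)
    n_rand = sum(n * pref_per_aa[aa] / SYNONYMS[aa] for aa, n in aa_counts.items())
    if n_tot == n_rand:
        raise ValueError("index undefined: N_tot equals N_rand")
    return (n_pfr - n_rand) / (n_tot - n_rand)


toy_preferred = {"GCT", "GCC"}  # toy choice: 2 of the 4 Ala codons
print(codon_bias_index("GCTGCTGCC", toy_preferred))  # 1.0
print(codon_bias_index("GCTGCA", toy_preferred))     # 0.0
print(codon_bias_index("GCAGCG", toy_preferred))     # -1.0
```

The index is 1 when only preferred codons are used, about 0 for random synonymous usage, and negative when preferred codons are avoided.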

In conclusion, amino acid analysis of polypeptide spots appears to be an ideal first step for the global and rapid identification of proteins on two-dimensional gels, especially for organisms whose genomes are being systematically sequenced. This method, successfully tested in S. cerevisiae, could easily be applied to other microorganisms whose genomes are under extensive study, such as Escherichia coli, Bacillus subtilis, Haemophilus influenzae, or Schizosaccharomyces pombe. The technique can also be adapted to compare two cell populations on a single gel, for example one labeled with [^3H]Leu and the other with [^14C]Leu. This approach, already used by Ludwig et al. (2) and Bataille et al. (23), avoids the artifacts inherent in comparing two separate gels and will thus simplify the quantitative analysis of changes in protein composition resulting from environmental or genetic changes. In a preliminary experiment testing this approach, we showed that the presence of 500 µM Leu in minimal culture medium represses LEU1, LEU2, ILV5, and GDH1, consistent with previous reports (19, 24), and induces the expression of several genes, including ARG1, CYS4, HIS4, ARO4, and YBR025C, as well as of unidentified proteins. (^2)
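
The single-gel comparison of two differently labeled populations can be sketched as follows, assuming that per-spot ^3H and ^14C counts are already available and corrected for spillover between the two isotopes. Because the two populations may incorporate different total amounts of label, the sketch normalizes each spot's ^3H/^14C ratio to the median ratio over all spots, on the premise that most proteins are unchanged, and flags spots whose log2 ratio reaches a threshold. This normalization, the threshold, the function name compare_populations, and the counts are all illustrative assumptions, not the procedure of refs. 2 or 23.

```python
import math
from statistics import median


def compare_populations(spots: dict, threshold: float = 1.0):
    """spots maps a spot name to its (3H counts, 14C counts).

    Returns (log2 ratios normalized to the median spot, spots whose
    absolute log2 ratio is at least `threshold`).
    """
    ratios = {n: h / c for n, (h, c) in spots.items() if h > 0 and c > 0}
    ref = median(ratios.values())  # assumed to reflect labeling differences only
    log2r = {n: math.log2(r / ref) for n, r in ratios.items()}
    changed = {n: v for n, v in log2r.items() if abs(v) >= threshold}
    return log2r, changed


# Invented counts: population 1 labeled with 3H, population 2 with 14C.
spots = {
    "s1": (1000, 1000),
    "s2": (2000, 2000),
    "s3": (250, 1000),   # lower in the 3H-labeled population
    "s4": (4000, 1000),  # higher in the 3H-labeled population
    "s5": (500, 500),
}
log2r, changed = compare_populations(spots)
print(changed)  # {'s3': -2.0, 's4': 2.0}
```

Since both populations share one gel, every spot carries its own internal reference, which is what removes the gel-to-gel alignment and loading artifacts mentioned above.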


FOOTNOTES

*
This work was supported by a fellowship from the région Ile de France (to I. M.) and by a grant from the Groupement de Recherche et d'Etudes sur les Génomes (GREG). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked ``advertisement'' in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
To whom correspondence should be addressed. Tel.: 33-1-69-08-22-31; Fax: 33-1-69-08-47-12; labarre@jonas.saclay.cea.fr.

(^1)
The abbreviations used are: HPLC, high performance liquid chromatography; CHAPS, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate; TEMED, N,N,N′,N′-tetramethylethylenediamine; CBI, codon bias index.

(^2)
C. Godon and J. Labarre, unpublished results.


ACKNOWLEDGEMENTS

We thank A. Sentenac for suggesting this work, P. N. Lirsac for initiating the interconversion studies, M. Gilbert for contributing to the construction of the protein data base, P. Thuriaux for encouragement and critical discussions, and C. Jackson for improving the manuscript.


REFERENCES

  1. O'Farrell, P. H. (1975) J. Biol. Chem. 250, 4007-4021
  2. Ludwig, J. R. D., Foy, J. J., Elliott, S. G., and McLaughlin, C. S. (1982) Mol. Cell. Biol. 2, 117-126
  3. Bataille, N., Peypouquet, M. F., and Boucherie, H. (1987) Yeast 3, 11-21
  4. Garrels, J. I., Futcher, B., Kobayashi, R., Latter, G. I., Schwender, B., Volpe, T., Warner, J. R., and McLaughlin, C. S. (1994) Electrophoresis 15, 1466-1486
  5. Boucherie, H., Dujardin, G., Kermorgant, M., Monribot, C., Slonimski, P., and Perrot, M. (1995) Yeast 11, 601-613
  6. Latter, G. I., Burbeck, S., Fleming, J., and Leavitt, J. (1984) Clin. Chem. 30, 1925-1932
  7. Neidhardt, F. C., Appleby, D. B., Sankar, P., Hutton, M. E., and Phillips, T. A. (1989) Electrophoresis 10, 116-122
  8. Sibbald, P. R., Sommerfeldt, H., and Argos, P. (1991) Anal. Biochem. 198, 330-333
  9. Jungblut, P., Dzionara, M., Klose, J., and Wittmann-Leibold, B. (1992) J. Protein Chem. 11, 603-612
  10. Shaw, G. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 5138-5142
  11. Hobohm, U., Houthaeve, T., and Sander, C. (1994) Anal. Biochem. 222, 202-209
  12. Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., and Watanabe, C. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 5011-5015
  13. Pappin, D. J. C., Hojrup, P., and Bleasby, A. J. (1993) Curr. Biol. 3, 327-332
  14. Mortimer, R. K., and Johnston, J. R. (1986) Genetics 113, 35-43
  15. Godel, H., Graser, T., Foldi, P., Pfander, P., and Furst, P. (1984) J. Chromatogr. 297, 49-61
  16. Simonnet, G. (1990) in Radioisotopes in Biology (Slater, R. J., ed) pp. 31-85, IRL Press, Oxford, New York
  17. Bennetzen, J. L., and Hall, B. D. (1982) J. Biol. Chem. 257, 3026-3031
  18. Grenson, M. (1983) Eur. J. Biochem. 133, 135-139
  19. Jones, E. W., and Fink, G. R. (1982) in The Molecular Biology of the Yeast Saccharomyces cerevisiae: Metabolism and Gene Expression (Strathern, J. N., Jones, E. W., and Broach, J. R., eds) pp. 181-299, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  20. Messenguy, F., Colin, D., and ten Have, J. P. (1980) Eur. J. Biochem. 108, 439-447
  21. Eckerskorn, C., Jungblut, P., Mewes, W., Klose, J., and Lottspeich, F. (1988) Electrophoresis 9, 830-838
  22. Lutcke, H. (1995) Eur. J. Biochem. 228, 531-550
  23. Bataille, N., Thoraval, D., and Boucherie, H. (1988) Electrophoresis 9, 774-780
  24. Hu, Y., Cooper, T. G., and Kohlhaw, G. B. (1995) Mol. Cell. Biol. 15, 52-57
  25. Norbeck, J., and Blomberg, A. (1995) Electrophoresis 16, 149-156
  26. Sanchez, Y., Parsell, D. A., Taulien, J., Vogel, J. L., Craig, E. A., and Lindquist, S. (1993) J. Bacteriol. 175, 6484-6491
  27. Nicolet, C. M., and Craig, E. A. (1989) Mol. Cell. Biol. 9, 3638-3646
  28. Klier, H., and Lottspeich, F. (1992) Electrophoresis 13, 732-735
