From the Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University and Flanders Interuniversity Institute for Biotechnology (VIB09), A. Baertsoenkaai 3, B-9000 Ghent, Belgium
ABSTRACT
Here we introduce a peptide-based protein identification technique that differs from the multidimensional protein identification technology (MudPIT)1 and the isotope-coded affinity tag method (2, 3) and is based on a modified version of diagonal chromatography (13): combined fractional diagonal chromatography (COFRADIC™). COFRADIC™ was applied to select and identify methionine-containing peptides in a total Escherichia coli tryptic digest. Using 50 million cells, we identified at least 800 different proteins. The procedure is 100 times more sensitive than two-dimensional gel analysis and can be carried out in a fully automated manner. Besides its very high sensitivity, COFRADIC™ makes it possible to select, from highly complex mixtures, any subset of peptides containing amino acids that can be chemically or enzymatically modified.
EXPERIMENTAL PROCEDURES
Sorting of Methionine-containing Peptides
A lysate corresponding to 50 × 10⁶ cells was digested overnight at 37 °C with 5 µg of trypsin (Promega Corp., Madison, WI), and the digestion was stopped by acidification with trifluoroacetic acid. The digest was centrifuged to remove cell debris and insoluble material, and the supernatant was injected onto a narrow-bore reverse-phase ZORBAX® 300SB-C18 column (2.1 mm inner diameter × 150 mm; Agilent Technologies, Waldbronn, Germany) coupled to an Agilent 1100 Series Capillary LC system controlled by the Agilent ChemStation software modules. After injection of the sample, a solvent gradient was developed at a constant flow rate of 80 µl/min. First, the column was rinsed for 10 min with 0.1% trifluoroacetic acid in water (Baker HPLC analyzed, Mallinckrodt Baker B.V., Deventer, The Netherlands) (solvent A). This was followed by a 100-min linear gradient to 100% solvent B, which consisted of 70% acetonitrile (Baker HPLC analyzed) in 0.1% trifluoroacetic acid. Solvent B therefore increased by 1% per minute. We refer to this reverse-phase HPLC separation as the primary run. Peptides were collected from 40 min onward (corresponding to 30% solvent B) in a total of 48 fractions of 1 min (80 µl) each. The fractions were collected in a microtiter plate using the Agilent 1100 Series fraction collector. Fractions separated by 12 min were pooled as described in Table I and dried in a centrifugal vacuum concentrator. The dried fractions were redissolved in 70 µl of 1% trifluoroacetic acid in water and placed in the Agilent 1100 Series well-plate sampler. Methionine oxidation was carried out in the injector compartment by transferring 14 µl of a freshly prepared aqueous 3% H2O2 solution to the vial containing the peptide mixture. The reaction proceeded for 30 min at 30 °C, after which the sample was immediately injected onto the reverse-phase HPLC column.
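The primary-run timetable follows from a few numbers stated above: a 10-min rinse, an increase of 1% solvent B per minute, 48 one-minute fractions starting at 40 min, pooling of fractions 12 min apart, and 14 µl of 3% H2O2 added to 70 µl of sample. The Python sketch below reproduces that arithmetic. The function and constant names are hypothetical, and the code is not part of the instrument software.

```python
"""Arithmetic behind the primary-run timetable (illustrative sketch only)."""

RINSE_MIN = 10           # isocratic rinse with solvent A before the gradient (min)
FIRST_FRACTION_MIN = 40  # start of primary fraction 1 (min)
N_FRACTIONS = 48         # 1-min (80 ul) fractions
POOL_SPACING = 12        # fractions 12 min apart are combined


def fraction_window(n: int) -> tuple[int, int]:
    """(start, end) in minutes of primary fraction n, numbered from 1."""
    start = FIRST_FRACTION_MIN + (n - 1)
    return start, start + 1


def percent_b(t_min: float) -> float:
    """Solvent B (%) at time t: 0 during the rinse, then +1% per minute, capped at 100."""
    return max(0.0, min(100.0, float(t_min) - RINSE_MIN))


def pooled_fractions() -> list[list[int]]:
    """Groups [1, 13, 25, 37], [2, 14, 26, 38], ..., [12, 24, 36, 48]."""
    return [list(range(first, N_FRACTIONS + 1, POOL_SPACING))
            for first in range(1, POOL_SPACING + 1)]


def final_h2o2_percent(sample_ul: float = 70.0, oxidant_ul: float = 14.0,
                       stock_percent: float = 3.0) -> float:
    """H2O2 concentration after mixing (same units as the stock; volumes assumed additive)."""
    return stock_percent * oxidant_ul / (sample_ul + oxidant_ul)


if __name__ == "__main__":
    pools = pooled_fractions()
    print(len(pools), pools[0], pools[7])      # 12 [1, 13, 25, 37] [8, 20, 32, 44]
    print(fraction_window(48), percent_b(40))  # (87, 88) 30.0
    print(round(final_h2o2_percent(), 2))      # 0.5
```

The 12 pools correspond to the 12 secondary runs. Assuming the volumes are additive, the oxidation is carried out at roughly 0.5% H2O2.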
Under these experimental conditions, methionine-sulfoxide (Met-SO)-containing peptides elute in a time window from 7 to 1 min before the bulk of the unmodified peptides. They were collected in eight subfractions per primary fraction (Table I and Fig. 1). Subfractions with the same subscript that derived from the same secondary run were pooled (e.g. subfractions 12₁, 24₁, 36₁, and 48₁ of run 2A; see Table I and Fig. 1).
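The collection window can be expressed compactly. The sketch below assumes that a primary fraction collected from s to s + 1 min yields Met-SO peptides in the window from s − 7 to s − 1 min, which matches the intervals listed in the Results, and that this 6-min window is split into eight equal subfractions. It also checks whether a set of combined fractions can share one secondary run without overlaps. All names are hypothetical.

```python
"""Secondary-run collection windows for Met-SO peptides (illustrative sketch)."""

FIRST_FRACTION_MIN = 40  # start of primary fraction 1 (min)
SHIFT_MAX = 7            # window opens 7 min before the unshifted peptides
SHIFT_MIN = 1            # ... and closes 1 min before them
N_SUB = 8                # subfractions per primary fraction

Interval = tuple[float, float]


def unshifted_band(fraction: int) -> Interval:
    """Elution interval of the unmodified peptides of a primary fraction."""
    start = FIRST_FRACTION_MIN + (fraction - 1)
    return float(start), float(start + 1)


def collection_window(fraction: int) -> Interval:
    """Met-SO collection interval for a primary fraction."""
    start, _ = unshifted_band(fraction)
    return start - SHIFT_MAX, start - SHIFT_MIN


def subfractions(fraction: int) -> list[Interval]:
    """Eight equal subfractions of the 6-min window (0.75 min each)."""
    lo, hi = collection_window(fraction)
    width = (hi - lo) / N_SUB
    return [(lo + i * width, lo + (i + 1) * width) for i in range(N_SUB)]


def _overlap(a: Interval, b: Interval) -> bool:
    # Open intervals: touching end points do not count as overlap.
    return a[0] < b[1] and b[0] < a[1]


def pool_is_sortable(pool: list[int]) -> bool:
    """True if no Met-SO window overlaps an unshifted band or another window."""
    windows = [collection_window(f) for f in pool]
    bands = [unshifted_band(f) for f in pool]
    if any(_overlap(w, b) for w in windows for b in bands):
        return False
    return not any(_overlap(windows[i], windows[j])
                   for i in range(len(windows)) for j in range(i + 1, len(windows)))
```

With 12-min spacing, the pool [1, 13, 25, 37] passes the check. Fractions only 3 min apart, such as [1, 4], fail it.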
For liquid chromatography-tandem mass spectrometry (LC-MS/MS) identification of methionine-containing peptides, we dried 75% of the volume of each pooled subfraction in a centrifugal vacuum concentrator. The dried material was redissolved in 45 µl of 0.05% formic acid in acetonitrile/water (2:98, v/v). This solution was split into two equal parts, which were used for two consecutive LC-MS/MS runs. For each run, 20 µl was loaded onto a trapping column (0.3 mm inner diameter × 5 mm; PepMap, LC Packings, Amsterdam, The Netherlands) at a flow rate of 20 µl/min of solvent A. The trapping column was then back-flushed by valve switching, and the sample was loaded onto a nanoscale reverse-phase C18 column (PepMap, 75 µm inner diameter × 150 mm; LC Packings). A binary solvent gradient was then started. The solvent delivery system was run at a constant flow of 60 µl/min, and a 1:300 flow splitter (Accurate, LC Packings) directed 200 nl/min of solvent into the nanocolumn. Peptides were eluted from the stationary phase with a gradient from 0 to 100% solvent B over 50 min. The outlet of the nanocolumn was connected in-line to a distal metal-coated fused-silica PicoTip™ needle (PicoTip™ FS360-20-10-D-C7, New Objective, Inc., Woburn, MA) placed in front of the inlet of a Q-TOF mass spectrometer (Micromass UK Ltd., Cheshire, UK). Automated data-dependent acquisition on the Q-TOF mass spectrometer was initiated 20 min after the start of the solvent gradient. The acquisition parameters were chosen such that only doubly charged ions were selected for fragmentation. After the first LC-MS/MS run, a mass exclusion list was created containing the masses of all selected precursor ions whose peptides were identified with Mascot (16). This exclusion list was then used for the second LC-MS/MS analysis of the remaining half of the material. The same procedure was used for the analysis of the other pooled subfractions.
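The two-pass strategy described above works as follows. Precursors identified in the first run are placed on an exclusion list, so that the second run spends its fragmentation capacity on other doubly charged ions. The sketch below shows that logic. The ±0.2 m/z exclusion tolerance is a hypothetical value, because the text does not state one, and the function names are illustrative only.

```python
"""Two-pass acquisition with a precursor exclusion list (illustrative sketch)."""

PROTON = 1.007276  # proton mass (Da)


def doubly_charged_mz(neutral_mass: float) -> float:
    """m/z of the [M+2H]2+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + 2 * PROTON) / 2


def exclusion_list(run1: list[tuple[float, bool]]) -> list[float]:
    """m/z values of run-1 precursors whose peptides were identified."""
    return sorted(mz for mz, identified in run1 if identified)


def eligible_precursors(candidates: list[tuple[float, int]],
                        excluded: list[float],
                        tol: float = 0.2) -> list[tuple[float, int]]:
    """Keep 2+ candidates (m/z, charge) farther than tol from every excluded m/z.

    tol is a hypothetical exclusion window, not a value from the paper.
    """
    kept = []
    for mz, charge in candidates:
        if charge != 2:
            continue  # only doubly charged ions are fragmented
        if any(abs(mz - x) <= tol for x in excluded):
            continue  # already identified in the first run
        kept.append((mz, charge))
    return kept
```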
Peptide and Protein Identification
The collision-induced dissociation spectra obtained in each LC-MS/MS run were automatically converted to a Mascot-compatible peak-list format (pkl; www.matrixscience.com) using ProteinLynx, which is part of the Micromass MassLynx software (version 3.4). For each Mascot search, at most 300 collision-induced dissociation peak lists were merged and searched against a newly created database containing only the methionine-containing tryptic peptides of E. coli. This database was derived from the E. coli K12 proteome (Refs. 17 and 18 and ftp.expasy.ch/databases/complete-proteomes/ECOLI.dat). The following search parameters were used: enzyme, trypsin; maximum number of missed cleavages, 1; variable modifications, oxidation (Met), N-formyl (protein), and pyroglutamate formation (amino-terminal Glu and Gln); peptide tolerance, 0.3 Da; MS/MS tolerance, 0.25 Da; and peptide charge, 2+. Only MS/MS spectra that exceeded Mascot's significance threshold were retained.
A second, independent method was developed to match experimental spectra with peptide sequences. It quantifies the probability that a spectrum matches a peptide sequence by combining two independent scoring methods. As for Mascot, the reference was the list of methionine-containing tryptic peptides (0 or 1 missed cleavage) derived from the complete E. coli K12 proteome. Optional modifications were oxidation of methionine and formation of pyroglutamic acid from amino-terminal glutamate and glutamine. For each peptide sequence, a theoretical spectrum was obtained by deriving its list of b and y ions. Each experimental spectrum containing at least six peaks was scored for similarity against every theoretical spectrum with a matching precursor ion mass.
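Deriving a theoretical spectrum from b and y ions can be sketched as follows. The sketch uses standard monoisotopic residue masses and singly charged fragments. The lowercase `m` for methionine sulfoxide is a hypothetical notation, and pyroglutamate formation is omitted for brevity.

```python
"""Theoretical b/y fragment ions for a peptide (illustrative sketch)."""

# Monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
OXIDATION = 15.994915  # +O on methionine (Met-SO)


def residue_masses(peptide: str) -> list[float]:
    """Hypothetical notation: lowercase 'm' marks an oxidized methionine."""
    return [RESIDUE["M"] + OXIDATION if aa == "m" else RESIDUE[aa]
            for aa in peptide]


def b_y_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ion m/z values."""
    masses = residue_masses(peptide)
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


def theoretical_spectrum(peptide: str) -> list[float]:
    """All b and y ions, sorted by m/z."""
    b, y = b_y_ions(peptide)
    return sorted(b + y)
```

For example, the y1 ion of a C-terminal lysine comes out at about 147.113, as expected for a tryptic peptide.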
The scoring scheme thus combines two distinct scoring methods. Each method quantifies the likelihood of the observed match between the experimental and the theoretical spectrum. The first score considers the number of matching peaks relative to the size of the spectra. A G-test was used to evaluate the probability that a match differs from expectation. The second score focuses on the amplitudes of the matching peaks relative to the overall amplitudes of the spectra. It expresses how far the correlation coefficient between the spectra deviates from expectation. The final score is the product of the two intermediate scores. Because this algorithm combines a probabilistic approach with signal correlation, we named it Procorr™. Procorr™ is accessible at penyfan.rug.ac.be/procorr/, and details will be published elsewhere.2
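Procorr™ itself is unpublished, so the following is only a hypothetical sketch of the two ingredients named above. The first is a G-test comparing the number of matching peaks with the number expected by chance. The second is a correlation of peak amplitudes. The sketch multiplies the two. Several details are assumptions and not Procorr™ parameters: the random-match probability, the 1-Da binning, the m/z range, and the 0.25-Da tolerance, which is borrowed from the Mascot MS/MS tolerance. The raw G and r values stand in for the probability-scaled scores described in the text.

```python
"""Two-part spectrum score in the spirit of the description.

Illustrative only: this is NOT the unpublished Procorr implementation.
"""
import math

MIN_PEAKS = 6  # spectra with fewer peaks are not scored


def g_statistic(matched: int, n_peaks: int, p_random: float) -> float:
    """G = 2 * sum(O * ln(O / E)) over matched / unmatched experimental peaks."""
    observed = (matched, n_peaks - matched)
    expected = (n_peaks * p_random, n_peaks * (1.0 - p_random))
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def binned(peaks, lo: float, hi: float, width: float = 1.0) -> list[float]:
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    vec = [0.0] * int(math.ceil((hi - lo) / width))
    for mz, intensity in peaks:
        if lo <= mz < hi:
            vec[int((mz - lo) // width)] += intensity
    return vec


def combined_score(exp_peaks, theo_mz, tol=0.25, lo=100.0, hi=2000.0) -> float:
    """Product of a peak-count G statistic and an intensity correlation."""
    n = len(exp_peaks)
    if n < MIN_PEAKS:
        return 0.0
    matched = sum(1 for mz, _ in exp_peaks if any(abs(mz - t) <= tol for t in theo_mz))
    # Hypothetical null model: chance that a peak lies within tol of any theoretical ion.
    p_random = min(max(len(theo_mz) * 2 * tol / (hi - lo), 1e-9), 1 - 1e-9)
    if matched <= n * p_random:
        return 0.0  # no more matches than expected by chance
    g = g_statistic(matched, n, p_random)
    r = pearson(binned(exp_peaks, lo, hi), binned([(t, 1.0) for t in theo_mz], lo, hi))
    return g * max(r, 0.0)
```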
The same MS/MS spectra were subjected to Procorr™ analysis, and matches deviating from expectation were retained at a confidence level of 98%. The number of identified proteins is 2.5 times lower than the number of identified spectra. A 98% confidence level for spectrum identification therefore corresponds to a confidence level of at least 95% for protein identification. Even if each of the at most 2% false spectrum assignments pointed to a different protein, false proteins would make up at most 2.5 × 2% = 5% of the identified proteins.
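The worst-case argument above amounts to a single line of arithmetic, sketched here with a hypothetical helper name.

```python
def protein_confidence(spectrum_confidence: float, spectra_per_protein: float) -> float:
    """Worst-case protein-level confidence (illustrative sketch).

    At most (1 - spectrum_confidence) * n_spectra spectra are false. If every
    false spectrum pointed to a different wrong protein, the false fraction of
    proteins would be (1 - spectrum_confidence) * n_spectra / n_proteins.
    """
    false_fraction = (1.0 - spectrum_confidence) * spectra_per_protein
    return max(0.0, 1.0 - false_fraction)


if __name__ == "__main__":
    print(round(protein_confidence(0.98, 2.5), 3))  # 0.95
```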
RESULTS
In Fig. 1 we illustrate how diagonal chromatography was adapted to sort subsets of peptides from very complex mixtures. During the first chromatographic step (run 1), peptides are separated and collected in fractions of appropriate time intervals (Fig. 1A). A specific modification reaction is then carried out in every fraction, altering the properties of a subset of peptides. Every fraction is then rerun under the same chromatographic conditions; we refer to this as the secondary run (run 2). In run 2, the altered peptides shift relative to their original positions in run 1, whereas the unaltered peptides do not (Fig. 1B). The shifted peptides can then be collected for analysis.
The number of secondary runs would in principle equal the number of fractions collected in the primary run. It can be reduced by combining primary fractions. This is done such that the shifted peptides of a given fraction do not overlap with the non-shifted peptides of the neighboring combined fractions. Depending on the extent of the shifts, four or more primary fractions can be combined, which reduces the number of secondary runs by the same factor (Fig. 1C). If necessary, the entire sorting procedure can be shortened further by using two or more columns operating synchronously. The unaltered peptides are mostly discarded. The sorted peptides are either analyzed on-line by mass spectrometry or collected for identification in a ternary, LC-MS-coupled system. Because fractions of the first chromatographic step are combined, modified, and run in a diagonal chromatographic manner, the procedure is called COFRADIC™.
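The diagonal principle can be illustrated with a toy simulation. In this sketch, only peptides containing the targeted residue (methionine) change retention time after the reaction. A primary fraction is rerun, and only peptides that now elute inside an earlier window are kept. The sequences, retention times, and per-peptide shifts are hypothetical toy values. The default window (7 to 1 min early) mirrors the Met-SO conditions described in this paper.

```python
"""Toy simulation of the diagonal sorting principle (illustrative sketch)."""
from dataclasses import dataclass


@dataclass
class Peptide:
    sequence: str
    rt: float            # retention time in the primary run (min)
    shift: float = 0.0   # hypothetical hydrophilic shift once modified (min)


def secondary_rt(p: Peptide) -> float:
    """Only Met-containing peptides are altered by the (oxidation) reaction."""
    return p.rt - p.shift if "M" in p.sequence else p.rt


def sort_fraction(peptides: list[Peptide], start: float,
                  early: float = 7.0, late: float = 1.0) -> list[str]:
    """Collect peptides of the primary fraction [start, start + 1) that elute in
    [start - early, start - late) during the secondary run."""
    in_fraction = [p for p in peptides if start <= p.rt < start + 1]
    return [p.sequence for p in in_fraction
            if start - early <= secondary_rt(p) < start - late]
```

For example, with toy peptides `AMK` (40.2 min, 3-min shift) and `AGK` (40.5 min), sorting the fraction that starts at 40 min keeps only `AMK`.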
Application of COFRADIC™ to the E. coli Proteome
An analysis of the predicted proteomes of different model organisms revealed that methionine-containing peptides provide the best representation of the predicted proteins. For instance, between 95.8 and 99.7% of the predicted E. coli proteins contain at least one methionine residue, depending on whether the initiator methionine is counted. Only 85.4% of the proteins contain cysteine. The same trend in amino acid representation is observed in other model organisms (data not shown). We therefore decided to select Met-containing peptides and used the oxidation of methionine to Met-SO as the sorting vehicle, since the sulfoxide-containing peptide is more hydrophilic than the unmodified peptide.
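The residue-coverage figures above come from counting how many predicted protein sequences contain a given residue, with or without the initiator methionine. A minimal sketch follows. It assumes the sequences are already available as plain strings; parsing ECOLI.dat is not shown, and the function name is hypothetical.

```python
def fraction_with_residue(proteins: list[str], residue: str,
                          skip_initiator_met: bool = False) -> float:
    """Fraction of protein sequences containing `residue` at least once.

    With skip_initiator_met=True, a leading Met is ignored, mimicking the
    "initiator methionine not counted" case (illustrative sketch).
    """
    hits = 0
    for seq in proteins:
        body = seq[1:] if skip_initiator_met and seq.startswith("M") else seq
        if residue in body:
            hits += 1
    return hits / len(proteins)
```

With three toy sequences such as `["MKLMR", "MAGCK", "MSTK"]`, Met coverage drops from 1.0 to one-third when the initiator methionine is ignored.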
COFRADIC™ was then used to sort the Met peptides present in a tryptic digest of a total, unfractionated 4 M urea extract of 50 × 10⁶ E. coli cells. Forty-eight fractions of 80 µl (1 min) each were collected during the primary reverse-phase HPLC run. The first fraction (number 1) was taken between 40 and 41 min after the start of the run, and the last (number 48) between 87 and 88 min (Fig. 1A). In every fraction, the methionine peptides were converted to their sulfoxide derivatives by a simple oxidation step. Conditions were established under which neither Cys nor Trp residues were oxidized and Met residues were not converted into their sulfones. Under the chromatographic conditions used, the oxidized Met peptides generally display a hydrophilic shift of 1 to 7 min. The extent and range of the hydrophilic shifts were similar for early- and late-eluting peptides. The same time shifts and peptide-selection intervals could therefore be used throughout the entire secondary run.
The sorted Met-SO peptides were collected during a 6-min-wide interval starting 7 min before the elution time of the unaltered peptides. This window is thus 6 times wider than the 1-min window in which the same peptides eluted during the primary run. This is an important aspect of COFRADIC™: the sorted peptides elute in a less compressed manner, which facilitates their identification by subsequent LC-MS/MS analysis.
In the COFRADIC™ mode (see Table I), we combined primary fractions 1 (40–41 min), 13 (52–53 min), 25 (64–65 min), and 37 (76–77 min). In the secondary run, we collected the sorted Met-SO peptides in eight subfractions each during the intervals 33–39 min (1₁–1₈), 45–51 min (13₁–13₈), 57–63 min (25₁–25₈), and 69–75 min (37₁–37₈). The eight "subfractions" (indexed 1–8) of each interval are thus linked to the number of their primary fraction. For instance, 1₁–1₈ denotes the eight secondary Met-SO subfractions derived from primary fraction 1. Identically subindexed subfractions were pooled and dried prior to MALDI or LC-MS/MS analysis: we pooled 1₁, 13₁, 25₁, and 37₁, then 1₂, 13₂, 25₂, and 37₂, and so on. Fig. 1C shows the UV absorption profile (214 nm) during the Met-SO-sorting procedure on the combined primary fractions 8 (47–48 min), 20 (59–60 min), 32 (71–72 min), and 44 (83–84 min). It also shows the collection of the Met-SO peptides during the intervals 40–46 min (8₁–8₈), 52–58 min (20₁–20₈), 64–70 min (32₁–32₈), and 76–82 min (44₁–44₈). A complete fraction collection protocol and a detailed timetable for the methionine peptide-sorting procedure are provided in Table I. The entire sorting procedure thus comprises one primary run followed by 12 secondary runs (2A–2L), which can be completed in less than 24 h.
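The pooling scheme can be generated programmatically. Twelve secondary runs, each contributing eight pools of identically subindexed subfractions, give 96 pooled samples. Split for two LC-MS/MS runs each, these match the 2 × 96 ternary runs mentioned further on in the Results. In this sketch, the run letters are an assumption: it assigns 2A, 2B, and so on in order of the first primary fraction, whereas Table I may assign them differently. The labels such as `"13_2"` are a hypothetical notation for subscripted subfractions.

```python
"""Pooling scheme for secondary subfractions (illustrative sketch)."""
import string


def subfraction_pools(n_fractions: int = 48, spacing: int = 12,
                      n_sub: int = 8) -> dict[str, list[list[str]]]:
    """Map each secondary run to its pooled subfractions.

    Labels such as "13_2" stand for subfraction 2 derived from primary
    fraction 13. Assumption: runs are lettered 2A, 2B, ... in order of their
    first primary fraction; Table I may assign the letters differently.
    """
    runs = {}
    for letter, first in zip(string.ascii_uppercase, range(1, spacing + 1)):
        members = range(first, n_fractions + 1, spacing)
        runs["2" + letter] = [[f"{f}_{k}" for f in members]
                              for k in range(1, n_sub + 1)]
    return runs
```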
An aliquot () of the combined secondary subfractions was analyzed by MALDI-TOF-MS. In this analysis, Met-SO peptides can be recognized by their typical neutral loss of methanesulfenic acid (64 atomic mass units). In this way, we detected at least 1720 different tryptic peptides, 1618 of which contained at least one oxidized methionine residue (data not shown). Fewer than 6% of the sorted peptides were therefore not confirmed as Met peptides. Either they lacked the specific fragmentation, or they did not contain methionine and slipped through the sorting process.
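Recognizing Met-SO peptides by their neutral loss amounts to finding pairs of peaks separated by the mass of CH3SOH (about 64 Da). The sketch below assumes singly charged MALDI ions, so that an m/z difference equals a mass difference. It uses a hypothetical 0.1-Da tolerance and also reproduces the "fewer than 6%" figure.

```python
"""Flag Met-SO candidates by the neutral loss of CH3SOH (illustrative sketch)."""

# Monoisotopic mass of methanesulfenic acid, CH3SOH (C1 H4 O1 S1), about 63.998 Da
CH3SOH = 12.0 + 4 * 1.007825 + 15.994915 + 31.972071


def neutral_loss_pairs(peaks: list[float], loss: float = CH3SOH,
                       tol: float = 0.1) -> list[tuple[float, float]]:
    """(parent, fragment) m/z pairs whose difference matches `loss` within tol.

    Assumes singly charged ions; tol is a hypothetical value.
    """
    ordered = sorted(peaks)
    pairs = []
    for i, parent in enumerate(ordered):
        for fragment in ordered[:i]:
            if abs((parent - fragment) - loss) <= tol:
                pairs.append((parent, fragment))
    return pairs


def fraction_unconfirmed(total: int, met_so: int) -> float:
    """Share of sorted peptides not confirmed as Met-SO peptides."""
    return (total - met_so) / total
```

With the counts above, `fraction_unconfirmed(1720, 1618)` gives about 0.059.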
For the identification of individual peptides and proteins, we analyzed the remaining material with an LC-MS/MS configuration, carrying out 2 × 96 ternary runs in an automated manner. The resulting spectra were searched against an E. coli K12 database consisting only of Met-containing peptides. This database contained 31,746 peptides in the mass range between 780 and 2400 Da. It was generated from the predicted K12 proteome, allowing 0 or 1 missed cleavage for trypsin. This reduction in database size was justified by the low number of non-methionine peptides identified in an initial MALDI-MS screening (see above).
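The Met-peptide database can be approximated by an in-silico tryptic digest. The sketch below assumes the usual trypsin rule (cleave after K or R, but not before P) and treats the 780–2400 Da range as neutral monoisotopic mass. Both are assumptions, and the sketch is not guaranteed to reproduce the 31,746 peptides reported above. Peptides with unusual residues (e.g. U or X) are skipped.

```python
"""In-silico digest for a Met-peptide database (illustrative sketch)."""
import re

RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein: str, max_missed: int = 1) -> list[str]:
    """Cleave after K/R unless followed by P; allow up to max_missed missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue sum plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def met_peptide_db(proteins: list[str], lo: float = 780.0,
                   hi: float = 2400.0) -> set[str]:
    """Unique Met-containing tryptic peptides within the mass range."""
    db = set()
    for protein in proteins:
        for pep in tryptic_peptides(protein):
            if "M" not in pep or any(aa not in RESIDUE for aa in pep):
                continue  # keep Met peptides; skip unusual residues
            if lo <= peptide_mass(pep) <= hi:
                db.add(pep)
    return db
```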
The Mascot search algorithm assigned 2167 MS/MS spectra to 1326 different peptides, corresponding to 754 different E. coli proteins, with a peptide identification probability of 95% or higher (Table II). The same MS/MS spectra were also analyzed with Procorr™, an in-house-developed peptide identification algorithm. Procorr™ provided an overall confidence of 98% for 1350 peptides and 807 different proteins. In this case, at most 43 spectra (0.02 × 2147 spectra), and thus at most 43 proteins, could have been falsely assigned (Table II). In total, 872 different proteins were identified. Of these, 689 proteins were found by both algorithms and are therefore high-confidence identifications, 118 were found with Procorr™ but not with Mascot, and 65 were identified by Mascot only. The complete protein list is provided in Supplemental Table III. A classification of the 872 proteins according to major functional categories and important pathways is represented as a virtual cell in Fig. 2.
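The bookkeeping behind these numbers is a set partition plus the false-assignment bound. The sketch below uses hypothetical function names. The bound is rounded before taking the ceiling to avoid floating-point artifacts.

```python
"""Bookkeeping for the two search algorithms (illustrative sketch)."""
import math


def categorize(mascot: set[str], procorr: set[str]) -> dict[str, set[str]]:
    """Split protein identifications into the three categories used in the text."""
    return {
        "both": mascot & procorr,
        "procorr_only": procorr - mascot,
        "mascot_only": mascot - procorr,
    }


def max_false_assignments(n_spectra: int, confidence: float) -> int:
    """Upper bound on falsely assigned spectra at a given confidence level."""
    return math.ceil(round((1.0 - confidence) * n_spectra, 9))
```

With the counts reported above, there are 689 + 118 + 65 = 872 proteins in total. Of these, 807 (689 + 118) are Procorr™ hits and 754 (689 + 65) are Mascot hits, and `max_false_assignments(2147, 0.98)` gives 43.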
We identified a small but still significant percentage of putative integral membrane proteins (13.1% of the inner membrane and 22.6% of the outer membrane components), which is much higher than achieved with conventional methods. We detected 26 proteins with a grand average of hydropathy (GRAVY) index (20) greater than 0.3, whereas all previous methods detected only two members of this class of proteins (see Supplemental Table III). As noted previously (2), membrane proteins are often detected via the tryptic peptides released from their extramembranous parts, which are accessible to the protease.
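A GRAVY value is the mean hydropathy over all residues of a protein. The sketch below assumes the Kyte–Doolittle scale, the usual basis of GRAVY, for reference (20). It applies the 0.3 threshold used above. The function names and toy sequences are hypothetical.

```python
"""GRAVY (grand average of hydropathy) with the Kyte-Doolittle scale (sketch)."""

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues of a protein sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydrophobic(proteins: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Names of proteins whose GRAVY exceeds the threshold used in the text."""
    return [name for name, seq in proteins.items() if gravy(seq) > threshold]
```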
Although the original complexity of the peptide mixture is reduced by a factor of approximately 5, the flux of peptides entering the ion source of the mass spectrometer is still too high for every peptide to be detected individually. Proteins represented by many Met peptides are therefore expected to be detected with a higher probability than proteins with few methionines. Such a bias toward Met-rich proteins is indeed observed when the percentage of identified proteins is related to the number of methionines per protein. The percentage increases nearly linearly, from 18% for the total predicted proteome to 43% for proteins with 10 or more Met residues (data not shown). It keeps increasing and reaches a plateau of 60% for proteins with more than 17 methionines (the latter value is statistically weak because few proteins are involved). Met-rich proteins are unlikely to be represented differently in the cell lysate than Met-poor proteins. Based on these observations, we assume that approximately the same percentage (about 50%) of all predicted E. coli proteins may be detectable in our system. This means that we most likely identified about 37% of the proteins actually present in the cell lysate.
DISCUSSION
However, it should be clear that methionine COFRADIC™, like most other peptide-based proteomic technologies described to date, does not allow protein modifications, protein processing/degradation, or different protein isoforms to be studied on a global scale. Typically, these topics have been addressed by separating protein mixtures on two-dimensional gels, followed by Western blotting with specific antibodies. Nevertheless, the COFRADIC™ concept allows the isolation of other representative peptides, provided that these can be specifically modified. We have recently altered the sorting chemistry such that the amino-terminal peptides of all proteins present in a mixture can be isolated.3 These peptides now allow protein amino-terminal processing to be analyzed on a global scale in a gel-free manner. Similarly, we are developing a sorting chemistry to specifically isolate phosphorylated peptides from protein digests.
In our analysis, we detected an unexpectedly large number of membrane proteins. This is a particular property of the peptide-based approach and was previously noticed by the group of J. Yates (2) using the MudPIT approach. Indeed, proteins can be trimmed in situ at their extramembranous parts, generating a set of hydrophilic peptides, some of which can serve as signature peptides. This approach, in which COFRADIC™ may play a crucial role, may offer a valuable alternative to procedures in which membrane proteins are isolated using new types of detergents or novel extraction protocols prior to gel separation (21, 22).
Several peptide-based proteome approaches have been described recently. The MudPIT technology of the group of J. Yates (2) identified more than 1500 different proteins from yeast. This impressive number was reached by accumulating data from three prefractionated lysates, each containing up to 400 µg of protein. That amounts to at least 100 times the starting material used in our study. In addition, MudPIT does not include a presorting step. The large number of peptides is therefore very difficult to separate, which compromises reproducibility from one experiment to the next.
COFRADIC™ includes a preselection step to reduce the number of peptides, similar to the isotope-coded affinity tag approach (3) and the covalent chromatography method (5). However, COFRADIC™ is much more versatile than these methods, because in principle any peptide carrying a group that can be specifically and quantitatively modified can be sorted. In the example shown here, we used one of the simplest modification reactions in protein chemistry: the conversion of a methionine side chain to its more hydrophilic sulfoxide derivative. A similar one-step reaction could be proposed for the selection of cysteine peptides. For instance, reduction of -S-S-R groups to the more hydrophilic thiol groups would cause a hydrophilic shift of SH-containing peptides during the second chromatographic step. An additional advantage is that all these different sorting protocols can be carried out by the same robots in a fully automated manner.
Peptide-based proteomics clearly offers high sensitivity, broad protein coverage, and full automation. However, protein identification depends more strongly than in conventional two-dimensional gel approaches on the confidence with which peptides are identified. Both the quality of the MS/MS fragmentation spectra of the individual peptides and the algorithms used to interpret these spectra are therefore of utmost importance.
To increase confidence in protein identification, we used Mascot as the first search algorithm and additionally used a second, in-house-developed algorithm, Procorr™. Because Procorr™ combines several parameters, we could apply more stringent criteria while still identifying more proteins. With Procorr™, we identified at least 807 different proteins with 95% probability, 53 proteins more than with Mascot. Taken together, both algorithms identify 872 different proteins, which fall into three categories. The first comprises proteins identified by both algorithms. The second comprises those identified only by Procorr™ with a peptide probability score of at least 98%. The third comprises those identified only by Mascot with a peptide probability score of 95%. These 872 proteins represent almost 40% of the estimated expressed E. coli proteome, as calculated from the identification rate of Met-rich proteins (see above).
Fig. 3 compares the distribution curves for the acidic proteins detected by COFRADIC™ with those of the total predicted proteome and of the proteins reported in the SWISS-2DPAGE database. COFRADIC™ detects more than 4 times as many proteins. The difference between the two data sets is even more striking for the basic proteins: many of the basic proteins found by COFRADIC™ are missing from the two-dimensional gel data.
In conclusion, we have demonstrated that COFRADIC™ constitutes a valid alternative for peptide-based proteomics. It is very sensitive and provides broad protein coverage, including abundant and rare, large and small, and acidic, basic, and hydrophobic proteins. For the E. coli proteome, we identified 872 different proteins with very high probability scores. This number could have been considerably larger but was limited by the peptide ion selection capacity of the mass spectrometer used in this study. We are therefore confident that new high-throughput instruments can provide complete peptide coverage and thus full coverage of the expressed proteome.
COFRADIC™ offers more to proteomics than a methionine peptide-sorting technology: it is a general concept that may become an indispensable tool for future proteomics. Its high sensitivity allows minute amounts of biological material to be analyzed, which was not possible with "classical" proteomic technologies. COFRADIC™ will make it possible to carry out targeted forms of proteomics in total cellular lysates. Examples include detecting and measuring protein cleavage, protein processing, and post-translational modifications.
ACKNOWLEDGMENTS
FOOTNOTES
1 The abbreviations used are: MudPIT, multidimensional protein identification technology; COFRADIC, combined fractional diagonal chromatography; LC-MS/MS, liquid chromatography-tandem mass spectrometry; Met-SO, methionine-sulfoxide; MALDI-TOF-MS, matrix-assisted laser desorption/ionization-time of flight-mass spectrometry; HPLC, high-pressure liquid chromatography.
2 G. R. Thomas, J. Vandekerckhove, K. Gevaert, and N. Berntenis, in preparation.
3 K. Gevaert, M. Goethals, L. Martens, J. Van Damme, A. Staes, G. R. Thomas, and J. Vandekerckhove, submitted.
4 K. Gevaert, A. Staes, J. Van Damme, H. Demol, and J. Vandekerckhove, in preparation.
* This work was supported by Fund for Scientific Research-Flanders (F.W.O.-Vlaanderen) Grants G.0044.97, G.0225.98, and G.0050.02 and by European Commission Grant QLK2-200-31536.
S The on-line version of this article (available at http://www.mcponline.org) contains Supplemental Table III.
A postdoctoral fellow of the Fund for Scientific Research-Flanders (Belgium) (F.W.O.-Vlaanderen).
Present address: VIB Proteomics Core Facility, Rijvisschestraat 118, B-9052 Zwijnaarde, Belgium.
¶ A research assistant of the Fund for Scientific Research-Flanders (Belgium) (F.W.O.-Vlaanderen).
Published, MCP Papers in Press, October 24, 2002, DOI 10.1074/mcp.M200061-MCP200
|| To whom correspondence should be addressed: Dept. of Biochemistry, Flanders Interuniversity Inst. for Biotechnology, Ghent University, A. Baertsoenkaai 3, B-9000 Ghent, Belgium. Tel.: 32-93313303; Fax: 32-93313597; E-mail: joel.vandekerckhove{at}rug.ac.be
REFERENCES