Department of Pharmaceutical Chemistry, Mass Spectrometry Facility, University of California San Francisco, San Francisco, California 94143-0446
¶ Institute for Neurodegenerative Diseases, University of California San Francisco, San Francisco, California 94143
|| Cardiovascular Research Institute, University of California San Francisco, California 94143-0130
ABSTRACT |
---|
Strategies for characterizing changes in complex mixtures have been developed using both of these technologies. For two-dimensional gels, samples may be run on separate gels, stained, and protein abundances compared with the use of imaging software (12). However, in practice, protein pattern comparisons can be difficult to achieve due to poor reproducibility of protein separations on two-dimensional gels. Defining an individual spot may not be straightforward, as a result of which a large amount of manual work may be required to complement the software interpretation in obtaining a reliable analysis (13). An approach that has alleviated many of the problems due to gel-to-gel electrophoretic variability is differential gel electrophoresis (14, 15).
Mass spectrometry (MS) is not a quantitative technique per se as ion yields are highly dependent on the chemical and physical nature of the sample. However, isotopic labeling combined with MS has been extensively used for many years to produce accurate quantitation of small molecules (16, 17), and, more recently, this has been extended to peptides (18) and proteins (19–23). The development of isotope-coded affinity tag (ICAT) reagents allows for quantitation through isotopic labeling and simultaneously achieves a reduction in sample complexity (24). These reagents consist of three functional parts: i) an iodoacetamide group that reacts with the free sulfhydryl group of a reduced cysteine side chain, ii) a biotin moiety to aid isolation of modified peptides by avidin affinity chromatography, and iii) a linker group that contains either heavy or light isotopic variants. For the first generation ICAT reagent, this linker region contained either eight deuterium (heavy reagent: d8) or eight hydrogen atoms (light reagent: d0) and therefore conferred a difference in nominal mass of 8 Da between heavy and light reagents. In a typical side-by-side experiment, one sample would be labeled with light reagent and the other with heavy reagent. After attachment of ICAT labels, samples are combined and the cysteine-containing components are affinity purified by means of the biotin tag. After mass spectrometric data acquisition, the resulting mass spectra would be searched for pairs of isotope envelopes differing in mass by 8 Da, and relative quantities of proteins could be determined by comparison of the integrated peak areas of the two corresponding isotope profiles. The collision-induced dissociation (CID) of peptides of interest by tandem MS (MS/MS) would give rise to a sequence-specific fragmentation pattern, from which the identity of the parent protein could be derived by either data base search algorithms or de novo CID spectral interpretation.
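The pair-matching step described above (searching spectra for isotope envelopes separated by the d0/d8 mass difference and comparing their integrated areas) can be sketched in a few lines. This is only a minimal illustration, not the software used in this work: the function name `find_icat_pairs`, the `(m/z, area)` peak-list format, and the 0.02 Da matching tolerance are assumptions made for the example.

```python
# Mass difference between deuterium (2H) and protium (1H), in Da.
D_MINUS_H = 2.014102 - 1.007825
# Spacing between d0- and d8-labeled forms for one labeled cysteine.
PAIR_SPACING = 8 * D_MINUS_H


def find_icat_pairs(peaks, charge=1, n_cys=1, tol=0.02):
    """Find light/heavy pairs in a peak list.

    peaks  : list of (m/z, integrated area) tuples (hypothetical format)
    charge : assumed charge state of the ions
    n_cys  : assumed number of labeled cysteines per peptide
    tol    : matching tolerance in m/z units (an assumption)

    Returns a list of (light m/z, heavy m/z, heavy/light area ratio).
    """
    spacing = n_cys * PAIR_SPACING / charge
    peaks = sorted(peaks)
    pairs = []
    for i, (mz_light, area_light) in enumerate(peaks):
        for mz_heavy, area_heavy in peaks[i + 1:]:
            delta = mz_heavy - mz_light
            if delta > spacing + tol:
                break  # peaks are sorted, so no later peak can match
            if abs(delta - spacing) <= tol:
                pairs.append((mz_light, mz_heavy, area_heavy / area_light))
    return pairs
```

With singly charged ions and one labeled cysteine, two peaks about 8.05 m/z units apart are reported as a pair together with their heavy-to-light area ratio.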
However, despite the clear merits of this approach, several shortcomings of this first generation reagent were identified: i) the d0- and d8-modified peptides did not coelute by reverse-phase chromatography, making quantitation less accurate (25); ii) the tag itself was quite bulky, and consequently fragmentation of modified peptides produced many fragments in the CID spectrum related to the tag rather than the peptide (26); iii) the substantial mass addition resulting from the attachment of the tag could shift the masses of larger peptides outside the optimum range for detection by standard MS instruments; and finally iv) the choice of 8 Da mass difference for the heavy ICAT reagent produced potential isobaric ambiguity between peptides containing two ICAT-labeled cysteine residues (M +16.100 Da) and the common oxidation of methionine residues (M +15.995 Da).
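The near-isobaric overlap in point iv can be checked with simple arithmetic. The sketch below uses standard monoisotopic masses for 1H, 2H, and 16O; the helper `ppm_needed` and the 2000 Da peptide used to illustrate it are assumptions chosen for the example only.

```python
# Standard monoisotopic masses (Da).
MASS_H = 1.007825
MASS_D = 2.014102
MASS_O = 15.994915

# Heavy-minus-light shift for a peptide carrying two d8/d0-labeled cysteines.
two_cys_shift = 2 * 8 * (MASS_D - MASS_H)

# Difference between that shift and a single methionine oxidation (+O).
gap = two_cys_shift - MASS_O


def ppm_needed(peptide_mass):
    """Mass accuracy (ppm) needed to tell the two shifts apart at a given mass."""
    return gap / peptide_mass * 1e6


print(f"two-cysteine shift: {two_cys_shift:.3f} Da")
print(f"gap to oxidation:   {gap:.3f} Da")
print(f"accuracy needed at 2000 Da: {ppm_needed(2000.0):.1f} ppm")
```

The two shifts differ by only about 0.1 Da, so distinguishing them relies on mass accuracy rather than on nominal mass.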
Hence, a second generation of such reagents has been developed. The first of these used ICAT reagents immobilized on beads and incorporated a photocleavable linker (27). Capture of the cysteine-containing peptides was followed by photocleavage-based elution of labeled peptides. Although this proved to be more sensitive than the first generation ICAT reagent, it retained the use of deuterium as the isotopic label and consequently still suffered from the chromatographic separation of light- and heavy-modified species.
Here we report the use of a commercial second generation ICAT reagent that contains an acid-cleavable linker group connecting the biotin moiety with the sulfhydryl reactive isotope tag. In this instance, ICAT labeling and biotin-based peptide affinity isolation are followed by acid cleavage, resulting in removal of the biotin moiety. The benefits of this step are the addition of a much smaller chemical moiety to the cysteine residue and improvement in the quality of CID fragmentation spectra obtained from modified peptides, especially larger species. Also, rather than using deuterium as the heavy isotope, this reagent employs nine 13C atoms as the isotopic label for the heavy reagent. Therefore, the heavy- and light-modified peptides coelute by reverse-phase chromatography, making quantitation simpler to achieve and the results more reliable.
In most ICAT studies reported thus far (28–31), sample availability has not been a limiting factor. Total amount of protein used in these studies ranged from 4.4 mg to 200 µg. Here we have sought to apply the technology to low-microgram sample quantities, consistent with the nature of many protein samples of interest in biomedical research. To reduce sample losses during the ICAT protocol, volatile buffers that could easily be removed by vacuum centrifugation were utilized wherever possible. Whereas in conventional ICAT analyses all noncysteine-containing peptides are typically discarded, here these peptides have been retained for separation and identification by multidimensional LC and MS/MS. Our approach serves to combine the differential profiling strength provided by the ICAT strategy with the high sequence coverage afforded by multidimensional LC.
To exemplify the power of this technology, we present data from two projects of biological significance that place high demands on sensitivity and analysis: i) the characterization of proteins binding to the murine prion protein, and ii) the characterization of proteins from human tracheal epithelium gland secretions. Data were collected using two different MS platforms, an MDS-Sciex QSTAR with electrospray ionization (ESI) for on-line nanoflow-high-pressure LC (HPLC) analysis and an Applied Biosystems 4700 Proteomics Analyzer utilizing matrix-assisted laser desorption/ionization (MALDI) with off-line analysis of previously separated nanoflow-HPLC fractions. The former is a quadrupole selection, quadrupole collision cell, orthogonal acceleration time-of-flight instrument (Qq-TOF), whereas the latter is an axial TOF/TOF instrument. We present a comparison of results on the basis of total number of peptides detected, total number of proteins identified, and proteins detected and quantitated by cleavable ICAT (cICAT) versus those identified in the flow through of the avidin chromatography. We also consider the accuracy of mass measurement in MS and MS/MS modes, sensitivity in both modes, sample throughput, and ease of use.
EXPERIMENTAL PROCEDURES |
---|
ICAT Labeling
Bovine serum albumin (BSA) was reduced, alkylated, and digested according to the standard protocol supplied with the original d0/d8 ICAT kit. The complex protein samples (5–15 µg) were labeled with the cleavable 13C-ICAT reagent using a modified protocol. Briefly, protein samples were denatured in 6 M urea/20 mM NH4HCO3, pH 8.2. Reduction with 1 mM trichloroethylphosphine was allowed to proceed for 20 min at 70 °C. ICAT reagents were dissolved in 20 mM NH4HCO3, pH 8.2, 10% acetonitrile (ACN), and labeling was carried out for 2 h at room temperature. Samples were combined and diluted 4-fold to reduce the concentration of urea to below 1.5 M. Tryptic digestion was initiated by the addition of 1% (w/w) of side-chain modified, tosylphenylalanyl chloromethyl ketone-treated porcine trypsin and was allowed to proceed at 37 °C for 4 h.
Cation Exchange Chromatography
SCX chromatography was used to remove neutral species from the tryptic peptides and to achieve peptide fractionation of the digest mixture. Tryptic digest samples were adjusted to 25% ACN and acidified (pH 3.0) by the addition of formic acid. HPLC was carried out using a Beckman Gold system equipped with an analytical µ-flow upgrade, with Rheodyne injection port and a 35-nl dead volume ultraviolet cell. Separation was achieved using multiple sample injections onto a 2.1 x 10 mm polysulfoethyl A column with a 240-µl injection loop. Solvent A consisted of 25% ACN, 0.05% formic acid, and solvent B consisted of solvent A with 400 mM NH4HCO3. A typical separation employed 0% B from 0–15 min to allow for sample loading and removal of nonpeptide species, followed by a gradient of 0–50% B from 15–22 min and 50–100% B from 22–23 min; finally, the column was washed with a solution of 1 M KCl in solvent A. Fractions were collected in 0.65-ml siliconized tubes.
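As a reading aid, the SCX gradient above can be written as a small lookup that returns the solvent composition at any time point. This is only a sketch: it assumes that each gradient segment is linear and that the NH4HCO3 concentration scales directly with the percentage of solvent B, and the function names are hypothetical.

```python
# (time in min, % solvent B) breakpoints taken from the gradient in the text.
GRADIENT = [(0.0, 0.0), (15.0, 0.0), (22.0, 50.0), (23.0, 100.0)]

# Solvent B is solvent A plus 400 mM NH4HCO3.
SALT_IN_B_MM = 400.0


def percent_b(t_min):
    """Percent solvent B at time t_min, assuming linear segments."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]  # hold final composition after the last breakpoint


def salt_mm(t_min):
    """Approximate NH4HCO3 concentration (mM) delivered at time t_min."""
    return percent_b(t_min) / 100.0 * SALT_IN_B_MM
```

For example, `salt_mm(22.0)` corresponds to 50% B, i.e. about 200 mM NH4HCO3 under these assumptions.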
Avidin Affinity Purification
The SCX-eluted fractions were neutralized by the addition of 2 volumes of 100 mM NH4HCO3, pH 9.5, using NH4OH (30%) as necessary to bring the pH up to 8.0. The above-mentioned HPLC was also utilized for the avidin affinity chromatography. The 20-µl avidin cartridge was primed with 1 ml 0.4% trifluoroacetic acid (TFA) in 30% ACN followed by 1 ml of 100 mM NH4HCO3 at pH 8.0. Samples were loaded using multiple injections on a 240-µl injection loop. The column was washed with 500 µl of 50 mM NH4HCO3, pH 8.0, followed by 500 µl of the same solution containing 10% methanol, followed by 1000 µl of HPLC-grade water. The SCX fractions were passed through the avidin column one at a time using a flow rate of 100 µl/min, and the flow through was collected for LC-MS/MS analysis. Labeled peptides were eluted with 200–400 µl of 0.4% TFA in 30% ACN as determined by the absorbance at 218 nm.
Nano-LC-MALDI-TOF/TOF Mass Spectrometry Analysis
The SCX fractions and avidin-eluted samples were subjected to nanoflow HPLC using the Ultimate LC system (Dionex) at a flow rate of 300 nl/min. Separation of peptides was achieved by a gradient of increasing ACN in water (2–34%) over 100 min using 0.1% w/v TFA as the ion-pairing agent on a 75-µm ID self-packed column. HPLC eluent was spotted directly onto the MALDI target plate using a Probot spotting robot (Dionex), supplemented with a sheath flow of 500 nl/min matrix solution (1:1 dilution of α-cyano-4-hydroxycinnamic acid with 70% methanol/0.4% TFA) spotting one fraction per minute. The Probot plumbing was replaced with capillary tubing using polyetheretherketone (PEEK) sleeves to reduce void matrix volume. Using this mixture, it was necessary to ensure that the elution capillary protruded no further than 2 mm from the matrix sheath needle, thereby preventing crystallization of the matrix on the tip.
MALDI-MS data were acquired in an automated mode using a 4700 Proteomics Analyzer (Applied Biosystems). This instrument employed a neodymium: yttrium aluminum garnet (Nd:YAG) frequency-tripled laser operating at a wavelength of 354 nm and a laser repetition rate of 200 Hz. Initially, a MALDI-MS spectrum was acquired from each spot (1000 shots/spectrum), then peaks with a signal-to-noise ratio (S:N) greater than 15 in each spectrum were automatically selected for MALDI-CID-MS analysis (7500 shots/spectrum). A collision energy of 1 keV was used with air as the collision gas for CID accumulation. After acquisition, the data were subjected to automatic baseline correction, mathematically smoothed, and stored in an Oracle data base. Assuming that all ions were singly charged, peaklists from all MS/MS spectra were automatically extracted from the Oracle data base and submitted for batch analysis data base searching using an in-house copy of Protein Prospector (version 4.3, University of California San Francisco, San Francisco, California) with the new program, LCBatch-Tag, or an in-house copy of Mascot, version 1.8 (Matrix Science). The latter was managed using the Mascot Daemon running on the same computer. MS/MS mass values submitted to both search engines were limited using the following criteria: a minimum S:N threshold of 8–10 was applied; masses of 0–60 Da and masses within 20 Da of the precursor were excluded; and a maximum of 60 peaks per spectrum were submitted.
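The peak-list criteria listed above can be expressed as a short filter. The sketch below is not the extraction code used in this work; the `(m/z, intensity, S:N)` tuple format and the function name are hypothetical, and because the text does not say how the 60 submitted peaks were chosen, keeping the most intense ones is an assumption.

```python
def filter_msms_peaks(peaks, precursor_mz, min_sn=10.0,
                      low_mass_cutoff=60.0, precursor_window=20.0,
                      max_peaks=60):
    """Filter an MS/MS peak list before data base searching.

    peaks : list of (m/z, intensity, S:N) tuples (hypothetical format)
    Returns the retained peaks sorted by m/z.
    """
    kept = [
        p for p in peaks
        if p[2] >= min_sn                             # S:N threshold
        and p[0] > low_mass_cutoff                    # drop 0-60 Da region
        and abs(p[0] - precursor_mz) > precursor_window  # drop near-precursor
    ]
    # Assumption: when more than max_peaks survive, keep the most intense.
    kept.sort(key=lambda p: p[1], reverse=True)
    return sorted(kept[:max_peaks])
```

The `min_sn` default of 10 sits at the upper end of the 8–10 range quoted in the text; it can be lowered to 8 for weaker spectra.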
Protein Prospector searches were performed by specifying the inclusion of high-energy fragment ions characteristic of the TOF/TOF instrument, whereas Mascot searches included only the low-energy fragment ions and internal ions. For externally calibrated spectra, the allowed mass tolerance specified between expected and observed masses for searches was ±75 ppm for MS data, ±200 ppm for MS/MS parent ions, and ±250 ppm for MS/MS fragment ions. In cases where internal calibrants were used, the analogous values were ±25, ±25, and ±150 ppm. All samples were searched against the nonredundant National Center for Biotechnology Information data base (NCBInr.10.25.2002).
Nano-LC-ESI-Qq-TOF Mass Spectrometry Analysis
Tryptic peptides were subjected to LC-MS/MS analysis on a QSTAR Pulsar mass spectrometer (MDS Sciex, Concord, Ontario, Canada) operating in positive ion mode. Chromatographic separation of peptides was performed as above except that formic acid was used as the ion pairing agent. The LC eluent was directed to a micro-ionspray source. Throughout the running of the LC gradient, MS and MS/MS data were recorded continuously based on a 6-s cycle time. Within each cycle, MS data were accumulated for 1 s, followed by two CID acquisitions of 2.5 s each on ions selected by preset selection parameters of the information-dependent acquisition method. In general, the ions selected for CID were the most abundant in the MS spectrum, except that singly charged ions were excluded and dynamic exclusion was employed to prevent repetitive selection of the same ions within a preset time. Collision energies were programmed to be adjusted automatically according to the charge state and mass value of the precursor ions. Peak lists for data base searching were created using a script from within the Analyst software. Searches were performed using the two search engines as above except that only the low-energy CID fragments characteristic of the ESI Qq-TOF instrument were considered. The allowed mass tolerance range between expected and observed masses for searches was ±100 ppm for MS peaks and ±0.1 Da for MS/MS fragment ions.
Protein Quantitation
Protein quantitation using ICAT pairs was performed by an initial analysis using two different software systems from Applied Biosystems: GPS Explorer in the case of TOF/TOF data and ProICAT in the case of Qq-TOF data. This was followed by manual confirmation of ICAT-labeled ions using these software programs and manual analysis for those that were not identified in an automated fashion. For manual quantitation, monoisotopic peak intensities were used initially followed by isotope envelope area for proteins that were of significant interest.
RESULTS |
---|
To improve the quality of the quantitative results and to maximize the signal, the ICAT-labeled peptides were retained on the monomeric avidin column until all cation fractions had been passed through and were then eluted in a single fraction. The flow-through fractions and the avidin eluate resulted in n + 1 samples for mass spectrometric analysis. Each of these was split into two identical subfractions, which were subjected to 100 min reverse-phase nanocapillary LC-MS/MS analyses; one using the online ESI-Qq-TOF and the other using the off-line MALDI-TOF/TOF instrument.
One experiment consisted of preparing two samples from the stock six-protein mixture, each containing 5 µg of total protein to give a theoretical ratio of 1. Data from this experiment are described in Tables I and II. Here we compare the number of peptides predicted to contain cysteine with the number detected after the standard cICAT procedure. Labeling of the cysteine thiols and the avidin affinity chromatography both proved to be efficient because no cysteine-containing peptides were detected in the flow-through fractions analyzed. The avidin elution fraction contained primarily cICAT-labeled peptides and, in the case of BSA, one to four nonlabeled peptides. In experiments carried out on BSA alone, the hydrophobic peptide DAFLGSFLYEYSR was found in both labeled and nonlabeled fractions. It was observed that the abundance of this peptide in the cICAT fractions could be decreased by the use of stronger wash conditions. Despite the observation of a limited amount of nonspecific peptide binding, overall the affinity chromatographic separation was highly specific and efficient.
The cICAT ratios were calculated using several methods: i) integration of the isotope envelope in a single MS spectrum or the averaged spectrum over the elution time of the peptide; ii) integration of the monoisotopic peak in a single MS spectrum; or iii) measurement of the monoisotopic peak intensity in a single MS spectrum. The maximum standard deviation for these measurements on individual peptides in ESI and MALDI spectra for peptides derived from transferrin was 0.108, with an average standard deviation of 0.047 (data not shown). Thus, all of these approaches gave reasonably accurate quantitation measurements, and there was no significant difference observed in quantitation values or reliability between the two mass spectrometer platforms. After adding a correction factor of 0.16 for ESI and 0.88 for MALDI to each ratio, the average heavy-to-light (H:L) ratio for all peptides derived from transferrin was 1.001 for ESI and 1.003 for MALDI, with standard deviations of 0.062 and 0.093, respectively. The correction factor was calculated by taking the average of all ratios in the data set and subtracting it from the theoretical ratio of 1. While the reason for the difference in the correction factors for these two platforms is unclear, their origin can be explained in part by the amount of 13C8 reagent impurity in the heavy-labeled tag (see Fig. 9, m/z 582.83). This example is an indication of the accuracy and precision achievable with this method when several cICAT-labeled peptides are identified from the same protein and a protein is present with a known ratio so that an accurate correction factor can be applied.
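The additive correction described above (theoretical ratio minus the mean observed ratio, added back to every ratio) can be sketched as follows. The function names are hypothetical and the input ratios are invented for illustration, chosen so that their mean is 0.84; they are not data from this study.

```python
from statistics import mean, stdev


def additive_correction(ratios, theoretical=1.0):
    """Correction factor: theoretical ratio minus the mean observed ratio."""
    return theoretical - mean(ratios)


def apply_correction(ratios, correction):
    """Add the same correction factor to every observed ratio."""
    return [r + correction for r in ratios]


# Hypothetical H:L ratios for peptides from a protein mixed 1:1.
observed = [0.80, 0.85, 0.90, 0.82, 0.83]
correction = additive_correction(observed)
corrected = apply_correction(observed, correction)

print(f"correction factor: {correction:.2f}")
print(f"corrected mean:    {mean(corrected):.3f}")
print(f"standard deviation unchanged: {stdev(corrected):.3f}")
```

Because the same constant is added to every ratio, the correction recentres the mean on the theoretical value but leaves the standard deviation of the ratios unchanged.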
A shortcoming of the original ICAT reagent, and indeed of any labeling method that relies on the 1H and 2H isotopes, is that peptides labeled with this reagent pair do not coelute by reverse-phase chromatography (25). BSA was labeled in a 1:1 ratio using the original ICAT and cICAT reagents. Five micrograms of total protein was carried through each procedure, but to simulate the low levels of protein anticipated in "real" biological samples, only 2.5% of the resulting sample from each was analyzed by LC-MALDI-MS and 2.5% by LC-ESI-MS (corresponds to 830 fmol BSA and transferrin each). The peptide SHC*IAEVEK labeled with the D8 tag eluted earlier than the D0-tagged peptide (Fig. 3a), whereas this same peptide when labeled with the cICAT reagent showed coelution of both the light and heavy variants (Fig. 2b). Panels a, b, and c indicate that the observed ratio of peptides derivatized with light or heavy original ICAT reagent changed substantially as they eluted from the LC column, whereas d, e, and f showed this was not the case for cICAT. Thus, quantitation was found to be more reliable using the cICAT reagent. These results were representative of all peptides studied, regardless of their elution time in a given LC run.
Shown in Fig. 5 are the CID spectra of the labeled peptide VVEQMC*VTQYQK, in the cICAT heavy and light forms, acquired with these two different instrument systems. This peptide was obtained in the course of the identification of proteins that interact with the prion protein (PrP), and in fact is a peptide derived from PrP. The fragmentation pattern is representative of some general features differentiating the two mass spectrometric measurements. One obvious difference is the distribution of fragment ions: using ESI we observed more y series ladders, whereas the MALDI spectrum showed slightly less preference for y series but an increase in the internal ions. In general, for MALDI-TOF/TOF analysis at the level of tens of femtomoles (or S:N >30–50 in the MS scan), large numbers of fragment ions are observed. Such CID spectra tend to give high scores when matched by data base searching, as shown by the example in Fig. 5, c and d, and therefore higher protein identification confidence levels. Conversely, for precursor ions of low signal-to-noise ratios, the general trend is for only a few fragment ions to be observed (Fig. 6a). For analyses carried out with the ESI Qq-TOF platform, the number of fragment ions was found to be less dependent on the precursor ion intensity, although as expected the y and b fragment ions of higher m/z tend to regress into the noise for the lowest intensity precursor ions.
Prion Protein-Containing Complexes
In one collaborative project, the combined multidimensional chromatography/cICAT protocol introduced above was employed to identify proteins that interact with the cellular PrP in mice. Our goal was to identify non-PrP components within immunoaffinity-purified protein complexes that contain the PrP. In particular, it was envisioned that such an approach could yield information about the cellular microenvironment and function of PrP. In order to distinguish specific PrP interactors from proteins that copurify nonspecifically, a negative control sample in which the PrP-specific antibody was omitted was processed in parallel. High sensitivity was particularly important in this study because only a limited amount of sample (i.e. low-microgram quantities) could be obtained.
From analysis of the mass spectrometric data, the identification of several proteins not previously implicated in binding/interacting with the PrP was established with high confidence. A number of proteins were common to both the negative control sample and the PrP-specific pull down. These included BSA that was employed to saturate nonspecific binding sites of chromatography matrices, avidin, and keratins 1 and 9. Furthermore, in the PrP-specific pull-down sample, both PrP and the PrP-directed antibody employed for immunoaffinity purification gave rise to a number of high-quality CID spectra. Nevertheless, of the 50 proteins identified with high confidence, 20 were unique to the test sample and absent from the control. Among these were N-CAM1, a known interactor of the cellular PrP (34), and N-CAM2, a low-abundance paralogue of N-CAM1 that is predominantly expressed in the olfactory bulb (35). Two of the CID spectra that aided the identification of N-CAM1 are shown in Fig. 6, a and b. Fifty-five peptides were identified as belonging to this protein, with 25 being nonredundant. Ten of these were identified using both mass spectrometer platforms, nine were unique to ESI-collected data, and six were unique to MALDI. Fig. 6c shows the sequence coverage obtained.
It was anticipated that quantitation by cICAT might permit the identification of proteins specifically involved in the development of prion diseases. To this end, a "dominant negative" mouse strain was employed that expresses physiological levels of a mutated PrP on a wild-type PrP-ablated background. Previously, it had been shown that this point mutation renders mice resistant to infection with prions (36). The quantitative comparison of samples derived from wild-type and PrP-mutant mice revealed no significant differences in the abundances of the identified proteins specific to the PrP pull-down samples. Some examples of identified proteins and corresponding H:L ratios are PrP (1.14), PrP-specific Fab (1.02), N-CAM1 (1.01), and contactin 1 (0.98). However, some of the proteins identified in the negative control did show changes in abundance ratios, such as the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (1.85), propionyl CoA carboxylase alpha subunit (1.67), and Na/K ATPase beta subunit (1.43). Therefore, on this occasion the combination of multidimensional chromatography and cICAT offered no significant advantage over multidimensional chromatography alone in terms of identifying proteins involved in the development of prion diseases. Detailed results of these experiments will be described elsewhere.2
Tracheal Epithelium Gland Secretions
In a second representative project, we investigated the changes in the proteome of the human airway lining fluid in a patient with cystic fibrosis in order to explore mechanisms of disease in these patients. Tracheal tissue was obtained from the explanted lung of a cystic fibrosis patient following lung transplantation. Tracheal tissue from the lungs of a donor that were not selected for transplantation was used as a control. After preparation of the trachea and cleaning of the tracheal epithelium, gland secretions were pipetted directly from the glands and collected. From each specimen, 3.5 µl of fluid was obtained.
An analysis was carried out using the protocols described above yielding close to 7000 CID spectra, from which a large number of proteins was identified. A summary of the proteins identified plotted versus the number of cysteine peptides they contain (in the mass range of 700–4000 Da) is presented in Fig. 7 and is reviewed in the "Discussion." Several proteins appeared to be at a high concentration based on the number of peptides identified. These included human serum albumin (HSA), mucin, lactotransferrin, serotransferrin, immunoglobulin, and several types of keratins. In addition, several proteins were identified that are believed to be expressed at low levels, such as kinases, receptors, and other signaling proteins and peptides. As an example of sequence coverage among the strongly represented proteins, HSA gave almost 94% peptide sequence coverage of the protein. Within the ICAT fractions, coverage of the cysteine-containing peptides for HSA was 90% (20 of 22) with only the tryptic peptides CCK and ETYGEMADCAK being missed.
For 4700 data, the script "peak to mascot" was used to create a peak list that was filtered on the basis of a minimum S:N threshold. In the case of Protein Prospector, we employed a new in-house program, PeakSpotter, to extract peak lists from TOF/TOF spectra that had been stored within the Oracle data base, again filtering the peak list on the basis of S:N.
For each data type, peak lists of all SCX fractions were combined into one text file for searching, and the cICAT fraction was searched both separately and as a combined list. For the identification of the major constituents of each sample, the three search engines agreed quite well, but for the less abundant components there was more variability in the protein identifications determined. By comparing matches between the different search engines, the lower-scoring matches could be assigned with higher confidence in those cases when their presence was reported by multiple search engines. Manual inspection and interpretation of selected spectra also confirmed this conclusion.
DISCUSSION |
---|
The key components required for optimizing the characterization of low-level biological samples are the quality of sample preparation, minimization of sample handling/losses, robustness of the protocol, mass spectrometric sensitivity, and the quality of the data processing and analysis. Sensitivity was important in this study because only limited amounts of sample were available. Other features critical for success of the technique were careful sample preparation to provide samples that were compatible with derivatization, efficient chromatographic separation, and the availability of high performance down-stream tandem mass spectrometric analysis. While the two biological projects outlined in this work were very different, both samples shared high complexity, low protein abundance, and were relatively difficult to extract from their natural biological sources.
It is well established that chromatographic separation of protein digests increases data density, i.e. the number of peptides that can be identified in complex samples by MS (32). In particular, peptides of low abundance or low ionization efficiency are more readily identified in separated digests, where suppression (41, 42) of ionization by other peptides is minimized. To further increase peptide coverage, datasets were collected from two different combinations of ionization strategies and machine architectures: on-line LC-ESI on a Qq-TOF instrument versus off-line collection of LC fractions for subsequent analysis using the TOF/TOF instrument (LC-MALDI). From these studies, we found that from 20 to 50% of all peptides detected were unique to the individual ionization strategy/instrument employed. The collection of such complementary datasets proved particularly beneficial whenever low-abundance proteins gave rise to low-confidence protein assignments based on a single ionization strategy/instrument. The overall amount of peptide needed for successful protein identification by MS/MS was found to be similar on both instruments (low numbers of femtomoles loaded or injected for an individual protein). In our hands, ESI tended to be more sensitive for the very lowest-level complex mixtures. Peptides of lower molecular mass were generally favored by LC-ESI, whereas LC-MALDI tended to identify fewer but larger peptides, thereby giving approximately equal percentage of protein coverage. It is also noteworthy that larger peptides generally give more definitive protein identifications; LC-MALDI is therefore a key element in the application of this methodology for all but the most sample-limited analyses.
Although the analysis of unseparated digests by MALDI MS/MS is 10 times faster than ESI LC-MS/MS, off-line separation for MALDI as used in this work precedes mass analysis and throughput is approximately two to four times slower than ESI LC-MS/MS experiments. This disadvantage of LC-MALDI may be outweighed by the fact that data can be collected from individual LC fractions for extended periods of time. On the other hand, the high sensitivity of the Qq-TOF instrument might allow for multiple injections of smaller fractions of material, combined with the use of exclusion lists created from previously identified proteins, thereby minimizing some of the restrictions imposed by the time-dependent nature of LC-ESI experiments. It should be noted that these comparisons are highly dependent on the sample and instrument configuration. For complex protein samples in the regime of hundreds of micrograms or more, the throughput may well be similar for either approach.
In our control experiments, quantitation accuracy for most peptides was within 10% deviation. However, ratios derived from some multiple-cysteine-containing peptides from albumin were off by as much as 60%. We therefore required an ICAT ratio difference of greater than 30% before reporting a change in protein abundance. In the PrP experiments, we expected to observe changes in protein levels for a few proteins at most. The experimentally introduced proteins identified in the experiment also served as a control for the cICAT ratios (Neutravidin 0.97, Fab or IgG-related 1.02). None of these internal standards gave cICAT ratios differing by greater than 10% from the theoretical ratio of 1, leading us to believe that the reduction and alkylation were complete for these proteins. Of the proteins specific to the PrP sample for which quantitation data were obtained, none showed significant changes in expression levels.
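The reporting threshold can be illustrated with the H:L ratios quoted in the Results. The sketch below assumes that the 30% criterion is applied symmetrically on a linear scale around the theoretical ratio of 1 (a fold-change treatment would set the lower bound differently); the function name and return labels are hypothetical.

```python
def classify_ratio(ratio, threshold=0.30, expected=1.0):
    """Label an H:L ratio relative to the reporting threshold.

    Assumption: the threshold is applied as |ratio - expected| > threshold.
    """
    deviation = ratio - expected
    if deviation > threshold:
        return "increased"
    if deviation < -threshold:
        return "decreased"
    return "unchanged"


# H:L ratios quoted in the Results section.
ratios = {
    "PrP": 1.14,
    "PrP-specific Fab": 1.02,
    "N-CAM1": 1.01,
    "contactin 1": 0.98,
    "glyceraldehyde-3-phosphate dehydrogenase": 1.85,
    "propionyl CoA carboxylase alpha subunit": 1.67,
    "Na/K ATPase beta subunit": 1.43,
}

for protein, ratio in ratios.items():
    print(f"{protein}: {ratio:.2f} -> {classify_ratio(ratio)}")
```

Under this reading, the four PrP-specific proteins fall within the threshold, while the three proteins from the negative control exceed it.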
In our initial analysis of the cystic fibrosis dataset, almost 7000 CID spectra submitted for data base searching yielded 1500 protein identifications that were based on a Protein Prospector score of >10 and/or a Mascot score of >40. From our experience, such cut-off criteria lead to a high number of false positives. For identifications close to the cut-off, the false positive rate can be greater than 75%. Using more stringent criteria, such as not allowing protein IDs based on several low-scoring peptides, the final number of high-confidence identifications was 311. Based on manual interpretation of a random sampling of spectra, we believe that the rate of false positives within these protein identifications is less than 1 in 40.
Of the 311 proteins identified in the cystic fibrosis sample, 285 contain at least one cysteine residue. By contrast, 72 proteins were identified on the basis of cICAT-labeled peptides (Fig. 7b). There are many factors that contribute to this low overall efficiency of detection by ICAT. First, it should be noted that the average protein was predicted to give 41 peptides in the mass range 700–4000 Da, only 8 of which would contain cysteine. By contrast, the average number of peptides detected per protein was only 3; therefore, many of the cysteine-containing peptides would be missed. Of course, many proteins have far fewer than 8 cysteines, and such proteins are less likely to be detected. Furthermore, such low-level analysis is likely to overlook certain peptides due to suppression effects. The presence of chemical or post-translational modifications and the occurrence of nonspecific cleavages will give peptides of unexpected mass that will not, in many cases, be correctly identified in the data base search using the current strategies. Furthermore, the information-dependent acquisition methodology employed in the ESI Qq-TOF will overlook peptides that cannot be selected for CID as they coelute with others giving stronger signals or, in some cases, with side-products of the cleavage reaction. In the case of LC-MALDI, suppression (even with LC separation) and limited sample amounts may restrict the number of peptides that can be analyzed in any one fraction.
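The prediction of how many peptides per protein fall in the 700–4000 Da window, and how many of those contain cysteine, can be approximated by an in silico digest. The sketch below is not the software used in this work. It assumes tryptic cleavage after K or R (except before P), no missed cleavages, and unmodified monoisotopic masses, so the mass added by the cICAT label is ignored. The function names and the test sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residue masses


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def digest_summary(sequence, low=700.0, high=4000.0):
    """Return (peptides in the mass window, of which cysteine-containing)."""
    in_range = [p for p in tryptic_peptides(sequence)
                if low <= peptide_mass(p) <= high]
    with_cys = [p for p in in_range if "C" in p]
    return len(in_range), len(with_cys)


# Hypothetical sequence, for illustration only.
toy_protein = "G" * 13 + "K" + "ACK" + "C" + "G" * 12 + "R"
print(tryptic_peptides(toy_protein))
print(digest_summary(toy_protein))
```

In this toy sequence the short peptide ACK falls below the mass window, so only one of the two peptides in range carries a cysteine, which mirrors why few predicted peptides are usable for cICAT quantitation.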
The value of combining multidimensional chromatography and cICAT strategies is that we obtain very comprehensive analysis of the sample composition with the opportunity to derive accurate quantitation on a subset of proteins. Whether this extra layer of information is useful will depend upon the nature of any specific research goal. Thus, we obtained no new information on up- or down-regulation of the interaction partners of the PrP, whereas several proteins of interest in cystic fibrosis research were revealed by the cICAT ratios. As for the precision of the quantitation, changes in protein abundance of greater than 30% were detected with confidence, with deviations of 10% being typical. Thus, quantification was found to be at least as accurate as gel analysis when using silver or Coomassie Brilliant Blue staining or mRNA profiling by expression array methods (43). While we have not performed a systematic analysis of the effective dynamic range for heavy-to-light quantitation, results from these experiments give a sense of the usable range. A number of protein expression differences of up to a factor of 5 were observed and showed deviations within the errors described above. Differences observed in the range of a factor of 10 were infrequent, and such assignments are, in most cases, ambiguous or unreliable as the peptide derived from the less-abundant sample would not be selected for CID in our analyses. If a clear identification was made for one of these "singlets," it would not necessarily indicate that the protein was present in one sample and not the other. Fig. 8 illustrates the difficulty of achieving quantitative results when the intensity of one of the cICAT-labeled components is close to the signal-to-noise or chemical noise level of the MS scan. The isotope envelope area of the light-labeled peptide in this example cannot be measured accurately, although the ratio of the monoisotopic peak intensities gives an approximation of the ratio.
In general, complex mixtures contain significant amounts of chemical noise in any given MS scan, which can be falsely interpreted as representing a potential cICAT partner. As a result, abundance differences greater than 5-fold between samples are likely to result in the more abundant ions being treated as singlets. This indicates the maximum realistic dynamic range of this technique when working with the small amounts of protein described in this work. However, for many biological questions a 5-fold or greater difference in protein abundance represents a significant change, and follow-up studies can provide more precise results. It should also be noted that a disadvantage of this or any other technique that acquires quantitative information at the peptide level rather than the protein level is the loss of information concerning post-translational modifications.
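The way a large abundance difference turns a cICAT pair into an apparent singlet can be sketched as a simple signal-to-noise test. This is an illustrative model only, not the analysis used in this work: the function name `quantify_pair`, the minimum signal-to-noise value of 3, and the intensity values are all assumptions.

```python
def quantify_pair(heavy, light, noise, min_sn=3.0):
    """Classify a candidate cICAT pair from monoisotopic peak intensities.

    Returns (status, heavy_to_light_ratio). A ratio is reported only when
    both partners exceed the assumed signal-to-noise cutoff `min_sn`.
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    heavy_ok = heavy / noise >= min_sn
    light_ok = light / noise >= min_sn
    if heavy_ok and light_ok:
        return "pair", heavy / light
    if heavy_ok or light_ok:
        # One partner is lost in chemical noise: absence is not proven.
        return "singlet", None
    return "undetected", None


# Hypothetical intensities (arbitrary units) with a noise level of 50.
print(quantify_pair(1000.0, 250.0, 50.0))  # both partners above the cutoff
print(quantify_pair(1000.0, 100.0, 50.0))  # light partner too close to noise
```

With these numbers a 4-fold difference is still quantified, whereas a 10-fold difference leaves the weaker partner below the cutoff and the stronger one is treated as a singlet.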
In summary, the technical methodology developed here allows for the comprehensive analysis of large protein complexes and substantially improves the sequence coverage obtained for low-level samples, thereby resulting in more protein identifications with higher confidence. This has been illustrated here by the successful application of this approach to samples of biological interest that would be difficult to analyze by other methods. The combination of SCX chromatography and cICAT labeling gives a broad and comprehensive picture of all proteins present in a complex sample, while simultaneously providing relative quantitative data on a significant fraction of the proteins identified. The combination of two different separation and analysis platforms also yields complementary information that greatly improves the confidence in the identifications of the less-abundant proteins, which incidentally may represent the species of greatest interest.
ACKNOWLEDGMENTS |
---|
FOOTNOTES |
---|
Published, MCP Papers in Press, May 23, 2003, DOI 10.1074/mcp.M300021-MCP200
1 The abbreviations used are: LC, liquid chromatography; ICAT, isotope-coded affinity tag; MS, mass spectrometry; MS/MS, tandem mass spectrometry; ESI, electrospray ionization; MALDI, matrix-assisted laser desorption/ionization; ACN, acetonitrile; PrP, prion protein; SCX, strong cation exchange; CID, collision-induced dissociation; TOF, time-of-flight; cICAT, cleavable ICAT; BSA, bovine serum albumin; HPLC, high-performance LC; TFA, trifluoroacetic acid; S:N, signal-to-noise ratio; Qq, quadrupole selection, quadrupole collision cell; H:L, heavy-to-light ratio; HSA, human serum albumin.
2 Schmitt-Ulms, G., Hansen, K. C., Liu, J., Cowdrey, C., Yang, J., DeArmond, S. J., Cohen, F. E., Prusiner, S. B., and Baldwin, M. A., manuscript in preparation.
3 Hirsch, J., Song, Y., Hansen, K. C., Thiagarajah, J. R., Matthay, M. A., Burlingame, A. L., and Verkman, A. S., manuscript in preparation.
* This work was supported by Grants RR-01614, 14606, and 12961 from the National Institutes of Health.
To whom correspondence should be addressed. Tel.: 415-476-4895; Fax: 415-502-1655; E-mail: khansen{at}itsa.ucsf.edu
** To whom correspondence should be addressed. E-mail: alb{at}itsa.ucsf.edu
REFERENCES |
---|