From the Department of Chemistry and Biochemistry,
Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado 80309-0215
ABSTRACT
Quantifying changes in protein abundance between samples is a key requirement for profiling changes in cell state at a molecular level. One approach uses isotope or mass tag labeling of peptides where two samples to be compared are covalently modified by isotopically distinguishable (e.g. 1H versus 2H, 12C versus 13C, or 14N versus 15N) but chemically similar adducts, the samples are proteolyzed and mixed, and relative changes in protein abundance are determined from ratios of intensities between the differentially labeled peptides (7–9). Stable isotope labeling enables relative peptide abundances to be directly compared, bypassing problems due to ion-suppressive effects of coeluting peptides (10). Studies reporting the accuracy and variability of ICAT experiments suggest the technique can reliably detect down to a 1.5-fold change in protein abundance over a dynamic range from 10- to 100-fold (11–14). Disadvantages include cost of isotopic labeling and requirement for pairwise comparisons between samples, which prevent retrospective comparisons and complicate large studies.
Label-free protein quantitation methods are promising alternatives. Three studies using standards have demonstrated that mass spectral peak intensities of peptide ions correlate well with protein abundances in complex samples. Bondarenko and co-workers demonstrated linear responses of peptide ion peak areas between 10 and 1,000 fmol of myoglobin spiked into human plasma with a relative standard deviation <11% (15, 16). Likewise Wang et al. (17) published similar results with protein standards spiked into serum, obtaining a median relative standard deviation of 26% for peak intensity ratios from 3,400 ions in 25 replicate measurements. Another label-free method, termed spectral counting, compares the number of MS/MS spectra assigned to each protein. Spectral counts of standard proteins added to yeast extracts showed linearity over 2 orders of magnitude with high correlation to the relative protein concentration (18). An advantage of spectral counting is that relative abundances of different proteins can in principle be measured. Thus, significant correlations have been shown between spectral counts and independent estimates of protein copy number in yeast (19).
Each of these studies utilized relatively simple analytical protocols; the application of these methods to mammalian systems, where the dynamic range and the number of proteins are greater, requires methods for dealing with multiple chromatographic separations. Here we investigate the performances of peptide ion intensity measurements and spectral counting in analyzing changes in protein abundances in complex samples from human cells.
EXPERIMENTAL PROCEDURES
Three experimental samples were analyzed in this study. Sample 1 consists of soluble K562 proteins that were proteolyzed and then separated into 20 SCX-HPLC fractions in three replicate fractionations. Two analyses were carried out with these samples: (a) reversed-phase replicate analysis where fractions from one SCX replicate were split and each was analyzed by RP-LC/MS/MS over the full m/z range from 350 to 1,500 Da for each survey scan and (b) SCX replicate comparison where fractions from two of the replicate SCX-HPLC separations were analyzed by RP-LC/MS/MS over the full m/z range from 350 to 1,500 Da for each survey scan. Sample 2 consists of proteins in gel filtration fractions that were proteolyzed; peptides were resolved into 20 fractions by SCX-HPLC, and each SCX fraction was analyzed by RP-LC/MS/MS in one full m/z range run and six gas phase fractionation runs covering 300–678, 670–798, 798–918, 910–1,038, 1,030–1,278, and 1,270–1,750 Da (20). Sample 3 consists of soluble extracts from cells that were (i) unstimulated, (ii) stimulated with 10 nM PMA to induce cell differentiation, or (iii) pretreated with 20 µM U0126 prior to 10 nM PMA to block cell differentiation. Soluble proteins were extracted under each condition and proteolyzed, peptides were resolved in six fractions by SCX-HPLC, and SCX fractions were analyzed by RP-LC/MS/MS in one full m/z range (250–1,500 Da) and three gas phase fractionation analyses covering 350–806, 794–1,131, and 1,119–1,600 Da. Statistics for the three samples are listed in Supplemental Table 1, A and B.
For experiments where protein standards were added to a complex experimental sample, a peptide digest of 3 mg of protein from the soluble extract was separated into 20 SCX fractions, and two SCX fractions containing 13 and 11% of total peptides (calculated from A280 of each fraction normalized to total A280 summed over all fractions) were combined. Tryptic digests of BSA or bovine apotransferrin (ApoT), reduced and alkylated with iodoacetamide, were added to the pool, and 1.9% of the resulting mixture was used in each LC/MS analysis. This corresponds to 10 µg of peptide from the original K562 extract in each analysis plus varying amounts of standard protein digests maximally added at 20 pmol of BSA + 5 pmol of ApoT (0.3 µg of total standard protein).
Quantitation of Proteins between Gel Filtration Fractions
Proteins in fractions from sizing gel exclusion were separated by 10% SDS-PAGE and visualized by Coomassie staining. Gel images were scanned (UMAX 1200), and staining intensities were quantified using the public domain NIH Image program (rsb.info.nih.gov/nih-image/), measuring band volumes (pixel intensity integrated over band area) in arbitrary units. The ratio of the band volumes between different lanes was calculated and compared with ratios of protein abundance determined from spectral ion intensities or spectral counting. Proteins in selected bands were identified by in-gel digestion and MS analysis as described previously (23) using an ABI DE-STR MALDI-TOF mass spectrometer for peptide mass fingerprinting and an Agilent XCT ion trap mass spectrometer for LC/MS/MS sequencing.
Microarray Measurements
K562 cells (7 × 10⁵) were cultured and treated with PMA ± U0126 as above, and total RNA was isolated by TRIzol extraction (Invitrogen). Independent experiments were performed in triplicate for each condition. First and second strand cDNA synthesis, in vitro transcription of biotin-labeled cRNA, and fragmentation were carried out following standard protocols from the Affymetrix Expression Analysis Technical Manual (www.affymetrix.com). The samples were hybridized onto U133 2.0 Plus GeneChips (Affymetrix) and processed at the University of Colorado Health Sciences Center Cancer Center Microarray Core facility. Datasets were corrected for background and normalized using robust multiarray average normalization (24). p values for assessing significance of changes were determined by using an empirical Bayes approach and were corrected for multiple testing using the false discovery rate (25, 26).
Mass Spectrometry
Each SCX fraction was analyzed by RP-LC/MS/MS using a ThermoFinnigan LCQ Deca ion trap mass spectrometer. MS/MS was performed by acquiring one full scan mass spectrum and then MS/MS spectra of the three most intense peaks. Automated gain control was set at 8 × 10⁸. The m/z range either covered a full mass window (350–1,500 Da) or was divided into three to six narrow windows for gas phase fractionation over 350–1,750 Da (27–29). DTA files were generated from the MS/MS spectra by TurboSequest and searched by Sequest or concatenated into a Mascot Generic File for searching with Mascot; searches used the International Protein Index (IPI) database from European Bioinformatics Institute (version 2.18 for Sample 2; version 3.0, November 2004 for Samples 1 and 3) (30). Sequest and Mascot search results were parsed into an Oracle 9i relational database using in-house parsers and a modified version of DBParser (31). Peptide identifications were filtered by MSPlus, a program for improved peptide identification and validation, which determines confidence based on agreement between Sequest and Mascot identifications as well as chemical properties of the peptides. The high confidence identifications are then processed by Isoform Resolver, which uses a peptide-centric database strategy to assemble validated peptide sequences into protein identifications, accounting for groups of isoforms and splice variants that share common peptides (20). Quantitation by spectral ion intensity measurements or spectral counting was performed on the high confidence peptide sequences using the isoform groupings generated by Isoform Resolver.
Serac: New Software for Protein Quantitation
The Serac PeakExtractor module measures peptide ion intensities by calculating peak areas from extracted ion chromatograms (XICs) for each peptide ion in each LC/MS run. PeakExtractor was written in Visual Basic and requires the Xcalibur development kit for accessing Finnigan LCQ .RAW data files. Each peptide ID is mapped back to its originating .RAW data file, the file is opened, and an XIC is generated by summing the intensities within a narrow m/z range from the full scan mass spectra for each scan cycle using a lower limit of ((peptide monoisotopic mass − 1) + charge)/charge and upper limit of ((peptide average mass + 2) + charge)/charge. Built-in Xcalibur development kit functions for peak finding, peak smoothing, and peak integration are used to determine the peak area intensity for the peptide with user-specified parameters to control each step. Parameters are determined that optimize peak finding, base-line subtraction, and integration for each experimental dataset. For example, Fig. 1A shows the interface that is used to optimize scanning and peak integration parameters for the ion trap MS to minimize the presence of peak heterogeneity; such parameters must be reoptimized whenever changes are made to the experimental protocol. Once the parameters have been established for each experiment, PeakExtractor analyzes the entire dataset in a batch operation. The running time to analyze a dataset with 91,273 high confidence MS/MS identifications (Fig. 5, Sample 3) on a Pentium 4 processor was 3 h.
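The m/z window used for XIC generation can be sketched in a few lines. This is an illustration only, not the PeakExtractor code (which is Visual Basic built on the Xcalibur development kit): the function names, the scan representation, and the example masses are hypothetical, and the lower limit is assumed to read "monoisotopic mass minus 1".

```python
def xic_window(mono_mass, avg_mass, charge):
    """Return (lower, upper) m/z limits for summing full-scan intensities.

    Follows the limits quoted in the text, with the proton mass
    approximated as 1 (hence the "+ charge" term).
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    lower = ((mono_mass - 1.0) + charge) / charge   # assumed "mass - 1"
    upper = ((avg_mass + 2.0) + charge) / charge
    return lower, upper


def extract_xic(scans, lower, upper):
    """Sum intensities inside [lower, upper] for each full scan.

    `scans` is a hypothetical in-memory form: one list of
    (mz, intensity) pairs per scan cycle.
    """
    return [sum(inten for mz, inten in scan if lower <= mz <= upper)
            for scan in scans]


# Hypothetical doubly charged peptide
lo, hi = xic_window(mono_mass=1000.0, avg_mass=1000.6, charge=2)
scans = [[(500.0, 10.0), (501.0, 5.0), (502.0, 7.0)], [(501.5, 3.0)]]
xic = extract_xic(scans, lo, hi)
```

Peak finding, smoothing, and integration would then operate on `xic`; those steps depend on vendor functions and are not reproduced here.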
Protein Ratios from Peak Area Intensities
The Serac PASC module calculates ratios of ion intensities for peptides matched between different experiments and averages the peptide ratios as a measure of protein change as in stable isotope labeling studies (32, 33). After removing potential outliers using Dixon's Q-test (34), the program calculates mean and standard deviations of log2 intensity ratios of matched peptides found in each protein. A Student's t test is used to identify proteins that change significantly between datasets, resulting in two-tailed p values for each protein, similar to methods used in microarray analysis for identifying differentially expressed genes (35).
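The per-protein calculation can be illustrated as follows. This is a simplified sketch, not the PASC implementation: it uses an unweighted mean, omits the Dixon's Q-test outlier step, assumes the ratio is taken as sample B over sample A, and stops at the one-sample t statistic (the two-tailed p value would come from a t distribution with n − 1 degrees of freedom). The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def protein_log2_ratio(areas_a, areas_b):
    """Mean, SD, and t statistic of log2(B/A) over matched peptide ions.

    areas_a[i] and areas_b[i] are peak areas of the same peptide ion
    in samples A and B. At least two matched ions are required.
    """
    if len(areas_a) != len(areas_b) or len(areas_a) < 2:
        raise ValueError("need >= 2 matched peptide ions")
    ratios = [math.log2(b / a) for a, b in zip(areas_a, areas_b)]
    n = len(ratios)
    m = mean(ratios)
    sd = stdev(ratios)                  # sample SD (n - 1 denominator)
    if sd == 0:
        t = 0.0 if m == 0 else math.copysign(math.inf, m)
    else:
        t = m / (sd / math.sqrt(n))     # tests H0: mean log2 ratio = 0
    return m, sd, t


# Three matched peptide ions with 2-, 4-, and 8-fold increases
m, sd, t = protein_log2_ratio([1.0, 1.0, 1.0], [2.0, 4.0, 8.0])
```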
Protein Ratios from Spectral Counts
PASC also combines spectral counts for each protein using the protein isoform grouping identified by Isoform Resolver and counting the number of MS/MS spectra assigned to each protein. Spectral counts for peptides shared between isoforms were considered separately due to ambiguity in protein assignment. Differences in spectral counts are identified by applying a likelihood ratio test (G test) for independence (36), which is similar to the χ² and Fisher's exact tests and corrects for variations in total counts between datasets. The G statistic is approximately distributed as χ² with one degree of freedom, allowing p value calculations for each protein that aid in identifying those with differential expression. To quantify changes in spectral counts, we estimated -fold changes as proposed by Beissbarth et al. (37) for serial analysis of gene expression (SAGE) data, which avoids the discontinuity seen in simple count ratios when a protein shows spectral count = 0 in one of the samples.
$$R_{SC} = \log_2\left(\frac{n_2 + f}{n_1 + f}\right) + \log_2\left(\frac{t_1 - n_1 + f}{t_2 - n_2 + f}\right) \qquad \text{(Eq. 1)}$$

where, for each protein, RSC is the log2 ratio of abundance between Samples 1 and 2; n1 and n2 are spectral counts for the protein in Samples 1 and 2, respectively; t1 and t2 are total numbers of spectra over all proteins in the two samples; and f is a correction factor set to 0.5 by Beissbarth et al. (37) and varied in this study. This expression has the advantage of correcting for differences in sampling depth between two experiments.
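The count-ratio estimate can be written out as a short function. This sketch assumes the form RSC = log2[(n2 + f)/(n1 + f)] + log2[(t1 − n1 + f)/(t2 − n2 + f)], using the symbols defined above; the function name and the example counts are hypothetical.

```python
import math


def rsc(n1, n2, t1, t2, f=0.5):
    """log2 abundance ratio (sample 2 over sample 1) from spectral counts.

    n1, n2: spectral counts for the protein in samples 1 and 2
    t1, t2: total spectra over all proteins in each sample
    f:      correction factor that keeps the ratio finite at count 0
    """
    return (math.log2((n2 + f) / (n1 + f))
            + math.log2((t1 - n1 + f) / (t2 - n2 + f)))


# A protein seen 0 times in sample 1 and 3 times in sample 2,
# with equal sampling depth in the two runs
r_default = rsc(0, 3, 1000, 1000)          # f = 0.5
r_larger_f = rsc(0, 3, 1000, 1000, 1.25)   # larger f shrinks the ratio
r_equal = rsc(5, 5, 1000, 1000)            # equal counts give 0
```

The first term carries the protein's own count ratio; the second term adjusts for unequal totals and is close to zero when the protein accounts for a small share of all spectra.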
RESULTS
For each identified and validated peptide, PeakExtractor calculates a peak intensity area representing the abundance of each charge state in each RP-LC/MS/MS run using XICs of LC/MS spectra (see "Experimental Procedures" for details). Different charge forms of one peptide are treated as separate observations. A Visual Basic interface enables manual inspection of individual XICs (Fig. 1A) and optimization of parameters for XIC summation, background subtraction, peak detection, and detection of possible heterogeneity in MS/MS spectra within single peaks (Fig. 1B). PeakExtractor allows users to define parameters for peak fitting and methods for combining information from different LC/MS analyses. When several peaks appear in the XIC, the program chooses the peak with the correct MS/MS spectra for the peptide of interest. When different peptides with similar m/z values coelute, PeakExtractor accepts only those peaks with one peptide ion uniquely associated with it. PeakExtractor interfaces directly with a relational database such as Oracle or Microsoft Access (Fig. 1C) that is then used by the PASC analysis modules.
Correcting Peptide Peak Area Intensity Ratios for Systematic Bias
Peak area (PA) intensity measurements of peptide ions from replicate RP-LC/MS/MS runs were compared for 16 SCX fractions from Sample 1. An example of systematic bias in signal is shown in Fig. 2A where variable recovery between replicate analyses was apparent in peptides from certain SCX fractions that deviated from the diagonal. These may be due to variations in sample loading, HPLC, or MS instrument performance. Anderle et al. (38) showed that systematic variations in mass spectrometric ion current ratio measurements could be minimized by normalizing each peak intensity by the sum or median of ion intensities over the entire run (including ions validated or not validated by MS/MS). We found that correcting each peptide intensity by the sum of total intensities centered the distribution of off-diagonal points found in various SCX fractions and reduced the variability around the population mean (Fig. 2, A and B) by reducing the log2 peptide ratio standard deviation from 0.66 to 0.53. In general, inconsistent loading was the most common cause of systematic variation between reversed-phase runs and the normalization procedure corrected for this bias.
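The effect of normalizing by total ion intensity can be shown with a toy example. This is a sketch under stated assumptions, not the Serac code: the function names and all numbers are invented, and the "total" is simply supplied as the summed ion intensity of each run.

```python
import math


def normalize_by_total(peak_areas, total_ion_intensity):
    """Divide each peptide peak area by the run's summed ion intensity."""
    return {pep: area / total_ion_intensity
            for pep, area in peak_areas.items()}


def log2_ratios(run_a, run_b):
    """log2(B/A) for peptide ions observed in both runs."""
    shared = run_a.keys() & run_b.keys()
    return {pep: math.log2(run_b[pep] / run_a[pep]) for pep in shared}


# Hypothetical replicate runs where run B was loaded with twice the sample
run_a = {"PEP1": 1e8, "PEP2": 4e8}
run_b = {"PEP1": 2e8, "PEP2": 8e8}
raw = log2_ratios(run_a, run_b)                       # biased: all +1
corrected = log2_ratios(normalize_by_total(run_a, 1e10),
                        normalize_by_total(run_b, 2e10))
```

A loading difference shifts every raw log2 ratio by the same amount, which is why dividing by a run-wide total re-centers the distribution on zero.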
Similar correlations were observed between replicate SCX runs using either method that could be attributed to the peptide elution behavior where 90% of peptides eluted in only one SCX fraction (data not shown). We therefore chose to use the second approach for further studies, summing the intensity measurements of those peptide ions observed in multiple SCX fractions to compensate for variable SCX separation in more complex samples. Methods are included in the software for maximizing the alignment of SCX fractions when there is large chromatographic variation along with validation methods to easily detect outliers or systematic biases when changes are made to the overall system.
Random noise is introduced into the system at various steps, including SCX and reversed-phase chromatography, and contributes to the spread of the log2 peptide ratios around zero. The standard deviation of log2 peptide ratios for SCX replicates (Fig. 2C) was 0.59 compared with 0.53 for reversed-phase replicates (Fig. 2B), suggesting that the RP-LC/MS/MS process is more important than SCX chromatography in contributing to variability in peptide ratios. Protein ratios are calculated by intensity weighted averaging of peptide ratios, which has the effect of reducing the standard deviation of the protein ratio distributions, depending on the number of peptides for a given protein. The 95% range (±2σ) for log2 peptide ratios in this dataset was ±1.2, corresponding to the ability to detect a 2.3-fold change in ratio for a single peptide. When ratios of peptide intensities for each protein were averaged, the ratio threshold for significant change was lowered as a function of the number and standard deviation of peptide ratios contributing to the protein average as shown in Fig. 2D. As expected, the majority of false positives were observed in proteins represented by only one or two peptides in common between samples, with lower variability in proteins with three or more peptides in common (Fig. 2, D and E). The log2 ratio distributions for proteins represented by three or more peptides showed less variation (Fig. 2E) and were normally distributed as verified by normal probability plots (data not shown).
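The conversion from a log2 range to a detectable fold change is simple arithmetic: a 95% range of ±1.2 on the log2 scale corresponds to 2^1.2, or about 2.3-fold. The sketch below also shows how the threshold would tighten with more peptides under a simplifying assumption that is not the paper's exact procedure: an unweighted mean of independent peptide ratios, so that the standard error scales as 1/√n. The function name is hypothetical.

```python
import math


def fold_change_threshold(sd_log2, n_peptides=1, k=2.0):
    """Smallest fold change outside +/- k standard errors of zero.

    sd_log2:    SD of log2 peptide ratios between replicates
    n_peptides: peptide ratios averaged per protein (assumed independent)
    k:          width in SD units (2 gives roughly a 95% range)
    """
    half_width = k * sd_log2 / math.sqrt(n_peptides)
    return 2.0 ** half_width


single = fold_change_threshold(0.6)            # half-width 1.2 in log2
four = fold_change_threshold(0.6, n_peptides=4)
```

With these inputs a single peptide requires roughly a 2.3-fold change, whereas a protein averaged over four peptides would require roughly a 1.5-fold change.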
Linearity and Sensitivity
We tested linearity of intensity measurements by spiking varying amounts of two standard proteins into 10 µg of a complex sample (pooled from two SCX fractions in Sample 1). In total, 21 and 43 peptide ions were observed for BSA (added at 0.2–20 pmol) and ApoT (added at 0.05–5 pmol), respectively. Linearity between peak area intensity versus amount of added standard was observed for nearly all peptides. Two examples are shown in Fig. 3A, and linear regression parameters for all peptides are listed in Supplemental Table 2.
When observed and expected ratios were compared, a significant deviation in slope from the expected value of 1 was observed (Fig. 3B). Observed protein ratios were systematically underestimated, with the largest deviations as high as 2-fold occurring when at least one ion in the pair had low intensity. This is symptomatic of background noise, which was relatively constant between samples of similar complexity analyzed on the same instrument. Removing all ions with peak area <2 × 10⁷ and subtracting this background value from each intensity measurement improved the agreement between expected versus observed ratios (Fig. 3C), confirming that background contributes to protein ratio underestimation. After correction, the observed ratios matched expected ratios to within 1.4-fold.
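A small numeric example shows why a constant background compresses ratios and how the correction restores them. The threshold of 2 × 10⁷ comes from the text; the function name and the peak areas are invented for illustration, and treating an ion exactly at the background level as unusable is an assumption.

```python
BACKGROUND = 2e7  # peak-area background level quoted in the text


def corrected_ratio(area_a, area_b, background=BACKGROUND):
    """Background-subtracted ratio B/A, or None if either ion is unusable."""
    if area_a < background or area_b < background:
        return None                      # ion removed from the comparison
    a = area_a - background
    b = area_b - background
    if a <= 0 or b <= 0:
        return None                      # nothing left above background
    return b / a


# Hypothetical true signals of 2.5e7 and 1e8 (4-fold apart), each
# measured on top of the same 2e7 background
observed_a, observed_b = 4.5e7, 1.2e8
raw = observed_b / observed_a            # about 2.7, an underestimate
fixed = corrected_ratio(observed_a, observed_b)
dropped = corrected_ratio(1e7, 1.2e8)    # below threshold
```

The weaker the ion, the larger the share of its measured area that is background, which matches the observation that deviations were largest when one ion in the pair had low intensity.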
Quantifying Proteins by Spectral Counting
Experiments described above were evaluated by spectral counting, by summing all MS/MS observations for any peptide in a given protein, including spectra redundant for ion charge states. Spectral counting is complicated when tryptic peptides are shared between different database entries containing overlapping sequences, which we refer to as "protein isoforms." Assessment of such occurrences was facilitated by Isoform Resolver (20), which uses a peptide-centric database to report different IPI entries that share peptide sequences.
To develop a quantitative measure for comparing proteins in different samples by spectral counting, we used a count ratio method described for SAGE analysis (37) and applied the G test of independence to assess significance (36). This method determines confidence level primarily from the ratio of total peptides observed in the two samples to be compared and is implemented in the Serac PASC module. The G test is useful for detecting differential expression between datasets with different sample populations and assumes only that sampling is equally random in each dataset.
Variability in spectral counts between samples was evaluated in replicate analyses of Sample 1. Fig. 4A compares spectral counts per protein in parallel SCX runs of the same proteolytic digestion, and Fig. 4B plots the same data on a log scale. In each case, G test calculations were used to determine 95 and 80% critical value thresholds, predicting confidence limits for differential protein expression. For example, at the 95% threshold, when 0, 1, 2, and 3 spectra are observed in one sample, spectral counts of 3, 6, 9, and 10, respectively, are required in the other sample for a change in abundance of a given protein to yield a p value <0.05. -Fold changes corresponding to these limits were calculated as in Equation 1 ("Experimental Procedures"), using a correction factor f = 0.5, as proposed by Beissbarth et al. (37) to avoid discontinuities for proteins with spectral count = 0 in one of the samples. Fig. 4C shows the minimum ratio that can be determined at 95 and 80% confidence as a function of the lowest spectral count. As spectral counts increase, protein ratios that can be measured with high confidence decrease, indicating that smaller relative differences in spectral counts are needed to detect significant changes.
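One common form of the G test for a single protein can be sketched as below. This is an assumption about the general shape of the calculation, not the PASC implementation: observed counts in the two samples are compared with counts expected in proportion to the total spectra in each sample, and the p value uses the χ² distribution with one degree of freedom. No continuity correction is applied, so thresholds near the cutoff may differ from those quoted in the text.

```python
import math


def g_test(n1, n2, t1, t2):
    """Return (G, p) for spectral counts n1, n2 given run totals t1, t2."""
    total = n1 + n2
    if total == 0:
        return 0.0, 1.0
    expected = (total * t1 / (t1 + t2), total * t2 / (t1 + t2))
    g = 0.0
    for obs, exp in zip((n1, n2), expected):
        if obs > 0:                      # 0 * ln(0) is taken as 0
            g += obs * math.log(obs / exp)
    g = max(2.0 * g, 0.0)                # guard against rounding below 0
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Equal sampling depth in both runs (hypothetical totals)
g03, p03 = g_test(0, 3, 1000, 1000)
g15, p15 = g_test(1, 5, 1000, 1000)
g16, p16 = g_test(1, 6, 1000, 1000)
```

Under these assumptions, 0 versus 3 and 1 versus 6 spectra fall below p = 0.05 while 1 versus 5 does not, which illustrates how the required difference grows with the lower count.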
Linearity and sensitivity in spectral count measurements were evaluated in experiments spiking varying amounts of BSA and ApoT into pooled SCX fractions. Triplicate measurements showed good agreement between spectral counts and protein added down to 200 fmol of BSA and 50 fmol of ApoT, near the limit of detection for the LC/MS/MS configuration used in the experiment (Fig. 4D). This validates the use of minimal spectral counts down to values of 1 or 0 when G test statistics are used to properly assess changes. At the other extreme, a saturation effect was observed in which spectral counts were nonlinear at high levels of added standards. Comparison of BSA and ApoT suggested nonlinearity above a spectral count threshold of 30 that was independent of protein identity or amount added. This effect is likely to be caused by dynamic exclusion, which limits the number of spectra taken per peptide ion and therefore the number of spectral counts as the degree of sampling approaches maximal sequence coverage. The behavior reveals a practical limit in measurements of spectral counts per protein in any one RP-LC/MS/MS run. However, this limit was never reached in experimental samples (Fig. 4, A and B) where the maximal spectral counts in any single RP run never exceeded 22. Spectral counts distributed across different SCX fractions or replicate RP runs can be added without complications from saturation.
Fig. 4E shows various dilution pairs of BSA and ApoT, comparing expected versus observed log2 protein ratios (RSC), with the observed ratios estimated by Equation 1 using f = 0.5. Observed protein ratios increased proportionally with expected ratios over the entire range of BSA and ApoT that was added. The range of ratios quantified by peak area intensity measurements was observed up to a log2 ratio of 6.6. Thus, spectral counting showed a wider range than peak area intensity measurements because it enabled ratio measurements to be made at the lowest concentrations of standards, where spectral counts were 0.
Observed protein ratios from spectral counting of protein standards showed large deviations from expected ratios in which observed ratios were systematically overestimated (Fig. 4E). This was especially true when the minimal spectral count in a dilution pair was 0 (Fig. 4E, open squares), indicating that the correction factor for denominator values was too small. By increasing the correction factor to f = 1.25, better correspondence between observed and expected protein ratios was obtained (Fig. 4F). This reflected large changes in ratio for dilution pairs where minimum spectral counts were low but little effect when minimum spectral counts were high. After correction, the observed ratios matched expected ratios to within 2.3-fold. Thus, the errors in protein ratios are significantly higher when measured by spectral counting than by peak area intensities, particularly when minimum spectral counts in one sample equal 0 or 1.
Comparison of Methods for Protein Quantification
Published experiments on label-free quantitation have all used standard proteins for validation. We therefore carried out tests to evaluate the behavior of ion intensity measurement versus spectral counting using more complex samples. Soluble proteins were separated by gel filtration (Sample 2) and visualized by Coomassie staining after separation of fractions by SDS-PAGE (Fig. 5A).
Intensities of peptide ions for these proteins were then compared between different gel filtration fractions following solution digestion, SCX fractionation of peptides, and RP-LC/MS/MS. Fig. 5B represents intensities of peptide ions for cytoplasmic actin, showing six pairwise comparisons of different gel filtration fractions. Only peptides found in common between each pair of fractions were plotted in each panel. The plots showed average linear correlation coefficients r = 0.94 ± 0.04. Similar results were observed for other high abundance proteins where at least three peptide ions could be directly compared between fractions.
To provide independent ratio measurements, protein bands were quantified by scanning densitometry, and ratios of protein abundances between different gel filtration fractions were determined from integrated areas under selected bands. Proteins were selected based on their apparent resolution on the gel and identified by in-gel digestion and MS. Proteins were eliminated when the analysis of the in-gel digests indicated a significant amount of a second protein was present in a band or when a protein eluted at different molecular weights on the gel.
RPA protein ratios determined from peak area intensity measurements were then determined for these proteins and compared with the protein ratios estimated from gel staining (Fig. 5C). The gel staining abundance ratio (RGEL) is expressed as a log2 value where 0 indicates no change, and positive and negative values indicate increases and decreases in staining intensity, respectively. Likewise protein ratios from spectral counts were compared with protein ratios from gel staining (Fig. 5D and Supplemental Table 3). Linear regression of the ratios from gel staining versus from peak area intensity measurements or spectral counts showed reasonable agreement, confirming that label-free methods for quantitation are able to report changes in protein abundance in complex samples. Interestingly a higher correlation was observed when we compared protein ratios from intensity measurements versus spectral counts for the same proteins (Fig. 5E). This strongly suggests that most of the variability in Fig. 5, C and D, is due to error in quantifying gel staining intensities and that protein ratios measured from peak intensities agree well with those measured from spectral counts.
We next examined the behavior of all proteins that could be compared between two gel filtration fractions. In contrast to the narrow distribution in spectral counts between replicate analyses of soluble K562 proteins (Fig. 4, A and B), the variation in spectral counts was much higher when comparing proteins between gel filtration fractions 6 and 8 (Fig. 5F). More than 30% of proteins were found to lie outside the 95% confidence limits (Fig. 5F). Significant differences are expected given that many proteins would be expected to change in abundance between two gel filtration fractions. An off-diagonal centering of the critical value thresholds in this figure reflected the difference in total spectral counts between the two gel filtration fractions that is corrected by the G test statistic.
We then compared RPA versus RSC for all proteins that could be compared between gel filtration fractions 6 versus 8. Fig. 5G shows 147 proteins that shared ≥3 peptides in common, out of 2,306 proteins identified in fraction 6 and 1,834 proteins in fraction 8. A reasonable correlation between RPA versus RSC was observed, although the variability was somewhat higher than data shown in Fig. 5E due to the lower average number of peptides in common over all proteins in Fig. 5G. However, the subset of 62 proteins with RPA values that were significantly different from 0 (p ≤ 0.05, determined from Student's t test of the log2 protein ratios; Fig. 5G, closed symbols) included 48 (77%) with RSC significantly different from 0 (p ≤ 0.05, determined by the G test; Fig. 5G, closed triangles). The remaining 14 with p > 0.05 for RSC (Fig. 5G, closed circles) were those in which RPA magnitudes were lower. Thus, for proteins sharing ≥3 peptides in common between samples, those with significant changes in abundance were most readily identified by setting 95% confidence limits for both RPA and RSC, illustrating complementarity between these measurements.
Measurements of protein abundance differences were then extended to examine the behavior of proteins that could be compared between pairwise combinations of gel filtration fractions 6–16. An average of 732 proteins were observed in each gel filtration fraction, totaling 5,339 proteins (20). Each protein typically eluted in three or more fractions; thus we expect the majority of proteins to show differences between pairs of fractions with protein ratios varying from small to large. We examined various pairwise comparisons of fractions 6 through 16. In total, 22,218 unique comparisons of protein abundance could be made. Of these, 19% shared at least one peptide in common, necessary for peak area intensity comparisons, whereas 81% lacked common peptides and could only be evaluated by spectral counting (Fig. 6A). Of the former set, 1,233 (29%) showed three or more peptide ions in common between any two fractions, meeting our criteria for peak area intensity measurements (Fig. 6B). Of the latter set, 7,740 (35%) had at least four spectra in one of the paired fractions. Thus, a similar percentage of proteins met our criteria for spectral count measurements. However, the ratio of proteins quantifiable by spectral counts exceeded those quantifiable by peak area intensities by 3-fold, and no proteins quantifiable by peak area measurements were excluded from spectral count measurements. Plotting the distribution of peptide ratios within each of these subsets showed a wider dispersion of protein ratios by spectral counting than peak area intensity measurements as well as a 3-fold wider magnitude range (Fig. 6D, left panel). Thus, the spectral counting measurements sampled a larger range of abundance ratios than peak area measurements.
Comparing Changes in Protein Abundance to mRNA Abundance
Finally protein changes that accompany cell differentiation were compared with corresponding mRNA changes. In response to phorbol ester stimulation, K562 cells undergo morphological changes involving cell attachment and spreading and induction of genes characteristic of the megakaryocyte lineage. These responses are regulated by the MKK/ERK pathway and are therefore repressed by cell-permeable inhibitors of MKK1/2 such as U0126 (21, 22). Parallel experiments were carried out using shotgun proteomics and Affymetrix microarrays to examine cells that were untreated, stimulated with PMA to induce differentiation, or treated with PMA in the presence of U0126 to repress differentiation.
Soluble proteins were extracted, proteolyzed, separated by SCX-HPLC, and analyzed by RP-LC/MS/MS (Sample 3). The numbers of proteins identified in (a) control, (b) PMA-treated, and (c) PMA + U0126-treated samples were 463, 395, and 449, totaling 703 proteins. Of these, 62 and 67 proteins, respectively, showed three or more peptides in common between pairs a versus b and b versus c with 43 in common between all three conditions, whereas 254 proteins had four or more spectral counts among one of the three experiments. These were considered quantifiable by intensity measurements. RPA and RSC measurements between control versus PMA conditions were each plotted against log2 ratios of mRNA abundance (RmRNA) measured using Affymetrix microarrays (Fig. 7, A and B). The dispersion of ratios was larger with RSC than RPA as were correlations with RmRNA. After filtering log2 protein ratios for p ≤ 0.1, 41 proteins were observed to significantly change as measured by RSC, and these correlated well with changes measured by RmRNA (Fig. 7A, solid symbols), whereas 10 proteins with significant changes were indicated by RPA (p ≤ 0.1; Fig. 7B, solid symbols).
DISCUSSION |
---|
Reproducibility and sensitivity were evaluated to assess the performance of each method. By normalizing ion intensity measurements by the total ion intensity for each RP run, peak area intensity measurements of replicate analyses showed reproducibility to within log2 ratio values of ±1.2, indicating that 2.3-fold changes could be measured with >95% confidence. Typical coefficients of variation were 30% (0.45 × S.D.) for protein ratios determined from three or more peptide ratios, which were distributed normally on a log2 scale. Analysis of standard proteins showed linearity in peak area intensities versus amount added and good correlations between observed and expected protein ratios. Interestingly, observed ratios generally underestimated true ratios, but this systematic variation could be corrected for by subtracting background intensities from each measurement. Only 0.5% of peak areas quantified showed intensities less than 2 × 10⁷ counts; thus, relatively little information was lost by this correction.
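The peak area calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the peptide labels, and the peak areas are hypothetical, and averaging log2 peptide ratios is an assumed way of combining peptides into a protein ratio.

```python
import math
from statistics import mean

def normalize(peak_areas, total_ion_intensity):
    """Scale each peptide peak area by the total ion intensity of its RP run."""
    return {pep: area / total_ion_intensity for pep, area in peak_areas.items()}

def protein_log2_ratio(run_a, run_b, min_peptides=3):
    """Mean log2 ratio (run_b over run_a) across peptides found in both runs.

    Returns None when fewer than min_peptides peptides are shared, which
    mirrors the three-or-more-peptides criterion described in the text.
    """
    shared = sorted(set(run_a) & set(run_b))
    if len(shared) < min_peptides:
        return None
    return mean(math.log2(run_b[p] / run_a[p]) for p in shared)

# Hypothetical peak areas (counts) for one protein in two RP runs.
a = normalize({"pepA": 4.0e8, "pepB": 2.0e8, "pepC": 1.0e8}, total_ion_intensity=1.0e11)
b = normalize({"pepA": 1.6e9, "pepB": 8.0e8, "pepC": 4.0e8}, total_ion_intensity=2.0e11)

r_pa = protein_log2_ratio(a, b)

# A replicate spread of +/-1.2 on the log2 scale corresponds to this fold change.
fold_at_limit = 2 ** 1.2
```

Working on the log2 scale keeps increases and decreases symmetric about 0, which is why the reproducibility limit is quoted as ±1.2 before being converted to a fold change.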
Spectral counting yielded good reproducibility between replicate experiments. False positives were well within the 95% confidence limit and decreased as spectral counts increased. Here we applied the G test of independence to detect significant changes in protein abundance from spectral counts. The G test is less accurate at low counts; therefore, calculations at any confidence limit predict that larger differences in spectral counts are needed to detect significant changes in protein ratio as the minimum spectral count decreases (Fig. 4C). Analysis of standard proteins showed linearity in spectral counts versus amount of protein added. Observed and expected protein ratios correlated for the standards, although observed ratios generally overestimated true ratios by significant amounts. This was primarily due to the discontinuity in log2 ratio that occurred when the number of spectral counts in the sample with lower abundance was close to 0, and it could be partially corrected by increasing the value of the correction factor used to calculate protein ratios from spectral counts.
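For readers unfamiliar with the G test, the sketch below shows one common formulation applied to spectral counts: a 2 x 2 table of spectra assigned to a protein versus all other spectra in each of two samples. The exact formulation used in this study is not given in this section, so the table layout, the function name `g_test_2x2`, and its arguments are assumptions made for illustration.

```python
import math

def g_test_2x2(n1, t1, n2, t2):
    """G test of independence on a 2 x 2 table of spectral counts.

    n1, n2: spectra assigned to the protein in samples 1 and 2.
    t1, t2: total spectra in samples 1 and 2.
    Returns (G, p), with p taken from the chi-square distribution, 1 d.f.
    """
    observed = [[n1, t1 - n1], [n2, t2 - n2]]
    total = t1 + t2
    row_sums = [t1, t2]
    col_sums = [n1 + n2, total - n1 - n2]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # an empty cell contributes nothing to the sum
                e = row_sums[i] * col_sums[j] / total
                g += 2.0 * o * math.log(o / e)
    # Survival function of chi-square with 1 d.f. is erfc(sqrt(G / 2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p

# Hypothetical counts: 2 versus 20 spectra out of 1,000 total in each sample.
g_stat, p_value = g_test_2x2(2, 1000, 20, 1000)
```

Because the chi-square approximation is poor when cells are small, a p value computed this way should be read with caution for proteins with very few spectra, consistent with the caveat in the text.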
Both methods showed good correlations in experiments using standard proteins. In addition, both methods could be validated by independent measurements evaluating protein abundance ratios by gel staining following SDS-PAGE. An important advantage of spectral counting is that peptides in common between datasets are not required for the protein ratio calculations, enabling greater percentages of proteins to be compared, as observed in the experiment comparing gel filtration fractions (Fig. 6C, Sample 2), which was chosen for its high representation of protein changes. By considering a wider range of proteins, where sequence coverage can be very low in one sample, greater sensitivity could be achieved. Such differences were also observed when the number of protein changes was relatively small (as in Sample 3). On the other hand, it was clear from the analysis of standards that the measured protein ratios showed larger error by spectral counting than by peak area intensity measurements. This was particularly true for proteins where one sample had only 0 or 1 spectra and was caused by the discontinuity in ratio measurement mentioned above. Thus, the two methods are complementary with spectral counting measurements yielding greater sensitivity and peak ion intensity measurements yielding greater accuracy in reporting changes in protein abundances. Intuitively ratios measured from spectral counts are most accurate for proteins with large numbers of spectra, and ratios measured from peak area intensities are most accurate for proteins with large numbers of overlapping peptide ions.
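The discontinuity at low spectral counts can be made concrete with a small sketch. The equation used for spectral count ratios is not reproduced in this section, so the offset form below, the function name `r_sc`, the symbol `f`, and all counts are assumptions chosen only to show how an additive offset behaves.

```python
import math

def r_sc(n1, n2, t1, t2, f):
    """log2 spectral count ratio (sample 2 over sample 1) with offset f.

    n1, n2: spectral counts for the protein in each sample.
    t1, t2: total spectral counts in each sample.
    f: additive offset that keeps the ratio finite when a count is 0
       (hypothetical parameter; no particular value is implied).
    """
    return (math.log2((n2 + f) / (n1 + f))
            + math.log2((t1 - n1 + f) / (t2 - n2 + f)))

# Hypothetical protein with 0 spectra in sample 1 and 8 in sample 2.
small_offset = r_sc(0, 8, 1000, 1000, f=0.5)
large_offset = r_sc(0, 8, 1000, 1000, f=2.0)
```

With a zero count in one sample, the ratio is set almost entirely by the offset, so a larger `f` pulls the estimate toward 0. This illustrates why ratios for proteins with 0 or 1 spectra in one sample carry the largest error.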
One of the limiting factors for quantitation by peak area intensity measurements as presented is the requirement for obtaining assigned MS/MS spectra prior to peak matching. This can reduce the number of comparable proteins if the reacquisition of MS/MS spectra between experiments is low and would be expected to be most problematic for ions with weak intensities. To examine this, we analyzed datasets of replicate RP runs of 16 SCX fractions (Sample 1a). We compared histograms of intensities (not normalized) from peaks whose MS/MS spectra were identified in replicate samples (Fig. 8, blue outline) versus those where replicate MS/MS spectra were not observed (Fig. 8, red outline). All peaks showed signal above background due to the fact that data were prefiltered to only accept well behaved, quantifiable peaks. The results showed that
of all peaks identified in any one sample were identified by MS/MS in the other sample. Thus, our estimate of peptides not reacquired due to MS/MS sampling limitations was
. Perhaps not surprisingly, the distribution of the unmatched group was skewed toward lower intensities with a mean value 2-fold less than the matched group. This may limit the sensitivity of the peak area strategy compared with profiling by spectral counting. Nevertheless, the criterion for stringency developed in this study, i.e. requiring three or more matched peptide ions per protein, has potentially greater impact on the number of proteins that can be quantified by peak area measurements than the MS/MS reacquisition issue. Furthermore, even at this fairly low sampling depth, we can still observe the majority of peptide peaks reproducibly. The percentage of observable peaks should increase with higher sampling depth provided by newer, faster scanning instruments.
In comparisons of protein ratios between gel filtration fractions (Fig. 6D, Sample 2), the RPA measurements ranged up to 5.8 (56-fold), whereas RSC measurements ranged up to 7.3 (160-fold). The apparently lower range of ratios detected by peak area measurements reflects the requirement for three or more peptides in common. This selects for proteins with higher coverage that therefore show smaller differences in abundance. In contrast, requiring a maximum of four or more peptides for quantitation by spectral counting selects for proteins with wider dispersion of abundance ratios. This is augmented by examining proteins with RSC or RPA selected for significance with p ≤ 0.05 (Fig. 6, C and D). Log2 protein ratios that were significantly different from 0 as determined by SC but not PA showed wider dispersion and range, whereas ratios determined by PA but not SC showed narrower distributions. This indicates that RSC will sample a larger effective dynamic range than RPA when datasets are filtered by tests of significance.
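The fold changes quoted alongside the log2 ratios follow directly from exponentiation, as the short check below illustrates. The helper name `log2_to_fold` is hypothetical; the input values are the two maxima reported in the text.

```python
def log2_to_fold(log2_ratio):
    """Convert a log2 ratio into a linear fold change."""
    return 2.0 ** log2_ratio

fold_pa = log2_to_fold(5.8)   # maximum ratio by peak area
fold_sc = log2_to_fold(7.3)   # maximum ratio by spectral counting

# Relative width of the two ranges on the linear scale.
range_gain = fold_sc / fold_pa
```

The two maxima differ by 1.5 log2 units, so the spectral counting range is wider by a factor of 2^1.5, or close to 3-fold on the linear scale.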
Performance of both approaches depends strongly on the depth of MS/MS sampling because ratios by spectral counts are most significant for proteins with large numbers of spectra, and ratios by peak area intensity are most significant for proteins with large numbers of overlapping peptide ions. Without high sampling, protein ratios estimated from peak area intensities are limited to abundant proteins with high sequence coverage. Likewise ratios estimated by spectral counting can be significantly suppressed when the minimum spectral count is 0 and the maximum spectral count is limited by sampling due to low protein abundance. The recent availability of more sensitive linear ion traps with higher scan rates promises to increase sampling depth and enhance capabilities for more effective label-free quantification.
In the analysis of Sample 3, we surveyed an average of 435 proteins from each of three conditions in human K562 cells, of which 254 were quantifiable by spectral counting (SC ≥ 4) and 62 by peak area intensity measurements (common peptides ≥ 3) between two conditions. Thus, a relatively low depth of sampling enables
of proteins to be compared. Applying spectral counting, we detected 41 (9.4%) proteins changing in abundance between control and PMA-treated samples, allowing p ≤ 0.1. By comparison, mRNA datasets revealed 28% of genes showing changes significant with p ≤ 0.1. The correspondence between mRNA and protein changes in our experiments is consistent with our previous findings that MKK/ERK promotes cell differentiation by controlling gene expression (22). Correspondence between mRNA and protein has also been shown to be significant in yeast for proteins within moderate to high abundance expression ranges (11).
In summary, label-free methods for protein quantitation in shotgun datasets offer an alternative approach to stable isotope labeling methods. Peak area intensity and spectral counting methods enable protein ratios significant to 2.5-fold to be determined with high confidence. This represents lower sensitivity than can be achieved by isotopic labeling where protein ratios significant to 1.5-fold are typically reported (7, 11, 13, 14). Nevertheless, protein changes at or above 2.5-fold are well within the range of greatest interest in experimental comparisons, and the ability to achieve this without stable isotope labeling can be advantageous under conditions where metabolic labeling or chemical derivatization is difficult. To minimize experimental variation, data should be collected in a manner that interleaves samples, although satisfactory results were still obtained with datasets of Samples 2 and 3, each of which was collected over several months and not interleaved. Combining the complementary methods for quantitation by peak area intensities and spectral counting increases the power for detecting protein changes in shotgun experiments.
FOOTNOTES |
---|
Published, MCP Papers in Press, June 23, 2005, DOI 10.1074/mcp.M500084-MCP200
1 The abbreviations used are: SCX, strong cation exchange; RP, reversed-phase; ApoT, apotransferrin; PMA, phorbol 12-myristate 13-acetate; PA, peak area; SC, spectral count; RPA, log2(peptide or protein ratio) measured from peak area intensities; RSC, log2(protein ratio) measured from spectral counts; XIC, extracted ion chromatograph; ERK, extracellular signal-regulated kinase; MKK, mitogen-activated protein kinase kinase.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
¶ Present address: Research Triangle Inst., 3040 Cornwallis Rd., Research Triangle Park, NC 27709-2194.
|| To whom correspondence should be addressed. Tel.: 303-492-4799; Fax: 303-492-2439; E-mail: natalie.ahn@colorado.edu
REFERENCES |
---|