Complementary Analysis of the Mycobacterium tuberculosis Proteome by Two-dimensional Electrophoresis and Isotope-coded Affinity Tag Technology *

Frank Schmidt‡, Samuel Donahoe§, Kristine Hagens, Jens Mattow, Ulrich E. Schaible, Stefan H. E. Kaufmann, Ruedi Aebersold§ and Peter R. Jungblut‡,‖

From the ‡Core Facility Protein Analysis and Department of Immunology, Max Planck Institute for Infection Biology, D-10117 Berlin, Germany and the §Institute for Systems Biology, Seattle, Washington 98103-8904


    ABSTRACT
 
Classical proteomics combined two-dimensional gel electrophoresis (2-DE) for the separation and quantification of proteins in a complex mixture with mass spectrometric identification of selected proteins. More recently, the combination of liquid chromatography (LC), stable isotope tagging, and tandem mass spectrometry (MS/MS) has emerged as an alternative quantitative proteomics technology. We have analyzed the proteome of Mycobacterium tuberculosis, a major human pathogen comprising about 4,000 genes, by (i) 2-DE and mass spectrometry (MS) and by (ii) the isotope-coded affinity tag (ICAT) reagent method and MS/MS. The data obtained by either technology were compared with respect to their selectivity for certain protein types and classes and with respect to the accuracy of quantification. Initial datasets of 60,000 peptide MS/MS spectra and 1,800 spots for the ICAT-LC/MS and 2-DE/MS methods, respectively, were reduced to 280 and 108 conclusively identified and quantified proteins, respectively. ICAT-LC/MS showed a clear bias for high Mr proteins and was complemented by the 2-DE/MS method, which showed a preference for low Mr proteins and also identified cysteine-free proteins that were transparent to the ICAT-LC/MS method. Relative quantification between two strains of the M. tuberculosis complex also revealed that the two technologies provide complementary quantitative information; whereas the ICAT-LC/MS method quantifies the sum of the protein species of one gene product, the 2-DE/MS method quantifies at the level of resolved protein species, including post-translationally modified and processed polypeptides. Our data indicate that different proteomic technologies applied to the same sample provide complementary types of information that contribute to a more complete understanding of the biological system studied.


Classical proteomics studies of Mycobacterium tuberculosis have combined two-dimensional electrophoresis (2-DE) with mass spectrometry (MS) and have revealed about 1,800 distinct spots separated by 2-DE (1). About 350 of these were identified, and the comparison of the protein patterns of virulent and attenuated strains identified several proteins that are being studied further as potential vaccine candidates (1, 2). More than two million deaths/year and eight million new infections are caused by M. tuberculosis, the bacterium responsible for tuberculosis (3). Genomics, transcriptomics, and proteomics, which form the rational basis to develop new therapeutic and preventive strategies, have been applied to combat this disease. The complete genome of M. tuberculosis comprises about 4,000 genes, which were classified into six protein classes and 30 subclasses (4).

The smallest unit of the proteome, the protein species, is defined by its chemical structure (5). Therefore, each modification of a protein leads to a new protein species. These are successfully resolved by 2-DE if they differ by at least one charge or by at least several hundred daltons in mass. Quantification by 2-DE requires a high degree of pattern reproducibility, which is difficult to achieve in a multistep and parallel procedure.

To alleviate limitations of the 2-DE/MS method, internally standardized gel-free quantitative proteomics methods have been developed more recently. Of these, the prototypical method is isotope-coded affinity tag (ICAT) reagent labeling and tandem mass spectrometry (MS/MS) (6). Proteins contained in two sample mixtures are covalently labeled with the isotopically light or heavy form of the ICAT reagents, respectively, and the samples are combined and proteolyzed. After purification of the labeled peptides via the affinity tag that is part of the reagents, they are analyzed by LC-MS/MS. The peptides eluting from the final chromatography step are subjected to data-dependent MS/MS, providing for peptide quantification based on the relative signal intensities of the heavy and light forms of a particular peptide detected in the MS scan and for peptide identification by MS/MS and sequence database searching. This approach has been successfully applied to a wide variety of biological samples (7–14). Higher mass accuracy and manual peptide selection without the time pressure from data-dependent procedures were achieved by combining ICAT labeling with MALDI quadrupole time-of-flight mass spectrometry (15). Furthermore, the labeling protocols have been optimized (16), resulting in a robust quantitative technology.

Here we compare the results obtained by analyzing a microbial pathogen comprising about 4,000 genes using the 2-DE/MS and ICAT-LC/MS methods. Two strains of the M. tuberculosis complex were compared. Both technologies showed biases for and against certain types and classes of proteins and quantified proteins at different levels. 2-DE/MS complemented ICAT-LC/MS for low Mr and cysteine-free proteins and protein species separation. The ICAT technology complemented the 2-DE/MS with some functional protein classes. ICAT-LC/MS was superior for high Mr proteins and membrane proteins.


    EXPERIMENTAL PROCEDURES
 
2-DE—
After growing M. tuberculosis H37Rv and Mycobacterium bovis BCG in Middlebrook 7H9 broth as described previously (1), cellular proteins were dissolved in 9 M urea, 70 mM dithiothreitol, 2% carrier ampholytes Servalyte 2–4 (Serva, Heidelberg, Germany), and protease inhibitors (N-p-tosyl-L-lysine chloromethyl ketone, leupeptin, E64, pepstatin A; 25 µM) (1). Carrier ampholyte isoelectric focusing was combined with SDS-polyacrylamide gel electrophoresis (1). Gels of 23 cm × 30 cm were used. For analytical gels (gel thickness of 0.75 mm) stained with silver and for preparative gels (gel thickness of 1.5 mm) to be stained with Coomassie Brilliant Blue G-250, 60 µg and up to 600 µg of protein, respectively, were applied at the anodic side of the gel. The 2-DE technique used has a resolution power of 5,000 spots. Subtractive analysis was performed manually, and some of the clear intensity differences were quantified by the program TOPSPOT (available for download free of charge from www.mpiib-berlin.mpg.de/2D-PAGE/).

MALDI-MS—
Proteins separated by 2-DE were identified by peptide mass fingerprinting after in-gel tryptic digestion as described previously (1). A Voyager Elite mass spectrometer (Perseptive Biosystems, Framingham, MA) was used with a mass accuracy of 30 ppm after internal calibration. Proteins were identified by MS-FIT or MASCOT (Matrix Science Ltd., London, United Kingdom, www.matrixscience.com) database searches. The identified proteins had a sequence coverage higher than 30%. For comparison with the ICAT results the dataset described earlier (1, 2) was used. Spot positions and identities with additional information and hyperlinks to sequence and pathway databases are available at www.mpiib-berlin.mpg.de/2D-PAGE/.

ICAT Reagent Labeling and Chromatography—
Total cell extract was prepared for the two strains, and proteins were dissolved in buffer (6 M urea, 0.05% SDS, 5 mM Tris, pH 8.3, 5 mM EDTA) to obtain a protein concentration of 700 µg/200 µl and reduced with 5 mM tributylphosphine for 30 min at 37 °C. After addition of 350 nmol of ICAT reagent to each sample (~0.5 nmol of ICAT/µg of protein; final ICAT concentration, 1.75 mM), proteins were incubated for 90 min in the dark at room temperature with gentle stirring. After incubation, dithiothreitol was added to a final concentration of 10 mM to quench residual free reagent. Labeled samples were mixed, diluted 4-fold with water so that the final urea concentration was 1.5 M, and digested with 1:25 trypsin/protein for 5 h at 37 °C. To remove the remaining ICAT reagent and other contaminants and to separate peptides, the resulting peptide mixture was acidified to pH 3 and loaded on a 4.6 mm × 200 mm Polysulfoethyl A cation-exchange chromatography column (Poly LC Inc., Columbia, MD) with 5-µm particles and 300-Å pores at a flow rate of 800 µl/min. The column was washed with buffer A (5 mM KH2PO4, 25% acetonitrile, pH 3.0), and the peptides were eluted with buffer B (20 mM KH2PO4, 350 mM KCl, 25% acetonitrile, pH 3.0). The column was developed over a 50-min dual-step salt gradient.
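The labeling quantities quoted above can be cross-checked with a few lines of arithmetic. The following is only a worked check of the stated numbers; the function and variable names are illustrative and not part of the original protocol.

```python
# Worked check of the labeling conditions quoted in the text.
# All names below are illustrative; only the numbers come from the protocol.

PROTEIN_UG = 700.0       # protein per sample (µg)
VOLUME_UL = 200.0        # sample volume (µl)
ICAT_NMOL = 350.0        # ICAT reagent added per sample (nmol)
UREA_START_M = 6.0       # urea concentration in the labeling buffer (M)
DILUTION_FACTOR = 4.0    # dilution with water before digestion


def reagent_per_ug(reagent_nmol, protein_ug):
    """Reagent-to-protein ratio in nmol per µg."""
    return reagent_nmol / protein_ug


def concentration_mm(amount_nmol, volume_ul):
    """Concentration in mM; 1 nmol/µl equals 1 mmol/l."""
    return amount_nmol / volume_ul


def diluted(concentration, factor):
    """Concentration after an n-fold dilution."""
    return concentration / factor


print(reagent_per_ug(ICAT_NMOL, PROTEIN_UG))    # 0.5 nmol ICAT per µg protein
print(concentration_mm(ICAT_NMOL, VOLUME_UL))   # 1.75 mM ICAT
print(diluted(UREA_START_M, DILUTION_FACTOR))   # 1.5 M urea
```

The three printed values reproduce the ratio, the final reagent concentration, and the urea concentration given in the protocol.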

The collected fractions were dried by speed-vac and resuspended in 2× phosphate-buffered saline, pH 7.2, for avidin purification using a self-packed UltraLink monomeric avidin column (Pierce) with 400 µl of packed beads in a glass Pasteur pipette. The column was washed with water, and biotinylated peptides were eluted with 0.3% formic acid. Avidin column eluent was dried down by speed-vac, and the pellet was resuspended in reverse phase buffer A (5% acetic acid, 0.005% heptafluorobutyric acid). The biotinylated peptides were analyzed by reverse phase capillary chromatography (75 µm, 10-cm self-packed C18 column, Monitor, Column Engineering, Ontario, Canada) at a flow rate of 250 nl/min.

Ion Trap-MS—
Peptide identification by collision-induced dissociation was carried out by data-dependent precursor ion selection using the dynamic exclusion option on a Finnigan LCQ ion trap mass spectrometer. The MS/MS spectra were searched against a protein sequence database (M. tuberculosis H37Rv, ftp://ftp.sanger.ac.uk/pub/tb/sequences/TB.pep; 3,924 entries) using the SEQUEST software tool, and the abundance ratios of isotopically labeled cysteinyl peptide pairs were calculated using the XPRESS program (14).

Only strictly tryptic cysteinyl peptides and peptides with an Xcorr score value higher than 3 were used for identification. The first 800 identifications from SEQUEST, sorted by Xcorr, were verified by spot check using MASCOT. Also, quantification of the first 800 cysteine-containing peptides was inspected manually. The SEQUEST results and the manual changes were implemented and stored in a relational MySQL database, and a Web interface was developed to ask multicriteria questions for intelligent data searches (www.mpiib-berlin.mpg.de/2D-PAGE/, functional classification tool). The database entries fulfilling these search criteria are reported by a browser such as Netscape or Internet Explorer.
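The selection criteria described above can be sketched as a simple filter. This is a minimal illustration, not the SEQUEST or database implementation used in the study: the `PeptideHit` record and its fields are hypothetical, the criteria are read as a conjunction (cysteine-containing, strictly tryptic, and Xcorr above 3), and the proline rule for trypsin is ignored.

```python
from dataclasses import dataclass


@dataclass
class PeptideHit:
    """Hypothetical record for one peptide identification."""
    sequence: str         # peptide sequence, one-letter code
    prev_residue: str     # residue preceding the peptide ("-" at the protein N terminus)
    is_c_terminal: bool   # True if the peptide ends at the protein C terminus
    xcorr: float          # SEQUEST cross-correlation score


def is_strictly_tryptic(hit):
    """Both termini must be consistent with cleavage after K or R."""
    n_ok = hit.prev_residue in ("K", "R", "-")
    c_ok = hit.sequence[-1] in ("K", "R") or hit.is_c_terminal
    return n_ok and c_ok


def passes_filter(hit, min_xcorr=3.0):
    """Cysteine-containing, strictly tryptic, and Xcorr above the threshold."""
    return "C" in hit.sequence and is_strictly_tryptic(hit) and hit.xcorr > min_xcorr


def select_top(hits, n=800):
    """Keep passing hits and return the n best, sorted by descending Xcorr."""
    kept = [h for h in hits if passes_filter(h)]
    return sorted(kept, key=lambda h: h.xcorr, reverse=True)[:n]
```

The default of 800 mirrors the number of top-scoring identifications that were inspected manually; the manual verification itself is not modeled here.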


    RESULTS
 
Data Amount and Reduction—
The M. tuberculosis H37Rv genome has been predicted to comprise about 4,000 genes (4). However, it cannot be expected that proteins representing every predicted gene are present in a bacterial culture grown under specific conditions. From M. tuberculosis H37Rv cells in late exponential phase about 1,800 distinct protein spots were resolved on silver-stained 2-DE gels (1). Analysis of the same samples by ICAT-LC/MS identified about 60,000 peptides. Both datasets contained redundancies. Because of post-translational modifications and processing, many genes are represented on 2-DE gels by multiple spots. In the ICAT experiment a particular peptide may occur in different chromatographic fractions and will therefore be sequenced multiple times. To obtain comparable datasets, data reduction was necessary for both approaches. The data analysis flow chart is shown in Fig. 1.



 
FIG. 1. Flow chart of data reduction in ICAT-LC/MS and 2-DE/MS approaches.

 
The criterion to have comparable datasets was the accessibility of data. In 2-DE gels the best accessible proteins are within the high intensity spots, which resulted in high identification scores. In ICAT-LC/MS, where the abundance cannot be directly measured, the accessibility is also determined by scoring factors. Therefore, as a comparable dataset the most intense spots from a 2-DE gel and the most reliably identified proteins from the ICAT analysis were chosen. The 560 spots initially identified by 2-DE/MS and present in both strains were listed in order of decreasing silver-staining intensity. Of these the 160 most intense spots were chosen for further analysis. After removing redundancies caused by post-translational modifications, 108 unique proteins constituted the final 2-DE dataset that was used throughout the study (Table I).


 
TABLE I Identified proteins from 2-DE

 
The primary ICAT-LC/MS data contained numerous cysteine-free peptides and peptides missing lysines or arginines at their C terminus. These primary data were reduced to 2,000 cysteinyl peptides with double tryptic termini. The 800 peptides with the highest SEQUEST scores were further manually evaluated, resulting in 280 uniquely identified proteins (Table II). This final ICAT-MS dataset was used throughout the study. Within the 108 and 280 proteins identified by 2-DE/MS and ICAT-LC/MS, respectively, 27 were common to both (Fig. 2). For peptide quantification, the same 800 spectra were evaluated manually, and if necessary, quantification calculated by the XPRESS program was corrected manually by adjusting the scan intervals, the tolerance, and the predicted shift of light and heavy peptide forms.


 
TABLE II Intensity differences detected by ICAT

No., number of identified and quantified peptides of proteins; if ≥2, the average value of the ratio was used.

 


 
FIG. 2. Relationship of the number of normalized proteins (3,924) to the number of proteins identified by 2-DE/MS (108), the number of proteins identified by ICAT-LC/MS (280), and the number of common proteins between the practical approaches (27).

 
All the identified proteins were inserted into a 2D-PAGE database (www.mpiib-berlin.mpg.de/2D-PAGE). Within this database the position of the identified spots on 2-DE gels, including Mr and pI values, MS data, and other characteristics, is available. Links to sequence, classification, and pathway databases provide further information. The relational structure of the database is constructed to allow flexible data mining procedures (18).

Protein Class Biases of 2-DE/MS and ICAT-LC/MS Proteomics—
The genome of M. tuberculosis was classified into six functional classes (Fig. 3) (4) and further subdivided into 30 subclasses comprising 80 protein families. Three of these protein families (respiration, IS elements, and PE family) were further classified into subfamilies. To identify protein classes that were over- or underrepresented in the two datasets for either method, the ratio of the number of identified proteins of each class and the total number of proteins was calculated. These ratios were compared with the ratio of the total number of members of each family and the total number of predicted proteins (normalized ratio).
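The comparison of observed and normalized ratios can be sketched as follows. This is a minimal illustration of the calculation described above; the function names are hypothetical, and the class counts in the usage example are invented for illustration only. The totals (3,924 predicted proteins, 280 ICAT-LC/MS identifications, 108 2-DE/MS identifications) are those reported in the study.

```python
def class_fraction(members_in_class, total):
    """Fraction of a protein set that belongs to one functional class."""
    return members_in_class / total


def representation(identified_in_class, identified_total,
                   predicted_in_class, predicted_total):
    """Observed fraction divided by the normalized (genome-wide) fraction.

    Values above 1 indicate overrepresentation of the class in the
    experimental dataset, values below 1 underrepresentation.
    """
    observed = class_fraction(identified_in_class, identified_total)
    normalized = class_fraction(predicted_in_class, predicted_total)
    return observed / normalized


# Usage with hypothetical class counts (NOT taken from the study):
# 60 of 280 ICAT-identified proteins versus 400 of 3,924 predicted proteins.
print(representation(60, 280, 400, 3924))
```

Expressing the bias as a quotient is a choice made for this sketch; the study itself compares the two percentages side by side in Figs. 3 and 4.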



 
FIG. 3. Comparison of proteins identified by the ICAT-LC/MS and 2-DE/MS methods versus normalized protein percentages. Identities were assigned to six main protein classes. Columns represent percentages relative to the total of predicted genes (3,924), the total of ICAT-LC/MS-identified proteins (280), and the total of 2-DE/MS-identified proteins (108).

 
At the level of protein classes both experimental approaches overrepresented the number of proteins from the class "small molecule metabolism," whereas the protein classes "other" and "unknowns" were clearly underrepresented (Fig. 3). The 2-DE/MS and ICAT-LC/MS approaches revealed similar numbers per class. However, at the resolution of the protein subclass level (Fig. 4), the ICAT-LC/MS method compared with the 2-DE/MS method preferentially identified proteins of the subclasses "central intermediary metabolism," "polyketide and non-ribosomal peptide synthesis," "degradation of macromolecules," "cell envelope," and "transport/binding proteins." In contrast, 2-DE/MS relative to the ICAT method preferred the protein subclasses "lipid biosynthesis," "synthesis and modification of macromolecules," "chaperones/heat-shock proteins," "protein and peptide secretion," and "adaptations and atypical conditions." Both methods overestimated relative to the normalized ratio the subclasses "degradation," "energy metabolism," "amino acid biosynthesis," and "lipid biosynthesis and detoxification" and underestimated "cell envelope," "transport/binding proteins," and "virulence." None of the proteins in the subclasses "cell division," "IS elements, repeated sequences, and phage," "PE and PPE families," "cytochrome P450 enzymes," "cyclases," and "chelatases" were experimentally verified by either of the two methods within the reduced datasets. With the restricted number of experimentally identified proteins a more detailed analysis at the protein family or subfamily level was not considered. These data suggest that either method as executed in this study was significantly biased in favor of highly abundant proteins.



 
FIG. 4. Comparison of proteins identified by ICAT-LC/MS and 2-DE/MS versus normalized proteins. The protein classes small molecule metabolism, macromolecule metabolism, cell processes, and other are subdivided into 30 subclasses. Conserved hypotheticals and unknowns have no subdivisions. Columns represent percentages relative to the total of normalized genes (3,924), the total of ICAT-LC/MS-identified proteins (280), and total of 2-DE/MS-identified proteins (108).

 
Molecular Mass Bias—
To estimate the bias of either method for the identification of proteins of a particular mass range, the percentages of proteins predicted from the genome of M. tuberculosis H37Rv and of proteins identified by 2-DE/MS and ICAT-LC/MS were sorted by Mr in bins of 10-kDa width and into an additional bin for proteins >100 kDa (Fig. 5a). Most of the proteins (61.7%) were predicted in the Mr range between 10 and 40 kDa. In the range between 10 and 60 kDa 3,300 of 3,924 (84%) proteins were predicted. With respect to the Mr of identified proteins the 2-DE and ICAT methods were complementary. For the low mass range (10–30 kDa) proteins identified by 2-DE/MS were clearly overrepresented, whereas for all the mass ranges >30 kDa proteins identified by ICAT-LC/MS were overrepresented. The overrepresentation was most pronounced for proteins >100 kDa relative to both 2-DE/MS and normalized percentages. Both methods underestimated proteins with an Mr <10 kDa, and furthermore, 2-DE gave a poor representation of proteins with Mr >60 kDa.
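The binning by Mr described above can be sketched as follows. This is a minimal illustration with hypothetical function names; the treatment of values that fall exactly on a bin boundary (for example exactly 100 kDa) is an assumption of this sketch, because the text does not specify it.

```python
from collections import Counter


def mass_bin(mass_kda):
    """Assign a molecular mass (kDa) to a 10-kDa bin.

    Masses below 10 kDa and masses of 100 kDa or more are each grouped
    into one bin (boundary handling is an assumption of this sketch).
    """
    if mass_kda < 10:
        return "<10"
    if mass_kda >= 100:
        return ">=100"
    lower = int(mass_kda // 10) * 10
    return f"{lower}-{lower + 10}"


def bin_percentages(masses_kda):
    """Percentage of proteins per mass bin for one dataset."""
    counts = Counter(mass_bin(m) for m in masses_kda)
    total = len(masses_kda)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Usage with four hypothetical masses (kDa):
print(bin_percentages([5.0, 15.0, 15.0, 120.0]))
```

Applying `bin_percentages` separately to the predicted proteome and to each experimental dataset gives the three series compared in Fig. 5a. The same pattern applies to the pI and hydropathy distributions with different bin widths.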



 
FIG. 5. Bias of ICAT-LC/MS and 2-DE/MS against protein characteristics. The percentages of normalized, ICAT-LC/MS-identified, and 2-DE/MS-identified proteins are displayed according to their characteristics. a, molecular mass in 10-kDa increments. Proteins with a molecular mass smaller than 10 kDa and greater than 100 kDa were grouped in one column each. Theoretical molecular masses are extracted from the Sanger Institute database. b, pI values in one-step increments. Theoretical pI values are extracted from the Sanger database. c, hydropathy indices in 0.2-step increments. Hydropathy is displayed as the grand average hydropathy (GRAVY) score, which is calculated as the arithmetic mean of the sum of the hydropathic indices of each amino acid. The left end of the scale represents hydrophobic proteins, and the right end represents hydrophilic proteins.

 



 
Isoelectric Point Biases—
To study biases of the two methods for certain pI ranges, the pI range from 3 to 13 was divided into bins of 1 pI unit, and proteins were assigned to these bins based on the calculated pI value of the unmodified polypeptide chain (Fig. 5b). The pI values were calculated by summing the pI values of the single amino acids and dividing them by the number of the amino acids (19). The relative distribution of the normalized mycobacterial proteins showed a low number of proteins in the pI range of 7–8, an observation that has also been described for several other organisms (20–22). Both methods overrepresented proteins between pI 4 and 6, with 2-DE/MS showing stronger representation for pI range 4–5 and ICAT-LC/MS for pI range 5–6. More than 50% of all proteins identified by either the ICAT-LC/MS or 2-DE/MS method had a calculated pI between 5 and 6, and more than 70% had a calculated pI between 5 and 7. Proteins of pI ranges >6 were underrepresented by both methods, and proteins of predicted pI values >11 were not detected at all.

Hydropathy Bias—
Hydrophobic proteins often cause problems for protein analysis due to insolubility in commonly used solvents and loss by binding to surfaces. A measure for hydrophilicity and hydrophobicity is the grand average hydropathy (GRAVY) score, which is calculated as the arithmetic mean of the hydropathic indices of the amino acids in a protein (23). This index was used to compare the number of normalized proteins and those identified by ICAT-LC/MS and 2-DE/MS in specific hydropathic ranges (Fig. 5c). Fifty-nine percent of the normalized mycobacterial proteins have a hydrophobic character, and 41% have a hydrophilic character. The hydropathy distribution has the shape of a Gaussian curve with a maximum of more than 50% of the normalized proteins concentrated in the neutral range between the values -0.2 and 0.2. Of the experimentally identified proteins 75% (ICAT-LC/MS) and about 60% (2-DE/MS) fell into this neutral range. 2-DE/MS slightly overestimated proteins in the hydrophobic range, whereas ICAT-LC/MS slightly underestimated them. Proteins in the strongly hydrophilic range were underestimated by both 2-DE/MS and ICAT-LC/MS. These results reveal that 2-DE/MS has an advantage for the identification of hydrophobic proteins and that both approaches had problems with the analysis of hydrophilic proteins.
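The GRAVY calculation can be sketched in a few lines. This illustration assumes that the hydropathic indices are those of the Kyte-Doolittle scale, which is the scale commonly used for GRAVY scores; the function names are hypothetical, and the neutral range of -0.2 to 0.2 is taken from the text.

```python
# Kyte-Doolittle hydropathy indices (assumed to be the scale behind reference 23).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean hydropathic index over all residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


def in_neutral_range(score, limit=0.2):
    """True if the score lies in the neutral range between -limit and +limit."""
    return -limit <= score <= limit


# Usage with a short hypothetical sequence:
score = gravy("MKTAYIAKQR")
print(round(score, 3), in_neutral_range(score))
```

Sequences containing letters outside the 20 standard amino acids raise a `KeyError` in this sketch; a production tool would need an explicit policy for such residues.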

Quantification of Differences in Protein Abundance between M. tuberculosis H37Rv and M. bovis BCG
Absolute quantification of proteins is a difficult task and at best achieved by amino acid composition analysis (24). Therefore, most proteomic experiments attempt to obtain relative quantitative information by determining the abundance ratios of proteins present in two samples. In the 2-DE/MS method this is achieved by comparing optical densities of matched spots in 2-DE gels, and in the ICAT-LC/MS method it is achieved by calculating the ratios of signal intensities for pairs of differentially isotopically labeled peptides. Comparing virulent with attenuated strains of the M. tuberculosis complex with the classical proteomics approach, we detected 32 spots only present in the virulent strains and obvious intensity differences by visual inspection (1, 2). Reproducibility of sample handling is improved by the ICAT-LC/MS approach due to the fact that aliquots from two biological samples are mixed directly after labeling and treated identically throughout the whole analysis procedure, the two samples thus serving as mutual internal standards.
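The ICAT-based relative quantification can be illustrated with a simplified calculation. This sketch is not the XPRESS algorithm: it assumes that each peptide ratio is the quotient of the summed signal intensities of the heavy and light forms over a chosen scan window, and that a protein ratio is the average of its peptide ratios, as stated in the legends of Tables II and III. All names and intensity values are hypothetical.

```python
def peptide_ratio(light_intensities, heavy_intensities):
    """Heavy-to-light ratio from intensities summed over the same scan window."""
    light_area = sum(light_intensities)
    heavy_area = sum(heavy_intensities)
    if light_area == 0:
        raise ValueError("light form not detected; ratio is undefined")
    return heavy_area / light_area


def protein_ratio(peptide_ratios):
    """Average ratio over all quantified peptides of one protein."""
    return sum(peptide_ratios) / len(peptide_ratios)


# Usage with hypothetical scan intensities for two peptides of one protein:
r1 = peptide_ratio([10.0, 40.0, 10.0], [12.0, 44.0, 10.0])
r2 = peptide_ratio([5.0, 20.0, 5.0], [5.0, 19.0, 6.0])
print(protein_ratio([r1, r2]))
```

Because both forms of a peptide are measured in the same run, errors in sample handling after mixing affect numerator and denominator alike, which is the internal-standard property described above.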

A higher number of proteins of different abundance in the two strains were detected by ICAT-LC/MS. Proteins showing intensity differences of more than 3-fold were extracted from Table II and are summarized in Table III. After quantification by the XPRESS tool, the calculated ratios were manually verified for the 280 proteins extracted, and if necessary, they were corrected. This analysis revealed three proteins as unique for the virulent H37Rv strain. These three proteins are potential vaccine candidates. Two of them, Rv0223c and Rv1513, were already predicted by genomic analyses (25) as belonging to regions deleted in BCG strains. Rv1513 is a conserved hypothetical protein of unknown function and is specific to M. tuberculosis but absent from the wild type M. bovis genome, as are genes of the flanking clusters of Rv1506c and Rv1516c. Even though a considerable number of flanking genes are annotated as "conserved hypotheticals," this gene cluster may be involved in carbohydrate metabolism because Rv1516c is a sugar transferase homologue and because gmdA, epiA, and Rv1520, which code for a GDP-mannose 4,6-dehydratase, a nucleotide-sugar epimerase, and another putative sugar transferase, respectively, are part of the same cluster. Rv0223c is a probable aldehyde dehydrogenase that is present in the genome of M. bovis wild type but not of the leprosy agent Mycobacterium leprae, a mycobacterial species with large genetic deletions when compared with M. tuberculosis (26). The third protein unique to H37Rv, Rv0570, was newly detected by ICAT-LC/MS and is a ribonucleotide reductase present in the genome of wild type M. bovis but absent from M. leprae. Furthermore, 20 proteins were increased and 23 decreased by more than 3-fold in M. tuberculosis H37Rv compared with M. bovis BCG (Table III).
The genes of these proteins, of which only a small subset was also detected by the 2-DE/MS approach, are present in BCG, although probably down-regulated or not induced by so far unknown regulatory mechanisms active in M. tuberculosis under the same culture conditions. Furthermore, because this analysis was performed with mycobacterial samples from one type of growth condition, i.e. logarithmic growth phase in 7H9 fully complemented medium, the proteins down- or up-regulated in BCG in comparison to M. tuberculosis may not be controlled in the same manner in either of the two mycobacterial strains. The proteins up-regulated in BCG in comparison with M. tuberculosis, although present in the latter strain, are of various functional categories. Of particular interest were the proteins coded by genes Rv2935 and Rv2940c, which belong to a cluster of genes involved in glycolipid synthesis including synthesis of polyketides and mycocerosates, and Rv1527c, which is located in a cluster of genes involved in polyketide and lipid synthesis. These data suggest that under the specific growth conditions used in this study the lipid synthesis pathway in BCG is regulated differently than that of M. tuberculosis. As a certain mycocerosate and its synthesis machinery have been implicated in virulence in M. tuberculosis, these data are of potential clinical importance (27).


 
TABLE III Intensity differences more than 3-fold detected by ICAT

*, genes are absent in the BCG strain. No., number of identified and quantified peptides of proteins; if ≥2, the average value of the ratio was used. #, identified by 2-DE/MS.

 
Comparison of Quantification by ICAT-LC/MS and 2-DE/MS—
Because the number of proteins with different abundance detected by both the 2-DE/MS and ICAT-LC/MS methods was low (Table III), we sought to explain this observation. One reason may be that the number of identified proteins is low in comparison to the expressed proteins in M. tuberculosis (Fig. 2) so that the chance of finding numerous proteins common to both datasets is low. To search for further reasons we analyzed the 27 proteins that both approaches identified and quantified. In 2-DE gels the spots of these 27 proteins were quantified with the help of the evaluation software TOPSPOT. The intensity values of spots from gels derived from four independent sample preparations for each mycobacterial strain were averaged. Accepting a divergence of 30%, only eight of the 27 proteins showed the same intensity difference detected by 2-DE and ICAT-LC/MS (Table IV). One example is shown in Fig. 6. Enoyl-CoA hydratase (Rv0905) showed an intensity relationship of 1:1.12 and 1:1.1 for 2-DE/MS and ICAT-LC/MS, respectively. Thus far, this protein was only found in one spot of each 2-DE pattern. The protein was identified with a sequence coverage of 46%, and the peptides of the protein covered 62% of the peptide mass fingerprint. All of the five most intense peaks belonged to Rv0905. The remaining peaks comprised contamination from matrix, trypsin, or stain. No additional protein reached an acceptable scoring factor, suggesting that Rv0905 represents the only protein in this spot (28). These data indicate that for the simple situation encountered for Rv0905, i.e. probable single spot pattern and no co-migrating proteins, the two methods provided comparable results.
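The agreement test between the two methods can be sketched as follows. The text does not define how the 30% divergence was computed, so this illustration assumes a relative difference with respect to the mean of the two ratios; the function names are hypothetical. The Rv0905 values (1:1.12 and 1:1.1) are taken from the text, whereas the second pair is invented for contrast.

```python
def relative_divergence(ratio_a, ratio_b):
    """Absolute difference of two ratios relative to their mean (assumed definition)."""
    mean = (ratio_a + ratio_b) / 2.0
    return abs(ratio_a - ratio_b) / mean


def ratios_agree(ratio_2de, ratio_icat, tolerance=0.30):
    """True if the two methods agree within the accepted divergence."""
    return relative_divergence(ratio_2de, ratio_icat) <= tolerance


# Rv0905: 1:1.12 by 2-DE/MS and 1:1.1 by ICAT-LC/MS (values from the text).
print(ratios_agree(1.12, 1.10))

# Hypothetical discordant pair, for illustration only.
print(ratios_agree(0.40, 1.00))
```

Other definitions of divergence, for example relative to one of the two methods, would shift borderline cases, so the count of agreeing proteins depends on this choice.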


 
TABLE IV Common proteins between ICAT-LC/MS and 2-DE/MS

<>, average of ratio from added intensities. #, average ratio of single protein species. NA, not applicable.

 


 
FIG. 6. Comparison of a 2-DE spot with ICAT-LC/MS quantification as exemplified by the enoyl-CoA hydratase/isomerase superfamily (also known as eccH) (Rv0905). Four gels from independent preparations of both strains were compared with ICAT-LC/MS quantification. A ratio of 1:1 of the spot intensities determined with the TOPSPOT software was confirmed by the calculated d0:d8 ratio obtained using XPRESS software.

 
More complex situations resulted in a divergence of the ratios obtained by the two methods. Four spots contained more than one protein. Here, only ICAT-LC/MS allowed us to distinguish which one(s) of the proteins present in the spot changed their abundance. Therefore, quantification by 2-DE of spots comprising a mixture of two or more proteins is not useful unless stable isotope-tagging methods are also used (17). One spot with an intensity ratio of 1:1 between BCG and H37Rv contained two proteins, phosphoenolpyruvate carboxykinase (Rv0211) and an ATPase of the ATP:ADP antiporter family (Rv2115c) (Fig. 7). ICAT-LC/MS analysis revealed a ratio of 1:0.68 for Rv0211 and 1:1.22 for Rv2115c, clearly distinguishing between these two proteins. In these four cases only ICAT-LC/MS allowed us to quantify the protein amount.



 
FIG. 7. Comparison of a spot containing two proteins by 2-DE/MS with ICAT-LC/MS of these two proteins. The 2-DE/MS results show two proteins in one spot resulting in an intensity ratio of the sum of both. The quantity of 2-DE-separated proteins was obtained by averaging spot intensities of four gels from different preparations of both strains. In the bottom the calculated d0:d8 ratio detected by ICAT-LC/MS using XPRESS software is shown. Proteins analyzed are phosphoenolpyruvate carboxykinase (Rv0211) and ATPase of the ATP:ADP antiporter family (Rv2115c).

 
For five genes, the corresponding proteins were detected in different spots. In these cases, each protein species could be independently quantified by the 2-DE/MS method, whereas the ICAT-LC/MS method quantified the sum of the protein species as a single protein. Succinyl-CoA synthase α chain (Rv0952) was identified in three 2-DE spots (Fig. 8). Each of them had a different intensity relationship between BCG and H37Rv of 1:0.41, 1:1, and -/+. ICAT-LC/MS did not distinguish between these three protein species and calculated a 1:0.7 relationship. Adding up the intensities of all 2-DE spots containing Rv0952 gave a ratio of 1:1 between BCG and H37Rv, which approaches the ICAT-LC/MS result. These cases exemplify the potential of 2-DE to separate proteins at the protein species level, a capability that is usually not provided by the ICAT-LC/MS method. This is of particular importance if differential post-translational modification leads to different electrophoretic mobility. For a further 10 proteins for which the ratios determined by 2-DE/MS and ICAT-LC/MS differed, the reasons for the discrepancy are presently unknown. It is likely that additional data, e.g. the identification of additional spots representing a particular gene product or the detection of additional proteins in specific 2-DE gel spots, will help to clarify the apparent discrepancies.
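The difference between quantification per protein species and quantification of the summed gene product can be illustrated numerically. In this sketch only the average intensity of 260 for spot 3 in H37Rv and the reported ratios (1:0.41, 1:1, -/+) come from the study (Fig. 8 legend); the intensities for spots 1 and 2 are hypothetical values chosen to be consistent with those ratios, and the function names are illustrative.

```python
# Spot intensities as (BCG, H37Rv). Spots 1 and 2 are HYPOTHETICAL values
# chosen to match the reported ratios; 260 for spot 3 is from the Fig. 8 legend.
RV0952_SPOTS = {
    1: (440.0, 180.0),   # about 1:0.41
    2: (500.0, 500.0),   # 1:1
    3: (0.0, 260.0),     # absent in BCG, present in H37Rv (-/+)
}


def per_spot_ratios(spots):
    """H37Rv/BCG ratio per protein species; None if the spot is absent in BCG."""
    return {
        spot: (h37rv / bcg if bcg > 0 else None)
        for spot, (bcg, h37rv) in spots.items()
    }


def summed_ratio(spots):
    """H37Rv/BCG ratio after adding up all spots of one gene product."""
    bcg_total = sum(bcg for bcg, _ in spots.values())
    h37rv_total = sum(h37rv for _, h37rv in spots.values())
    return h37rv_total / bcg_total


print(per_spot_ratios(RV0952_SPOTS))
print(summed_ratio(RV0952_SPOTS))   # 1.0 with these illustrative intensities
```

With these numbers the loss of intensity in spot 1 is offset by the additional spot 3, so the summed ratio is 1:1 although no single species shows the same behavior as the sum.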



FIG. 8. Comparison of a protein consisting of three protein species in the 2-DE gels with the ICAT-LC/MS quantification: succinyl-CoA synthase α chain (Rv0952). The quantity of 2-DE-separated protein species was obtained by averaging spot intensities of four gels from different preparations of both strains. One spot is missing in the four gels from strain M. bovis BCG in comparison with the gels from M. tuberculosis H37Rv. Spot 1 is present in both strains and shows a ratio of 1:0.41. Spot 2 is also present in both strains and shows a ratio of 1:1. Spot 3 is detected only in gels from M. tuberculosis H37Rv and has an average intensity of 260. ICAT-LC/MS generated only one result (1:0.7) using the XPRESS software. The ratio of the sum of intensities of spots 1 and 2 in strain M. bovis BCG and the sum of intensities of spots 1, 2, and 3 in strain M. tuberculosis H37Rv is 1:1.

 
These data clearly show that the 2-DE/MS and ICAT-LC/MS methods cannot be expected to provide, and in fact do not provide, identical quantitative results. Quantification by the two methods will be comparable only in cases where a protein exists as a single protein species or where all protein species are represented by one spot containing only one protein. This was true for only 22% of the cases investigated.
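
The comparability condition stated above can be expressed as a simple check on the spot-to-protein assignments. The following is a minimal sketch, not the procedure used in this study: the spot labels and the gene "RvXXXX" are hypothetical placeholders, while the sharing of one spot by Rv0211 and Rv2115c and the three spots of Rv0952 follow Figs. 7 and 8.

```python
from collections import defaultdict

# Spot -> set of gene products identified in that spot.
# Spot labels and "RvXXXX" are HYPOTHETICAL placeholders.
spot_to_genes = {
    "A": {"Rv0211", "Rv2115c"},  # two proteins in one spot (Fig. 7)
    "B": {"Rv0952"},             # one protein in three spots (Fig. 8)
    "C": {"Rv0952"},
    "D": {"Rv0952"},
    "E": {"RvXXXX"},             # one protein, one spot
}


def directly_comparable(spot_to_genes):
    """Return genes found in exactly one spot that contains no other gene."""
    gene_to_spots = defaultdict(set)
    for spot, genes in spot_to_genes.items():
        for gene in genes:
            gene_to_spots[gene].add(spot)

    comparable = set()
    for gene, spots in gene_to_spots.items():
        if len(spots) == 1:
            (spot,) = spots
            if spot_to_genes[spot] == {gene}:
                comparable.add(gene)
    return comparable


print(sorted(directly_comparable(spot_to_genes)))
```

In this toy assignment only the placeholder gene passes the check; the proteins of Figs. 7 and 8 are excluded because they share a spot or are spread over several spots.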


    DISCUSSION
 
As was shown for the ribosomal proteome, the resolution of a complete proteome is feasible provided the complexity is not too high (5). Problems with the dynamic range of protein concentration, sensitivity, and resolution increase with the complexity of the samples to be analyzed. In addition, intrinsic properties of certain classes of proteins can make their separation and/or detection difficult. Limitations for the analysis of basic and hydrophobic proteins by 2-DE have been described previously (5, 20). The presence of cysteine in the protein is a prerequisite for ICAT analysis, as proteins lacking cysteines cannot be quantified. Of the proteins predicted from the mycobacterial genome, 81% contain at least one cysteine. When the limitation of the MS measurement to peptides between Mr 500 and 4,000 is included, the percentage of accessible proteins is reduced to 75%.

Cysteine content is significantly reduced in the protein subclasses chaperones/heat-shock proteins and PE and PPE families (4). Only five of 16 predicted chaperones/heat-shock proteins contain at least one cysteine. Heat-shock proteins like DnaK, GroEL2, GroES, and HspX are represented as intense spot series by 2-DE but are not present among the ICAT-labeled proteins. Instead, LC/MS peptides of these proteins were found with high scores within the unlabeled fraction of the 60,000 peptides. Proteins of the PE and PPE families were not identified by either 2-DE/MS or ICAT-LC/MS. Only 42 of 167 proteins of the PE and PPE families subclass contain at least one cysteine. Indeed, 26 proteins of the PE and PPE families were detected within the 200 best scores of the unlabeled proteins. These 26 proteins have a mean Mr of 88,961 and are therefore not in the optimal Mr range for 2-DE. However, 12 of these proteins have an Mr <60,000, showing that the Mr cannot be the only reason for their absence from the 2-DE/MS analysis. The mean pI value of 4.39 and the hydropathy values also should not cause problems for 2-DE. For this protein class, LC/MS seems to be advantageous compared with 2-DE/MS. Because of the low number of cysteines in this class of proteins, ICAT labeling offers no advantage here, and quantification is not possible.
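
The accessibility criterion discussed above (at least one cysteine, located in a peptide of Mr 500 to 4,000) can be sketched as an in silico check. This is an illustration under stated assumptions, not the calculation used in this study: it assumes a complete tryptic digest without missed cleavages, uses unmodified monoisotopic peptide masses (the mass added by the ICAT reagent is ignored), and all function names and example sequences are hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def tryptic_peptides(seq):
    """Cleave C-terminal to K or R, but not before P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def icat_accessible(seq, low=500.0, high=4000.0):
    """True if any cysteine-containing tryptic peptide lies in [low, high]."""
    return any(
        "C" in pep and low <= peptide_mass(pep) <= high
        for pep in tryptic_peptides(seq)
    )


# Hypothetical toy sequences, not M. tuberculosis proteins.
print(icat_accessible("MKCAAAGGR"))  # cysteine in a peptide of about 604 Da
print(icat_accessible("MKAAAGGR"))   # no cysteine at all
print(icat_accessible("AAAAAAKCK"))  # cysteine only in a peptide below 500 Da
```

Applying such a filter to every predicted protein of a genome would yield an accessible fraction of the kind quoted above, although the exact figure depends on the cleavage rules and mass definition assumed.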

Fig. 4 illustrates the absence of the subclasses polyketide and non-ribosomal peptide synthesis, central intermediary metabolism, and degradation of macromolecules from the 2-DE/MS analysis, whereas all three subclasses were overrepresented by ICAT-LC/MS. The subclass polyketide and non-ribosomal peptide synthesis contained 41 proteins, and 21 of them had an Mr >100,000. This may be the reason for their absence from the 2-DE patterns and their overrepresentation in ICAT-LC/MS. The other two protein subclasses contain only one protein with an Mr >100,000, so it remains unresolved why these subclasses were not represented by 2-DE/MS.

Membrane proteins were also underrepresented by the ICAT-LC/MS approach, although an earlier report described the analysis of 491 microsomal proteins and therefore showed compatibility of the method with the analysis of poorly soluble proteins (14). The underrepresentation of membrane proteins in this study is therefore most likely explained by the protein solubilization conditions used. The underrepresented protein classes "other" and "unknowns" could contain falsely assigned genes. We have shown by 2-DE/MS (29) that gene prediction programs have neglected six genes in M. tuberculosis. The only way to avoid falsely predicted genes is to accept a gene as such only after obtaining evidence for its existence at the protein level. Genes in the genome lists should be annotated with the comment "confirmed at the protein level" and a reference.

Proteins can be accurately quantified by 2-DE/MS only if the spot intensity is in the linear range of the staining method used and if the spot consists of only one protein species. Furthermore, we show that quantitative results obtained by the two methods cannot be compared directly. The ICAT-LC/MS method quantifies protein composition, whereas the 2-DE/MS method quantifies each protein species individually. Therefore, ICAT-LC/MS and 2-DE/MS are complementary quantitative proteomic methods, and the optimal use of either method depends on the biological objectives.

It should be noted that both methods are undergoing significant improvements. To avoid the quantification problems caused by non-parallel separation and identification in classical 2-DE/MS proteomics, an ICAT-2-DE/MS strategy was developed (17). It combines the advantages of protein species separation with parallel quantification and with the quantification of several proteins within one spot. ICAT-2-DE/MS will not allow the quantification of large proteins by the parallel process and therefore again needs to be complemented by ICAT-LC/MS. Fluorescence labeling by difference gel electrophoresis (30) replaces optical density measurements with fluorescence measurements, thereby increasing the dynamic range of optical measurements. Experience will show whether the labeling process with difference gel electrophoresis, ICAT, or other reagents can be performed with 100% yield for all proteins. Even these methods may have to be accepted as complementary because of the different amino acids that are labeled. The ICAT-LC/MS method has been advanced by the development of second generation isotope-tagging reagents that are characterized by the use of 12C/13C as the stable isotope, resulting in the elimination of the chromatographic shift of deuterated tags, and by the introduction of an acid-cleavable linker that leads to improved MS/MS spectral quality due to the reduced size of the isotope tag. Furthermore, advanced software tools for the analysis of LC-MS/MS data have been developed that significantly accelerate the pace and increase the consistency of data analysis (31, 32; see footnote 2). Collectively, these recent technical advances suggest that the performance of quantitative proteomic technologies is rapidly approaching the stage at which the routine and complete analysis of cellular proteomes will become reality.


    ACKNOWLEDGMENTS
 
K. P. Pleissner and T. Eifert are acknowledged for excellent help in bioinformatics.


    FOOTNOTES
 
Received, August 3, 2003, and in revised form, October 6, 2003.

Published, MCP Papers in Press, October 13, 2003, DOI 10.1074/mcp.M300074-MCP200

1 The abbreviations used are: 2-DE, two-dimensional electrophoresis; MS, mass spectrometry; MS/MS, tandem mass spectrometry; LC, liquid chromatography; ICAT, isotope-coded affinity tag; MALDI, matrix-assisted laser desorption ionization.

2 X.-J. Li, H. H. Zhang, J. A. Ranish, and R. Aebersold, submitted for publication.

* This work was supported by European Union Grant QLK2CT200001536; Bundesministerium für Bildung und Forschung (Germany) Grant 031U107A/207A; NHLBI, National Institutes of Health Contract N01-HV-28179; and NCI, National Institutes of Health Grant 1 R33 CA93302-01.

|| To whom correspondence should be addressed: Core Facility Protein Analysis, Max Planck Institute for Infection Biology, Schumannstr. 21/22, D-10117 Berlin, Germany. E-mail: jungblut@mpiib-berlin.mpg.de


    REFERENCES
 

  1. Jungblut, P. R., Schaible, U. E., Mollenkopf, H.-J., Zimny-Arndt, U., Raupach, B., Mattow, J., Halada, P., Lamer, S., and Kaufmann, S. H. E. (1999) Comparative proteome analysis of Mycobacterium tuberculosis and Mycobacterium bovis BCG strains: towards functional genomics of microbial pathogens. Mol. Microbiol. 33, 1103–1117

  2. Mattow, J., Jungblut, P. R., Schaible, U. E., Mollenkopf, H.-J., Lamer, S., Hagens, K., Müller, E.-C., and Kaufmann, S. H. E. (2001) In search for a novel tuberculosis vaccine: identification of proteins from Mycobacterium tuberculosis missing in Mycobacterium bovis BCG strains. Electrophoresis 22, 2936–2946

  3. Kaufmann, S. H. E. (2000) Is the development of a new tuberculosis vaccine possible? Nat. Med. 6, 955–960

  4. Cole, S. T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Gordon, S. V., Eiglmeier, K., Gas, S., Barry, C. E., III, Tekaia, F., Badcock, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R., Devlin, K., Feltwell, T., Gentles, S., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Barrell, B. G., et al. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544

  5. Jungblut, P., Thiede, B., Zimny-Arndt, U., Muller, E. C., Scheler, C., Wittmann-Liebold, B., and Otto, A. (1996) Resolution power of two-dimensional electrophoresis and identification of proteins from gels. Electrophoresis 17, 839–847

  6. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999

  7. Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R., and Hood, L. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934

  8. Shiio, Y., Donohoe, S., Yi, E. C., Goodlett, D. R., Aebersold, R., and Eisenman, R. N. (2002) Quantitative proteomic analysis of Myc oncoprotein function. EMBO J. 21, 5088–5096

  9. Guina, T., Purvine, S. O., Yi, E. C., Eng, J., Goodlett, D. R., Aebersold, R., and Miller, S. I. (2003) Quantitative proteomic analysis indicates increased synthesis of a quinolone by Pseudomonas aeruginosa isolates from cystic fibrosis. Proc. Natl. Acad. Sci. U. S. A. 100, 2771–2776

  10. Ranish, J. A., Yi, E. C., Leslie, D. M., Purvine, S. O., Goodlett, D. R., Eng, J., and Aebersold, R. (2003) The study of macromolecular complexes by quantitative proteomics. Nat. Genet. 33, 349–355

  11. Baliga, N. S., Pan, M., Goo, Y. A., Yi, E. C., Goodlett, D. R., Dimitrov, K., Shannon, P., Aebersold, R., Ng, W. V., and Hood, L. (2002) Coordinate regulation of energy transduction modules in Halobacterium sp. analyzed by a global systems approach. Proc. Natl. Acad. Sci. U. S. A. 99, 14913–14918

  12. Von Haller, P. D., Yi, E., Donohoe, S., Vaughn, K., Keller, A., Nesvizhskii, A. I., Eng, J., Li, X. J., Goodlett, D. R., Aebersold, R., and Watts, J. D. (2003) The application of new software tools to quantitative protein profiling via ICAT and tandem mass spectrometry: I. Statistically annotated datasets for peptide sequences and proteins identified via the application of ICAT and tandem mass spectrometry to proteins co-purifying with T cell lipid rafts. Mol. Cell. Proteomics 2, 426–427

  13. Arur, S., Uche, U. E., Rezaul, K., Fong, M., Scranton, V., Cowan, A. E., Mohler, W., and Han, D. K. (2003) Annexin I is an endogenous ligand that mediates apoptotic cell engulfment. Dev. Cell 4, 587–598

  14. Han, D. K., Eng, J., Zhou, H., and Aebersold, R. (2001) Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 19, 946–951

  15. Griffin, T. J., Gygi, S. P., Rist, B., and Aebersold, R. (2001) Quantitative proteomic analysis using a MALDI quadrupole time-of-flight mass spectrometer. Anal. Chem. 73, 978–986

  16. Smolka, M. B., Zhou, H., Purkayastha, S., and Aebersold, R. (2001) Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis. Anal. Biochem. 297, 25–31

  17. Smolka, M., Zhou, H., and Aebersold, R. (2002) Quantitative protein profiling using two-dimensional gel electrophoresis, isotope-coded affinity tag labeling, and mass spectrometry. Mol. Cell. Proteomics 1, 19–29

  18. Pleissner, K.-P., Eifert, T., and Jungblut, P. R. (2002) A European pathogenic microorganism proteome database: construction and maintenance. Comp. Funct. Genomics 3, 97–100

  19. Rickard, E. C., Strohl, M. M., and Nielsen, R. G. (1991) Correlation of electrophoretic mobilities from capillary electrophoresis with physicochemical properties of proteins and peptides. Anal. Biochem. 197, 197–207

  20. Büttner, K., Bernhardt, J., Scharf, C., Schmid, R., Mäder, U., Eymann, C., Antelmann, H., Völker, A., Völker, U., and Hecker, M. (2001) A comprehensive two-dimensional map of cytosolic proteins of Bacillus subtilis. Electrophoresis 22, 2908–2935

  21. VanBogelen, R. A., Abshire, K. Z., Moldover, B., Olson, E. R., and Neidhardt, F. C. (1997) Escherichia coli proteome analysis using the gene-protein database. Electrophoresis 18, 1243–1251

  22. Link, A. J., Hays, L. G., Carmack, E. B., and Yates, J. R., III (1997) Identifying the major proteome components of Haemophilus influenzae type-strain NCTC 8143. Electrophoresis 18, 1314–1334

  23. Kyte, J., and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132

  24. Jungblut, P., Dzionara, M., Klose, J., and Wittmann-Liebold, B. (1992) Identification of tissue proteins by amino acid analysis after purification by two-dimensional electrophoresis. J. Protein Chem. 11, 603–612

  25. Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K., Rane, S., and Small, P. M. (1999) Comparative genomics of BCG vaccines by whole genome DNA microarrays. Science 284, 1520–1523

  26. Cole, S. T., Eiglmeier, K., Parkhill, J., James, K. D., Thomson, N. R., Wheeler, P. R., Honore, N., Garnier, T., Churcher, C., Harris, D., Mungall, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R. M., Devlin, K., Duthoy, S., Feltwell, T., Fraser, A., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Lacroix, C., Maclean, J., Moule, S., Murphy, L., Oliver, K., Quail, M. A., Rajandream, M. A., Rutherford, K. M., Rutter, S., Seeger, K., Simon, S., Simmonds, M., Skelton, J., Squares, R., Squares, S., Stevens, K., Taylor, K., Whitehead, S., Woodward, J. R., and Barrell, B. G. (2001) Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011

  27. Cox, J. S., Chen, B., McNeil, M., and Jacobs, W. R. (1999) Complex lipid determines tissue-specific replication of Mycobacterium tuberculosis in mice. Nature 402, 79–83

  28. Schmidt, F., Schmid, M., Mattow, J., Pleissner, K. P., and Jungblut, P. R. (2003) Iterative procedure to improve peptide mass fingerprint identification of two-dimensional electrophoresis separated proteins. J. Am. Soc. Mass Spectrom. 14, 943–956

  29. Jungblut, P. R., Müller, E.-C., Mattow, J., and Kaufmann, S. H. E. (2001) Proteomics reveals open reading frames in Mycobacterium tuberculosis not predicted by genomics. Infect. Immun. 69, 5905–5907

  30. Ünlü, M., Morgan, M. E., and Minden, J. S. (1997) Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 18, 2071–2077

  31. Keller, A., Nesvizhskii, A. I., Kolker, E., and Aebersold, R. (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392

  32. Keller, A., Purvine, S., Nesvizhskii, A. I., Stolyar, S., Goodlett, D. R., and Kolker, E. (2002) Experimental protein mixture for validating tandem mass spectral analysis. OMICS 6, 207–212