Program in Proteomics and Bioinformatics, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada
** Department of Statistics, Yale University, New Haven, Connecticut 06520
Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
ABSTRACT
Two-dimensional polyacrylamide gel electrophoresis has been the traditional method of choice for high resolution proteome analysis (7, 8). Despite recent advances, this approach is biased against membrane-associated proteins, low abundance proteins, or proteins with extremes in isoelectric point or molecular weight (9, 10). The identification of gel-separated proteins by mass spectrometry (MS)1 is also tedious due to the need to extract, digest, and analyze individual gel spots. Consequently, techniques for gel-free chromatographic separation of protein or peptide mixtures coupled to on-line MS detection are currently in development. One promising method, based on multidimensional capillary-scale liquid chromatography-electrospray ionization tandem MS (LC-MS) protein identification technology (MudPIT) pioneered by Yates and colleagues (11-13), permits shotgun sequencing of large numbers of proteins present in cell extracts. MudPIT has been applied successfully to several model organisms, leading to the identification of 1,484 proteins in yeast (12), 2,363 proteins in rice (14), and, most recently, 2,415 proteins in Plasmodium (15). Other powerful gel-free approaches, such as the use of accurate mass tag detection by Fourier transform ion cyclotron resonance MS (16), isotope-coded affinity tags (17), and high accuracy quadrupole MS (18), also allow for significant proteome coverage. Nevertheless, the mouse proteome is predicted to be very complex and highly regulated (19), involving many thousands of proteins regulated by means of differential synthesis and selective subcellular localization. Furthermore, current high throughput experimental proteomic approaches do not allow for ready transformation of raw data into meaningful, easy to interpret output.
Here we describe the development and application of PRISM, a generic Proteomic Investigation Strategy for Mammals that allows for systematic, efficient, and unbiased detection and simplified follow-up analysis of large numbers of proteins expressed in mammalian cells and tissues. PRISM consists of a series of integrated experimental and analytical steps, starting with subcellular fractionation and high throughput protein shotgun sequencing using an optimized MudPIT procedure followed by automated statistical validation, annotation, and categorization of the identified proteins based on universal Gene Ontology (GO) annotation terms (20). PRISM was evaluated on healthy adult mouse lung and liver, and physiologically significant differences in the tissue specificity and subcellular localization of hundreds of proteins were readily detected, confirming the utility of the approach for global analysis of complex mammalian proteomes.
EXPERIMENTAL PROCEDURES
Tissue Preparation and Organelle Fractionation
Healthy adult female mice (ICR) were CO2-asphyxiated and sacrificed. The organs of interest were perfused with cold phosphate-buffered saline, rapidly removed, rinsed, and homogenized for 2 min in ice-cold lysis buffer containing 250 mM sucrose, 50 mM Tris-HCl (pH 7.4), 5 mM MgCl2, 1 mM DTT, and 1 mM phenylmethylsulfonyl fluoride using a tight fitting Teflon pestle attached to a power drill. All subsequent steps were performed at 4 °C. The lysate was centrifuged in a benchtop centrifuge at 800 x g for 15 min; the supernatant served as source of cytosol, mitochondria, and microsomes. The pellet, which contains the nuclei, was rehomogenized for 1 min in lysis buffer and centrifuged again as above. The nuclei were homogenized in cushion buffer (2 M sucrose, 50 mM Tris-HCl (pH 7.4), 5 mM MgCl2, 1 mM DTT, and 1 mM phenylmethylsulfonyl fluoride), filtered through cheesecloth to remove debris, layered onto 4 ml of cushion buffer, and pelleted in an ultracentrifuge at 80,000 x g for 35 min (Beckman SW41 rotor). Mitochondria were isolated from the crude cytoplasmic fraction by benchtop centrifugation at 6,000 x g for 15 min, whereas the microsomal fraction was isolated by 100,000 x g ultracentrifugation for 1 h (Beckman SW41 rotor). The supernatant was saved as the "cytosol" fraction.
Organelle Extraction
Nuclear proteins were extracted by resuspending and incubating the nuclei in 5 volumes of 20 mM HEPES (pH 7.9), 1.5 mM MgCl2, 0.42 M NaCl, 0.2 mM EDTA, and 25% glycerol for 30 min with gentle shaking. The nuclei were then lysed by 10 passages through an 18-gauge needle, and debris were removed by microcentrifugation at 13,000 rpm for 30 min. The supernatant served as the "nuclear" fraction. Mitochondrial proteins were isolated by incubating the mitochondria in a hypotonic lysis buffer containing 10 mM HEPES, pH 7.9 for 30 min on ice. The suspension was briefly sonicated, and debris were pelleted in a benchtop microcentrifuge at 13,000 rpm for 30 min. The supernatant served as the "soluble mitochondrial" fraction. Membrane proteins were extracted by gently resuspending the insoluble mitochondrial pellet and the microsomes in extraction buffer containing 20 mM Tris-HCl (pH 7.8), 0.4 M NaCl, 15% glycerol, 1 mM DTT, and 1.5% Triton-X-100. The suspension was incubated with gentle shaking for 1 h and recentrifuged at 100,000 x g for 1 h (Beckman SW60Ti rotor). The supernatants served as the "microsome" and "mitochondrial pellet" fractions, respectively. For the crude whole tissue extract, mouse liver was homogenized for 2 min in ice-cold homogenization buffer containing 250 mM sucrose, 50 mM Tris-HCl (pH 7.4), 5 mM MgCl2, 1 mM DTT, and 1 mM phenylmethylsulfonyl fluoride. The resulting solution was briefly sonicated and centrifuged at 800 x g, and the supernatant was analyzed.
Digestion of Cell Extract for MudPIT Analysis
An aliquot of 150 µg of total protein from each fraction was precipitated overnight with 5 volumes of ice-cold acetone followed by centrifugation at 21,000 x g for 20 min. The protein pellet was solubilized in 8 M urea, 50 mM Tris-HCl, pH 8.5 at 37 °C for 2 h and reduced by the addition of 1 mM DTT for 1 h at room temperature followed by carboxyamidomethylation with 5 mM iodoacetamide for 1 h at 37 °C. The samples were then diluted to 4 M urea with 50 mM ammonium bicarbonate, pH 8.5 and digested with a 1:150 molar ratio of endoproteinase Lys-C at 37 °C overnight. The next day the mixtures were further diluted to 2 M urea with 50 mM ammonium bicarbonate, pH 8.5, supplemented with CaCl2 to a final concentration of 1 mM, and incubated overnight with Poroszyme trypsin beads at 30 °C with rotation. The resulting peptide mixtures were solid phase-extracted with SPEC-Plus PT C18 cartridges (Ansys Diagnostics, Lake Forest, CA) according to the manufacturer's instructions and stored at -80 °C until further use.
MudPIT Analysis
A fully automated 15-cycle, 30-h MudPIT chromatographic procedure was set up essentially as described previously (12, 13). Briefly, an HPLC quaternary pump was interfaced with an LCQ DECA XP ion trap tandem mass spectrometer (ThermoFinnigan, San Jose, CA). A 150-µm-inner diameter fused silica capillary microcolumn (Polymicro Technologies, Phoenix, AZ) was pulled to a fine tip using a P-2000 laser puller (Sutter Instruments, Novato, CA) and packed with 10 cm of 5-µm Zorbax Eclipse XDB-C18 resin (Agilent Technologies, Mississauga, Ontario, Canada) and then with 6 cm of 5-µm Partisphere strong cation exchange resin (Whatman). Samples were loaded manually onto separate columns using a pressure vessel. The chromatography was carried out as described by Wolters et al. (13).
Protein Identification and Validation
The SEQUEST program (a kind gift from Jimmy Eng and John Yates III) was used to search peptide spectra essentially as described previously (21). The database was populated with non-redundant mammalian Swiss-Prot and TrEMBL protein sequences in both a normal and inverted amino acid orientation (22). Statistical analysis (error modeling) was performed on the SEQUEST scores obtained for over 30,000 peptide matches. Formally the output of the analysis, Yi, was given as: Yi = "0" (the spectrum is incorrectly matched to an inverted peptide sequence); Yi = "1" (the spectrum is matched to a normal peptide sequence, possibly incorrectly); Yi = "2" (the spectrum matches the correct peptide sequence). We estimated a function F(x,y,z..) that characterizes the likelihood that a peptide match with score Xi = (x,y,z..) is correct as
$$F(x,y,z,\ldots) = P\bigl(Y_i = 2 \mid X_i = (x,y,z,\ldots)\bigr) \qquad \text{(Eq. 1)}$$
For a protein with multiple peptide matches, {X1, X2,..., Xm}, one can then estimate the probability of correct identification by
![]() | (Eq. 2) |
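Eq. 2 itself is not reproduced in this text. As a minimal sketch of how per-peptide probabilities could be combined, the function below assumes the peptide matches are independent and reports the probability that at least one of them is correct. Both the independence assumption and the name `protein_probability` are ours and may differ from the authors' formula.

```python
from math import prod


def protein_probability(peptide_probs):
    """Combine per-peptide probabilities F(X_1), ..., F(X_m) for one protein.

    Assumption (not taken from the paper): peptide matches are independent,
    so the protein is correctly identified if at least one match is correct.
    """
    probs = list(peptide_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    if not probs:
        return 0.0  # no peptide evidence, no support for the protein
    # P(at least one correct) = 1 - P(all matches incorrect)
    return 1.0 - prod(1.0 - p for p in probs)
```

Under this assumption, two moderately confident peptides (0.9 and 0.8) support the protein more strongly than either does alone.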
To compute the detection sensitivity (coverage), an estimate of the number of proteins actually present in the sample was made using
![]() | (Eq. 3) |
where PepN is the number of observed peptides, AProL is the average amino acid length of proteins in the database, APepL is the average length of a peptide in the database, and λ1 is a positive constant proportional to the number of matches to an actual peptide.
F was approximated by first carving the regions and then fitting a smooth function. Monotonicity implies that for every β, there is a rectangular region, R, for which the function F will have a value of at least β. Assumption (i) implies that for a large β and a rectangle R with a large number of observations (K > 100) the following applies
![]() | (Eq. 4) |
where θ represents the proportion of 1s in region R.
By continuity, for a not too large R, P(Yi = k|Xi ∈ R) is approximately equal for all Xi ∈ R. Hence we let
![]() | (Eq. 5) |
![]() | (Eq. 6) |
The probability that region R with K observations has θ × K 1s (meaning either 1 or 2, since 2s cannot be recognized a priori) and (1 - θ) × K 0s is given by
![]() | (Eq. 7) |
It is well known that the maximum likelihood estimator of q is θ (23); we therefore let q ≈ θ. Assumption (i) implies p1 = p0, resulting in
![]() | (Eq. 8) |
![]() | (Eq. 9) |
Equation 4 is proven.
Therefore, if R is a rectangle in which the proportion of 1s is at least θ and the SEQUEST scores Xi ∈ R, the probability that a peptide match is correct is approximated by
![]() | (Eq. 10) |
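The counting argument behind this approximation can be illustrated numerically. The sketch below writes θ for the proportion of matches in a region that hit a normal (forward) sequence; that symbol is our notation. Under Assumption (i), incorrect matches fall on forward and inverted sequences equally often, so a region with K matches and (1 - θ)K inverted hits should contain about (1 - θ)K incorrect forward hits as well. That leaves roughly (2θ - 1)K correct matches. The function name is hypothetical, and the formula is a reading of the surrounding text rather than a transcription of Eq. 10.

```python
def estimate_correct_probability(labels):
    """Estimate the chance that a match in a score region is correct.

    labels: iterable of 0 (match to an inverted sequence) or 1 (match to a
    normal sequence) for every peptide match whose scores fall in the region.
    Returns (theta, estimate), where theta is the observed proportion of 1s
    and estimate = 2 * theta - 1, clipped at zero.
    """
    labels = list(labels)
    k = len(labels)
    if k == 0:
        raise ValueError("region contains no matches")
    forward = sum(1 for y in labels if y == 1)
    theta = forward / k
    # Inverted hits are all wrong, and by Assumption (i) an equal number of
    # forward hits are expected to be wrong too: correct ~= (2*theta - 1) * K.
    return theta, max(0.0, 2.0 * theta - 1.0)
```

For a region holding 99 forward hits and 1 inverted hit, θ = 0.99 and the estimate is 0.98, which matches the pairing of θ = 0.99 with a p value threshold of 0.02 used under "Results".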
Next, rectangular regions are identified for which F(x,y,z) ≥ β, β = {0.98, 0.96, 0.9, 0.8, 0.7, 0.6, 0.5}. To this end, for fixed β, we defined a function H(x,y,z) = 1 if (x,y,z) ∈ R and = 0 otherwise. For θ = (β + 1)/2, we minimized the weighted l1 distance between function H and the data points. In other words, since the rectangle R is easily parameterized (for example, R = {(x,y,z) such that a < x < b and c < y < d and e < z < f}), one looks for values "a,b,c,..." that minimize the following quantity
![]() | (Eq. 11) |
To ensure , the weights were computed as follows
![]() | (Eq. 12) |
where = 2 - 1/θ. Assumption (ii) implies the existence of a smooth monotone function that can approximate the data. The actual optimization algorithm was an accelerated random search (24). Computations were run on a desktop computer using FORTRAN.
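The region search can be sketched in two dimensions as follows. This is a plain random search, swapped in for the accelerated random search cited above, and it maximizes the number of matches captured rather than the weighted l1 objective of Eq. 11. It looks for the largest "upper right corner" region {x ≥ a, y ≥ c} in which the proportion of 1s stays at or above a target. The function names, the default of 2,000 iterations, and the use of Python instead of FORTRAN are assumptions of this sketch.

```python
import random


def region_stats(points, a, c):
    """Count matches in {x >= a, y >= c} and return (count, proportion of 1s).

    points: list of (x, y, label) with label 0 (inverted) or 1 (normal).
    """
    inside = [label for (x, y, label) in points if x >= a and y >= c]
    k = len(inside)
    theta = sum(inside) / k if k else 0.0
    return k, theta


def random_search_corner(points, theta_min, min_count=100, iters=2000, seed=0):
    """Plain random search for the largest corner region with theta >= theta_min.

    Returns (count, a, c) for the best region found, or None if no sampled
    region holds at least min_count matches at the required proportion.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    best = None
    for _ in range(iters):
        a = rng.uniform(min(xs), max(xs))
        c = rng.uniform(min(ys), max(ys))
        k, theta = region_stats(points, a, c)
        if k >= min_count and theta >= theta_min:
            if best is None or k > best[0]:
                best = (k, a, c)
    return best
```

The `min_count` default mirrors the K > 100 condition stated for the rectangles. Repeating the search for each target level gives a nested family of regions to which a smooth function can then be fitted.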
GOClust
GOClust takes as input a tab-delimited text file of validated proteins. To facilitate comparison across multiple samples, the programs DTASelect and Contrast (a generous gift from Dave Tabb, Scripps Research Institute, La Jolla, CA) were used to arrange the data sets (25). Protein matches to other sequence databases are first mapped to a corresponding Swiss-Prot or TrEMBL entry using the Sequence Retrieval System at the Canadian Bioinformatics Resources (www.cbr.ncr.ca). The GOA flat file (regularly updated) that provides GO annotations for non-redundant Swiss-Prot, TrEMBL, and Ensembl entries was downloaded from the European Bioinformatics Institute (www.ebi.ac.uk). The final output is a series of tables of grouped proteins that share a common annotation to one or more preselected GO terms. The choice of terms is fully flexible to satisfy user interests.
RESULTS
Protein Validation
The output of SEQUEST is a series of putative protein matches and associated peptide scores, which include a cross-correlation score based on spectral fit (Xcorr), the normalized difference between the Xcorr of the top and second best matches (ΔCn), and a preliminary ranking based on the number of matched ion peaks (RSp). A subjective combination of these scores as well as other factors such as the charge of the precursor ion, the presence of tryptic termini (relevant in experiments where the peptides are generated by digestion with trypsin), and the number of peptides that map to a given protein is typically used to evaluate the accuracy of a prediction (30). To provide a more rigorous estimate of the accuracy of SEQUEST predictions, we developed a statistical algorithm, STATQUEST, that uses an empirical, probabilistic method for determining the likelihood of each putative peptide match.
We began our error modeling by considering the criteria mentioned above as a collection of variables and evaluating whether a subset, d, of these might describe a region of d-dimensional space enriched for correctly identified proteins. The goal was to produce a function corresponding to the probability that a given protein with SEQUEST scores Xi = (x,y,z,..) is correctly identified. To this end, we evaluated the distribution of SEQUEST scores for tens of thousands of mouse peptide spectra obtained by searching a database populated with mouse and human protein sequences in both the normal amino acid order as well as a fully inverted order (see "Experimental Procedures"). Our analysis had two assumptions. (i) If a match is incorrect, SEQUEST has an equal chance to return a forward (a 1) or an inverted sequence (a 0). (ii) The likelihood of a correct match is a smooth and monotone function dependent on the Xcorr, ΔCn, RSp, charge, and tryptic status of the peptide.
Since a match to an inverted sequence, or 0, clearly indicates an incorrect match, we located regions of variable space (i.e. Xcorr, ΔCn, and RSp) where the concentration of 0s is low. Since a low concentration of 0s relates to a high probability of correct matches, we were able to derive a likelihood function (see "Experimental Procedures"). (To further justify the above approach, we offer Supplemental Figs. F1 and F2 that show that the distribution of Xcorr and ΔCn for matches to normal peptide sequences (or 1s; Supplemental Fig. F2) is biased compared with matches to inverted sequences (or 0s; Supplemental Fig. F1)).
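The normal-plus-inverted search database and the resulting 0/1 labels can be sketched as below. The `INV_` accession prefix and both function names are hypothetical conventions chosen for illustration; the text does not say how inverted entries were tagged.

```python
def build_search_database(proteins):
    """Return a database holding each sequence in normal and inverted order.

    proteins: dict mapping accession -> amino acid sequence.
    Inverted entries are stored under a hypothetical "INV_" prefix.
    """
    database = dict(proteins)
    for accession, sequence in proteins.items():
        database["INV_" + accession] = sequence[::-1]  # fully reversed order
    return database


def label_match(accession):
    """Label a match 0 if it hits an inverted entry and 1 otherwise."""
    return 0 if accession.startswith("INV_") else 1
```

A label of 0 is always an incorrect match, whereas a label of 1 may be correct or incorrect, which is why the error model has to infer the share of correct matches from the 0s.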
While the exact form of this function is not known, a good (least squares) fit is achieved with
![]() | (Eq. 13) |
where Q(x,y) is a polynomial expression of second degree. Singly, doubly, and triply charged peptides were treated separately, and the three predictors (variables) were fixed as x = Xcorr, y = ΔCn, and z = RSp.
![]() | (Eq. 14) |
The output of this function is a probability value for putative matches, which allows for easy assignment of a confidence factor. We found that the dimension of the problem could be reduced by fixing the third variable, z < 5, since this results in virtually the same entries; an accelerated random search (24) was used as the optimization algorithm. Heuristically, the optimization "slides" the upper right corner rectangle until it encounters a region with a concentration of 1s lower than θ. For example, if θ is set as 0.99, the optimization scheme fits rectangles in such a way that it must add at least 100 1s for every added 0. If there is no such rectangle, the optimization scheme stops. Hence the optimization produces rectangles with maximum area at a concentration of at least θ. A graphical representation of G(x,y) for doubly charged precursor ions is shown in Fig. 2B. While G(x,y) depends on both the peptide charge and the particular MS instrumentation used, the interpolated functions are virtually indistinguishable for proteins derived from distinct organisms (data not shown). STATQUEST, launched by command line, can be used to rapidly filter large SEQUEST output files based on preselected confidence (p value) cut-off values.
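A confidence filter of this kind might look like the sketch below. The functional form of Eq. 13 is not reproduced in this text, so the logistic link applied to the second-degree polynomial Q(x,y) is an assumption, as are the coefficient layout, the dictionary keys, and the function names. None of this is STATQUEST's actual interface.

```python
import math


def logistic_quadratic(coeffs):
    """Build G(x, y) = 1 / (1 + exp(-Q(x, y))) for a second-degree polynomial Q.

    coeffs = (c0, cx, cy, cxx, cxy, cyy); real values would come from a fit
    and are not given in the text. The logistic form is assumed.
    """
    c0, cx, cy, cxx, cxy, cyy = coeffs

    def g(x, y):
        q = c0 + cx * x + cy * y + cxx * x * x + cxy * x * y + cyy * y * y
        q = max(-700.0, min(700.0, q))  # keep math.exp within float range
        return 1.0 / (1.0 + math.exp(-q))

    return g


def filter_matches(matches, g, p_cutoff=0.02, max_rsp=5):
    """Keep matches with RSp < max_rsp and p value = 1 - G(Xcorr, dCn) <= p_cutoff.

    matches: list of dicts with hypothetical keys "xcorr", "delta_cn", "rsp".
    """
    kept = []
    for match in matches:
        if match["rsp"] >= max_rsp:
            continue  # mirrors fixing the third variable at z < 5
        p_value = 1.0 - g(match["xcorr"], match["delta_cn"])
        if p_value <= p_cutoff:
            kept.append(match)
    return kept
```

Because singly, doubly, and triply charged peptides were modeled separately, a fuller version would hold one fitted function per charge state.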
Functional Annotation and Clustering
Data analysis poses a significant challenge to large scale proteomic studies. To this end, we developed GOClust, a Perl-based computer program, to automatically annotate and subgroup long lists of validated proteins based on the GO annotation schema, a dynamic controlled vocabulary for describing the known molecular function, subcellular location, and biological role of proteins that changes as knowledge accumulates (20). As outlined in Fig. 2C, GOClust first sorts the proteins based on the database to which each corresponding accession number maps (e.g. Swiss-Prot, Protein Information Resource (PIR), or GenPept). Next the program obtains the GO identification numbers (GOids), and corresponding GO terms, assigned to each protein (or a close homologue) using GOA reference flat files downloaded from the European Bioinformatics Institute (32, 33) and the GO Consortium (20). Lastly the annotated proteins are grouped based on common annotation to one or more subcategories within the GO hierarchy, and the clusters are summarized in a spreadsheet (see "Experimental Procedures"). Users can specify the degree of annotation granularity (sets of returned GO terms) to narrow or broaden the scope of the analysis to areas of interest.
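The grouping step can be illustrated with a short sketch. GOClust itself is a Perl program whose code is not shown here; the Python function below only mirrors the idea of collecting validated proteins under user-selected GO terms. It takes (accession, GO id) pairs as they might be parsed from a GOA flat file, and it ignores the GO hierarchy, so a protein annotated only to a child term is not counted under the parent. The function name and input layout are assumptions.

```python
from collections import defaultdict


def cluster_by_go(validated, annotations, selected_terms):
    """Group validated proteins under preselected GO terms.

    validated: iterable of protein accessions that passed validation.
    annotations: iterable of (accession, go_id) pairs.
    selected_terms: dict mapping go_id -> human-readable term name.
    Returns a dict mapping term name -> sorted list of accessions.
    """
    validated = set(validated)
    clusters = defaultdict(set)
    for accession, go_id in annotations:
        if accession in validated and go_id in selected_terms:
            clusters[selected_terms[go_id]].add(accession)
    return {term: sorted(members) for term, members in clusters.items()}
```

Passing a longer or shorter `selected_terms` dictionary corresponds to broadening or narrowing the annotation granularity described above.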
Mouse Lung and Liver
To evaluate the effectiveness of PRISM, we used it to investigate differences in the protein expression and subcellular distribution characteristics of adult mouse lung and liver. A combined total of over 300,000 spectra were acquired over the course of 2 weeks from equivalent normalized aliquots of the five respective subcellular fractions (see "Experimental Procedures"). The spectra were searched against non-redundant Swiss-Prot and TrEMBL mouse and human protein sequences downloaded from the European Bioinformatics Institute. Putative matches were filtered by STATQUEST using a p value threshold of 0.02 (θ = 0.99), which virtually eliminates false positives albeit at the expense of sensitivity (coverage). (We estimate that ~40% of proteins that underwent collision-induced dissociation pass this stringent cut-off (data not shown)). This resulted in the high confidence identification of a total of 2,106 unique proteins (8,606 unique peptides) in the two organs (Table I). Of these, 1,460 proteins (4,242 unique peptides) were detected in lung (Table I), and 1,358 proteins (4,364 unique peptides) were detected in liver (Table I), divided roughly equally among the fractions (the entire expression data set is presented in Supplemental Table I). This analysis also confirmed the expression of over 500 hypothetical proteins predicted by ongoing mouse cDNA sequencing efforts, in particular the RIKEN study (4). In contrast, a total of only 413 proteins were identified in a control MudPIT analysis of an unfractionated whole liver extract (data not shown), confirming that the subcellular fractionation procedure provides significantly deeper proteome coverage.
GOClust was able to map 1,510 (71%), 1,186 (56%), and 925 (43%) of the validated proteins to at least one annotation term within the three main GO categories (molecular function, biological process, and cellular component, respectively), which define the known biochemical function, physiological role, and subcellular localization of gene products. This permitted ready examination of the tissue and organelle distribution of proteins across diverse biological processes such as cell metabolism, intracellular signaling, or gene expression (Supplemental Table II). Marked enrichment of specific classes of proteins was evident in the different fractions of both organs, a sampling of which is provided in Fig. 3A. For instance, known nuclear proteins, such as components of the spliceosome and nucleolus, were found as expected almost exclusively in the nuclear fractions (Fig. 3A). Similarly, integral membrane proteins known to reside in the Golgi apparatus (Fig. 3A) or the endoplasmic reticulum, lysosome, or peroxisome (Supplemental Table II) were enriched in the microsomal membrane fractions. Some cross-contamination between fractions was evident, likely in part due to proteins bearing annotation to more than one subcellular compartment and to proteins that traffic between cellular compartments such as the signal transducer and activator of transcription STAT3 (see Table III). It should be noted that only those proteins that could be annotated to a GO term within cellular component are shown in Fig. 3A, which does not represent all the proteins detected with known subcellular localization properties, e.g. many more known nucleolar proteins were identified than are presently annotated in the GO schema. This limitation should ease in the near future as the GO database expands.
Close comparison of the various fractions revealed organelle-specific protein distribution signatures consistent with physiologic expectation. For example, the nuclear fractions were highly enriched for known nuclear proteins, including dozens of proteins involved in transcription, RNA processing, and DNA metabolism (Fig. 3C). Over 170 hypothetical proteins (most predicted by the RIKEN mouse cDNA library sequencing project (4)) were also uniquely detected in the nuclear fractions, strongly suggesting they may have a nuclear related function.
Many tissue-specific proteins were also identified. For instance, members of the P450 family of cytochromes, membrane-bound heme-thiolate monooxygenases involved in NADPH-dependent electron transport pathways active in hepatocytes, were highly enriched in the liver microsomal fraction (Table II). Indeed, of the 27 P450 isoforms detected in this study (25 of which were found in the microsome fractions), 20 were found exclusively in the liver, 17 of which are known to be liver-specific (35); two were detected in both lung and liver, one of which (C2F2) is known to be expressed in Clara cells present in both organs (36); and three others were detected uniquely in lung, one of which (CP4B) is known to be pulmonary specific (37). In contrast, two-thirds of known mitochondrial proteins detected were found in both organs (Supplemental Table II), consistent with an organelle expected to be similar between organs.
Other notable examples of tissue specificity are listed in Table III, such as the transcriptional regulator STAT3 (STA3), thyroid transcription factor 1 (TTF1), and cAMP-response element-binding protein (CREB)-binding protein (CBP), which were found exclusively in the lung along with several of their known targets, including the pulmonary surfactant-associated proteins, which lower the surface tension at the air-liquid interface in alveoli (38). Conversely, hepatocyte nuclear factor 4-α (HN4A), a nuclear hormone receptor essential for liver development (39) that drives transcription of liver-specific enzymes such as α1-antitrypsin, apolipoprotein C-III, and transthyretin (40, 41), and hepatocyte nuclear factor 6 (HNF6), a transcriptional activator that binds to cognate promoter consensus sequences upstream of several liver transcribed genes (42), were detected exclusively in the liver nuclear fraction.
DISCUSSION
Examination of the annotated, clustered data set revealed many examples of tissue- or organelle-specific protein expression of notable biological interest. Particularly striking was the detection of low abundance transcription factors linked to liver development (HN4A), liver-specific gene transcription (HNF6), and lung surfactant homeostasis (TTF1 and STA3) that help define the physiological characteristics of these respective tissues. Detection of differential expression of a large number of membrane-bound P450 cytochromes in the liver compared with lung also highlights a significant physiological difference between these two tissues with prominent medical relevance. The P450 enzymes represent one of the largest mammalian multigene families with a central role in drug metabolism. As a consequence, the P450 system impacts many of the most important issues of clinical pharmacology, including drug pharmacokinetics, drug metabolism, and undesirable drug-drug interactions (44, 45). The expression characteristics of each of the P450 enzyme isoforms differ in disease states and are an important consideration in drug evaluation studies (46). Of the dozens of known mouse P450 paralogues, PRISM was able to detect most of the clinically important hepatic P450 isoforms, including 1A2 and 2E1 as well as diverse members of the 2C, 2D, and 3A subfamilies. This indicates that PRISM can serve as a platform to investigate the biochemical basis of drug metabolism in a standardized laboratory mouse model setting.
Protein extraction techniques based on differential solubility have previously been shown to significantly increase the number of proteins that can be identified by MudPIT (12). We have shown that subcellular fractionation can also significantly increase the proteome coverage and depth of information retrieved by LC-MS and have established that differential centrifugation can be an efficient method for isolating fairly pure subcellular compartments. Importantly, of the over 575 proteins detected exclusively in the nuclear fractions, nearly half (265 proteins) were either annotated solely to the nucleus or had a function known to be localized within the nucleus (e.g. transcription). Traditional biochemical techniques, such as Western blotting, immunostaining, or specialized techniques for isolating ultrapure preparations of organelles (47), may be warranted to confirm individual protein distributions. Nonetheless, orthologues for approximately 40% of the mouse nuclear specific proteins reported in this study were identified in two recent proteomic studies carried out on highly purified nucleoli of human HeLa cells (29). Interestingly, ~30% of the human nucleolar proteins were encoded by novel or uncharacterized genes (28, 29), roughly the same fraction of hypothetical proteins detected in this study. Considering this enrichment of nuclear related protein functions and the clear segregation of the nuclear fractions during hierarchical clustering, the implication is that a large proportion of the proteins of unknown function detected in the nuclear fractions likely has a biochemical role specific to the nucleus, most likely related to gene expression, RNA processing, or chromosome dynamics. In summary, PRISM not only provides direct evidence for the actual tissue expression of hundreds of hypothetical proteins but can serve as a powerful approach to quickly gain insight into the biological role and/or molecular function of hundreds of novel gene products.
Elegant statistical approaches for eliminating SEQUEST mismatches have been described (22, 48). The use of a statistical algorithm to filter preliminary database sequence matches will allow for standardization in the reporting of protein identifications, permitting comparison between different proteomic studies and providing the basis for rational estimates of the absolute complexity of a given mammalian proteome. The graded function reported here is particularly powerful since the likelihood that a given identification is correct is well defined, and false positives need not always be eliminated at the expense of true positives. Moreover, the necessary statistical assumptions are easy to identify and verify.
Data management, mining, and visualization methods are increasingly a fundamental part of large scale proteomic studies. The bioinformatic tool GOClust described here allows researchers to organize the large numbers of proteins identified by MudPIT, or other large scale proteomic techniques, into smaller, more accessible categories of particular relevance or interest using a standardized nomenclature that permits comparison across multiple experiments. GOClust therefore allows researchers to browse proteomic data sets from a global, system perspective and then drill down to address specific biological questions. The capability of GOClust to reveal insightful patterns of protein expression and point to fruitful new areas for follow-up investigation will increase in concert with the significant annotation efforts ongoing by the GO consortium (20).
In conclusion, PRISM provides a new experimental and analytical framework for systematic, in-depth investigation of the proteomes of mammalian organisms. Although the approach described here is largely qualitative in nature, complementary techniques that allow for the determination of protein relative abundance (21, 49-52) can readily be incorporated. A combination of these and other related genomic scale methodologies should allow unprecedented insight into the complexities and dynamics of the mammalian proteome and their relationship to mammalian physiology, development, and disease.
FOOTNOTES
Published, MCP Papers in Press, February 10, 2003, DOI 10.1074/mcp.M200074-MCP200
1 The abbreviations used are: MS, mass spectrometry; LC, liquid chromatography; GO, Gene Ontology; MudPIT, multidimensional protein identification technology; PRISM, Proteomic Investigation Strategy for Mammals; HPLC, high pressure liquid chromatography; DTT, dithiothreitol; CBP, cAMP-response element-binding protein (CREB)-binding protein.
* This work was supported in part by a grant (to A. E.) from the National Science and Engineering Research Council of Canada.
S The on-line version of this article (available at http://www.mcponline.org) contains Supplemental Figs. F1 and F2 and Tables I and II.
These authors contributed equally to this work.
¶ A scholar of the Josef Schormüller Gedächtnisstiftung.
|| Present address: Dept. of Biochemistry and Molecular Biology, University of Calgary, 3330 Hospital Dr., Calgary, Alberta T2N 1N4, Canada.
To whom correspondence should be addressed: CH Best Inst., 112 College St., Rm. 402, Toronto, Ontario M5G 1L6, Canada. Tel.: 416-946-7281; Fax: 416-978-8528; Email: andrew.emili{at}utoronto.ca
REFERENCES