From the Banting and Best Diabetes Centre, University of Toronto, Toronto, Ontario, Canada; Department of Medicine, Toronto General Hospital, Toronto, Ontario, Canada; and the Program in Proteomics and Bioinformatics, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada
ABSTRACT
Although glucagon-producing α cells are the second most abundant islet cell type within the pancreatic islets of Langerhans, a majority of studies aimed at identifying islet gene products have used either a mixture of islet cells or purified ß cells or ß cell lines (1–5). The current paucity of data on gene and protein expression in islet α cells is due in part to the almost insurmountable technical challenges in isolating sufficient numbers of pure primary α cells for exhaustive molecular studies. Our group (6–8) and others (9, 10) have therefore studied several transformed rodent α cell-derived cell lines as model systems suitable for systematic experimental analyses aimed at determining the genetic, biochemical, and physiological mechanisms that underlie islet α cell function.
While several laboratories have reported investigations of whole islet or islet ß cell gene expression profiles (1, 2), only limited information is currently available concerning gene expression and proteomic profiles linked to the specification of the differentiated phenotypes associated with α cells. To this end, we have initiated extensive molecular analyses of gene expression patterns in mature α cells using an islet α cell line as a model system, with the goal of deciphering key biochemical, regulatory, and cell biological properties. The αTC-1 cell line, in particular, was chosen as the focus of this study because it is well differentiated, preserving normal endocrine characteristics such as correct hormone processing and secretion of glucagon in response to stimulation (11).
Several types of high-density DNA microarrays suited to global analysis of gene expression profiles are readily available to researchers today. These vary in terms of the technology used in their manufacture and use, as well as in price, ease of use and analysis, and overall sensitivity and reproducibility. The main types of high-throughput gene expression arrays are spotted cDNA arrays (12–14), short oligonucleotide arrays (15), long oligonucleotide arrays (Agilent: www.agilent.com) (16), and fiber optic arrays (Illumina: www.illumina.com) (17). An informative web resource outlining the various platforms, as well as sources of commercial arrays, is available online at ihome.cuhk.edu.hk/b400559/array.html.
A particularly popular commercial research platform is the Affymetrix GeneChip™ short oligonucleotide array, which consists of panels of 25-nucleotide-long probes synthesized in situ on the surface of a microscope slide. Each probe is built up consecutively on a predefined area (square element) of the array, typically called a cell or feature. The probe sequence is complementary to a unique target sequence present in a putative mRNA transcript, allowing specific hybridization (15). Affymetrix chip densities have grown quickly in recent years, with newer arrays commonly capable of probing tens of thousands of gene transcripts (often representing the entire genome complement of an organism) simultaneously (18). Between 1998 and 2004, for instance, the average number of probes packed onto a single Affymetrix array rose from roughly 2,000 to >50,000, owing to significant reductions in probe feature dimensions, which have decreased from >50 to roughly 11 µm on average (18).
To interrogate the expression level of a single gene, a typical "probe set" (generally consisting of 11 to 20 distinct 25-mer oligonucleotide probes) is designed to hybridize to discrete sequences in a particular target gene transcript. These so-called "perfect match" probes (PM)1 are aimed at detecting a specific mRNA presumed to be present in a biological sample and are usually optimized to prevent fortuitous similarity to off-target sequences from resulting in confounding cross-hybridization to other gene products (19). In addition to the PM probes, Affymetrix chips also contain adjacent "mismatch" probes (MM), which are virtually identical to the PM probes except for a single-nucleotide substitution at the 13th nucleotide of the sequence, introduced so as to perturb hybridization to the intended target. The PM and MM probe cells from the same probe set are usually evenly distributed over the entire chip area to minimize the effects of systematic variability (such as experimental artifacts due to localized hybridization defects) that often leads to inconsistencies in the inferences that can be drawn from a study.
The MM features were initially designed to measure nonspecific binding so as to control for spurious background signal, but their use for estimating and removing noise has been criticized in recent years (20, 21). Indeed, it has been shown that more accurate estimates of relative transcript abundance between different biological samples can be obtained from interrogation of the respective mRNA profiles using data derived from PM probes alone (22). Nevertheless, the MM probe data are used by the native Affymetrix software to determine whether a target transcript is in fact present in a given sample. It has been shown that for targets at low concentrations (<8 pM) the combination of PM-MM probe pairs can offer significantly higher sensitivity than PM probes alone (19).
Because of steady improvements in technology, chip design, industrial standardization, and analysis methods, DNA microarrays have become a widespread experimental medium in the endocrine research community. This popularity, combined with the increasing density of microarray readouts, has resulted in a corresponding exponential increase in data production (23), a trend that is putting increasing strain on effective data analysis and biological interpretation (24). This is compounded by the many sources of irrelevant, obscuring variance that also need to be carefully considered when dealing with microarray data, including fluctuations due to scanner perturbations, changes in hybridization conditions such as variable humidity, different array batches, or even different personnel performing the experiments. Because these artifacts can substantially affect the results (and conclusions) obtained from a study (25–27), they need to be systematically identified, corrected for, or removed to ensure the reliability of an experiment. In certain platforms (e.g. cDNA arrays), the use of internal reference mRNA populations and reciprocal chromo/fluorophore label swaps can minimize spurious measurements (24). Alternatively, with the Affymetrix array platform, multiple repeat analyses can be performed to increase confidence in distinguishing genuine signal from spurious deviances and baseline noise (28).
Ongoing challenges in evaluating the integrity of microarray data are spurring the development of novel computational methods for methodical data interpretation (29–31). Thanks to a concerted effort by the bioinformatics community, both academic and commercial algorithms for improving the reliability of chip signal quantification and global comparative analyses have been steadily improving. For instance, seemingly effective methods for data quantification and normalization have been introduced in recent years (20, 24, 32–34). Indeed, there is now a bewildering array of open-source programs and proprietary software tools available for automated data preprocessing and analysis (35) (see ihome.cuhk.edu.hk/b400559/array.html for a comprehensive list).
Many of the more commonly used tools are tailored specifically to the Affymetrix data format. Most of these employ now-standard statistical and machine learning methods, such as various types of clustering algorithms (e.g. SOM, k-means, hierarchical) and/or dimension reduction methods (e.g. PCA, among many others), to extract meaningful information from typically complex, feature-rich gene expression datasets (36), but the list of interesting new approaches to the data analysis challenge is steadily growing.
Despite this progress, preliminary studies of gene expression patterns in endocrine cell lines have revealed a number of unforeseen obstacles that confound effective interpretation of microarray data,2 several of which are the focus of this report. The most noteworthy of these concerns the problem of deciding on optimal transcript detection p value threshold cutoffs to faithfully determine whether an mRNA target is indeed present or absent in a sample of interest. It is now broadly accepted that all microarray results should be treated as preliminary, and therefore putative mRNA expression patterns of special interest must be validated using an independent screening method, such as Northern blot analysis, RT-PCR, or real-time RT-PCR. Proper detection calls are an especially important factor for selecting optimal data points for use as features in pattern recognition and machine learning analyses aimed at defining robust molecular signatures (such as biomarkers). Detection p values can also be useful for assessing the reliability of extracted gene expression values for cross-sample comparative quantification analyses.3
Considering that proteins represent the ultimate effectors of gene function, and given that previous proteomic studies have suggested a relatively weak concordance between mRNA and protein levels (37), it may, in fact, be preferable to measure protein expression in islet cells directly. Effective methods for large-scale protein identification and quantitation have recently been developed (38, 39). Of these, high-throughput protein shotgun sequencing procedures coupling high-resolution multidimensional liquid chromatographic separation of proteolytic peptide digests to ultra-sensitive MS/MS (LC-MS) appear to be among the more powerful new emerging methods for examining complex protein mixtures like cell extracts (38, 39). By eliminating the need to first separate proteins on polyacrylamide gels, these gel-free profiling methods circumvent most limitations associated with gel-based procedures.
Although rapidly increasing in popularity, current LC-MS-based shotgun profiling methods are not nearly as robust, simple to execute, or widely accessible as microarray technology. Moreover, gel-free proteomic screening methodologies still suffer from significant detection bias, leading to preferential detection of higher-abundance housekeeping proteins that are rarely of particular biological interest (38, 39), and they generally do not achieve the same extent of global coverage as that typically obtainable by microarray-based analysis (37). Despite these caveats, shotgun protein profiling methods can potentially be used both to guide the optimal interpretation of microarray data and to confirm the results of a gene expression study, and vice versa.
To evaluate the advantages and challenges of combining proteomic and functional genomic screening platforms for deducing functional adaptations associated with enteroendocrine cells, we have performed pilot parallel large-scale analyses of gene and protein expression in mitotically active asynchronous cultures of murine αTC-1 cells (11). We used an Affymetrix MG_U74Av2 GeneChip, containing 12,488 distinct murine gene probe sets, to measure global mRNA levels, and the multidimensional protein identification technology (MudPIT) procedure developed by Yates and colleagues (40, 41) to examine the global protein profile of nuclear-enriched cell extracts. Here, we summarize our key findings to date, outlining both the advantages and challenges of comparative cross-platform analyses as a means to obtain a more complete, and rigorous, insight into the biochemical makeup of enteroendocrine cells. We compare and contrast the relative merits and limitations associated with each of the two platforms and list factors that influence the sensitivity and specificity of analysis, with a particular emphasis on the effects of p value thresholds on the rate of false discovery and overall detection coverage. We also describe empirically derived criteria regarding optimal experimental design, together with simple guidelines that we believe allow for more reliable (and comprehensive) interpretation of complex global molecular profiling datasets. While centered on an experimental setting directly relevant to islet α cell biology, the analytical approaches reported here are broadly applicable to a range of analogous endocrine-related research problems and hence should be relevant to any researcher currently using (or contemplating) large-scale expression profiling at the mRNA and/or protein levels as a means of investigating the molecular makeup of endocrine cells and tissues of interest.
EXPERIMENTAL PROCEDURES
Microarray/Statistical Analysis
Total RNA was prepared using TRIzol extraction (Invitrogen) followed by the RNeasy kit (Qiagen, Valencia, CA) according to the manufacturers' instructions. Profiling was performed in triplicate for three distinct preparations of total RNA, using the Affymetrix GeneChip MG_U74Av2 microarray. Each experiment was performed separately on different days using chips derived from different fabrication batches. Hybridization and washing were performed using standard conditions. Image processing was performed using an Affymetrix GeneArray 2500 scanner, with 570-nm, 3-µm laser parameter settings. The data files were processed using Affymetrix Microarray Suite 5.0 (MAS5.0) with default parameter settings, and the average intensity of all probes on the array was scaled to a target intensity of 150. The detection p value alone, or together with signal intensity, was used to make a prediction (present, marginal, or absent) as to gene expression. A freeware version of the Robust Multichip Average (RMA) algorithm was obtained from Bioconductor (www.bioconductor.org). The freely available statistical software R (www.r-project.org) and the proprietary Matlab 7.0 Statistical and Bioinformatics Toolboxes from MathWorks™ (www.mathworks.com) were also used for the analysis.
Validation RT-PCR
First-strand cDNA was generated from total RNA using the SuperScript Pre-amplification System (Fermentas, Hanover, MD) following the manufacturer's recommended protocol. Target cDNAs were amplified by PCR using gene-specific primer pairs. PCR products were loaded onto a 1% agarose gel, electrophoresed in Tris-acetate-EDTA buffer, transferred onto nylon membranes, and hybridized using internal oligonucleotide probes labeled by T4 kinase reaction with [γ-32P]ATP (Amersham Biosciences, Piscataway, NJ). Visualization was performed by standard film-based autoradiography. Primer sequences and detailed conditions used for RT-PCR are available upon request.
Extracts
Nuclear-enriched soluble protein extracts were prepared using a commercial protocol (Nu-CLEAR protein extraction kit; Sigma-Aldrich). Briefly, three separate cell pellets were thawed, resuspended in 5 volumes of hypotonic lysis buffer (10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl), and incubated on ice for 15 min. After repelleting by centrifugation for 5 min at 420 × g, the nuclei were rinsed twice in 400 µl of lysis buffer and resuspended in high-salt extraction buffer (1 M HEPES, pH 7.9, 1 M MgCl2, 5 M NaCl, 0.5 M EDTA, pH 8.0, 25% (v/v) glycerol). Soluble proteins were salt-extracted from the isolated nuclei by incubation on ice for 15 min with occasional vigorous vortexing. After addition of Nonidet P-40 to a final concentration of 0.04% (v/v), the nuclei were disrupted with a glass homogenizer. Insoluble debris was removed by centrifugation at 13,000 rpm for 30 min, and the supernatants were retained for proteomic analysis (final protein concentration ~2 mg/ml).
Proteolytic Digestion and Sample Preparation
Equivalent amounts of total protein (100 µg) from each processed cell batch were precipitated with 5 volumes of ice-cold acetone, followed by centrifugation at 13,000 × g for 15 min. The pellets were resolubilized in 40 µl of 8 M urea, 50 mM Tris-HCl, pH 8.5, 1 mM DTT, for 1 h at 37 °C, followed by dilution to 4 M urea with an equal volume of 100 mM ammonium bicarbonate, pH 8.5 (AmBic; Sigma-Aldrich). The denatured proteins were digested with endoproteinase Lys-C (Roche, Basel, Switzerland) at a 1:150 enzyme-to-protein ratio at 37 °C overnight. The next day, the samples were further diluted to 2 M urea with an equal volume of AmBic supplemented with 1 mM CaCl2. Digestion was continued by adding Poroszyme trypsin beads (Applied Biosystems) followed by incubation at 30 °C with rotation. The resulting peptide mixtures were solid phase-extracted with a SPEC-Plus PT C18 cartridge (Ansys Diagnostics, Lake Forest, CA). The eluates were concentrated by Speedvac to near dryness and stored at −80 °C until analysis.
MudPIT Analysis
A fully automated 12-cycle, 20-h-long MudPIT procedure was set up as described previously (42). Briefly, an HPLC quaternary pump was interfaced with an LCQ DECA XP quadrupole ion trap mass spectrometer (Thermo Finnigan, Woburn, MA). A P-2000 laser puller (Sutter Instruments, Novato, CA) was used to pull a fine tip on one end of a 100-µm inner diameter fused silica capillary microcolumn (Polymicro Technologies, Phoenix, AZ). The column was packed first with 6 cm of 5-µm Zorbax Eclipse XDB-C18 resin (Agilent Technologies, Palo Alto, CA), followed by 6 cm of 5-µm Partisphere strong cation exchange resin (Whatman, Middlesex, United Kingdom). The extracts were loaded separately onto the column using a pressure vessel and analyzed sequentially using a fully automated 12-step, four-solvent, 24-h chromatographic cycle. A detailed description of the exact chromatographic conditions is provided in the supplementary information. Data-dependent MS/MS acquisition was performed in real time, with the ion trap instrument operated using dynamic-exclusion lists.
Protein Identification and Statistical Validation
The SEQUEST database search algorithm (43) was used to match the acquired tandem mass spectra to a minimally redundant database of mouse and human Swiss-Prot/TrEMBL protein sequences downloaded from the European Bioinformatics Institute (www.ebi.ac.uk). To further evaluate, and thereby minimize, the number of incorrect (false-positive) identifications, the spectra were searched against protein sequences in both the normal (forward) and fully inverted (reverse) amino acid orientations. The STATQUEST algorithm was used to compute a confidence score (indicating the percentage likelihood or probability of being correct) for every candidate match. The Contrast and DTASelect software tools were used to manage, organize, and filter the resulting dataset (44) (fields.scripps.edu). Spectral counts were summed across the replicates and used as a semiquantitative measure of protein abundance, as described by Liu et al. (45).
Dataset Cross-referencing
Sequences identified by the proteomic analysis were cross-mapped to the complete set of Affymetrix probe sets via annotation. Accession numbers corresponding to the MG_U74Av2 GeneChip were extracted from the annotated gene table "MG_U74Av2_annot.csv" (Oct. 12, 2004) from the Affymetrix website (www.affymetrix.com/support/technical/). The probe IDs were cross-referenced to UniGene IDs, which were then mapped to the corresponding Swiss-Prot/TrEMBL accessions obtained for the identified proteins. Proteins matching multiple genes on the Affymetrix array were removed, as were matches in which multiple MudPIT-derived proteins mapped to one or more probe sets.
Functional Classification
Gene Ontology (GO) categorical annotations (www.ebi.ac.uk/GOA) were used as indicators of biological function and other properties, where annotations were available (~70% of the proteins examined in this study). Statistical testing for enrichment of functional categories among the set of identified proteins and the mRNA probe sets was based on a hypergeometric distribution model using the method of Hughes and co-workers (46). This method returns the probability (p value) that the intersection of a given protein list with any given annotation class occurs by chance. A Bonferroni scaling factor was applied to correct for spurious significance due to repeated testing (multiple hypotheses) over the many GO terms; each preliminary p value was multiplied by the number of tests conducted. A threshold cutoff p value of 10⁻³ was used as a final selection criterion to highlight statistically significant, potentially biologically interesting clusters.
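As an illustration of the enrichment test just described, the following minimal sketch computes hypergeometric p values with Bonferroni scaling over a set of GO terms; the function names, the annotation dictionary, and the gene lists are hypothetical placeholders rather than components of the actual analysis pipeline.

```python
# Sketch of a GO-term enrichment test (hypergeometric model with Bonferroni
# correction). The annotation dictionary and gene lists are hypothetical
# placeholders, not data from this study.
from scipy.stats import hypergeom

def go_enrichment(hit_genes, background_genes, go_annotations, alpha=1e-3):
    """Return GO terms enriched among hit_genes relative to background_genes.

    go_annotations: dict mapping GO term -> set of annotated gene identifiers.
    """
    background = set(background_genes)
    hits = set(hit_genes) & background
    n_tests = len(go_annotations)            # number of GO terms examined
    enriched = []
    for term, annotated in go_annotations.items():
        annotated = annotated & background
        k = len(hits & annotated)             # hits annotated with this term
        if k == 0:
            continue
        N = len(background)                   # population size
        K = len(annotated)                    # annotated genes in population
        n = len(hits)                         # size of the query list
        p = hypergeom.sf(k - 1, N, K, n)      # P(X >= k) under the null
        p_adj = min(1.0, p * n_tests)         # Bonferroni scaling
        if p_adj <= alpha:
            enriched.append((term, k, K, p_adj))
    return sorted(enriched, key=lambda x: x[-1])

# Hypothetical usage:
# enriched = go_enrichment(ms_identified_genes, all_probe_genes, go_terms)
```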
RESULTS
For the gene expression analysis, total RNA samples were extracted and separately hybridized to Affymetrix Murine Genome U74Av2 GeneChip microarrays (MG_U74Av2). The chip represents ~6,000 functionally characterized oligonucleotide sequences obtained from the Mouse UniGene database (build 74); the remaining ~6,000 probe sets represent ESTs. The chip has 16 probe pairs per probe set, with a 20-µm feature size and a putative sensitivity rated at 1:100,000. Extensive gene annotation files are available (www.affymetrix.com) (47), providing detailed information for each probe set, such as sequence source, transcript ID, target description, UniGene ID, alignments, gene title and symbol, chromosomal location, identifiers for the Ensembl, LocusLink, MGI, and Swiss-Prot/TrEMBL databanks, and GO functional annotations, to name a few.
Data Processing
The result of the microarray profiling experiments was a series of image files, corresponding to the intensities recorded for each feature, which needed to be processed and analyzed. Probe set quantification and data normalization were performed based on Absolute Analysis, using the Statistical Algorithm in the Affymetrix-supplied analytical software (Microarray Suite 5.0, MAS5.0). Signal was scaled to a target intensity of 150 across all probes (Fig. 1). At this step, the information extracted represented the absolute signal intensity for each probe set, together with a calculated detection p value and an overall detection call ("present," "marginal," or "absent"), which is in turn based on a predefined default p value threshold. For this study, the Affymetrix default p value threshold of 0.04 was used (that is, a gene was considered present if its calculated detection p value was equal to or below 0.04 in two of three experiments). The suitability of this detection p value threshold is examined below.
Data Analysis and Detection of Gene Expression
Despite the effectiveness of normalization, a key, and frequently underestimated, step in Affymetrix microarray signal processing is the determination of a probe detection call, wherein the transcript signal of a probe set (and, by inference, the corresponding gene product) is deemed to be present or absent. There are two pieces of information available per probe set on an Affymetrix chip that can be used to determine whether a particular transcript is indeed present (or absent) in a sample: the signal intensity and the detection p value. The aim is to derive a rigorous, statistically based, and reliable measure of genuine signal over background (baseline), with a sufficiently high signal-to-noise ratio to unambiguously confirm the existence of a specific target mRNA species in the sample of interest.
The simplest method for deciding whether a transcript is expressed is to look at the "detection call" made by the Affymetrix algorithm, which is based on the calculated detection p value. The detection p value is a statistically derived confidence measure indicating whether the signal generated from a probe set is sufficiently above background to be deemed present. Using the default software settings, any transcript detected with a detection p value at or below a threshold cutoff defined as α1 will be called "present," those with p values between α1 and a second threshold α2 will be called "marginal," and those with p values greater than α2 are deemed "absent." Default values for α1 and α2 can differ between chip types and are defined by Affymetrix based on extensive in-house analysis of test arrays (51).
The detection p value is determined in two steps. First, a discrimination score, R, is calculated based on the signal intensities of the group of PM probes relative to the corresponding set of MM probes in a probe set. Then, the scores from each of the probe pairs are tested against a user-defined threshold τ (tau), which specifies the minimum R that "votes" for presence, to calculate a detection p value for the entire probe set. These steps are described in more detail below.
From a set of intensity values for each group of PM and MM probes defining a probe set, R is calculated for each PM and MM pair according to the following formula:
R = (PM − MM) / (PM + MM)
In other words, R depends strictly on the relative magnitudes of PM and MM and varies between −1 and 1. The larger the PM/MM ratio, the closer R will be to 1, as occurs when the signal-to-noise ratio is high. Conversely, if the ratio is close to 1 (signal marginally above background), R will be close to 0.
The R scores are rank-sorted and evaluated using the one-sided Wilcoxon's signed rank test (see Refs. 48 and 49 for details). For every probe set, the algorithm ranks the probe pairs based on how far their discrimination score lies from τ and uses this information to calculate the detection p value. The Wilcoxon's signed rank test is a commonly used, simple nonparametric statistical method that has several desirable properties and certain limitations. Two notable advantages are that it is insensitive to spurious outliers and does not assume a normal distribution of the data (49). It is not necessarily the optimal test for class discrimination, however, and more advanced statistical measures (including methods better suited to the principles of machine learning) are likely to be far more effective for determining true probabilities, resulting in fewer incorrect or marginal classification calls.3
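The following minimal sketch illustrates the two-step detection calculation outlined above, assuming commonly cited MAS5-style defaults (τ = 0.015, α1 = 0.04, α2 = 0.06) and toy probe intensities; it is an approximation for illustration, not a reimplementation of the Affymetrix software.

```python
# Sketch of a MAS5-style detection call: compute the discrimination score
# R = (PM - MM) / (PM + MM) for each probe pair, then test the R scores
# against tau with a one-sided Wilcoxon signed rank test. The intensities
# below are toy values; tau, alpha1, and alpha2 are assumed defaults.
import numpy as np
from scipy.stats import wilcoxon

def detection_p_value(pm, mm, tau=0.015):
    pm, mm = np.asarray(pm, float), np.asarray(mm, float)
    r = (pm - mm) / (pm + mm)                  # discrimination scores in [-1, 1]
    # One-sided signed rank test of H0: median(R) <= tau vs H1: median(R) > tau
    _, p = wilcoxon(r - tau, alternative="greater")
    return p

def detection_call(p, alpha1=0.04, alpha2=0.06):
    if p <= alpha1:
        return "present"
    if p <= alpha2:
        return "marginal"
    return "absent"

pm = [1200, 980, 1500, 760, 1100, 890, 1320, 1010]   # toy PM intensities
mm = [300, 310, 420, 500, 280, 350, 400, 390]        # toy MM intensities
p = detection_p_value(pm, mm)
print(p, detection_call(p))                          # low p value -> "present"
```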
The calculated R scores are compared with τ: if the R score is higher than τ, a "present" vote is made for a given probe pair, while probe pairs with R scores falling below this value contribute an "absent" vote. The votes across all probe pairs comprising a complete probe set are tabulated and summarized in the detection p value, which is a confidence measure reflecting the detection call. The higher the R scores are above τ, the closer the detection p value is to 0, and hence the greater the confidence that a transcript is indeed expressed. Conversely, the lower the R scores are below τ, the closer the p value is to 1, and hence the lower the confidence placed on the detection call and, by inference, on the possible existence of the corresponding mRNA in the sample.
Increasing τ increases the stringency of the test by decreasing the rate of incorrect predictions or false-positives. Although this effectively increases the specificity and reliability of the assay, in doing so the overall sensitivity of an experiment (namely, the detection of bona fide mRNA transcripts) is reduced. For example, increasing τ from 0.015 to 0.15 decreases the proportion of present calls on a Rat Genome U34A array by 25% (4). Conversely, lowering τ decreases the specificity of an analysis, thereby increasing the sensitivity. There is a default value of τ for each particular type of chip, which is based on extensive controlled experimentation. In general, Affymetrix recommends that this default not be changed (52).
Redefining the τ threshold has a somewhat different effect on the detection call than modifying the detection p value threshold. The former influences the votes of individual probe pairs for presence or absence, which in turn may drastically affect the final detection p value of a probe set; in other words, τ controls the number of probe pairs that vote for presence. The detection p value, on the other hand, considers the probe set as a whole. Ultimately, however, changing either threshold leads to a similar result: both modify the percentage of present calls made per dataset. Modifying the detection call by changing the detection p value threshold is much easier to do in practice, however, because it does not require reanalysis of the data output by the Affymetrix analysis software, which can be very time consuming for large datasets. For our purposes, we decided not to modify τ (or the statistical measure used to calculate R), but rather to work with detection p value thresholds in order to optimize the detection call for our αTC-1 dataset.
We also considered the differences in detection p values obtained in the three replicate arrays and found them to be highly reproducible between replicates, especially for probes with very low detection p values (<0.1). In other words, we found that most probes with a detection p value below the default detection call threshold (0.04 in the case of the MG_U74Av2 array) in a single experiment were likewise detected across the other replicates. Indeed, 78% of probes deemed present in any one of the three replicates were likewise present in all three, 12% were present in exactly two of three replicates, and the remaining 10% were detected in just one dataset. However, because roughly 50% of all probes on the ~12,000-probe array were called present using the default p value cutoff (Supplementary Table I), the expression status of the 600 or so transcripts detected in only a single replicate remains unclear. One may need to consider additional information, such as absolute signal intensity, when deciding detection in these cases.
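A minimal sketch of the replicate-consensus rule used here (present at or below the cutoff in at least two of three replicates); the p value matrix is a hypothetical stand-in for the three MAS5 output files.

```python
# Sketch of the replicate-consensus rule: a probe set is accepted as expressed
# if its detection p value is at or below the cutoff in at least two of the
# three replicate arrays. The p values below are hypothetical placeholders.
import numpy as np

def consensus_present(pvals, cutoff=0.04, min_replicates=2):
    """pvals: array of shape (n_probesets, n_replicates) of detection p values."""
    pvals = np.asarray(pvals)
    present = pvals <= cutoff                 # per-replicate present calls
    n_present = present.sum(axis=1)           # replicates calling present
    return n_present >= min_replicates

# Toy example: three probe sets, three replicates each
pvals = np.array([[0.01, 0.02, 0.03],   # present in 3/3
                  [0.03, 0.20, 0.01],   # present in 2/3
                  [0.50, 0.80, 0.04]])  # present in 1/3
print(consensus_present(pvals))         # -> [ True  True False]
```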
A detection p value close to 0.5 indicates that there is no significant difference in the relative abundance of the PM probes with respect to the MM probes. There are several possible causes for this. One is that the transcript is indeed not detected, so the signal will be very low (close to background) for both the PM and MM probes; in this case, the absent call is correct. Another possibility is nonspecific binding to the MM probes, and possibly also the PM probes. In this case, the signal recorded for both may be very high, but the difference in signal between PM and MM may not be significant, resulting in a high detection p value and an absent detection call. This is problematic, because it can be difficult to decide whether the transcript is actually absent and the signal reflects genuinely nonspecific binding, or whether the mRNA is so abundant that it binds appreciably even to the MM probes. Indeed, we have observed that highly abundant mRNAs often have detection p values as high as 0.8 (and hence are deemed "absent" by the Affymetrix analysis software), only for these transcripts to be confirmed as highly expressed gene products by RT-PCR (as discussed below).
Signal Intensity
Signal is a quantitative metric summarizing the intensity of a probe set and representing the relative abundance of the corresponding transcript. The signal calculation takes into consideration the absolute and relative intensities of the PM and MM probes in each probe pair and uses them to calculate an "idealized mismatch" (IM) signal, which is then used instead of MM. If MM < PM, then IM = MM; otherwise IM is calculated from PM and depends on various other factors, such as the background intensity around the probe cell. The Affymetrix-supplied software uses a One-Step Tukey's Biweight Estimate (a standard statistically weighted average method that is relatively unaffected by outlier probe values) to integrate the individual probe signals into a robust mean of signal across the entire probe set. The algorithm starts by finding the median, and the distance of each data point from this median is then calculated. Based on these distances, the algorithm assigns a weight such that data points far away from the median have low or even zero influence on the final value. The median is then recalculated once per probe set by incorporating these weights (see Ref. 49 for details).
In this manner, each probe pair contributes to the final estimate of signal intensity. The significance of this contribution is greater if PM signal is higher than MM and if the probe pair signal value is close to the mean of all probe signals. If the PM signal in a probe pair is higher than MM, the MM signal is considered to be informative and is often used as an estimate of stray nonspecific (background) signal. If MM signal for a given probe pair is higher than PM (which may occur in cases of significant cross-hybridization or when transcript levels are below detection limits), the MM signal is considered to be uninformative and is generally not used as a measure of noise. In this alternate scenario, an imputed value, IM, is used instead of MM signal. This IM value is usually deemed to be slightly smaller than PM so as to prevent negative signal values being generated for a given probe pair. If there are many such probe pairs present in a probe set, the detection algorithm will usually call the probe absent, even if its absolute signal intensity is high (53).
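The following sketch shows a generic one-step Tukey biweight average of the kind described above; the tuning constant and epsilon follow commonly quoted MAS5 settings but are treated here as assumptions, and the input values are toy numbers.

```python
# Sketch of a one-step Tukey biweight average, a robust mean of the kind used
# to combine per-probe-pair signals into a probe set summary. The tuning
# constant c and the small epsilon are assumed values; the inputs are toy
# log2-scale probe quantities.
import numpy as np

def tukey_biweight(x, c=5.0, eps=1e-4):
    x = np.asarray(x, float)
    m = np.median(x)                          # initial location estimate
    s = np.median(np.abs(x - m))              # median absolute deviation (MAD)
    u = (x - m) / (c * s + eps)               # scaled distances from the median
    w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)   # biweight weights
    return np.sum(w * x) / np.sum(w)          # weighted (robust) mean

values = np.log2([950, 1020, 880, 5400, 990, 1010])     # one outlier probe pair
print(tukey_biweight(values))                 # outlier receives ~zero weight
```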
In order to increase the sensitivity of the transcript detection procedure, we now routinely consider probe signal intensity as well as the detection p value for probes with detection p values greater than the default cutoff.2 But first, we performed independent measurements of transcript levels using the RT-PCR method (and, as discussed further below, proteomics) to guide our analyses.
RT-PCR Validation
RT-PCR validation experiments are considered to be the "gold standard" for transcript detection, both in terms of sensitivity and especially specificity. For this reason, we designed primers to interrogate by RT-PCR a selected set of 67 disparate transcription factors that were deemed either absent or present in all three replicate experiments. Based on our results, we found the default detection p value threshold to be overly conservative: many gene products predicted to be absent (some with p values as high as 0.8) were in fact readily detectable by RT-PCR (Fig. 2). Although the specificity of the Affymetrix detection call (i.e. p value cutoff 0.04) proved to be good, with few false-positives, the sensitivity was unsatisfactory (excessive false-negatives). This is especially true for low-abundance transcripts (Fig. 2). These data question the suitability of the default p value thresholds (and the algorithms currently used to calculate them) and prompted us to reconsider the issue, for example by increasing the default p value cutoff for a present detection call or by evaluating signal intensity together with the detection p value when deciding on the presence of a transcript.
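A minimal sketch of how such a validation can be summarized, treating the RT-PCR results as ground truth and computing the sensitivity and specificity of the array detection calls; the call vectors are hypothetical placeholders for the 67 transcription factors tested.

```python
# Sketch of a sensitivity/specificity summary, treating RT-PCR results as
# ground truth for a panel of transcripts. The call and truth vectors below
# are hypothetical placeholders, not data from this study.
import numpy as np

def sensitivity_specificity(array_present, rtpcr_present):
    array_present = np.asarray(array_present, bool)
    rtpcr_present = np.asarray(rtpcr_present, bool)
    tp = np.sum(array_present & rtpcr_present)
    fn = np.sum(~array_present & rtpcr_present)   # missed by the array
    tn = np.sum(~array_present & ~rtpcr_present)
    fp = np.sum(array_present & ~rtpcr_present)
    return tp / (tp + fn), tn / (tn + fp)         # (sensitivity, specificity)

# Toy data: the array misses several transcripts confirmed by RT-PCR
array_calls = [True, True, False, False, True, False, False, True]
rtpcr_calls = [True, True, True,  True,  True, False, True,  True]
print(sensitivity_specificity(array_calls, rtpcr_calls))
```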
The MudPIT technique employed in this study (40–42, 55) couples high-resolution fractionation of protein enzymatic (i.e. tryptic) digests using multidimensional capillary-scale HPLC to real-time, high-efficiency, data-dependent MS/MS via ESI (reviewed in Ref. 39). The principal advantage of this approach stems from the very good sample separation achieved by jointly performing ion exchange and reverse-phase chromatography at the peptide level, combined with automated procedures for peptide selection and fragmentation. This technique yields very large collections of richly informative spectra, allowing for extensive sequence determination (55, 56).
To improve the odds of detecting lower-abundance proteins of special biological interest, such as transcription factors, we first performed a simple subcellular fractionation procedure to enrich for lower-abundance nuclear factors prior to analysis (42, 57). This is a particularly important consideration given the sizeable overall dynamic range in protein abundances (commonly predicted to be over five orders of magnitude), which contrasts with the much more limited effective detection range of current instrumentation (i.e. two to three orders of magnitude). Although the resulting protein fraction was not pure and contained sizeable levels of cytoplasmic cross-contaminants (42, 57), this easily implemented procedure substantively reduced the levels of higher-abundance cytosolic proteins (e.g. housekeeping enzymes), which usually tend to dominate a proteomic analysis. In practice, this step considerably improves the overall comprehensiveness of proteomic detection (Refs. 42 and 57; data not shown).
For the shotgun analysis, the proteins were concentrated by precipitation, denatured using urea, and digested extensively and sequentially using two proteases; first with endoproteinase Lys-C, which functions under denaturing conditions, then with trypsin (after dilution of the urea), which is a more processive enzyme. The resulting peptide mixture was de-salted by solid-phase extraction and analyzed using the basic MudPIT procedure (see "Experimental Procedures"). Although this approach proved to be very effective, resulting in the identification of many proteins (see below), we have found that multiple repeat MudPIT analyses are generally needed to achieve blanket coverage (that is, a saturating level of detection), even when investigating simplified organellar fractions. Hence, for this study, we performed three analyses on independently isolated cell extracts.
Database Searching and Statistical Validation
To identify the proteins, the entire collection of ~100,000 acquired tandem mass spectra was searched against a minimally redundant database of curated human and mouse Swiss-Prot/TrEMBL protein sequences using the SEQUEST search algorithm (43). Because the candidate spectral matches and search scores are open to interpretation and are often inconclusive (56), it was critical to estimate both the accuracy of an individual putative identification and the overall rate of false discovery (proportion of incorrect identifications or false-positives) when performing this analysis. Several heuristic guidelines have been developed to address these important (and related) concerns (56). More rigorous statistical measures have also been introduced recently to facilitate data interpretation (reviewed in Ref. 58).
For this study, each candidate database match was first evaluated using a statistical algorithm, STATQUEST (57), in order to calculate a probability score (the likelihood that a prediction is accurate) based on the preliminary SEQUEST results. An initial subset of higher-confidence proteins (with a minimum of 85% or greater predicted probability of being correctly identified in at least one of the three replicates) was selected for further consideration. It is important to note that this initial confidence filter threshold applies to individual sequence matches. In practice, due to the nonlinear distribution of database confidence scores (57), the majority of the filtered predictions actually have a very high likelihood of being correct. Indeed, as seen in Fig. 4, most of these putative identifications were predicted with a >99% likelihood of being correctly identified (p values <0.01, i.e. far more stringent than the initial STATQUEST p value threshold of 0.15).
As an independent measure to assess the effectiveness of this filtering, we empirically estimated the proportion of incorrect identifications in our set of putative high-confidence proteins. This is especially important with repeat experimental datasets, which can accumulate false-positives due to the sheer number of spectra examined. This was done by first populating the reference protein sequence database with an equal number of mock proteins, created by inverting the amino acid orientation of the normal Swiss-Prot/TrEMBL protein sequences. Because these "reverse" sequences are not expected to occur naturally, any matches to these decoy proteins represent spurious false-positives. The final proportion of identifications mapping to reverse relative to normal (or "forward") proteins therefore provided an objective criterion for estimating the false-discovery rate. As mentioned, spurious database matches are often supported by only minimal spectral evidence. The plot provided in Supplementary Fig. 3 shows the decreasing ratio of reverse-to-forward sequence database matches typically observed with increasing numbers of observed spectral counts. Because the vast majority of incorrect matches were detected with single spectra alone, they can be readily discarded from further analysis by requiring a minimum of two spectra per protein as a final acceptance criterion. Indeed, after filtering the data using this and the initial criteria outlined above (e.g. STATQUEST), fewer than 5% false-positives (reverse proteins) were detected in the final dataset (data not shown).
Regardless of which method of data validation is employed, the filtering process should aim to balance the trade-off between specificity (precision), which reflects the proportion of incorrect identifications or false-positives, and sensitivity (recall), which indicates the proportion of missed identifications or false-negatives. Receiver-operating characteristic (ROC) plots are often used to assess the effects of varying filter stringency on precision and recall (59). Although we do not know the correct class labels (because, in most proteomic studies, one usually does not know a priori which proteins are in fact present in a sample), in practice one can estimate this trade-off empirically based on the fraction of database matches to forward and reverse sequences after various filters are applied. The ROC-like plot shown in Fig. 5 shows the effects of applying different confidence filters to the αTC-1 dataset, based either on the preliminary STATQUEST probability scores, on the reproducibility of detection (detected in one or more repeat analyses), or on the cumulative spectral counts. The relationship between the number of credible candidates (namely, matches to normal forward protein sequences) and false-positives (represented by matches to reverse sequences) passing each filter is complex. Of course, as one applies a more stringent filter, the overall false-positive rate decreases (as evidenced by the reduced proportion of reverse proteins), whereas the false-negative rate increases (as represented by the decrease in the number of forward proteins detected). But this trade-off is clearly nonlinear. For the most stringent filters, such as considering only proteins identified with high confidence in all three repeat samples or only proteins detected with a minimum of 10 or more spectra, the false-positive rate can be reduced to virtually zero, but at the expense of severely limited sensitivity (i.e. a very high false-negative rate).
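The following sketch illustrates, on assumed toy data, how the forward/reverse (target/decoy) counts and the resulting false-discovery estimate can be tabulated across increasingly stringent minimum spectral-count filters, mirroring the trade-off plotted in Fig. 5.

```python
# Sketch of a decoy-based filter sweep: for increasing minimum spectral-count
# thresholds, count surviving forward (target) and reverse (decoy) protein
# identifications and estimate the false-discovery rate from the decoy
# fraction. The identification tuples below are hypothetical placeholders.
def filter_sweep(identifications, thresholds=(1, 2, 3, 5, 10)):
    results = []
    for t in thresholds:
        kept = [p for p in identifications if p[2] >= t]
        forward = sum(1 for p in kept if not p[1])
        reverse = sum(1 for p in kept if p[1])
        fdr = reverse / forward if forward else float("nan")   # decoy estimate
        results.append((t, forward, reverse, fdr))
    return results

# Toy identifications: (accession, matched_to_reverse_sequence, spectral_count)
ids = [("P1", False, 14), ("P2", False, 2), ("P3", True, 1),
       ("P4", False, 1), ("P5", False, 6), ("P6", True, 1), ("P7", False, 3)]
for t, fwd, rev, fdr in filter_sweep(ids):
    print(f"min spectra {t}: {fwd} forward, {rev} reverse, est. FDR {fdr:.2f}")
```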
Reproducibility and Variance
Despite the effectiveness of statistical filtering, MudPIT-based profiling is still affected by many other sources of error that should be considered. One of the most troubling is the problem of under-sampling. Shotgun MS/MS fragmentation involves a somewhat stochastic peptide selection step, which is generally biased toward higher-intensity peptides even when data-driven exclusion lists are used (45, 56). Given that the duty cycle (scan rate) of current instrumentation is limited, only a fraction of all eluting peptides will be selected for fragmentation, while many others will be missed. This trend results in preferential detection of higher-abundance proteins (because these often give rise to higher-intensity peptide ions), while lower-abundance proteins frequently go undetected (45). This limitation is compounded by significant variability in the peptide spectral quality run-to-run (60).5 Due to this, MudPIT experiments must generally be repeated several times to achieve adequate overall detection coverage (i.e. saturation) (61). The number of runs needed to reach a useful saturation point will depend on the complexity of the sample under study. In our experience, three repeat analyses are usually sufficient to detect >80% of all detectable proteins present in a sample.5
Saturation of detection is a particularly important consideration if one wishes to compare proteomic profiles between different samples (e.g. different cell lines). Relative protein abundance between samples can be estimated based on the ratio of median cumulative spectral count detected for each protein in the respective samples (45). This method is far simpler to implement and interpret than ones relying on sophisticated chemistries or based on internal standards labeled with stable isotopes (62), but requires a sufficient number of repeat analyses to be robust to outlier effects (45).5 Spurious variance (experimental noise) can be assessed by calculating the standard deviation and overall reproducibility of protein detection run to run (data not shown).
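A minimal sketch of spectral-count-based relative quantification as discussed above, here simply summing counts across replicate runs and taking a pseudocount-protected ratio; the count values and the pseudocount are assumptions for illustration.

```python
# Sketch of spectral-count-based relative quantification: sum the counts for
# a protein across replicate runs in each sample and take the ratio, with a
# small pseudocount to avoid division by zero. The counts are toy values.
import numpy as np

def spectral_count_ratio(counts_a, counts_b, pseudocount=0.5):
    """counts_a, counts_b: per-replicate spectral counts for one protein."""
    total_a = np.sum(counts_a) + pseudocount
    total_b = np.sum(counts_b) + pseudocount
    return total_a / total_b

# Toy counts for one protein in samples A and B (three MudPIT runs each)
sample_a = [12, 9, 15]
sample_b = [3, 5, 2]
print(spectral_count_ratio(sample_a, sample_b))   # ~3.5-fold higher in A
```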
Additional subcellular fractionation and proteomic enrichment procedures aimed at sample simplification can also help surmount the under-sampling problem (42). Alternatively, further advances in instrumentation resulting in higher scan rates, more-efficient modes of peptide sampling, or better spectral analysis, may effectively bypass this concern.
Data Cross-referencing
In addition to the RT-PCR validation, we sought to combine the results from the MudPIT profiling experiments with the microarray datasets to validate our hypothesis that if a probe has high enough signal (raw intensity), then it is likely to be present even if the detection p value is higher than the default cutoff. This first required cross-referencing the MudPIT and microarray datasets using their respective gene product identifiers (IDs or accession numbers).
When working with commercial microarrays, as we have done here, it is often straightforward to obtain the relevant cross-reference mappings between probe IDs and Swiss-Prot/TrEMBL protein accessions. Affymetrix, for example, provides current mappings between its probe IDs and various knowledge databases for all its chip designs, which can be readily downloaded from its website (www.affymetrix.com). As for the proteomic dataset, we needed to obtain the latest Swiss-Prot accession numbers in order to most efficiently map these to the corresponding gene probes. The ExPASy website (us.expasy.org/sprot/) provides a convenient tool, the Swiss-Prot ID tracker (us.expasy.org/cgi-bin/idtracker), to retrieve up-to-date Swiss-Prot IDs, including primary accession numbers for any deleted gene product IDs. Also within ExPASy is a Swiss-Prot/TrEMBL entry retrieval list tool (us.expasy.org/sprot/sprot-retrieve-list.html), which allows quick retrieval of the primary accession number (the latest unique alphanumeric protein identifier) and secondary accession numbers (older accession numbers retained after sequence mergers) for a given input list of proteins.
To ensure completeness for this study, we obtained all the primary and secondary accession numbers for each of the protein IDs before mapping these to the microarray probes. We were able to map 72% of the reliably identified proteins to probe IDs on the microarray (Supplemental Table III). Presumably, we were not able to map the remaining proteins either because no corresponding gene probes were present on the microarray or because a human ortholog was preferentially identified (a possibility, because both human and mouse protein sequences were used in the database search).
One challenge in working with disparate proteomic and functional genomic datasets is the constant updating of the relevant annotation (knowledge) databases, such as Swiss-Prot/TrEMBL in the case of proteins and UniGene in the case of mRNA, from which one derives the respective IDs. To be comprehensive, these tables need to be up to date, but without creating legacy issues. Hence, an alternative, and perhaps more rigorous, approach to linking the proteins and microarray probes is to identify matches using a sequence alignment tool like BLAST (www.ncbi.nlm.nih.gov/BLAST/). One can utilize a bulk sequence retrieval tool, like the ExPASy Swiss-Prot/TrEMBL entry retrieval list tool, to obtain both protein and gene sequence information using accession IDs in the correct format for alignment.
When interpreting a BLAST result, one needs to consider more than just the raw score or expectation (e) value as match criteria. This stems from the fact that BLAST is a local alignment algorithm, and even highly significant e-values approaching 0 do not always indicate a perfect alignment. Therefore, additional criteria, such as percent identity, should be used to define an acceptable threshold cutoff. In our study, we opted for an e-value <10⁻²⁰ and >95% identity as thresholds. We do note, though, that a reciprocal validation search, in which BLAST is repeated after swapping the queries and subjects, can help to identify spurious matches between paralogous gene products. But again, one needs to be cautious in selecting appropriate cutoff scores in order to maintain a balance between sensitivity and specificity. Finally, it is worth considering that the reference source of the gene and protein sequence information, and whether the sequences were retrieved from the same database, can markedly affect the stringency of a test.
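The following sketch applies the e-value and percent-identity cutoffs described above to standard BLAST tabular output (-outfmt 6); the input file name is a hypothetical placeholder, and keeping only the best hit per query is an added simplification.

```python
# Sketch of a BLAST-based cross-mapping filter: keep only protein-to-probe
# target alignments with e-value < 1e-20 and > 95% identity. Assumes standard
# BLAST tabular output (-outfmt 6), in which percent identity is the third
# column and the e-value is the eleventh; the file name is a placeholder.
def filter_blast_hits(path, max_evalue=1e-20, min_identity=95.0):
    accepted = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            query, subject = fields[0], fields[1]
            identity = float(fields[2])
            evalue = float(fields[10])
            if evalue < max_evalue and identity > min_identity:
                # keep the best (lowest e-value) subject per query
                if query not in accepted or evalue < accepted[query][1]:
                    accepted[query] = (subject, evalue)
    return {q: s for q, (s, _) in accepted.items()}

# Hypothetical usage:
# mapping = filter_blast_hits("proteins_vs_probe_targets.blast.tsv")
```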
Comparison of the mRNA and Protein Profiles
After cross-referencing, we could evaluate the overlap and relative sensitivity, specificity, and biases of the proteomic and genomic approaches. Using the same detection call procedure as for the RT-PCR analysis, which is based on a predefined default threshold detection p value of 0.04, we analyzed the protein distribution relative to the microarray-derived detection p values and raw probe signal intensities.
A total of 5,945 gene products were detected by the gene chips alone (in at least two of the three repeat experiments, using a p value cutoff of 0.04), reflecting the high coverage attainable by microarray. A subset of the cognate proteins encoded by these transcripts may have been missed by MudPIT because we selectively analyzed a nuclear fraction, or because the corresponding sequences were not represented in the reference database used for the spectral search. Of the 888 gene products that could be unambiguously cross-mapped between the platforms, 762 were deemed to be expressed by microarray in addition to being identified by our proteomic method. Conversely, 126 gene products were detected (i.e. identified with high confidence) exclusively by MudPIT, being somehow missed by the gene chip platform using the default p value filter (p value >0.04; therefore deemed "absent"). Taken at face value, the fact that the genomic and proteomic data are in agreement for the majority of cross-mapped gene product pairs (in that both the gene and corresponding protein were detected) validates the reliability of the two platforms. As seen in Fig. 6A, there was a marked tendency for the MudPIT screening to detect the putative translation end-products of transcripts detected with low p values (that is, at or below the default cutoff). These data further confirm the stringency of the default threshold.
Considering that probe signal and the detection p value are intrinsically inversely related, owing to the nature of the Affymetrix statistical algorithms used to calculate the two, it was not surprising to see that microarray probes with low detection p values (≤0.04, or "present") tended to have higher signal intensity than those corresponding to transcripts deemed "absent" (Fig. 1). Likewise, and reassuringly, the probes for transcripts encoding the proteins identified by MS also tended to have higher average signal intensities (Fig. 1). In other words, the average mRNA signal for gene products confirmed to be expressed by MudPIT is much closer to that of probes called present by microarray analysis than to that of probes deemed absent (by microarray). These results are in alignment with the heuristic rule summarized above (Fig. 3).
Improving Proteomic Coverage by Data Integration
With the relatively stringent two-stage quality criteria used here, false-negatives are an inevitable consequence of proteomic dataset filtering (Fig. 5). On the other hand, one generally prefers to avoid compromising the integrity of the data by loosening the filters too much. Hypothetically, one could seek to incorporate alternate supporting evidence, such as gene expression data, both as a guide to assess the effectiveness of the proteomic filtering criteria (specificity versus sensitivity) and to validate marginal protein identifications. To address this possibility, we examined the suitability of using microarray results to eliminate false-positives in the subset of protein identifications with lower-confidence database scores, considering only those proteins with solid preliminary probability scores (>85% confidence according to STATQUEST) that were nonetheless removed because the identifications were based on only a single spectrum (a commonly accepted quality criterion) (56).
Fig. 6B shows the ratio of "present" and "absent" calls made for the microarray probes corresponding to transcripts predicted to encode these marginal proteins, using the default detection p value cutoff of 0.04. Because roughly half of all genes are predicted to be expressed, a false-positive database match has an equal chance of mapping to an expressed gene (hence, the expected ratio of "present" to "absent" calls for spurious matches should be ~1). However, the probes corresponding to even the most marginal protein identifications exhibited a far greater chance of being called "present" by microarray (even using the overly stringent p value threshold) than one would expect by chance alone. This implies that one could apply the results of a parallel gene expression study to reduce the false-negative rate resulting from stringent proteomic data filtering, without increasing the number of contaminating false-positives.
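As a simple way to formalize the "better than chance" comparison above, one could apply a one-sided binomial test against the ~50% background present-call rate; this test and the toy counts below are illustrative assumptions, not the analysis actually performed for Fig. 6B.

```python
# Sketch of a chance-expectation comparison: if roughly half of all probe sets
# are called "present," a spurious single-spectrum identification should map
# to a "present" probe about 50% of the time. A one-sided binomial test asks
# whether the observed fraction is higher than that. The counts are toy values.
from scipy.stats import binom

def present_call_enrichment(n_present, n_total, background_rate=0.5):
    """P(X >= n_present) under a binomial null with the background call rate."""
    return binom.sf(n_present - 1, n_total, background_rate)

print(present_call_enrichment(n_present=80, n_total=100))   # << 0.05
```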
Annotation and Functional Inference
To obtain a holistic sense of the global similarities and differences between the two datasets, functional annotation was used to compare the transcriptome and proteome profiles. Obtaining simple gene product descriptions from a reference annotation resource like ExPASy is often a good starting point for interpreting the biological significance of expression profiles and a logical first place for deducing the functions of proteins of special note (e.g. those with interesting expression patterns), which can then be individually followed up. Alternatively, one can look for general patterns of functional enrichment using a more generalized annotation resource such as the GO database (www.geneontology.org), which reports on the functional properties of gene products using a more standardized, computer-friendly schema. The idea is to look for underlying trends in an input list of gene products by calculating the relative membership enrichment (versus chance alone) in various select functional categories (e.g. GO terms) using a suitable statistical measure. While the GO curation is far from complete, and potentially error prone, it is currently the most comprehensive resource of its type. Moreover, one can use a dedicated computer program or web application, such as GOMiner (discover.nci.nih.gov/gominer/), to automate the analysis.
As might be expected, gene transcripts expressed in αTC-1 cells were enriched for cell communication, regulatory, and signaling proteins known to act as key determinants of differentiation, tissue specificity, and endocrine cell function (data not shown). As shown in Supplemental Table IV, a disproportionate fraction of the gene products also identified by LC-MS were found to be involved in nucleic acid binding, transcription, mRNA splicing, and genome maintenance (e.g. DNA replication, cell-cycle control, etc.), consistent with the subcellular fractionation and enrichment procedure used in this study. Hence, based on the observed subcellular localization and concordant expression patterns, the potential function of at least some of these proteins in α cell physiology can be reasonably predicted.
DISCUSSION
In this pilot study, we have attempted to combine large-scale gene and protein expression technology platforms with the aim of making more reliable, and more comprehensive, inferences regarding endocrine cell function. We have outlined some of the less well appreciated, but nevertheless important, technical and analytical issues associated with the practical implementation of global profiling methods, while offering potential solutions to the most pressing problems. Although the overlap between the gene chip and proteomic patterns reported here represents only a small portion of the total predicted genetic complement of the mouse, our data strongly suggest that integration of multiple types of experimental techniques truly does allow one to gain a more complete, statistically sound, and biologically meaningful picture of the sample under study.
It is important to stringently filter genomic datasets using proven, rigorous statistical measures of reliability (24, 36, 56, 58) and to regard preliminary results with caution. Detection calls are an important first step for determining the reliability of probe values. Accurate calls, and their associated p values, are proving to be important metrics as microarray datasets are increasingly used as the basis for disease or sample classification systems (e.g. see Ref. 63). The robustness of probe features should therefore be carefully investigated prior to their use in machine-learning and classification studies so as to avoid spurious biomarker discovery. In this study, we have found the default microarray p value threshold suggested by Affymetrix to be overly conservative, favoring detection specificity at the expense of sensitivity. Using RT-PCR, often considered the "gold-standard" technique for transcript detection, we validated the expression of a number of mRNAs with high predicted probe p values (>0.04 and therefore classified as marginal or absent by the Affymetrix detection call). Moreover, we confirmed these initial findings using high-stringency global proteomic profiling. Viewed together, these results indicate that a significant number of genuine transcripts are likely missed in most microarray screens, presumably as false-negatives arising from overly restrictive filtering criteria.
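To make the effect of the detection threshold concrete, the sketch below (our illustration, not Affymetrix software; the relaxed cutoffs and example p values are arbitrary) shows how re-running the present/marginal/absent call with looser cutoffs flags candidates for orthogonal validation by RT-PCR or MS.

```python
# Minimal sketch: reclassify probe sets under relaxed detection cutoffs and
# list those that move into the "present" category as follow-up candidates.
def detection_call(p, present_cutoff=0.04, marginal_cutoff=0.06):
    """Classify a probe set from its detection p-value (default-style cutoffs)."""
    if p < present_cutoff:
        return "P"   # present
    if p < marginal_cutoff:
        return "M"   # marginal
    return "A"       # absent

# Hypothetical detection p-values for three probe sets.
detection_pvalues = {"GeneA": 0.012, "GeneB": 0.045, "GeneC": 0.070}

default = {g: detection_call(p) for g, p in detection_pvalues.items()}
relaxed = {g: detection_call(p, 0.06, 0.08) for g, p in detection_pvalues.items()}
rescued = [g for g in detection_pvalues if default[g] != "P" and relaxed[g] == "P"]
print("candidates for RT-PCR / MS follow-up:", rescued)   # ['GeneB']
```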
Based on this study, we propose, as a heuristic guideline, that researchers incorporate additional information, such as probe signal intensity and orthogonal experimental data (e.g. RT-PCR and, ideally, protein MS), to define suitable detection filters for microarray data. Although the exact criteria will differ from experiment to experiment, the approach is generalizable. By considering additional biologically pertinent factors, such as related functional properties detected across a global dataset, one can better judge the significance of a (noisy) expression profile. This is particularly the case when biological variation must be considered on top of experimental noise.
Making a reliable detection call is a far simpler problem than, for example, determining differential gene expression. In the latter case, one is usually dealing with data from several different samples (either a time series, treatment-control-response experiments, or both) that first need to be normalized so that the signals are comparable, and thresholds for what constitutes a "significant change" in transcript levels must then be chosen. A "yes or no" answer to whether an mRNA was detected requires only a single chip (although the confidence and power of the result can be increased by replicate analysis, if required), and normalization is unnecessary because no signals need to be compared across arrays. Nevertheless, detection of genuine gene expression is an important and interesting research problem in its own right, albeit not nearly as alluring as the issue of differential expression, which is possibly why it has received relatively little attention to date.
First, it is important to know whether a gene is in fact expressed before attempting to establish whether transcript levels change under different experimental conditions. Low-abundance or marginal transcripts are frequently misidentified as "differentially expressed" by standard analysis software because the absolute signal is often not taken into account when only fold change is examined. Small fluctuations in background signal, due to technical variability, can sometimes exceed the intensity of an otherwise blank probe, resulting in classification as "differentially expressed" when in fact the transcript is absent altogether. Normalization attempts to account for this, but not all normalization algorithms are equally effective at dampening or removing variability in low-intensity signals. Hence, one must carefully consider both the absolute signal and the detection call to reliably judge the significance and amplitude of log ratios when searching for differentially expressed probes.
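A minimal sketch of this filtering logic, assuming detection calls and summarized signal intensities are already available (the data structures, pseudocount, and intensity cutoff are illustrative), is shown below.

```python
# Minimal sketch: only report log ratios for probes that are called "present"
# in at least one condition and exceed a minimum absolute intensity.
import numpy as np

def reliable_fold_changes(signal_a, signal_b, call_a, call_b, min_signal=100.0):
    """signal_*: dict probe -> summarized intensity; call_*: dict probe -> P/M/A.
    Returns dict probe -> log2 ratio for probes detected above background."""
    log_ratios = {}
    for probe in signal_a:
        detected = "P" in (call_a[probe], call_b[probe])
        strong = max(signal_a[probe], signal_b[probe]) >= min_signal
        if detected and strong:
            # +1.0 pseudocount guards against division by zero on blank probes
            log_ratios[probe] = np.log2((signal_b[probe] + 1.0) /
                                        (signal_a[probe] + 1.0))
    return log_ratios
```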
This consideration has also been recognized by Affymetrix, as their new proprietary expression analysis algorithm, PLIER (Probe Logarithmic Intensity ERror estimation), takes probe hybridization affinity information into account and uses a sophisticated error model to estimate background and nonspecific hybridization, accepting >10% more probes compared with the Statistical Algorithm (based on the Rat 230 2.0 array) (18). Other related algorithms are continuously being developed to improve the overall sensitivity and specificity of mRNA detection, especially for lower-abundance transcripts that are often presumed to be of particular biological relevance.2
For quantitative multisample comparisons, microarray data are usually normalized to minimize technical variability, such as variation in the manufacture and processing of the arrays, scanning (and scanner calibration), differences in detection efficiency between the fluorescent dyes, and systematic spatial biases in measured expression levels. Normalization also aims to minimize unwanted variability due to unequal quantities of starting RNA, variable sample preparation, differences in labeling and hybridization efficiencies, the time of day the experiment is performed, and even the technician performing the experiment (20, 21). Ultimately, however, biological variation remains the major issue of concern.
The techniques used to normalize Affymetrix data differ in two major respects: whether they use probe-level data or summarized expression intensities, and whether they perform normalization across the entire dataset or normalize each chip individually. For example, Li and Wong's model-based expression index (MBEI) uses probe-level data from the entire dataset, excluding outliers, to fit a statistical model that is then used to calculate gene expression (64). This is a nonlinear normalization algorithm that takes into account information specific to each probe.
Quantile normalization, used as the normalization step in the widely adopted RMA expression quantitation method (20, 21), uses probe-level data and does not require a baseline array. GCRMA, an RMA-derived expression measure that incorporates sequence information, is a related, though more sophisticated, quantitation method (34). It differs in that its background adjustment algorithm takes probe sequence information (such as GC content) into account and uses a stochastic model of binding affinities; as with RMA, quantile normalization is used for normalization and median polish for summarization. This probe-specific background adjustment can also be used to improve the accuracy of other methods that summarize or normalize probe-level data (65, 66).
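For reference, quantile normalization itself is straightforward to express; the sketch below is a bare-bones NumPy version (ignoring tie handling and the background-adjustment and summarization steps that RMA/GCRMA add around it).

```python
# Minimal sketch of quantile normalization: each array's intensity
# distribution is mapped onto the common distribution obtained by averaging
# the sorted intensities across arrays at each rank.
import numpy as np

def quantile_normalize(matrix):
    """matrix: probes x arrays array of (background-adjusted) intensities."""
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)   # rank of each value per array
    mean_by_rank = np.sort(matrix, axis=0).mean(axis=1)      # reference distribution
    return mean_by_rank[ranks]                               # map ranks back to values

# Toy example: 4 probes measured on 3 arrays.
arrays = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 4.0, 6.0],
                   [4.0, 2.0, 8.0]])
print(quantile_normalize(arrays))
```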
Because most microarray studies are ultimately aimed at predicting corresponding protein levels, in most scenarios it would probably be advantageous to measure global protein expression patterns directly. The proteome (or, perhaps more correctly, the translatome, the population of expressed proteins) is conceptually analogous to the transcriptome (the population of mRNA transcripts) and is defined as the entire set of proteins produced by a cell or tissue at any given time point (67). MS-based protein expression profiling represents an increasingly attractive, albeit still very challenging, approach for investigating the global biochemical properties of cells and tissues (38, 62). When combined with biochemical prefractionation and enrichment procedures, proteomic screening also offers the potential for extensive functional discovery, such as by determining both the subcellular location and possible interacting partners of proteins of special interest. Nevertheless, much remains to be accomplished in this regard with respect to endocrine cell biology.
Despite recent advances, large-scale proteomic measurement of endocrine cells and tissues represents a daunting experimental challenge. Protein abundance, subcellular localization, and turnover are highly dynamic, while biological patterns are dictated by overlapping developmental signals, physiological cues, environmental constraints, and disease perturbations. Classical gel-based proteomic screening techniques, such as two-dimensional PAGE, have proven ineffective for comprehensively monitoring the proteome, particularly lower-abundance proteins, and are biased against the identification of small, basic, or membrane-bound proteins (39, 67). In contrast, gel-free shotgun protein profiling strategies, such as the one used in this study, allow far more comprehensive characterization of complex biological samples (39, 41, 67). Since its introduction (39, 41, 67), the MudPIT technique has proven to be a very capable method (reviewed in Refs. 39, 40, 68). Our group routinely applies MudPIT to monitor the protein patterns of various organelles isolated from rodent tissues (i.e. mouse organs) and cultured mammalian cell lines with the objective of elucidating the biochemical properties associated with these various cell types (42, 57). Although the results of our pilot efforts to define the proteome of endocrine cell lines such as islet αTC-1 are still quite limited (3, 5), based on the results reported here we would argue that proteomic and functional genomic approaches are highly complementary, and indeed synergistic, when combined. Although a broad correlation was observed between the mRNA and protein patterns, the correspondence proved to be lower than might be expected.
Of course, assuming analytical error is not the cause, genes whose transcript and protein levels do not correlate are of special interest because they may be indicative of regulation by posttranslational mechanisms, for instance targeted protein degradation.6 Outliers whose protein expression was significantly higher than expected from the corresponding mRNA levels (i.e. transcript not detected) might include long-lived proteins and, as a class, might be expected to be less common than outliers whose mRNA expression was higher (due to incomplete proteomic sampling). This latter class may contain proteins subject to proteolytic regulation, or perhaps regulation by transport into or out of the nucleus, to another organelle, or to the plasma membrane (insoluble membrane proteins were largely missed by the MudPIT profiling procedure used in this study). These subsets merit additional scrutiny because they may represent factors important in determining or regulating cell-type-specific biological functions.
The issue of proteomic detection coverage is an important consideration. Part of the current limitation of shotgun profiling stems from a failure to properly (and optimally) interpret the vast collection of acquired spectra, leading both to many false-negatives (missed identifications) and false-positives (incorrect identifications). To this end, several groups have been developing computational and statistical tools and methods for evaluating, validating, and mining large-scale protein expression datasets (45, 57, 58, 69). We have argued here that prior knowledge of gene expression patterns can, in fact, be used as supporting confirmatory evidence in favor of tentative (marginally scoring) protein identifications. Based on our preliminary findings, we propose, as a second heuristic rule (albeit one still requiring definitive formal proof), that any protein tentatively identified by MS whose transcript is deemed to be expressed by microarray analysis, and whose identification carries a high confidence score (≥85% likelihood by STATQUEST or a similarly derived statistical measure), should be accepted as correctly identified regardless of the total number of supporting spectra.
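Expressed as simple decision logic, the proposed rule might look like the sketch below; the function name, the two-spectrum convention, and the 0-to-1 confidence scale are illustrative assumptions rather than a prescribed implementation.

```python
# Minimal sketch of the proposed heuristic: retain a single-spectrum protein
# identification when its statistical confidence is high and the matching
# transcript is called "present" on the microarray.
def accept_protein_id(confidence, spectra_count, transcript_present,
                      min_confidence=0.85, min_spectra=2):
    """Return True if the identification should be retained."""
    if spectra_count >= min_spectra and confidence >= min_confidence:
        return True                      # conventional multi-spectrum acceptance
    # proposed rescue: one spectrum suffices when the transcript is detected
    return confidence >= min_confidence and transcript_present

print(accept_protein_id(0.90, 1, transcript_present=True))   # True (rescued)
print(accept_protein_id(0.90, 1, transcript_present=False))  # False
```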
Data integration is also complementary to alternative biochemical methods aimed at sample simplification or enrichment to improve proteomic detection, such as subcellular fractionation and/or nondenaturing conventional chromatography, which have proven fruitful for extending detection limits as well as providing a more biologically informative context for interpreting profiles (37, 42, 57). Of course, the introduction of new generations of instruments with much greater sensitivity, dynamic range, and scan speed will also surely help to surmount the undersampling problem.
ACKNOWLEDGMENTS
FOOTNOTES
Published, MCP Papers in Press, March 1, 2005, DOI 10.1074/mcp.R500011-MCP200
1 The abbreviations used are: PM, perfect match; MM, mismatch; MudPIT, multidimensional protein identification technology; RMA, Robust Multichip Average; GO, Gene Ontology; IM, idealized mismatch; ROC, receiver-operating characteristic.
2 M. Maziarz and D. J. Drucker, unpublished observations.
3 P. Hu, personal communication.
4 M. Maziarz, unpublished observations.
5 C. Chung and A. Emili, unpublished observations.
6 Another important form of regulation, reversible posttranslational modification with phosphate, acetyl, glycosyl, or lipid groups, for example, was not addressed in this study.
* This study was supported by grants from the Natural Sciences and Engineering Research Council of Canada, Genome Canada, the Ontario Genome Institute, the McLaughlin Centre for Molecular Medicine, and the Ontario Research and Development Challenge Fund (to A. E.), and by an operating grant from the Canadian Institutes of Health Research (to D. J. D.). D. J. D. is supported by a Canada Research Chair in Regulatory Peptides.
S The on-line version of this manuscript (available at http://www.mcponline.org) contains supplemental material.
¶ M. M. and C. C. contributed equally to this study.
To whom correspondence should be addressed: CH Best Institute, Room 402, 112 College Street, Toronto, Ontario, Canada M5G 1L6. Tel.: 416-946-7281; Fax: 416-978-8528; E-mail: andrew.emili@utoronto.ca
REFERENCES