PROTEOME-3D: An Interactive Bioinformatics Tool for Large-Scale Data Exploration and Knowledge Discovery*
Deborah H. Lundgren, Jimmy Eng, Michael E. Wright, and David K. Han¶
From the Center for Vascular Biology, Department of Physiology, University of Connecticut School of Medicine, Farmington, CT 06030, and Institute for Systems Biology, Seattle, WA 98103
ABSTRACT
Comprehensive understanding of biological systems requires efficient and systematic assimilation of high-throughput datasets in the context of the existing knowledge base. A major limitation in the field of proteomics is the lack of a software platform that can synthesize a large number of experimental datasets against that knowledge base. Here, we describe a software platform, termed PROTEOME-3D, that utilizes three essential features for systematic analysis of proteomics data: creation of a scalable, queryable, customized database for identified proteins from published literature; graphical tools for displaying proteome landscapes and trends from multiple large-scale experiments; and interactive data analysis that facilitates identification of crucial networks and pathways. Thus, PROTEOME-3D offers a standardized platform to analyze high-throughput experimental datasets for the identification of crucial players in co-regulated pathways and cellular processes.
Experimental methodologies, fine-tuned in recent years to allow high-throughput protein and cDNA analyses, have resulted in exponential growth of protein and cDNA expression profiles and interaction datasets. A number of large-scale analyses, such as two-hybrid interaction maps and cDNA microarray technology, now allow interaction and expression datasets from large numbers of genes to be analyzed quickly and efficiently in a single experiment (1, 2). Protein profiling arrays for the comparable large-scale analysis of protein expression patterns are under active development as well (3, 4). When perfected, their output should be equally prolific. Finally, mass spectrometry, possibly the most important proteomics tool to date (5, 6), generates vast quantities of data through large-scale liquid chromatography (LC) tandem mass spectrometry (MS/MS) identification of expressed proteins in complex mixtures.
Predictably, technological advances enabling high-throughput analysis have resulted in an accumulation of experimental data at a rate far exceeding the current ability to assimilate that data. Transforming the rapidly proliferating quantities of experimental data into a usable form in order to facilitate data analysis is a challenging task. Numerous specialized databases and graphical tools have been described to organize the growing collection of large-scale experimental datasets (7–16). These tools have made significant contributions toward functional data organization and the display of protein complexes and hierarchical relationships. Yet the initial interpretation of experimental datasets in an interactive and intuitive way remains a challenge. Important functional information can only be determined through careful and detailed analysis of experimentally identified and quantified data in the context of the current knowledge base. Functional analysis, which is requisite to an exhaustive understanding of cellular networks and pathways, represents a major bottleneck in proteomics today. It is recognized that bridging the expansive gap between the current state of knowledge and the ultimate goal of understanding whole cellular networks requires a global discovery phase to pinpoint pivotal proteins in cellular networks (17). Tools that integrate diverse experimental results with the current knowledge base would undoubtedly facilitate the understanding of biological networks and pathways. Visualization of biological data is an important component of such applications (18).
We describe here a Web-based data exploration and knowledge discovery tool called PROTEOME-3D that utilizes three essential features for effective assimilation and analysis of large-scale experimental datasets: 1) automated construction of a customized database of expressed proteins/mRNAs from the public knowledge base using user-defined criteria; 2) graphical tools for displaying and comparing experimental results in the form of proteomic landscapes; 3) an interactive user interface for in-depth analysis of experimental results. Sample applications are provided to demonstrate how this tool can facilitate the evaluation of experimental results. (For information on how to obtain a copy of PROTEOME-3D, contact David K. Han at han{at}nso.uchc.edu.)
EXPERIMENTAL PROCEDURES
Information Flow
The general flow of information through PROTEOME-3D is outlined in Fig. 1. Experimental results generated from isotope-coded affinity tag (ICAT) analysis or from cDNA microarrays are pre-processed to create an input file of protein identities (ids) and abundance ratios (see "Database" subsection below for more detail). Protein ids are then used to generate a customized, user-defined dataset from public databases, and the combined experimental and retrieved data are stored in a local database. The PROTEOME-3D graphical interface is accessed through Internet Explorer. Three-dimensional (3D) display and protein page screens are linked for easy navigation, and each screen communicates with the local database through a servlet stored on the server (19). The protein page provides user-selectable links to public and/or proprietary databases and the capability to construct additional customized links.

FIG. 1. Information flow through PROTEOME-3D, from data generation through processing, storage in the local database, and display via graphical user interfaces. Multiple tab-delimited proteomics or microarray profiling experimental results can be used as input for PROTEOME-3D.
Database
Experimental results, together with a customized dataset retrieved from public databases, are stored locally in a relational database (Oracle 9i). For each experiment loaded in the database, a list of MS/MS-identified proteins and their calculated abundance ratios is initially read from an INTERACT summary web page, which contains one row of data for each peptide scan conclusively identified by SEQUEST and quantified by XPRESS (20, 21). Alternatively, microarray output identified by gene ids and stored in a tab-delimited file is read in a pre-processing step, and a file of corresponding protein ids and abundance ratios is produced. A series of Java application programs is then executed, populating the local database with the experimental results and desired user-defined data. At a minimum, the experiment table contains, for each experiment, a nested table of entries comprising protein accession number, proportion, and standard deviation of the proportion. The proportion attribute is a mean value derived from the abundance ratio as follows: for each identified peptide in a given experiment, the peptide's abundance ratio A:B is used to calculate a proportion B/(A+B), where B is the relative abundance of the peptide in the experimentally perturbed sample and A is its relative abundance in the control sample. Then, from all the peptide samples for a given protein, an average proportion and standard deviation are computed and stored in the nested table entry. The average proportion serves as a normalized representation of abundance ratio, providing comparable values for comparisons between up- and down-regulated proteins, with values ranging from 0 for maximum down-regulation to 1 for maximum up-regulation. Additional information pertinent to each experiment can be stored in the experiment table as well.
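The proportion calculation described above can be illustrated with a short Python sketch. This is not the authors' Java code: the function names, the "A:B" text format for a ratio, and the use of the sample standard deviation (with 0.0 reported for a protein seen in a single peptide) are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, stdev


def proportion(ratio):
    """Convert an 'A:B' abundance ratio (control:perturbed) to B / (A + B)."""
    a_text, b_text = ratio.split(":")
    a, b = float(a_text), float(b_text)
    return b / (a + b)


def summarize(peptides):
    """Average the per-peptide proportions for each protein.

    peptides: iterable of (protein_accession, 'A:B') pairs, one per peptide scan.
    Returns {accession: (mean proportion, standard deviation)}.
    """
    by_protein = defaultdict(list)
    for accession, ratio in peptides:
        by_protein[accession].append(proportion(ratio))

    summary = {}
    for accession, values in by_protein.items():
        # Assumption: sample standard deviation; it is undefined for a single
        # peptide, so 0.0 is stored in that case.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[accession] = (mean(values), sd)
    return summary
```

A 1:1 ratio maps to 0.5, a 1:3 ratio (three-fold up-regulation) to 0.75, and a 3:1 ratio to 0.25, so up- and down-regulation of the same magnitude sit symmetrically about 0.5.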
Detailed information retrieved from public databases for each protein is stored in the protein table. An example of a set of stored attributes includes National Center for Biotechnology Information (NCBI) Protein Data Bank accession number, molecular weight, pI, cross-references to NCBI nucleotide and Online Mendelian Inheritance in Man (OMIM) databases, maploc, and a variety of descriptive fields such as keywords, definition, function, disease, subcellular location, and pathway. To avoid redundant entries and outdated accession numbers, and to ensure that the local database reflects the latest annotations, it is necessary to routinely download the latest publicly available databases and then update the local Oracle database.
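For readers who want to experiment with a comparable layout, the sketch below sets up a small stand-in for the experiment and protein tables using Python's built-in sqlite3 module. It is an assumption-laden approximation, not the PROTEOME-3D schema: the original uses Oracle 9i with a nested table per experiment, whereas this sketch uses an ordinary child table (here called measurement), and all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical schema: the nested table of (accession, proportion, standard
# deviation) entries is flattened into a separate "measurement" table.
SCHEMA = """
CREATE TABLE protein (
    accession            TEXT PRIMARY KEY,
    molecular_weight     REAL,
    pi                   REAL,
    maploc               TEXT,
    keywords             TEXT,
    subcellular_location TEXT
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE measurement (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    accession     TEXT NOT NULL REFERENCES protein(accession),
    proportion    REAL NOT NULL CHECK (proportion BETWEEN 0 AND 1),
    proportion_sd REAL,
    PRIMARY KEY (experiment_id, accession)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The CHECK constraint encodes the stated range of the proportion attribute (0 to 1), so a value outside that range is rejected when it is inserted.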
RESULTS
Data Presentation
3D Graphic Display
An interactive Java3D graphic display represents a given experiment as a set of cone objects (Fig. 2, upper left). Each cone depicts a protein, uniquely identified by its mass (x-axis) and pI (z-axis). The abundance ratio (converted to a proportion, as described above) is graphed on the y-axis and corresponds to the height of the cone. The base of each cone sits on the plane of y = 0.5 (green), which represents a 1:1 abundance ratio. Cones depicting up-regulated and down-regulated proteins project above and below the reference plane, respectively. Cone color is mapped to the interval [0, 1] on the y-axis, with blue, green, and red representing 0.0, 0.5, and 1.0, respectively. Bright red cones, then, denote highly up-regulated proteins, and bright blue cones denote those proteins that are highly down-regulated. For specific information on an individual protein, its cone can be selected by mouse click, which highlights the cone and displays a subset of the protein's attributes in a text field along the bottom of the screen. Next to the text field is a button linking the highlighted protein to the protein page screen, which interfaces with the local database and a number of external data sources. Additional buttons at the bottom of the screen allow selection of alternate experimental displays. Global scene manipulation (zooming in or out, rotation, and scene translation up, down, left, or right) is executed through mouse buttons and the Alt key.
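The mapping from proportion to cone geometry and color can be sketched as follows. The text fixes only the three anchor colors (blue at 0.0, green at 0.5, red at 1.0) and the reference plane at y = 0.5; the linear interpolation between anchors and the function names below are assumptions, and the actual Java3D implementation may blend colors differently.

```python
def cone_color(p):
    """Map a proportion p in [0, 1] to an (r, g, b) tuple with channels in [0, 1].

    Assumption: colors are interpolated linearly between the anchors
    blue (p = 0.0), green (p = 0.5), and red (p = 1.0).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    if p <= 0.5:
        t = p / 0.5              # 0 at blue, 1 at green
        return (0.0, t, 1.0 - t)
    t = (p - 0.5) / 0.5          # 0 at green, 1 at red
    return (t, 1.0 - t, 0.0)


def cone_extent(p, plane=0.5):
    """Signed height of the cone relative to the reference plane.

    Positive values project above the plane (up-regulated),
    negative values below it (down-regulated).
    """
    return p - plane
```

Under these assumptions a protein with proportion 0.75 would be drawn as a cone rising 0.25 above the green plane in a color halfway between green and red.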

FIG. 2. Interactive PROTEOME-3D screens. Graphical display of protein up- and down-regulation for single experiment is shown in upper left. The height/color of each cone represents the proportion of its experimental quantity to its total (control + experimental) quantity. Cones above the translucent green plane are up-regulated; those below are down-regulated; the green plane represents no change. In experiment/experiment comparison (middle and lower left), proteins common to both experiments (intersect) are shown. For proteins up-regulated or down-regulated in both experiments, inner cone is visible within translucent outer cone (as indicated by arrow). Protein page (right) provides an interactive interface, with text fields displaying locally stored protein attributes; links to public or proprietary databases of interest; and a query-building tool for searching the local database. Graphical Display and Protein Page screens are linked via a button to allow easy navigation between them.
To the right of the graphic display is the content pane, which provides the functionality for experiment-to-experiment comparisons. A second experiment is chosen for comparison with the current experiment from a drop-down list under the heading "Select 2nd Exp." Optionally, a separate color can be chosen to represent each experiment ("Exp Colors"), enabling cones of the two experiments to be easily distinguished in the graphic display. Desired comparisons are executed through radio buttons under the heading "Select Function," with options including intersection, union, and complements. An intersection between two experiments is illustrated in Fig. 2, middle and lower left, where Exp1 cones are displayed in violet and Exp2 cones are displayed in green. The detail image of the intersection (Fig. 2, lower left) shows how the inner cone is visible through the transparent outer cone, allowing a visual comparison of the protein's expression in the two experiments.
Protein Page
The protein page screen (Fig. 2, lower right) provides the functionality for in-depth analysis of a selected protein. It comprises an interface to the local Oracle database, a customizable set of links to user-selected external Web sites, and a query-building tool for use with the local database. The protein page is accessible from the graphic display screen by a button click, as described above. The screen is functionally divided into two sections. The top section displays attributes of the current protein that are stored in the local database, including comments that can be selected by name from a drop-down list. Additional information about the protein is accessible through customized links, which are defined in a pop-up window where users can select from default sites or enter their own URLs. The bottom section of the protein page screen comprises a query-building tool, with a button linking back to the display page, allowing results of database queries to be viewed as a 3D graphic display. Fields, operators, and connectors for a query are selected from drop-down lists; the search key is typed into a text box. Subqueries can be combined, using an appropriate connector, until the desired query is complete. After execution, query results can be viewed in a text box or as a 3D graphic display. Additional clauses can be added to the existing query, facilitating a natural progression to increasingly specific subsets of data, or, alternatively, the user can begin again with a new query.
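One way to picture the query-building tool is as a small object that accumulates subqueries, each made of a field, an operator, a search key, and a connector, and then emits a parameterized SQL statement. The sketch below is hypothetical: the class name, the permitted fields and operators, and the protein table it targets are illustrative assumptions rather than details taken from PROTEOME-3D.

```python
# Hypothetical whitelists standing in for the drop-down lists on the protein page.
ALLOWED_FIELDS = {"keywords", "function", "subcellular_location",
                  "maploc", "disease", "pathway"}
ALLOWED_OPERATORS = {"=", "LIKE"}
ALLOWED_CONNECTORS = {"AND", "OR"}


class QueryBuilder:
    """Accumulate subqueries and emit a parameterized SQL statement."""

    def __init__(self):
        self._clauses = []   # (connector, field, operator) triples
        self._params = []    # search keys, bound as SQL parameters

    def add(self, field, operator, key, connector="AND"):
        """Append one subquery; the connector is ignored for the first clause."""
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field!r}")
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        if connector not in ALLOWED_CONNECTORS:
            raise ValueError(f"unknown connector: {connector!r}")
        self._clauses.append((connector, field, operator))
        self._params.append(key)
        return self  # allows chaining as clauses are added

    def build(self):
        """Return (sql, params) suitable for a DB-API execute() call."""
        if not self._clauses:
            raise ValueError("query has no clauses")
        parts = []
        for i, (connector, field, operator) in enumerate(self._clauses):
            clause = f"{field} {operator} ?"
            parts.append(clause if i == 0 else f"{connector} {clause}")
        sql = "SELECT accession FROM protein WHERE " + " ".join(parts)
        return sql, tuple(self._params)
```

Because the search key is passed as a bound parameter and fields are checked against a whitelist, typed input never becomes part of the SQL text. Note that in SQL, AND binds more tightly than OR, so a real tool mixing both connectors would need explicit grouping.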
Sample Applications
PROTEOME-3D provides a visual summary of potentially massive sets or selected subsets of experimental data, including experiment/experiment comparisons; an advanced querying capability on locally stored data; and direct exploration of experimental results in the context of the public knowledge base. This combination of features allows efficient navigation through vast quantities of data and extraction of subsets of experimental proteins with unique properties relevant to the area of research. The chosen subset can then be viewed graphically and further analyzed by accessing the public databases, or by experiment/experiment comparison, in which common or unique proteins from a second profiling experiment are jointly displayed in the 3D graphical format. The following examples, using data from ICAT proteomics profiling experiments (20, 25), demonstrate specifically how this tool can help analyze large-scale protein profiling experiments systematically and efficiently.
Example 1: Analysis of Membrane Proteins That Are Differentially Regulated During Apoptosis
Apoptotic cells are rapidly recognized and engulfed by neighboring cells and macrophages. This recognition process is theoretically a summation of changes in adhesive and repulsive signals presented on the apoptotic cell surface (22). In an effort to identify adhesive engulfment ligands, we profiled membrane proteins from control Jurkat cells and anti-Fas immunoglobulin M-treated apoptotic Jurkat cells, using a previously described method (20). We then manually explored the experimental results, together with pertinent public databases, searching for candidate proteins for further study. In fact, the impetus for creating an automated discovery tool came from this and similar time-consuming, manual searches of databases and experimental results. Here we explore the utility of PROTEOME-3D in the analysis of this dataset. Briefly, proteins positively identified (with p > 0.9) and experimentally quantified from the INTERACT page (Fig. 3A) were loaded into the PROTEOME-3D local database, automatically providing three inter-related tools for systematic and efficient analysis: 1) the local database itself, storing information retrieved from a variety of public databases, together with experimental data; 2) the interactive 3D graphical display of experimental results; and 3) a protein page linked to the 3D display, providing an interactive, queryable interface to the local database as well as user-defined links to publicly available network and pathway databases. These features allow rapid analysis of proteins for putative engulfment ligands. Among the 114 proteins identified and quantified from the Jurkat experiment, one stands out immediately in the 3D graphic display of experimental results (Fig. 3B): highly up-regulated human annexin I protein (ANX1_HUMAN). Selecting this protein by mouse click, we switch to the protein page for further exploration of this protein (Fig. 3C). 
Its keyword annotation lists calcium/phospholipid-binding as a known function, and recent literature cited in the OMIM link implicates annexin I in anti-inflammatory activity (23, 24). These features suggest a possible role for human annexin I in the apoptotic cell engulfment process and anti-inflammatory activity. Due to its significant up-regulation in the graphic display and its membrane-binding activities, this protein is quickly recognized as an intriguing candidate for further study. In fact, additional experiments were performed on annexin I (as a result of the original manual discovery process), and its function as an endogenous engulfment ligand was discovered (25). Thus, the ability to graphically visualize the proteome landscape of a profiling experiment and quickly access known biological activity allows the user to focus on candidate proteins that are most relevant to the particular biological system.

FIG. 3. Utilizing PROTEOME-3D to evaluate ICAT profiling proteomics experimental results. A, Interact file (partially shown) contains a record for each MS peptide scan. B, Graphic display reveals highly up-regulated (red) and highly down-regulated (blue) proteins at a glance. C, Up-regulated protein, ANX1_HUMAN, has been selected for investigation via Protein Page.
Example 2: Analysis of Membrane Proteins from Prostate Cancer Cells That Are Regulated by Treatment with Androgen Analog R1881
Selection of a smaller subset of proteins from a large number of experimentally identified and quantified proteins or cDNAs requires efficient and systematic analysis of the experimental dataset. Here, we describe the use of PROTEOME-3D to analyze experimental ICAT profiling data from a prostate cancer cell line (LNCaP) treated with or without androgen analog R1881 for 24 h. The entire experimentally determined dataset includes 4052 proteins displayed in Fig. 4A. It is well documented that multi-step carcinogenesis requires sequential aberrations in chromosomal DNA, and a large number of chromosomal changes associated with prostate cancer have been described (Table I). The ability of PROTEOME-3D to automatically retrieve user-relevant data for local storage makes chromosomal localization of all of the identified and quantified proteins a queryable feature. Querying all proteins mapped to any of the loci listed in Table I results in the selection of 921 proteins. This subset is then further refined using the functional term "Oncogene," resulting in a subset of 31 proteins. Adding to the query the more specific term "Tumor suppressor" results in the two down-regulated and three up-regulated proteins responsive to androgen treatment composing the final subset. This progressive refinement of the search criteria, along with a graphic display of the resultant subset, is depicted in Fig. 4B. Now a much more manageable set of candidate proteins can be further analyzed via the protein page (Fig. 4C), allowing for additional queries and/or in-depth analysis of pathway and network databases. For example, one of the up-regulated proteins, ATM_HUMAN, is selected from the graphic display by mouse click and its links are followed to GENEGO and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases (Fig. 4C, lower left and right, respectively). 
Thus, interactions and networks associated with this experimentally identified and quantified tumor suppressor protein can be easily explored from the PROTEOME-3D platform, allowing easy integration of experimental data with the current knowledge base.
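The progressive refinement in this example, where each query narrows the result of the previous one, can be mimicked with a few filter functions. The records, locus names, and helper functions below are toy placeholders invented for illustration; they are not data from the LNCaP experiment, and in PROTEOME-3D the filtering is performed by queries against the local database rather than in memory.

```python
def refine(proteins, predicate):
    """Keep only the proteins that satisfy the predicate."""
    return [p for p in proteins if predicate(p)]


def at_locus(loci):
    """Predicate: the protein maps to one of the given chromosomal loci."""
    wanted = set(loci)
    return lambda p: p["maploc"] in wanted


def has_keyword(term):
    """Predicate: the keyword annotation contains the term (case-insensitive)."""
    needle = term.lower()
    return lambda p: needle in p["keywords"].lower()


# Toy records with made-up identifiers and loci, for illustration only.
proteins = [
    {"id": "TOY1", "maploc": "locusA", "keywords": "Oncogene; Tumor suppressor"},
    {"id": "TOY2", "maploc": "locusA", "keywords": "Oncogene"},
    {"id": "TOY3", "maploc": "locusB", "keywords": "Kinase"},
    {"id": "TOY4", "maploc": "locusC", "keywords": "Oncogene; Tumor suppressor"},
]

# Query 1: loci of interest; Query 2: add "Oncogene"; Query 3: add "Tumor suppressor".
step1 = refine(proteins, at_locus(["locusA", "locusB"]))
step2 = refine(step1, has_keyword("Oncogene"))
step3 = refine(step2, has_keyword("Tumor suppressor"))
```

Each step operates on the output of the one before it, so the candidate set can only shrink, which mirrors the narrowing from 4052 proteins to the final five described in the text.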

FIG. 4. Exploring experimental data from LNCaP cell line. A, Original set of 4052 proteins is shown. B, Flowchart indicates steps in successive refinement of queries on local database, with the graphic display of the resultant subset. Query 1 selects proteins associated with chromosome locations listed in Table I; Query 2 refines the first query with the keyword "Oncogene"; Query 3 further refines the query with the keyword "Tumor suppressor." C, Locally stored data for ATM_HUMAN, highlighted in B, is displayed on the protein page (top); portions of associated regulatory pathways from GENEGO and KEGG databases, accessible through links on the protein page, are displayed in the bottom panels.
Example 3: Analysis of Coregulation of Cellular Processes
Many biological processes are controlled by multi-protein complexes whose properties, such as abundance and subcellular location, are integral to that control (26). Exploring coregulation of proteins categorically (for example, proteins associated with particular functional groups or particular subcellular locations) provides crucial biological information. This type of analysis can be easily done with PROTEOME-3D, using the query builder tool and displaying the results graphically (Fig. 5). We functionally categorized proteins from the LNCaP profiling experiment as mitochondrial, glycolytic, housekeeping, and structural by querying locally stored comment and keyword fields. The resulting graphical displays, which summarize the profiles of proteins associated with these functional groups, are shown in Fig. 5A. In Fig. 5B, the experimental results are categorized by subcellular location into the four broad groups of extracellular (secreted), cytoplasm, membrane, and nucleus. In each of the displays, highly up- or down-regulated proteins are easily distinguished by their corresponding bright red or blue color and can be easily selected for further analysis.

FIG. 5. Filtering experimental results by category. A, Multiple functional groups are selected and displayed using the 3D graph: mitochondria, energy (87 proteins); fatty acid, glucose, glycolysis (62 proteins); cell proliferation, growth, maintenance (109 proteins); cell adhesion, cytoskeleton (129 proteins). B, Proteins from distinct subcellular compartments are selected and displayed: extracellular, secreted (76 proteins); cytoplasm (173 proteins); membrane (368 proteins); nucleus (218 proteins).
Example 4: Analysis of Multiple Profiling Experiments
The efficient evaluation of multiple, large-scale expression analyses, time-dependent changes in expression, and protein profiles generated by diverse methodologies requires a software platform that can simultaneously display multiple experiments for analysis. We have implemented a unique component of PROTEOME-3D, termed "Multi-Experiment Comparison," where multiple proteome, cDNA microarray, or combinations of profiling experiments can be efficiently analyzed. The first example (Fig. 6, A–C), comparing datasets extracted from Jurkat and LNCaP cell lines described above (Example 1 and Example 2), demonstrates salient features of this tool. Fig. 6A illustrates the intersection of proteins identified in each experiment, allowing a visual comparison of expression patterns for proteins common to the two cell lines. Zooming in, as shown previously in Fig. 2 (lower left), allows the experimentalist to discern similar or divergent patterns of regulation of a particular protein between the two experiments. Fig. 6B illustrates the Complement1 function, depicting proteins identified in Exp1 (Jurkat cell line) but not in Exp2 (LNCaP). Additional functions (not shown) include a display of proteins uniquely identified in Exp2 (Complement2), a combination of Complements 1 and 2 (Complement), and all proteins identified in either experiment (Union). Subsets of the data can also be easily extracted for comparison, as in Fig. 6C, where expression profiles of cytoplasmic proteins common to both experiments (Intersection) are displayed.
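The five comparison functions named above correspond directly to set operations on the protein identifiers of the two experiments. The following sketch shows that correspondence; the function name, the dictionary representation of an experiment (accession mapped to proportion), and the lowercase selector strings are assumptions for illustration.

```python
def compare(exp1, exp2, function):
    """Select proteins from two experiments with a named set operation.

    exp1, exp2: dicts mapping protein accession -> proportion.
    Returns {accession: (proportion in exp1 or None, proportion in exp2 or None)}.
    """
    ids1, ids2 = set(exp1), set(exp2)
    selectors = {
        "intersection": ids1 & ids2,   # identified in both experiments
        "union": ids1 | ids2,          # identified in either experiment
        "complement1": ids1 - ids2,    # identified in Exp1 only
        "complement2": ids2 - ids1,    # identified in Exp2 only
        "complement": ids1 ^ ids2,     # Complement1 and Complement2 combined
    }
    if function not in selectors:
        raise ValueError(f"unknown comparison function: {function!r}")
    return {acc: (exp1.get(acc), exp2.get(acc))
            for acc in sorted(selectors[function])}
```

For an intersection, each returned entry carries both proportions, which is the information needed to draw an inner and an outer cone for the same protein.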

FIG. 6. Multiple experiment comparisons. Jurkat (Exp1) and LNCaP (Exp2) ICAT experiments are compared in A–C. Proteins identified and quantified are displayed in violet (Exp1) and gold (Exp2). A, Proteins commonly identified in both Exp1 and Exp2 (intersection) are shown. B, Proteins unique to Exp1 (complement1) are shown. C, Intersection of cytoplasmic proteins in Exps 1 and 2 is shown. cDNA microarray results (Exp1, D), ICAT profiling results (Exp2, E), and their intersection (F) are shown. More highly up- or down-regulated genes/proteins are displayed with translucent colors to allow visualization of the inner cones.
PROTEOME-3D's multi-experiment comparison can also be used to analyze a cDNA microarray dataset concurrently with an ICAT proteome profiling experiment, as shown in Fig. 6, D–F. LNCaP cells treated with androgen analog R1881 were used to isolate mRNAs and proteins for comparative analysis. The regulation of 56 genes is displayed in Fig. 6D, where specific gene accession numbers have been converted to Swiss-Prot loci in order to compute molecular weight and pI for the 3D graphical display page. Subsequent comparison of these 56 genes with 2294 proteins from the ICAT experiment, shown in Fig. 6E, results in 25 common proteins. Using this feature of PROTEOME-3D, the investigator can easily explore and analyze co-regulated and differentially regulated mRNAs and proteins (Fig. 6F).
DISCUSSION
Systematic and efficient analysis of vast genomic and proteomic data sets is a major challenge for researchers today. Crucial biological advances in the study of model organisms are made daily, and new information is continuously deposited in publicly available databases. Thus, for each protein or mRNA that is identified and quantified from expression profiling experiments, a wealth of biologically relevant information, such as associated biological networks and pathways, protein interaction partners, biochemical activities, pathological/disease association, subcellular and tissue-specific expression, domain structure and function, may exist in publicly available databases. These datasets can be efficiently utilized by experimentalists to decipher complex functions of proteins if the crucial information is easily accessible. Yet due to the profound differences among the various biological databases housing such diverse data, information retrieval is an overwhelmingly manual, rate-limiting step in the researcher's analysis of experimental results, making integration of existing biological data a critical problem (40). Thus, to systematically and efficiently evaluate large-scale experimental results in the context of existing biologically relevant data, at least four crucial features are required: 1) automatic retrieval of user-defined information to construct a customized, queryable database; 2) an intuitive graphical and query platform to display and analyze experimental data in the context of the customized database; 3) efficient utilization of web-based bioinformatics software tools for data interpretation, prediction of function, and modeling; and 4) scalability and reconstruction of the database in response to changing user needs and an ever-expanding base of knowledge and bioinformatics tools.
Creating a software tool to encompass the four crucial features outlined above is a challenging and ongoing task, particularly with respect to the ever-expanding publicly available base of knowledge and bioinformatics tools. PROTEOME-3D represents an initial attempt to automate the laborious and time-consuming process of experimental data analysis in order to efficiently identify the most salient features and evaluate those features in the context of the existing knowledge base. It is a platform built upon the integration of three related components: 1) a scalable, queryable, customized relational database for local storage of user-defined biologically relevant data; 2) an intuitive and interactive 3D display for evaluating and comparing experimental results; and 3) an interactive user interface for systematic analysis of experimental data. We have demonstrated with specific examples how PROTEOME-3D can be effectively utilized for large-scale experimental analysis. The interactive 3D proteomic landscape provides a striking visual overview of experimental results whose notable features can then be further analyzed through integration of the display with the local, queryable database of biologically relevant data. A customizable set of links to user-selected external sources expedites concurrent utilization of those web-based tools of particular relevance to the analysis. This ability to link PROTEOME-3D with a number of web resources is especially critical for investigation of protein families, splice isoforms, and multiple redundant entries in the databases. For example, exploration of a regulated protein/cDNA may require detailed analysis of multiple splice isoforms expressed in that tissue. Thus, the ability to create customized links and retrieve user-defined data is essential for detailed data exploration and knowledge discovery.
We intend to expand the capabilities of PROTEOME-3D in several areas. One critical need is the development of a customizable database interface, implemented as a Web-based form, allowing users to specify attributes of interest from a defined set of public databases for inclusion in their local, queryable database. In this way the software can be adapted to the specific and evolving needs of individual laboratories. In our initial implementation, we retrieved data from GenBank and OMIM databases. We intend to expand that list to include AmiGO, the Gene Ontology database, to enable queries on a standardized set of annotations (16); the Alliance for Cellular Signaling's Molecule Pages database, to obtain qualitative and quantitative data requisite to network modeling (41, 42); and additional databases, such as the Biomolecular Interaction Network Database (12), Human Protein Reference Database (www.hprd.org), liveDIP (7), and KEGG (www.genome.ad.jp/kegg/kegg2.html), to provide pertinent data on protein activities, networks, pathways, and interactions.
We also envision expanding our software to automatically interface with key bioinformatics resources. For example, the Virtual Cell project developed at University of Connecticut Health Center (43) provides tools for building testable models based on experimental results. It is an invaluable aid in understanding biological systems, fine-tuning theoretical models, and guiding the direction of future research. Virtual Cell is already accessible from our protein page interface via a customized link. We are currently working toward automating at least part of the model-building process, using data retrieved from public databases and supplementing with experimental data generated in the laboratory in order to automatically create the initial compartments and species of the Virtual Cell. As the reference databases mentioned above become more densely populated with both qualitative and quantitative protein interaction data, more of the model-building process will be adaptable to automation. Also, integrating output from the Virtual Cell simulation with PROTEOME-3D's graphical display will allow a visual comparison of Virtual Cell's predicted cell protein profiles with the corresponding experimentally determined abundance ratios. This automation will expedite the detailed analysis of large-scale datasets by virtual-experimental tools.
PROTEOME-3D's multiple-experiment comparison feature is a flexible tool that can be used effectively in a number of diverse applications. For instance, as quantitative protein profiling studies become more prevalent (44), the experiment/experiment comparison can be used to simultaneously display pathological versus normal tissue samples from multiple patients in order to visualize how the profile patterns differ. Alternatively, this feature provides the means for simultaneous comparison of protein profiles generated by two different methodologies, such as the microarray-generated differential expression data and ICAT-LC/MS/MS-generated abundance ratios displayed in Fig. 6, to see how well their protein compositions and abundance ratios coincide under a given set of experimental conditions.
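The tool makes this comparison visually. As a rough numerical counterpart, and only as a sketch under stated assumptions, the agreement between two experiments could be summarized as below. The function `compare_experiments`, the protein identifiers, and the ratio values are hypothetical and are not taken from the paper's data. Ratios are assumed to be strictly positive so that log2 is defined.

```python
import math


def compare_experiments(ratios_a, ratios_b):
    """Summarize how two {protein_id: abundance_ratio} profiles coincide."""
    ids_a, ids_b = set(ratios_a), set(ratios_b)
    shared = sorted(ids_a & ids_b)
    # A log2 ratio is positive for up-regulation and negative for down-regulation.
    log_a = [math.log2(ratios_a[p]) for p in shared]
    log_b = [math.log2(ratios_b[p]) for p in shared]
    # Count shared proteins whose change has the same direction in both experiments.
    concordant = sum(1 for x, y in zip(log_a, log_b) if x * y > 0)
    return {
        "shared": shared,
        "only_a": sorted(ids_a - ids_b),
        "only_b": sorted(ids_b - ids_a),
        "concordant": concordant,
    }


# Hypothetical abundance ratios from two methodologies.
icat_ratios = {"P1": 2.0, "P2": 0.5, "P3": 4.0, "P4": 1.5}
array_ratios = {"P2": 0.25, "P3": 0.5, "P4": 3.0, "P5": 2.0}

summary = compare_experiments(icat_ratios, array_ratios)
print(summary)
```

In this toy input, three proteins are shared, and two of them (P2 and P4) change in the same direction in both experiments, while P3 does not.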
Although a number of graphical programs have recently been introduced for the analysis of microarray datasets (for example, Gene MicroArray Pathway Profiler (45), Onto-Express (46), MAPPFinder (47), and GoMiner (48)), PROTEOME-3D represents a novel approach to initial data exploration and knowledge discovery. Rather than the text-based graphical files that are output by the programs mentioned above, PROTEOME-3D displays experimental results in the form of interactive 3D proteomic landscapes, which are easily interpreted visually and easily superimposed to provide a visual experiment/experiment comparison as well. In addition to its flexible graphical display, PROTEOME-3D constructs a customized, queryable local database of expressed proteins/mRNAs from the public knowledge base and provides an interactive user interface for efficient and systematic analysis of experimental results. This paper has detailed specific features of PROTEOME-3D and described numerous sample applications, although the tool has a general applicability that extends beyond the specific examples presented here. Its utility in the analysis of large-scale experimental datasets makes it an invaluable tool in multiple biological and computational research environments.
ACKNOWLEDGMENTS
We thank R. Aebersold, J. Glomset, and R. Berlin for their comments and support; Les Loew, DongGuk Shin, Ann Cowan, Hsin-wei Wang, and Winfred Kruger for helpful discussion and technical assistance; and members of the Han laboratory for discussion. We also thank Andrej Bugrim for giving us a trial license for the Genego software and for his help in network and pathway analysis.
FOOTNOTES
Received, June 24, 2003, and in revised form, August 6, 2003.
Published, MCP Papers in Press, August 7, 2003, DOI 10.1074/mcp.M300059-MCP200
1 The abbreviations used are: LC, liquid chromatography; MS/MS, tandem mass spectrometry; 3D, three-dimensional; ICAT, isotope-coded affinity tag; Id, identity; KEGG, Kyoto Encyclopedia of Genes and Genomes; OMIM, Online Mendelian Inheritance in Man. 
2 D. K. Han, unpublished data. 
* This work was supported by National Institutes of Health Grants HL67569, GM65764, and HL 70694. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. 
¶ To whom correspondence should be addressed: David K. Han, Center for Vascular Biology, Department of Physiology, MC 3501, E 5041, University of Connecticut Health Center, 263 Farmington Ave., Farmington, CT 06030. Tel.: 860-679-2444; Fax: 860-679-1201; E-mail: han{at}nso.uchc.edu.
REFERENCES
- Auerbach, D., Thaminy, S., Hottiger, M. O., and Stagljar, I. (2002) The post-genomic era of interactive proteomics: Facts and perspectives. Proteomics 2, 611–623
- Shoemaker, D. D., Schadt, E. E., Armour, C. D., He, Y. D., Garrett-Engele, P., McDonagh, P. D., Loerch, P. M., Leonardson, A., Lum, P. Y., Cavet, G., Wu, L. F., Altschuler, S. J., Edwards, S., King, J., Tsang, J. S., Schimmack, G., Schelter, J. M., Koch, J., Ziman, M., Marton, M. J., Li, B., Cundiff, P., Ward, T., Castle, J., Krolewski, M., Meyer, M. R., Mao, M., Burchard, J., Kidd, M. J., Dai, H., Phillips, J. W., Linsley, P. S., Stoughton, R., Scherer, S., and Boguski, M. S. (2001) Experimental annotation of the human genome using microarray technology. Nature 409, 922–927
- Jenkins, R. E., and Pennington, S. R. (2001) Arrays for protein expression profiling: Towards a viable alternative to two-dimensional gel electrophoresis? Proteomics 1, 13–29
- MacBeath, G. (2002) Protein microarrays and proteomics. Nat. Genet. Suppl. 32, 526–532
- Aebersold, R., and Goodlett, D. R. (2001) Mass Spectrometry in Proteomics. Chem. Rev. 101, 269–295
- Hochstrasser, D. F., Sanchez, J., and Appel, R. D. (2002) Proteomics and its trends facing nature's complexity. Proteomics 2, 807–812
- Xiaoqun, J. D., Xenarios, I., and Eisenberg, D. (2002) Describing biological protein interactions in terms of protein states and state transitions. Mol. Cell. Proteomics 1, 104–116
- Sirava, M., Schafer, T., Eiglsperger, M., Kaufmann, M., Kohlbacher, O., Bornberg-Bauer, E., and Lenhof, H. P. (2002) BioMiner: Modeling, analyzing, and visualizing biochemical pathways and networks. Bioinformatics 18, S219–S230
- Karp, P. D. (2001) Pathway databases: A case study in computational symbolic theories. Science 293, 2040–2044
- Salamonsen, W., Mok, K. Y. C., Kolatkar, P., and Subbiah, S. (1999) BioJAKE: A tool for the creation, visualization and manipulation of metabolic pathways. Proceedings of the Pacific Symposium on Biocomputing 1999, 392–400
- Karp, P. D. (1998) Metabolic databases. Trends Biochem. Sci. 23, 114–116
- Bader, G. D., Donaldson, I., Wolting, C., Ouellette, B. F. F., Pawson, T., and Hogue, C. W. V. (2001) BIND: The Biomolecular Interaction Network Database. Nucleic Acids Res. 29, 242–245
- Demir, E., Babur, O., Dogrusoz, U., Gursoy, A., Nisanci, G., Cetin-Atalay, R., and Ozturk, M. (2002) PATIKA: An integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics 18, 996–1003
- Ruths, D. A., Chen, E. S., and Ellis, L. (2000) Arbor 3D: An interactive environment for examining phylogenetic and taxonomic trees in multiple dimensions. Bioinformatics 16, 1003–1009
- Bohannon, J. (2002) The human genome in 3D, at your fingertips. Science 298, 737
- Ashburner, M., Ball, C. A., Blake, J. A., Butler, H., Cherry, J. M., Corradi, J., Dolinski, K., Eppig, J. T., Harris, M., Hill, D. P., Lewis, S., Marshall, B., Mungall, C., Reiser, L., Rhee, S., Richardson, J. E., Richter, J., Ringwald, M., Rubin, G. M., Sherlock, G., and Yoon, J. (2001) Creating the gene ontology resource: Design and implementation. Genome Res. 11, 1425–1433
- Hubbard, M. J. (2002) Functional proteomics: The goalposts are moving. Proteomics 2002, 1069–1978
- Navarro, J. D., Niranjan, V., Peri, S., Jonnalagadda, C. K., and Pandey, A. (2003) From biological databases to platforms for biomedical discovery. Trends Biotechnol. 21, 263–268
- Bales, D. (2002) Dynamic database access from client-side JavaScript. (www.oreillynet.com/pub/a/onjava/2002/01/23/javascript.html)
- Han, D. K., Eng, J., Zhou, H., and Aebersold, R. (2001) Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 19, 946–951
- Eng, J., McCormack, A. L., and Yates, J. R. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass. Spectrom. 5, 976–989
- Fadok, V. A., Bratton, D. L., and Henson, P. M. (2001) Phagocyte receptors for apoptotic cells: recognition, uptake, and consequences. J. Clin. Invest. 7, 957–962
- Walther, A., Riehemann, K., and Gerke, V. (2000) A novel ligand of the formyl peptide receptor: Annexin I regulates neutrophil extravasation by interacting with the FPR. Mol. Cell 5, 831–840
- Perretti, M., Chiang, N., La, M., Fierro, I. M., Marullo, S., Getting, S. J., Solito, E., and Serhan, C. N. (2002) Endogenous lipid- and peptide-derived anti-inflammatory pathways generated with glucocorticoid and aspirin treatment activate the lipoxin A(4) receptor. Nat. Med. 8, 1296–1302
- Arur, S., Uche, U. E., Rezaul, K., Fong, M., Scranton, V., Cowan, A. E., Mohler, W., and Han, D. K. (2003) Annexin I is an endogenous ligand that mediates apoptotic cell engulfment. Develop. Cell 4, 587–598
- Aebersold, R., and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422, 198–207
- Bae, V. L., Jackson-Cook, C. K., Maygarden, S. J., Plymate, S. R., Chen, J., and Ware, J. L. (1998) Metastatic sublines of an SV40 large T antigen immortalized human prostate epithelial cell line. The Prostate 34, 275–282
- Trapman, J. (2002) Molecular genetics of prostate cancer. (www.eur.nl/fgg/pathol/research/trapman/pc genetics.htm)
- Steiner, T., Junker, K., Burkhardt, F., Braunsdorf, A., Janitzky, V., and Schubert, J. (2002) Gain in chromosome 8q correlates with early progression in hormonal treated prostate cancer. Eur. Urol. 41, 167–171
- Verhagen, P. C., Hermans, K. G., Brok, M. O., Van Weerden, W. M., Tilanus, M. G., de Weger, R. A., Boon, T. A., and Trapman, J. (2002) Deletion of chromosomal region 6q14–16 in prostate cancer. Int. J. Cancer 102, 142–147
- Latil, A., Morant, P., Fournier, G., Mangin, P., Berthon, P., and Cussenot, O. (2002) CHC1-L, a candidate gene for prostate carcinogenesis at 13q14.2, is frequently affected by loss of heterozygosity and underexpressed in human prostate cancer. Int. J. Cancer 99, 689–696
- Jordan, J. J., Hanlon, A. L., Al-Saleem, T. I., Greenberg, R. E., and Tricoli, J. V. (2001) Loss of the short arm of the Y chromosome in human prostate carcinoma. Cancer Genet. Cytogenet. 124, 122–126
- Kasahara, K., Taguchi, T., Yamasaki, I., Kamada, M., Yuri, K., and Shuin, T. (2002) Detection of genetic alterations in advanced prostate cancer by comparative genomic hybridization. Cancer Genet. Cytogenet. 137, 59–63
- Schulz, W. A., Elo, J. P., Florl, A. R., Pennanen, S., Santourlidis, S., Engers, R., Buchardt, M., Seifert, H. H., and Visakorpi, T. (2002) Genomewide DNA hypomethylation is associated with alterations on chromosome 8 in prostate carcinoma. Genes Chromosomes Cancer 35, 58–65
- Wolter, H., Trijic, D., Gottfried, H. W., and Mattfeldt, T. (2002) Chromosomal changes in incidental prostatic carcinomas detected by comparative genomic hybridization. Eur. Urol. 41, 328–334
- Tsuchiya, N., Slezak, J. M., Liber, M. M., Bergstralh, E. J., and Jenkins, R. B. (2002) Clinical significance of alterations of chromosome 8 detected by fluorescence in situ hybridization analysis in pathologic organ-confined prostate cancer. Genes Chromosomes Cancer 34, 363–371
- Brothman, A. R., Maxwell, T. M., Cui, J., Deubler, D. A., and Zhu, X. L. (1999) Chromosomal clues to the development of prostate tumors. Prostate 38, 303–312
- Bova, G. S., and Isaacs, W. B. (1996) Review of allelic loss and gain in prostate cancer. World J. Urol. 14, 338–346
- Nupponen, N., and Visakorpi, T. (1999) Molecular biology of progression of prostate cancer. Eur. Urol. 35, 351–354
- Stein, L. D. (2003) Integrating Biological Databases. Nature Rev. Gen. 4, 337–345
- Taussig, R., Ranganathan, R., Ross, E. M., and Gilman, A. G. (2002) Overview of the Alliance for Cellular Signaling. Nature 420, 703–706
- Li, J., Ning, Y., Hedley, W., Saunders, B., Chen, Y., Tindill, N., Hannay, T., and Subramaniam, S. (2002) The Molecule Pages database. Nature 420, 716–717
- Schaff, J., and Loew, L. M. (1999) The virtual cell, in Biocomputing: Proceedings of the 1999 Pacific Symposium (Altman, R. B., Dunker, A. K., Hunter, L., Klein, T. E., and Lauderdale, K., eds), pp. 228–239, World Sci, Singapore
- Aebersold, R. (2003) Constellations in a cellular universe. Nature 422, 115–116
- Dahlquist, K. D., Salomonis, N., Vranizan, K., Lawlor, S. C., and Conklin, B. R. (2002) GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat. Genet. 31, 19–20
- Khatri, P., Draghici, S., Ostermeier, G. C., and Krawetz, S. A. (2002) Profiling gene expression using Onto-Express. Genomics 79, 266–270
- Doniger, S. W., Salomonis, N., Dahlquist, K. D., Vranizan, K., Lawlor, S. C., and Conklin, B. R. (2002) MAPPFinder: Using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 4, R7
- Zeeberg, B. R., Feng, W., Wang, G., Wang, M. D., Fojo, A. T., Sunshine, M., Narasimhan, S., Kane, D. W., Reinhold, W. C., Labadidi, S., Bussey, K. J., Riss, J., Barrett, J. C., and Weinstein, J. N. (2003) GoMiner: A resource for biological interpretation of genomic and proteomic data. Genome Biol. 4, R28