Protein Identification

The Good, the Bad, and the Ugly

Ralph A. Bradshaw, Alma L. Burlingame, Steve Carr and Ruedi Aebersold

The identification of proteins and of the co-/post-translational modifications that characterize their mature (active) forms is central to proteomic experimentation. Historically this has been accomplished using a variety of methods, but mass spectrometry, and in particular tandem mass spectrometry (MS/MS), has become the core technology because of its efficiency, accuracy, and sensitivity. When coupled to various separation methodologies, it can produce hundreds of identifications from a single complex sample. In the currently dominant "bottom up" strategy, the proteins are digested with a highly specific enzyme such as trypsin, and selected peptides are subsequently sequenced in the tandem mass spectrometer. The spectra generated are then compared with a database to obtain matching entries that facilitate final sequence identification. Identification by peptide mass mapping is a less demanding approach and can be effective with relatively simple mixtures, particularly where the proteins are derived from a species whose genome is known.
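To make the idea of peptide mass mapping concrete, the following is a minimal sketch, not a description of any particular search engine. It assumes the common simplified trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages, and no modifications. The function names, the example sequence, and the 50 ppm tolerance are hypothetical choices made for illustration; the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Simplified rule (assumption): no missed cleavages are generated.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_masses(observed, peptides, tol_ppm=50.0):
    """Pair each observed neutral mass with theoretical peptides within tol_ppm."""
    hits = []
    for mass in observed:
        for peptide in peptides:
            theoretical = peptide_mass(peptide)
            if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((mass, peptide))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence; R is followed by P, so no cleavage there.
    peptides = tryptic_digest("AAKGGRPLLKC")
    print(peptides)
    print(match_masses([288.181], peptides))
```

In practice a protein is reported only when several of its predicted peptides match, and the tolerance is set by the mass accuracy of the instrument. That dependence on a complete list of predicted peptides is why the approach works best for simple mixtures from species with a known genome.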

The power of these MS-based approaches is already considerable, and they certainly have not been maximally developed or exploited. Indeed, it is not unreasonable to expect these efforts to grow dramatically in the coming years, providing important advances in our understanding of biological systems and their application to diagnosing, treating, and ultimately curing disease (the good). However, as with all developments in science, there are problems associated with these approaches, and they have already begun to manifest themselves. Clearly, the most serious issue is the large and seemingly increasing number of misidentifications (the bad). These arise for a variety of reasons, most related to human error, which can range from poorly prepared/treated samples, incorrect instrument standardization, and poor-quality data to overextending the capacity of the identification software and even to errors in the databases themselves. The fact that these misidentifications all too readily find their way into the scientific record, including the pages of this journal, is thus a concern (the ugly). Given the size of many typical experiments and the speed with which data can be collected, this situation cannot be treated as either a minor issue or a passing phenomenon.

About 2 years ago, the editors and associate editors of MCP began to seriously consider this problem and what could be done about it. It was their conclusion that there were several responses possible but that defining a set of criteria for the publication of such data was an important place to begin. Accordingly, an ad hoc committee composed of several MCP associate editors and interested colleagues, chaired by Steve Carr, produced a working set of guidelines, along with an explanatory document, as a point of departure that was published in the journal in June 2004 (MCP 3, 531–533). The general public response to the Carr guidelines was that they were an important beginning. The next step was a refinement of the Carr guidelines into a more "universal" document, and then, hopefully, their broad adoption by all journals interested in the publication of such data.

Achieving this conversion clearly required assembling a group of individuals who were representative of the "stakeholders" in this problem, to wit, editors/referees/publishers, vendors of germane instruments, software developers/purveyors, organizations (e.g. Human Proteome Organisation and European Bioinformatics Institute), and most importantly scientists actively using and producing such data. Accordingly, a workshop was planned, under the partial auspices of the ASBMB, and Mike Baldwin, with assistance and input from Steve Carr, Al Burlingame, and Ruedi Aebersold, all of whom had been involved in the writing of the original guidelines, agreed to the role of chief organizer. Some 60 individuals were invited, and ultimately 30 participants from all over the world, and most importantly with a broad range of expertise, were able to attend. The meeting took place May 12–13 in Paris, France, and, thanks to a great deal of hard work by all concerned, produced the desired draft.

Although built on the framework of the Carr guidelines, the new document is considerably modified and extended. It deals with issues not covered in the original guidelines, such as quantitation, and should, when finally refined, provide a rigorous but usable set of instructions to authors that will be enormously helpful in standardizing the protein identification literature, minimizing incorrect identifications, and providing interested parties the documentation required for independent evaluation. The document, as a draft, was released to the public on July 14 and has already been widely circulated/posted. Comments, criticisms, and suggestions are welcome (they should be directed to rablab{at}uci.edu) until October 15, at which time the original Paris group will consider the entirety of the input and make appropriate changes and additions. The completed document is expected to be ready for adoption/use by journals and other interested parties no later than the first of the year. A copy of the draft document may be found following this article or on the MCP website (www.mcponline.org).

In the meantime, MCP will continue to use the Carr guidelines, and to implement them more effectively the journal has changed its electronic submission site to allow easier review of articles using these technologies. Among other qualifying questions, authors will be asked if they have read the Carr guidelines (now incorporated into the Instructions to Authors). Clearly, an appreciation of our expectations in terms of reporting protein identifications will help to avoid major misunderstandings or even rejections that could subsequently arise during the review process.

The discussions in Paris also underscored some remaining problems, and possible solutions to them, that presently lie outside the scope of the draft guidelines. One of these is the issue of depositing raw data at a site that can be linked to a journal but that is separate from it and can be accessed and interrogated by the public generally. As a means to determine the usefulness of such repositories, MCP, in collaboration with Karl Clauser and Steve Carr of the Broad Institute of the Massachusetts Institute of Technology and Harvard, has set up a modest test of such a site. Data that have been/will be deposited support publications that have appeared in (or been accepted by) MCP and will be hyperlinked to the journal. If this experiment proves successful, it could grow into a major operation with substantial and widespread support. It can be accessed at http://www.broad.mit.edu/ftp/pub/proteomics/BIMSRtableofcontents.htm, and comments regarding any aspect of it should be directed to clauser{at}broad.mit.edu.

The overall success of the Paris meeting cannot be judged until the editorial process is complete and the final document is in hand. However, based on the enthusiasm of the participants as we finished up the initial draft, prospects for bringing these standards into general use seem quite high, sufficiently so that we closed the workshop in the finest French tradition, with a glass of champagne!




