Max Planck Institute of Molecular Cell Biology and Genetics (MPI CBG)
Genomics, Technische Universität Dresden, c/o MPI CBG, Pfotenhauerstrasse 108, 01307 Dresden, Germany
ABSTRACT |
---|
The availability of the budding yeast genome (1) stimulated efforts toward global mapping of a comprehensive "circuit diagram" of physical and functional protein interactions through bioinformatics (10), genome-wide two-hybrid screening (11, 12), mRNA (13), and protein (14) arrays. Although even a preliminary sketch of a global gene interaction network is a remarkable achievement, these experiments revealed important limitations of the technologies. First, they did not document protein complexes but rather inferred their content from interaction data sets, which are, problematically, contaminated with false positives. In particular, two-hybrid screening only records interactions between pairs of genes and misses interactions stabilized by more than two partners. Thus it appears that there is no substitute for authentic biochemical characterization of protein complexes purified from the original host. Furthermore, native protein assemblages are formed in a complex milieu, which is regulated by folding, modification, limited proteolysis, transportation to specific cellular compartments, and assembly with non-proteinous co-factors such as RNAs, nucleotides, metal cations, etc.
Protein tagging presents a generic approach for the analysis of native protein complexes. The tagged protein is affinity-purified from a whole cell lysate, together with associated proteins, which are subsequently characterized by mass spectrometry (15). Because it is relatively straightforward to fuse affinity tags with target proteins in the budding yeast, the approach was successfully applied to the characterization of numerous assemblages of various molecular weight, localization, and biological function (reviewed in Ref. 16). Further developments focused on improving protein identification by multidimensional liquid chromatography-tandem mass spectrometry (17, 18) and on protein-tagging methodology (19). The tandem affinity purification (TAP)1 method (20, 21) utilizes two affinity tags spaced by a cleavage site of tobacco etch virus (TEV) proteinase. Compared with other epitope tags such as myc- or HA-, TAP gives better yields of affinity-purified proteins, along with lower background of nonspecifically associated proteins (22, 23).
Two ways to use epitope tagging, immunoaffinity chromatography, and mass spectrometry in the analysis of protein complexes are evident. In one strategy, a large number of baits are processed in parallel by an established high throughput protein purification and identification routine. The biological significance of identified interactions is evaluated later and only for a selection of baits that yielded the most interesting patterns of associated proteins.
Alternatively, a sequential approach offers the advantage of systematic verification of identified interactions. In a first round, a gene is tagged, and its interaction partners are sought. Interacting proteins identified in the first round are subsequently tagged, and the procedure is repeated (24). In contrast to the parallel analysis, this approach, previously termed SEAM for sequential rounds of epitope tagging, immunoaffinity chromatography, and mass spectrometry (25), is better suited for addressing specific biological problems than for charting protein-protein interactions on a proteome scale. Importantly, the function of identified proteins is independently evaluated in biological experiments, which effectively navigate further IPs. It is therefore conceivable that at some point subunit(s) associated with core subunits of complexes with different functionality would be identified thus linking the complexes into a network. In this paper we employed sequential tagging to identify interaction partners for a selection of genes, which may potentially be involved in chromatin remodeling, RNA processing, and regulation of transcription. By identifying interactors of 48 TAP-tagged yeast genes, we assess the analytical perspectives of the technology as a generic functional proteomics tool.
EXPERIMENTAL PROCEDURES |
---|
The extraction of yeast cells was performed as described by Logie and Peterson (27). We found that the procedure of Rigaut et al. (20), which employs a French press for breaking cells, produced poor results. Reproducibility was improved significantly by using the glass bead beater protocol described by Logie and Peterson (27). TAP purification was performed according to Rigaut et al. (20) with the following modifications: 10 ml of supernatant collected after a 43,000-rpm centrifugation were allowed to bind to 200 µl of IgG-Sepharose (Amersham Biosciences), equilibrated in Buffer E (27) for 2 h at 4°C using a disposable chromatography column (Bio-Rad). 2–3 column volumes (the equivalent of 4–6 liters of yeast culture at A600 = 2–3) were used per purification. The IgG-Sepharose column was washed with 35 ml of Buffer E without proteinase inhibitors, followed by 10 ml of the TEV cleavage buffer (20). Cleavage with TEV was performed using 10 µl (100 units) of rTEV (Invitrogen) in 1 ml of cleavage buffer for 2 h at 16°C. Calmodulin-Sepharose (Stratagene) purification was performed as described (20). Purified proteins were concentrated according to Wessel and Flugge (28).
Identification of Proteins by Mass Spectrometry
Proteins were separated by electrophoresis using gradient (6–18%) one-dimensional polyacrylamide gels and visualized by staining with Coomassie. Protein bands were excised and in-gel digested with trypsin (unmodified, sequencing grade; Roche Diagnostics) as described (29, 30). Proteins were identified by a combination of MALDI MS peptide mapping and nanoelectrospray tandem mass spectrometric sequencing as described (31). Briefly, 1-µl aliquots were withdrawn from the in-gel digests and analyzed on a REFLEX III mass spectrometer (Bruker Daltonics) using a thin-layer probe preparation method (32). If no conclusive identification was achieved, the gel pieces were extracted with 5% formic acid and acetonitrile. Unseparated mixtures of recovered tryptic peptides were sequenced by nanoelectrospray tandem mass spectrometry (nanoES MS/MS) as described (33) on a triple quadrupole mass spectrometer API III or on a QSTAR quadrupole time-of-flight mass spectrometer (both from MDS Sciex, Concord, Canada). Database searching was performed against a comprehensive non-redundant database using MASCOT software (34).
RESULTS AND DISCUSSION |
---|
Although a variety of epitope tags have been described (19), TAP tagging offers several important advantages. The TAP tag consists of two high affinity modules, a calmodulin binding peptide, and a double protein A epitope, which is separated by a cleavage site for TEV protease (20). Protein complex purification is achieved via two-step affinity chromatography, which is carried out under conditions that leave proteins intact. In addition to the high specificity of binding of each part of the tag, the tethered protein complex is cleaved off protein A-IgG-Sepharose beads by the highly specific TEV protease, leaving a bulk of nonspecifically associated proteins on the beads. Therefore the purification results in much less background compared with conventional IP methods.
Altogether, 53 genes were epitope-tagged using the TAP tag, and in 52 cases (98%) the tag was incorporated successfully, as assessed by Western blotting. In one case, homologous recombination was successful, but Western blotting detected no fusion protein. Possibly this was because of an incorrect prediction of the ORF or the absence of noticeable protein expression under cell culturing conditions. Unexpectedly, in no case did we observe lethality or an obviously disturbed phenotype caused by the tag. In a single case (C-terminal TAP tag fusion to Set1) we observed that the complex was not perturbed; however, the tag did interfere with its enzymatic activity (35). Fusion of the tag to the N terminus of Set1 led to retrieval of the same complex with no interference with enzymatic activity (data not shown).
In four of 52 cases the bait proteins were observed in Western blots; however, insufficient amounts of IPed proteins precluded their detection by mass spectrometry. Fusing the TAP tag to the N termini for two of these four proteins did not improve the yield, and scaling up the purification procedure did not improve results either. The codon bias index (CBI) (36) of each of these four proteins was lower than 0.1 suggesting they are of low abundance.
The successfully tagged genes encoded proteins with a variety of physical properties: molecular mass between 9 and 175 kDa, calculated pI from 4.5 to 10.0, and CBI from -0.064 up to 0.16. Assuming that CBI reasonably represents the relative level of protein expression (37), we concluded that TAP tagging was successful for proteins expressed at low levels. Among the 52 successfully tagged genes, 11 were essential according to the YPD database (38). Altogether, 48 tagged genes (91%) were recovered by immunoaffinity chromatography in amounts sufficient for their reliable detection by mass spectrometry.
Confident Identification of Proteins by Mass Spectrometry
Deciphering protein complexes is a challenging task for mass spectrometry. First, protein complexes comprise subunits of various molecular weight, pI, and hydrophobicity, and therefore the number of peptides recovered from their in-gel digests markedly varies. Second, immunoaffinity isolation is performed typically under conditions that preserve relatively weak protein-protein interactions, and therefore co-isolation of nonspecifically associated proteins commonly occurs. Third, immunoaffinity purifications are usually difficult to scale up. If the yield of proteins of interest is low, their isolation from a larger volume of the cell culture typically results in loss of binding specificity and increased background. Although the TAP method reduces protein background considerably (see below), co-migration of two or more proteins within a single band occurs frequently, especially in the low molecular weight region. Therefore mass spectrometry is required to identify the protein confidently even if only a single peptide is recovered from its digest. Typical problems in confident protein identification are discussed here using an example of the analysis of an approximately 20-kDa Coomassie-stained protein band observed in the immunoprecipitate of tagged Set1 (35). Despite the low molecular mass of the protein, 25 prominent peptide peaks were detected in a MALDI MS spectrum of its digest (Fig. 1A), and their masses were used for database searching. The search hit the 15-kDa 40 S ribosomal protein S24 (accession number P26782) with the score 115 (statistical significance threshold score was 52) by matching eight peptide ions with better than 100 ppm mass accuracy and better than 50% sequence coverage. However, the most intense peaks remained unaccounted for, and database searching with masses of unmatched ions did not result in any more hits, thus indicating that other yet unidentified protein(s) might be present in the sample.
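The mass-matching step behind a peptide map search can be illustrated with a minimal sketch. It assumes a simple nearest-mass comparison with a 100 ppm tolerance; the peak lists are hypothetical values chosen for illustration, and the code does not reproduce the probabilistic scoring used by MASCOT.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_peaks(observed, theoretical, tol_ppm=100.0):
    """Return (observed, theoretical) pairs that agree within tol_ppm."""
    matches = []
    for obs in observed:
        # Pick the closest theoretical mass, then test it against the tolerance.
        best = min(theoretical, key=lambda theo: abs(obs - theo))
        if abs(ppm_error(obs, best)) <= tol_ppm:
            matches.append((obs, best))
    return matches


# Hypothetical singly charged peptide masses, for illustration only.
theoretical_peaks = [842.510, 1045.564, 1479.795, 2211.104]
observed_peaks = [842.55, 1045.60, 1500.00, 2211.40]

matched = match_peaks(observed_peaks, theoretical_peaks)
matched_observed = {obs for obs, _ in matched}
# Peaks left unexplained by the candidate protein hint at additional proteins.
unmatched = [p for p in observed_peaks if p not in matched_observed]
```

Peaks that remain in `unmatched` after the top hit has been accounted for are the ones that would be searched again or sequenced by tandem mass spectrometry.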
The digest was further analyzed by nanoES MS/MS, and another six ribosomal proteins, each of which matched a single unique sequenced peptide, were identified (Fig. 1B). Some of these proteins were among low confidence hits from MALDI MS analysis; however, none of the sequenced peptides identified S24 protein, the top hit. A single peptide sequence deduced from the spectrum acquired from a doubly charged ion with m/z 382.0 matched the 15-kDa protein YBR258c (Fig. 1C). However, this hit could not be judged confident because the retrieved sequence was short and degenerate. It is also known that large multiply charged peptides often undergo partial orifice fragmentation yielding abundant singly and doubly charged y-ions, and therefore database searching should not rely upon the cleavage specificity of trypsin. In fact, the peptide sequence (Leu/Ile)(Leu/Ile)Glu(Met(ox)/Phe)(Leu/Ile)Lys hits more than 200 proteins in a comprehensive database, including six proteins from the budding yeast. We retrospectively examined the MALDI MS map and found that the masses of another four peptides matched the sequence of YBR258c, and none of them matched other yeast protein candidates. We therefore concluded that although neither MALDI MS nor nanoES MS vouched for unambiguous identification, a combination of the two techniques produced a confident hit. Subsequent tagging of YBR258c confirmed that it is a bona fide subunit of the protein complex (35).
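Why such a short sequence tag is degenerate can be shown with a small sketch. It assumes standard monoisotopic residue masses; the lowercase `m` code for methionine sulfoxide and the three protein sequences are hypothetical and serve only as an illustration.

```python
import re

# Monoisotopic residue masses (Da); "m" stands for methionine sulfoxide, M(ox).
RESIDUE_MASS = {
    "L": 113.08406, "I": 113.08406, "E": 129.04259,
    "F": 147.06841, "m": 147.03540, "K": 128.09496,
}
WATER = 18.01056
PROTON = 1.00728


def mz(peptide: str, charge: int) -> float:
    """m/z of a peptide ion carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


# Leu/Ile are isobaric and M(ox)/Phe differ by about 0.03 Da, so every
# ambiguous position becomes a character class. No tryptic constraint is
# placed on the residue preceding the tag.
TAG = re.compile(r"[LI][LI]E[MF][LI]K")

# Hypothetical protein sequences, for illustration only.
database = {
    "protA": "MSTLIEFLKDAAR",
    "protB": "MKKILEMIKTTPG",
    "protC": "MGHLLDFLKSSV",
}
hits = sorted(name for name, seq in database.items() if TAG.search(seq))

phe_variant = mz("LLEFLK", 2)   # about 381.74
mox_variant = mz("LLEmLK", 2)   # about 381.73
```

The two doubly charged variants differ by less than 0.02 m/z units, which is one reason both readings have to be kept open when the tag is searched against a database.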
Protein Background in the TAP Method
Conventional IP experiments often result in complex patterns of co-isolated proteins (44). The immunoprecipitated proteins are usually separated on a one-dimensional polyacrylamide gel, the pattern of proteins observed in the experiment lane is compared with the one in the control lane, and only proteins detected selectively in the experiment are subjected to further identification by mass spectrometry. However, this approach is slow and prone to errors. We therefore excised and analyzed by mass spectrometry all bands detected in the experiment lane, effectively using no control. We identified a subset of proteins that were detected repeatedly using the TAP tag and hence are common background contaminants (Table I). Although these proteins vary in function, molecular weight, and pI, they are all very abundant. In addition to these common proteins, we detected a few contaminants that were observed only occasionally (Table I); these most likely result from small variations in the reagents used, in particular different batches of calmodulin beads, or from phenotypic alterations caused by the tag.
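Flagging proteins that recur across unrelated purifications can be sketched as a simple frequency count. The bait and prey names below are hypothetical placeholders, and the 75% recurrence threshold is an assumption made for illustration, not a criterion stated in the text.

```python
from collections import Counter


def recurrent_proteins(purifications, min_fraction=0.75):
    """Proteins seen in at least `min_fraction` of all purifications."""
    # Count each protein once per purification, however many bands it gave.
    counts = Counter(
        protein for preys in purifications.values() for protein in set(preys)
    )
    total = len(purifications)
    return sorted(p for p, n in counts.items() if n / total >= min_fraction)


# Hypothetical bait -> identified proteins map, for illustration only.
purifications = {
    "baitA": ["SSA", "SSB", "preyX"],
    "baitB": ["SSA", "preyY", "preyZ"],
    "baitC": ["SSA", "SSB"],
    "baitD": ["preyX", "SSB", "SSA"],
}

background = recurrent_proteins(purifications)
```

A protein that appears with most baits regardless of their function is a candidate background contaminant, whereas one seen with only a few related baits is more likely a specific interactor.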
The dissection of the subset of the proteome anchored at Set3 protein is discussed here as an illustrative example. Set3 possesses SET and PHD finger domains, which are hallmarks of proteins involved in chromatin regulation and epigenetics (45). Initially Set3 was TAP-tagged and immunoaffinity-isolated, and subsequent mass spectrometric analysis identified eight interacting proteins (Fig. 2). Kap95 and Kap60 belong to the family of importins; Hos2 and Hst1 are putative histone deacetylases; Sif2 and YIL112c contain multiple repeats of generic protein-protein interaction motifs, WD40 and ankyrin, respectively; YCR033w (Snt1) protein contains the putative DNA-binding SANT domain; and Cph1 (cyclophilin A) is a prolyl isomerase. The variability of plausible cellular functions of Set3 interactors prompted questions about the unity of the isolated complex. How many distinct protein assemblies were involved, and were artifacts included?
The tagging of Set3C members indicated that three subunits were also engaged in specific interactions with other protein complexes. The interaction with importins Kap60 and Kap95 was only observed when Set3 was tagged. Similarly, tagged Hos2 pulled down seven of eight known subunits of the chaperonin complex, TRiC (which, unlike heat shock chaperones SSA and SSB, does not belong to common background proteins in TAP purifications (Table I)). No interaction with importins and/or TRiC was detected when any other Set3C member was tagged.
Tagging Hst1 revealed that it is also engaged in another functionally distinct protein complex with YOR279c and Sum1. To validate its integrity, a third round of tagging and purification was performed on both Sum1 and YOR279c, thereby confirming the interactions among Hst1, YOR279c, and Sum1. Noticeably, no members of Set3C (other than Hst1) were detected in Sum1 and YOR279c precipitates. Thus by starting at a single entry gene, set3, sequential rounds of epitope tagging, immunoprecipitation, and mass spectrometry identified two novel functionally distinct protein complexes with plausible histone deacetylase activity, linked via a shared subunit, Hst1, and also indicated linkage of two Set3C subunits to other known complexes.
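The logic of sequential tagging can be sketched as a check for reciprocal pull-downs. The prey lists below are simplified from the interactions named in the text; the Hst1 entry in particular is an assumption (only the partners relevant to the argument are listed), and the pull-downs of the other tagged Set3C members are omitted.

```python
from itertools import combinations

# bait -> proteins co-purified with it (simplified, partly assumed).
pulldowns = {
    "Set3": {"Hos2", "Hst1", "Sif2", "YIL112c", "Snt1", "Cph1", "Kap95", "Kap60"},
    "Hst1": {"Set3", "Sum1", "YOR279c"},  # assumption: abbreviated prey list
    "Sum1": {"Hst1", "YOR279c"},
    "YOR279c": {"Hst1", "Sum1"},
}


def reciprocal_pairs(pulldowns):
    """Pairs of tagged proteins that each retrieve the other."""
    pairs = set()
    for a, b in combinations(sorted(pulldowns), 2):
        if b in pulldowns[a] and a in pulldowns[b]:
            pairs.add((a, b))
    return pairs


def unconfirmed_preys(pulldowns, bait):
    """Preys of `bait` not yet shown to retrieve the bait when tagged."""
    return sorted(
        prey for prey in pulldowns[bait]
        if bait not in pulldowns.get(prey, set())
    )


confirmed = reciprocal_pairs(pulldowns)
next_round = unconfirmed_preys(pulldowns, "Set3")
```

In this toy data set Hst1 is the only protein with reciprocal links to both Set3 and the Sum1/YOR279c pair, which mirrors its role as a shared subunit. The entries in `next_round` are the candidates a further round of tagging would address.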
Analysis of these and other protein complexes allowed us to draw a few conclusions about strategies for the characterization of a protein "interactome," i.e., of a network of interacting protein complexes (46). Apparently IPs can pool together members of different protein complexes, and therefore the "guilt by association" concept of defining which proteins belong to the complex is inherently error-prone. This may become a severe limitation for high throughput "parallel" analysis, in which bona fide interaction partners are established neither by functional experiments nor by sequential tagging and purification of other candidate subunits of the complex.
It is also important to distinguish proteins that represent a core of the complex and that are essential for its integrity from proteins whose interaction with the complex is transient. Therefore even approximate estimation of stoichiometry of protein interactions adds vitally important pieces of information. Mass spectrometry is well suited to determine relative changes in the concentration of the same protein obtained under different experimental conditions (reviewed in Ref. 47). However, it is not straightforward to compare the concentration of different proteins present in the mixture. Amino acid composition of detected peptides strongly affects the signal intensity observed in a mass spectrum (48), and the pattern of peptide maps and the recovery of individual peptides depend on protein visualization and sample-processing protocols (49, 50). Gel electrophoresis and visualization of bands by Coomassie staining is less dependent on protein properties and is widely applied in expressional proteomic studies (51). Hst1, importins Kap95 and Kap60, and the TRiC members were detected in apparently substoichiometric amounts compared with other subunits of Set3C, and thus their transient association with the core of the complex can be inferred. In the case of Hst1, this was confirmed by IP of intact Set3C from an hst1Δ strain (45). Similarly, semiquantitative information, taken together with IP patterns of other tagged Set3C subunits, assisted in charting the boundaries between individual protein complexes pulled down by IP of TAP-tagged Set3C members.
Identified Protein Complexes and Segments of a Protein Interaction Network
In 48 successful IPs, interaction partners were determined for 38 baits (71%), and in 10 IPs only the bait protein was detected (Fig. 3). Noticeably, these 10 idle baits were of average molecular weight and pI. Three of them were abundant proteins (CBI > 0.2), and one gene is essential. The 38 successful baits pulled down a total of 220 interaction partners, which are members of 19 functionally distinct protein complexes with an average of 5.8 interactors per protein, a value that agrees with bioinformatic estimates (10, 52). The complexes comprised from three to 16 subunits and varied in function and cellular localization; however, none of the identified preys was a membrane protein. We underscore here that no protein complexes were defined solely on the basis of a single IP experiment.
We found our data in noticeable disagreement with the complexity of protein interaction networks suggested by alternative genome mining approaches, such as two-hybrid screening and bioinformatics (10–12). As similar discrepancies were observed previously in the analysis of affinity-purified protein complexes (25, 46), we believe the experiments point to some fundamental limitations, which should be further understood and accounted for in the elucidation of the molecular organization of the proteome.
Interaction Partners Identified by TAP and by Two-hybrid Screening
As mentioned above, a total of 48 proteins were successfully tagged, and interaction partners were identified in IPs of 38 baits. We further compared the TAP-identified interactors with the ones suggested for the same baits by genome-wide two-hybrid screening (11, 12). Of the 48 baits, 2HY screening defined interaction partners for 35, with 165 interactors in total. Comparison to the set of 220 interactors identified by TAP and mass spectrometry revealed that only 23 proteins (14%) between these two sets overlapped (Fig. 4A).
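The overlap figure can be reproduced with a short calculation from the counts given above (220 TAP interactors, 165 two-hybrid interactors, 23 shared). The function and key names are illustrative, and the Jaccard index is an additional summary that is not reported in the text.

```python
def overlap_summary(n_tap: int, n_2hy: int, n_shared: int) -> dict:
    """Express a shared-interactor count relative to each data set."""
    return {
        "shared_fraction_of_2hy": n_shared / n_2hy,
        "shared_fraction_of_tap": n_shared / n_tap,
        # Jaccard index: shared interactors over the union of both sets.
        "jaccard": n_shared / (n_tap + n_2hy - n_shared),
    }


summary = overlap_summary(n_tap=220, n_2hy=165, n_shared=23)
percent_of_2hy = round(summary["shared_fraction_of_2hy"] * 100)
percent_of_tap = round(summary["shared_fraction_of_tap"] * 100)
```

With these counts the quoted 14% corresponds to 23 of the 165 two-hybrid interactors. Measured against the 220 TAP interactors, the shared fraction is about 10%.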
CONCLUSION AND PERSPECTIVES |
---|
At present characterization of protein complexes by mass spectrometry is mostly limited to qualitative description of their composition and interactions. However, it is becoming apparent that methods should be developed to describe quantitatively the stoichiometry of protein-protein interactions.
These problems are challenging, but recent developments both in mass spectrometry and in gene manipulation technology suggest these goals are within reach. Deciphering of protein complexes will produce unique information for the understanding of functional organization of genomes of higher eukaryotes, including the human genome.
ACKNOWLEDGMENTS |
---|
FOOTNOTES |
---|
Published, MCP Papers in Press, February 1, 2002, DOI 10.1074/mcp.M200005-MCP200
1 The abbreviations used are: TAP, tandem affinity purification; CBI, codon bias index; IP, immunoprecipitation; MALDI MS, matrix-assisted laser desorption/ionization mass spectrometry; M(ox), methionine sulfoxide; nanoES MS, nanoelectrospray mass spectrometry; ORF, open reading frame; TEV, tobacco etch virus; 2HY, two-hybrid screening.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ To whom correspondence may be addressed. E-mail: stewart{at}mpi-cbg.de.
|| To whom correspondence may be addressed. Tel.: 49-351-210-2615; Fax: 49-351-210-2000; E-mail: shevchenko{at}mpi-cbg.de.
REFERENCES |
---|