From Genomics, Technische Universität Dresden, c/o Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany; and ¶ Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
ABSTRACT
Documented protein-protein interactions and the composition of native protein complexes are very valuable resources, although biological interpretation of this knowledge is not straightforward. So far the concordance of the results obtained by two independent (although similar) protein tagging approaches has been rather poor (1, 2, 4). Furthermore, these data are also in poor concordance with a dataset obtained by two-hybrid screening (5) or inferred via various bioinformatic approaches (6, 7). Although the availability of complementary data is always a positive factor, it seems rather unlikely that observed discrepancies and excessive complexity of protein assemblies could be attributed solely to errors in analytical methods.
Bruce Alberts pictured the cell as a "collection of protein machines" (8). Comparison of the composition and linkages of similar machines in phylogenetically diverged organisms could provide insight into their molecular architecture, regulation, and involvement in various intracellular processes. The fission yeast Schizosaccharomyces pombe is an appropriate organism to validate and extend our understanding of the functional organization of the proteome of Saccharomyces cerevisiae. The tandem affinity purification (TAP)1 procedure (9–11), which has been successfully employed in purifying protein complexes from the budding yeast, also works in the fission yeast (12, 13). In many (although not in all) cases, bioinformatics can be used either to identify orthologous proteins in the two organisms by homology searches and close inspection of aligned full-length sequences, or to limit the selection to a small number of plausible candidates whose sequences share a reasonable percentage of identity and/or display similar functional domains. At the same time, the two organisms are quite distant phylogenetically and have remarkably different physiology (14, 15). The completely sequenced genome of S. pombe (16) provides a valuable resource for facile mass spectrometric identification of isolated proteins.
A combination of the TAP method and "shotgun" mass spectrometric sequencing (17) was applied for comparative characterization of orthologous complexes involved in splicing (18) and cell cycle regulation (19). A remarkable conservation of the composition of orthologous complexes was reported, and a few novel interactors were discovered. However, the shotgun approach lacked the quantification needed to determine whether novel proteins were present in stoichiometric or substoichiometric amounts compared with the conserved core subunits. Furthermore, it is not uncommon that proteins are shared between individual protein complexes (4, 11) (we termed such proteins "proteomic hyperlinks"). If a hyperlink protein was inadvertently used as a bait, a mixture of subunits from two or more protein complexes might be isolated. Therefore, phylogenetic interpretation of the differences in identified proteomic environments could be ambiguous. It would be difficult to distinguish whether the core of the complex was altered by adding or removing a subunit, a new interactor was identified, or a novel association represents a yet unknown hyperlink to another individual complex.
In recent years, we and others successfully applied sequential epitope tagging, immunoaffinity chromatography, and mass spectrometry (SEAM) to characterize protein complexes and segments of protein interaction networks in the budding yeast (4, 11, 20). Although laborious, the approach enabled us to make a clear distinction between individual complexes and to identify relevant proteomic hyperlinks. Using the TAP method, we previously isolated and characterized the Sc_Set1C complex (named after the set1 gene, whose sequence possesses a characteristic SET domain (21)) that methylates lysine 4 in histone H3 and is implicated in epigenetic regulation (22, 23).
Sc_Set1C is comprised of eight subunits, seven of which pulled down the same eight proteins with similar relative stoichiometry upon TAP tagging and immunoaffinity isolation (22). However, the eighth protein, Swd2, was the notable exception. The pool of proteins co-isolated with Swd2 included the members of Sc_Set1C and nine members of another yeast complex termed CPF for cleavage and polyadenylation factor.
Subsequently, two other groups independently confirmed that Swd2 is a bona fide member of the budding yeast CPF, although its association with Sc_Set1C was not reported (1,24). Taken together, these data suggest that Swd2 protein is a subunit of two independent complexes, Sc_Set1C and Sc_CPF, and hyperlinks the histone methylation and polyadenylation machinery in the budding yeast. Although both complexes act at the site of active transcription, the significance of this hyperlink remains elusive.
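The notion of a proteomic hyperlink described above can be illustrated with a minimal sketch. It is not part of the original analysis. It assumes that complex membership is available as a mapping from a complex name to a set of subunit names; the function name `find_hyperlinks` is hypothetical, and the membership lists are deliberately partial, containing only subunits named in the text.

```python
from collections import defaultdict


def find_hyperlinks(complexes):
    """Return {protein: sorted complex names} for proteins found in two or more complexes."""
    membership = defaultdict(set)
    for complex_name, subunits in complexes.items():
        for protein in subunits:
            membership[protein].add(complex_name)
    # A "hyperlink" in the sense used here is a protein shared between complexes.
    return {p: sorted(names) for p, names in membership.items() if len(names) > 1}


# Partial, illustrative membership (assumption: not the full subunit lists).
complexes = {
    "Sc_Set1C": {"Set1", "Swd1", "Swd2", "Swd3", "Bre2", "Shg1"},
    "Sc_CPF": {"Ysh1", "Swd2"},
}

print(find_hyperlinks(complexes))  # {'Swd2': ['Sc_CPF', 'Sc_Set1C']}
```

With such a representation, a bait that appears in the returned mapping would be expected to co-purify subunits of more than one complex, which is the ambiguity that sequential tagging of several subunits is meant to resolve.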
Here we report the application of a comparative proteomic analysis of the proteomic environments of the orthologous Set1 methyltransferases in the budding and fission yeasts.
EXPERIMENTAL PROCEDURES
The procedure for the purification of protein complexes in S. pombe was essentially the same as employed previously in S. cerevisiae. The breaking and extraction of yeast cells was performed as described by Logie and Peterson (26). TAP purification was performed according to Rigaut et al. (9), with the following modifications: 10 ml of supernatant collected after 43,000 rpm centrifugation were allowed to bind to 200 µl of IgG Sepharose (Amersham Pharmacia Biotech, Piscataway, NJ), equilibrated in buffer E (27), for 2 h at 4 °C using a disposable chromatography column (Bio-Rad, Hercules, CA). Two to three column volumes (the equivalent of 4–6 liters of yeast culture at an optical density at 600 nm between 2 and 3) were used per purification. The IgG Sepharose column was washed with 35 ml of buffer E without proteinase inhibitors, followed by 10 ml of the tobacco etch virus (TEV) cleavage buffer. Cleavage with TEV was performed using 10 µl (100 U) of rTEV (Life Technologies, Inc., Grand Island, NY) in 1 ml of cleavage buffer for 2 h at 16 °C. Calmodulin Sepharose (Stratagene, La Jolla, CA) purification was performed as described (11). Purified proteins were concentrated according to Wessel and Flügge (27).
Analysis of H3-K4 Methylation in Sp_Swd2.1 and Sp_Swd2.2 Deletion Mutants
Crude cell extracts from exponentially growing cells were prepared by glass bead lysis. Proteins were separated on a 15% SDS polyacrylamide gel and blotted onto a nitrocellulose membrane following the manufacturer's instructions. For detecting lysine 4 (K4) methylated histone H3, an antibody recognizing di- and trimethylated forms of K4 was used (Abcam, Cambridge, UK). The secondary antibody was an anti-rabbit IgG-HRP conjugate (Amersham Biosciences, Piscataway, NJ). The signals were visualized using the enhanced chemiluminescence system (Amersham Biosciences).
Identification of Proteins by Mass Spectrometry
Proteins were separated by electrophoresis using gradient (6–18%) one-dimensional polyacrylamide gels and visualized by staining with Coomassie. Protein bands were excised and in-gel digested with trypsin (Roche Diagnostics, Indianapolis, IN) as described previously (28). Proteins were identified by a combination of matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS) peptide mapping and nanoelectrospray tandem mass spectrometry (nanoES MS/MS) sequencing as described (29). Briefly, 1-µl aliquots were withdrawn from the in-gel digests and analyzed on a REFLEX IV mass spectrometer (Bruker Daltonics, Billerica, MA) on AnchorChip targets (Bruker Daltonics) as described (30). If no conclusive identification was achieved, gel pieces were extracted with 5% formic acid and acetonitrile. Unseparated mixtures of recovered tryptic peptides were sequenced by nanoES MS/MS on a QSTAR Pulsar i quadrupole time-of-flight mass spectrometer (MDS Sciex, Concord, Canada). Database searching with MALDI time-of-flight peptide mass maps and with uninterpreted tandem mass spectra was performed against a database of S. pombe proteins using Mascot software (Matrix Science Ltd., London, UK) installed on a local server. Hits with a MOWSE score exceeding 51 (the threshold score suggested by Mascot) were considered significant but were accepted only upon manual inspection. Borderline hits were additionally verified by nanoES MS/MS.
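The acceptance rule for database hits can be sketched as a simple triage step. This is an illustration under stated assumptions, not the procedure as implemented by the authors. The threshold of 51 is taken from the text. The `borderline_margin` parameter, the `Hit` class, the `triage` function, and the protein names and scores are all hypothetical, because the text does not define how close to the threshold a "borderline" hit lies.

```python
from dataclasses import dataclass

SIGNIFICANCE_THRESHOLD = 51  # MOWSE score threshold quoted in the text


@dataclass
class Hit:
    protein: str
    score: float


def triage(hit, threshold=SIGNIFICANCE_THRESHOLD, borderline_margin=10):
    """Classify a hit; borderline_margin is an assumed, illustrative value."""
    if hit.score <= threshold:
        return "not significant"
    if hit.score <= threshold + borderline_margin:
        # Assumption: scores just above the threshold count as borderline.
        return "borderline: verify by nanoES MS/MS"
    return "significant: accept only after manual inspection"


# Placeholder hits with made-up scores, for illustration only.
for hit in (Hit("protein_A", 40), Hit("protein_B", 55), Hit("protein_C", 120)):
    print(hit.protein, "->", triage(hit))
```

Note that no class leads to automatic acceptance: in line with the text, even a high-scoring hit is only a candidate until it has been inspected manually.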
The list of proteins identified in each pulldown experiment with relevant identification details is presented in the supplemental material.
RESULTS AND DISCUSSION
In the budding yeast, TAP of protein complexes was variably accompanied by a co-isolation of a common set of highly abundant background proteins, including housekeeping proteins, metabolic enzymes, and ribosomal proteins (4). We found that persistent background proteins in S. pombe were different from S. cerevisiae, although, once again, highly abundant proteins, including housekeeping proteins and components of the protein synthesis machinery, were mostly observed (Table I).
Orthologous Complexes Sc_Set1C and Sp_Set1C
The genome of S. pombe encodes the orthologous protein Sp_Set1, which shares 29% of the full-length sequence identity with its S. cerevisiae homologue. We further characterized the organization of the Set1-anchored proteomic environment in S. pombe (Fig. 2, Table II, and supplemental material). Similar to Sc_Set1C, Sp_Set1C also comprises eight subunits that (with the exception of Sp_Shg1) were the closest homologues of the corresponding members of Sc_Set1C. Sp_Set1C includes the trxG (trithorax group) homologue Sp_Ash2. Unlike its S. cerevisiae homologue Sc_Bre2, Sp_Ash2 possesses a PHD-finger domain, which is a hallmark motif of proteins implicated in chromatin regulation (31). Also, Sc_Shg1 is only distantly similar to Sp_Shg1 (10). Both proteins share a short region of similarity at their N termini and have similar elements of the predicted secondary structure (data not shown). Sequence similarity, although weak, was recognizable by conventional and back-BLAST searches using relaxed substitution matrices (such as PAM30 and PAM70) as well as by Smith-Waterman searches. Having characterized Sp_Set1C, we established the highly conserved nature of this complex in S. cerevisiae and S. pombe, which could not be deduced by simple extrapolation from the known composition of Sc_Set1C. In particular, the biochemically identified Sp_Set1C subunits Shg1, Ash2, Swd2.1, and Swd3 could not be confidently predicted as Sp_Set1C members by reliance on bioinformatics.
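Percentages of full-length sequence identity such as those quoted above are computed from a pairwise alignment. The sketch below shows one common convention and is an assumption, since the text does not state which alignment method or denominator was used. Here identity is the number of identical residue pairs divided by the number of aligned columns, gapped columns included. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over all alignment columns (gapped columns count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap (possible in sub-alignments).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy aligned pair (not real Set1 sequences): 4 identical columns out of 7.
print(round(percent_identity("MKT-AYL", "MRTQA-L"), 1))  # 57.1
```

Other conventions divide by the length of the shorter sequence or by the number of ungapped columns, and each gives a different percentage for the same alignment, so values from different sources are comparable only if the convention is the same.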
A Hyperlink Between Set1C and CPF Complexes in S. cerevisiae and S. pombe
In S. cerevisiae, Swd2 was identified as a hyperlink between Sc_Set1C and Sc_CPF (Fig. 2). The genome of S. pombe encodes two relatively distant homologues of Sc_Swd2, namely SPBC18H10.06c (now termed Sp_Swd2.1) and SPAC824.04 (now termed Sp_Swd2.2), which share 34 and 32% sequence identity with Sc_Swd2, respectively, and 30% sequence identity with each other. By tagging Sp_Set1C members, we established that Sp_Swd2.1 is a member of Sp_Set1C (Fig. 2). However, tagging Sp_Swd2.1 (Fig. 1A) or subunits of Sp_CPF (see supplemental material) did not provide any evidence that it interacts with the Sp_CPF complex.
We therefore asked if the fission yeast CPF complex includes the other Swd2 homologue, Sp_Swd2.2, and if it might represent a yet undetected hyperlink between Sp_Set1C and Sp_CPF. To elucidate the proteomic environment of Sp_Swd2.2, the protein was tagged and used as a bait in the immunoaffinity purification. It pulled down the Sp_CPF complex, and, importantly, no members of Sp_Set1C complex were detected (Fig. 3). To confirm that Sp_Swd2.2 is a core subunit of the Sp_CPF complex, we further tagged its conserved member Sp_Ysh1. Sp_Ysh1 was chosen because it is a key component of Sc_CPF, and its sequence is remarkably conserved between S. cerevisiae and S. pombe with no other clear homologues shared between these two genomes. Using Sp_Ysh1-TAP as a bait, we pulled down the same subunits of CPF, including Sp_Swd2.2, but no Sp_Swd2.1 was observed in a detectable amount. Taken together, these data suggested Sp_Swd2.2, but not Sp_Swd2.1, is a genuine member of Sp_CPF (Fig. 3).
The Function of Swd2 Paralogues Is Completely Diverged in S. pombe
We further investigated whether Sp_Swd2.1 and Sp_Swd2.2 can substitute for each other in the Sp_Set1C and Sp_CPF complexes. To this end, we constructed a strain in which the Sp_Swd2.2 gene was deleted and Sp_Swd2.1 was tagged. The mutant strain grew more slowly than the wild-type strain and other strains with TAP-tagged proteins. Although much less protein material was purified, in a pulldown experiment with Sp_Swd2.1 we were able to identify all subunits of Sp_Set1C except Sp_Sdc1, which produces only one peptide upon digestion with trypsin and was undetectable in a heavy mixture with low molecular mass background proteins. At the same time, no subunits of Sp_CPF were detected (Fig. 4), suggesting that Sp_Swd2.1 cannot substitute for Sp_Swd2.2 in Sp_CPF.
We therefore concluded that in the fission yeast, duplicated swd2 genes are functionally specialized (33, 34), with Sp_Swd2.1 protein being a member of Sp_Set1C and Sp_Swd2.2 a member of Sp_CPF (Fig. 3).
A Protein Complex in S. cerevisiae that Is Orthologous to Sp_Lid2C
As was demonstrated above, Sp_Ash2 and Sp_Sdc1 hyperlink Sp_Set1C and Sp_Lid2C complexes (Fig. 2). However, no such link to a complex similar to Sp_Lid2C was observed in S. cerevisiae, although both S. pombe proteins share significant sequence identity with their closest homologues in S. cerevisiae (Table II). We therefore attempted to isolate a protein complex orthologous to Sp_Lid2C from the budding yeast and to determine if it is hyperlinked to Sc_Set1C.
The genome of S. cerevisiae encodes the proteins YJR119C and Snt2 (YGL131C), which share 23 and 22% of full-length sequence identity with Sp_Lid2 and Sp_Snt2, respectively, and display a very similar composition of functional domains. We tagged Sc_Snt2 and YJR119C and isolated their interaction partners, but no proteins homologous to members of Sp_Lid2C were identified (data not shown).
Taken together, the data suggested that no protein complex orthologous to Sp_Lid2C exists in S. cerevisiae, despite the presence of a few reasonable sequence homologues in its genome. We speculate that Sp_Lid2C is possibly involved in H3K9 methylation, which does not occur in S. cerevisiae but is found in S. pombe and in humans (35, 36).
Although genes encoding subunits of the Sc_Set1C complex (other than Swd2) are nonessential, and certain members of Sc_Set1C and Sp_Set1C display only marginal sequence similarity (Table II), the overall composition of these complexes is well conserved. However, the orthologous Sp_Set1C and Sc_Set1C complexes function in a differently "wired" proteomic network that comprises conserved (Sc_CPF and Sp_CPF) and nonconserved (Sp_Lid2C and Sc_Snt2C) protein assemblies.
Conserved Composition and Variable Hyperlinks of Orthologous Complexes
Although Set1 complexes in S. cerevisiae and S. pombe are highly conserved, overall their proteomic environment differs substantially. Similar observations were previously made for many orthologous protein complexes characterized in different species (reviewed in Ref. 37).
Our data underscore the value of characterizing the composition of protein complexes as completely as possible, especially when considering a phylogenetic perspective. Using two entry points, the Sc_Swd1 and Sc_Swd3 proteins, Gavin et al. (1) identified a complex (termed complex #108) with a composition very similar to that of Sc_Set1C. Complex #108 lacked two subunits (Swd2 and Shg1) and, consequently, a hyperlink to the Sc_CPF complex via Sc_Swd2. At the same time, complex #108 comprised three other proteins (yeast.cellzome.com), whose relation to Sc_Set1C was not independently confirmed. Gavin et al. also tagged seven out of 20 known subunits of Sc_CPF and detected Sc_Swd2 in all affinity purifications (1). However, Sc_Swd2 itself was not tagged, and its relation to Sc_Set1C was not established. Missed interactions or artificially merged individual protein complexes hampered further comparison of Set1 proteomic environments and reasonable projection of their organization and function to mammals.
The human genome contains two Set1-related genes, KIAA0339 and KIAA1076, sharing 35 and 37% identity to Sc_Set1 and 55 and 45% identity to Sp_Set1. KIAA0339 is engaged in a partially characterized complex comprising at least hAsh2 and WDR5 (a human homologue of Sc_Swd3) (38). This putative Hs_Set1C is also involved in H3K4 methylation (38). The entry point to Hs_Set1C, protein HCF-1, has no apparent homology to any of the core members of Sp_Set1C or Sc_Set1C but is associated with human Sin3 histone deacetylase (HDAC). We therefore speculate that the partial purification of Hs_Set1C was achieved via a hyperlink protein.
Taken together, our data and other published evidence strongly suggest that although orthologous protein complexes may be remarkably conserved, their proteomic environment and hyperlinks to other complexes are not. Furthermore, we propose that the conservation of the core and variability of links represents a common phenomenon in the molecular organization of eukaryotic proteomes.
CONCLUSION AND PERSPECTIVES
The comparative analysis of proteomic environments in a multiorganismal perspective offers an intriguing opportunity to extend and complement our understanding of how the evolution of genomes guides the evolution of protein machines. Comparative studies may reach far beyond simple cataloguing of observed differences. Rather, together with advanced bioinformatic approaches, correlations of concerted alterations in sequences of orthologous subunits could highlight functional specializations.
The multiorganismal approach in functional proteomics will likely require biochemical isolation of complexes and identification of cognate proteins beyond the boundaries of known genomes. Although lacking exact protein sequences in a database hampers the identification of proteins, a substantial coverage of yet unknown proteomes might be achieved by sequence-similarity searches (reviewed in Ref. 37). The bottleneck (and the likely focus of further efforts) is in the development of a generic approach for isolating protein complexes from cells and tissues of vertebrate organisms, which might be overcome by advanced genetic engineering methods in the future (39).
FOOTNOTES
Published, MCP Papers in Press, November 15, 2003, DOI 10.1074/mcp.M300081-MCP200
This paper is available on line at http://www.mcponline.org
1 The abbreviations used are: TAP, tandem affinity purification; MALDI MS, matrix-assisted laser desorption/ionization mass spectrometry; nanoES MS/MS, nanoelectrospray tandem mass spectrometry; SEAM, sequential rounds of epitope tagging, immunoaffinity chromatography, and mass spectrometry; TEV, tobacco etch virus; Sc_XXX and Sp_XXX, protein XXX from S. cerevisiae and S. pombe, respectively; Sc_XXXC and Sp_XXXC, protein complex XXX from S. cerevisiae and S. pombe, respectively; CPF, cleavage and polyadenylation factor.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org/) contains supplemental material.
A. R. and A. S. contributed equally to this work.
|| To whom correspondence should be addressed. E-mail: shevchenko{at}mpi-cbg.de and stewart{at}mpi-cbg.de
REFERENCES