A New Approach for Plant Proteomics
Characterization of Chloroplast Proteins of Arabidopsis thaliana by Top-down Mass Spectrometry*
Vlad Zabrouskov, Lisa Giacomelli, Klaas J. van Wijk, and Fred W. McLafferty¶
From the Department of Chemistry and Chemical Biology and the Department of Plant Biology, Cornell University, Ithaca, New York 14853
ABSTRACT
A recently developed methodology for the characterization of complex proteomes, top-down Fourier transform mass spectrometry (FTMS), is applied for the first time to a plant proteome, that of the model plant Arabidopsis thaliana. Of the 3000 proteins predicted by the genome sequence, 97 were recently identified in two separate "bottom-up" mass spectrometry studies in which the proteins were purified and digested and in which the mass spectrometry-measured mass values of the resulting peptides matched against those expected from the DNA-predicted proteins. In the top-down approach applied here, molecular ions from a protein mixture are purified, weighed exactly (±1 Da), and fragmented in the FTMS. Of the 22 molecular weight values found in three isolated mixtures, 7 were chosen, and their primary structures were fully characterized; in only one case was the bottom-up structure in full agreement. The top-down technique is not only efficient for identification of the DNA-predicted precursors, such as that of a protein present as a 5% mixture component, but also for characterization of the primary structure of the final protein. For two proteins the previously predicted cleavage site for loss of the signal peptide was found to be incorrect. Two 27-kDa proteins are fully characterized, although they are found to differ by only 12 residues and 6 Da in mass in a 3:1 ratio; the bottom-up studies did not distinguish these proteins. Direct tandem mass spectrometry dissociation of two 15-kDa molecular ions showed >90% sequence similarity, whereas three-stage mass spectrometry traced their +14-Da molecular mass discrepancies to an unusual N-methylation on the N-terminal amino group; the bottom-up approach identified only one precursor protein. The high potential of the top-down FTMS approach for characterization as well as identification of complex plant proteomes should provide a real incentive for its further automation.
Elucidation of the DNA sequences for bacterial, animal, and human genomes has revolutionized characterization of their expressed proteins (1, 2). In a similar fashion, the recent sequencing of the genome for Arabidopsis thaliana (3, 4) has made plant proteomics an exciting research field (5, 6). The plant cell has unique organelles such as the chloroplast, which is essential not only in photosynthesis but also in biosynthesis, for example, of lipids and amino acids (7). Of the ~3000 proteins predicted for the chloroplast of Arabidopsis, identification of 97 as the precursors of isolated proteins has been achieved with two studies (5, 6) utilizing the bottom-up mass spectrometry (MS)1 technique (8–10); the study in the laboratory of one of the present authors (K. J. vW.) identified 81 proteins (5). However, this is far less effective for characterization of the primary structure of an isolated protein, such as determining the cleavage site for loss of the signal peptide and locating posttranslational modifications, as well as distinguishing highly similar proteins. For this, the recently developed technique of "top-down" MS has uniquely valuable attributes (11–18). Here we directly compare the two MS methods for both the identification and characterization of seven chloroplast proteins.
In the common bottom-up approach the protein is first purified and cleaved into peptides (e.g. with trypsin), whose relative molecular weight (Mr) values are measured by MS (8–10). The resulting "peptide mass fingerprint" is matched against those expected from the DNA-predicted proteins to identify the precursor. MS/MS of individual peptides can provide more specific information for identifications of higher confidence, especially valuable for proteins of lower purity (19). However, to characterize the structure of the isolated protein, i.e. to identify all differences between the predicted and actual protein sequence, requires Mr or MS/MS data from peptides representing all of the protein, which is made difficult by the typical sequence coverage of 15–40% (5, 8, 9, 20). Further, peptide mass differences thought to be due to RNA editing, alternative splicing, signal peptide cleavage, and posttranslational modifications could also be due to impurities or self-proteolysis (1, 2, 5). The signal peptide cleavage site may be determined by an additional step of Edman degradation (5) or predicted theoretically (21). However, the high identification value of the well established bottom-up method has led to its convenient automation for routine samples.
In the alternative top-down approach (11–18), careful purification is not required, and the protein mixture without digestion is introduced directly into the Fourier transform MS instrument (22) using electrospray ionization (ESI) (23, 24). Application to trace level proteins is illustrated here with a 5% mixture component; top-down characterization of 1% mixture components has been reported (17, 18). The resulting mass spectrum shows accurate Mr values for the proteins present in the mixture. After high resolution separation of the molecular ions of an individual protein, these are dissociated (25–32), and the resulting fragment masses are matched against the DNA-predicted sequence to identify the protein. If its Mr value, minus that of the predicted signal peptide lost, does not match the measured value, such discrepancies in the fragment masses can then locate multiple modifications or sequence alterations (11–18). This approach is applied here to find accurate (±1 Da) Mr values for 22 proteins and to identify and characterize seven proteins, all from the three soluble proteomes (thylakoid peripheral, thylakoid lumen, and stroma) of the chloroplast of A. thaliana identified previously by bottom-up MS (5, 6).
EXPERIMENTAL PROCEDURES
Plant Growth and Protein Isolation
A. thaliana, ecotype Columbia, was grown on soil under a 10-h light (21 °C) and 14-h dark (16 °C) cycle at a light intensity of 100 µmol·m⁻²·s⁻¹. All preparations were carried out in dim, green light at 4 °C. The leaves were homogenized in buffer (50 mM Hepes-KOH, pH 7.2, 100 mM sorbitol, 4 mM ascorbic acid, 7 mM L-cysteine) and filtered through four layers of Miracloth (Calbiochem, 22 µm), and the suspension was centrifuged for 3 min at 1850 × g. The resulting pellets, resuspended in the same buffer, were centrifuged on a Percoll (Sigma) step gradient (20-40-80%) for 10 min at 3000 × g to produce two green bands. The upper band was collected, washed once in grinding buffer and then in buffer with 5 mM MgCl2, and resuspended in the buffer with 1 mM Na-EDTA and nine protease inhibitors (antipain, bestatin, chymostatin, E64, leupeptin, pepstatin, pefablok, phosphoramidon, and aprotinin (50, 40, 20, 10, 5, 0.7, 50, 10, and 2 µg·ml⁻¹, respectively)) at a concentration of 2 mg of chlorophyll/ml. This was passed three times through a Yeda press at 100 bars, and the solid thylakoid membranes were removed by ultracentrifugation (20 min at 150,000 × g). The membrane-free supernatant was precipitated with 20% trichloroacetic acid, washed twice with acetone, and resuspended (20 mg of protein·ml⁻¹) in 50 mM Tris-HCl (pH 7.8), 6 M urea, and 10 mM dithiothreitol to produce the concentrated thylakoid lumen proteins. The thylakoid membranes were stirred on ice with 0.1 M Na2CO3 (1 mg of chlorophyll·ml⁻¹) for 30 min with sonication every 15 min for 30 s and spun for 60 min at 150,000 × g, and the membrane-free supernatant was concentrated with a 10-kDa cutoff Ultracone filter (Millipore Inc.) to 5 mg·ml⁻¹. The proteins were precipitated in 80% acetone at -20 °C, washed several times with acetone, semidried in a stream of nitrogen, and resuspended in 50 mM Tris-HCl (pH 8.1) containing 6 M urea and 20 mM dithiothreitol to produce the concentrated thylakoid peripheral proteins.
To isolate chloroplast stroma proteins, leaves were treated similarly except that they were ground in 330 mM sorbitol; the centrifuged and resuspended pellets were loaded on a 40–85% Percoll (Sigma) step gradient and centrifuged for 10 min at 3750 × g. The interface was harvested, diluted 10× with grinding medium, and centrifuged for 3 min at 1300 × g. The pellet was resuspended in 10 mM Hepes-KOH (pH 8), 5 mM MgCl2, and the nine protease inhibitors, and this was gently homogenized to release the stroma proteins. The membranes were spun down (150,000 × g for 25 min), and the supernatant was concentrated to 10 mg·ml⁻¹ to produce the concentrated stroma protein fraction.
Size Exclusion Chromatography (SEC) and Ion Exchange Chromatography
Thylakoid lumen proteins were separated by SEC on a 50 × 0.7-cm column (BioRad) packed with Sephacryl 200 (Amersham Biosciences) using 50 mM Tris-HCl (pH 7.8) with 0.1 M NaCl as a mobile phase at 55 µl·min⁻¹ at 4 °C. Fractions were concentrated to 35 mg of protein·ml⁻¹. Thylakoid peripheral proteins were ion exchange chromatography-fractionated on a Q Hi-trap bullet column (Amersham Biosciences) and eluted with a linear gradient of NaCl (0–1 M). Native stroma extract was loaded on the 100 × 1.2-cm SEC column (BioRad) packed with Sephacryl 500 (Amersham Biosciences) and separated using 50 mM Tris-HCl (pH 7.8) as a mobile phase at 0.5 ml·min⁻¹ (4 °C). The rubisco-containing fraction, eluting as a 550-kDa complex, was denatured in 6 M urea and 20 mM dithiothreitol, and the resulting small and large subunits were separated by SEC as described for thylakoid lumen.
Proteolysis
Protein digestions with Lys-C, Glu-C, and Asp-N (Roche Applied Science) were performed according to the manufacturer's instructions. Digestion with Glu-C was carried out for 30 min at 25 °C, Arg-C for 2 h at 37 °C, and Asp-N for 4 h at 37 °C.
MS Analysis
Samples were desalted on a reverse-phase protein trap (Michrom Bioresources Inc.), washed with 2 ml of 0.1:99:0.5 MeCN/H2O/CH3COOH, and eluted with 150 µl of 50:45:5 MeCN/H2O/CH3COOH. This eluent was loaded into a nanospray ESI emitter with a 24-µm inner diameter tip with 1.0–1.5 kV versus the MS inlet, producing a flow rate of 20–300 nl·min⁻¹. The resulting ions were guided into the ion cell (10⁻⁹ torr) of a modified 6 T Finnigan FTMS device (22). Fragmentation was achieved by "in-beam"-activated ion electron capture dissociation (ECD) (31), plasma ECD (32), or infrared multiphoton dissociation (IRMPD) (26) for ions entering the FTMS cell or by isolating specific ions in the cell using stored waveform inverse Fourier transform (SWIFT) (33) followed by IRMPD or collisionally activated dissociation (CAD) (25, 27). Short sequence tags deduced from the fragment masses were used to search the public database of A. thaliana (34). Alternatively, the unprocessed spectral data were searched using the ProSight search engine,2 specifically developed for top-down FTMS by Kelleher and co-workers (16, 35). Assignments of fragment masses were made with the computer program THRASH (36). The mass difference (in units of 1.00235 Da) between the most abundant isotopic peak and the monoisotopic peak is denoted in italics after each Mr value.
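The Mr notation defined above (for example 16,309.7-10) pairs the mass of the most abundant isotopic peak with the number of 1.00235-Da steps that separate it from the monoisotopic peak. A minimal Python sketch of that conversion follows. The function name and the string parsing are illustrative assumptions and are not taken from THRASH or ProSight.

```python
ISOTOPE_SPACING_DA = 1.00235  # spacing unit stated in the text


def monoisotopic_from_notation(label: str) -> float:
    """Convert an 'Mr-n' label such as '16,309.7-10' to an estimated monoisotopic mass.

    The part before the last hyphen is the most abundant isotopic mass;
    the integer after it counts isotopic steps down to the monoisotopic peak.
    """
    mass_text, offset_text = label.rsplit("-", 1)
    most_abundant = float(mass_text.replace(",", ""))
    n_steps = int(offset_text)
    return most_abundant - n_steps * ISOTOPE_SPACING_DA


# Example with the thylakoid peripheral protein value quoted in the text.
print(round(monoisotopic_from_notation("16,309.7-10"), 2))
```

Under this reading, the 16,309.7-10 cluster corresponds to a monoisotopic mass about 10.02 Da lower than the quoted value.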
RESULTS
The chloroplast proteome from A. thaliana leaves was separated (5) into three samples, those that enriched the thylakoid peripheral (membrane-associated) proteins, the thylakoid lumen proteins, and the stroma proteins. For the first two samples, SDS gel images indicating the molecular weights of the major components are shown in Figs. 1 and 2 (second column). These samples were further separated by SEC; the effectiveness of this is indicated by gel images of the fractions (Fig. 2).

FIG. 1. Direct protein identification in peripheral lumen by CAD. A, ESI mass spectrum of the thylakoid peripheral protein sample. Right, one-dimensional SDS-PAGE of the same sample (first column, calibrants). Below, expanded 16,309.7-10 molecular ions (circles, theoretical abundance distribution of the best fit to the experiment). B, partial CAD MS/MS spectrum of these ions after their SWIFT selection. C, partial CAD MS3 spectrum of the MS/MS 16,121.8-10 fragment ions.
FIG. 2. Visualization of soluble lumen proteins. Top, one-dimensional SDS-PAGE of fractions 10–18 from SEC separation of the thylakoid lumen sample. Below, ESI mass spectrum of fraction 15 with insets of expanded molecular ion regions. Asterisks, noise peaks (no isotopes).
Mass Spectrometry
Direct electrospray ionization of the protein mixtures gives mass spectra such as that of the thylakoid peripheral sample (Fig. 1A). For the peak m/z values, the ion is (Mr + nH)ⁿ⁺, so that m = Mr + nH, where Mr is the relative molecular weight, and z = n, the number of positive charges of the ion. Thus, the mass spectrum indicates the presence of proteins of Mr values 9704-5, 10,867-6, 11,701-7, 13,411-8, 15,524-5, 16,310-10, 20,211-12, 26,567-16, and 26,752-17. The ESI spectrum of SEC fraction 15 from the thylakoid lumen sample (Fig. 2) shows Mr values of 7482-4, 8381-6, 10,452-6, 16,310-10, 16,349-10, and 20,211-12, whereas that of the stromal proteins (Fig. 3) shows Mr values of 9705-5, 10,886-6, 13,183-8, 13,423-8, 13,468-8, 13,772-8, 13,817-8, 14,712-9, 14,810-10, 16,310-10, 16,350-10, 20,213-12, 26,568-16, and 26,753-17. Mr values of 16,310 and ~20,212 appear in all three samples.
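The relation between m/z and Mr stated above can be sketched in a few lines of Python. The proton mass constant, the function names, and the choice of the 16+ and 17+ charge states for the 16,309.7 protein are assumptions made for illustration only; the text writes just "H" for the charge carrier. The second function shows how two adjacent charge-state peaks of one protein suffice to recover both n and Mr.

```python
from typing import Tuple

PROTON_DA = 1.00728  # assumed mass of the charge-carrying proton


def mz_of(mr: float, n: int) -> float:
    """m/z of the (Mr + nH)^n+ ion."""
    return (mr + n * PROTON_DA) / n


def mr_from_adjacent_peaks(mz_low_charge: float, mz_high_charge: float) -> Tuple[int, float]:
    """Recover (n, Mr) from peaks carrying n and n + 1 charges.

    mz_low_charge is the peak at higher m/z (charge n);
    mz_high_charge is its neighbor at lower m/z (charge n + 1).
    """
    n = round((mz_high_charge - PROTON_DA) / (mz_low_charge - mz_high_charge))
    return n, n * (mz_low_charge - PROTON_DA)


# Hypothetical charge states for the Mr = 16,309.7 protein.
peak_16 = mz_of(16309.7, 16)
peak_17 = mz_of(16309.7, 17)
charge, mr = mr_from_adjacent_peaks(peak_16, peak_17)
print(charge, round(mr, 1))
```

In practice the isotopic peak spacing (1/n in m/z) of a resolved FTMS cluster gives the charge directly; the two-peak calculation is shown only because it follows from the formula in the text.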

FIG. 3. Top-down mass spectrometry of soluble stromal proteins. Top, ESI mass spectrum of the stromal protein sample. Below, cleavage assignments from the ECD, CAD, and IRMPD spectra of the 14,810.4-9 (left) and 14,712.2-9 (right) molecular ions to the DNA-predicted sequences for At5g38410 (left) and At1g67090 (right) with sequence tags shaded.
Tandem mass spectrometry provides primary sequence information by MS isolation and dissociation of individual molecular ions. Backbone cleavages on either side of an amino acid can identify it by the mass difference in the resulting product ions (Figs. 1B, 4, 5, 6, and 7B); such cleavages for adjacent residues provide a "sequence tag" (37), such as Ala-Gln/Lys-Leu/Ile-Gly-OH (C-terminal) in Fig. 1B. This process can be repeated (MS3); in the Fig. 1 spectrum, the 16,309.7-10-Da isotopic cluster was selected from the ESI spectrum and dissociated to produce fragment ions (Fig. 1B), of which the 16,121.9-10-Da ions were again selected and fragmented to produce the MS3 spectrum sequence tag (Fig. 1C). In the same way, in Fig. 7 spectra A → B → C represent MS3.
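Reading a sequence tag from successive mass differences can be sketched as a lookup against residue masses. The table below uses standard monoisotopic residue masses rounded to three decimals; the 0.2-Da tolerance and the function names are assumptions chosen for this illustration, not parameters reported by the authors.

```python
from typing import List

# Standard monoisotopic residue masses (Da); Leu and Ile are isomeric.
RESIDUE_MASS_DA = {
    "Gly": 57.021, "Ala": 71.037, "Ser": 87.032, "Pro": 97.053, "Val": 99.068,
    "Thr": 101.048, "Cys": 103.009, "Leu/Ile": 113.084, "Asn": 114.043,
    "Asp": 115.027, "Gln": 128.059, "Lys": 128.095, "Glu": 129.043,
    "Met": 131.040, "His": 137.059, "Phe": 147.068, "Arg": 156.101,
    "Tyr": 163.063, "Trp": 186.079,
}
WATER_DA = 18.011  # the C-terminal residue is lost together with H2O


def residue_candidates(mass_difference: float, tolerance: float = 0.2) -> List[str]:
    """Residues whose mass lies within the tolerance of an observed difference."""
    return [name for name, mass in RESIDUE_MASS_DA.items()
            if abs(mass - mass_difference) <= tolerance]


def read_tag(differences: List[float], tolerance: float = 0.2) -> List[List[str]]:
    """Candidate residues for each successive mass difference."""
    return [residue_candidates(d, tolerance) for d in differences]


# Differences matching the Fig. 1B tag: Ala, Gln/Lys, Leu/Ile, then Gly + H2O.
tag = read_tag([71.0, 128.1, 113.1])
c_terminal = residue_candidates(75.0 - WATER_DA)
print(tag, c_terminal)
```

At this tolerance Gln and Lys cannot be separated, which is why the text writes Gln/Lys. The sketch also ignores the case where one difference spans two residues, such as Gly-Gly versus Asn.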

FIG. 4. Direct protein identification in soluble lumen by CAD. Two m/z regions of the CAD spectrum of the SWIFT-selected 16,348.8-10 molecular ions from Fig. 2 that show the sequence tag.
FIG. 5. Direct protein characterization in soluble lumen by CAD and IRMPD. Top, several m/z regions of the CAD spectra of the 20,211.3-12 molecular ions (19+) from Fig. 2 that show the sequence tag. Below, cleavage assignments from this full spectrum plus those from an IRMPD spectrum compared with the DNA-predicted sequence of protein At1g06680.
FIG. 6. Characterization of two similar peripheral lumen proteins. Top left, best fit (circles) of a single theoretical abundance distribution to the isotopic peak data. Top right, best deconvolution fit of two abundance distributions (3:1 ratio) for Mr values differing by 6.0 Da. Bottom, cleavage assignments from the ECD, CAD, and IRMPD spectra of the 26,567.9-16 ions from Fig. 1 to the DNA-predicted sequences for At5g66570 and At3g50820. For the latter, residues that are different are shown below the line, and cleavages that are unique are indicated by a small superscript circle.
FIG. 7. Localization of posttranslational modification by three-stage MS. A, ESI mass spectrum of the purified small subunit fraction of the rubisco complex. Inset, expanded 15+ molecular ions. B, IRMPD spectrum of all ions of the spectrum shown in A. C, CAD MS3 spectrum of b18 (left inset) 2161.2-Da MS2 fragment ions. Right inset, 2016.2-Da MS3 product ions.
Dissociation of more abundant molecular ions can also provide extensive sequence information throughout the protein (Figs. 3, 5, and 6); matching these against DNA-predicted sequences can characterize errors and posttranslational modifications (11–18). The backbone -CO-NH- cleavages produced by the energetic methods CAD (25, 27) and IRMPD (26) are complementary to the backbone -NH-CHR- cleavages from ECD (28–30), allowing assignment of the protein terminus (N or C) contained by fragment ions from dual cleavages (38).
DISCUSSION
The bottom-up MS identification of the Arabidopsis proteins required two-dimensional isolation of the individual protein, proteolysis, and MS. In contrast, the top-down approach here was applied to more complex protein mixtures, e.g. those with 9, 6, and 14 components.
Visualization of Proteins in Crude Samples
Electrophoretic separations are conventionally employed for the qualitative assessment of the protein content of the sample. For this the ESI mass spectrum is quite complementary. For example, the Mr values from the thylakoid peripheral proteins of 11,701-7, 16,309-10, 20,211-12, and 26,567-16 correspond to electrophoretic bands (Fig. 1) and provide accurate (usually ±1 Da) molecular weight information. However, with the broad range of Mr values in this sample, the heavy (>45 kDa) proteins shown by electrophoresis do not appear in the ESI spectrum.
Direct Component Identification
For the thylakoid peripheral sample, the SDS-PAGE analysis indicates an ~16-kDa protein in possibly 5% concentration (Fig. 1A). SWIFT (33) ejection of all other ions from the cell isolated the corresponding 16,309.7-10 isotopic cluster in fair signal/noise (Fig. 1). CAD of these ions gave detectable high mass products corresponding to sequential mass losses of 75.0 (16,310.0 - 16,235.0), 113.1, 128.1, and 71.0 Da (Fig. 1B). These differences are indicative of C-terminal Gly (57.0) + H2O, Leu/Ile (113.1), Lys (128.095) or Gln (128.06), and Ala (71.0). Searching the DNA-predicted protein sequences (34) for one with this C-terminal sequence found correspondence only for At4g21280 (34) (photosystem II oxygen-evolving complex) with its C-terminal sequence HO-Gly-Leu-Lys-Ala. However, this protein would have instead Mr = 16,123.4-10 after the predicted removal of the signal peptide (5), substantially lower than the measured value of 16,310-10 (±1 Da). To check this discrepancy, the dominant 16,122-10-Da isotopic peaks in the CAD MS/MS spectrum were of sufficient abundance to be fragmented again by CAD (MS3, Fig. 1C); this gave sequential terminal losses of 186.0, 113.2, 87.1, 113.2, and 128.1. These must be from the N terminus, as the next predicted losses from the C terminus of this fragment ion would be Lys (128 Da) and Ala (71 Da) (Fig. 1B). The predicted N terminus of At4g21280, after loss of the signal peptide, matches only part of the observed sequence tag, Ile (113.1 Da)-Ser (87.0)-Ile-Lys (128.1); thus, the protein N terminus is actually longer by a 186-Da fragment. The signal peptide loss did not include the next two predicted residues, Ala-Asp, whose mass sum of 186.1 Da corresponds to the 186-Da difference of the MS3 spectrum. Inclusion of these residues gives a predicted Mr value of 16,309.5-10, in close agreement with the measured value of 16,309.7-10.
Although the molecular ion C terminus showed a high tendency for amino acid loss in MS2, the new C-terminal group of this product ion (39) has apparently stabilized this end so that further MS3 dissociation occurs predominantly at the N terminus. Thus, MS2 correctly identified the protein from among those predicted, but the accurate Mr value showed that the predicted sequence was incorrect, with the complementary sequence information of MS3 pinpointing the discrepancy to the N-terminal signal peptide cleavage.
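The Mr bookkeeping behind the corrected cleavage site can be written as a short search: add the residues lying upstream of the predicted mature N terminus one at a time until the predicted Mr agrees with the measured value within ±1 Da. This is a sketch under stated assumptions. The function name is hypothetical, the residue masses are standard monoisotopic values, and the outward order Asp then Ala assumes that "Ala-Asp" in the text is written from N to C terminus.

```python
from typing import List, Optional, Tuple

RESIDUE_MASS_DA = {"Ala": 71.037, "Asp": 115.027}  # standard monoisotopic residue masses


def extend_n_terminus(predicted_mr: float, measured_mr: float,
                      upstream_outward: List[str],
                      tolerance: float = 1.0) -> Optional[Tuple[int, float]]:
    """Return (residues added, corrected Mr) for the first extension that matches.

    upstream_outward lists the residues preceding the predicted cleavage site,
    starting with the one adjacent to the predicted mature N terminus.
    """
    mr = predicted_mr
    if abs(mr - measured_mr) <= tolerance:
        return 0, mr
    for count, residue in enumerate(upstream_outward, start=1):
        mr += RESIDUE_MASS_DA[residue]
        if abs(mr - measured_mr) <= tolerance:
            return count, mr
    return None


# At4g21280 values from the text: predicted 16,123.4, measured 16,309.7.
result = extend_n_terminus(16123.4, 16309.7, ["Asp", "Ala"])
if result is not None:
    print(result[0], round(result[1], 1))
```

An Mr match alone does not prove the cleavage site; in the text the MS3 loss of 186 Da from the N terminus is what confirms it.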
Identification in SEC Fractions
Further concentration of the protein components can increase the amount of MS/MS sequence data and make such top-down identification more straightforward. In SEC fraction 15 from the thylakoid lumen sample, only six protein molecular ions (each in multiple charge states and m/z values) are significant (Fig. 2). CAD of the isolated 16+ ions of Mr = 16,348.8-10 shows a sequence tag of mass differences (Fig. 4) of 97.4 (Pro), 113.0 (Leu/Ile), 128.1 (Lys/Gln), 98.8 (Val), 114.0 (Gly2 or Asn), 96.9 (Pro), and 210.3 Da (Leu/Ile + Pro). Of the predicted proteins, the only match found in the database was At4g05180 (34) (also important for oxygen evolution) with the residues 3–11 sequence of Pro-Ile-Lys-Val-Gly-Gly-Pro-Leu-Pro but with Mr = 16,149.3-10 after the predicted removal of the signal peptide (5). The Mr value difference of 200.5 Da corresponds to the sum of the masses of the next amino acid residues in the DNA-predicted sequence (before signal peptide loss), Glu and Ala (129.0 + 71.0 Da), showing that the mature protein also contains these N-terminal amino acids. In total, CAD produced four N-terminal b ions and 19 C-terminal y ions, confirming the predicted sequence and the absence of posttranslational modifications. Although the bottom-up methodology correctly identified the DNA-predicted precursor protein, top-down fully characterized the primary structure of the isolated protein.
For the Mr = 20,211.3-10 protein, IRMPD of the 18+ ions (Fig. 2) and CAD of the 18+ and 19+ ions yielded extensive sequence information (Fig. 5) that not only included a "tag" region (Ser, Glu)-Gly-Gly-Phe-(Asp, Asn2, Ala)-Val-Ala that matched the predicted protein At1g06680 (34) (oxygen evolution) but also gave a total of 27 fragment ions fully supporting this sequence without posttranslational modifications. Although this is also indicated by the match with the predicted Mr value of 20,212.4-10, cases are known in which an exact (±1 Da) Mr match gave an incorrect retrieval (15, 18). Here the bottom-up identification has been confirmed by top-down, but with far higher reliability.
Characterization of Proteins with Overlapping Molecular Ions
An important component of the thylakoid peripheral sample gives isotopic clusters corresponding to Mr = 26,567.9-16 in the ESI spectrum (Fig. 1), although the measured isotopic abundance distribution is significantly broader than that expected theoretically (Fig. 6, top left) for a single component (see other predicted/experimental matches in Figs. 1 and 2). IRMPD spectra of the isolated 26+, 27+, and 28+ ions gave a mass-difference sequence tag of (Asp, Tyr)-Ala-Ala-Val-Thr-Val-(Gln, Leu) that matched (Fig. 6) regions of two proteins, At5g66570 (34) (Mr = 26,565.3-16) and At3g50820 (Mr = 26,571.3-16); these paralogues of OEC33 are important for oxygen evolution. However, the actual mass values of the five b and eight y ions defining the sequence tag only matched those expected for Mr = 26,565.3-16. Using also the complementary MS/MS technique of electron capture dissociation, CAD and ECD (in-beam and plasma) spectra of the 26,567.9-16 isotopic cluster ions gave a total of 40 fragment ions that could be formed only from the 26,565.3-16 protein (Fig. 6), plus 33 more that could come from both proteins. However, nine fragment ions were also found that would be formed only by the 26,571.3-16 protein, clearly showing the presence of both paralogues. The bottom-up study found only one two-dimensional gel spot, whose proteolysis gave peptides consistent with either of the DNA-predicted precursors (5).
To compare the capabilities of the bottom-up methodology to identify these highly similar proteins using the higher FTMS performance, the protein mixture was concentrated further and subjected to partial proteolysis, and the ESI/FTMS spectrum was measured for the resulting peptide mixture. Glu-C digestion after SEC separation gave 72 peptide Mr values, average mass of ~9695 Da, of which eight are assignable to 26,565.3-16 protein and four to 26,571.3-16 protein. Arg-C digestion after separation by ion exchange chromatography gave 110 peptide Mr values, average mass of ~13,472 Da, of which the only assignable were two for 26,565.3-16 protein. Asp-N digestion after SEC separation gave 92 Mr values, average mass of ~4365 Da, of which 14 matched the 26,565.3-16 protein and four matched the 26,571.3-16 protein. Thus, in total this bottom-up approach would have given eight Mr values that indicated the presence of the second protein, but with >80% unassignable Mr values as well as requiring further sample purification and three sample-consuming proteolytic procedures.
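The totals quoted for the three digests can be checked with a short tally. The counts are taken from the paragraph above; the dictionary layout and variable names are illustrative only.

```python
# Peptide Mr values per digest: total observed, and those assignable to each paralogue.
digests = {
    "Glu-C": {"total": 72, "mr_26565": 8, "mr_26571": 4},
    "Arg-C": {"total": 110, "mr_26565": 2, "mr_26571": 0},
    "Asp-N": {"total": 92, "mr_26565": 14, "mr_26571": 4},
}

total_peptides = sum(d["total"] for d in digests.values())
assigned_first = sum(d["mr_26565"] for d in digests.values())
assigned_second = sum(d["mr_26571"] for d in digests.values())
unassigned_fraction = 1 - (assigned_first + assigned_second) / total_peptides

print(total_peptides, assigned_first, assigned_second, round(100 * unassigned_fraction, 1))
```

The tally gives 24 and 8 assignable values out of 274, so about 88% of the peptide Mr values remain unassigned, consistent with the ">80%" stated in the text.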
Relative Amounts of Proteins with Overlapping Molecular Ions
With MS/MS characterization of these unmodified paralogues with Mr = 26,565.3 and 26,571.3, the observed abundances of the isotopic peak cluster can now be deconvoluted into two predicted abundance distributions around these Mr values whose sum fits the observed distribution. A 3:1 ratio of the paralogues gives the fit shown in Fig. 6, upper right. In qualitative agreement, MS/MS gave a 40:9 ratio of the number of fragment ions uniquely originating from the paralogues, whereas proteolysis with three enzymes gave a 24:8 ratio of peptides uniquely originating from them.
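The deconvolution described above amounts to fitting the observed isotopic cluster as a weighted sum of two envelopes offset by about six isotopic peaks. The sketch below solves that two-parameter least-squares problem in closed form. It is an illustration under strong assumptions: the envelopes are Gaussian-shaped stand-ins rather than theoretical isotopic distributions computed from elemental composition, the observed data are synthetic, and all names and parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def envelope(center: float, width: float, n_peaks: int) -> List[float]:
    """Stand-in isotopic envelope: Gaussian-shaped abundances on a grid of isotopic peaks."""
    return [math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(n_peaks)]


def fit_two_components(observed: Sequence[float], comp_a: Sequence[float],
                       comp_b: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (a, b) such that a*comp_a + b*comp_b approximates observed."""
    saa = sum(x * x for x in comp_a)
    sbb = sum(y * y for y in comp_b)
    sab = sum(x * y for x, y in zip(comp_a, comp_b))
    sao = sum(x * o for x, o in zip(comp_a, observed))
    sbo = sum(y * o for y, o in zip(comp_b, observed))
    det = saa * sbb - sab * sab  # nonzero as long as the two envelopes differ in shape or position
    return (sao * sbb - sab * sbo) / det, (saa * sbo - sab * sao) / det


# Synthetic cluster: the heavier component is shifted by 6 peaks (about 6 Da).
light = envelope(center=16, width=4.0, n_peaks=40)
heavy = envelope(center=22, width=4.0, n_peaks=40)
observed = [3.0 * l + 1.0 * h for l, h in zip(light, heavy)]

weight_light, weight_heavy = fit_two_components(observed, light, heavy)
print(round(weight_light / weight_heavy, 3))
```

With real data the residual of the fit would be nonzero, and the comparison between a one-component and a two-component fit (as in Fig. 6, top) indicates whether a second species is needed.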
Characterization of Posttranslational Modifications
MS indicated the stromal protein sample to be the most complex; of 14 Mr values, seven are between 13,182-8 and 14,810-9 (Fig. 3), a challenge for the protein purification used in the bottom-up approach. Of these proteins, the two most abundant show Mr = 14,712.2-9 and 14,810.4-9. Separate CAD spectra of their isolated 14+ molecular ions gave identical extensive sequence tags Cys-Leu/Ile-Ser-Phe-Leu/Ile-Ala-Tyr-Gln/Lys, although these identical mass differences were derived from non-identical fragment ion mass values. Matching of this tag versus the DNA-predicted sequence possibilities identified two paralogues, At5g38410 and At1g67090 (34), of the small subunit of rubisco that is involved in the CO2 fixation. The bottom-up studies only identified At1g67090 (5, 6).
However, for these closely related proteins, the predicted Mr values after signal peptide loss are 14 units low. To examine these discrepancies with a more concentrated sample, the 550-kDa rubisco complex (40) was isolated by SEC with SDS-PAGE identification and denatured, and the corresponding small subunit was isolated by SEC. Its ESI mass spectrum (Fig. 7A) is now dominated by these two components, and dissociation of these ions with IRMPD (Fig. 7B) and in-beam ECD gave product ions representing cleavage of 47 of 125 bonds in At5g38410 and 54 of 124 bonds in At1g67090 (Fig. 3). All of the N-terminal fragment ion masses are 14 Da higher than predicted; the smallest of these in both proteins limits the +14-Da modification to the first four residues. All of the C-terminal fragment masses agree with the predicted values; the +14-Da modification is limited in At5g38410 to the first two N-terminal residues by the y124 ions and to the first six residues in At1g67090 by the z119 ions.
The b18 ions from the two precursors, as well as the other N-terminal fragments of <22 residues, are of the same nominal mass (isobaric); the Lys-2 (128.095 Da) of At5g38410 is replaced by Gln (128.06 Da) in At1g67090 (Fig. 3). IRMPD of the mixed Fig. 7A ions produced the mixed b18 (2161.2 Da) in sufficient abundance so that MS3 dissociation gave a significant abundance of 2016.2-Da fragment ions (Fig. 7C). The difference of 145.0 Da demonstrates that the N-terminal Met (131.0 Da) has the 14-Da modification. Other peaks representing further losses of 128 (Lys or Gln), 99 (Val), and 186 Da (Trp) confirm the presence of the extra 14 Da. It is conceivable that these fragments originated from only one of the two b18 precursors; if the other contained no extra 14-Da group in its first four N-terminal residues, any corresponding fragment ions would be 14 Da higher in mass (vertical arrows, Fig. 7C). The absence of these peaks, as well as the expected similar fragmentation behavior of the two b18 ions, makes it highly probable that both have the extra N-terminal 14 Da. An extra methyl group is the most logical explanation, with N-methylation far more probable than elsewhere on the terminal Met. This rare posttranslational modification was recently found for the first time in plants, also in small rubisco subunits, with the N-terminal methylation characterized by electron ionization MS (41) of the first residue from Edman degradation. Again, MS3 has provided valuable sequence information at the fragment ion terminus farthest from the original cleavage that formed the MS2 fragment ion.
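The MS3 arithmetic above can be sketched as a lookup: subtract the unmodified residue mass from the observed loss and compare the excess with the mass shifts of common modifications. The list of candidate modifications, the 0.1-Da tolerance, and the function name are assumptions added for illustration; only the 2161.2-Da and 2016.2-Da values and the identity of the N-terminal Met come from the text.

```python
from typing import List

MET_RESIDUE_DA = 131.040  # monoisotopic Met residue mass (the text rounds to 131.0)

# Monoisotopic mass shifts of a few common modifications (assumed candidate list).
CANDIDATE_MODS_DA = {
    "methylation": 14.016,
    "oxidation": 15.995,
    "formylation": 27.995,
    "acetylation": 42.011,
}


def explain_loss(precursor: float, product: float, residue_mass: float,
                 tolerance: float = 0.1) -> List[str]:
    """Modifications whose mass shift accounts for the loss beyond one residue."""
    excess = (precursor - product) - residue_mass
    return [name for name, delta in CANDIDATE_MODS_DA.items()
            if abs(delta - excess) <= tolerance]


# b18 (2161.2 Da) to its MS3 product (2016.2 Da): a 145.0-Da loss.
matches = explain_loss(2161.2, 2016.2, MET_RESIDUE_DA)
print(matches)
```

A mass match of this kind identifies the size of the added group but not its position on the residue; the assignment to the N-terminal amino group in the text rests on chemical reasoning and the cited Edman degradation study (41).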
Conclusions
For the present bottom-up and top-down methodologies, the automation of the former makes it the better choice for the first identifications of the precursor proteins from a genome such as Arabidopsis. Bottom-up identified 97 proteins, whereas this study found 22 different protein molecular ions.
However, for the seven of these proteins studied further, the top-down approach has shown unique capabilities for characterization of a complex eukaryotic proteome. Extensive protein purification and enzymatic digestion are not required; an ~5% component of the thylakoid peripheral sample was directly characterized by MS2 and MS3. The accurate mass values, ±1 Da or better, of the protein fragment ions can provide sequence tags for direct identification. Even higher mass accuracy has been demonstrated with FTMS (42), such as that sufficient for distinguishing the isobaric fragment ions containing Lys-2 (128.095 Da) of At5g38410 versus Gln-2 (128.06 Da) of At1g67090. Agreement between the Mr value of a predicted protein and the measured Mr value is a strong indication of the absence of posttranslational modifications. However, of the seven intact proteins here characterized directly, all but one, At1g06680, showed different experimental and predicted Mr values. For proteins At4g21280 and At4g05180, MS/MS and three-stage MS product ions corrected the predicted cleavage site for the loss of the signal peptide. Such data also identified two highly similar proteins differing in mass by only 6 Da (for which bottom-up (5, 6) only indicated that one or both were present); deconvolution of their isotopic peaks indicated relative amounts of 3:1. Finally, two similar proteins (of which only one was identified by bottom-up (5, 6)) were found to be posttranslationally methylated, and the unusual N-terminal site of the modification was pinpointed by three-stage MS. Although none of these successful primary characterizations required >50% sequence coverage in the MSn experiment, more extensive efforts with ECD, CAD, and IRMPD have cleaved 250 of 258 interresidual bonds in bovine carbonic anhydrase (43) with a single plasma ECD spectrum showing 183 different cleavages (32). Of particular importance for the future routine application of this top-down methodology to plant proteomes is the development of automated methods for sample separation, MSn, and data analysis, as pioneered by the Kelleher laboratory (16, 35).
ACKNOWLEDGMENTS
We are grateful to Yi Wang and Jean-Benoit Peltier for the help with purification of the small subunit of rubisco and to Ying Ge, Jimmy Ytterberg, Huili Zhai, HanBin Oh, Julian Whitelegge, and Kathrin Breuker for helpful discussions.
FOOTNOTES
Received, July 11, 2003, and in revised form, September 16, 2003.
Published, MCP Papers in Press, September 22, 2003, DOI 10.1074/mcp.M300069-MCP200
1 The abbreviations used are: MS, mass spectrometry; MS/MS, tandem mass spectrometry; MS2, tandem mass spectrometry; MS3, three-stage mass spectrometry; MSn, multistage mass spectrometry; ESI, electrospray ionization; FTMS, Fourier transform mass spectrometry; SWIFT, stored waveform inverse Fourier transform; CAD, collisionally activated dissociation; IRMPD, infrared multiphoton dissociation; ECD, electron capture dissociation; SEC, size exclusion chromatography; rubisco, ribulose-bisphosphate carboxylase/oxygenase. 
2 The Kelleher Group (Neil L. Kelleher and colleagues), ProSight PTM at prosightptm.scs.uiuc.edu/. 
* This work was supported by Grant MCB 0090942 from the National Science Foundation (to K. J. vW.) and Grant GM16609 from the National Institutes of Health (to F. W. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. 
¶ To whom correspondence should be addressed. Tel.: 607-255-4699; Fax: 607-255-7880; E-mail: fredwmcl{at}aol.com
 |
REFERENCES
|
---|
- Williams, K. L., and Hochstrasser, D. F. (1997) Proteome Research: New Frontiers in Functional Genomics, pp. 1–12, Springer-Verlag, Berlin
- Abbott, A. (1999) A post-genomic challenge: learning to read patterns of protein synthesis. Nature 402, 715–720
- Salanoubat, M., Lemcke, K., Rieger, M., Ansorge, W., Unseld, M., Fartmann, B., Valle, G., Blocker, H., Perez-Alonso, M., Obermaier, B., Delseny, M., Boutry, M., Grivell, L. A., Mache, R., Puigdomenech, P., et al. (2000) Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature 408, 820–822
- Tabata, S., Kaneko, T., Nakamura, Y., Kotani, H., Kato, T., Asamizu, E., Miyajima, N., Sasamoto, S., Kimura, T., Hosouchi, T., Kawashima, K., Kohara, M., Matsumoto, M., Matsuno, A., Muraki, A., et al. (2000) Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature 408, 823–826
- Peltier, J. B., Emanuelsson, O., Kalume, D. E., Ytterberg, J., Friso, G., Rudella, A., Liberles, D. A., Soderberg, L., Roepstorff, P., von Heijne, G., and van Wijk, K. J. (2002) Central functions of the lumenal and peripheral thylakoid proteome of Arabidopsis determined by experimentation and genome-wide prediction. Plant Cell 14, 211–236
- Schubert, M., Petersson, U. A., Haas, B. J., Funk, C., Schroder, W. P., and Kieselbach, T. (2002) Proteome map of the chloroplast lumen of Arabidopsis thaliana. J. Biol. Chem. 277, 8354–8365
- Leister, D. (2003) Chloroplast research in the genomic age. Trends in Genetics 19, 47–56
- Andersen, J. S., Svensson, B., and Roepstorff, P. (1996) Electrospray ionization and matrix assisted laser desorption/ionization mass spectrometry: Powerful analytical tools in recombinant protein chemistry. Nat. Biotechnol. 14, 449–457
- Qin, J., and Chait, B. T. (1997) Identification and characterization of posttranslational modifications of proteins by MALDI ion trap mass spectrometry. Anal. Chem. 69, 4002–4009
- Pandey, A., and Mann, M. (2000) Proteomics to study genes and genomes. Nature 405, 837–846
- Kelleher, N. L., Taylor, S. V., Grannis, D., Kinsland, C., Chiu, H. J., Begley, T. P., and McLafferty, F. W. (1998) Efficient sequence analysis of the six gene products (7–74 kDa) from the Escherichia coli thiamin biosynthetic operon by tandem high-resolution mass spectrometry. Protein Sci. 7, 1796–1801
- Kelleher, N. L., Lin, H. Y., Valaskovic, G. A., Aaserud, D. J., Fridriksson, E. K., and McLafferty, F. W. (1999) Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry. J. Am. Chem. Soc. 121, 806–812
- McLafferty, F. W., Fridriksson, E. K., Horn, D. M., Lewis, M. A., and Zubarev, R. A. (1999) Biochemistry: biomolecule mass spectrometry. Science 284, 1289–1290
- Kelleher, N. L. (2000) From primary structure to function: biological insights from large-molecule mass spectra. Chem. Biol. 7, R37–R45
- Meng, F., Cargile, B. J., Miller, L. M., Forbes, A. J., Johnson, J. R., and Kelleher, N. L. (2001) Informatics and multiplexing of intact protein identification in bacteria and the archaea. Nat. Biotechnol. 19, 952–957
- Meng, F., Cargile, B. J., Patrie, S. M., Johnson, J. R., McLoughlin, S. M., and Kelleher, N. L. (2002) Processing complex mixtures of intact proteins for direct analysis by mass spectrometry. Anal. Chem. 74, 2923–2929
- Ge, Y., Lawhorn, B. G., ElNaggar, M., Strauss, E., Park, J., Begley, T. P., and McLafferty, F. W. (2002) Top down characterization of larger proteins (45 kDa) by electron capture dissociation mass spectrometry. J. Am. Chem. Soc. 124, 672–678
- Ge, Y., ElNaggar, M., Sze, S. K., Oh, H. B., Begley, T. P., McLafferty, F. W., Boshoff, H., and Barry, C. E. (2003) Top down characterization of secreted proteins from Mycobacterium tuberculosis by electron capture dissociation mass spectrometry. J. Am. Soc. Mass Spectrom. 14, 253–261
- Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R., III (1999) Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17, 676–682
- Peltier, J. B., Friso, G., Kalume, D. E., Roepstorff, P., Nilsson, F., Adamska, I., and van Wijk, K. J. (2000) Proteomics of the chloroplast: Systematic identification and targeting analysis of lumenal and peripheral thylakoid proteins. Plant Cell 12, 319–341
- Emanuelsson, O., Nielsen, H., and Von Heijne, G. (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978–984
- Beu, S. C., Senko, M. W., Quinn, J. P., Wampler, F. M., III, and McLafferty, F. W. (1993) Fourier transform electrospray instrumentation for tandem high resolution mass spectrometry of large molecules. J. Am. Soc. Mass Spectrom. 4, 557–565
- Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M. (1989) Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64–71
- Henry, K. D., Williams, E. R., Wang, B. H., McLafferty, F. W., Shabanowitz, J., and Hunt, D. F. (1989) Fourier transform mass spectrometry of large molecules by electrospray ionization. Proc. Natl. Acad. Sci. U. S. A. 86, 9075–9078
- Gauthier, J. W., Trautman, T. R., and Jacobson, D. B. (1991) Sustained off-resonance irradiation for collision-activated dissociation involving Fourier transform mass spectrometry. Collision-activated dissociation technique that emulates infrared multiphoton dissociation. Anal. Chim. Acta 246, 211–225
- Little, D. P., Speir, J. P., Senko, M. W., O'Connor, P. B., and McLafferty, F. W. (1994) Infrared multiphoton dissociation of large multiply charged ions for biomolecule sequencing. Anal. Chem. 66, 2809–2815
- Senko, M. W., Speir, J. P., and McLafferty, F. W. (1994) Collisional activation of large multiply charged ions using Fourier transform mass spectrometry. Anal. Chem. 66, 2801–2808
- Zubarev, R. A., Kelleher, N. L., and McLafferty, F. W. (1998) Electron capture dissociation of multiply charged protein cations. A nonergodic process. J. Am. Chem. Soc. 120, 3265–3266
- Zubarev, R. A., Kruger, N. A., Fridriksson, E. K., Lewis, M. A., Horn, D. M., Carpenter, B. K., and McLafferty, F. W. (1999) Electron capture dissociation of gaseous multiply-charged proteins is favored at disulfide bonds and other sites of high hydrogen atom affinity. J. Am. Chem. Soc. 121, 2857–2862
- Zubarev, R. A., Horn, D. M., Fridriksson, E. K., Kelleher, N. L., Kruger, N. A., Lewis, M. A., Carpenter, B. K., and McLafferty, F. W. (2000) Electron capture dissociation for structural characterization of multiply charged protein cations. Anal. Chem. 72, 563–573
- Horn, D. M., Ge, Y., and McLafferty, F. W. (2000) Activated ion electron capture dissociation for mass spectral sequencing of larger (42 kDa) proteins. Anal. Chem. 72, 4778–4784
- Sze, S. K., Ge, Y., Oh, H., and McLafferty, F. W. (2003) Plasma electron capture dissociation for the characterization of large proteins by top down mass spectrometry. Anal. Chem. 75, 1599–1603
- Wang, T. C., Ricca, T. L., and Marshall, A. G. (1986) Extension of dynamic range in Fourier transform ion cyclotron resonance mass spectrometry via stored wave form inverse Fourier transform excitation. Anal. Chem. 58, 2935–2938
- Schoof, H., Zaccaria, P., Gundlach, H., Lemcke, K., Rudd, S., Kolesov, G., Arnold, R., Mewes, H. W., and Mayer, K. F. X. (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res. 30, 91–93
- Taylor, G. K., Kim, Y., Forbes, A. J., Meng, F., McCarthy, R., and Kelleher, N. L. (2003) Web and database software for identification of intact proteins using "top down" mass spectrometry. Anal. Chem. 75, 4081–4086
- Horn, D. M., Zubarev, R. A., and McLafferty, F. W. (2000) Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 11, 320–332
- Mortz, E., O'Connor, P. B., Roepstorff, P., Kelleher, N. L., Wood, T. D., McLafferty, F. W., and Mann, M. (1996) Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases. Proc. Natl. Acad. Sci. U. S. A. 93, 8264–8267
- Horn, D. M., Zubarev, R. A., and McLafferty, F. W. (2000) Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 97, 10313–10317
- Nold, M. J., Wesdemiotis, C., Yalcin, T., and Harrison, A. G. (1997) Amide bond dissociation in protonated peptides. Structures of the N-terminal ionic and neutral fragments. Int. J. Mass Spectrom. Ion Process. 164, 137–153
- Hubbs, A. E., and Roy, H. (1993) Assembly of in vitro synthesized large subunits into ribulose-bisphosphate carboxylase/oxygenase. Formation and discharge of an L8-like species. J. Biol. Chem. 268, 13519–13525
- Grimm, R., Grimm, M., Eckerskorn, C., Pohlmeyer, K., Rohl, T., and Soll, J. (1997) Postimport methylation of the small subunit of ribulose-1,5-bisphosphate carboxylase in chloroplasts. FEBS Lett. 408, 350–354
- Shi, D. S., Hendrickson, C. L., and Marshall, A. G. (1998) Counting individual sulfur atoms in a protein by ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry: Experimental resolution of isotopic fine structure in proteins. Proc. Natl. Acad. Sci. U. S. A. 95, 11532–11537
- Sze, S. K., Ge, Y., Oh, H., and McLafferty, F. W. (2002) Top-down mass spectrometry of a 29-kDa protein for characterization of any posttranslational modification to within one residue. Proc. Natl. Acad. Sci. U. S. A. 99, 1774–1779