(Received for publication, September 8, 1994; and in revised form, November 22, 1994)
Hydroxyproline-rich glycoproteins (HRGPs) occur in the extracellular matrix of land plants and green algae. HRGPs contain from 2 to 95% of their dry weight as carbohydrate, predominantly as oligoarabinosides and/or as heteropolysaccharides which are O-linked to the hydroxyproline residues. A glycosylation code that determines the presence or absence and extent of arabinosylation at each hydroxyproline residue is likely, as each HRGP has a unique arabinosylation profile. Previously we noted a positive correlation between the contiguity of hydroxyproline residues and the extent of HRGP O-arabinosylation (Kieliszewski, M., deZacks, R., Leykam, J. F., and Lamport, D. T. A.(1992) Plant Physiol. 98, 919-926); most arabinosylated hydroxyproline residues and the longer arabinofuranoside chains occur in HRGPs where Hyp residues occur as blocks of tetrahydroxyproline, while those with little or no contiguous Hyp exhibit very little Hyp arabinosylation. In order to test this Hyp contiguity hypothesis, we have for the first time determined the arabinosylation site specifics of an HRGP, namely the proline and hydroxyproline-rich glycoprotein (PHRGP) isolated from Douglas fir (Pseudotsuga menziesii). Pronase digests of PHRGP yielded a major peptide and three glycopeptides whose structures were determined directly from the unfractionated, underivatized Pronase digest by tandem mass spectrometry using collisionally induced dissociation. We corroborated the peptide and glycopeptide structures by Edman degradation, neutral sugar analyses, hydroxyproline arabinoside profiles, and further mass spectrometric analyses after purification of the major peptide and glycopeptides by a combination of hydrophilic interaction and reverse phase column chromatography. 
Consistent with the Hyp contiguity hypothesis, the structural analyses indicate that while the sequence Ile-Pro-Pro-Hyp is never arabinosylated and Lys-Pro-Hyp-Val-Hyp is only occasionally monoarabinosylated at Hyp-5, the peptide containing contiguous Hyp, Lys-Pro-Hyp-Hyp-Val, is always arabinosylated at Hyp-3, mainly by a triarabinoside. We also obtained precise molecular masses for both intact and anhydrous hydrogen fluoride-deglycosylated PHRGPs (73.113 and 53.834 kDa) via matrix-assisted laser desorption/ionization time of flight mass spectrometry, representing the first HRGP to be analyzed by this method.
Hydroxyproline O-glycosylation is a posttranslational
modification unique to plants and Chlorophycean
algae (1, 2). The hydroxyproline-rich glycoproteins
(HRGPs) so modified occur predominantly at the cell
surface, where they are implicated in diverse roles ranging from
network formation (3, 4) and disease resistance (5, 6) to cell differentiation and
morphogenesis(7, 8) . HRGPs are characteristically
extended rodlike
molecules(9, 10, 11, 12, 13) with highly repetitive peptide motifs extensively
posttranslationally modified by hydroxylation of
proline(3, 14) , O-glycosylation of serine
and hydroxyproline residues(15, 16) , and both intra-
and intermolecular cross-linking(4, 17) . Thus,
questions about HRGP function must deal not merely with the primary
amino acid sequence as deduced from clones, but also with the mature
glycoprotein whose three-dimensional structure, conformation, and
arrangement in the cell wall all depend on extensive posttranslational
modifications. Of these, hydroxyproline O-glycosylation
figures prominently, accounting for up to 95% of an HRGP's dry
weight (13, 14, 18, 19, 20, 21) ,
and ranges from the addition of a single arabinose or galactose residue
up to a 75-residue
arabinogalactan(2, 13, 18, 19) .
Most commonly, hydroxyproline substituents are short
arabinofuranoside chains (degree of polymerization: 1-5
residues), usually isolated as hydroxyproline arabinosides
(Hyp-arabinosides); they occur in every type of HRGP examined thus far,
including the Ser-Hyp-containing
extensins(1, 2, 10, 20) ,
arabinogalactan-proteins (AGPs)(2, 18, 22) ,
gum arabic glycoprotein (13) , repetitive proline-rich proteins
(RPRPs)(12, 14, 21) , and the solanaceous
lectins(23, 24) . Each HRGP possesses its own unique
Hyp-arabinoside profile based on the number of substituted Hyp residues
and the relative proportion of each oligoarabinoside chainlength.
Hence, four questions arise about the site specificity of
Hyp-arabinosylation in any given HRGP: what determines (a) the
total number of Hyp residues substituted, (b) the size of each
oligoarabinoside substituent, (c) the precise arrangement of
the different oligoarabinosides, and (d) the positions of the
Hyp-arabinosides relative to the larger arabinogalactan polysaccharide
substituents of the AGPs and gums? Elucidation of the rules for HRGP O-glycosylation will help enable us to predict mature
glycoprotein structure from genomic/cDNA sequences and help us
understand how glycosylation contributes to HRGP molecular topography
and, hence, to the possible roles of HRGPs in molecular recognition and
wall self-assembly.
Recently, we discovered that Hyp-arabinosylation
is not random, but correlated with the contiguity of the Hyp
residues(21) . Thus, the tetrahydroxyproline blocks of the
Ser-Hyp-containing extensins represent highly contiguous
Hyp that is also highly arabinosylated mostly with the larger
(3-4 residues) rather than the smaller (1-2 residues)
oligoarabinosides. RPRPs represent the other extreme, where Hyp
residues are largely interspersed with other residues, i.e. there is little contiguous Hyp (25) and correspondingly
little arabinosylated Hyp(14, 21) .
Thus the Hyp contiguity hypothesis approaches the problem of Hyp arabinosylation coding by predicting that Hyp arabinosylation increases with Hyp contiguity rather than with Hyp mole percent, and that noncontiguous Hyp residues are rarely, if ever, arabinosylated. In order to test and further refine this hypothesis, we set out to determine the arabinosylation site specifics for one of the simpler cases of HRGP glycosylation, notably the proline-rich HRGP (PHRGP) from Douglas fir (Pseudotsuga menziesii). Its Hyp residues are only lightly arabinosylated, arabinogalactan polysaccharide is absent, and the polypeptide backbone is largely made up of the simple tandem repeat: Lys-Pro-Hyp-Val-Hyp-Val-Ile-Pro-Pro-Hyp-Val-Val-Lys-Pro-Hyp-Hyp-Val-Tyr (21) , with low Hyp contiguity.
Unique problems arise,
however, in attempts to identify the precise sites of
Hyp-arabinosylation as these involve a base-stable linkage to a
γ-carbon (1, 2, 15), which is unlike
glycosylation of seryl or threonyl residues, whose base-labile
β-linkage permits
β-elimination of the carbohydrate with the
concomitant conversion of Ser or Thr to another amino
acid(26) . Nor can direct Edman degradation determine the site
specifics of arabinosylation, for unlike other glycosylated amino acid
residues, which can often be inferred from blank cycles(27) ,
the trifluoroacetic acid-phenylthiohydantoin cleavage step hydrolyzes
the labile arabinofuranosides removing the distinction between
arabinosylated and nonarabinosylated Hyp. These complications led us to
consider using tandem mass spectrometry (MS/MS) to elucidate the
arabinosylation site specifics of the PHRGP.
MS/MS using collisionally induced dissociation (CID) has become increasingly popular for characterizing the posttranslational modifications of proteins and peptides(28, 29, 30, 31, 32, 33) ; however, it also has drawbacks, as the more hydrophilic components of a sample, such as glycopeptides, frequently are not detected due to a low ion yield (32, 33, 34) . Furthermore, glycopeptides preferentially cleave at O-glycosidic bonds, while the peptide backbone remains intact(29, 33) , which precludes the precise identification of glycosylation sites when more than one potential glycosylation site is present. Thus, the sequencing of glycopeptides by MS/MS typically requires chemical degradation or modification of the glycopeptide before analysis. Recently, however, Medzihradszky et al.(29) reported that MS/MS identified both the peptide sequence and carbohydrate attachment site of a purified, underivatized glycopeptide containing a single hexose residue. This demonstrated the potential of MS/MS for the structural characterization of intact, underivatized glycopeptides, which we have extended to the unfractionated Pronase digests of the gymnosperm glycoprotein, PHRGP.
SDS-PAGE gave a molecular
weight of 97,400 for PHRGP(21) , while size exclusion
chromatography gave a molecular mass of about 669 kDa;
however, neither of these methods provides reliable estimates of HRGP
size, judging from visualization of HRGP monomers by electron
microscopy and polypeptide size as deduced from cDNA
clones(9, 13, 39, 40, 41) .
Indeed, HRGPs behave as if they are much larger than the equivalent
globular protein. Their asymmetric rodlike character arises from a high
pyrrolidine ring content and is further reinforced by glycosylation
(10, 13-16, 22-25, 42, 43). These conformational
constraints maintain the extended conformation, which characterizes all
HRGPs examined(10, 22, 39, 43, 44, 45) and
probably explains their anomalous behavior on gel
filtration(9, 13) . On the other hand, SDS-PAGE
probably overestimates PHRGP molecular weight because Lys-rich
polycations, such as HRGPs, reduce the overall negative charge due to
bound SDS. Glycosylation may further contribute to this effect by
sterically restricting the amount of peptide-bound SDS.
In contrast
to these conventional methods, which have a mass accuracy of only
± 5-10% for the average globular protein(46) ,
MALDI-TOF MS is a straightforward, sensitive (to picomolar levels of
protein), and accurate (0.1-0.2%; (46) and (47) ) method for measuring molecular mass, as the measurements
are based on mass and charge. Here, we report that MALDI-TOF mass
spectra of glycosylated and HF-deglycosylated PHRGP gave molecular
masses of 73,186 ± 146 Da and 53,953 ± 108 Da,
respectively (Fig. 1). These estimates were significantly
smaller than those obtained by SDS-PAGE or gel filtration and imply
that other estimates of HRGP molecular size based exclusively on
SDS-PAGE or size exclusion chromatography (48) need to be
reevaluated.
Figure 1: MALDI-TOF mass spectrometry of glycosylated (a) and HF-deglycosylated (b) PHRGP. The MALDI-TOF mass spectrum of glycosylated PHRGP (a) contained broad peaks corresponding to the triply (M + 3H)³⁺, doubly (M + 2H)²⁺, and singly charged (M + H)⁺ molecular ions at m/z 24538.5, 36415.1, and 73113.2, respectively. b, the spectrum of deglycosylated PHRGP also showed three peaks at m/z 18044.2, 26945.8, and 53834.5, corresponding to the triply, doubly, and singly charged species, respectively.
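The relationship between the peak positions in Fig. 1 and the molecular mass can be illustrated with a short calculation. The sketch below is not the procedure used for the reported values. It assumes that each peak is a protonated ion carrying z charges, so that M = z × (m/z − proton mass), and it averages the three charge states, which is an assumption made here for illustration only. The function and variable names are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da

def neutral_mass(mz, z):
    """Neutral mass M of an ion observed at m/z with z attached protons."""
    return z * (mz - PROTON)

def mean_mass(peaks):
    """Average the neutral masses inferred from each charge state.

    peaks maps the charge z to the observed m/z value.
    """
    masses = [neutral_mass(mz, z) for z, mz in peaks.items()]
    return sum(masses) / len(masses)

# Peak positions taken from the Fig. 1 legend.
glycosylated = {1: 73113.2, 2: 36415.1, 3: 24538.5}
deglycosylated = {1: 53834.5, 2: 26945.8, 3: 18044.2}

print(round(mean_mass(glycosylated)))
print(round(mean_mass(deglycosylated)))
```

Under these assumptions the two averages fall within the uncertainties quoted in the text for the glycosylated (73,186 ± 146 Da) and deglycosylated (53,953 ± 108 Da) PHRGP, although the broad peaks mean that the individual charge states differ from one another by several hundred daltons.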
Interestingly, the size distribution of the
arabinooligosaccharides is skewed and seems non-random, as we estimate
from PHRGP Hyp-arabinoside profiles (21; cf. Table 1, column 1) and the empirical formula that PHRGP
contains a single Hyp-Ara, 40 Hyp-Ara, 10 Hyp-Ara, and 14 Hyp-Ara, with 82 Hyp residues
nonarabinosylated. In order to determine if the arabinosylated Hyp
residues occurred randomly throughout the protein, or in accordance
with the Hyp contiguity hypothesis, we proteolytically degraded the
PHRGP into small glycopeptides and characterized the arabinosylation
sites biochemically and by MS/MS (see flow chart, Fig. 2).
Figure 2: Experimental flow chart. We digested glycosylated PHRGP with Pronase, then analyzed aliquots of the unfractionated digest by electrospray ionization (ESMS/MS) and fast atom bombardment tandem mass spectrometry (FABMS/MS). To corroborate the results from MS analyses of the Pronase digest and determine the percent arabinosylation of each glycopeptide, we also purified the major peptide and glycopeptide components by a combination of hydrophilic interaction (HILIC) and reverse-phase chromatography. The purified peptide H2P1 and glycopeptides H3P2 and H4P1 were then structurally characterized by Edman degradation, sugar analyses, and mass spectrometry (ESMS and ESMS/MS).
Figure 3: The PHRGP Pronase digest analyzed by electrospray ionization mass spectrometry. We freeze-dried aliquots of the digest to remove ammonium bicarbonate, dissolved the residue in the appropriate matrix solution (see "Materials and Methods"), and then analyzed the solutions by both CF-FABMS (not shown) and ESIMS (above). We selected the ions common to both spectra (i.e. ions at m/z 439, 701, 833, and 965) for further analysis by CID and MS/MS.
Figure 4: Nomenclature of fragment ions produced by high energy CID analysis of molecular ion m/z 965, Lys-Pro-[Ara₃]Hyp-Hyp-Val. The nomenclature used here to describe the PHRGP peptide and glycopeptide fragment ions combines the Roepstorff and Fohlman system (49) for peptides (as modified by Biemann; 50) with that proposed by Costello and Vath (51) for glycoconjugates and oligosaccharides. a, fragment ions designated with lowercase letters (a, b, and y) originate from cleavage of the peptide backbone only, with a and b ions arising from the N terminus, and y ions from the C terminus. Subscript numbers indicate the residue at which cleavage occurred, numbered upward from the respective terminus. a and b, uppercase letters (X and Y) designate ions arising from oligosaccharide fragmentation only (without cleavage of the peptide backbone), with the charge retained on the "reducing" end (i.e. sugar fragments attached to the peptide). Again, subscripts number the sugar residues from the reducing end, while the superscripts preceding X ions (e.g. X) define cleavages of carbon-carbon or carbon-oxygen bonds within the arabinosyl ring (51). yY ions in a arise from fragmentation of both the peptide backbone and the sugar side chain.
Figure 5: MS/MS analyses and corresponding fragment ion series of PHRGP Pronase glycopeptide, Lys-Pro-[Ara₃]Hyp-Hyp-Val, (M + H)⁺ = 965. Both high energy CID (a) and low energy CID (b) of molecular ion m/z 965 gave similar spectra, indicating that the glycopeptide contained a triarabinosyl chain at Hyp-3. However, only the high energy spectrum contained ions y₁ (Val) and a₄ (Lys-Pro-Hyp-Hyp + 3 Ara), which distinguished Hyp rather than Val at position 4 of the sequence (a and c). High energy CID also produced X fragment ions at m/z 905, 875, 861, 729, and 597, which originated from cleavage within the arabinosyl rings. Because of its intensity, the molecular ion ((M + H)⁺ = 965) was not included in the high energy spectrum. Fragment ions m/z 922 and 551 correspond to the molecular ion minus the valine side-chain (-43 atomic mass units) and the deglycosylated peptide minus water (Y₀ - 18 atomic mass units), respectively. b, the low energy CID spectrum lacked ions defining the sequence of the peptide C terminus, that is, the ions here correspond to either peptide sequence, Lys-Pro-Hyp-Val-Hyp or Lys-Pro-Hyp-Hyp-Val. Fragment ions y and yY indicate Hyp-3 is the arabinosylation site. c, the fragment ion series arising from high energy CID of molecular ion m/z 965. The masses corresponding to the fragment ion series are listed either above (y, Y, and yY ions), or below (b, a, and bY ions) the glycopeptide. The peptide fragmentation site of ion a₂ (m/z 198) is not shown in c.
Cleavage of the arabinosyl
rings occurred only in the FAB mass spectrometer (i.e. high
energy CID), giving rise to fragment ions at m/z 905, 875, 861,
729, and 597, which correspond to the X ion series (Fig. 4b and Fig. 5a).
The peptide sequence itself was established only by the high energy
CID spectrum, as the similar, but less complex low energy CID analysis (Fig. 5b) was ambiguous regarding the peptide sequence.
However, low energy CID produced fragment ions y₂ (Hyp-Val)
and yY ([Ara]Hyp-Hyp-Val),
corroborating Hyp-3 as the glycosylation site.
Figure 6: MS/MS analyses of PHRGP Pronase glycopeptide, Lys-Pro-Hyp-Val-[Ara]Hyp, (M + H)⁺ = 701. Both high energy (a) and low energy (b) CID spectra of molecular ion m/z 701 contained fragment ions which determined its peptide sequence, Lys-Pro-Hyp-Val-Hyp, and monoarabinosylation site at Hyp-5. The diagnostic ions for the peptide sequence were b₄ (m/z 438) and the internal fragment ion Pro-Hyp-Val at m/z 310, while fragment ion y determined Hyp-5 as the arabinosylation site. Because of its intensity, the molecular ion ((M + H)⁺ = 701) was not included in the high energy spectrum. c, fragment ions and masses which belong to the same ion series are listed in rows above (y, yY, and Y ions), or below (a and b ions) the glycopeptide sequence. The peptide fragmentation sites of the a ion series are not shown in c.
Fragment ions corresponding to
the cleavage of the sugar ring occurred only in high energy CID (Fig. 6a). That is, the ions at m/z 611
([M + 1 - 90]⁺) and 597
([M + 1 - 104]⁺)
correspond to two X fragments originating from
cleavage of carbon-carbon and carbon-oxygen bonds of the glycosyl ring (cf. Fig. 4b).
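The ring-cleavage ions quoted for both glycopeptides can be compared by expressing each as a neutral loss from its molecular ion. The following sketch uses only the nominal m/z values given in the text (precursors 701 and 965) and a nominal arabinosyl residue mass of 132. Splitting each loss into whole arabinosyl residues plus a remainder is an illustrative device introduced here, the function name is hypothetical, and no attempt is made to assign the specific X-ion superscripts.

```python
ARA = 132  # nominal mass of one arabinosyl residue (C5H8O4)

def decompose_loss(precursor, fragment):
    """Split a neutral loss into (whole arabinosyl residues, remainder)."""
    return divmod(precursor - fragment, ARA)

# Nominal m/z values of the ring-cleavage ions reported for each precursor.
ring_cleavage_ions = {
    701: [611, 597],
    965: [905, 875, 861, 729, 597],
}

for precursor, fragments in ring_cleavage_ions.items():
    for fragment in fragments:
        n_ara, remainder = decompose_loss(precursor, fragment)
        print(precursor, fragment, n_ara, remainder)
```

On these nominal values the remainders are 60, 90, or 104 mass units in every case. The ions at m/z 729 and 597 from the m/z 965 precursor correspond to the 104-unit loss combined with the loss of one or two intact arabinosyl residues, which would be consistent with ring cleavage at successive residues of the triarabinoside.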
Consistent with the Hyp-arabinoside profile of H3P2, the ES mass spectrum (not shown) yielded three molecular ions, m/z 965, 833, and 701, corresponding to the tri-, di-, and monoarabinosylated glycoforms, respectively, which apparently cochromatographed on the HILIC and reverse-phase columns. The low energy CID spectrum of H3P2 (not shown) corroborated the structures that had been deduced by CID analyses of the molecular ions m/z 965 and 833 obtained from the unfractionated Pronase digest (cf. Fig. 3 and Fig. 5). Thus, Lys-Pro-Hyp-Hyp-Val is always glycosylated at Hyp-3, and usually with a triarabinoside. Such consistent and specific arabinosylation of dipeptidyl-Hyp is also in accordance with the Hyp contiguity hypothesis.
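The assignment of m/z 965, 833, and 701 to the tri-, di-, and monoarabinosylated glycoforms can be checked by summing residue masses. The sketch below uses standard monoisotopic residue masses and the one-letter code "O" for Hyp, a notation adopted here for illustration. The function name is hypothetical, and the calculation is a consistency check rather than a reconstruction of how the spectra were interpreted.

```python
# Monoisotopic residue masses in Da (standard values; "O" denotes Hyp here).
RESIDUE = {
    "K": 128.09496,  # Lys
    "P": 97.05276,   # Pro
    "O": 113.04768,  # Hyp
    "V": 99.06841,   # Val
    "I": 113.08406,  # Ile
}
WATER = 18.01056
PROTON = 1.00728
ARA = 132.04226  # arabinosyl (anhydro-pentose) residue, C5H8O4

def protonated_mass(sequence, n_ara=0):
    """Monoisotopic (M + H)+ of a peptide carrying n_ara arabinosyl residues."""
    peptide = sum(RESIDUE[aa] for aa in sequence) + WATER
    return peptide + n_ara * ARA + PROTON

# Lys-Pro-Hyp-Hyp-Val with one, two, or three arabinosyl residues.
for n_ara in (1, 2, 3):
    print(n_ara, round(protonated_mass("KPOOV", n_ara), 2))
```

The three values round to 701, 833, and 965. Because Lys-Pro-Hyp-Val-Hyp has the same composition, its glycoforms have identical masses, which is why the attachment site had to be established from fragment ions rather than from the molecular ions alone. The same function gives a value that rounds to 439 for unglycosylated Ile-Pro-Pro-Hyp, matching the fourth ion selected in Fig. 3.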
Our recent structural work led us to suggest that
Hyp-glycosylation is not random, but follows simple rules, such as Hyp
contiguity(21) . To test the Hyp contiguity hypothesis, we
needed to sequence a Hyp-rich polypeptide and identify the glycosyl
substituents at each position. For reasons already stated, the
determination of HRGP arabinosylation site specifics is of great
interest, albeit a non-trivial task, as it requires the correct
assignment of mono-, di-, tri-, and tetra-arabinosides, or lack
thereof, to each Hyp residue in an HRGP sequence. We have virtually
achieved this, as the glycopeptides characterized here represent the
bulk of the PHRGP glycosylated sequences; we calculate from peptide and
sugar recoveries that glycopeptides H3P2 and H4P1 together contained
89% of the PHRGP arabinose residues (i.e. 113 of the 127
residues estimated from the MALDI-TOF mass spectra data). The remaining
arabinose, as well as galactose and the amino acids Ser, His, Arg, and
Thr, which are minor components of the PHRGP, apparently occurs in the
minor HILIC peptides which we did not characterize. Thus, the Douglas
fir PHRGP is not only the first HRGP to be weighed by mass
spectrometry, but also the first for which it has been possible to
define the peptide sequences surrounding the major arabinosylation
sites, pinpoint the precise Hyp residues which are arabinosylated, and
also determine both the frequency of arabinosylation and arabinoside
chain lengths at those sites (Fig. 7 and Fig. 8).
Figure 7: A correlation between Hyp contiguity and Hyp-arabinosylation. The sequences Lys-Pro-Hyp-Val-Hyp (H4P1) and Lys-Pro-Hyp-Hyp-Val (H3P2) are peptide structural, or positional, isomers that differ in the arrangement of their Hyp residues. The Hyp-arabinoside profile of H4P1 (Table 1) indicated that 10% of its total Hyp residues were monoarabinosylated, while CID indicated that arabinosylation occurred only on Hyp-5. Thus, Hyp-5 in the repetitive sequence Lys-Pro-Hyp-Val-Hyp is monoarabinosylated 20% of the time. The Hyp-arabinoside profile of H3P2 (Table 1) indicated that half of the Hyp residues of the sequence Lys-Pro-Hyp-Hyp-Val were arabinosylated predominantly with the triarabinoside, while CID pinpointed Hyp-3 as the only glycosylation site.
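The step from the Hyp-arabinoside profile to the per-site frequency in the Fig. 7 legend is simple arithmetic, sketched below. The sketch assumes that each peptide contains two Hyp residues and a single arabinosylation site, as the CID results indicate. The function and parameter names are hypothetical.

```python
def site_occupancy(fraction_hyp_arabinosylated, hyp_per_peptide, sites_per_peptide=1):
    """Fraction of peptide copies in which each arabinosylation site is occupied.

    fraction_hyp_arabinosylated: share of all Hyp residues carrying arabinose.
    hyp_per_peptide: number of Hyp residues in the peptide.
    sites_per_peptide: number of Hyp positions that are ever arabinosylated.
    """
    arabinosylated_hyp_per_peptide = fraction_hyp_arabinosylated * hyp_per_peptide
    return arabinosylated_hyp_per_peptide / sites_per_peptide

# H4P1: 10% of total Hyp is arabinosylated, only Hyp-5 is ever a site.
h4p1 = site_occupancy(0.10, hyp_per_peptide=2)
# H3P2: half of total Hyp is arabinosylated, only Hyp-3 is ever a site.
h3p2 = site_occupancy(0.50, hyp_per_peptide=2)

print(f"H4P1 Hyp-5 occupancy: {h4p1:.0%}")
print(f"H3P2 Hyp-3 occupancy: {h3p2:.0%}")
```

With these inputs Hyp-5 of Lys-Pro-Hyp-Val-Hyp is occupied in 20% of copies, as stated in the legend, while Hyp-3 of Lys-Pro-Hyp-Hyp-Val is occupied in every copy, in line with the conclusion that this site is always arabinosylated.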
Figure 8: Proposed structure of the PHRGP major repetitive glycopeptide. Three peptide sequences, H4P1, H2P1, and H3P2 (underlined) and their glycoforms, comprise the bulk of the PHRGP and occur as part of a larger 18-residue tandem repeat characterized previously(21) . Featured here is the dominant PHRGP glycomotif; however, some variation occurs in the chainlength of the arabinoside (from 1 to 3 residues), and occasionally the single Hyp located between the 2 valine residues (Hyp-5 of H4P1) is monoarabinosylated.
Pronase cleavage of PHRGP yielded three major glycopeptides, which corresponded to glycoforms of the peptide positional isomers, Lys-Pro-Hyp-Val-Hyp and Lys-Pro-Hyp-Hyp-Val, and thereby allowed us to test the Hyp contiguity hypothesis. The results were consistent with the hypothesis and showed that extensive arabinosylation occurred only for a contiguous Hyp residue, while non-contiguous Hyp is arabinosylated only occasionally, i.e. Hyp-5 of Lys-Pro-Hyp-Val-Hyp, or not at all, as in Ile-Pro-Pro-Hyp. Why Hyp-3 but not Hyp-4 is the inevitable arabinosylation site in H3P2, while Hyp-5 is only an occasional site in H4P1, remains for future work. However, Fig. 8 shows that the Hyp-3 triarabinoside of H3P2 is distal rather than proximal to the Val-Tyr-Lys motif, which in dicot extensins is a putative intermolecular cross-link site (14); the Hyp-3 triarabinoside is therefore far less likely to sterically hinder cross-link formation than a Hyp-5 triarabinoside. Currently, however, there is no definitive evidence for PHRGP cross-linking, either in vitro or in muro.
Such precise Hyp-arabinosylation suggests a sequence-dependent, rather than a conformation-dependent, enzymic mechanism, as previously suggested for O-Thr/Ser glycosylation (52) and proline hydroxylation(39) . Judging from the number of different arabinosyl linkages in wall proteins (19, 53) and polysaccharides(54) , an arsenal of arabinosyl transferases in plant cells also includes sequence-specific glycosyl transferases.
Remarkably, we captured most of the above structural information in
CID spectra of underivatized glycopeptides present in unfractionated
PHRGP Pronase digests. This was possible in part because the simple
repetitive PHRGP polypeptide backbone produced only a few molecular
ions, greatly simplifying MS/MS analyses. However, composition is also
a critical factor as the pyrrolidine rings of Hyp and Pro impose
conformational constraints which probably lower the dissociation energy
of nearby peptide bonds(55) . Thus, HRGPs seem to be uniquely
tailored for structural analysis by CID and tandem mass spectrometry.
Assuming other HRGPs also fragment readily at Hyp and Pro residues
while retaining their saccharide substituents, it may be possible to
determine the glycosylation site specifics of any extensin-HRGP family
member, including the highly arabinosylated
Ser-Hyp-containing extensins, as well as the AGPs and gums
which contain both arabinosides and polysaccharides O-linked
to Hyp. The Hyp contiguity hypothesis is, therefore, a step toward the
elucidation of more precise Hyp glycosylation codes, ultimately leading
to a complete description of the HRGP molecular topographies that may
be involved in molecular recognition, self-assembly, and morphogenesis
of the extracellular matrix.