(Received for publication, September 26, 1994; and in revised form, November 29, 1994)
Electrospray mass spectrometry was used to accurately measure the molecular masses of single chain lectins from legume seeds and also of three recombinant lectins, expressed in Escherichia coli. The five single chain lectins, Erythrina corallodendron lectin, soybean and peanut agglutinins, Dolichos biflorus lectin, and Phaseolus vulgaris hemagglutinin E, all showed evidence of C-terminal proteolytic processing, in some cases to "ragged" ends, when their masses were compared to those expected from their cDNA sequences and their known carbohydrate chains. Recombinant forms of the lectins from E. corallodendron, soybean, and peanut also showed C-terminal trimming, but not to the same points as the natural forms. Discrepancies between the protein and cDNA sequences of the E. corallodendron lectin were resolved by combined liquid chromatography-mass spectrometry peptide mapping and protein sequencing experiments, and the presence of a second glycosylation site was demonstrated. Our data show that all of these lectins undergo C-terminal proteolytic processing of a readily attacked peptide segment. This trimming is frequently imprecise, and the resulting heterogeneity may be a major contributor to the appearance of isolectin forms of these proteins.
Several types of plant seed proteins are known to undergo post-translational proteolysis beyond the usual removal of signal peptides (1). The nature of this processing can be seen when amino acid sequences deduced from the corresponding cDNAs are compared with the sequences of the mature proteins obtained by protein sequencing. In the family of the legume lectins, the most complex, and most investigated, processing occurs in concanavalin A (ConA). A central glycopeptide is excised from the proprotein and a C-terminal peptide is cut off (2, 3). Approximately two-thirds of the protein molecules also undergo a peptide ligation, in which the N-terminal residue of the first fragment is fused to the C-terminal residue of the second fragment (2). This leaves these molecules with N- and C-terminal residues that were originally in the central region of the proprotein. The enzyme responsible for these events is an Asn-specific protease which has recently been isolated from jackbean seeds (4).
A less complex process occurs in lectins of the Vicieae tribe, which include the pea lectin, lentil lectin, favin, and the various Lathyrus lectins. As exemplified by the pea lectin (5), they are also cleaved into two chains with removal from the proprotein of an internal peptide and a C-terminal peptide. In these lectins, the internal cleavage sites are not as centrally located in the proprotein as those in ConA, and there is no peptide ligation. The resulting α- and β-chains are approximately 6,000 and 20,000 Da, respectively.
The majority of legume lectins do not undergo internal proteolytic cleavages, and hence retain a single chain form. However, they may undergo C-terminal clipping, since the cDNA-derived sequences often indicate longer proteins than are found in direct sequencing of the lectins. Examples include the Dolichos biflorus lectin, whose cDNA sequence corresponds to a protein of 253 residues (6) although protein sequencing indicated a mixture of this species and one of only 243 residues (7), and peanut agglutinin, with 250 residues predicted from the cDNA (8) and 236 residues found by protein sequencing (9). Since the C-terminal regions of some plant proteins are involved in targeting of these proteins to vacuoles (10), this behavior may be of biological significance. However, to establish that the single chain lectins are indeed being post-translationally processed, rather than peptides from this region having been lost during peptide mapping etc., it is necessary to determine the C-terminal residue(s) in the mature seed proteins or their exact molecular size. While there are several chemical methods for C-terminal identification and/or sequencing, mass spectrometric determination of the molecular mass is particularly suitable for this purpose.
Recently, mass spectrometric ionization methods have been developed that allow the measurement of the molecular masses of polypeptide chains to accuracies of ±0.01%. One of these methods, termed electrospray, involves the formation of a family of multiply charged ions from the polypeptide molecules (11, 12). The mass of the polypeptide can be obtained from the m/z values of all the ions arising from a given species. In samples with several polypeptide components, mass data for each can be obtained, but the ion currents will not necessarily reflect the true relative amounts of the various species. In particular, under our operating conditions in positive ion mode, the smaller polypeptides of a mixture are often more readily ionized than the larger ones.
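The way a mass is recovered from a family of multiply charged ions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the deconvolution routine used for the spectra in this paper: it assumes positive ion mode with every charge carried by a proton, and it assumes the supplied peaks are adjacent charge states of a single species. The function names and the 27,000 Da test mass are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da


def simulate_series(mass, charges):
    """m/z values of the [M + zH]z+ ions for the given charge states."""
    return [(mass + z * PROTON) / z for z in charges]


def charge_of_higher_peak(mz_low, mz_high):
    """Charge of the ion at mz_high, assuming the ion at mz_low carries one more proton.

    From mz_high = (M + z*p)/z and mz_low = (M + (z+1)*p)/(z+1) it follows that
    z = (mz_low - p) / (mz_high - mz_low).
    """
    return round((mz_low - PROTON) / (mz_high - mz_low))


def deconvolute(peaks):
    """Average neutral mass from a series of adjacent charge-state peaks."""
    peaks = sorted(peaks)
    masses = []
    for mz_low, mz_high in zip(peaks, peaks[1:]):
        z = charge_of_higher_peak(mz_low, mz_high)
        masses.append(z * (mz_high - PROTON))  # M = z * (m/z - p)
    return sum(masses) / len(masses)


# Hypothetical species of 27,000 Da observed with charges 18+ to 23+.
series = simulate_series(27000.0, range(18, 24))
print(round(deconvolute(series), 2))
```

Each adjacent pair of peaks gives an independent estimate of the mass, which is why using all the ions of a species improves the precision of the result.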
We have applied this method to locate the exact sites of proteolytic cleavage in several legume seed lectins, including recombinant forms of three of them, expressed in Escherichia coli. The C-terminal residues were assigned by comparison of the mass data with the masses expected from various cleavage positions in the amino acid sequence predicted from the cDNA, and the known carbohydrate structures, in those lectins that are glycoproteins (13). The results show that C-terminal truncation to "ragged" ends is common in both the natural and recombinant forms of these lectins. The resulting heterogeneity is responsible in large part for the appearance of isolectin forms in many legume lectins.
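The assignment logic described above, comparing a measured mass with the masses expected for successive C-terminal cleavage positions, might be sketched as follows. This is a minimal illustration and not the procedure actually used: the residue masses are commonly tabulated average values, the sequence is made up, and the `extra` parameter (for example, the mass added by a carbohydrate chain), the tolerance, and the function names are all assumptions.

```python
# Commonly tabulated average residue masses (Da); treat as approximate.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once per chain for the free N and C termini


def chain_mass(seq, extra=0.0):
    """Average mass of a polypeptide plus any extra mass (e.g. a glycan)."""
    return sum(AVG[aa] for aa in seq) + WATER + extra


def assign_c_terminus(seq, observed, extra=0.0, tol_fraction=2e-4, max_trim=20):
    """Return (last residue number, expected mass) for each truncation that fits."""
    hits = []
    for trim in range(max_trim + 1):
        end = len(seq) - trim
        if end < 1:
            break
        expected = chain_mass(seq[:end], extra)
        if abs(expected - observed) <= tol_fraction * observed:
            hits.append((end, expected))
    return hits


# Made-up 30-residue sequence; pretend the measured species ends at residue 27.
seq = "GSTVLNDAKFYEPWQRHTVSLADNGEIPSA"
observed = chain_mass(seq[:27]) + 0.1  # small simulated measurement error
print(assign_c_terminus(seq, observed))
```

Because the lightest residue (Gly) weighs about 57 Da, neighbouring truncation points differ by far more than the tolerance, so a single mass usually picks out a single cleavage position.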
For LC-MS analyses, an HP1090L liquid chromatograph (Hewlett Packard), equipped with a ternary DR5 solvent delivery system and a 2.1 mm inner diameter × 25 cm Vydac 218TP52 column (Vydac Separation Group, Hesperia, CA), was coupled directly to the mass spectrometer via the IonSpray® interface. Injections of 20 µl were made, and the effluent from the column was split such that a flow rate of approximately 15 µl/min was introduced to the mass spectrometer. The voltage on the ionspray needle was maintained at 5 kV. High purity air was used as nebulizing gas at an operating pressure of 60 pounds/square inch. The LC-MS analyses were performed in full-scan mode (m/z 500-1500) using dwell times of 4 ms/Da.
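As a rough illustration of what these scan settings imply, the sketch below estimates the time per scan and lists which charge states of a peptide would fall inside the scanned window. It assumes that "4 ms/Da" means 4 ms spent on each dalton of the m/z range and that ions are protonated; the 3,000 Da peptide and the function names are hypothetical.

```python
import math

PROTON = 1.00728  # mass of a proton in Da


def scan_time_s(mz_start, mz_end, dwell_ms_per_da):
    """Time for one full scan, in seconds, at a fixed dwell time per Da."""
    return (mz_end - mz_start) * dwell_ms_per_da / 1000.0


def observable_charges(mass, mz_start, mz_end):
    """Charge states z for which mass/z + PROTON lies within the scan window."""
    z_min = max(1, math.ceil(mass / (mz_end - PROTON)))
    z_max = math.floor(mass / (mz_start - PROTON))
    return list(range(z_min, z_max + 1))


print(scan_time_s(500, 1500, 4))              # seconds per scan
print(observable_charges(3000.0, 500, 1500))  # charge states inside m/z 500-1500
```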
Figure 1: Comparison of LC-MS profiles of CNBr digests of seed ECorL (lower panel) and recombinant ECorL (upper panel).
Figure 2: Comparison of LC-MS profiles of tryptic digests of seed ECorL (lower panel) and recombinant ECorL (upper panel).
Figure 3: Consolidated sequence data for seed and recombinant ECorL. The bars indicate the segments checked by automated N-terminal sequencing, the triangles mark glycosylation sites, and the superimposed Q marks the differences in the seed protein.
The DNA sequence of the recombinant clone (17) was also redetermined. The seed protein amino acid sequence is now consistent with the DNA sequence, with the exception of 1 residue, position 117, which remains Pro in the seed and Gln in the recombinant form. The change of residue Phe113 to Asn introduces a second potential glycosylation site, and the mass data and amino acid sequencing (i.e. low yield of Asn at this cycle) confirmed that this site is glycosylated. The mass data are consistent with the same dominant glycoform previously reported for the protein (13) occurring at the Asn113 site as well.
Taking into account the presence of two carbohydrate units, the mass data for the seed lectin (Fig. 4) could be assigned from the corrected sequence (Table 2). The protein masses for the commercial seed sample and one recombinant sample (Fig. 5) showed that C-terminal proteolytic processing had occurred at residue 242 in both forms, with a minor cleavage at residue 241, causing losses of 13 or 14 residues, respectively. The heterogeneity of the carbohydrate, i.e. the presence or absence of the fucose residue, also contributed to the multiple forms of the seed lectin. A second recombinant sample showed no C-terminal processing but was heterogeneous in that not all the molecules had lost the N-terminal Met residue introduced by the expression vector. The seed lectin sample from Rehovot showed pairs of peaks suggesting genetic heterogeneity, although its N-terminal sequence was the same as that of the other samples.
Figure 4: Mass spectrum of seed ECorL, with its deconvolution (inset).
Figure 5: Mass spectrum of recombinant ECorL, with its deconvolution (inset).
Figure 6: Mass spectrum of seed SBA, with its deconvolution (inset).
Figure 7: Mass spectrum of recombinant SBA, with its deconvolution (inset).
Figure 8: Mass spectrum of recombinant PNA, with its deconvolution (inset).
Figure 9: Mass spectrum of DBL, with its deconvolution (inset).
Figure 10: Mass spectrum of PHA-E, with its deconvolution (inset).
It is evident from the mass spectrometry that the differences in lectin chain lengths seen in comparisons of sequences derived from cDNA and protein sequencing are due to C-terminal proteolytic processing, as we had previously shown for peanut agglutinin (20). Furthermore, for the majority of the lectins examined here, the processing is imprecise, resulting in multiple forms. Although heterogeneity could arise in other ways than post-translational proteolysis to "ragged ends," such as genetic isoforms and glycan variants, the multiple protein species found here were assignable in nearly all cases to particular combinations of the reported dominant glycoforms (13) with various truncated polypeptide chains, within the accuracy of the method, 0.01-0.02%. Information about the glycan structures was essential for interpretation of the results.
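The idea of assigning each observed species to a combination of a truncated chain and a glycoform can be illustrated with a short sketch. All masses below are invented placeholders except the fucose residue mass of about 146.14 Da; the labels, the 0.02% tolerance default, and the function names are assumptions made for the example only.

```python
from itertools import product

FUCOSE = 146.14  # average residue mass of fucose (a deoxyhexose), Da


def combinations(chain_masses, glycan_masses):
    """Summed mass for every (chain label, glycan label) pairing."""
    return {
        (chain, glycan): chain_mass + glycan_mass
        for (chain, chain_mass), (glycan, glycan_mass)
        in product(chain_masses.items(), glycan_masses.items())
    }


def assign(observed, combos, fraction=2e-4):
    """Labels of the combinations lying within the fractional tolerance."""
    tol = observed * fraction
    return sorted(key for key, mass in combos.items() if abs(mass - observed) <= tol)


# Placeholder masses, not values from this study.
chains = {"1-242": 26500.0, "1-241": 26385.0}
glycans = {"with Fuc": 1200.0, "without Fuc": 1200.0 - FUCOSE}
combos = combinations(chains, glycans)

observed = 26500.0 + 1200.0 - FUCOSE + 1.0  # simulated measurement
print(assign(observed, combos))
```

At roughly 27,000 Da, a tolerance of 0.02% corresponds to about 5 Da, which is small compared with the mass of either a fucose residue or a single amino acid residue, so the combinations in this toy example do not overlap.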
The heterogeneity that the imprecise processing produces must be a major contributing factor to the isolectin band patterns commonly seen in disc-gel electropherograms of legume lectins, for example PNA (25). Recently, it has been shown that pea lectin isoforms are attributable to C-terminal heterogeneity (26), and SBA isoforms have been purified by chromatofocusing and shown to consist of mixtures of species with different masses (27).
The fact that so many of the seed lectins showed C-terminal trimming argues that this is occurring naturally in the seed, rather than being an artifact of the purification procedure. The lectins purified in this laboratory were extracted from milled seeds at 4 °C, followed by ammonium sulfate fractionation and affinity chromatography. Results with samples obtained commercially as well as sequence data from other laboratories confirm that the C-terminal truncation is common to samples of different origins. Information on intrinsic protease levels in these legume seeds is scanty, but it is interesting that in the case of the soybean, where intrinsic proteases have been found (28), there was actually less C-terminal processing of the agglutinin, i.e. more intermediate-length species. However, extraction experiments in the presence of a mixture of protease inhibitors should be performed.
The points at which processing occurred in the natural and recombinant lectins are summarized in Fig. 11. Most of the single chain lectins lost 10 or more residues. The truncation generally stops at a similar point in the proteins when their sequences are aligned, i.e. the lower final C-terminal residue number for PNA compared to the other four lectins mainly arises from earlier deletions. There was no strongly preferred amino acid at the cleavage sites, but Asp was the most common. The number of residues removed is rather high to attribute to carboxypeptidase action alone, hence endoproteases may also be participating in the C-terminal processing of the single chain lectins. It is possible that the seeds from which the single chain lectins were derived also contain Asn-specific proteases of the kind found in jackbean (4), but these may not be able to cleave the lectins internally to create two-chain forms. An Asn-specific protease has been reported in soybeans (28), and we found that PNA and PHA-E were not cleaved when they were incubated with a commercial sample of the jackbean Asn-specific protease (4). There is 1 Asn residue in the tail segment of DBL, 2 Asn in ECorL and PNA, and 3 Asn in PHA-E. Only SBA lacks Asn in this region and, as noted above, SBA has less complete trimming than the other lectins.
Figure 11: Post-translational cleavage sites in the lectins deduced from the mass spectra. The sequence data are from the cDNA results for ECorL (16), PNA (8), SBA (22), DBL (6), and PHA-E (24). A minor cleavage point is indicated with a dashed arrow.
Remarkably, the recombinant proteins were also post-translationally processed, but at different sites from their seed counterparts. Since the two samples of recombinant ECorL showed different amounts of processing, it is evident that this may depend upon the handling of the sample. In the case of PNA, even more residues were removed in the recombinant form. Again, it is not possible to say if endoproteases or a carboxypeptidase are responsible. A carboxypeptidase with preference for cleavage at Ala and Val has recently been reported in E. coli (29). This enzyme is responsible for C-terminal truncation of penicillin-binding protein 3 and λ-repressor protein. While it might well participate in the C-terminal truncation of the recombinant lectins, there are several other proteases present even in E. coli strains selected for low proteolytic activity. Recombinant forms of ConA (30), pea lectin (31), and PHA-L (32) have also been expressed in E. coli, and the first two were not processed into two chains from their proprotein forms. C-terminal trimming of these proteins may also be occurring since the small molecular weight differences would be within the errors of SDS-polyacrylamide gel electrophoresis estimates. In the PHA-L case, additional bands were seen that were suggested to arise from some species that had undergone C-terminal processing (32).
The fact that both the natural and recombinant lectins could be C-terminally processed indicates that the tail regions are relatively exposed and perhaps loosely structured. It is possible that these regions have a biological role. Information for targeting to vacuoles has been shown to reside in the C-terminal regions of plant proteins (10), for instance barley lectin (33). Such signal regions are usually mainly formed of hydrophobic residues, and the lectin C-terminal regions include hydrophobic stretches. Since the proteolysis probably takes place within the vacuole, the ragged ends produced by it would not interfere with a targeting function. However, the targeting information for PHA is thought to reside in an internal segment (34), and the C-terminal "tails" of ConA and the pea lectin are much shorter and not as hydrophobic as those of the other lectins. An alternative role is suggested by the behavior of DBL, in which two subunits with different C-terminal truncations form a single 2:2 tetramer species which has only two active carbohydrate-binding sites (35). It has been suggested, therefore, that proteolysis in the C-terminal region of this lectin has a regulatory effect on activity (36). Whether other single chain lectins might be inactive in their proprotein forms cannot be said since the processing of most of the subunits to 235-residue forms is so complete. ConA is inactive in its natural proprotein form (3), but this is due to its internal glycopeptide (30, 37), and the non-glycosylated recombinant ConA and pea lectin proproteins are both active (30, 31). These two possible roles for the C-terminal region are not necessarily incompatible, i.e. both targeting and regulation of activity could be effected by the same segment.
In conclusion, the mass spectrometric measurements have shown that legume lectins commonly undergo C-terminal proteolytic processing to ragged ends. Given that mass and sequence comparisons of many protein families have been limited by the inaccuracies of the SDS-polyacrylamide gel electrophoresis method, this phenomenon may be more widespread than presently realized.