(Received for publication, August 15, 1994; and in revised form, October 18, 1994)
From the e 38, A-1090 Vienna, Austria, and
the
Bet v 1, the major allergen of birch pollen, displays a considerable degree of heterogeneity. Several charge variants have been detected by two-dimensional IgE immunoblots and isoelectric focusing techniques. This heterogeneity has been attributed to glycosylation (or other post-translational modifications) or to isogenes coding for Bet v 1 isoforms and/or allelic variants. However, until now, only limited structural data for Bet v 1 have been published. Recently, we described the expression, purification, and immunological properties of recombinant Bet v 1 (rBet v 1) produced in Escherichia coli as a non-fusion protein (Ferreira, F. D., Hoffmann-Sommergruber, K., Breiteneder, H., Pettenburger, K., Ebner, C., Sommergruber, W., Steiner, R., Bohle, B., Sperr, W. R., Valent, P., Kungl, A. J., Breitenbach, M., Kraft, D., and Scheiner, O. (1993) J. Biol. Chem. 268, 19574-19580). Here, we present a more detailed structural characterization of Bet v 1 by both cDNA cloning and mass spectrometry. Thirteen different cDNA clones coding for Bet v 1 isoforms were obtained by polymerase chain reaction amplification of birch pollen cDNA with a sequence-specific 5'-terminal primer and a nonspecific 3'-terminal primer or by immunological screening of a birch pollen cDNA library. These isoforms are referred to as Bet v 1b to Bet v 1n, whereas the previously isolated Bet v 1 cDNA (Breiteneder, H., Pettenburger, K., Bito, A., Valenta, R., Kraft, D., Rumpold, H., Scheiner, O., and Breitenbach, M. (1989) EMBO J. 8, 1935-1938) is now referred to as Bet v 1a. High performance liquid chromatography and plasma desorption mass spectrometry of proteolytic fragments of purified natural Bet v 1 (nBet v 1) and rBet v 1a were used (i) to confirm the primary structure of all Bet v 1 isoforms and (ii) to investigate any possible postsynthetic modifications on rBet v 1a or on the natural mixture of isoallergens obtained from birch pollen.
Except for the cleavage of initiating methionine, no postsynthetic modifications were found in either nBet v 1 or rBet v 1a.
In the temperate climate zone of the world, pollen from trees of
the order Fagales (e.g. birch, alder, hazel, oak, and
hornbeam) is a major cause of Type I allergies (Ipsen et al.,
1985; Jarolim et al., 1989a). Birch pollen contains a single
major allergen with a molecular mass of 17 kDa (Ipsen and Loewenstein,
1983), designated Bet v 1. More than 96% of all tree
pollen allergic patients display IgE antibodies to Bet v 1, and 60%
react exclusively to this allergen, indicating the importance of this
protein in tree pollen allergy (Jarolim et al., 1989a).
Previously, we isolated and sequenced a cDNA clone coding for Bet v 1 (Breiteneder et al., 1989), which shows high sequence similarities to the single major pollen allergen from alder, Aln g 1 (Breiteneder et al., 1992), from hornbeam, Car b 1 (Larsen et al., 1992), and from hazel, Cor a 1 (Breiteneder et al., 1993). This is in good agreement with the observation that patients displaying specific IgE to Bet v 1 also show symptoms during the flowering season of other trees of the order Fagales.
Interestingly, all of these major tree pollen allergens show significant sequence similarities to a family of plant pathogen-activated genes shown to be induced in somatic tissues by infection with fungi and bacteria. They were identified in pea (Fristensky et al., 1988), parsley (Somssich et al., 1988), potato (Matton and Brisson, 1989), bean (Walter et al., 1990), asparagus (Warner et al., 1992), and soybean (Crowl et al., 1992). Although they have been associated with defense response of plants, the precise role of the respective gene products still remains elusive. Computer-aided sequence comparisons do not point to any known biochemical function. Presently, several families of pathogenesis-related proteins and genes are known (Bowles, 1990), but these families show no similarity to Bet v 1 and homologous proteins.
Bet v 1 shows a considerable degree of heterogeneity. Up to 10 charge variants have been observed by two-dimensional IgE immunoblots (Rohac et al., 1991) and isoelectric focusing techniques (Ferreira et al., 1993). Previously, nBet v 1 was described as an acidic glycoprotein (Ipsen and Hansen, 1989; Larsen et al., 1992), and a single consensus site for N-glycosylation is present in the Bet v 1 sequence (Breiteneder et al., 1989). Thus, glycosylation (or other post-translational modifications) could be an explanation for the observed heterogeneity. However, conclusive evidence for the presence, nature, and location of structural modifications of the Bet v 1 molecule has not been available until now.
Isogenes coding for Bet v 1 isoforms and/or allelic variants could be another explanation for the heterogeneity observed in two-dimensional IgE immunoblots. Southern blot analysis of birch (Valenta et al., 1991) and differences in the reactivity of two anti-Bet v 1 monoclonal antibodies (Rohac et al., 1991) support this view.
To date, only limited structural analysis of Bet v 1 and related proteins has been published. In a previous paper we described the expression, purification, and immunological properties of rBet v 1 produced in Escherichia coli as a non-fusion protein (Ferreira et al., 1993). In the present study, we undertook a more detailed structural characterization of Bet v 1 by both cDNA cloning and mass spectrometry.
A birch pollen cDNA library was
constructed in λZAP (Stratagene, La Jolla, CA) and screened with
serum IgE from an allergic individual selected according to typical
case history, positive skin prick test, and RAST (radioallergosorbent
test) class > 3.5, as described previously (Breiteneder et
al., 1989).
Clones coding for Bet v 1 isoforms were isolated, and both strands were sequenced twice according to the dideoxy chain termination method (Sanger et al., 1977) using a T7 Sequencing Kit (Promega).
rBet v 1a was purified from crude E. coli lysates by chromatofocusing on a PBE-94 exchanger column followed by reversed phase HPLC (Ferreira et al., 1993). Purified Bet v 1 proteins were analyzed by SDS-polyacrylamide gel electrophoresis according to the method of Laemmli (1970) and visualized by staining with Coomassie Brilliant Blue R-250.
Protein concentration was determined by the micro-Kjeldahl method, using glycine as standard (Jacobs, 1959).
After cloning and
sequencing, several PCR-amplified fragments were found to correspond to
the original Bet v 1a clone. In addition, 11 different cDNA clones,
with lengths ranging from 567 to 756 base pairs, were isolated (Bet v
1b to Bet v 1l, respectively). Five of these clones showed
3'-untranslated regions of different lengths and contained poly(A)
tails, whereas the other six were truncated at the same position in the
3'-noncoding region (approximately 80 nucleotides downstream of the
stop codon) because of a single base exchange (G→A) that created a
new recognition site for HindIII, one of the enzymes used for
cloning the PCR fragments. Two complete Bet v 1 cDNA clones (Bet v 1
m/n) were isolated by screening a birch pollen cDNA library with human
IgE antibodies.
All cDNAs contained open reading frames of 480 nucleotides, coding for putative proteins of 160 amino acids, with calculated molecular masses ranging from 17,450 to 17,573 Da. The deduced amino acid sequences compared with that of Bet v 1a are shown in Fig. 1. In three cases (Bet v 1d/h, Bet v 1f/i, and Bet v 1 m/n) differences in the nucleotide sequences did not result in amino acid changes. Since in all PCR clones the first 19 nucleotides were included in the 5' PCR primer, additional differences at the DNA level may exist in this region but would not have been detected.
Figure 1: Deduced amino acid sequence alignment of Bet v 1 isoforms from birch pollen. Dots indicate identical amino acids as in Bet v 1a. Arrows mark isoforms identified in the pollen mixture by PDMS analysis. Isoforms b, c, k, and m/n were not individually identified but were confirmed as a group.
Therefore, including Bet v 1a, eleven Bet v 1 protein sequence isoforms have been identified altogether, with amino acid identities ranging from 84.4% (because of differences in 25 amino acids) to 99.4% (a single amino acid exchange) for the different pairs.
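The identity range quoted above follows from simple counting over the 160-residue deduced sequences. A minimal sketch of that arithmetic is given here; the function name `percent_identity` is an illustrative assumption and not part of the original analysis.

```python
def percent_identity(differences: int, length: int = 160) -> float:
    """Percent identity of two aligned sequences of equal length.

    `length` defaults to 160, the number of residues deduced from the
    480-nucleotide open reading frames described in the text.
    """
    if not 0 <= differences <= length:
        raise ValueError("differences must lie between 0 and length")
    return 100.0 * (length - differences) / length


# Most divergent pair (25 exchanges) and most similar pair (1 exchange).
lowest = percent_identity(25)
highest = percent_identity(1)
print(f"{lowest:.1f}% to {highest:.1f}%")
```

With 25 and 1 exchanges, the sketch reproduces the 84.4% and 99.4% limits stated in the text.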
Figure 2: Plasma desorption map of tryptic peptides. Total peptide mixtures of tryptic digests of rBet v 1a (A) and nBet v 1 (B) applied to PDMS are shown. Additional mass peaks in (B) can be explained by newly identified isoforms.
Figure 3: Peptide map of Bet v 1 tryptic digests. Peptides obtained from tryptic cleavages were separated by reversed phase HPLC and identified by PDMS. The peak marked (grad.) shows the start of linear gradient; *, peaks corresponding to peptides derived from new isoforms.
Figure 4: Proteolytic peptides of Bet v 1a identified by PDMS. Peptides were generated by trypsin (T1-T19), endoproteinase Glu-C (E1-E14), or a combination of the two enzymes (T1E1-T10E2).
Forty-three peptides (T peptides) were
detected by PDMS in a tryptic digest of nBet v 1, and their molecular
weights were determined from the obtained spectra (Table 1).
Eighteen of the mass signals could be easily matched with the molecular
weights of peptides predicted from the amino acid sequence deduced from
the published Bet v 1a cDNA sequence (T2-T19). The expected
signal at m/z 1987 corresponding to the N-terminal peptide T1
was missing from the spectra. A signal detected at m/z 1856
could be accounted for, assuming that the initial methionine had been
removed, leaving glycine as the NH2-terminal amino acid.
This signal could also be matched with the expected T10 fragment (m/z 1856). As two different HPLC fractions of tryptic digests
produced an ion at m/z 1856, these fractions were each
subdigested with endoproteinase Glu-C, and the resulting peptide
mixtures (TE peptides) were analyzed by PDMS (Table 1). The two
fractions produced (M+H)+ ions at m/z 729/916 and 988, respectively (predicted mass values for
endoproteinase Glu-C-subdigested T1 lacking the initial methionine were
729, 249, and 916; for T10 the predicted values were 888 and 987).
These fractions were identified as T1 and T10, respectively, thus
confirming the above assignment. The removal of initiating methionine
was also confirmed by NH2-terminal sequence analysis of
purified nBet v 1 and rBet v 1a (data not shown). Therefore, the amino
acid sequence defined in the present study corresponds to residues
2-160 of the published cDNA sequence. To be consistent with the
NH2-terminal sequence of the mature Bet v 1 protein, we have
renumbered the sequence of Bet v 1a starting with Met-Gly-Val-Phe . . .
and applied the same numbering system to the new Bet v 1 cDNA sequences
presented here. The signal at m/z 1476 was assigned to peptide
T9 (69-80) (Fig. 4) originating from an incomplete trypsin
cleavage. As shown in Fig. 4, PDMS analysis and HPLC
fractionation of nBet v 1 tryptic digests confirmed 100% of the Bet v
1a sequence.
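The assignment of the two m/z 1856 fractions rests on a mass balance: each extra fragment produced by subdigestion adds one water (from the hydrolysed peptide bond) and is detected with its own proton. The following sketch checks the predicted subdigestion values quoted above against that balance. The function names, the 1.5-Da tolerance, and the use of average masses are assumptions made for illustration only.

```python
WATER = 18.015   # average mass of H2O in Da
PROTON = 1.007   # mass of H+ in Da


def expected_fragment_sum(parent_mh: float, n_fragments: int) -> float:
    """Expected sum of fragment (M+H)+ values for one parent (M+H)+ ion.

    Splitting a peptide into n fragments hydrolyses n - 1 bonds, and each
    additional fragment carries one additional proton.
    """
    return parent_mh + (n_fragments - 1) * (WATER + PROTON)


def is_consistent(parent_mh: float, fragments: list, tol: float = 1.5) -> bool:
    """True if the fragment masses add up to the parent within `tol` Da."""
    expected = expected_fragment_sum(parent_mh, len(fragments))
    return abs(sum(fragments) - expected) <= tol


# Predicted Glu-C subdigestion products quoted in the text.
t1_ok = is_consistent(1856, [729, 249, 916])   # T1 lacking the initial Met
t10_ok = is_consistent(1856, [888, 987])       # T10
print(t1_ok, t10_ok)
```

Both predicted fragment sets balance against the same parent ion at m/z 1856, which is why the fragment pattern, rather than the parent mass alone, is needed to tell T1 from T10.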
Treatment of nBet v 1 with endoproteinase Glu-C yielded
the 26 peptides (E peptides) shown in Table 1. These peptides
covered about 98% of the Bet v 1a sequence (Fig. 4). Similarly,
as observed in the analysis of tryptic digests, the expected mass
signal at m/z 860 (corresponding to the
NH2-terminal peptide E1) was absent in the spectra. The
(M+H)+ ion at m/z 729 was unmatched by
any expected fragment according to the published cDNA sequence and was
therefore assigned to peptide E1 lacking the initiating methionine.
The (M+H)+ ions at m/z 959 and 1860 were
assigned to E1+E2 and E4 peptides, respectively, originating from
incomplete cleavage by endoproteinase Glu-C.
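In both digests, the shift between the expected and the observed N-terminal peptide equals the mass of one methionine residue. A short sketch of that check follows; the helper name `matches_met_loss` and the 1-Da tolerance are illustrative assumptions, and the residue mass is the standard average value.

```python
MET_RESIDUE = 131.20  # average mass of a methionine residue (C5H9NOS) in Da


def matches_met_loss(expected_mh: float, observed_mh: float,
                     tol: float = 1.0) -> bool:
    """True if observed is lighter than expected by one Met residue."""
    return abs((expected_mh - observed_mh) - MET_RESIDUE) <= tol


# Tryptic peptide T1: expected m/z 1987, observed m/z 1856.
t1_met_loss = matches_met_loss(1987, 1856)
# Glu-C peptide E1: expected m/z 860, observed m/z 729.
e1_met_loss = matches_met_loss(860, 729)
print(t1_met_loss, e1_met_loss)
```

The same 131-Da difference appearing independently in the tryptic and the Glu-C digest supports the assignment made in the text.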
According to the Bet v
1a cDNA sequence, peptides T10 (81-97) and E6 (74-87)
should contain the only potential Asn-linked glycosylation site
(Asn82). The spectra of nBet v 1 digested with either
trypsin (Fig. 2) or endoproteinase Glu-C showed strong signals
at m/z 1855 (T10) and 1730 (E6), respectively, demonstrating
that the asparagine residue at position 82 was unmodified. Fig. 5 shows the mass spectra recorded on HPLC-purified T10 and
E6 peptides. As described above, purified T10 was also subdigested with
endoproteinase Glu-C, and mass spectra were recorded on the resulting
sample (Table 1).
Figure 5: PDMS of HPLC-purified T10 and E6 peptides. Typical mass spectra of peptides T10 and E6, which correspond to trypsin and endoproteinase Glu-C cleavage peptides of nBet v 1 containing the only potential N-glycosylation site of the Bet v 1a sequence. Predicted masses for unmodified T10 and E6 were 1856 and 1729, respectively.
Treatment of rBet v 1a with trypsin yielded the 17 peptides shown in Table 1, which covered approximately 95% of the amino acid sequence. The 5% of the rBet v 1a sequence not mapped consisted of small tryptic peptides (one pentapeptide, T16, and one tripeptide, T17). Additional coverage of the rBet v 1a sequence was achieved by digestion with endoproteinase Glu-C, which produced the 13 peptides listed in Table 1. Altogether, 100% of the primary structure of rBet v 1a was confirmed. It should be emphasized that all peptides detected by PDMS of rBet v 1a digests were also detected in nBet v 1 digests.
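The approximate 95% tryptic coverage can be reproduced from the lengths of the two unmapped peptides. The sketch below assumes a mature protein of 159 residues (residues 2-160 after removal of the initiating methionine); the function name is hypothetical.

```python
def coverage_percent(total_residues: int, unmapped_lengths: list) -> float:
    """Percentage of residues covered, given lengths of unmapped peptides."""
    unmapped = sum(unmapped_lengths)
    if unmapped > total_residues:
        raise ValueError("unmapped residues exceed sequence length")
    return 100.0 * (total_residues - unmapped) / total_residues


# Unmapped tryptic peptides: one pentapeptide (T16) and one tripeptide (T17).
tryptic_coverage = coverage_percent(159, [5, 3])
print(f"{tryptic_coverage:.1f}%")
```

Eight unmapped residues out of 159 leave roughly 95% of the sequence covered, in line with the figure given above.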
Next, we attempted to analyze peptides originating
from nBet v 1, which could not be assigned to the Bet v 1a sequence by
their molecular weight and the specificity of the enzyme. According to
their molecular weight, these peptides did not correspond to well
characterized autolysis products from trypsin (Vestling et
al., 1990). Another possible explanation is that these unmatched
mass signals could be caused by peptides carrying postsynthetic
modifications. However, the differences between any of the observed
mass values and the masses of predicted proteolytic peptides were not
consistent with the presence of common post-translational
modifications, such as methylation, acetylation, phosphorylation, or O-glycosylation. We speculated that they might have originated
from isoforms of Bet v 1a. Hence, the data were specifically searched
for signals corresponding to proteolytic peptides predicted from the
amino acid sequences deduced from the 13 Bet v 1 cDNA sequences
obtained in the present study (Bet v 1b-n). In this way, 22
(M+H)+ ions in tryptic digests and 11 in
endoproteinase Glu-C digests could be matched with predicted peptides
of Bet v 1 isoforms (see Table 1). Two of those matched peptides,
E5 ((M+H)+ = 1554) and E7 ((M+H)+ = 845), were sequenced by Edman
degradation, confirming these assignments. In total, these matched
peptides covered approximately 83-91% of the amino acid exchanges
in Bet v 1b, c, k, and m/n; 60-70% for Bet v 1j, f/i, e, and d/h;
and 44 and 57% for Bet v 1g and Bet v 1l, respectively.
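The screening for common modifications described above amounts to comparing each unmatched mass with every predicted peptide mass plus a known mass shift. A minimal sketch follows, assuming average mass shifts, a single modification per peptide, and a 1-Da tolerance; the names `PTM_SHIFTS` and `candidate_ptms` are hypothetical, and the second query mass is an invented illustration, not a value from the study.

```python
# Average mass shifts (Da) for the modifications named in the text.
PTM_SHIFTS = {
    "methylation": 14.03,
    "acetylation": 42.04,
    "phosphorylation": 79.98,
    "O-glycosylation (hexose)": 162.14,
    "O-glycosylation (N-acetylhexosamine)": 203.19,
}


def candidate_ptms(observed_mh: float, predicted_mh_values: list,
                   tol: float = 1.0) -> list:
    """List (predicted mass, modification) pairs that explain observed_mh."""
    hits = []
    for predicted in predicted_mh_values:
        delta = observed_mh - predicted
        for name, shift in PTM_SHIFTS.items():
            if abs(delta - shift) <= tol:
                hits.append((predicted, name))
    return hits


# Observed T10 signal (m/z 1855) against its predicted mass (1856): no hit.
t10_hits = candidate_ptms(1855, [1856])
# Invented signal 80 Da above the predicted T10 mass: phosphorylation.
invented_hits = candidate_ptms(1936, [1856])
print(t10_hits, invented_hits)
```

An empty result for every unmatched signal is what would be expected if, as concluded in the text, the extra peaks stem from isoform sequences rather than from modified Bet v 1a peptides.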
Finally, peptides T19-3/E14-3 and T19-5/E14-5 (see Table 1) suggested the existence of truncated Bet v 1 isoforms, missing 3 or 5 amino acids at the C terminus, respectively. This region shows 100% sequence identity in all cDNA clones. Interestingly, these shorter peptides were not detected in proteolytic digests of rBet v 1a, but only in preparations of nBet v 1. In this case, we estimated that less than 30% of nBet v 1 consisted of truncated forms (based on relative peak heights of the signals corresponding to truncated versus intact peptides), very likely because of proteolysis during the pollen extraction procedure.
The aim of the present paper was twofold. First, we tried to confirm at the protein level the deduced amino acid sequence of Bet v 1a (formerly referred to as Bet v 1), the major allergen of birch pollen (Breiteneder et al., 1989), and of several other closely related isoallergens that also occur in pollen and whose sequences are presented here. As shown here and by others (Tsarbopoulos et al., 1988; Pedersen et al., 1993), the remarkable detection limit (about 10-100 pmol) and resolving power of PDMS are sufficient to confirm a protein sequence previously determined by cDNA sequencing and to discriminate between closely related isoforms of proteins. Moreover, it is possible to roughly estimate the relative amounts of the isoproteins in the natural mixture by comparing peak areas of isopeptides; this comparison indicates that the isoform Bet v 1a represents at least 50% of the total mass of pollen Bet v 1.
Second, it was our aim to investigate any possible postsynthetic modifications on rBet v 1a or on the natural mixture of isoallergens obtained from commercially available birch pollen. The correct postsynthetic modifications of a recombinant protein are of utmost importance if such a protein is to be used for the diagnosis and treatment of human disease, as is the case with recombinant allergens. As shown in several cases, such modifications can strongly influence the immunological properties of proteins (Nilsen et al., 1991; Batanero et al., 1994), and therefore, the present study is closely connected with a previous study in which we investigated the immunological equivalence of rBet v 1a with nBet v 1 by enzyme-linked immunosorbent assay competition experiments (Ferreira et al., 1993). The purified rBet v 1a used here revealed all of the predicted peptides (Fig. 2 and Table 1) but no additional peaks. The only postsynthetic modification observed was cleavage of the N-terminal methionine with nearly 100% efficiency, as was expected.
nBet v 1 from birch pollen was purified to electrophoretic homogeneity by immunoaffinity chromatography and HPLC. After purification it still showed the same complex pattern of spots in two-dimensional IgE immunoblots as was seen in the starting material (data not shown). Our interpretation is that no immunoreactive material (and, therefore, no specific isoform) was lost in the purification procedure. All of the mass peaks (with one single exception) obtained from nBet v 1 after proteolytic digestion were either identical with the ones obtained from purified rBet v 1a or could be explained by peptides predicted to arise from the isoform sequences presented here. Isoforms b, c, k, and m/n were so similar that we could only confirm them as a group. Isoform j was so similar to Bet v 1a that it could not be discriminated from it. Isoform l did not lead to any diagnostic peptide discriminating it from all other isoforms, and therefore, we cannot be sure that it really exists at the protein level. It should be considered that some of the sequence differences observed between the Bet v 1 isoforms might have originated from PCR artifacts. However, this seems unlikely since most of the cDNA sequences obtained by PCR were also confirmed individually at the protein level by PDMS analysis (see Table 1 and Fig. 1).
In the mass spectra of nBet v 1 no peaks were found indicating N-glycosylation, O-glycosylation, phosphorylation, methylation, or acetylation of the cleavage peptides. This result is in good agreement with earlier work showing that Bet v 1 is a cytoplasmic protein located at or near the place of ER-bound ribosomes in dry pollen (Grote, 1991), since cytoplasmic proteins are frequently unmodified and have never been found to be N-glycosylated (Hirschberg and Snider, 1987). Phosphorylation would have explained the charge differences of nBet v 1 in two-dimensional immunoblots (Rohac et al., 1991) but does not seem to occur. It was previously shown that phosphopeptides are well detected by PDMS (Craig et al., 1991). Because of the absence of covalent modifications, the production in E. coli of rBet v 1 in a form that is immunologically (and conformationally) similar to nBet v 1 is greatly facilitated.
Finally, comparison of the 14 cDNA sequences showed that three pairs of sequences (f/i, d/h, and m/n) each code for the same protein and differ only through silent exchanges. The 3'-noncoding regions within the d/h pair are nearly identical, and therefore, we assume that these sequences represent alleles of the same gene locus. This is conceivable, since the birch pollen used in this study for mRNA and protein extraction was obtained from a variety of different trees. Sequences a and j are probably not allelic, since they show relatively large insertions, deletions, and sequence deviations in their 3'-noncoding regions. However, it is not generally possible to discriminate with certainty between allelic variants and different isoforms by comparison of cDNA sequences. For this, genomic sequences and restriction maps would be needed.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) 77200, 77265-77270, 77271-77273, 77274, 81972, and 82028.