(Received for publication, July 10, 1995; and in revised form, October 16, 1995)
Bark lectins from the elderberry species belonging to the genus Sambucus have a unique carbohydrate binding specificity for
sialylated glycoconjugates containing the NeuAc(α2-6)Gal/GalNAc
sequence. To elucidate the structure of the elderberry lectin, a cDNA
library was constructed from the mRNA isolated from the bark tissue of
Japanese elderberry (Sambucus sieboldiana) with λgt11
phage and screened with anti-S. sieboldiana agglutinin (SSA)
antibody. The nucleotide sequence of a cDNA clone encoding full-length
SSA (LecSSA1) comprised 1902 base pairs and contained an open reading
frame corresponding to 570 amino acid residues. This open
reading frame encoded a signal peptide and a linker region (19 amino
acid residues) between the two subunits of SSA, the hydrophobic
(A-chain) and hydrophilic (B-chain) subunits. This indicates that SSA
is synthesized as a preproprotein and post-translationally cleaved into
two mature subunits. Homology searching as well as molecular modeling
studies unexpectedly revealed that each subunit of SSA has a highly
homologous structure to the galactose-specific lectin subunit and
ribosome-inactivating subunit of plant toxic proteins such as ricin and
abrin, indicating a close evolutionary relationship between these
carbohydrate-binding proteins.
Plant lectins with defined carbohydrate binding specificities have been isolated from various origins and used as invaluable tools for the detection, fractionation, and isolation of glycoconjugates. The biological roles of these plant lectins, however, are still not clear compared with some animal or microbial lectins that have been shown to play important roles in biological recognition systems. Structural studies on these molecules may provide useful information not only on the molecular basis for the binding specificity, but also on their biological function through the comparison of their structure with other functional proteins.
Bark lectins from the elderberry species
belonging to the genus Sambucus specifically bind to the
NeuAc(α2-6)Gal/GalNAc sequence (1, 2, 3) and have been used as a tool
for the analysis of sialylated
glycoconjugates (4, 5, 6, 7, 8).
These lectins are tetrameric glycoproteins consisting of two types of
subunits, one with a carbohydrate-binding site and one with an unknown
function (1, 3). The elderberry bark lectin behaved
basically as a galactose-binding lectin, a most common group of plant
lectins, but the affinity of the elderberry lectins for the sialylated
galactose unit was several thousandfold higher than that for
galactose itself. This was found to be true only when the sialic acid
was attached to the 6-position of the
galactopyranose (1, 3). This remarkable increase in
affinity, which was caused by the attachment of sialic acid at the
specific site of the Gal/GalNAc residue, prompted us to elucidate the
molecular basis for such a unique binding specificity as well as the
possible evolutionary relationships to other galactose-binding lectins.
We report here that the molecular cloning of a cDNA encoding the
bark lectin from Japanese elderberry (Sambucus sieboldiana agglutinin (SSA)) as well as molecular modeling
studies unexpectedly revealed that each subunit of SSA has a highly
homologous structure to the galactose-specific lectin subunit and
ribosome-inactivating subunit of plant toxic proteins such as ricin and
abrin. This would indicate a close evolutionary relationship between
these carbohydrate-binding proteins. We also show that both subunits
that constitute the tetrameric lectin molecule are encoded on a single
mRNA, which suggests the presence of post-translational processing to
form two mature subunits.
Figure 1: Analysis of the in vitro translation products of RNA from Japanese elderberry bark. Total
RNA from the bark tissue of Japanese elderberry was used for in
vitro translation with rabbit reticulocyte lysate. Translation
products were immunoprecipitated with anti-SSA antibody and subjected
to SDS-PAGE in the presence of mercaptoethanol. The arrow indicates the Mr 58,000 band. SSA and SSA(s) indicate the positions of the SSA tetramer and its
subunits, respectively.
A bark cDNA library was
constructed using the EcoRI site of the expression vector
λgt11 phage with the double-stranded cDNA and was screened three
times with an affinity-purified anti-SSA antibody. Four positive clones
were obtained and subcloned into the BamHI site of the
Bluescript II KS plasmid vector. Sequence as well as
Southern blot analyses of these four cDNA clones showed that these
clones had significant overlapping regions, indicating that they were
derived from the same gene. Arrangement of the sequence of these four
clones yielded an 834-base pair sequence, which corresponded to 278
amino acid residues. Portions of the deduced sequence coincided with
those of two internal peptides isolated by CNBr cleavage of SSA,
suggesting that these clones encoded the cDNA of SSA. However, none of
these four clones contained the region corresponding to the N-terminal
sequences of the two subunits of SSA, indicating that they were not
full-length clones. To isolate a full-length clone, the cDNA library
was rescreened by plaque hybridization using a 243-base pair probe
corresponding to the 5′-terminal region of the 834-base
pair sequence. Seven positive clones, each approximately 2000 base pairs in size,
were isolated. Southern blot analyses indicated that these clones
belonged to the same group. Analysis of the clone with the
longest insert (LecSSA1) showed a sequence of 1902 base pairs with
an open reading frame encoding a polypeptide with 570 amino acid
residues (Fig. 2).
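The relationship between the insert length and the open reading frame can be illustrated with a minimal sketch. The functions below are hypothetical helpers, not the procedure used in this study, and the toy sequence in the usage note is invented for illustration; it is not taken from LecSSA1. The sketch assumes a forward-strand scan for the first ATG in each frame and the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Return the longest ATG...stop stretch on the forward strand.

    The stop codon is included in the returned string. Only the three
    forward reading frames are scanned (an assumption of this sketch).
    """
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon in the current open stretch
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG after this stop
    return best


def coding_length_bp(n_residues: int) -> int:
    """Base pairs needed to encode n_residues plus one stop codon."""
    return 3 * n_residues + 3


if __name__ == "__main__":
    toy = "CCATGGCTGTTTAAGG"  # invented example sequence
    print(longest_orf(toy))
    print(coding_length_bp(570), 1902 - coding_length_bp(570))
```

Under these assumptions, a 570-residue polypeptide requires 1713 base pairs including the stop codon, so the remaining 189 base pairs of the 1902-base pair insert would lie outside the coding region.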
Figure 2: Nucleotide sequence and deduced amino acid sequence of cDNA clone LecSSA1. The amino acid sequences obtained by the N-terminal sequencing of each subunit of SSA as well as those of internal peptides obtained by CNBr cleavage are indicated in boldface. The signal peptide sequence is indicated in italics. The C terminus of the SSA A-chain (dashed line) and the N terminus of the SSA B-chain (underlined) are indicated.
Alignment of the known sequences of the
N-terminal regions as well as the internal peptides of SSA revealed
that both subunits of SSA were encoded in this open reading frame (Fig. 2). This indicated that a precursor polypeptide
synthesized from the mRNA corresponding to this open reading frame was
post-translationally cleaved into two subunits. From the N-terminal
sequences of the two subunits of SSA (3), it was shown that the
hydrophobic subunit (SSA A-chain) was encoded at the 5′-terminal side
of the cDNA and that the hydrophilic subunit (SSA B-chain) was encoded
at the 3′-terminal side. To determine the coding region for the first
subunit, the SSA A-chain, the C-terminal sequence of the mature
A-subunit was analyzed. Reduced and alkylated A-subunit was isolated by
reversed-phase HPLC using conditions similar to those reported
previously (3). Carboxypeptidase Y treatment of the A-subunit
liberated serine, threonine, and valine successively, which
corresponded to the sequence of Val-Ser
in the structure of the precursor polypeptide (Fig. 2).
Combining this information with the known N-terminal sequences of both
subunits, coding regions for the A- and B-subunits were determined as
Val-Ser and Gly-Ala, respectively. These results
also showed the presence of the linker peptide portion,
Ser-Arg, between the A- and
B-subunits.
Hydropathy plot analysis (Fig. 3) as well as the
N-terminal sequence of the hydrophobic subunit indicated the presence
of a signal peptide consisting of 28 amino acid residues
(Met-Arg; Fig. 2). Thus, SSA is
synthesized as a single preproprotein and processed into two mature
subunits by post-translational removal of the signal peptide and the
internal linker peptide between the two subunits. The hydropathy plot
also supported the identification of the region for the hydrophobic and
hydrophilic subunits as described above.
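A hydropathy profile of this kind can be computed as a sliding-window average over per-residue hydropathy values. The text does not state which scale or window size was used, so the sketch below assumes the Kyte-Doolittle scale and a 9-residue window; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Return the mean hydropathy of each window along the sequence.

    Element i is the average over residues i .. i+window-1, so the
    profile has len(seq) - window + 1 points (empty if seq is shorter).
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Invented peptide: a hydrophobic stretch followed by a charged one.
    for pos, value in enumerate(hydropathy_profile("LLVVAAKKDDEE", window=5), 1):
        print(pos, round(value, 2))
```

In such a profile, a signal peptide would be expected to appear as a strongly positive (hydrophobic) stretch near the N terminus, whereas a hydrophilic linker region would give negative values.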
Figure 3: Hydropathy plot of the deduced amino acid sequence of SSA. S, signal peptide; L, linker peptide region.
The calculated molecular weights of the A- and B-subunits were 28,774 and 29,055, respectively. The discrepancy between these values and those previously obtained for the two subunits by SDS-PAGE, 31,000 and 35,000, may be explained by the presence of sugar chains in each subunit (2).
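A molecular weight of this kind is obtained by summing average residue masses over the deduced sequence and adding one water molecule for the free termini. The sketch below is an illustration only: the function name is hypothetical, the mass table uses standard average residue masses, and no glycan or other modification is included, which is why such a value is expected to fall below the SDS-PAGE estimate for a glycoprotein.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def average_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified, unglycosylated polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


if __name__ == "__main__":
    # Invented short peptide, for illustration only.
    print(round(average_mass("GAVSTC"), 2))
```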
Figure 4: Comparison of the amino acid sequences of the preproprotein of SSA, abrin (23), ricin (20), and RCA (19). Identical amino acid residues are indicated (*), as are conserved amino acid residues among ribosome-inactivating proteins (#) (31). The amino acid residues of ricin involved in the binding to galactose are also indicated (+) (16).
Ricin is a glycoprotein and contains two asparagine-linked
oligosaccharides at Asn and Asn in the
B-chain and another at Asn in the A-chain (20).
Both SSA subunits were also shown to be glycosylated by periodic
acid-Schiff staining of the SDS-polyacrylamide gel as well as by lectin
blotting using several horseradish peroxidase-labeled lectins (data not
shown). The sequence of SSA indicated the presence of six potential N-glycosylation sites in the A-chain and two sites in the
B-chain, although none of them coincided with the glycosylation sites
of ricin, nor were the real glycosylation sites identified.
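Potential N-glycosylation sites are conventionally predicted from the sequon Asn-X-Ser/Thr, where X is any residue except proline. The following minimal sketch scans a sequence for this pattern; the function name and the example peptide are hypothetical, and a match indicates only a potential site, not one that is actually glycosylated.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def potential_n_glyc_sites(seq: str) -> list:
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    # Invented peptide: N2 and N9 are in sequons, N5 is followed by Pro.
    print(potential_n_glyc_sites("MNASNPTANGT"))
```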
Figure 5: Molecular model of SSA (A) made
from the crystal structure of the ricin-lactose complex (B) (14, 15, 16, 17). Only α-carbon
atoms are shown. The A-chains of ricin/SSA are shown in the lower
portion of each molecule, and the B-chains are shown in the upper
portion. The lactose molecule bound to ricin is indicated by dots. The arrow indicates Cys in the
SSA B-chain.
The disulfide linkages in the SSA molecule were generated based on their
corresponding positions in ricin, as the positions of the cysteine
residues in the primary structure of SSA coincided well with those in
ricin, except for Cys in the SSA B-chain. One of these
disulfide linkages (Cys-Cys) connects
the A- and B-subunits, and the other is present within the B-subunit.
The free cysteine residue in the B-subunit (Cys) that was
not found in the ricin B-chain was located on the surface of the SSA
molecule (Fig. 5A, indicated by the arrow).
Molecular cloning of the cDNA for SSA (LecSSA1) revealed that
both the hydrophobic and hydrophilic subunits of SSA (SSA A- and
B-chains) are encoded as a single preproprotein in the cDNA. From the
C- and N-terminal sequences of the SSA A- and B-chains, the site for
the post-translational cleavage was determined between Ser and Ser
and also Arg and Gly (Fig. 2). The presence of similar post-translational
processing has been reported for some lectins as well as for some
storage proteins. Interestingly, however, the specificity of the
putative endopeptidase for the cleavage of the SSA precursor seems to
be different from those for ricin (20), RCA (19),
abrin (23), other legume lectins (24, 25), and
plant storage proteins such as soybean glycinin (26) and pea
legumin (27). A polypeptide corresponding to the approximate
size of unprocessed A-B chain, although associated with some other
minor bands, was detected by in vitro translation experiments
using rabbit reticulocyte lysate, which may lack such a specific
endopeptidase.
Although there is no direct evidence for the
identification of an SSA subunit that carries the carbohydrate-binding
site, the extensive homology of the hydrophilic subunit (SSA B-chain)
to the lectin subunit of abrin/ricin suggests the presence of the
binding site in this subunit. Structural similarity of the B-subunit of
SSA to that of ricin in the three-dimensional model (Fig. 5)
also supports this. We recently found, by a chemical modification
study, that histidine and tyrosine residues in SSA play an important
role in the binding to sialylated oligosaccharides. The
absence of a histidine residue in the coding region for the hydrophobic
subunit (SSA A-chain) further supports that the carbohydrate-binding
site is located in the SSA B-chain.
Each subunit of SSA is connected
to the other subunits by disulfide linkage to form the tetrameric
glycoprotein molecule. As the SSA A-chain has only one cysteine residue
(Cys) near the C terminus, it must form a disulfide
linkage with an SSA B-chain. Otherwise, SSA cannot form the tetrameric
molecule connected through disulfide linkage. On the other hand, the
sequence of the SSA B-chain contains 10 cysteine residues. For the same
reason, one of them should form a disulfide linkage with the SSA
A-chain, and at least one other should participate in the
cross-linkage with another SSA B-chain. The previous findings that
the selective reduction and alkylation of the disulfide linkage between
the SSA subunits yielded 1.4 pyridylethylated cysteines per subunit (7) indicate that the A- and B-subunits are connected through
one disulfide linkage to form a tetrameric molecule in the form of
A-B-B-A. Two cysteine residues, Cys and Cys, which connect the A- and B-subunits, could be
assigned because of their conserved position in the corresponding
subunits of ricin/abrin. Concerning the cysteine residue responsible
for the connection between two B-subunits, the SSA B-chain was shown to
contain one additional cysteine residue (Cys; Fig. 2) that is not present in the ricin B-chain. This
additional cysteine residue in the SSA B-chain might be responsible for
the connection of two B-subunits as these connections are not present
in dimeric molecules such as ricin and abrin. The presence of this
cysteine residue on the surface of the three-dimensional model of SSA
further supports the possible involvement of this residue in the
cross-linkage between the subunits (Fig. 5A).
The
striking structural similarity of SSA to ricin/abrin-type
ribosome-inactivating proteins as well as to RCA revealed the close
evolutionary relationship of SSA to these toxic plant
proteins (28, 29), although elderberry is
taxonomically very far from those plants that produce these toxins (for
example, elderberry and R. communis, which produces ricin,
belong to different subclasses). However, there are significant
differences between the properties of SSA and these proteins. First,
despite the fact that the three-dimensional model of the SSA B-chain
suggested the presence of two domains corresponding to two
carbohydrate-binding domains of the ricin B-chain (Fig. 5B), SSA has only one carbohydrate-binding site,
probably in this subunit. Also, the carbohydrate binding specificity of
SSA is significantly different from that of these toxic proteins.
Ricin/abrin-type toxic proteins and RCA are basically specific to D-galactose residues, but the elderberry bark lectins
including SSA are specific to the NeuAc(α2-6)Gal/GalNAc
sequence (1, 3). Although it is difficult to indicate
the amino acid residues responsible for such differences in the binding
specificity at present, further comparison of their structure coupled
with site-directed mutagenesis and chemical modification and
crystallographic/NMR studies will eventually clarify why the elderberry
lectins recognize α2,6-linked sialylated oligosaccharides so
specifically.
The structural similarity of the SSA A-chain to the
A-chain of ricin and abrin raised the question about the biological
function of this subunit. The A-chain of ricin and abrin is an N-glycosidase that hydrolyzes a very specific site of rRNA,
resulting in the inhibition of protein synthesis at the
ribosome (30, 31, 32). The invariant amino
acid residues known for most of the ribosome-inactivating proteins
including those of bacterial origin (Shiga-like toxin) were conserved
in the SSA A-chain, except for Gln (33). However,
SSA showed only a very weak (several thousandfold weaker than ricin)
inhibitory activity against the in vitro protein synthesis of
rabbit reticulocyte lysate (Table 1). MSSA (free and stabilized
subunits of SSA) also showed only a very limited activity, suggesting
that the inability of SSA to terminate protein synthesis does not
relate to its tetrameric structure and reflects its intrinsic property.
SSA is also quite different compared with the structurally related
tetrameric lectin RCA-120, which was reported to strongly inhibit the in vitro protein synthesis of rabbit reticulocyte
lysate (34). In this context, it is noteworthy that a new group of
ribosome-inactivating proteins (RIPs), ebulin l and nigrin b, has
recently been isolated from
the bark tissue of elderberry species (Table 1) (21, 22). These proteins are composed
of two different subunits and have a molecular size corresponding to
that of ricin/abrin, although their structure has not yet been
elucidated. Interestingly, they could inactivate mammalian ribosomes in vitro, but were inactive on the cell itself. Actually, we
recently discovered the presence of a similar RIP in bark extract from S. sieboldiana, and this makes it difficult to
determine whether the very weak inhibitory activity detected in the SSA
preparation reflects the property of SSA itself or the contamination of
a trace amount of such a RIP. Thus, it can be said that, despite the
structural similarity to ricin/abrin-type toxins, SSA has only a very
weak activity as a RIP or actually does not have such activity.
Structural comparison of these proteins with SSA, ricin/abrin, and RCA
combined with their biological activity will give more insight into
their evolutionary relationship as well as the structure/function
relationships of these proteins.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) D25317.