(Received for publication, June 1, 1995; and in revised form, August 8, 1995)
α-Agglutinin of Saccharomyces cerevisiae is a cell wall-associated protein that mediates cell interaction in mating. Although the mature protein includes about 610 residues, the NH2-terminal half of the protein is sufficient for binding to its ligand a-agglutinin. A fully active fragment of α-agglutinin has been purified and analyzed. Circular dichroism spectroscopy, together with sequence alignments, suggests that the fragment consists of three immunoglobulin variable-like domains: domain I, residues 20-104; domain II, residues 105-199; and domain III, residues 200-326. Peptide sequencing data established the arrangement of the disulfide bonds in the fragment. One disulfide links a Cys in domain I to a Cys in domain II, forming an interdomain bond. A second links two Cys residues in an atypical intradomain disulfide bond between the A and F strands of domain III. The two remaining Cys residues have free sulfhydryls. Sequencing also showed that at least two of three potential N-glycosylation sites with sequence Asn-Xaa-Thr are glycosylated. At least one of three Asn-Xaa-Ser sequences is not glycosylated. No residues NH2-terminal to Ser282 were O-glycosylated, whereas Ser282 and all hydroxy amino acid residues COOH-terminal to this position were modified. Therefore O-glycosylated Ser and Thr residues cluster in the COOH-terminal region of domain III, and the O-glycosylation continues into a Ser/Thr-rich sequence that extends from domain III to the COOH terminus of the full-length protein.
Sexual agglutinins are expressed on the surface of haploid budding yeasts, including Saccharomyces cerevisiae (Lipke and Kurjan, 1992; Pierce and Ballou, 1983; Hagiya et al., 1977; Crandall et al., 1974; Crandall and Brock, 1968). During mating, the interaction of complementary agglutinins of each species mediates direct cell-cell contact to promote fusion of pairs of mating partners to form diploid zygotes. Mutants defective in these sexual agglutinins are mating-deficient in liquid medium (Lipke et al., 1989).
S. cerevisiae α-agglutinin is a highly glycosylated cell wall-anchored protein that is constitutively expressed on cells of the α mating type and is induced to greater expression levels in response to the mating pheromone, a-factor (Terrance et al., 1987; Hauser and Tanner, 1989; Lipke et al., 1989). The open reading frame of the α-agglutinin gene, AGα1, encodes a single polypeptide of 650 amino acids, including an NH2-terminal secretion signal (residues 1-19) and a COOH-terminal glycosylphosphatidylinositol (GPI) addition signal that is involved in cell wall anchorage (residues 628-650) (Kodukula et al., 1993; Wojciechowicz et al., 1993; Kapteyn et al., 1994; Lu et al., 1994, 1995; Van Berkel et al., 1994). The NH2-terminal part of the mature protein (residues 20-350) contains the binding region, which has been proposed to consist of three domains (Wojciechowicz et al., 1993). These features are summarized in Fig. 1.
Figure 1:
Features of the α-agglutinin sequence. The open reading frame of the AGα1 gene is shown. The NH2-terminal secretion signal and the COOH-terminal GPI addition signal are colored solid black. Proposed IgV domains and the Ser/Thr-rich sequence are marked.
Within the NH2-terminal half, a segment (amino acid residues 200-326, designated domain III) shows significant similarity to variable domains of the immunoglobulin superfamily (IgV domains) based on the amino acid sequence and predicted β-sheet profile analysis (Wojciechowicz et al., 1993). A His residue essential for binding has been identified within this putative domain (Cappellaro et al., 1991), and other essential residues have been identified by site-specific mutagenesis. We have proposed that domains I and II are also Ig-like, but evidence to support this contention has been lacking.
In Ig domains, post-translational
modifications help determine tertiary structure (Dwek et al.,
1993; Williams and Barclay, 1988). We have investigated the disulfide
bonding pattern of the 6 Cys residues and the positions of the N- and O-glycosylations in the Ig-like region
(Terrance et al., 1987; Hauser and Tanner, 1989). N-Linked glycans are not important for cell adhesion, because
endo H treatment or synthesis in the presence of tunicamycin does not
affect binding activity (Terrance et al., 1987). O-Linked glycans are also present and appear to account for a significant portion of the apparent size of α-agglutinin (Wojciechowicz et al., 1993; Lu et al., 1994).
We have now produced a 332-residue active fragment of α-agglutinin in quantities sufficient to allow investigation of the secondary structure and determination of the positions of post-translational modifications. The results, along with those of a modified sequence alignment procedure, lead to a model for α-agglutinin.
The active material was dialyzed and lyophilized. The dry powder was resuspended in 10 mM potassium chloride, 10 mM sodium acetate, pH 5.5, 0.01% SDS, and 1 mM EDTA and incubated with a 1:200 to 1:500 molar ratio of endo H for 4-6 h at 25 °C or overnight at 4 °C. Under these conditions, there was no detectable proteolysis of the α-agglutinin. The de-N-glycosylated α-agglutinin was chromatographed on a Bio-Gel P-60 size exclusion column (60-ml bed volume) which had been previously equilibrated with 30 mM sodium acetate buffer, pH 5.5.
Figure 8:
Summary of sequenced α-agglutinin peptides. Regions sequenced from tryptic and S. aureus V8 peptides are underlined with solid or wavy lines, respectively. Sulfhydryl groups are labeled (SH) and disulfide bonds are marked. Identified O-linked glycosylation sites are marked (solid diamonds). Potential N-glycosylation sites are italicized and stricken out; the two identified N-glycosylation sites are marked (stacked solid diamonds).
Figure 2:
Immunoblot of α-agglutinin from culture supernatant. Supernatant from a culture of Lα21 [pPGK-AGα1] (17 ml) was lyophilized to dryness, resuspended in 200 µl of distilled water, and passed through a Bio-Gel P-10 column preequilibrated with 0.01 M sodium acetate, pH 5.5. Desalted material (20 µl) was treated without or with endo H (0.5 µl of 1 unit/ml) at room temperature for 2 h. Samples without and with endo H treatment were analyzed by electrophoresis in the absence or presence of the reducing reagent DTT as indicated.
Figure 3:
Bio-Gel P-60 chromatography of endo H-treated α-agglutinin. The active material from DEAE-Sephadex A-25 was lyophilized to dryness. The material was resuspended and dialyzed against 0.03 M sodium acetate, pH 5.5, treated with endo H (15 µl of 1 unit/ml endo H to 2000 units of α-agglutinin activity) and loaded onto a Bio-Gel P-60 column preequilibrated with the same buffer. Fractions (3 ml) were collected and monitored at 280 nm (A). Aliquots of fractions were electrophoresed on a 12% SDS-PAGE gel and visualized by staining with Coomassie Blue (B). Molecular size markers are shown on the left.
Elution of endo H-treated α-agglutinin from a Bio-Gel P-60 column gave purified α-agglutinin with an apparent molecular size of 45 kDa for the smaller species on SDS gels (Fig. 3). The deduced Mr of the α-agglutinin fragment from the predicted amino acid sequence is 37,108. Therefore, N-linked carbohydrate accounts for two-thirds of the apparent 110-kDa molecular mass of the α-agglutinin fragment, and the O-linked carbohydrate remaining after endo H digestion could account for an additional 8 kDa of apparent mass.
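The mass bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below is only illustrative: the three input values are the ones quoted above, while the function and variable names are our own, and apparent masses from SDS gels are approximate for glycoproteins.

```python
# Values quoted in the text, in kDa. Names are illustrative, not from the paper.
APPARENT_MASS_GLYCOSYLATED = 110.0  # apparent mass before endo H treatment
APPARENT_MASS_AFTER_ENDO_H = 45.0   # smaller species on SDS gels after endo H
DEDUCED_POLYPEPTIDE_MASS = 37.108   # deduced Mr from the amino acid sequence


def carbohydrate_budget(apparent, after_endo_h, polypeptide):
    """Split an apparent mass into N-linked and O-linked contributions."""
    n_linked = apparent - after_endo_h     # mass removed by endo H
    o_linked = after_endo_h - polypeptide  # mass remaining above the polypeptide
    return {
        "n_linked_kda": n_linked,
        "o_linked_kda": o_linked,
        "n_linked_fraction": n_linked / apparent,
    }


budget = carbohydrate_budget(
    APPARENT_MASS_GLYCOSYLATED,
    APPARENT_MASS_AFTER_ENDO_H,
    DEDUCED_POLYPEPTIDE_MASS,
)
for key, value in budget.items():
    print(f"{key}: {value:.2f}")
```

With these inputs the N-linked contribution is 65 kDa (a fraction of about 0.6 of the apparent mass, of the order of the two-thirds stated), and the O-linked remainder is just under 8 kDa.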
Figure 4:
SDS-PAGE analysis of endoprotease Arg-C-digested α-agglutinin. Samples of endoprotease Arg-C-digested α-agglutinin (left lanes) and endoprotease alone (center lanes, labeled "enzyme") were treated with or without DTT as marked, electrophoresed on a 15% SDS-polyacrylamide gel, and the gel was stained with Coomassie Blue. Molecular size standards on the right were from 97,400 to 4000 Da.
The NH2-terminal sequence of each fragment was determined by microsequence analysis after electroblotting onto polyvinylidene difluoride membranes. Both the 16- and 21-kDa fragments had the same NH2-terminal sequence as mature α-agglutinin, beginning at Ile, immediately following the secretion signal sequence (Table 1). The 21-kDa form represented a species with some N-linked carbohydrate remaining and generated a 16-kDa fragment after additional treatment with endo H (data not shown). The NH2-terminal α-agglutinin polypeptide from Ile to Lys would have a molecular mass of 15,119 daltons, close to the value for the 16-kDa peptide. The 31-kDa fragment started with Ser-Gly-Pro-Met-Leu-Val (Table 1). The predicted molecular mass of this peptide is 21,989 Da. The extra 7 kDa of apparent molecular mass in this fragment may be attributed to the presence of multiple O-glycosylations (see below). No additional fragments were seen, including any of the predicted peptides following Arg residues (Fig. 4). Therefore, endoprotease Arg-C cleaved only at Lys, instead of at any of the six Arg residues in α-agglutinin.
Endoprotease Arg-C from Clostridium histolyticum also cleaved at Lys only (data not shown). Peptide sequencing confirmed that the cleaved residue was Lys. No fragments were generated in α-agglutinin incubated without protease. Therefore, hydrolysis of α-agglutinin at Lys was endoprotease Arg-C specific and not due to proteolytic activity in the α-agglutinin preparations or in other reagents used for the digestion. Tosyl-lysyl chloroketone inhibits Arg-C (Mazzoni et al., 1991); therefore, Arg-C must have proteolytic activity toward Lys.
Figure 5:
Far-UV CD spectra of α-agglutinin. Each spectrum represents the average of five individual spectra taken at 1.0-nm intervals as specified under "Experimental Procedures." Equivalent molar concentrations of each sample were examined. Spectra of native α-agglutinin (solid line) and endoprotease Arg-C-digested α-agglutinin (dashed line).
α-Agglutinin was digested with trypsin in the presence or absence of DTT, and the products were separated by reversed phase chromatography on a C18 column. Three tryptic peptides (T1, T2, T2′) were unique to the nonreduced chromatogram (Fig. 6A), and three peptides (DT1, DT2, and DT3) were unique to the reduced chromatogram (Fig. 6B). These peptides were sequenced and compared with the sequences of the Cys-containing tryptic fragments predicted from the gene sequence (Tables 2, 3, and 4). Peaks T1 and DT1 had the sequence of the predicted peptide containing both Cys and Cys. As with the change in gel mobility, the change in retention time in the presence of DTT implied that these two Cys residues formed an internal disulfide. Similar chromatography and sequencing analyses of peptides from S. aureus V8 digests confirmed this assignment (Tables 3 and 4): peptide DS2 was seen only after reduction and contained Cys. As expected, tryptic peptide T1 containing Cys and Cys was labeled with P-2007 after reduction, but was not labeled in nonreduced samples (Fig. 7, A and B).
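Predicting the Cys-containing tryptic fragments from a gene sequence, as described above, amounts to an in silico digest. The following is a minimal sketch under stated assumptions: it uses the common textbook rule for trypsin (cleave after Lys or Arg unless the next residue is Pro), the function names are our own, and the sequence is a made-up toy string, not the α-agglutinin sequence.

```python
import re


def tryptic_peptides(sequence):
    """Cleave after K or R, except when the following residue is P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def cys_containing_peptides(sequence, offset=1):
    """Return (peptide, [Cys positions]) for peptides that contain Cys.

    Positions are numbered from `offset`, so a mature-protein fragment
    can be numbered in full-length coordinates.
    """
    results = []
    position = offset
    for peptide in tryptic_peptides(sequence):
        cys_sites = [position + i for i, aa in enumerate(peptide) if aa == "C"]
        if cys_sites:
            results.append((peptide, cys_sites))
        position += len(peptide)
    return results


# Hypothetical toy sequence, for illustration only.
TOY_SEQUENCE = "AKCDRPEKCLLK"
toy_result = cys_containing_peptides(TOY_SEQUENCE)
print(toy_result)
```

A peptide reported with two Cys positions is a candidate for an internal disulfide, whereas two peptides that co-elute only under nonreducing conditions are candidates for an interpeptide disulfide. Real digests can deviate from the rule used here, as the efficient and missed cleavages discussed in the text show.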
Figure 6:
Chromatogram of reduced and nonreduced trypsin-digested α-agglutinin. Mixtures of trypsin-digested peptides treated without (A) or with DTT (B) were chromatographed. Peaks unique to the nonreduced (T1, T2, T2′) and reduced (DT1-DT3) profiles are labeled. The peptide containing Cys and Cys is peak T4 in the nonreduced and peak DT4 in the reduced profile. The amino acid sequences of these peptides are listed in Tables 1, 2, 3, and 4. Both chromatograms were obtained under standard conditions, and the retention times shown in B apply to both chromatograms. Fraction numbers shown in A correspond to those mentioned in the text for concanavalin A blotting.
Figure 7:
HPLC chromatograms of P-2007-labeled tryptic α-agglutinin peptides. Tryptic peptides labeled with P-2007 in the absence (A) or presence (B) of the reducing reagent TCP were fractionated by reversed phase HPLC using the standard program, and the eluant was monitored at 340 nm. Peptide T4, containing Cys and Cys, was labeled under nonreduced conditions, isolated (C), digested with endoprotease Asp-N, and rechromatographed (D).
Tryptic peaks T2 and T2′ each yielded two sequences in approximately equimolar amounts (Tables 2 and 4). These sequences were those expected for disulfide-linked peptides containing Cys and Cys. Note that the peptides containing Cys do not contain Cys, because Lys is efficiently cleaved (Fig. 6; Table 4). The difference in retention times of T2 and T2′ must be due to differential modification of the fragments; differences in the extent of glycosylation of the peptide fragment containing Cys would yield this result. In the chromatogram of tryptic peptides from reduced α-agglutinin, peaks T2 and T2′ were absent, and new peaks appeared with retention times of 117 and 154 min (labeled DT2 and DT3 in Fig. 6B). Sequencing showed that these peaks were peptides predicted to include Cys and Cys, respectively. These results show that Cys and Cys are disulfide bonded. Sequencing of S. aureus V8-digested peptides (Tables 3 and 4) and P-2007 labeling (Fig. 7) also confirmed this result.
To verify that peptide peak T4 in the nonreduced profile contained Cys and Cys as free sulfhydryls, this peptide was labeled with P-2007. This peptide alone was labeled in reactions of tryptic digests with P-2007 under nonreducing conditions (Fig. 7, A versus B). To determine if the peptide contained two labeled cysteines, the isolated labeled peptide (Fig. 7C) was further digested with endoprotease Asp-N and rechromatographed (Fig. 7D). Two additional labeled peptides were detected at 35 and 45 min as a result of the digestion. These peptides had the retention times expected for the labeled peptides containing Cys and Cys, respectively. The original labeled peptide with a retention time of 53 min, however, was still present, probably due to incomplete digestion. Therefore, both Cys and Cys are free cysteines.
Eight Ser residues (positions 282, 316, 331, 334, 335, 338, 346, and 350) and 15 Thr residues (positions 289, 299, 303, 307, 308, 311, 314, 315, 329, 339, 340, 341, 342, 345, and 349) were found to be modified in tryptic peptides and/or S. aureus V8 peptides (Table 6). Therefore, all of the eight Ser and 15 Thr residues from Ser282 to the COOH terminus of the α-agglutinin fragment were modified. All other sequenced Ser and Thr residues were observed as expected (Fig. 8).
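The clustering of modified residues can be made explicit by tabulating the positions listed above. This is a small sketch: the position lists are copied from the text, the domain III boundary (residue 326) is the one given for the proposed domains, and the variable names are our own.

```python
# Modified residue positions as listed in the text.
MODIFIED_SER = [282, 316, 331, 334, 335, 338, 346, 350]
MODIFIED_THR = [289, 299, 303, 307, 308, 311, 314, 315, 329,
                339, 340, 341, 342, 345, 349]
DOMAIN_III_END = 326  # proposed domain III spans residues 200-326

sites = sorted([(p, "Ser") for p in MODIFIED_SER] +
               [(p, "Thr") for p in MODIFIED_THR])

first_site = sites[0][0]
last_site = sites[-1][0]
span = last_site - first_site + 1
in_domain_iii = sum(1 for pos, _ in sites if pos <= DOMAIN_III_END)

print(f"{len(sites)} modified residues between {first_site} and {last_site}")
print(f"{in_domain_iii} within domain III, {len(sites) - in_domain_iii} beyond it")
print(f"fraction of positions in this stretch that are modified: {len(sites) / span:.2f}")
```

On these numbers, 23 modified hydroxy amino acids fall within a 69-residue stretch, with 10 inside the proposed domain III and 13 in the sequence that follows it.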
The α-agglutinin fragment is fully active and must therefore form a correctly folded structure. A high proportion of β-sheet structure is present throughout the protein. Thus, physical evidence bolsters sequence similarity arguments that there are three IgV-like domains in α-agglutinin.
Figure 9:
Alignment of three domains of α-agglutinin with each other and with a consensus sequence for IgV domains (Williams and Barclay, 1988). The positions of the β-strands in the consensus sequence are shown. The alignment is based on secondary structure prediction and alignment within prospective β-strands, with gaps allowed only between strands (Chou and Fasman; Lipke et al., 1995). The sequence between residues 101 and 110 is repeated as the G strand of domain I and the A strand of domain II, as discussed in the text. Identities are boxed and shaded; similarities are boxed without shading. Similarity sets are: A, F, I, L, M, V, Y; A, G; C, S, P; D, E; D, N; E, Q; H, K, R; H, W, Y; N, Q; S, T. The hydrophobic-residue symbol in the consensus includes A, F, I, L, M, P, V, Y, and W.
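The similarity sets in this caption define a simple rule for scoring aligned columns. A minimal sketch of that rule follows; the sets are copied from the caption, while the function names, the use of "-" for gaps, and the toy aligned strings are our own assumptions.

```python
# Similarity sets as given in the Figure 9 caption.
SIMILARITY_SETS = [
    set("AFILMVY"), set("AG"), set("CSP"), set("DE"), set("DN"),
    set("EQ"), set("HKR"), set("HWY"), set("NQ"), set("ST"),
]


def classify_pair(a, b):
    """Classify two aligned residues as identity, similarity, or different."""
    if a == b:
        return "identity"
    if any(a in group and b in group for group in SIMILARITY_SETS):
        return "similarity"
    return "different"


def alignment_summary(row1, row2, gap="-"):
    """Count column classes for two equal-length aligned rows, skipping gaps."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    counts = {"identity": 0, "similarity": 0, "different": 0}
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            continue  # gapped columns are not scored
        counts[classify_pair(a, b)] += 1
    return counts


# Hypothetical aligned rows, for illustration only.
toy_counts = alignment_summary("ILKC-S", "VLEC-T")
print(toy_counts)
```

Because a residue can belong to more than one set (for example A, D, E, H, and Y), similarity under this scheme is not transitive: D is similar to both E and N, but E and N are not similar to each other.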
Assignment of domain III as an IgV-like domain suggests that there may be additional Ig-like domains in the NH2-terminal region, because multiple sequential Ig domains are often present in members of the Ig superfamily. In members of the superfamily that are cell adhesion proteins, 2 to 5 sequential domains are common. These tandem domains are at the NH2 termini of the mature proteins in the vast majority of cases (Williams and Barclay, 1988). Furthermore, the Ig fold appears to be more widespread than the Ig superfamily itself, and proteins with little or no sequence similarity to Ig domains form Ig-like folds. Most of these proteins are involved in cell adhesion or protein-protein interaction (Holmgren et al., 1992; Overduin et al., 1995; Shapiro et al., 1995).
The 180 NH2-terminal residues of α-agglutinin are enough to form two more IgV domains, with the G strand of domain I being the A strand of domain II, as in CD4 (Fig. 9) (Williams and Barclay, 1988; Williams et al., 1989; Ryu et al., 1990; Wang et al., 1990; Barclay et al., 1993). A revised alignment procedure for α-agglutinin strongly supports a three-domain assignment (Fig. 9) (Lipke et al., 1995). When the sequences of the three proposed domains were aligned with each other and with an IgV consensus based on predicted strand profile (Fig. 9) and hydrophobic moment (Eisenberg et al., 1984) (data not shown), there was high conformity to the consensus in all three domains (Table 7). Although there is a low degree of identity in the alignment, the conserved residues include many of the IgV consensus residues. The alignments shown scored significantly better (Z > 3) than did random sequences of the same composition. Residues in α-agglutinin domains I and II corresponding to the consensus positions for the IgV domains include a Cys residue in each domain (the F strand Cys in domain I and the B strand Cys in domain II) and Trp corresponding to strand C of domain I. There are Met residues in all three proposed α-agglutinin domains in positions analogous to the conserved D-strand Arg in other IgV domains (residues 69, 158, and 274, Fig. 9). In IgV domains, an Asp residue at the beginning of the F strand forms a salt bridge with this Arg, which it could not do with the Met residue in α-agglutinin. In the three proposed α-agglutinin domains, this Asp is also absent (residues 89, 176, and 293). Although the number of residues conserved among the three domains is low, the three sequences show about 40% similarity (Table 7). The conserved and identical residues are especially frequent at positions conserved in mammalian IgV domains (Fig. 9 and Table 7).
The similarity of domains I and II is also consistent with apparent sequence homology by a standard method. Residues 30-94 and 107-180 can be aligned with a Z score of 4.7 (GCG BESTFIT, gap weight 3.0, length weight 0.0; Gribskov and Devereux, 1991). Such a score implies a common ancestral sequence and common structure for these regions, which correspond to strands B to F of domains I and II.
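A Z score of this kind compares the observed alignment score with the distribution of scores obtained after shuffling one sequence, which preserves its composition. The sketch below illustrates the statistic only: it substitutes a deliberately simple ungapped identity count for the BESTFIT scoring used in the text, and the function names, shuffle count, and toy sequences are our own assumptions.

```python
import random
import statistics


def identity_score(seq_a, seq_b):
    """Stand-in score: identical residues in an ungapped, position-wise comparison."""
    return sum(x == y for x, y in zip(seq_a, seq_b))


def shuffle_z_score(seq_a, seq_b, score=identity_score, n_shuffles=200, seed=0):
    """Z score of the observed score against shuffled versions of seq_b."""
    rng = random.Random(seed)  # fixed seed for reproducible output
    observed = score(seq_a, seq_b)
    residues = list(seq_b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        null_scores.append(score(seq_a, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z is undefined")
    return (observed - mean) / sd


# Hypothetical toy input: a sequence compared with itself.
toy = "ACDEFGHIKLMNPQRSTVWY"
z_toy = shuffle_z_score(toy, toy)
print(f"Z = {z_toy:.1f}")
```

Any scoring function with the same two-argument signature, including a gapped alignment score, can be passed as `score`; the Z value then measures how far the real pairing stands above composition-matched random pairings.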
Domain III (residues 200-326) was previously proposed to contribute to the binding site (Cappellaro et al., 1991; Lipke and Kurjan, 1992). Neither the purified α-agglutinin fragment nor the unpurified Arg-C digest of α-agglutinin retained activity, despite the retention of most of the secondary structure in the cleaved product. The inactivity of the cleaved product implies that regions of domains I and/or II are also essential for binding. Such contribution of multiple domains to the binding site is the rule in the Ig superfamily, with few exceptions (Williams and Barclay, 1988).
Figure 10:
Structure of α-agglutinin. The standard "C"-shaped models of Ig domains are shown, with the B and F strand Cys residues at the points of the C (Williams and Barclay, 1988). The first two domains are fused to designate the shared strand. Cys residues are shown in their approximate positions, as are N-glycosylation sites at Asn and Asn. N-Glycosylation sites COOH-terminal to Asn have the sequence Asn-Xaa-Thr and are assumed to be used, based on the sizes of truncated forms of α-agglutinin (Wojciechowicz et al., 1993). Another possible N-glycosylation site at Asn is not shown. Only representative O-glycosylations are shown.
There are four cysteine residues in domain III, in the A, B, C′, and F strands. Intradomain disulfide linkages in Ig-like domains often form between cysteines of the B and F strands (Williams and Barclay, 1988). Although Cys and Cys are aligned in positions for the consensus intradomain disulfide bond, Cys in strand A and Cys in strand F form the actual disulfide linkage. The position of the disulfide Cys residues is not as highly conserved in the Ig superfamily as it is in the antibodies themselves. In domain I of myelin-associated glycoprotein, residues in strands B and E of the IgV domain form an intrasheet disulfide linkage (Pedraza et al., 1990). In domain II of CD4, there is a disulfide between strands C and F (Ryu et al., 1990; Wang et al., 1990). Thus, the bond between the A and F strands in domain III of α-agglutinin is a new position for intradomain disulfides in the Ig superfamily. These strands are close enough to allow formation of the bond (Lipke et al., 1995).
Cys in strand B and Cys in strand C′ of domain III of α-agglutinin are free sulfhydryls and can be derivatized under nonreducing conditions. However, they appear not to be exposed to solvent, since they were derivatized only under denaturing conditions (data not shown). A free sulfhydryl is present in at least one other member of the Ig superfamily. CD8α has a single IgV domain with three Cys residues, one of which was in the reduced state in the crystal structure (Leahy et al., 1992). As in α-agglutinin, all Cys residues are buried in the interior of the domain.
There is at least one other N-glycosylated residue in α-agglutinin. Endo H treatment converts the 21-kDa Arg-C digestion fragment to the 16-kDa fragment, so Asn, Asn, or Asn must be glycosylated. The 5-kDa size difference would accommodate fewer than 30 carbohydrate residues, the equivalent of a single N-linked chain in yeast (Hames, 1990; Klis, 1994). The glycosylated residue is probably Asn, because it is the only Asn-Xaa-Thr sequence in this part of the molecule, and we have repeatedly failed to obtain the sequence from this residue (peptides T1, DT1, and DS2).
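Candidate N-glycosylation sites of the kind discussed here can be enumerated by scanning a sequence for Asn-Xaa-Ser and Asn-Xaa-Thr. The sketch below is illustrative only: the exclusion of Pro at the Xaa position is the usual general rule and is not stated in the text, the function name is our own, and the sequence is a made-up toy string rather than α-agglutinin.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-T) are all reported.
# Pro at the Xaa position is excluded (general rule, assumed here).
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, offset=1):
    """Return (Asn position, tripeptide, kind) for each Asn-Xaa-Ser/Thr."""
    hits = []
    for match in SEQUON.finditer(sequence):
        start = match.start()
        tripeptide = sequence[start:start + 3]
        kind = "Asn-Xaa-Thr" if tripeptide[2] == "T" else "Asn-Xaa-Ser"
        hits.append((start + offset, tripeptide, kind))
    return hits


# Hypothetical toy sequence, for illustration only.
TOY_SEQUENCE = "MNATNPSQNGSANNTT"
toy_sequons = find_sequons(TOY_SEQUENCE)
for position, tripeptide, kind in toy_sequons:
    print(position, tripeptide, kind)
```

A scan like this only lists potential sites. As the sequencing results in the text show, occupancy has to be established experimentally, and Asn-Xaa-Thr and Asn-Xaa-Ser sites need not be used equally.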
O-Glycosylation is common for cell surface proteins, with O-linked oligosaccharides often in Ser/Thr-rich regions. Many known cell surface O-glycosylated proteins, like low density lipoprotein receptor (Goldstein et al., 1985), decay-accelerating factor (Reddy et al., 1989), the muscle-specific isoform of N-CAM (Walsh et al., 1989), and yeast Gas1p/Gpp1p (Gatti et al., 1994), contain clusters of Ser/Thr-enriched segments in the regions proximal to the membrane. Expression of low density lipoprotein receptor and decay-accelerating factor in mutant cells defective for O-glycosylation results in rapid cleavage of the binding region from the extracellular surface (Kozarsky et al., 1988; Reddy et al., 1989). In α-agglutinin, the region rich in hydroxy amino acids extends from about residue 300 (the F-strand Cys of domain III) to the COOH-terminal signal for GPI anchor addition at approximately residue 627 (Lipke et al., 1989; Kodukula et al., 1993; Wojciechowicz et al., 1993).
α-Agglutinin expressed in the presence of tunicamycin, which inhibits N-glycosylation, reacts with ConA, indicating the presence of O-linked mannose residues (Terrance et al., 1987). This binding is not due to reaction with modified GPI anchors, because truncated fragments of α-agglutinin lacking the GPI anchor signal also bind ConA (Terrance et al., 1987; Hauser and Tanner, 1989; Wojciechowicz et al., 1993). The pattern of O-glycosylation in α-agglutinin indicates that there are multiple sites glycosylated after residue 282, which is at the NH2-terminal end of the E strand of domain III. O-Glycosylation is predicted to continue through the Ser/Thr-rich sequence, which extends to about residue 620. Six additional Asn-Xaa-Thr sequences in this Ser/Thr-rich region are probably glycosylated, based on the molecular size of truncated α-agglutinin species before and after treatment with endo H (Wojciechowicz et al., 1993). This highly glycosylated region (residues 300-627) would form a "stalk" holding the active site out from the wall surface, consistent with electron micrographs (Jentoft, 1990; Cappellaro et al., 1994). Finally, the stalk is predicted to continue to the COOH-terminal GPI anchor, which is processed in vivo to allow linkage to cell wall polysaccharides (Lu et al., 1994, 1995).
A drawing of α-agglutinin shows three sequential Ig domains, with N-glycosylation in sites common for such domains (Fig. 10). The binding site includes residues in domain III and at least one other region. The disulfide bonds between domains I and II and between the A and F strands in domain III are unique among Ig domains, and there are two free sulfhydryls in domain III. Following the Ig domains, there is a heavily N- and O-glycosylated stalk sequence, and the COOH terminus of the protein is initially GPI anchored. Therefore α-agglutinin has a structure that recapitulates many of the features of cell adhesion proteins in multicellular eukaryotes.