(Received for publication, May 24, 1995; and in revised form, June 14, 1995)
An unusual polymorphic protein family of nine or more variants has been isolated from the byssal adhesive plaques and foot of the marine mussel Mytilus edulis. In accordance with established terminology, the family is referred to as M. edulis foot protein 3 or simply Mefp-3. Variants of Mefp-3 have molecular masses of about 6 kDa, isoelectric points greater than 10.5, and an amino acid composition dominated by six amino acids: glycine, asparagine, 3,4-dihydroxyphenylalanine (Dopa), tryptophan, arginine, and an unknown basic amino acid. The latter has been isolated and identified as 4-hydroxyarginine using fast atom bombardment mass spectrometry and appropriate standards. The primary structure of variant Mefp-3F has been determined by peptide mapping using automated Edman sequencing in combination with fast atom bombardment and matrix-assisted laser desorption ionization mass spectrometry: ADYYGPNYGPPRRYGGGNYNRYNRYGRRYGGYKGWNNGWNRGRRGKYW where Y represents Dopa, and R represents hydroxyarginine. Notably, the 4 occurrences of RY are marked by a resistance to trypsin digestion. Although the conversion of tyrosines to Dopa is essentially complete, hydroxylation of arginines varies between 40 and 80%. In contrast to other mussel adhesive proteins such as Mefp-1 and -2 which have large numbers of highly conserved, tandemly repeated peptide motifs, Mefp-3 has only short sporadic repeats. The specific function of Mefp-3 in byssal adhesion is unknown.
The adhesion of marine mussels to underwater surfaces is of
scientific and technological interest because it is strong, durable,
opportunistic, and not undermined by the presence of water (Waite,
1992). Since mussel adhesion is mediated by the byssus, an external
bundle of quinone-tanned threads tipped with flattened adhesive pads or
plaques, much recent research has focussed on characterizing those
byssal proteins in closest proximity with the substrate surface.
Attempts to extract soluble adhesive molecules directly from the
plaques have met with little success due to their highly cross-linked
nature. Recently, however, Diamond (1993) reported that the plaques
deposited by mussels transferred to sea water at 4-8 °C had a
greater proportion of extractable protein than those at 15-18
°C, thus suggesting that cross-linking might be
temperature-dependent. At least four families of plaque proteins (6,
46, 70, and 120 kDa) have been detected following extraction and
polyacrylamide gel electrophoresis in acid-urea (Diamond, 1993). All
contain the post-translationally modified amino acid L-3,4-dihydroxyphenylalanine (L-Dopa).
The polyphenolic protein known as Mytilus edulis foot protein 1 (Mefp-1) was the first of the Dopa-containing byssal precursors to be characterized (Waite and Tanzer, 1981; Filpula et al., 1990). It has a mass of 120 kDa and consists of tandemly repeated decapeptides each containing 2 residues of lysine, 1-2 residues of Dopa (Waite, 1983; Laursen, 1992), 1-2 residues of trans-4-hydroxyproline, and 1 residue of trans-2,3-cis-3,4-dihydroxyproline (Taylor et al., 1994). Because of its highly adsorptive and surface-active behavior in vitro (Notter, 1988; Olivieri et al., 1992; Hansen et al., 1994), Mefp-1 has long been regarded as a key ingredient of mussel adhesion. Unfortunately, confirmation of this role has been dogged by the extreme insolubility of Mefp-1 in byssus (Diamond, 1993; Rzepecki et al., 1992). Although the presence of Mefp-1 in plaques has been demonstrated by immunohistochemical localization (Benedict and Waite, 1986), recent evidence suggests that the protein may in fact be distributed over the entire byssus as a natural coating or lacquer (Rzepecki et al., 1992). Of the other proteins known to be present in adhesive plaques, the recovery of three is much improved in plaques formed at cold temperatures. Mefp-2 (46 kDa) is a major plaque-specific constituent that consists of 11 tandem repeats of an epidermal growth factor motif 37-41 residues in length with the Dopa modifications limited to the non-epidermal growth factor N and C termini of the protein (Rzepecki et al., 1992; Inoue et al., 1995). The remaining two proteins, Mefp-3 and Mefp-4, are also prominent in cold-shocked plaques but until recently no other detailed information was available.
In this paper, we report on a unique 6-kDa family of Mefp-3 proteins that is synthesized and stockpiled in the mussel foot and then specifically deposited into the adhesive plaques of the byssus. These polypeptides resemble the other byssal precursor proteins in basicity and Dopa content, but are unprecedented in containing high levels of a new post-translational modification of arginine, namely, 4-hydroxyarginine. The latter contributed to the complexity of primary structure determination which was only solved with the application of matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) (Karas and Hillenkamp, 1988).
MALDI-TOF experiments were performed using a Vestec
VT2000 LD-TOF (linear) mass spectrometer (Vestec Corp., Houston, TX).
The MALDI matrix was prepared by dissolving
α-cyano-4-hydroxycinnamic acid (10 mg/ml) in 50% acetonitrile. The
Mefp-3 protein or peptides derived thereof were dissolved in this
matrix solution to give a final concentration between 1 and 10
pmol/µl. About 1 µl of this solution was applied to the target
plate and allowed to evaporate. The sample spots were irradiated using
a Laser Science N2 laser (LSI, Inc., Cambridge, MA). The
laser (337 nm) has a pulse width of 8 ns and was operated at a
repetition rate of 5 Hz. MALDI ionization generates protonated singly
and doubly charged ions for the Mefp-3 protein (mostly singly charged
ions for peptides) which were accelerated using either 30 or 35 kV
accelerating voltage. The resolution was about 1:300 which was
sufficient to allow mass assignment of the major peaks due to the
different hydroxylation states of the Mefp-3 protein.
The Mefp-3F
protein or peptides derived from enzymatic digestion of Mefp-3F
(HPLC-purified) were dissolved in either 100 mM ammonium
acetate (pH 4.0) or 100 mM ammonium citrate (pH 5.5) and
digested with carboxypeptidase P (Boehringer Mannheim). Alternatively,
aminopeptidase M (Boehringer Mannheim) in 50 mM sodium
phosphate (pH 7.0) buffer was used for N-terminal sequence information.
The reactions were performed at room temperature with the enzyme to
substrate ratio between 1:10 and 1:100 by weight. Aliquots from the
reaction solutions were taken at timed intervals and dissolved in
α-cyano-4-hydroxycinnamic acid matrix solution before being
analyzed by MALDI-TOF mass spectrometry.
Figure 1: Polyacrylamide gel electrophoresis of mussel byssus and foot-derived proteins. A, byssus-derived proteins on acid-urea gels stained for protein (CB) and redox cycling (NBT): P, 23 µg of byssal plaque extract; T, 25 µg of byssal thread extract; M3, 9 µg of purified foot-derived protein 3 for comparison. B, foot (phenol gland)-derived proteins on acid-urea gels: AA, 5% acetic acid-extracted proteins (41 µg); UA, 5% acetic acid- and 8 M urea-extracted proteins (37 µg); PCA, perchloric acid-precipitated proteins (11 µg). Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS) and isoelectric focussing (IEF) of Mefp-3F (panels at right): apparent molecular mass determination by SDS-PAGE (standards used were rabbit phosphorylase b (92 kDa), bovine serum albumin (68 kDa), ovalbumin (43 kDa), carbonic anhydrase (31 kDa), soybean trypsin inhibitor (20 kDa), and lysozyme (14 kDa)); isoelectric focussing-PAGE of Mefp-3 in the range pH 7-10. Numbers below the lanes denote aliquots taken from HPLC fractions in Fig. 2B.
Figure 2: A, C-8 reversed phase separation of S-alkylated adhesive plaque-derived proteins. Inset, acid-urea gel electrophoresis of protein aliquots taken from fractions 27-45. Fraction 32 (arrow) was sequenced. B, resolution of dialysis-precipitated Mefp-3 by C-18 reversed phase HPLC. Sample load, 1.5 ml; flow rate, 1 ml/min. Stationary buffer, aqueous 0.1% trifluoroacetic acid; mobile buffer, 0.1% trifluoroacetic acid in acetonitrile. Full-scale absorbance at 280 nm is 0.1 (A) and 0.5 (B), respectively. Each graduation in % Acetonitrile equals 10%. Inset, acid-urea polyacrylamide gel of protein aliquots (5 µg) taken from fractions 30-45 under the elution profile. Electrophoretic variants are denoted A to J. Fractions pooled for bulk digestion and Edman sequencing are enclosed by a bracket. Fraction 39 (arrow) was selected for micro Lys-C digestion followed by MALDI analysis.
Figure 3: MALDI-TOF mass spectrum of the major component of Mefp-3 variant F (fraction 39 in Fig. 2B, inset). The peaks labeled STD (M + H)+ at m/z 8568.9 and STD (M + 2H)2+ at m/z 4283.4 are due to the singly and doubly charged ions, respectively, of the internal calibrant bovine ubiquitin. The inset represents an expansion of the singly charged Mefp-3F region of the mass spectrum which reveals the different hydroxylation states of the protein. The observed m/z value of the (M + H)+ ion of the most highly hydroxylated form is 6135.2 which agrees well with the (M + H)+ value 6136.31 calculated for the final amino acid sequence.
Figure 4: Purification of hydroxyarginine (R*) from hydrolysates of Mefp-3 by ion exchange (A) and gel filtration (B) chromatography. A, Beckman AA 20 0.4 mm column eluted with sodium citrate pH 6 at 0.3 ml/min. Full scale absorbance is 2.0 at 280 nm; deflection for nonaromatic amino acids probably reflects changes in the index of refraction. B, Bio-Gel P-2 (45 × 1.5 cm) eluted with 0.4 M acetic acid at room temperature. Flow rate was adjusted to 0.5 ml/min. Fractions (not collected until after 80 ml was eluted) were assayed for conductivity and by amino acid analysis.
Figure 5: CID mass spectra using FAB ionization of arginine (A), hydroxyarginine derived from Mefp-3 (B), and authentic 4-hydroxyarginine (C).
Figure S1: Scheme 1.
Figure 6: Resolution of Mefp-3F-derived peptides by C-18 reversed phase HPLC. Sample load, 50 µl; flow rate, 1 ml/min. Stationary buffer, aqueous 0.1% trifluoroacetic acid; mobile buffer, 0.1% trifluoroacetic acid in acetonitrile. Full scale absorbance at 280 nm is 0.2. A, trypsin in 0.1 M Tris-ascorbate, pH 7.5. B, trypsin in 0.15 M borate, pH 8.0. C, endoproteinase Lys-C in 0.1 M Tris-ascorbate, pH 7.5.
The small size of peptides produced as well as the possibility of microheterogeneity severely limits the effectiveness of a classical Edman approach here. Thus, mass spectrometry was employed to clarify the sequence of peptides near the C terminus as well as confirm the overall suggested structure.
Figure 7: MALDI-TOF mass spectrum of an aliquot from an endoproteinase Lys-C digest of the major component of Mefp-3F. The inset is an expansion of the (M + H)+ region of LC-2 showing peaks that are separated in mass according to hydroxylation state (the individual peaks in the cluster are 16 mass units apart).
The sequence of the peptide LC-3 was determined from the MALDI-TOF mass spectra of separate carboxypeptidase P and aminopeptidase M digests. These results verify the partial peptide sequence deduced by Edman degradation and, more importantly, resolve the Edman ambiguity in LC-3 beyond the tenth residue. After separation of LC-3 from LC-2 by reversed phase HPLC, the M_r of the most highly hydroxylated component of LC-3 was found to be 1605.7 (using MALDI mass spectrometry with internal calibration). Fig. 8 shows the MALDI-TOF mass spectrum of peptide LC-3 incubated with carboxypeptidase P for 30 s. The C-terminal lysine has already been released by this time, and the mass spectrum now contains four major peaks: (M + H)+ = 1478.6, 1421.4, 1249.3, and 1077.1. The mass differences 128.1 (measured from undigested LC-3 at m/z 1606.7), 57.2, 172.1, and 172.2 correspond to the consecutive loss of K, G, R, and R, respectively. Further digestion with carboxypeptidase P finally liberates glycine (mass difference of 57.2), establishing GRRGK as a partial C-terminal sequence for peptide LC-3. A similar experiment with aminopeptidase M successfully cleaved LC-3 consecutively to the seventh residue (GWNNGWN). This sequence is part of the tryptic peptide GWNNGWNR (T-25 and TB-9 in Table 2), which strongly suggests that the eighth residue of LC-3 is R. It is of interest to note that this peptide was the only one in the tryptic digest of the extremely hydrophilic Mefp-3F protein that was sufficiently hydrophobic to ionize well enough by FAB to make it possible to acquire a CID spectrum from which its sequence could be deduced (Biemann, 1990). More importantly, the M_r of LC-3 together with the molecular weights of the partial C-terminal and N-terminal sequences provided enough information to confirm that R is indeed at position 8, resulting in the sequence GWNNGWNRGRRGK for LC-3.
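The ladder reading described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the table of average residue masses, the restriction to residue types found in the reported Mefp-3F sequence, the modeling of Dopa and hydroxyarginine as Tyr + O and Arg + O, and the 0.3-Da tolerance are all assumptions made for the example.

```python
# Average residue masses (Da) for the residue types present in the reported
# Mefp-3F sequence. Dopa and hydroxyarginine are modeled as Tyr + O and
# Arg + O. The table and the tolerance are illustrative assumptions.
OXYGEN = 15.9994
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "P": 97.1167, "N": 114.1038,
    "D": 115.0886, "K": 128.1741, "R": 156.1875, "Y": 163.1760,
    "W": 186.2132,
}
RESIDUE_MASS["Dopa"] = RESIDUE_MASS["Y"] + OXYGEN
RESIDUE_MASS["HOArg"] = RESIDUE_MASS["R"] + OXYGEN


def assign_losses(ladder, tolerance=0.3):
    """Name the residue lost between each pair of consecutive ladder peaks.

    Returns None for a step whose mass difference matches no table entry
    within the tolerance.
    """
    calls = []
    for heavier, lighter in zip(ladder, ladder[1:]):
        delta = heavier - lighter
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(name if abs(mass - delta) <= tolerance else None)
    return calls


# (M + H)+ values for LC-3 quoted in the text: the undigested peptide, then
# the four major peaks seen after 30 s of carboxypeptidase P digestion.
lc3_ladder = [1606.7, 1478.6, 1421.4, 1249.3, 1077.1]
losses = assign_losses(lc3_ladder)
print(losses)
```

With this table the two steps of about 172 Da fall on hydroxyarginine (Arg + 16) rather than unmodified arginine (about 156 Da), which is consistent with the peaks belonging to the most highly hydroxylated form of the peptide.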
Figure 8: MALDI-TOF mass spectrum acquired after 30 s of carboxypeptidase P digestion of peptide LC-3 from Mefp-3F. The mass differences indicate that RRGK are the last four C-terminal residues of this peptide (undigested LC-3 has (M + H)+ = m/z 1606.7).
Similarly, aliquots from the digestion of peptide LC-2 with carboxypeptidase P were also analyzed by MALDI-TOF-MS. The mass spectrum of the digest solution after 30 min is shown in Fig. 9. Each peak (highest hydroxylated form) is labeled with the amino acid(s), the mass(es) of which correspond to the mass differences observed from cluster to cluster. Following the initial cleavage of K and Y are two nonconsecutive cleavages of GG and RY, respectively. These data confirmed the sequence derived (peptides T-5, T-6, TB-2, and TB-3 in Table 2) by the Edman method. The remaining five clusters of ions define the C-terminal sequences of peptides T-15 and TB-5 (Table 2). The measured (M + H)+ values of the most hydroxylated forms are m/z 3238.1, m/z 3181.6, m/z 3001.9, m/z 2830.3, and m/z 2716.0, which correlate well with the calculated values 3238.24 (-R), 3181.19 (-G), 3002.1 (-Y), 2829.82 (-R), and 2715.72 (-N), respectively. Further digestion of LC-2 with carboxypeptidase P consecutively cleaved six more amino acids to give the following C-terminal sequence for LC-2: GNYNRYNRYGRRYGGYK.
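The calculated values quoted for the LC-2 ladder can be approximated from the sequence itself. The sketch below is illustrative only: it assumes average residue masses, treats every Y as Dopa and every R as hydroxyarginine (the most highly hydroxylated form), and adds 1.0079 Da for protonation. The function name and constants are not from the original work.

```python
# Average masses in Da. All constants here are assumptions of this sketch.
OXYGEN, WATER, HYDROGEN = 15.9994, 18.0153, 1.0079
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "P": 97.1167, "N": 114.1038,
    "D": 115.0886, "K": 128.1741, "R": 156.1875, "Y": 163.1760,
    "W": 186.2132,
}


def residue_mass(aa):
    """Residue mass assuming every Y is Dopa and every R is hydroxyarginine."""
    extra = OXYGEN if aa in "YR" else 0.0
    return RESIDUE_MASS[aa] + extra


def c_terminal_ladder(sequence, steps):
    """Return (M + H)+ after removing 0, 1, ..., steps C-terminal residues."""
    mz = sum(residue_mass(aa) for aa in sequence) + WATER + HYDROGEN
    ladder = [mz]
    for aa in reversed(sequence[-steps:]):
        mz -= residue_mass(aa)
        ladder.append(mz)
    return ladder


# LC-2 as implied by the final Mefp-3F sequence (residues 1-33).
LC2 = "ADYYGPNYGPPRRYGGGNYNRYNRYGRRYGGYK"
ladder = c_terminal_ladder(LC2, 11)
for lost, mz in zip("-" + LC2[::-1], ladder):
    print(f"after loss of {lost}: {mz:.2f}")
```

Under these assumptions the intact peptide comes out near 4183.25, and the seventh to eleventh steps land close to the calculated values given in the text for the five clusters labeled -R, -G, -Y, -R, and -N.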
Figure 9: MALDI-TOF mass spectrum of peptide LC-2 incubated with carboxypeptidase P for 30 min. The most highly hydroxylated peak in each peptide cluster is labeled with the amino acid(s) that corresponds to the loss in mass (i.e. molecular weight minus H2O) of that amino acid.
Endoproteinase Lys-C digestion of Mefp-3F based on the final amino acid sequence is expected to produce three peptides LC-1, LC-2, and LC-3 (Fig. 10). The peptide LC-3 has a calculated M_r value of 1605.7 which agrees well with that measured for the (M + H)+ ion (m/z 1606.1) in Fig. 8 using external calibration. The molecular weight of LC-1 has already been mentioned, while the (M + H)+ ion observed for the most hydroxylated component of LC-2 (Fig. 7C) has m/z 4182.9, which correlates well with the calculated value of 4183.25.
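The expectation of three Lys-C peptides follows directly from the positions of the two lysines in the final sequence. A minimal in-silico digest, assuming only that endoproteinase Lys-C cleaves on the C-terminal side of every lysine (the function name is hypothetical), looks like this:

```python
def lys_c_digest(sequence):
    """Split a one-letter sequence after every lysine (K)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == "K":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Whatever follows the last lysine is the C-terminal peptide.
        peptides.append(sequence[start:])
    return peptides


# Final Mefp-3F sequence as given in the text (Y = Dopa, R = Arg or HOArg).
MEFP3F = "ADYYGPNYGPPRRYGGGNYNRYNRYGRRYGGYKGWNNGWNRGRRGKYW"
peptides = lys_c_digest(MEFP3F)
for peptide in peptides:
    print(len(peptide), peptide)
```

The digest yields a 33-residue N-terminal peptide, the 13-residue peptide GWNNGWNRGRRGK, and the C-terminal dipeptide YW.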
Figure 10: Sequence of Mefp-3F showing overlap of endoproteinase Lys-C (LC) and tryptic (TB) peptides. Dopa is denoted by Y and hydroxyarginine by R. ? denotes those sequences revealed only by MALDI-TOF mass spectra. Inset shows structure of RY suggesting the H-bond that might protect the underlying peptide bond from trypsin attack.
The C-terminal sequence KYW was confirmed by MALDI
mass spectra of the carboxypeptidase P digest of Mefp-3F itself.
Further digestion of Mefp-3F with carboxypeptidase P resulted in
nonconsecutive losses (i.e. more than one residue)
corresponding to RG followed by loss of RG. These
data supported the amino acid sequence deduced from the mass
spectrometric analysis of LC-3 and also provided the overlap
information necessary to place LC-3 before LC-1 at the C terminus.
In conclusion, all these data together lead to the amino acid
sequence shown in Fig. 10 for Mefp-3F. The m/z value
calculated for the (M + H)+ ion of the most highly
hydroxylated form of Mefp-3F is 6136.31 which corresponds well to the
experimentally determined value (m/z 6135.2) mentioned
earlier. The sequence of other variants, although incomplete, is
expected to show subtle variations from Mefp-3F. This is hardly
surprising in view of the similar amino acid compositions.
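The calculated (M + H)+ value can be approximated from the sequence with a short script. This is a sketch under stated assumptions rather than the calculation used in the original work: it uses average residue masses, adds one oxygen (15.9994 Da) per hydroxyl group, and takes the most highly hydroxylated form to carry a hydroxyl on every tyrosine and every arginine. The function name and constants are introduced here for illustration.

```python
# Average masses in Da. All constants here are assumptions of this sketch.
OXYGEN, WATER, HYDROGEN = 15.9994, 18.0153, 1.0079
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "P": 97.1167, "N": 114.1038,
    "D": 115.0886, "K": 128.1741, "R": 156.1875, "Y": 163.1760,
    "W": 186.2132,
}


def protonated_mass(sequence, n_hydroxyl):
    """Average (M + H)+ of a peptide carrying n_hydroxyl extra oxygens."""
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence)
    return backbone + WATER + HYDROGEN + n_hydroxyl * OXYGEN


MEFP3F = "ADYYGPNYGPPRRYGGGNYNRYNRYGRRYGGYKGWNNGWNRGRRGKYW"

# Assumed upper limit: one hydroxyl per Tyr (giving Dopa) and per Arg.
max_sites = MEFP3F.count("Y") + MEFP3F.count("R")
fully_hydroxylated = protonated_mass(MEFP3F, max_sites)
print(f"{max_sites} hydroxyl groups: (M + H)+ = {fully_hydroxylated:.2f}")

# Successively less hydroxylated forms fall about 16 mass units apart.
series = [round(protonated_mass(MEFP3F, n), 1)
          for n in range(max_sites, max_sites - 4, -1)]
print(series)
```

With 10 tyrosines and 9 arginines all hydroxylated, the result is close to the 6136.31 quoted above, and the same function with three hydroxyl groups on GWNNGWNRGRRGK gives an M_r near 1605.7 once the added hydrogen is subtracted.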
Mature byssal adhesive plaques are ordinarily intractable to extraction due to extensive protein cross-linking. When freshly secreted or perturbed by cold shock, however, they contain a small number of extractable proteins. One of these, P-3, and its foot-derived precursor, Mefp-3, are unusual in containing high levels of two intriguing post-translationally modified amino acids: Dopa and 4-hydroxyarginine. Dopa-containing proteins are widely distributed throughout the animal kingdom, including organisms from the following phyla: Chordata, Mollusca, Annelida, Platyhelminthes, and Cnidaria (reviewed by Waite (1990)). By and large, the proteins serve as precursors for natural adhesives and varnishes that undergo a curing process known as quinone-tanning. The functional effect of incorporating Dopa into the primary structure of proteins is 2-fold: Dopa adsorbs tenaciously to surfaces (Olivieri et al., 1992), and, following catalytic conversion to quinones by catecholoxidase, Dopa mediates protein cross-linking. The economy of serving both a cross-linking and surface coupling function has been noted previously (Waite et al., 1992).
4-Hydroxyarginine has been previously detected only as a free amino acid in the seeds of vetch (Bell and Tirimanna, 1963), lentils (Sulser et al., 1975), and in tissues of sea anemones (Makisumi, 1961) and sea cucumbers (Fujita, 1959), but never as a part of the primary structure of proteins. Neither its function nor the reason for its incomplete conversion from Arg in Mefp-3 is known at this time. There is one unique feature conferred by hydroxyarginine that is apparent from peptide mapping studies with trypsin. While the Arg-Dopa bond is cleaved by trypsin in Tris-ascorbate, it is not when Arg is converted to HOArg. The lability of HOArg-X linkages to trypsin when X is any primary amino acid other than Dopa suggests some interaction between hydroxyarginine and Dopa, e.g. hydrogen bonding that blocks enzyme access to the peptide bond (Fig. 10). Four HOArg-Dopa pairs exist in Mefp-3F. Like Dopa, arginine and presumably its hydroxylated derivative are also an asset for the molecular interactions indispensable for adhesion: Arg can be a hydrogen donor in as many as 5 hydrogen bonds in which the acceptors are usually backbone carbonyl groups (Borders et al., 1994). It is also involved in planar parallel stacking with aromatics that does not impede the hydrogen bonding capacity of arginine (Flocco and Mowbray, 1994). For these reasons, perhaps, Arg-rich proteins bind polyphenols avidly and are rather readily insolubilized by them (Meek and Weiss, 1979). Although no known sequence matches can be found for Mefp-3F, the RG-rich character of the protein is reminiscent of some RNA binding proteins (Burd and Dreyfuss, 1994).
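The effect of the protected HOArg-Dopa bond on a tryptic map can be illustrated with a simple in-silico digest. The sketch below is a simplification under stated assumptions: it applies the conventional rule that trypsin cleaves after K or R unless the next residue is P, and it treats protection as all-or-nothing by assuming every arginine that precedes Dopa is hydroxylated, whereas the text reports that hydroxylation is only partial. The function and parameter names are hypothetical.

```python
def tryptic_digest(sequence, protect_arg_dopa=True):
    """Cleave after K or R (not before P), optionally sparing R-Y bonds.

    In the Mefp-3F sequence every Y stands for Dopa, so an R-Y bond models
    the HOArg-Dopa linkage when protect_arg_dopa is True.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence[:-1]):
        nxt = sequence[i + 1]
        if aa not in "KR" or nxt == "P":
            continue
        if protect_arg_dopa and aa == "R" and nxt == "Y":
            continue  # assumed resistant: hydroxyarginine followed by Dopa
        peptides.append(sequence[start:i + 1])
        start = i + 1
    peptides.append(sequence[start:])
    return peptides


MEFP3F = "ADYYGPNYGPPRRYGGGNYNRYNRYGRRYGGYKGWNNGWNRGRRGKYW"

print("R-Y pairs:", MEFP3F.count("RY"))
protected = tryptic_digest(MEFP3F, protect_arg_dopa=True)
unprotected = tryptic_digest(MEFP3F, protect_arg_dopa=False)
print(protected)
print(unprotected)
```

Comparing the two digests shows how the four protected bonds change the peptide map: with protection the digest contains fewer, longer peptides, while GWNNGWNR is produced in both cases.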
Unlike the other two byssal precursors Mefp-1 and -2, Mefp-3 does not consist of long stretches of tandemly repeated peptides. There are, however, some suggestions of repetition: GWNNGWNR (TB 9) and RYGG (TB 3 and TB 5). Like Mefp-2, Mefp-3 occurs only in the byssal adhesive plaques. Future studies should address what specific role it plays there. Byssal adhesive plaques contain at least 4 different morphological domains when examined by electron microscopy: e.g. the 5-µm-thick lacquer on the plaque surface facing the sea water, the microcellular foam of the plaque interior, the primer mediating the interface between the foam and foreign surface, and the fibers from the thread embedded in the plaque (Benedict and Waite, 1986; Tamarin et al., 1976). So far, there is only enough evidence to correlate 2 of these with proteins, i.e. Mefp-1 with the lacquer and a short chain collagen with the fibers (Qin and Waite, 1995). P-3 or Mefp-3 may be associated with one or more of the other functional roles.