(Received for publication, June 25, 1996, and in revised form, October 15, 1996)
From the Max-Planck-Institut für Psychiatrie,
Abteilung Neurochemie, D-82152 Martinsried, Germany and the
§ Institut für Molekularbiologie und Medizinische
Chemie, Otto-von-Guericke-Universität,
D-39120 Magdeburg, Germany
Frog integumentary mucin B.1 (FIM-B.1) contains
various cysteine-rich modules. In the past, a COOH-terminal "cystine
knot" motif has been found that is similar to von Willebrand factor; this region is generally known to be responsible for dimerization processes. Furthermore, a "complement control protein" motif is present as an internal cysteine-rich domain in FIM-B.1. We characterize here the missing 75% toward the NH2 terminus of the
FIM-B.1 precursor by molecular cloning. Analogous to prepro-von
Willebrand factor, four elements with considerable similarity to
D-domains are present (i.e. D1-D2-D'-D3). These domains
have been described as essential for the multimerization of von
Willebrand factor. Thus, the general structure of FIM-B.1 resembles
that of the human mucin MUC2 as well as prepro-von Willebrand
factor; these three molecules at least seem to share common structural
elements allowing similar multimerization mechanisms.
During phylogeny, mucus gels have been conserved as the essential extracellular matrices that protect delicate epithelial surfaces in many ways (1, 2). Mucins have been established as the molecules that primarily determine the defined rheological and viscoelastic properties of these gels. The key step in the formation of such a three-dimensional complex network is the ordered aggregation of linear rodlike monomeric mucins (3). The stiff and extended conformation of the monomers is the result of highly O-glycosylated repetitive serine/threonine-rich regions (4). In contrast, aggregation to multimers is achieved via cysteine-rich modules. Two models would describe such a network: (i) a cross-linked network model and (ii) an entangled network model. However, only the latter fulfills the physicochemical criteria that define a dynamic mucus gel (5).
Due to technical problems, complete molecular structures of mucins are still rare. For example, more than seven human MUC genes (6), bovine and porcine salivary mucins (7, 8) as well as three frog integumentary mucins (FIM-A.1, FIM-B.1, FIM-C.1) have been at least partially characterized (9). The latter represent typical extracellular mosaic proteins with astonishing structural similarities to other peptides and proteins (10). FIM-B.1 from Xenopus laevis certainly shows the most interesting molecular architecture. However, only about 25% from the COOH-terminal portion of the sequence has been reported thus far. In addition to a variable number of O-glycosylated type B repeats responsible for polydispersities (11), FIM-B.1 contains at least two different cysteine-rich modules: (i) internally, the "complement control protein motif" (CP, also known as "Sushi structure" or "short consensus repeat" (SCR) (12)) and (ii) a COOH-terminal region with homology to von Willebrand factor (vWF) (13). In vWF, this part is responsible for dimerization (14). A motif spanning 11 cysteine residues named "cystine knot" has been proposed as the active site of the latter (15) and is responsible for dimerization of certain cytokines as well (16). Subsequently, the cystine knot motif has also been found in a variety of other mucins, e.g. bovine salivary mucin (7), porcine salivary mucin (8), MUC2 (17), rMUC2/rMLP (18), and MUC5 (19) as well as the human sublingual gland mucin MG1 (20). Recently, dimerization of MUC2 has been reported (21), and for porcine salivary mucin it has been clearly shown that it forms dimers via its COOH-terminal domain (22). Thus, this motif is now considered to trigger homodimerization as an early event in the biosynthesis of many mucins.
We report here the full-length sequence of the FIM-B.1 precursor starting with the signal sequence as deduced from cDNA cloning.
Isolation of mRNA from the skin of a single adult X. laevis (purchased from the Herpetological Institute, Dr. W. de Rover, Belgium), cyclic thermal amplification via the polymerase chain reaction (PCR), and purification and sequencing of plasmid DNA as well as computerized analysis and homology searches have been described previously (12).
In order to elongate the incompletely known nucleotide sequence
encoding the COOH-terminal portion of FIM-B.1 toward the 5'-end, a
multistep amplification procedure (RACE protocol) has been employed (23). Starting from the region encoding the CP motif (12), the
oligonucleotide SCR1 d(CACAGCTTGGTGTATTTC) was used as a specific primer for cDNA synthesis. After dC tailing, amplification was performed with Taq polymerase and a combination of the oligonucleotides
REP7 d(CCCTCGAGAATTCGGATC
) and PCR5
d(CCGGATCCTCGAGAATTCTAGA(G)14). The underlined region is
complementary to part of the CP motif in FIM-B.1 (12). After subcloning
the products into the BamHI/EcoRI sites of pBluescript-II/SK (Stratagene), clone pS5R7-2 was
obtained. Further cDNA clones were generated in a similar way by a
multistep amplification procedure using a set of specific primers
toward the 5'-end (Fig. 1C).
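
The primer design described above can be illustrated with a short sketch. It is not part of the original procedure; it only shows, under stated assumptions, how the antisense primer SCR1 relates to the sense strand and which restriction sites the adapter primer PCR5 carries. The function names (reverse_complement, find_sites) are hypothetical helpers introduced here, and the recognition sequences are the standard ones for these enzymes. The truncated primer REP7 is deliberately left out because part of its sequence is missing from the text.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.

# Standard recognition sequences of four common restriction enzymes.
SITES = {
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(seq: str) -> dict:
    """Map each enzyme to the 0-based index of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# Primer sequences as printed in the text.
SCR1 = "CACAGCTTGGTGTATTTC"
PCR5 = "CCGGATCCTCGAGAATTCTAGA" + "G" * 14  # adapter plus (G)14 tail

# Sense-strand stretch to which the antisense primer SCR1 anneals.
scr1_target = reverse_complement(SCR1)

# Restriction sites available in the adapter part of PCR5.
pcr5_sites = find_sites(PCR5)

if __name__ == "__main__":
    print("SCR1 anneals to:", scr1_target)
    print("Sites in PCR5:", pcr5_sites)
```

The BamHI and EcoRI sites found in the adapter are the ones used for subcloning into the vector; the (G)14 tail pairs with the dC tail added to the first-strand cDNA.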
Based on this sequence information obtained from relatively short cDNA clones, long overlapping cDNA clones were generated by PCR and subsequently analyzed (Fig. 1B).
Fig. 2 represents the cDNA sequence obtained
from a set of overlapping clones using the RACE protocol toward the
very 5'-end. The deduced amino acid sequence encodes the amino-terminal
portion of the FIM-B.1 precursor, starting with the signal sequence
and extending to the CP motif (which served as the anchor for the first specific oligonucleotide used). However, the CP motif cloned here is
not identical in sequence to the one characterized previously (12). The
two CP motifs differ by exactly two point mutations, which also change two
amino acid residues: K to E and A to G. These two mutations have been
confirmed to be highly specific by the analysis of a series of
independent cDNA clones. In order to distinguish between these two
CP motifs, we designated them SCR (12) and SCR* (sequence
from Fig. 2). Thus, we assumed that the FIM-B.1 precursor could
theoretically contain at least two CP motifs that differ slightly.
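
That a single point mutation suffices for each of the two exchanges (K to E and A to G) follows directly from the standard genetic code. The sketch below enumerates the possible one-base codon changes; it assumes the standard code and does not claim which codons are actually used in the FIM-B.1 cDNA. The function name single_base_routes is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_routes(src: str, dst: str) -> list:
    """List (codon_from, codon_to) pairs that differ in exactly one base
    and change amino acid src into amino acid dst."""
    routes = []
    for codon_from, aa_from in CODON_TABLE.items():
        if aa_from != src:
            continue
        for codon_to, aa_to in CODON_TABLE.items():
            differences = sum(a != b for a, b in zip(codon_from, codon_to))
            if aa_to == dst and differences == 1:
                routes.append((codon_from, codon_to))
    return sorted(routes)


if __name__ == "__main__":
    print("K -> E:", single_base_routes("K", "E"))
    print("A -> G:", single_base_routes("A", "G"))
```

In both cases every codon of the first residue has a one-base route to the second residue (an A to G transition in the first position for K to E, a C to G transversion in the second position for A to G), which is consistent with two point mutations changing two amino acid residues.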
To test this hypothesis, oligo(dT)-primed cDNA from X. laevis skin was amplified with Taq polymerase
using the oligonucleotides FIM8
d(CCCGGATCCTCGAGAATTC) and SCR4
d(CCCGGATCC
). The underlined part in FIM8
represents positions 3979-3995 from Fig. 2, and SCR4 is complementary
to the SCR motif (12) and does not recognize SCR*. After
subcloning the PCR products into the BamHI/EcoRI
sites of pBluescript-II/SK, clones pF8S4.2-5, -7, and -8 were characterized (Fig. 3). All three clones indeed
contained two different CP motifs (i.e. SCR* and
SCR). However, the clones were not identical but differed in their
repetitive parts by specific insertions and deletions. Such
polydispersities are typical of FIM-B.1 and have been shown to result
from alternative splicing of repetitive cassettes (11).
The combined amino acid sequences deduced in Figs. 2 and 3 now
complete the missing amino-terminal portion of FIM-B.1. Together with
the published COOH-terminal part (12, 13), the FIM-B.1 precursor
consists of at least about 2700 amino acid residues (Fig. 4) encoded by
a polydisperse mRNA population with a length of more than 8.3 kilobases. This is in fairly good agreement with Northern blot analysis
which revealed a smear of up to 10 kilobases (24). As indicated in Fig.
4, the difference is probably due to the existence of a
polydisperse cluster of additional CP motifs and repetitive
highly O-glycosylated regions (25). In particular, multiple
CP motifs could represent potential anchor points that non-covalently
cross-link mucin subunits.
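
The size comparison above rests on simple arithmetic, sketched here for orientation. The residue count (about 2700) and the transcript sizes (8.3 and 10 kilobases) are taken from the text; the assumption that the remainder consists of untranslated regions plus additional, not yet cloned cassettes is an interpretation, and the function name is a hypothetical helper.

```python
def min_coding_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues (three per codon),
    optionally counting one stop codon."""
    return 3 * n_residues + (3 if include_stop else 0)


# Values quoted in the text.
RESIDUES = 2700            # minimal precursor length (approximate)
CLONED_MRNA_NT = 8300      # ">8.3 kilobases"
NORTHERN_MAX_NT = 10000    # upper end of the Northern blot smear

coding_nt = min_coding_length_nt(RESIDUES)
# Upper bound on sequence not covered by the cloned 8.3-kb variant.
unaccounted_nt = NORTHERN_MAX_NT - CLONED_MRNA_NT

if __name__ == "__main__":
    print("Minimal coding region:", coding_nt, "nt")
    print("Gap to largest transcripts:", unaccounted_nt, "nt")
```

A coding region of roughly 8.1 kilobases fits within the cloned length of more than 8.3 kilobases, and the gap of up to about 1.7 kilobases to the largest transcripts would correspond to at most a few hundred additional codons of CP motifs and repeats.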
The amino-terminal portion of the FIM-B.1 precursor presented here can
be clearly divided into separated domains. As is typical of secretory
proteins, the sequence starts with a hydrophobic signal sequence that
is probably cleaved off after alanine 19 (Fig. 2). Then a mainly basic
repetitive region follows with the motif PAKGG. For this glycine-rich
sequence (up to glycine 77), a β-turn structure can primarily be
expected. Similar terminal sequences have been detected in
cytokeratins (26) and synapsins (27). Starting with proline 78, the
pattern changes drastically to a threonine-rich sequence also
containing proline and alanine. Such a composition is typical of mucins
(2, 4); however, the acidic residues flanking some threonine residues at
positions ±1 probably diminish their potential to become
O-glycosylated (28). Similarly, as shown previously for type B repeats
(12), analysis of further cDNA clones revealed polydispersities by
insertion of a variable number of tandem repeats with the motif
PAATDSET after amino acid 122 (25). Thus, the sequence given in Fig. 2
represents a minimal length variant within a polydisperse
population.
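
Length variants of this kind can be recognized by counting consecutive copies of the repeat motif. The following sketch shows one way to do this; the two test sequences are invented toy examples (not taken from Fig. 2), and the function name longest_tandem_run is a hypothetical helper.

```python
import re


def longest_tandem_run(protein: str, motif: str) -> int:
    """Return the largest number of directly adjacent copies of motif
    found anywhere in protein (0 if the motif is absent)."""
    runs = re.findall(f"(?:{re.escape(motif)})+", protein)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical toy sequences for illustration only.
MOTIF = "PAATDSET"
minimal_variant = "TTPA" + MOTIF * 1 + "CQ"
longer_variant = "TTPA" + MOTIF * 4 + "CQ"

if __name__ == "__main__":
    for name, seq in [("minimal", minimal_variant), ("longer", longer_variant)]:
        print(name, longest_tandem_run(seq, MOTIF))
```

Applied to sequences deduced from independent cDNA clones, such a count would give the number of inserted cassettes per clone; the same function works for the PAKGG motif of the basic region.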
Certainly one of the most interesting domains in FIM-B.1 is the
cysteine-rich region between positions 172 and 1330 (Fig. 2) because it
reveals pronounced similarities with pro-vWF (29). In particular, three
subdomains with internal homology (named D1, D2, and D3) as well as a
truncated version located between D2 and D3 (designated as D') can be
recognized. This set of D-domains has been reported to be obligatory
for multimer assembly of pro-vWF (30). This biosynthetic event occurs
unusually late in trans-Golgi and post-Golgi acidic compartments (30)
and seems to be independent of dimerization in the endoplasmic
reticulum (31). Furthermore, multimerization via the D1 and D2 domains
plays an important role in storage granule formation (32). Small vWF
multimers are secreted constitutively, whereas large multimers are
packed into Weibel-Palade bodies and then released via the regulated
pathway (30). An analogous domain structure (as in vWF and FIM-B.1) has
also been reported for the amino-terminal part of MUC2 (33) (which also forms multimers (34)). As shown in Fig. 5, nearly all
cysteine residues are conserved in these three molecules. However, the general similarity of the sequences is not particularly pronounced. The
two most conserved continuous stretches of amino acid residues are
regions in the D1 and the D3 domain with the sequences
T
and V
N, respectively. Remarkably,
the vicinal cysteine residues in the underlined CGLCG motifs are
similar to those at the active site of disulfide isomerase, and they
have been proposed to play a role in multimerization of pro-vWF (35).
In the mature vWF (after cleavage of its pro-sequence, i.e. at the D2/D' junction; see Fig. 5), homophilic intersubunit disulfide
bonds have been determined within the D3 domain at positions Cys-379
(36), Cys-459, Cys-462, and Cys-464 (37). The homologous cysteine
residues are conserved in FIM-B.1 and MUC2 (indicated by
triangles in Fig. 5). Whether proteolytic processing of the
FIM-B.1 precursor occurs similarly to pro-vWF is not known yet. A
potential cleavage site would be next to the D2/D' junction between
positions 888 and 889 of the FIM-B.1 precursor (sequence SRKR↓T; Fig. 5), liberating a polydisperse mucin-like pro-peptide. This sequence is
close to the equivalent position in pro-vWF and also remarkably
resembles the known processing site in the vWF precursor (sequence
RSKR↓S; Fig. 5). It is noteworthy that proteolytic cleavage of
pro-vWF is not essential for multimer formation (38). Taken together, many mucins seem to mimic the covalent stepwise aggregation of vWF to
linear clusters. Molecular structures supporting such a model are now
available for MUC2 (33), rMUC2 (39), FIM-B.1, and obviously also
porcine salivary mucin (22). Furthermore, partial sequences of MUC5
(19), bovine salivary mucin (7), and MG1 (20) indicate that these
mucins could follow the same common hypothetical scheme. Also, the
sperm membrane protein zonadhesin (40) containing a mucin-like domain
and a cluster of D-domains would be a candidate for a similar molecular
mechanism. However, based on the observation that vWF D-domains
bind heparin (41), non-covalent interactions of mucin D-domains with
sulfated carbohydrates should also be taken into consideration. Such a
lectin bond-mediated polymerization model has already been proposed in
the past for mucus gels (42).
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number Y08296.