Evolution of the Large Secreted Gel-Forming Mucins

Jean-Luc Desseyn*,†,3, Jean-Pierre Aubert*,‡, Nicole Porchet*,‡,§, and Anne Laine*

*Unité 377 INSERM, Lille, France;
†Department of Pharmacology, University of Washington;
‡Laboratoire de Biochimie et de Biologie Moléculaire de l'Hôpital C. Huriez, CHRU de Lille, Lille, France; and
§Faculté de Médecine, Université de Lille II, Lille, France

Abstract

Mucins, the major component of mucus, contain tandemly repeated sequences that differ from one mucin to another. Considerable advances have been made in recent years in our knowledge of mucin genes. The availability of the complete genomic and cDNA sequences of MUC5B, one of the four human mucin genes clustered on chromosome 11, provides an exemplary model for studying the molecular evolution of large mucins. The emerging picture is one of expansion of mucin genes by gene duplications, followed by internal repeat expansion that strictly preserves the reading frame. Computational and phylogenetic analyses have allowed us to propose an evolutionary history of the four human mucin genes located on chromosome 11 from an ancestor gene shared with the human von Willebrand factor gene, and to suggest a model for the evolution of the repeat-coding portion of the MUC5B gene from a hypothetical ancestral minigene. The characterization of MUC5B, a member of the large secreted gel-forming mucin family, offers a new model for the comparative study of the structure-function relationship within this important family.

Introduction

Mucus protects the underlying epithelium from chemical, enzymatic, and mechanical damage. It consists mainly of mucins, which are heterogeneous, highly glycosylated proteins produced by epithelial cells (Ho and Kim 1991). All mucins contain a central part that carries numerous oligosaccharide chains. This part, rich in Ser, Thr, and Pro, is composed of tandem repeats. The number of repeats and the amino acid (aa) sequence of each repeat depend on the mucin gene. The central part is flanked at both ends by unique domains whose aa composition differs from that of the repeat domain.

Sequences of the mucin cDNAs are rarely full-length because of the highly repetitive structure and the extremely large size of some mucin messengers. To date, eight human mucin genes, MUC1–MUC7 (including MUC5AC and MUC5B), have been well characterized, and each mucosa or secretory epithelium expresses a characteristic pattern of mucin genes. Mucins are usually subdivided into two groups, the secreted mucins (gel-forming and non–gel-forming) and the membrane-anchored mucins. The second group consists of the two large mucins MUC3 and MUC4, containing EGF-like motifs, and the small mucin MUC1. MUC6, MUC2, MUC5AC, and MUC5B are the secreted gel-forming mucins, and their four genes are contained within a single 400-kb genomic DNA fragment on chromosome 11 band p15.5 (Pigny et al. 1996a). At least MUC2, MUC5AC, and MUC5B have a common ancestor (Desseyn et al. 1998a) and define a subclass of mucins. cDNA sequences flanking the central part of this subclass of human mucins (MUC2, MUC5AC, MUC5B) and of animal mucins (RMuc2, FIM-B.1, and PSM) code for cysteine-rich domains that are similar to the cysteine-rich domains flanking the three consecutive A domains (A1-A2-A3) of von Willebrand factor (vWF) (Probst, Gertzen, and Hoffmann 1997; Eckhardt et al. 1991, 1997; Gum et al. 1992, 1994; Xu et al. 1992; Ohmori et al. 1994; Lesuffleur et al. 1995; Desseyn et al. 1997a, 1998b; Joba and Hoffmann 1997; Li et al. 1998; van de Bovenkamp et al. 1998). These cysteine-rich domains are named D, B, C, and CK (cystine knot; fig. 1). The D domains comprise D1-D2-D'-D3, located upstream of the central part in mucins and upstream of the A1-A2-A3 domains in vWF, and D4, located downstream of the central part in mucins and downstream of the A1-A2-A3 domains in vWF. Partial genomic and cDNA sequences available for the other mucin genes showed that the 3' ends of MUC6 (Toribara et al. 1997), RMuc5ac (Inatomi et al. 1997), and BSM (Bhargava et al. 1990) are similar to the C-terminal regions of the three human mucins MUC2, MUC5AC, and MUC5B.



Fig. 1.—Schematic comparison of the mucins MUC5B, MUC5AC, MUC2, FIM-B.1, and PSM with human von Willebrand factor (vWF). The four D domains of vWF (D1, D2, D3, and D4) are 351–375 aa in length. The D' domain is 89 aa in length. The three B domains and the two C domains of vWF are 34, 25, 24, 116, and 118 aa in length, respectively. The three A domains are about 220 aa in length. Ovals represent Cys-subdomains (108 aa, 10 Cys). Mucin-type domains (tandem repeats rich in Ser, Thr, and Pro) are hatched. The four mucin subdomains RI–RIV of MUC5B are each followed by a mucin-type polypeptide called R-End (111 aa). The two Cys-subdomains of MUC2 (ovals) flank a nonpolymorphic mucin-type region. Variations in the inner cores (VNTR) of mucin regions are indicated by slashes. The mucin region of FIM-B.1 is interrupted at least three times by an SCR (short consensus repeat of about 60 aa) motif. The central regions of MUC5AC, MUC2, and PSM are composed of tandem repeats of 8, 23, and 81 aa in length, respectively. The number of Cys-subdomains within the central part of MUC5AC (ovals) has not yet been determined. The two rectangles with asterisks in MUC5B, MUC2, and MUC5AC represent the two motifs, encoded in MUC5B by two small exons, that flank the central part of the three mucins.

 
The CK domain is 85 aa in length in MUC5B and contains 11 cysteine residues. This domain is similar to norrin (also called NDP, for Norrie disease protein), and its three-dimensional structure is similar to that of TGF-β family proteins (Meitinger et al. 1993). vWF (Voorberg et al. 1991), NDP (Perez-Vilar and Hill 1997), PSM (Perez-Vilar, Eckhardt, and Hill 1996; Perez-Vilar and Hill 1998a), and RMuc2 (Bell et al. 1998) have been shown to form disulfide-linked dimers through their carboxyl-terminal domains, and vWF and PSM form disulfide-linked multimers through their amino-terminal D domains (Mayadas and Wagner 1992; Perez-Vilar and Hill 1998b). Following dimerization, multimerization, and glycosylation (N- and O-), most mucins are stored in secretory granules before being secreted onto the luminal surfaces of epithelia as large oligomeric molecules.

A 108-aa subdomain rich in cysteine residues (10 Cys), called the "Cys-subdomain," has also been found to interrupt the central repetitive parts of several mucins, in some cases several times. This subdomain has been found seven times in MUC5B (Desseyn et al. 1997a), twice in MUC2 (Toribara et al. 1991), at least six times in MUC5AC (Meerzaman et al. 1994; Guyonnet Dupérat et al. 1995; Klomp, Van Rens, and Strous 1995), and several times in various homologous animal mucins (Hansson et al. 1994; Ohmori et al. 1994; Shekels et al. 1995; Turner et al. 1995; Inatomi et al. 1997). This Cys-subdomain has been well conserved throughout evolution. The Cys residues and some other amino acid residues are absolutely conserved (Desseyn et al. 1997b, 1998a), and one putative C-mannosylation consensus sequence (W-x-x-W; Krieg et al. 1998) is always found in the amino-terminal region of this domain. Because this subdomain is found in humans, mice, and rats, it is likely to play an important role, for example in packaging or trafficking, or in interactions with other components of the mucus.

Evolutionary studies of mucin genes can help to define their structure-function relationship and elucidate their individual biological roles. The genomic organizations of the two small mucin genes MUC1 and MUC7 have previously been reported (Lancaster et al. 1990; Bobek et al. 1996). We recently published the complete genomic sequence of the large secreted mucin MUC5B (Desseyn et al. 1997a, 1997b, 1998b). Another group reported a 39-aa-longer amino-terminal region (Offner et al. 1998), with an additional first exon (which we call exon 0) and a longer exon corresponding to our exon 1 (which we call exon 1'). Comparison between the full-length (15.8-kb) cDNA sequence and the corresponding genomic sequence (39 kb) revealed a total of 49 exons and 48 introns. The additional intron, which we call intron 0, lies between exon 0 and exon 1', is 2.4 kb long (unpublished results), and is a phase 1 intron. Since MUC5B is the only large mucin gene for which both complete cDNA and genomic sequences have been determined, it provides an excellent model for investigating mucin evolution.

Materials and Methods

The accession number of the central part (protein) of MUC5B is CAA70926. Precise boundaries of the different repeats have previously been determined (Desseyn et al. 1997b). Nucleotide sequences of CK domains are available from the EMBL database under the following accession numbers, and the sequences used for alignment and analysis are defined as follows: human NDP (hNDP): NM_000266, nt 571–792; mouse NDP (mNDP): X92394, nt 588–809; MUC5AC: AJ001402, nt 2917–3123; RMuc5ac: U83139, nt 3078–3284; MUC5B: Y09788, nt 9117–9970 (join 9172 to 9829); MUC2: M94132, nt 2680–2877; RMuc2: M81920, nt 2236–2433; BSM: M36192, nt 1524–1721; PSM: M61883, nt 3226–3423; FIM-B.1: J02910, nt 967–1164; human vWF (hvWF): NM_000552, nt 8215–8418; MUC6: U97698, nt 1033–1242.
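As a quick consistency check on the coordinates listed above, the short Python sketch below computes the length of each region and whether it spans a whole number of codons. It is an illustration only, not part of the original analysis; the dictionary and function names are hypothetical, and reading the MUC5B entry ("join 9172 to 9829") as two joined segments, 9117–9172 and 9829–9970, with nt 9173–9828 left out, is an assumption.

```python
# Illustrative check of the CK-domain coordinates listed in Materials and Methods.
# Coordinates are 1-based and inclusive; a region with several segments is a join.
CK_REGIONS = {
    "hNDP": [(571, 792)],
    "mNDP": [(588, 809)],
    "MUC5AC": [(2917, 3123)],
    "RMuc5ac": [(3078, 3284)],
    # Assumption: "nt 9117-9970 (join 9172 to 9829)" is read as the segments
    # 9117-9172 and 9829-9970 joined, i.e. nt 9173-9828 are left out.
    "MUC5B": [(9117, 9172), (9829, 9970)],
    "MUC2": [(2680, 2877)],
    "RMuc2": [(2236, 2433)],
    "BSM": [(1524, 1721)],
    "PSM": [(3226, 3423)],
    "FIM-B.1": [(967, 1164)],
    "hvWF": [(8215, 8418)],
    "MUC6": [(1033, 1242)],
}


def region_length(segments):
    """Total length in nucleotides of a list of inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


def summarize(regions):
    """Map each name to (length_nt, whole_codons, is_multiple_of_3)."""
    summary = {}
    for name, segments in regions.items():
        length = region_length(segments)
        summary[name] = (length, length // 3, length % 3 == 0)
    return summary


if __name__ == "__main__":
    for name, (length, codons, in_frame) in summarize(CK_REGIONS).items():
        print(f"{name:8s} {length:4d} nt {codons:3d} codons, multiple of 3: {in_frame}")
```

Under this reading of the MUC5B entry, every listed region spans a whole number of codons: 198 nt for MUC5B, MUC2, RMuc2, BSM, PSM, and FIM-B.1; 207 nt for MUC5AC and RMuc5ac; 222 nt for hNDP and mNDP; 204 nt for hvWF; and 210 nt for MUC6.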

Multiple-sequence alignments and the derived phylogenetic trees were made using the CLUSTAL X program (Thompson, Higgins, and Gibson 1994), and the trees are displayed with TREEVIEW (Page 1996).
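The figure legends state that the trees are neighbor-joining trees. For readers unfamiliar with the method, the Python sketch below shows the core joining step, the standard Q criterion applied to a precomputed distance table. It is a minimal illustration and not the CLUSTAL X implementation: it reports only the order of joins (no branch lengths, no distance correction, no bootstrapping), and the function name and input format are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the joins made by neighbor joining, as (a, b, new_node) tuples.

    names: taxon labels; dist: dict mapping frozenset({x, y}) to a distance.
    Only the topology is tracked; branch lengths are not computed here.
    """
    d = dict(dist)  # copy so the caller's table is not modified
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r(i) = sum of distances from i to every other active node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = f"node{len(joins) + 1}"
        d_ab = d[frozenset((a, b))]
        for k in nodes:
            if k not in (a, b):
                # Distance from the new internal node to each remaining node.
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - d_ab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [new]
        joins.append((a, b, new))
    joins.append((nodes[0], nodes[1], "root"))
    return joins


if __name__ == "__main__":
    # Additive distances from the toy tree ((A:1, B:2):1, (C:3, D:4)).
    pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
             ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
    dist = {frozenset(p): v for p, v in pairs.items()}
    print(neighbor_joining(["A", "B", "C", "D"], dist))
    # [('A', 'B', 'node1'), ('C', 'D', 'node2'), ('node1', 'node2', 'root')]
```

On distances that fit a tree exactly, as in the toy example, neighbor joining recovers the true grouping; real alignment distances are only approximately additive, which is why programs such as CLUSTAL X offer corrections and bootstrap support.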

Results and Discussion

Evolutionary History of the Unusual Large Central Exon of MUC5B
The entire MUC5B mucin gene has been cloned in two overlapping cosmid clones (Desseyn et al. 1997a, 1997b, 1998b). The mucin-type region (rich in Ser, Thr, and Pro) is composed of irregular tandem repeats of 29 aa (87 bp) in domains called RI–RV (Desseyn et al. 1997b). This mucin-type region is interrupted four times by two associated nontandemly repeated sequences (fig. 1). These sequences allowed us to design primers to amplify overlapping cDNAs corresponding to the central part of MUC5B. cDNA cloning and sequencing, together with genomic subcloning and sequencing, established that the central part of MUC5B does not contain any intronic sequence, either unique or tandemly repeated. We therefore concluded that the central part of MUC5B is coded by a single, unusually large exon of 10,713 bp, and it is likely that other large mucins also have their central parts coded by a single exon. Moreover, this suggests that the central part arose through internal duplications rather than through exon shuffling. The availability of both complete cDNA and genomic sequences of the central part of MUC5B (Desseyn et al. 1997b) now allows us to trace its evolutionary history. The deduced peptide is composed of three kinds of subdomains: Cys-subdomains (108 aa, 10 Cys), R subdomains (309–657 aa, composed of irregular tandem repeats of 29 aa), and R-End subdomains (111 aa). The first three Cys-subdomains (fig. 1) are followed by four super repeats. Each super repeat is composed of an R subdomain followed by an R-End subdomain and ends with a Cys-subdomain. Each R subdomain is composed of 11 (the first two and the last one) or 17 tandem repeats of the irregular 29-aa (87-bp) motif rich in Ser and Thr (fig. 2A). The four R-End subdomains are rich in Ser and Thr and are very similar to each other, but they show no similarity to any other sequence. Another R subdomain of 23 irregular repeats of 29 aa follows the fourth super repeat.
The presence of repeats at different levels suggests that the central part of the gene evolved by successive duplications. Multiple-sequence alignments and phylogenetic trees (figs. 2A and B) together allow us to propose a model of how the five tandem repeat blocks RI–RV were assembled. The repeat RV-14 has an extra pentapeptide, TTTPT (fig. 2A), that probably arose through partial duplication of the RV-14 sequence. New multiple-sequence alignments and phylogenetic trees were therefore constructed without this pentapeptide. They show that the block made up of the five repeats RV-3–RV-8 is highly similar to the block made up of the five repeats RV-18–RV-23. Further alignments without either RV-18–RV-23 or RV-3–RV-8 show that the block RIII-1–RIII-6 and the block RV-1–RV-6 are similar to six-repeat blocks of the other subdomains. Moreover, alignments using RIII blocks or RV blocks and phylogenetic trees (data not shown) show that blocks RIII/V-1–RIII/V-6 are more similar to blocks RIII/V-6–RIII/V-12 than to other blocks of six repeats. Thus, these analyses and the order of the subdomains that we defined within the central part of MUC5B allow us to propose a diagram of the evolution of the repeated sequences (fig. 3). This scheme agrees with our previous model of the evolution of the three human mucin genes MUC5B, MUC5AC, and MUC2 from a single ancestral gene (Desseyn et al. 1998a). Part of an ancestral gene encoding a primordial Cys-subdomain was triplicated, giving rise to three Cys-subdomains (fig. 3a). The resulting gene, composed of these three subdomains flanked by unique Cys-rich sequences also found in the vWF gene (see below), duplicated into the two ancestor genes of MUC5AC and MUC5B (Desseyn et al. 1998a). The primordial 87-bp repeat of MUC5B duplicated several times to form a block of 11 irregular 87-bp repeats, followed by a unique Ser- and Thr-rich sequence coding for 111 aa.
The ancestral super repeat (Cys/R/R-End subdomains) duplicated into two super repeats (fig. 3b). Then, the first six repeats of the second block of 11 repeats duplicated (fig. 3c). This event was followed by a further duplication en bloc of a region composed of the third Cys-subdomain, the block of 11 repeats, the R-End subdomain, the last Cys-subdomain, and the block of 17 repeats of 29 aa (fig. 3d). Finally, the block composed of the first 11 repeats of 87 bp, the following R-End subdomain, and the Cys-subdomain duplicated en bloc (fig. 3e). The five repeats RV-3–RV-7 duplicated into the two blocks RV-3–RV-7 and RV-18–RV-23 (fig. 3f). A sequence encoding the pentapeptide TTTPT of RV-14 duplicated (fig. 3g).
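The subdomain order described in this section can be written down as a simple list, which makes it easy to check that it accounts for the seven Cys-subdomains reported for MUC5B and for the 73 tandem repeats aligned in Fig. 2. The Python sketch below does only this bookkeeping. Its variable names are hypothetical, it reads "11 (the first two and the last one) or 17" as RI, RII, and RIV having 11 repeats and RIII having 17, and it ignores the irregularity of the repeats, so the nominal length assumes every repeat is exactly 87 bp.

```python
# Bookkeeping sketch of the MUC5B central part, N- to C-terminal, as described
# in the text. Each entry is (kind, number_of_29_aa_repeats or None).
REPEAT_UNIT_BP = 87  # one irregular 29-aa repeat unit

SUPER_REPEAT_SIZES = (11, 11, 17, 11)  # repeats in the R blocks of the four super repeats
FINAL_BLOCK_SIZE = 23  # the last R block (RV)

CENTRAL_PART = (
    [("Cys", None)] * 3  # the first three Cys-subdomains
    + [
        part
        for size in SUPER_REPEAT_SIZES  # each super repeat: R, then R-End, then Cys
        for part in (("R", size), ("R-End", None), ("Cys", None))
    ]
    + [("R", FINAL_BLOCK_SIZE)]
)


def count(kind):
    """Number of subdomains of a given kind."""
    return sum(1 for k, _ in CENTRAL_PART if k == kind)


def total_repeats():
    """Total number of 29-aa tandem repeats across all R blocks."""
    return sum(n for k, n in CENTRAL_PART if k == "R")


if __name__ == "__main__":
    print("Cys-subdomains:", count("Cys"))  # 7
    print("R-End subdomains:", count("R-End"))  # 4
    print("tandem repeats:", total_repeats())  # 73
    print("codons per repeat unit:", REPEAT_UNIT_BP // 3)  # 29
    # Nominal only: real repeats are irregular (gaps, the extra TTTPT in RV-14).
    print("nominal repeat-coding length (bp):", total_repeats() * REPEAT_UNIT_BP)  # 6351
```

Both totals (seven Cys-subdomains and 73 repeats) match the figures given elsewhere in the text, which supports the reading of the repeat counts used here.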



Fig. 2.—A, Sequence alignment of the 73 tandem repeats of MUC5B. Alignment gaps are indicated by dashes. Multiple alignments were performed with the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). B, Deduced phylogenetic tree. The neighbor-joining tree is displayed by the TREEVIEW program (Page 1996).

 


Fig. 3.—Evolution scheme of the MUC5B central part. Ovals represent Cys-subdomains. Each long rectangle denotes an irregular motif of 29 aa. Empty small rectangles represent R-End subdomains of 111 aa. The order of the three events e, f, and g is undetermined, and they are represented in a single step.

 
Relationships Among Cystine Knot (CK) Motifs and Pattern of Mucin Gene Evolution
Since our previous model of the evolution of the three human mucin genes MUC2, MUC5AC, and MUC5B (Desseyn et al. 1998a), several new mucin cDNAs encoding a CK motif have been analyzed. Sequences coding for the carboxy-terminal peptide between the last 10 conserved cysteine residues of the CK motif of MUC5B (Desseyn et al. 1997a), MUC2 (Gum et al. 1992), MUC6 (Toribara et al. 1997), MUC5AC (Buisine et al. 1998a), RMuc2 (Xu et al. 1992), RMuc5ac (Inatomi et al. 1997), frog integumentary mucin FIM-B.1 (Probst, Gertzen, and Hoffmann 1990), porcine and bovine submaxillary mucins PSM (Eckhardt et al. 1991) and BSM (Bhargava et al. 1990), human Norrie disease protein (hNDP; Berger et al. 1992), mouse NDP (mNDP; Berger et al. 1996), and the human vWF gene (Mancuso et al. 1989) were aligned using the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). The alignment was optimized based on the nine cysteine residues conserved in the 12 sequences (fig. 4A). The phylogenetic tree (fig. 4B) reveals three subfamilies of mucin genes: MUC6 alone; FIM-B.1, PSM, and BSM together; and MUC5AC, MUC5B, and MUC2 together, grouped with their animal homologs. Notably, Cys-subdomains of 108 aa containing 10 Cys have been found in all the members of this last subfamily. Moreover, of the four mucin genes of chromosome 11, MUC6 is the closest to FIM-B.1, PSM, and BSM. Another observation reinforces this idea: the D4 and B domains found in vWF, MUC2, MUC5B, and MUC5AC are missing in MUC6, PSM, and BSM (fig. 1). The central parts of MUC2, MUC5B, and MUC5AC, in contrast to those of PSM and FIM-B.1, are flanked by two domains not found in vWF and encoded, as shown at least for MUC5B, by two small exons (indicated with asterisks in fig. 1) of 198 and 182 bp, respectively.
We can therefore speculate that the common ancestor gene of the mammalian genes MUC2, MUC5AC, and MUC5B and of their animal homologs contained a sequence coding for one Cys-subdomain, flanked at both ends by these exons.
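Tree building from an alignment such as the one in Fig. 4A starts from pairwise distances. The Python sketch below shows one simple choice, the uncorrected p-distance with gapped columns skipped pairwise, together with a helper that lists columns where every aligned sequence carries a cysteine, the kind of anchor used here to optimize the alignment. It is an illustration under stated assumptions: CLUSTAL X offers its own distance options, and the toy sequences and function names are hypothetical, not real CK sequences.

```python
from itertools import combinations


def p_distance(s1, s2, gap="-"):
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap in either sequence are skipped (pairwise deletion).
    """
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    sites = [(a, b) for a, b in zip(s1, s2) if a != gap and b != gap]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def distance_table(alignment):
    """Map frozenset({name1, name2}) to the p-distance for every pair."""
    return {
        frozenset((x, y)): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


def conserved_columns(alignment, residue="C"):
    """0-based columns where every aligned sequence has the given residue."""
    seqs = list(alignment.values())
    return [i for i in range(len(seqs[0])) if all(s[i] == residue for s in seqs)]


if __name__ == "__main__":
    toy = {  # hypothetical aligned protein fragments
        "seq1": "CKGCSTC",
        "seq2": "CRGCS-C",
        "seq3": "CKPCATC",
    }
    print(conserved_columns(toy))  # [0, 3, 6]
    for pair, dist in distance_table(toy).items():
        print(sorted(pair), round(dist, 3))
```

A table in this form can be handed to any distance-based tree method; the p-distance is the simplest option and tends to underestimate divergence between distant sequences.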



Fig. 4.—A, Alignment of CK sequences between the last nine cysteine residues. hNDP and mNDP are human and mouse Norrie disease proteins. Codons coding for cysteine residues and for conserved amino acids are shown in bold. Alignment gaps are indicated by dashes. Asterisks indicate positions which have a single conserved nucleotide. Plus signs indicate positions which have two conserved nucleotides. The alignment was constructed using the CLUSTAL X program (Thompson, Higgins, and Gibson 1994). B, Deduced phylogenetic tree. The neighbor-joining tree is displayed by the TREEVIEW program (Page 1996).

 
In addition to the genomic sequence of MUC5B, the genomic organization is available for the region downstream of the repetitive part of MUC6. It shows that the central Ser- and Thr-rich domain is followed by the CK domain, which is the last domain found in the three other mucins of human chromosome 11. This suggests that during evolution, MUC6 lost several exons coding for the D4, B, and C domains. Such a loss is possible because the intron between the central part and the CK domain of MUC6 and the two introns of MUC5B (introns 30 and 46) flanking the domains absent from MUC6 belong to the same class (class 1); deleting a genomic segment flanked by two introns of the same class preserves the downstream reading frame, as in MUC6. This may have happened by reverse transcription of a spliced variant mRNA, which then replaced the endogenous genomic copy through homologous recombination. This is a simple mechanism by which contiguous blocks of introns and exons can be removed in a single event (Frugoli et al. 1998). Because the central part of MUC6 does not seem to contain any Cys-subdomain, we can now propose a general scheme for the evolutionary history of the four human mucin genes of chromosome 11 (fig. 5). In our interpretation, the present MUC5B and MUC5AC genes evolved from a common ancestor, which we term the MUC5ACB progenitor. This progenitor arose, by a duplication involving the entire gene, from a progenitor (MUC2-5ACB) shared with the present MUC2 gene. This hypothetical progenitor contained the initial Cys-subdomain and itself shared a progenitor with the present MUC6 gene. This last progenitor contained the D1-D2-D'-D3 and D4-B-C-CK domains inherited from an ancestor gene shared with the vWF gene. In the vWF gene, the B and C domains were triplicated and duplicated, respectively, whereas the MUC6 gene lost several exons coding for the D4, B, and C domains.
This putative evolutionary scheme takes into account the genomic organization, sequence similarities, and the order of the four human mucin genes on chromosome 11. The Cys-rich amino- and carboxy-terminal domains are conserved in mammalian mucins and in frog mucin. These regions were most likely preserved from the early ancestor, whereas the lack of similarity in the central part carrying the carbohydrate chains suggests changes in sequence and structural organization that occurred after the amphibian/mammalian divergence.
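The argument that MUC6 could lose the D4, B, and C exons without disrupting the downstream reading frame rests on simple modular arithmetic. The Python sketch below assumes that the intron "class" used in the text corresponds to intron phase (the position of the intron within a codon: 0, 1, or 2) and that intron i lies between exon i and exon i + 1. Deleting a contiguous run of internal exons keeps the downstream frame exactly when the deleted length is a multiple of 3, which is the same condition as the two flanking introns having the same phase. The exon lengths and function names are hypothetical.

```python
def intron_phases(exon_lengths, start_phase=0):
    """Phase (0, 1 or 2) of each intron, given coding exon lengths in 5'->3' order.

    Intron i lies between exon i and exon i + 1. start_phase is the codon
    position at which the first exon's coding sequence begins (0 = codon start).
    """
    phases = []
    pos = start_phase
    for length in exon_lengths[:-1]:
        pos = (pos + length) % 3
        phases.append(pos)
    return phases


def deletion_keeps_frame(exon_lengths, first, last):
    """True if deleting internal exons first..last (inclusive) preserves the frame."""
    return sum(exon_lengths[first:last + 1]) % 3 == 0


def flanking_phases_match(phases, first, last):
    """True if the introns bordering internal exons first..last share the same phase."""
    return phases[first - 1] == phases[last]


if __name__ == "__main__":
    exons = [90, 61, 45, 33, 70]  # hypothetical coding exon lengths (nt)
    phases = intron_phases(exons)
    print(phases)  # [0, 1, 1, 1]
    for first, last in [(1, 1), (2, 3), (1, 3)]:
        print(first, last,
              deletion_keeps_frame(exons, first, last),
              flanking_phases_match(phases, first, last))
```

The two checks always agree because the phase of the intron after the deleted block equals the phase of the intron before it plus the deleted length, modulo 3.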



Fig. 5.—Hypothetical diagram showing the evolution of the four human mucin genes clustered on chromosome 11p15 from an ancestor shared with the human vWF gene.

 
The 11p15 mucin gene family arose from rare events. Three mechanisms may have acted (reviewed in Danielson and Dores 1999): gene amplification through amplicon formation, gene duplication through chromosomal breakage and ligation, and gene duplication through unequal crossing over at repeated elements. One of the recent and unexpected findings is that the large mucin genes are differentially expressed, spatially and temporally, in the embryo compared with normal adults (Buisine et al. 1998b, 1999; Reid, Gould, and Harris 1997; Reid and Harris 1998). Although very little information is available concerning their regulatory elements, it is likely that the cis elements differ among mucin genes. A better understanding of their expression patterns and of the mechanisms that control their cell-specific and temporal expression may come from studies of their regulatory regions.

After mucin gene multiplication, further recent duplications within their central parts, as suggested above for MUC5B, led to the present genomic organization of the four human mucins. Of special interest is the observation that the tandemly repeated coding sequences of mucins, which contain most of the potential O-glycosylation sites, are not conserved either within or between species. The fact that the tandem repeats within each mucin are more or less conserved with one another strongly suggests that (1) the central part has evolved under selective pressure to keep the Ser and Thr codons corresponding to the O-glycosylation sites and (2) each tandem repeat portion has arisen through successive internal duplications.

The most recent major event is probably the formation of the central repetitive region of each mucin gene, since the repeated sequences are not conserved among mucin genes. The single large exon of mucin genes is highly variable in size and sequence between species and between members of a species. Tandem repeats expanded through replication slippage, unequal sister chromatid exchanges, and gene conversion (Vinall et al. 1998). Because these changes involve whole repeat units, they introduce no frameshifts but allow the number of repeats to vary among individuals; nevertheless, the peptides between two consecutive Cys-subdomains are always about 400 aa long (Desseyn 1997; Wickstrom et al. 1998), which correlates well with previous electron microscopy studies showing the heterogeneity of mucin glycoproteins (Sheehan et al. 1991).
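Why a change in repeat number cannot shift the frame follows from the repeat length: 87 bp is exactly 29 codons. The Python sketch below compares two hypothetical alleles that differ only in repeat copy number and checks whether the sequence downstream of the repeats is read in the same frame. The 87-nt unit used here is an arbitrary placeholder (29 ACC codons), not a real MUC5B repeat, and the 29-nt unit is included only as a contrast; the function names are hypothetical.

```python
def downstream_offset(upstream_len, unit_len, copies):
    """Codon position (0, 1 or 2) at which the sequence after the repeats starts."""
    return (upstream_len + unit_len * copies) % 3


def same_downstream_frame(upstream_len, unit_len, copies_a, copies_b):
    """True if two alleles differing only in repeat number read the downstream part identically."""
    return (downstream_offset(upstream_len, unit_len, copies_a)
            == downstream_offset(upstream_len, unit_len, copies_b))


def codons_after_repeats(upstream, unit, copies, downstream):
    """Full codons, in the frame set by the first nucleotide of the whole sequence,
    that begin at or after the start of the downstream segment."""
    seq = upstream + unit * copies + downstream
    start = len(upstream) + len(unit) * copies
    first_codon = start + (-start) % 3  # next codon boundary at or after start
    return [seq[i:i + 3] for i in range(first_codon, len(seq) - 2, 3)]


if __name__ == "__main__":
    unit_87 = "ACC" * 29  # placeholder 87-nt repeat unit
    tail = "TGGCATTAA"
    print(same_downstream_frame(3, 87, 11, 17))  # True
    print(codons_after_repeats("ATG", unit_87, 11, tail))  # ['TGG', 'CAT', 'TAA']
    print(codons_after_repeats("ATG", unit_87, 17, tail))  # ['TGG', 'CAT', 'TAA']
    # A 29-nt unit (not a multiple of 3) shifts the downstream frame.
    print(same_downstream_frame(3, 29, 1, 2))  # False
```

Only the unit length matters here: any unit whose length is a multiple of 3 can be gained or lost without changing the downstream codons, whatever its sequence.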

Genomic Organization of the MUC5B, MUC5AC, and vWF Genes
Comparison of the genomic DNA sequence of MUC5B with its cDNA sequence allowed us, for the first time, to determine the genomic organization of a large mucin gene (Desseyn et al. 1997a, 1997b, 1998b; Offner et al. 1998). These studies revealed a total of 48 introns: 30 upstream and 18 downstream of the large central exon (exon 30). Our work on the MUC5B gene shows that 23 of the 51 introns of the vWF gene have the same position and class in the MUC5B gene (table 1), and 9 other introns of MUC5B may be conserved with introns of the vWF gene. Few introns found in MUC5B are absent from vWF, and vice versa. The genomic organization of other human and animal mucin genes (MUC2, MUC5AC, RMuc2, Muc5ac, and FIM-B.1) may help determine which introns have been gained and which have been lost during evolution. Determination of the genomic organization of MUC5B facilitated that of the 3' end of the MUC5AC gene (Buisine et al. 1998a), which showed that all of the introns found in MUC5AC are conserved (position and class) relative to MUC5B (table 1). However, the unique tandemly repeated sequences identified in some introns of MUC5B have not been found in MUC5AC, probably because of weak selective pressure on intronic sequences. In contrast to exonic regions, intron sizes are not conserved between the two genes. Nevertheless, sequences surrounding splice junctions are more or less perfectly conserved between the MUC5B and MUC5AC genes; for example, the first 7 bp of some introns are identical. This probably reflects the fact that intronic splice junctions may not accumulate mutations at the same rate as the rest of the intronic sequence. Although almost no data are available concerning the exon-intron organization of other gel-forming mucin genes, we anticipate that they all share the same overall genomic organization.


Table 1. Similar Introns (Position and Class) Between vWF, MUC5B, and MUC5AC

 
Many intron classes and positions, and by implication splice-site consensus sequences, are conserved between vWF and MUC5B or MUC5AC; notably, however, the introns of the vWF gene are longer than the corresponding introns of the MUC5B and MUC5AC genes. Although intron comparisons between MUC5B, MUC5AC, and vWF failed to show any conserved regions except splice sites, we think that some of these introns have functions. Intron 36 of MUC5B is made up almost entirely of perfect 59-bp direct repeats. The number of repeats varies among individuals, from three to eight (Desseyn, Rousseau, and Laine 1999). Each repeat contains one binding site that interacts specifically with a nuclear factor from mucus-secreting cells (Pigny et al. 1996b). As shown for some factors (Nakamura, Koyama, and Matsushima 1998), this factor may play a role in splicing and/or in pre-mRNA stability.

Conclusions

The four mucins clustered on human chromosome 11 share a CK domain and thus form a mucin subfamily. This subfamily can be divided into two groups depending on the presence or absence of Cys-subdomains interrupting the large O-glycosylated domain. The number of Cys-subdomains is characteristic of each mucin, and studies of this domain will help to elucidate the physiological functions of mucins.

Further identification of novel mucin genes, cloning of new mucin cDNAs, and determination of mucin gene structures will help to characterize the structure-function relationship of mucins. Conserved domains in mucin peptides are most likely to have functional significance, whereas unique polypeptide regions probably arose during evolution under differing biological constraints.

Acknowledgements

This work was supported by le Comité du Nord de la Ligue Nationale contre le Cancer and l'Association de Recherche contre le Cancer. J.-L.D. was supported by a fellowship from the Ministère de l'Education Supérieure et de la Recherche.

Footnotes

Claudia Kappen, Reviewing Editor

1 Abbreviations: aa, amino acid(s); bp, base pair(s); BSM, bovine submaxillary mucin; CK, cystine knot; FIM, frog integumentary mucin; kb, kilobase(s); NDP, Norrie disease protein; nt, nucleotide(s); PSM, porcine submaxillary mucin; vWF, von Willebrand factor.

2 Keywords: gel-forming mucin, 11p15, tandem repeat, evolution, cystine knot

3 Address for correspondence and reprints: Jean-Luc Desseyn, Department of Pharmacology, P.O. Box 357750, University of Washington, Seattle, Washington 98195-7750. E-mail: desseyn@lille.inserm.fr

Literature Cited

    Bell, S. L., I. A. Khatri, G. Xu, and J. F. Forstner. 1998. Evidence that a peptide corresponding to the rat Muc2 C-terminus undergoes disulphide-mediated dimerization. Eur. J. Biochem. 253:123–131.

    Berger, W., A. Meindl, T. J. Van De Pol et al. (14 co-authors). 1992. Isolation of a candidate gene for Norrie disease by positional cloning. Nat. Genet. 2:84.

    Berger, W., D. Van De Pol, D. Bachner, F. Oerlemans, H. Winkens, H. Hameister, B. Wieringa, W. Hendriks, and H. H. Ropers. 1996. An animal model for Norrie disease (ND): gene targeting of the mouse ND gene. Hum. Mol. Genet. 5:51–59.

    Bhargava, A. K., J. T. Woitach, E. A. Davidson, and V. P. Bhavanandan. 1990. Cloning and cDNA sequence of a bovine submaxillary gland mucin-like protein containing two distinct domains. Proc. Natl. Acad. Sci. USA 87:6798–6802.

    Bobek, L. A., J. Liu, S. N. Sait, T. B. Shows, Y. A. Bobek, and M. J. Levine. 1996. Structure and chromosomal localization of the human salivary mucin gene, MUC7. Genomics 31:277–282.

    Buisine, M. P., J. L. Desseyn, N. Porchet, P. Degand, A. Laine, and J. P. Aubert. 1998a. Genomic organization of the 3'-region of the human MUC5AC mucin gene: additional evidence for a common ancestral gene for the 11p15.5 mucin gene family. Biochem. J. 332:729–738.

    Buisine, M. P., L. Devisme, M. C. Copin, M. Durand-reville, B. Gosselin, J. P. Aubert, and N. Porchet. 1999. Developmental mucin gene expression in the human respiratory tract. Am. J. Respir. Cell Mol. Biol. 20:209–218.

    Buisine, M. P., L. Devisme, T. C. Savidge, C. Gespach, B. Gosselin, N. Porchet, and J. P. Aubert. 1998b. Mucin gene expression in human embryonic and fetal intestine. Gut 43:519–524.

    Danielson, P. B., and R. M. Dores. 1999. Molecular evolution of the opioid/orphanin gene family. Gen. Comp. Endocrinol. 113:169–186.

    Desseyn, J. L. 1997. Genomic organization of the human mucin MUC5B. Molecular basis of a new classification of mucins. Ph.D. thesis, University of Lille, France.

    Desseyn, J. L., J. P. Aubert, I. Van Seuningen, N. Porchet, and A. Laine. 1997a. Genomic organization of the 3' region of the human mucin gene MUC5B. J. Biol. Chem. 272:16873–16883.

    Desseyn, J. L., M. P. Buisine, N. Porchet, J. P. Aubert, P. Degand, and A. Laine. 1998a. Evolutionary history of the 11p15 human mucin gene family. J. Mol. Evol. 46:102–106.

    Desseyn, J. L., M. P. Buisine, N. Porchet, J. P. Aubert, and A. Laine. 1998b. Genomic organization of the human mucin gene MUC5B. cDNA and genomic sequences upstream of the large central exon. J. Biol. Chem. 273:30157–30164.

    Desseyn, J. L., V. Guyonnet-Duperat, N. Porchet, J. P. Aubert, and A. Laine. 1997b. Human mucin gene MUC5B, the 10.7-kb large central exon encodes various alternate subdomains resulting in a super-repeat. Structural evidence for a 11p15.5 gene family. J. Biol. Chem. 272:3168–3178.

    Desseyn, J. L., K. Rousseau, and A. Laine. 1999. Fifty-nine bp repeat polymorphism in the uncommon intron 36 of the human mucin gene MUC5B. Electrophoresis 20:493–496.

    Eckhardt, A. E., C. S. Timpte, J. L. Abernethy, Y. Zhao, and R. L. Hill. 1991. Porcine submaxillary mucin contains a cystine-rich, carboxyl-terminal domain in addition to a highly repetitive, glycosylated domain. J. Biol. Chem. 266:9678–9686.

    Eckhardt, A. E., C. S. Timpte, A. W. Deluca, and R. L. Hill. 1997. The complete cDNA sequence and structural polymorphism of the polypeptide chain of porcine submaxillary mucin. J. Biol. Chem. 272:33204–33210.

    Frugoli, J. A., M. A. McPeek, T. L. Thomas, and C. R. McClung. 1998. Intron loss and gain during evolution of the catalase gene family in angiosperms. Genetics 149:355–365.

    Gum, J. R. Jr., J. W. Hicks, N. W. Toribara, E. M. Rothe, R. E. Lagace, and Y. S. Kim. 1992. The human MUC2 intestinal mucin has cysteine-rich subdomains located both upstream and downstream of its central repetitive region. J. Biol. Chem. 267:21375–21383.

    Gum, J. R. Jr., J. W. Hicks, N. W. Toribara, B. Siddiki, and Y. S. Kim. 1994. Molecular cloning of human intestinal mucin (MUC2) cDNA. Identification of the amino terminus and overall sequence similarity to prepro-von Willebrand factor. J. Biol. Chem. 269:2440–2446.

    Guyonnet Dupérat, V., J. P. Audie, V. Debailleul, A. Laine, M. P. Buisine, S. Galiegue-Zouitina, P. Pigny, P. Degand, J. P. Aubert, and N. Porchet. 1995. Characterization of the human mucin gene MUC5AC: a consensus cysteine-rich domain for 11p15 mucin genes? Biochem. J. 305:211–219.

    Hansson, G. C., D. Baeckstrom, I. Carlstedt, and K. Klinga-Levan. 1994. Molecular cloning of a cDNA coding for a region of an apoprotein from the ‘insoluble’ mucin complex of rat small intestine. Biochem. Biophys. Res. Commun. 198:181–190.

    Ho, S. B., and Y. S. Kim. 1991. Carbohydrate antigens on cancer-associated mucin-like molecules. Semin. Cancer Biol. 2:389–400.

    Inatomi, T., A. S. Tisdale, Q. Zhan, S. Spurr-Michaud, and I. K. Gipson. 1997. Cloning of rat Muc5AC mucin gene: comparison of its structure and tissue distribution to that of human and mouse homologues. Biochem. Biophys. Res. Commun. 236:789–797.

    Joba, W., and W. Hoffmann. 1997. Similarities of integumentary mucin B.1 from Xenopus laevis and prepro-von Willebrand factor at their amino-terminal regions. J. Biol. Chem. 272:1805–1810.

    Klomp, L. W., L. Van Rens, and G. J. Strous. 1995. Cloning and analysis of human gastric mucin cDNA reveals two types of conserved cysteine-rich domains. Biochem. J. 308:831–838.

    Krieg, J., S. Hartmann, A. Vicentini, W. Glasner, D. Hess, and J. Hofsteenge. 1998. Recognition signal for C-mannosylation of Trp-7 in RNase 2 consists of sequence Trp-x-x-Trp. Mol. Biol. Cell 9:301–309.

    Lancaster, C. A., N. Peat, T. Duhig, D. Wilson, J. Taylor-Papadimitriou, and S. J. Gendler. 1990. Structure and expression of the human polymorphic epithelial mucin gene: an expressed VNTR unit. Biochem. Biophys. Res. Commun. 173:1019–1029.

    Lesuffleur, T., F. Roche, A. S. Hill, M. Lacasa, M. Fox, D. M. Swallow, A. Zweibaum, and F. X. Real. 1995. Characterization of a mucin cDNA clone isolated from HT-29 mucus-secreting cells. The 3' end of MUC5AC? J. Biol. Chem. 270:13665–13673.

    Li, D., M. Gallup, N. Fan, D. E. Szymkowski, and C. B. Basbaum. 1998. Cloning of the amino-terminal and 5'-flanking region of the human MUC5AC mucin gene and transcriptional up-regulation by bacterial exoproducts. J. Biol. Chem. 273:6812–6820.

    Mancuso, D. J., E. A. Tuley, L. A. Westfield, N. K. Worrall, B. B. Shelton-Inloes, J. M. Sorace, Y. G. Alevy, and J. E. Sadler. 1989. Structure of the gene for human von Willebrand factor. J. Biol. Chem. 264:19514–19527.

    Mayadas, T. N., and D. D. Wagner. 1992. Vicinal cysteines in the prosequence play a role in von Willebrand factor multimer assembly. Proc. Natl. Acad. Sci. USA 89:3531–3535.

    Meerzaman, D., P. Charles, E. Daskal, M. H. Polymeropoulos, B. M. Martin, and M. C. Rose. 1994. Cloning and analysis of cDNA encoding a major airway glycoprotein, human tracheobronchial mucin (MUC5). J. Biol. Chem. 269:12932–12939.

    Meitinger, T., A. Meindl, P. Bork, B. Rost, C. Sander, M. Haasemann, and J. Murken. 1993. Molecular modelling of the Norrie disease protein predicts a cystine knot growth factor tertiary structure. Nat. Genet. 5:376–380.

    Nakamura, Y., K. Koyama, and M. Matsushima. 1998. VNTR (variable number of tandem repeat) sequences as transcriptional, translational, or functional regulators. J. Hum. Genet. 43:149–152.

    Offner, G. D., D. P. Nunes, A. C. Keates, N. H. Afdhal, and R. F. Troxler. 1998. The amino-terminal sequence of MUC5B contains conserved multifunctional D domains: implications for tissue-specific mucin functions. Biochem. Biophys. Res. Commun. 251:350–355.

    Ohmori, H., A. F. Dohrman, M. Gallup, T. Tsuda, H. Kai, J. R. Gum Jr., Y. S. Kim, and C. B. Basbaum. 1994. Molecular cloning of the amino-terminal region of a rat MUC 2 mucin gene homologue. Evidence for expression in both intestine and airway. J. Biol. Chem. 269:17833–17840.

    Page, R. D. 1996. TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357–358.

    Perez-Vilar, J., A. E. Eckhardt, and R. L. Hill. 1996. Porcine submaxillary mucin forms disulfide-bonded dimers between its carboxyl-terminal domains. J. Biol. Chem. 271:9845–9850.

    Perez-Vilar, J., and R. L. Hill. 1997. Norrie disease protein (norrin) forms disulfide-linked oligomers associated with the extracellular matrix. J. Biol. Chem. 272:33410–33415.

    ———. 1998a. The carboxyl-terminal 90 residues of porcine submaxillary mucin are sufficient for forming disulfide-bonded dimers. J. Biol. Chem. 273:6982–6988.

    ———. 1998b. Identification of the half-cystine residues in porcine submaxillary mucin critical for multimerization through the D-domains. Roles of the CGLCG motif in the D1- and D3-domains. J. Biol. Chem. 273:34527–34534.

    Pigny, P., V. Guyonnet-Duperat, A. S. Hill et al. (14 co-authors). 1996a. Human mucin genes assigned to 11p15.5: identification and organization of a cluster of genes. Genomics 38:340–352.

    Pigny, P., I. Van Seuningen, J. L. Desseyn, S. Nollet, N. Porchet, A. Laine, and J. P. Aubert. 1996b. Identification of a 42-kDa nuclear factor (NF1-MUC5B) from HT-29 MTX cells that binds to the 3' region of human mucin gene MUC5B. Biochem. Biophys. Res. Commun. 220:186–191.

    Probst, J. C., E. M. Gertzen, and W. Hoffmann. 1990. An integumentary mucin (FIM-B.1) from Xenopus laevis homologous with von Willebrand factor. Biochemistry 29:6240–6244.

    Reid, C. J., S. Gould, and A. Harris. 1997. Developmental expression of mucin genes in the human respiratory tract. Am. J. Respir. Cell Mol. Biol. 17:592–598.

    Reid, C. J., and A. Harris. 1998. Developmental expression of mucin genes in the human gastrointestinal system. Gut 42:220–226.

    Sheehan, J. K., R. P. Boot-Handford, E. Chantler, I. Carlstedt, and D. J. Thornton. 1991. Evidence for shared epitopes within the ‘naked’ protein domains of human mucus glycoproteins. A study performed by using polyclonal antibodies and electron microscopy. Biochem. J. 274:293–296.

    Shekels, L. L., C. Lyftogt, M. Kieliszewski, J. D. Filie, C. A. Kozak, and S. B. Ho. 1995. Mouse gastric mucin: cloning and chromosomal localization. Biochem. J. 311:775–785.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Toribara, N. W., J. R. Gum Jr., P. J. Culhane, R. E. Lagace, J. W. Hicks, G. M. Petersen, and Y. S. Kim. 1991. MUC-2 human small intestinal mucin gene structure. Repeated arrays and polymorphism. J. Clin. Invest. 88:1005–1013.

    Toribara, N. W., S. B. Ho, E. Gum, J. R. Gum Jr., P. Lau, and Y. S. Kim. 1997. The carboxyl-terminal sequence of the human secretory mucin, MUC6. Analysis of the primary amino acid sequence. J. Biol. Chem. 272:16398–16403.

    Turner, B. S., K. R. Bhaskar, M. Hadzopoulou-Cladaras, R. D. Specian, and J. T. Lamont. 1995. Isolation and characterization of cDNA clones encoding pig gastric mucin. Biochem. J. 308:89–96.

    van de Bovenkamp, J. H., C. M. Hau, G. J. Strous, H. A. Buller, J. Dekker, and A. W. Einerhand. 1998. Molecular cloning of human gastric mucin MUC5AC reveals conserved cysteine-rich D-domains and a putative leucine zipper motif. Biochem. Biophys. Res. Commun. 245:853–859.

    Vinall, L. E., A. S. Hill, P. Pigny, W. S. Pratt, N. Toribara, J. R. Gum, Y. S. Kim, N. Porchet, J. P. Aubert, and D. M. Swallow. 1998. Variable number tandem repeat polymorphism of the mucin genes located in the complex on 11p15.5. Hum. Genet. 102:357–366.

    Voorberg, J., R. Fontijn, J. Calafat, H. Janssen, J. A. Van Mourik, and H. Pannekoek. 1991. Assembly and routing of von Willebrand factor variants: the requirements for disulfide-linked dimerization reside within the carboxy-terminal 151 amino acids. J. Cell Biol. 113:195–205.

    Wickstrom, C., J. R. Davies, G. V. Eriksen, E. C. Veerman, and I. Carlstedt. 1998. MUC5B is a major gel-forming, oligomeric mucin from human salivary gland, respiratory tract and endocervix: identification of glycoforms and C-terminal cleavage. Biochem. J. 334:685–693.

    Xu, G., L. J. Huan, I. A. Khatri, D. Wang, A. Bennick, R. E. Fahim, G. G. Forstner, and J. F. Forstner. 1992. cDNA for the carboxyl-terminal region of a rat intestinal mucin-like peptide. J. Biol. Chem. 267:5401–5407.

Accepted for publication March 31, 2000.