The gene encoding mouse Muc19: cDNA, genomic organization and relationship to Smgc
D.J. Culp1,
L.R. Latchney1,
M.A. Fallon1,
P.A. Denny2,
P.C. Denny2,
R.I. Couwenhoven3 and
Sally Chuang1
1 University of Rochester Medical Center, Center for Oral Biology and the Department of Pharmacology and Physiology, Rochester, New York
2 University of Southern California, Division of Diagnostic Sciences, School of Dentistry, Los Angeles, California
3 University of Maryland Dental School, Department of Diagnostic Sciences and Pathology, Baltimore, Maryland
ABSTRACT
We previously demonstrated expression of full-length transcripts for sublingual mucin apoprotein, Muc19, of ~24 kb (Fallon MA, Latchney LR, Hand AR, Johar A, Denny PA, Georgel PT, Denny PC, and Culp DJ. Physiol Genomics 14: 95–106, 2003). We now describe the complete sequence and genomic organization of the apomucin encoded by 43 exons. Southern analyses indicate a central exon of ~18 kb containing 36 tandem repeats, each encoding 163 residues rich in serine and threonine. Full-length transcripts are an estimated 22,795 bp in length that span 106 kb of genomic DNA. The transcriptional start site is 24 bp downstream of a TATA box and 42 bp upstream of the conceptual translational start codon. The putative apoprotein has an estimated mass of 693.4 kDa and contains 7,524 amino acids (80% serine, threonine, glycine, alanine, and proline). We present a model for rat Muc19 transcripts and compare the conceptually translated Muc19 proteins for mouse, rat, pig, and the 3' end of human Muc19. Conserved among these apoproteins are a signal peptide, a large tandem repeat region, von Willebrand factor type C and D domains, a trypsin inhibitor-like Cys-rich domain, and a COOH-terminal cystine knot-like domain. Southern blot analyses indicate transcripts for Muc19 and Smgc (submandibular gland protein C) are splice variants of a larger gene, Muc19/Smgc. Comparative Northern analyses between the major salivary glands demonstrate highly selective Muc19 expression in neonatal and adult sublingual glands, whereas Smgc is expressed in neonatal submandibular and sublingual glands. Regulation of Muc19/Smgc gene expression is discussed with respect to alternative splicing and mucous cell cytodifferentiation.
salivary glands; exocrine cells; secretion; mucins; SMGC
INTRODUCTION
SALIVA PRODUCTION RESULTS from the coordinated action of a diverse array of glandular structures, each contributing differentially to the total pool of organic and fluid constituents. Mucins are recognized as important components of saliva and are considered to play multiple roles in preserving the integrity of the oral cavity, including lubrication, hydration, and the selective clearance or adherence of microorganisms (28). High-molecular-weight mucins in human saliva, termed MG1 salivary mucins, are the products of at least two genes, MUC5B and MUC4 (16, 24, 30). MUC5B transcripts have been localized to mucous cells of major and minor salivary mucous glands (24), although expression is abundant in sublingual glands and barely detectable in submandibular glands (24, 30). MUC4 transcripts are present at moderate levels in human submandibular glands, with much lower levels in sublingual glands (16). MUC4 is considered the human homolog of rat sialomucin complex, a membrane-bound heterodimeric glycoprotein complex (21). Although splice variants of MUC4 have been discovered and may represent secreted forms (21), they are unlikely to be packaged and released via the regulated secretory pathway. Recently, the 3' end of MUC19, the human homolog to porcine submandibular mucin (PSM), was cloned and transcripts localized to mucous cells of submandibular and tracheal glands, whereas expression in sublingual glands was not determined (8). These combined results suggest MUC19 and MUC5B contribute to human salivary MG1 mucins via regulated exocrine secretory pathways, but may be more selectively expressed by mucous cells of submandibular and sublingual glands, respectively.
In contrast to humans and other large mammalian species, mucous cells and high-molecular-weight mucins within the major salivary glands of rodents are almost exclusively localized to sublingual glands (26). Furthermore, in rodents, submandibular acinar cells are solely a seromucous cell type and secrete a low-molecular-weight mucin, Muc10 (10). We reported recently that neonatal mice harboring a spontaneous autosomal recessive mutation, sld ("sublingual gland differentiation" arrest), express barely detectable levels of high-molecular-weight mucins compared with wild-type mice (13). We further demonstrated that mutant sublingual glands are also deficient in steady-state levels of large transcripts (~24 kb) that encode an apomucin, now designated Muc19 (13). These results collectively indicate that Muc19 represents the high-molecular-weight (>10^6 Da) and highly glycosylated (>80% carbohydrate) mucin that is the primary exocrine secretion product of rodent sublingual glands (31). In addition, Escande and coworkers (12) were unable to detect transcripts for Muc5B in mouse submandibular and parotid glands. Muc19 is therefore the predominant, if not the only, large gel-forming mucin expressed by the major salivary glands of rodents.
The selective expression of Muc19 within the salivary systems of rodents and humans (8) suggests strong evolutionary pressures to preserve the protective functions of this mucin within the oral cavity. As an initial step toward elucidation of the regulation and structure-function relationships of Muc19, we have determined the complete coding sequence of mouse Muc19 and delineated its genomic organization. For comparison, we also present a model for rat Muc19 transcripts and compare the conceptually translated Muc19 proteins for mouse, rat, pig, and the 3' end of human MUC19. Interestingly, we find that Muc19 is a splice variant of a complex gene that also encodes the recently cloned salivary protein, Smgc (submandibular gland protein C) (32). SMGC proteins are a primary secretion product of the transient type I (terminal tubule) cells expressed during postnatal development of rodent submandibular glands (32). In adult rats and female mice, Smgc expression is nearly undetectable in submandibular glands and found only in a small subpopulation of intercalated duct cells (32). We therefore compared Muc19 and Smgc expression in all three of the major salivary glands, both in neonates and in adult mice. Regulation of Muc19/Smgc gene expression is discussed with respect to alternative splicing and mucous cell cytodifferentiation.
MATERIALS AND METHODS
Materials.
All chemicals were purchased from Invitrogen (Carlsbad, CA) unless otherwise noted. Restriction enzymes and their digestion buffers were obtained from New England Biolabs (Beverly, MA). Kits were used according to the manufacturers' specifications, unless indicated.
Animals and excision of glands.
Adult breeders of NFS/NCr mice were purchased from the National Cancer Institute animal program maintained by Charles River Laboratories, Frederick, MD. Adult Swiss Webster mice were obtained from Simonsen Labs, Gilroy, CA. Animals were killed by exsanguination following CO2 inhalation or by cervical dislocation. Excised tissues were immediately frozen in liquid nitrogen and stored in either liquid nitrogen or at −80°C.
RNA isolation, cDNA library preparation, and screening.
Frozen glands were ground to a fine powder, and RNA was isolated using TRIzol. Poly(A)+ RNA was isolated on oligo-dT cellulose (PolyA Pure kit; Ambion, Austin, TX). Construction and screening of a sublingual cDNA library were performed as described previously for submandibular glands (10).
RT-PCR and cloning.
To derive clone p3941, total RNA from adult submandibular/sublingual gland complexes (1 µg) was reverse transcribed for 1 h at 42°C in the presence of 500 ng oligo-dT, 3 mM MgCl2, 10 mM Tris·HCl buffer (pH 8.3), 75 mM KCl, 0.5 mM dNTP, and 200 U SuperScript reverse transcriptase. One-fifth of the resultant reaction mix was added to a PCR mix containing 2.5 mM MgCl2, 1.0 mM dNTP, 0.25 U AmpliTaq (Applied Biosystems, Foster City, CA), and 2 pmol of each primer. PCR cycling profile was as follows: 94°C for 2 min followed by 35 cycles (94°C, 15 s; 53°C, 30 s; 72°C, 1 min) and a final 1-min extension at 72°C. The 5' primer (5'-ACTGCCCAAAGCAGACCTGC) is homologous to nucleotides 4409–4428 of bovine submaxillary mucin cDNA (GenBank accession no. AF016589) and nucleotides 39573–39592 of porcine submaxillary mucin cDNA (GenBank accession no. AF005273). The 3' primer (5'-GCATTCCCCTGCGCATCTTGCCAT) is homologous to the reverse strand for nucleotides 4549–4572 of bovine submaxillary mucin cDNA and nucleotides 39713–39736 of porcine submaxillary mucin cDNA. The resultant 164-bp product was gel purified, ligated into pCRII (Invitrogen), and cloned. Clones were identified by colony PCR. Plasmids were isolated from positive clones and sequenced. Clone p3941 contained a 155-bp insert with high homology to the target regions of bovine and porcine submaxillary mucins.
For RT-PCR of segments within 5' sequence of Muc19, we used the Qiagen OneStep RT-PCR kit (contains Omniscript reverse transcriptase, Sensiscript reverse transcriptase, and HotStarTaq DNA polymerase). Total RNA from adult sublingual glands (1 µg, DNase digested) was reverse transcribed for 30 min at 50°C with gene-specific primers and PCR performed. PCR cycling profiles were 95°C for 15 min followed by 40 cycles (94°C, 1 min; 55°C, 1 min; 72°C, 1 min) and a final 10-min extension at 72°C. Primers are listed in Table 1. Products were run on 1% agarose gels, and banded DNA was extracted using the Rapid Gel Extraction System (Marligen Bioscience, Ijamsville, MD). DNA was then ligated into pGEM-T (Promega, Madison, WI), and DH10B cells were transformed by electroporation.
To clone Smgc transcripts from adult sublingual glands, we used the Invitrogen SuperScript One-Step RT-PCR for Long Template kit (contains SuperScript II reverse transcriptase and Platinum Taq DNA Polymerase High Fidelity). Total RNA (0.1 µg, DNase digested) was reverse transcribed for 30 min at 55°C. The reaction included the 5' primer (5'-TCTCTACACTTAGGTCCCAGATCGTC) complementary to bp 14–39 of exon 1 (GenBank accession no. AY459348) and the 3' primer (5'-AACAGACAACAGCCCTGCCTC) reverse and complement to Smgc (bp 2409–2429) within exon 18. The PCR cycling profile was 94°C for 2 min at 68°C followed by 35 cycles (94°C, 15 s; 60°C, 30 s; 68°C, 4 min) and a final 10-min extension at 72°C. Products were gel purified, ligated into pGEM-T, and cloned as described above. To compare Smgc transcripts in sublingual glands of adult and 3-day-old mice by RT-PCR, we followed the procedure described by Zinzen et al. (32) incorporating the FailSafe PCR buffer system (Epicentre, Madison, WI) and 5' primer (5'-ACAGTCTCTACACTTAGGTCCCA) complementary to bp 10–32 of exon 1 (GenBank accession no. AY459348) and the 3' primer (5'-GGATGACCAGTCACAAACACTATC) reverse and complement to Smgc (bp 2624–2647), near the end of exon 18. Major bands were excised and sequenced directly. In all cases, negative controls included reactions run without added RNA or without reverse transcriptase.
RLM-RACE.
RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) was performed using reagents supplied in the Ambion First Choice RLM-RACE kit. For 5' RLM-RACE, total sublingual RNA (10 µg) after digestion with DNase (DNA-free, Ambion) was treated with calf intestinal phosphatase followed by tobacco acid pyrophosphatase and ligation of the supplied 5' adapter. Reverse transcription was then performed with random decamers and MMLV reverse transcriptase. Resultant cDNA served as template for nested PCR (Taq DNA polymerase, from Qiagen; Robocycler, from Stratagene, La Jolla, CA). The initial PCR reaction included the 5' outer primer provided by Ambion and the 3' primer, 5'-GCACTCAGACTCGAAGGAAAAGATG (reverse and complement to Muc19, bp 256–280). The subsequent nested reaction included the 5' primer (5'-ACACTGCGTTTGCTGGCTTTG) complementary to nucleotides 19–39 of the 5' adapter and the 3' primer (5'-CTTGCCCCACATAGTTGCTTCAC) reverse and complement to Muc19 bp 207–229. Both PCR cycling profiles were 94°C for 3 min followed by 35 cycles (94°C, 1 min; 60°C, 1 min; 72°C, 1 min) and a final 7-min extension at 72°C. Products were gel purified, ligated into pGEM-T, and cloned as described above.
For 3' RLM-RACE, total sublingual RNA (10 µg, DNase treated) was reverse transcribed in the presence of the supplied 3' adapter, and the resultant cDNA served as template for PCR. The PCR reaction included the 5' primer (5'-GCAGCCCCTGCTTCAAGCTC) complementary to bp 340–359 of the tandem repeats in Muc19 and the 3' outer primer provided by Ambion (reverse and complement to the adapter). The PCR cycling profile was the same as for 5' RLM-RACE except the annealing temperature was increased to 66°C and extension time was increased to 2 min. The product was gel purified, ligated into pGEM-T, and cloned into DH10B cells as described above.
DNA sequencing of RT-PCR products and cDNA clones.
DNA sequence analysis for initial work to clone the 3' end of Muc19 in Swiss Webster mice was by the dideoxynucleotide chain-termination method as described previously (3). All other sequencing was performed at the University of Rochester Functional Genomics Center on an Applied Biosystems 3100 Genetic Analyzer using ABI Big Dye version 3.1 chemistry. Plasmid DNA samples were isolated (Qiagen Mini Prep Kits), and M13 primers were utilized. Prior to sequencing, PCR products were prepared using the QIAquick PCR Purification Kit (Qiagen).
Probes for Southern and Northern blot analyses.
All probes were generated by asymmetric PCR in 15-µl reactions that included 20 ng DNA template, 1 µM primer, 1 U Taq DNA polymerase (Qiagen Taq PCR Kit), 1.33 µM [α-32P]dNTP (3,000 Ci/mmol dGTP or 6,000 Ci/mmol dATP, PerkinElmer Life Sciences), 0.7 µM non-isotopic d(G/A)TP, and 50 µM of each of the three remaining dNTPs. The PCR cycling profile was 94°C for 5 min followed by 30 cycles (94°C, 30 s; 49°C, 2 min; 72°C, 2 min) and a final 7-min extension at 72°C. Each probe was separated from the reaction mixture using QIAquick nucleotide removal kit (Qiagen) and used at 10^6–10^7 cpm/ml of hybridization solution. Primers were synthesized (Invitrogen) and included the following: Muc19 exon 1 (5'-TTGCCAACAAAGCAGAGCACCACAG); Muc19 exon 2 (5'-CCTTCAACACCAGAAGCCAC); Muc19 tandem repeats (5'-GAGGTCTCATTAGAGGCTGC); and Smgc exon 18 (5'-CCCCAAAACCTGACTTGATTGC). Templates were as follows: Muc19 exon 1 (95 nucleotides): 5'-ATGGGCCTGAACAGTCTCTACACTTAGGTCCCAGATCGTCACCATGAAGCTGATACTTCTGTACCTGGCTGTGGTGCTCTGCTTTGTTGGCAAAG; Muc19 exon 2 region (100 nucleotides): 5'-CAGGTGCAGCACGAAGTCCCACTACGACTAGAACACCCACACCCAGTACTTCAGGTAATTTACTTTCAGGAGAATTATTTGTGGCTTCTGGTGTTGAAGG (exon sequence underlined); Muc19 tandem repeat (504 nucleotides): 5'-AATTCGGCACGAGCTCCTGCCAGTTCCACATCTGGGAGAGCTGCCACCACCACCAGCACCGCCACCACCACGACCACCACCACCACCACCGCCACCACAGTGGGTTCTGCTGGGTCCTCTGCTCCCACTGCCTCGTCCACAGCTGCTGGGAGTGGCCTGAGGGAAGCCGCTAATGCAACCTCAGCACCTGTCTCCACATCTGGACAGCCTGGCGCCTCAACGGGATCTTCTGGGACCTCGTCCAGTGTGTCATCTACTGCTGCCGCCACCACAGCAGGCACCACCACAGCAGCCTCTAATGAGACCTCAGCTCCTGCCTCCACAGCTGGGCCTACCAGCTCTGCCACCACGGCAGCCCCTGCTTCAAGCTCAGCTTCTTCGGCCACCACACTGGCTGAAACAGCTGGCAGCACCACAGGCCCAGCTGTAAGCACAACCTCTGCTGGTTCCACATCTGCAAGAGCCGCCACCACCTCCCCAGGAGGTTCATCTGGGTCCTCTTCT (primer binding site underlined); Smgc exon 18 (100 nucleotides): 5'-TAGAGTCATCTCTCATTTGAAAGTCATGATTTTTGAAATCTACTTTTAAAGCAACAATGAAAATCATATTCATAATTTGCAATCAAGTCAGGTTTTGGGG. Templates of 100 nucleotides or less were synthesized and PAGE purified by Invitrogen.
The template to Muc19 tandem repeats was isolated from an EcoRI/EarI restriction digest of the Muc19 cDNA clone pSL112 which contains the last tandem repeat and 3' sequence (1,457 nucleotides total) cloned into the EcoRI site of pBluescript SK. All resultant probes are the full-length reverse complement of template DNA except for the Muc19 tandem repeat probe (296 nucleotides).
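The relationship between each primer and its template can be checked computationally. The following Python sketch is not part of the original methods; it simply tests whether the reverse complement of each primer occurs in the corresponding template, which is what one would expect if primer extension yields an antisense probe. The primer and template sequences are copied from the text above, whereas the helper names (reverse_complement, primer_site) are illustrative assumptions. The 504-nucleotide tandem repeat template is omitted for brevity.

```python
# Illustrative check only; helper names are assumptions, sequences are from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def primer_site(primer, template):
    """1-based position in the template where the primer's reverse
    complement begins, or 0 if it is absent."""
    return template.find(reverse_complement(primer)) + 1

PROBES = {
    "Muc19 exon 1": (
        "TTGCCAACAAAGCAGAGCACCACAG",
        "ATGGGCCTGAACAGTCTCTACACTTAGGTCCCAGATCGTCACCATGAAGCTGATACTTCTG"
        "TACCTGGCTGTGGTGCTCTGCTTTGTTGGCAAAG",
    ),
    "Muc19 exon 2": (
        "CCTTCAACACCAGAAGCCAC",
        "CAGGTGCAGCACGAAGTCCCACTACGACTAGAACACCCACACCCAGTACTTCAGGTAATTT"
        "ACTTTCAGGAGAATTATTTGTGGCTTCTGGTGTTGAAGG",
    ),
    "Smgc exon 18": (
        "CCCCAAAACCTGACTTGATTGC",
        "TAGAGTCATCTCTCATTTGAAAGTCATGATTTTTGAAATCTACTTTTAAAGCAACAATGAA"
        "AATCATATTCATAATTTGCAATCAAGTCAGGTTTTGGGG",
    ),
}

for name, (primer, template) in PROBES.items():
    site = primer_site(primer, template)
    print(name, "template length:", len(template), "primer site starts at:", site)
```

A nonzero site near the 3' end of each template is consistent with probes that span most or all of the template in the antisense orientation.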
Northern analysis.
Samples of total RNA (10 µg/lane) were separated on agarose gels under formaldehyde denaturing conditions and transferred to nylon membranes (BrightStar-Plus, Ambion). Each gel also contained one or two lanes containing RNA Millennium Markers (Ambion). All other procedures were as described previously (13).
Southern analysis.
Genomic DNA was isolated from kidneys using the AquaPure Genomic DNA kit from Bio-Rad Laboratories (Hercules, CA). Samples were digested with the indicated restriction enzymes under exhaustive conditions (unless noted), separated on agarose gels, depurinated, transferred to nylon membranes (Nytran SuPerCharge; Schleicher and Schuell, Keene, NH) using a PosiBlot system (Stratagene) and cross-linked (Stratolinker, Stratagene). Prehybridizations (1 h at 42°C) and hybridizations (20 h at 42°C) were performed in UltraHyb solution (Ambion). All other procedures were as described by Sambrook and Russell (27). Autoradiography was performed using Kodak MR film (Eastman Kodak). Each agarose gel contained lanes loaded with the High Molecular Weight DNA Ladder and also the 1 Kb Plus DNA Ladder, both from Invitrogen.
Chromosome mapping of Muc19.
Genomic clone isolation and fluorescence in situ hybridization (FISH) mapping for Muc19 was conducted by Genome Systems (St. Louis, MO). Initially, a mouse genomic P1 clone (F354, insert ~24 kb) containing genomic sequence to Muc19 was identified using the insert from pSL112 as probe. Genomic DNA from F354 was labeled with digoxigenin dUTP by nick translation and hybridized to metaphase chromosomes from mouse embryo fibroblast cells. Specific labeling was detected using fluoresceinated anti-digoxigenin antibodies and subsequent counterstaining with DAPI.
Development of cDNA models and sequence analysis.
Development of a model for the 5' end of mouse Muc19 was based primarily on alignment by TBLASTN (1) of PSM protein sequence (GenBank accession no. AAC62527) to translated mouse genomic sequence within 15E3 (NT_039621.2). In addition, we used programs for exon predictions (MZEF and Xpound) available from the RUMMAGE sequence annotation server (http://gen100.imb-jena.de/rummage/index.html) (29) as well as the National Center for Biotechnology Information (NCBI) data mining tool for open reading frames (ORF finder, http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The rat Muc19 cDNA model was derived in a similar manner based on alignment of PSM protein and the conceptually translated mouse Muc19 sequence to translated rat genomic sequence within 7q35 (NW_047783.1). See the Supplemental Material (available online at the Physiological Genomics web site) for nucleotide and protein sequences of the rat Muc19 cDNA model.
Protein sequence alignments were calculated using the algorithm ClustalW (MacVector 7.2.2; Accelrys, San Diego, CA). Unless indicated, the identity protein matrix was used in alignments. Protein domains were determined by SignalP v1.2 (http://www.cbs.dtu.dk) (23) and by alignments to the Conserved Domain Database, v2.00 (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) (20). Potential splice sites as well as sites for N- and O-glycosylation were determined by the programs NetGene2, NetNGlyc v1.0, and NetOGlyc v3.1, respectively (http://www.cbs.dtu.dk) (5, 6, 14).
RESULTS
Cloning the 3' end of Muc19 and chromosome mapping.
Oligonucleotide primers were designed to highly conserved regions of the cysteine-rich motif identified previously in the 3' end of both bovine and porcine submandibular mucins (BSM and PSM, respectively) (4, 11). Subsequent RT-PCR of sublingual mRNA from Swiss Webster mice resulted in a 164-bp product that was subsequently cloned to yield p3941. The insert from p3941 was then used to screen a Swiss Webster sublingual cDNA library (lambda-ZAP II). Sequencing of 15 positive clones (0.9–2.7 kb inserts) yielded 2,636 bp of contiguous sequence encoding 3 tandem repeats of 489 bp, preceded by the last 47 bp of a repeat and followed by 1,122 bp of non-repeat sequence. All repeats are identical except for the most 3' repeat, which has a T versus a C in positions 179 and 380. Conceptual translation results in repeats of 163 residues that are rich in serine and threonine followed by 3' end cysteine-rich sequence. The cDNA sequence is available in GenBank (accession no. AY172172) and was subsequently named Muc19 (LocusLink ID 239611). The insert in p3941 represents bp 2095–2258 of AY172172.
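As a quick consistency check on the lengths reported above, the following Python sketch (an illustration, not part of the original analysis) sums the components of the contiguous 3' sequence and converts the repeat length to residues. All numbers are taken from the text; the variable names are assumptions.

```python
# Lengths as reported in the text (bp).
REPEAT_BP = 489          # one full tandem repeat
PARTIAL_REPEAT_BP = 47   # last 47 bp of the preceding repeat
FULL_REPEATS = 3         # complete repeats in the contig
NON_REPEAT_BP = 1122     # 3' non-repeat sequence

# Total contig length implied by the components.
contig_bp = PARTIAL_REPEAT_BP + FULL_REPEATS * REPEAT_BP + NON_REPEAT_BP

# A repeat that keeps the reading frame must be a multiple of 3 nt.
residues_per_repeat, leftover_nt = divmod(REPEAT_BP, 3)

print(contig_bp)                         # total bp in the contig
print(residues_per_repeat, leftover_nt)  # residues per repeat, remainder
```

The components sum to the reported 2,636 bp, and a 489-bp repeat encodes 163 residues with no remainder, so successive repeats stay in frame.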
One of the fifteen 3' end clones from Swiss Webster mice, pSL112, containing the last repeat and 3' end sequence of Muc19 cDNA (1.6 kb insert), was used as probe to identify a mouse genomic P1 clone containing Muc19 (F354, ~24 kb insert). FISH was then performed using F354 as probe. Eighty metaphase cells were analyzed with 67 exhibiting specific labeling of the distal portion of a chromosome with the size and banding pattern of chromosome 15. The specificity and location of labeling was verified in measurements of 10 samples of chromosome 15 that were double labeled with F354 and a marker specific for the centromeric region of chromosome 15. A representative experiment is shown in Fig. 1. F354 hybridization was located 80% of the distance from the heterochromatic-euchromatic boundary to the telomere, consistent with band 15E3.

Fig. 1. Metaphase fluorescence in situ hybridization (FISH) of mouse chromosomes dual labeled with a marker specific for the centromeric region of chromosome 15 (arrows) and a mouse genomic P1 clone containing Muc19 (arrowheads). Hybridization of the P1 clone is in band E3.
Cloning and sequencing the 3' end of Muc19 in NFS/N mice and comparison of sequences from four strains of mice.
Subsequent to cloning the 3' end of Muc19 from Swiss Webster mice, we found that a colony of NFS/N mice harboring a spontaneous autosomal recessive mutation (sld) display disrupted transcriptional regulation and/or mRNA stability of Muc19 (13). We therefore used NFS/N mice in all subsequent studies to develop reference sequence toward eventual elucidation of the sld mutation. As an initial step, we cloned the 3' end of Muc19 by 3' RLM-RACE incorporating primers to sequence complementary to bp 340–359 of the tandem repeats in Muc19 and to the 3' adapter supplied by the manufacturer. The resultant 1.3-kb product was cloned, and its sequence was found to be nearly identical to the 3' end sequence we derived previously from Swiss Webster mice (GenBank accession no. AY172172, from bp 1026–2635), except for four single nucleotide polymorphisms (SNPs). Further comparison of these two 3' end sequences with those from two other strains of mice (B6D2 and C57BL/6J) demonstrates a high degree of conservation in coding sequence. As shown in Table 2, each SNP is restricted to one of two nucleotides across the four strains. In two of the three codons affected, the substituted amino acid has similar polarity.
Table 2. Comparisons of four mouse strains in SNPs identified in cDNA or genomic sequence with the 3' end of mouse Muc19
Elucidation of cDNA sequence 5' of the tandem repeats.
To delineate cDNA sequence of Muc19 5' of the tandem repeats, we compared analogous protein sequence of PSM to the translated sequences of the mouse genome (TBLASTN). Significant alignments (expect <0.001) were only obtained within chromosome 15 in a region of E3 upstream of genomic sequence aligning to the 3' end sequence of Muc19. From these results we assembled a model of 30 exons and designed primers to amplify putative 5' mRNA sequence by RT-PCR using sublingual mRNA from NFS/N mice. Shown in Fig. 2 is a map of exons within the mRNA sequence of Muc19 and overlapping RT-PCR products that were either sequenced directly or cloned and sequenced. The clones represented by the solid black bar at the NH2 terminus were produced by RLM-RACE and provided sequence encoded by the first three exons of Muc19, which were not identified by homology to PSM. Also shown are the cDNA clones from Swiss Webster mice that were sequenced to elucidate the 3' end of Muc19. The resultant 5' sequence has a minimum of sixfold coverage and extends 179 nucleotides into the first tandem repeat of Muc19. At this point, tandem repeat sequences are localized to one or more exons indicated as exon 33. It should be noted that there was no evidence for splice variants within the 5' or 3' sequences of Muc19 from RT-PCR results or during the evaluation of multiple clones. Moreover, the 5' cDNA sequence from NFS/N mice is identical to that for C57BL/6J mice as predicted from the NCBI genomic database (NT_039621.2).

Fig. 2. Map of exons within mouse Muc19 mRNA identified by RT-PCR products sequenced directly (dark gray bars) or subsequently cloned and sequenced (light gray bars). The solid black bar at the NH2 terminus indicates the cloned products from RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE). Vertical bars represent exon boundaries. The 5' sequence has a minimum of 6-fold coverage and extends 179 nucleotides into the first tandem repeat of Muc19 localized to one or more exons indicated as exon 33.
Exon 1, as determined from RLM-RACE, is composed of 95 bp containing the sequence ATGGGCCTGAACAGTCTCTACACTTAGGTCCCAGATCGTCACCATGAAGCTGATACTTCTGTACCTGGCTGTGGTGCTCTGCTTTGTTGGCAAAG. Of eight clones sequenced, five identified the initial A as the transcriptional start site, whereas one clone indicated the G at position 3 and two clones indicated the C at position 7. At position 44 is the putative translational ATG start codon within a Kozak consensus sequence (underlined) (15). Within genomic sequence (NT_039621.2) a TATA box consensus sequence (TATAAAT) is located 28 bp upstream of the A at position 1.
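The position of the putative start codon can be reproduced directly from the exon 1 sequence given above. The Python sketch below is an illustration rather than the authors' procedure: it lists every ATG in exon 1 and counts how many in-frame codons follow each one before a stop codon or the end of the exon. The function names are assumptions. In the sequence as printed, an ATG also occurs at position 1, but its frame reaches a TAG within exon 1, whereas the frame beginning at position 44 remains open to the end of the exon.

```python
# Exon 1 sequence copied from the text (95 nt).
EXON1 = (
    "ATGGGCCTGAACAGTCTCTACACTTAGGTCCCAGATCGTCACCATGAAGCTGATACTTCTG"
    "TACCTGGCTGTGGTGCTCTGCTTTGTTGGCAAAG"
)
STOP_CODONS = {"TAA", "TAG", "TGA"}

def atg_positions(seq):
    """Return the 1-based positions of every ATG in the sequence."""
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]

def open_codons(seq, start):
    """Count complete in-frame codons from a 1-based start position up to
    the first stop codon or the end of the sequence.
    Returns (codon_count, stop_encountered)."""
    count = 0
    for i in range(start - 1, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count, True
        count += 1
    return count, False

for pos in atg_positions(EXON1):
    codons, stopped = open_codons(EXON1, pos)
    print("ATG at", pos, "in-frame codons:", codons, "stop within exon 1:", stopped)
```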
Evaluation of exon(s) encoding tandem repeats.
In the results described above, both 5' and 3' cDNA sequences of Muc19 contain tandem repeat sequence. From comparisons of each cDNA end sequence with genomic sequence (NT_039621.2) we assigned intron/exon boundaries for a single putative exon (exon 33) containing tandem repeats. Within genomic sequence NT_039621.2 this exon contains 11 full tandem repeats of 489 bp plus one partial repeat. In Fig. 3, the conceptually translated sequences for the three cloned tandem repeats contiguous with the 3' end of Muc19 are compared with tandem repeats within the genomic database. The 163-residue repeats display remarkable identity. One of two consistent exceptions is a valine at position 60 of the last tandem repeat in both cloned (Repeat-3) and genomic sequence (NCBI-12) that results from a T versus a C at nucleotide 179. Also, residues 17–19 of the first repeat (NCBI-1) that encode the sequence SAA are also encoded in clones from NFS/N mice (not shown). The consensus repeat sequence (see Fig. 3) contains abundant potential sites for O-glycosylation (80 sites) as well as two potential sites for N-glycosylation. Given the high degree of identity in the tandem repeats and the difficulties inherent in sequencing such repetitive elements, we did not attempt to clone and sequence additional sequence within the putative exon 33.

Fig. 3. ClustalW alignment of the conceptually translated sequences for the 3 tandem repeats cloned at the 3' end of Muc19 (repeats 1–3) with 12 tandem repeats in the current NCBI genomic sequence (NT_039621.2). Alignments were calculated using the identity protein matrix. Consensus sequence is given at the bottom of each alignment panel with potential N- (^) and O-glycosylation (*) sites indicated.
Based upon our assigned intron/exon boundaries, putative exon 33 is ~6.1 kb (not shown) in the current genomic sequence (NT_039621.2). We suspected this exon (or exons) containing tandem repeats is actually substantially larger for four reasons. First, we determined previously that Muc19 transcripts are ~24 kb (13), of which only 4.0 kb and 1.1 kb account for the 5' and 3' ends, respectively. Second, in an earlier version of the genomic sequence (assembly build 23), tandem repeat sequences were separated by nearly 14 kb of ambiguous sequence comprising two gaps of 776 bp and 13,159 bp. Because the assembly of short stretches of DNA by shotgun sequencing cannot sort out long stretches of repeating sequences, we reasoned these initial gaps indeed represent additional repeats and that non-repeat intronic sequence is likely absent. Third, we scanned the current genomic sequence (NT_039621.2) containing exon 33 for consensus splice donor and splice acceptor sites (NetGene2, http://www.cbs.dtu.dk/services/NetGene2). There are no predicted splice donors or acceptor sites within the non-tandem repeat sequence of exon 33. Within each tandem repeat are two predicted splice acceptor sites (confidence scores: 0.27 and 0.82) and one splice donor site (confidence score: 0.60). However, the two splice acceptor sites are each out of phase with the splice donor site. Conceptual splicing at each splice acceptor site results in at least one stop codon. Thus it is unlikely that exon 33 of Muc19 represents two or more contiguous exons with splice acceptor and donor sites within the tandem repeat sequence. Finally, unassigned sequence in the NCBI database (NT_061421) contains about 1.2 kb of repeat sequence. We therefore performed Southern analyses to assess the length of repeat sequence and to test whether tandem repeats are within a single exon and of high similarity. Two unique restriction sites were identified immediately preceding (BciVI) and following (EarI) tandem repeat sequences within the putative exon 33. Two sites (MboI and BlpI) unique to the consensus repeat sequence were also identified (see model in Fig. 4A). These restriction sites formed the basis of Southern analyses of mouse genomic DNA using a probe specific to the first 296 nucleotides of repeat sequence.
First, exhaustive digestion of genomic DNA with BciVI and EarI resulted in a single reactive band of ~18 kb (Fig. 4B, second lane from far right). Similarly, equivalent bands of 18 kb were obtained with DNA isolated from each of four bacterial artificial chromosomes (BACs) containing 15E3 genomic sequence (RPCI23-4I22, RPCI23-413K12, RPCI24-241I20, and RPCI24-179P16; not shown). We also digested genomic DNA with BciVI and EarI, plus 1 of 11 other restriction enzymes having diverse recognition sites not contained within the consensus repeat sequence (see Fig. 4C). As shown in Fig. 4B, the 18 kb of genomic sequence that hybridized to the tandem repeat probe was resistant to all 11 enzymes. We then treated genomic DNA with BciVI and EarI followed by graded levels of digestion with BlpI, or with MboI. Similar results were obtained for both enzymes. As shown in Fig. 4D for BlpI, partial digestion produced a ladder of DNA fragments reactive to the tandem repeat probe. In Fig. 4E is a plot of estimated fragment sizes in the DNA ladder, demonstrating they are approximate multiples of the predicted 489-bp repeat nucleotide sequence. These results are consistent with genomic sequence encoding tandem repeats that contain 1) no introns, 2) tandem repeat nucleotide sequences of high identity, and 3) ~36 repeats (~18 kb).
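The arithmetic linking fragment size to repeat number can be sketched as follows. This Python example is an idealized illustration, not the authors' regression analysis: it assumes the fragment consists almost entirely of 489-bp repeats (flanking non-repeat sequence is ignored) and that an enzyme cutting once per repeat yields partial-digest fragments spaced at whole multiples of the repeat length. The numeric inputs come from the text; the function names are assumptions.

```python
REPEAT_BP = 489        # predicted repeat length (from the text)
FRAGMENT_BP = 18_000   # approximate BciVI/EarI fragment size (from the text)

def implied_repeats(fragment_bp, repeat_bp=REPEAT_BP):
    """Number of repeats that would fill a fragment of the given size,
    ignoring any flanking non-repeat sequence."""
    return fragment_bp / repeat_bp

def partial_digest_ladder(n_rungs, repeat_bp=REPEAT_BP):
    """Idealized rung sizes (bp) for partial digestion by an enzyme that
    cuts once per repeat: successive multiples of the repeat length."""
    return [k * repeat_bp for k in range(1, n_rungs + 1)]

print(round(implied_repeats(FRAGMENT_BP), 1))  # repeats implied by ~18 kb
print(36 * REPEAT_BP)                          # bp occupied by 36 repeats
print(partial_digest_ladder(5))                # first five idealized rungs
```

Under these assumptions, an ~18-kb fragment accommodates roughly 36 to 37 repeats, and 36 repeats occupy 17,604 bp, in line with the estimate above.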

Fig. 4. Restriction analysis of the tandem repeat region in mouse Muc19. A: restriction map of exon(s) containing tandem repeats. Note that the first full tandem repeat is preceded by sequence identical to the last 81 bp of a repeat. B: Southern analyses of mouse genomic DNA (300 ng/lane) probed with the first 296 nucleotides of tandem repeat sequence, after exhaustive digestion with the restriction enzymes indicated. MseI #2 and #3 denote results from two additional preparations of genomic DNA. C: listing of the recognition sequence for each enzyme used in B. D: Southern analyses of mouse genomic DNA (1 µg/lane) probed with the first 296 nucleotides of tandem repeat sequence. DNA was initially cleaved with BciVI and EarI followed by decreasing amounts of BlpI. E: plot of estimated fragment sizes in D and resultant linear regression line.
Mouse Muc19 apoprotein, a model for rat Muc19 and comparison of Muc19 among species.
Based on the results described above, we compiled a cDNA sequence for mouse Muc19 that incorporates our 5' and 3' end sequences from NFS/N mice plus consensus tandem repeat sequence for a total of 36 repeats. Conserved in this cDNA are the variations in the first and last repeat sequences, as discussed above. The cDNA sequence is available in GenBank (accession no. AY570293). Alignment of this Muc19 cDNA (BLASTN) to genomic sequence NT_039621.2, with intron/exon boundaries assigned from the flanking splice donor and acceptor motifs, yields the genomic model given in Table 3. Muc19 cDNA is encoded by 43 exons over a genomic distance of 106.4 kb. All intron/exon boundaries conform to the GT-AG consensus (22).
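As an illustration of what conformity to the GT-AG consensus means in practice, the sketch below slices introns out of a sequence given exon coordinates and checks their terminal dinucleotides. The sequence and coordinates are toy values invented for this example, not Muc19 data, and the function names are hypothetical.

```python
def conforms_to_gt_ag(intron):
    """True if the intron begins with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

def introns_from_exons(genomic, exons):
    """Return the sequence lying between each pair of consecutive exons.

    exons: list of (start, end) pairs, 0-based, end-exclusive, ordered 5' to 3'.
    """
    return [genomic[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

# Toy data (hypothetical): three exons separated by two 12-nt introns.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC" + "GTGAGTCCCTAG" + "TTTAAA"
toy_exons = [(0, 6), (18, 24), (36, 42)]

if __name__ == "__main__":
    for intron in introns_from_exons(toy, toy_exons):
        print(intron, conforms_to_gt_ag(intron))
```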
We compared the conceptually translated cDNA of mouse Muc19 to that for PSM and for available human sequence. Because the rat sublingual gland has served as an important model to study the biochemistry of rodent Muc19 (9, 31) as well as signaling processes regulating its secretion (17), we first derived a model for rat Muc19 for further comparison. A genomic model for putative rat Muc19 cDNA is given in Table 4 and includes 48 exons over a genomic distance of 121 kb. All intron/exon boundaries conform to the GT-AG consensus. Comparison of the sizes of exons and introns as well as their boundaries between mouse (Table 3) and rat (Table 4) suggests a high degree of conservation of Muc19 between the two species. The most notable difference is the addition of five exons (36–40) within the 3' end of rat Muc19. These exons are predicted based upon alignment to the 3' end of PSM. Unlike the mouse and porcine mucins, the large exon (exon 33) encodes at least 14 imperfect repeat sequences. An alignment of these putative repeats is shown in Fig. 5. Not apparent in Table 4 is a gap of about 1 kb within the genomic sequence containing exon 33. This gap results in the truncation of repeat 7 (Fig. 5) and suggests that additional repetitive sequence may be present (see Supplemental Data). Also shown in Fig. 5 is a consensus repeat sequence of 415 residues.

Fig. 5. ClustalW alignment of 14 imperfect tandem repeat sequences in putative exon 33 of our model for rat Muc19 (Table 4). Alignments were calculated using the identity protein matrix. A gap (about 1 kb) within rat genomic sequence (NT_039621.2) results in truncation of repeat 7. Consensus sequence is given at the bottom of each alignment panel.
Fig. 6 compares the conceptually translated sequences from the 3' ends of Muc19 homologs of mouse, rat, porcine, and the recently reported human sequence (8). All four proteins contain a unique Ser/Thr-rich domain immediately following the tandem repeats. This domain contains potential O-glycosylation sites and is longer and more conserved between porcine and human sequences. Highly conserved in all three proteins are two mucin-like cysteine-rich motifs: a von Willebrand factor type C domain and a COOH-terminal cystine knot-like domain. Also conserved are two potential N-glycosylation sites as well as two domains demonstrated in PSM to be involved in protein dimerization via cystine bonds in the endoplasmic reticulum (25). In paired analyses against rat, human, and porcine sequences, the mouse 3' end protein sequence has 58%, 27%, and 25% identities and an additional 6%, 5%, and 6% similarities, respectively.

Fig. 6. ClustalW alignment of conceptually translated sequences from the 3' end of mouse Muc19 with homologs from porcine, human, and our model for rat Muc19 (GenBank accession nos. AY570293, AF005273, and AAP41817 and Supplemental Data, respectively). Alignments were calculated using the identity protein matrix. Indicated are potential N- (^) and O-glycosylation (*) sites in mouse Muc19, a von Willebrand factor type C domain (black bar), a COOH-terminal cystine knot-like domain (gray bar), and two motifs (gray boxes) involved in protein dimerization via cystine ( ) bonds in the endoplasmic reticulum (residues conserved among other gel-forming mucins are underlined). Residues of identity are dark gray; residues of similarity are light gray.
Shown in Fig. 7 are the alignments of cDNA sequences 5' of the tandem repeats for mouse Muc19, PSM, and our model for rat Muc19. The 5' cDNA sequence of human MUC19 has not yet been determined. All three conceptual proteins contain a signal peptide for exocrine secretion, three von Willebrand factor type D domains and a trypsin inhibitor-like cysteine-rich domain (TIL). A unique Ser/Thr-rich domain immediately prior to tandem repeats contains potential N- and O-glycosylation sites. Also conserved are two CGLCG domains identified in PSM (25). The more 5' cysteine is required for the formation of protein multimers in the trans-Golgi. The more 3' cysteine functions to impede multimer formation at neutral pH. The gray box in Fig. 7 denotes another motif involved in mucin apoprotein multimer formation via cystine bonds in the trans-Golgi (25). The mouse 5' protein sequence has 86% and 66% identities plus 3% and 10% similarities, compared with the rat and porcine sequences, respectively.

Fig. 7. ClustalW alignment of conceptually translated sequences 5' to the tandem repeats of mouse Muc19 to the porcine and rat homologs (GenBank accession nos. AY570293 and AF005273 and Supplemental Data, respectively). Alignments were calculated using the identity protein matrix. Indicated are potential N- (^) and O-glycosylation (*) sites in mouse Muc19, a signal peptide for exocrine secretion (wavy line), von Willebrand factor type D domains (black bars), a trypsin inhibitor-like cystine-rich domain (gray bar), a unique serine/threonine-rich domain (arrowed black line), two CGLCG domains (dark shaded rectangles), and a motif (gray box) involved in protein multimer formation via cystine bonds ( ) in the trans-Golgi (residues conserved among other gel-forming mucins are underlined). Residues of identity are dark gray; residues of similarity are light gray.
Fig. 8 shows an alignment of the tandem repeat protein sequence of mouse Muc19 with that for PSM and the consensus repeat sequence predicted for rat Muc19. Although the rat consensus repeat is predicted to be more than twofold larger than in mouse Muc19, much of the mouse sequence is conserved within six domains of similarity in the rat sequence (see Fig. 8). Based upon this alignment, the mouse repeat sequence (163 residues total) has 74 residues of identity (45%) and 44 residues of similarity (27%) with the rat consensus sequence. Also apparent in Fig. 8 is alignment of all PSM repeat sequence within domains 3, 5, and 6. Within this alignment, the 81 residues of PSM have 32% and 35% identity and an additional 26% and 24% similarity to mouse and rat sequence, respectively.
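The percentages quoted for the mouse repeat follow directly from the residue counts. The sketch below reproduces that arithmetic and shows one simple way to count identities in a pairwise alignment. The helper names and the toy aligned strings are hypothetical. Counting similarities would additionally require a substitution matrix (Blosum 30 in Fig. 8) and is not attempted here.

```python
def percent(count, total):
    """Whole-number percentage of count relative to total."""
    return round(100 * count / total)

def count_identities(aligned_a, aligned_b):
    """Number of columns with the same residue in both aligned sequences.

    Both strings must come from the same alignment (equal length);
    gap characters ('-') never count as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")

if __name__ == "__main__":
    # Counts reported in the text for the 163-residue mouse repeat vs. rat consensus.
    print(percent(74, 163), percent(44, 163))
    # Toy alignment (hypothetical sequences, for illustration only).
    print(count_identities("ST-TA", "STGTP"))
```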

Fig. 8. ClustalW alignment of the consensus tandem repeat protein sequences for mouse Muc19 (see Fig. 3), rat Muc19 (see Fig. 5), and PSM (GenBank accession no. AF005273). Alignments were calculated using the Blosum 30 protein matrix. Mouse sequence is conserved in six numbered and underlined domains of similarity with rat Muc19. Residues of identity are dark gray; residues of similarity are light gray.
The conceptually translated mouse Muc19 apoprotein containing 36 tandem repeats is composed of 7,524 residues with an estimated mass of 693.4 kDa. Table 5 lists the amino acid compositions predicted for mouse Muc19 as well as those for PSM and rat Muc19. As expected, all three proteins display characteristics of secreted and gel-forming mucins: they are rich in Ser, Thr, Ala, Gly, and Pro (range 62% to 80%) and have low levels of aromatic residues. Also apparent in Table 5 is the marked similarity between the amino acid composition predicted for rat Muc19 and that determined previously for a highly enriched preparation of rat sublingual gland mucin (9).
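Two quick calculations help put these numbers in context: the mean mass per residue implied by the reported totals, and the fraction of a sequence made up of Ser, Thr, Ala, Gly, and Pro. The sketch below is illustrative only. The function names are hypothetical, and the example peptide is a made-up string, not Muc19 sequence.

```python
from collections import Counter

MUCIN_RICH_RESIDUES = "STAGP"  # Ser, Thr, Ala, Gly, Pro (one-letter codes)

def mean_residue_mass_da(mass_kda, n_residues):
    """Average mass per residue, in daltons."""
    return mass_kda * 1000 / n_residues

def fraction_of(sequence, residues=MUCIN_RICH_RESIDUES):
    """Fraction of positions in sequence occupied by any of the given residues."""
    counts = Counter(sequence.upper())
    return sum(counts[r] for r in residues) / len(sequence)

if __name__ == "__main__":
    # Totals reported in the text for mouse Muc19.
    print(round(mean_residue_mass_da(693.4, 7524), 1))
    # Hypothetical 10-residue peptide used only to exercise the function.
    print(fraction_of("SSTTAGPPKL"))
```

The low mean residue mass (about 92 Da) is what one would expect for a protein dominated by small residues such as Gly, Ala, and Ser.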
Table 5. Comparison of amino acid compositions of conceptually translated mouse Muc19, PSM, our model for rat Muc19 and isolated RSLGM
Relationship of mouse Muc19 to Smgc.
Based upon mouse genomic sequence (NT_039621.2), exon 1 of mouse Muc19 is separated from exon 2 by over 28 kb (see Table 3). Exon 1 contains the translational start site and encodes most of the signal peptide. Interestingly, exon 1 appears to be shared by additional transcripts that encode another protein, SMGC (submandibular gland protein C). Smgc has recently been cloned (32) and is encoded by an additional 17 exons positioned between exon 1 and Muc19 exon 2 (see exon map, Fig. 9). Exons 10, 11, and 12 determined from Smgc cDNA are duplications of exons 7, 8, and 9, but are not within the current genomic sequence (32). The absence of exons 10, 11, and 12 in the mouse genomic database is likely due to limitations in sorting repetitive sequences in the assembly of contigs from short shotgun sequences (32). Given this potential for deletion of repetitive exons from the assembly of genomic sequences, we as well as others (32) hypothesized that exon 1 of Muc19 may have high identity to Smgc exon 1 and is thus not appropriately mapped. We therefore performed restriction enzyme mapping of genomic DNA within the region of Muc19 exons 1 and 2 to test 1) for fragments of the size predicted from the genomic sequence and 2) for a duplicate exon 1 localized between the last exon of Smgc (exon 18) and Muc19 exon 2. Shown in Fig. 10A is a map of the region of Muc19 exons 1 and 2, including Smgc exons 2–18 and the positions of restriction sites used in our analysis. Results of probing digested DNA with oligonucleotide probes to Muc19 exons 1 and 2 and Smgc exon 18 are shown in Fig. 10B. The three enzymes AvrII, PflFI, and SapI cut 5' to Smgc exon 18 and 3' to Muc19 exon 2, but not in between (see Fig. 10A). Genomic DNA that was cut with each of these three enzymes produced fragments of the predicted sizes. Moreover, as expected for each enzyme, fragments of equivalent size were observed when probed for either Smgc exon 18 or for Muc19 exon 2. 
Each enzyme was also used in combination with MfeI, which has two sites between Smgc exon 18 and Muc19 exon 2. Again, fragments of the predicted sizes were obtained in all cases when probed for Smgc exon 18 and for Muc19 exon 2. The restriction analysis for this region is thus consistent with the genomic sequence. Probing for exon 1 after digestion of genomic DNA under each of the same six enzymatic conditions also gave results consistent with the genomic sequence. More importantly, no evidence was obtained for a duplication of exon 1 positioned between Smgc exon 18 and Muc19 exon 2, especially after treatment of genomic DNA with each of the three restriction enzymes in combination with MfeI. It should also be noted that the PflFI fragments (±MfeI) containing exon 1 appear to be at least 23 kb, based on their similar mobilities to the AvrII fragment in lane 1. The PflFI fragments are thus 2 kb larger than predicted from the current genomic sequence (Fig. 10B). This difference is likely due, in part, to the duplication of exons 7, 8, and 9 in Smgc, as explained above. In an additional experiment, genomic DNA was cut with five different restriction enzymes (4–6 bp recognition sites) and probed for exon 1. In each case, we observed a single band that was of the predicted size based upon the genomic sequence (Fig. 10C).
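The logic of this mapping experiment can be sketched in a few lines: given the cut sites of an enzyme and the position of a probe, find the fragment that the probe should detect, then ask whether two probes fall on the same fragment. The coordinates below are hypothetical round numbers chosen for illustration, not positions from NT_039621.2, and the function names are likewise invented.

```python
from bisect import bisect_right

def fragment_containing(cut_sites, probe_pos):
    """Return (start, end) of the restriction fragment that contains probe_pos.

    Returns None when the probe lies outside the outermost cut sites, since
    the fragment end is then undefined within the mapped region.
    """
    sites = sorted(cut_sites)
    i = bisect_right(sites, probe_pos)
    if i == 0 or i == len(sites):
        return None
    return sites[i - 1], sites[i]

def same_fragment(cut_sites, pos_a, pos_b):
    """True if both probe positions fall on one and the same fragment."""
    frag_a = fragment_containing(cut_sites, pos_a)
    return frag_a is not None and frag_a == fragment_containing(cut_sites, pos_b)

# Hypothetical layout: an enzyme cutting only outside the two probed exons,
# and a second enzyme with two sites between them.
OUTER_SITES = [1000, 9000]
INNER_SITES = [5000, 6000]
PROBE_SMGC_EXON18 = 3000
PROBE_MUC19_EXON2 = 8000

if __name__ == "__main__":
    print(same_fragment(OUTER_SITES, PROBE_SMGC_EXON18, PROBE_MUC19_EXON2))
    print(same_fragment(OUTER_SITES + INNER_SITES,
                        PROBE_SMGC_EXON18, PROBE_MUC19_EXON2))
```

In this toy layout the single digest places both probes on one fragment, whereas adding the internal sites separates them. This mirrors the reasoning applied to the AvrII, PflFI, and SapI digests with and without MfeI.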

Fig. 9. Exon map of mouse Muc19/Smgc gene within genomic sequence (NT_039621.2). In sublingual glands, exon 1 is shared by transcripts encoding submandibular gland protein C (Smgc), a small splice variant of Smgc (truncated-Smgc, t-Smgc), and Muc19. Smgc is encoded by an additional 17 exons, exons 2–18. Exons 10, 11, and 12 of Smgc (line with arrows) are not within the current genomic sequence but are duplications of exons 7, 8, and 9 (see Ref. 32). t-Smgc is encoded only by exons 1, 17, and 18. Muc19 is encoded by exons 1 and 19–60.

Fig. 10. Restriction enzyme mapping of the genomic region encompassing Muc19 exons 1 and 2. A: restriction map of genomic region containing Muc19 exons 1 and 2, plus Smgc exons 2–18. Enzyme recognition sites are as follows: A, AvrII; P, PflFI; S, SapI; M, MfeI. B: Southern blots after cutting genomic DNA with enzymes, as indicated, and hybridization to 95–100 nucleotide probes specific for exon 1, Smgc exon 18, or Muc19 exon 2. Fragment sizes predicted from genomic sequence are given below each lane. C: Southern blot after cutting genomic DNA with enzymes, as indicated, and hybridization to a 95-nucleotide probe specific for exon 1.
We demonstrated previously that Muc19 expression is abundant in sublingual glands compared with submandibular glands and that expression levels are greater in sublingual glands from adults vs. 3-day-old mice (13). On the other hand, expression of Smgc transcripts in mice has only been studied in submandibular glands (32). Given that these two transcripts represent splice variants of the same gene, we compared their expression in each of the three major salivary glands of adult and 3-day-old male mice by Northern blot analysis. As expected, Muc19 transcripts are abundant in sublingual glands (Fig. 11A). In the other two major salivary glands, Muc19 expression is barely detectable, and only in 3-day-old parotid glands. Full-length Smgc transcripts (3 kb) are detected in submandibular glands of 3-day-old males but not adults (Fig. 11B), consistent with the results of Zinzen et al. (32). Interestingly, low levels of Smgc transcripts are observed in 3-day-old male sublingual glands. Smgc expression in adult submandibular glands is restricted to females and only in a small subset of granular intercalated duct cells (32). Furthermore, Zinzen et al. described the cloning of Smgc splice variants from submandibular glands that were undetectable by Northern analysis (32). We therefore further probed adult sublingual glands for Smgc transcripts by RT-PCR, using primers to exon 1 and exon 18. Two bands of 590 and 820 bp were amplified from adult RNA of both sexes (not shown). Subsequent cloning and sequencing of both RT-PCR products determined that the shorter band represents a small splice variant of Smgc encoded only by exons 1, 17, and 18 (termed truncated, or t-Smgc). The larger band only contained Muc19 sequence downstream of exon 1. Further analysis suggested this larger product resulted from mis-priming by the reverse primer within exon 8 of Muc19 (7 of 8 nucleotides at the 3' end of the primer align to this site). Clones from this larger, Muc19-derived band are thus represented in Fig. 2 and Table 1 (primers F4 and B22). We further compared expression of Smgc transcripts in sublingual glands of adult and 3-day-old mice of both sexes by RT-PCR, but incorporated the more specific primers to exons 1 and 18 described by Zinzen et al. (32). As shown in Fig. 11C, adult glands express only an 800-bp product corresponding to t-Smgc. At 3 days of age, both t-Smgc and full-length Smgc (2.6 kb band) are expressed. Faint bands between 0.8 and 2.4 kb likely represent other splice variants of Smgc, similar to those found in neonatal submandibular glands (32).

Fig. 11. Muc19 and Smgc expression in the three major salivary glands, sublingual (SLG), submandibular (SMG), and parotid (Par). A: Northern blot of RNA samples from male mice hybridized with a probe to the first 296 nucleotides of consensus Muc19 tandem repeat sequence. B: duplicate samples as in A, but hybridized with a 100 nucleotide probe complementary to the initial 3'-untranslated region of Smgc in exon 18. Both blots are of 1% agarose gels, 10 µg RNA/lane, stripped and probed for 18S ribosomal RNA (bottom). C: RT-PCR for Smgc incorporating primers to exons 1 and 18 as described by Zinzen et al. (32). In both cases, transcribed RNA contained equal proportions from male and female glands. All results are representative of two separate experiments.
DISCUSSION
PSM is one of the first large gel-forming mucins to be sequenced (11); yet its human and rodent homologs have only recently been discovered. We cloned the 3' end of mouse Muc19 (GenBank accession no. AY172172) and reported that mice harboring the autosomal recessive mutation, sld, display aberrant expression of transcripts for sublingual apomucin, Muc19 (13). Chen and coworkers (8) also cloned the 3' end of mouse Muc19 as well as human MUC19. In the current study, in situ hybridization results confirm the localization of mouse Muc19 to 15E3, as predicted in the NCBI genomic sequence. The deduced amino acid compositions of Muc19 of mouse, pig, and rat are very similar except for differences due mainly to variations in their tandem repeats. Comparisons of the deduced protein sequences of Muc19 homologs demonstrate a high degree of conservation in sequences 5' and 3' to the tandem repeats, especially within domains acknowledged as hallmarks for the large gel-forming mucins (von Willebrand factor type C and D domains and a COOH-terminal cystine knot-like domain) (12) and a TIL domain. The low number of SNPs observed between mouse strains, combined with the absence of detectable splice variants of Muc19, also indicates conservation of sequences 5' and 3' to the tandem repeats. Such conservation suggests these domains play important roles in the protective functions of mucins, although none has yet been determined. It is tempting to speculate that the TIL domain may impart protection of Muc19 mucins from constituent proteases in saliva. Also conserved are small cysteine-containing domains shown previously to function in the intracellular processing and multimerization of PSM (25). Less conserved in both structure and length are the Ser/Thr-rich non-repeat domains bordering the central tandem repeats. 
These unique domains contain multiple potential sites of O-glycosylation and are of greater length within the 3' sequence of human and porcine mucins compared with rodents. Based on our model for rat Muc19, the 3' Ser/Thr-rich non-repeat domain is larger in the rat than mouse homolog and accounts for the five additional exons in this coding region for the rat. It is unknown whether these domains are indeed glycosylated and if they function either in the intracellular processing of Muc19 mucins or in their extracellular protective roles.
Based on a combination of cloning, Southern analyses, and examination of the current genomic sequence, we provide evidence for a single exon within mouse Muc19 that contains 36 near-identical tandem repeats of 489 bp. Much of this sequence is absent in the current mouse genomic sequence and should be accounted for in subsequent updates. Both the length of the tandem repeats and the size of the repeat-containing exon of mouse Muc19 are the largest among the mouse gel-forming mucins, Muc2, Muc5ac, Muc5b, and Muc6 (12). On the other hand, the exon encoding the 243-bp identical repeats of porcine Muc19 is nearly twice as large, an estimated 33 to 34 kb (12). As more mucin genes have been elucidated, it is apparent that the Ser/Thr-rich repeat sequences are not conserved between species (12). Our comparisons between the mouse, rat, and pig repeat protein sequences are consistent with this observation. In putative rat Muc19, the repeat sequences are imperfect in both length and sequence. There are many examples of mucins with imperfect repeats (12). For example, mouse Muc6 contains repeats of 116–173 nucleotides (12). Interestingly, upon closer scrutiny of the Ser/Thr-rich repeat sequences of mouse, rat, and porcine Muc19, we find apparent conservation of subsequences. We speculate these subsequences represent conserved functional domains, possibly to help program for defined posttranslational modifications (e.g., specific oligosaccharide structures).
A surprising result of cloning the NH2 terminus of Muc19 by RLM-RACE and Southern analyses is that transcripts for Smgc and Muc19 appear to be splice variants of the same gene (Muc19/Smgc) and share exon 1. Splice variants of Smgc transcripts are readily detected by RT-PCR in sublingual and submandibular glands (32) but are undetectable by Northern blot analysis, suggesting they are present in low copy numbers and likely not translated. We find only a unique and highly truncated Smgc splice variant (t-Smgc) in adult sublingual glands of both sexes, indicating the absence of a sex-specific regulation of Smgc observed in adult submandibular glands (32). In neonatal sublingual glands, full-length Smgc transcripts are present, although at much lower levels than in submandibular glands. In rats, SMGC protein expression in sublingual glands at 5 days of age was previously shown to be barely detectable, although mucous cell granules occasionally displayed anti-SMGC immunoreactivity (2). We therefore speculate that SMGC, a protein of unknown function, is expressed during the initial exocrine cytodifferentiation of cells in sublingual glands predestined to the mucous cell phenotype.
As described in the introduction, collective evidence indicates Muc19 is the only large gel-forming mucin expressed by the major salivary glands of rodents. The faint detection of Muc19 transcripts in neonatal parotid glands is likely the result of the few scattered mucous cells found in neonatal glands (2). We did not detect Muc19 in submandibular glands by Northern blot analysis, reflecting the specific expression of Muc10 by the homogeneous seromucous acinar cells in rodent glands (10, 26). Although rarely observed, acini containing a few mucous cells reactive to anti-sublingual mucin antibody are present in rat submandibular glands (18). Such cells may explain the detection of Muc19 by the more sensitive RT-PCR technique in mouse submandibular glands by Chen et al. (8).
Given the different tissue, sex, and developmental expression patterns of Muc19 and Smgc in the major salivary glands of rodents, the gene encoding both of these transcripts must be under the control of complex regulatory mechanisms. As one possible mechanism, a cellular decision to express SMGC will involve a net interaction between splicing enhancers and silencers by trans-acting protein factors that results in retention of exons 1–18 encoding SMGC (7, 19). With inclusion of exon 18, the resultant spliced RNA will contain the polyadenylation signal of Smgc, thus leading to mRNA cleavage and subsequent polyadenylation at this site. The cleaved 3' portion of the primary transcript will be retained in the nucleus and destroyed. Alternatively, the balance of splicing enhancers and silencers may cause removal of the mutually exclusive Smgc exons 2–18. The polyadenylation signal of Smgc is therefore removed, and exons 19–60 of the Muc19/Smgc pre-mRNA (i.e., Muc19 exons 2–43) will be retained and predicted to be relatively stable and available for translation. If mRNAs were spliced to retain exon 18 in addition to exons 19–60, then the translation stop codon within exon 18 would be seen as premature (in terms of the location of the mRNA end) and induce nonsense codon (STOP codon) mediated mRNA decay (19). The simultaneous synthesis of translatable mRNAs for SMGC and Muc19 would thus be mutually exclusive and predicted not to occur in the same cell. This scenario is consistent with our hypothesis in which SMGC is expressed initially but transiently in the cytodifferentiation process of the mucous cell phenotype, followed by the expression of Muc19.
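The exon bookkeeping in this model, and the three outcomes it distinguishes, can be summarized in a short sketch. This is a deliberately simplified illustration of the scenario described above, not a predictive tool: the function names and outcome labels are hypothetical, and the only inputs are the exon numbers given in the text (gene exons 2–18 specific to Smgc, gene exons 19–60 corresponding to Muc19 exons 2–43).

```python
SMGC_LAST_EXON = 18                  # carries the Smgc stop codon and poly(A) signal
MUC19_SPECIFIC = range(19, 61)       # Muc19/Smgc gene exons 19-60

def gene_to_muc19_exon(gene_exon):
    """Convert Muc19/Smgc gene exon numbering to Muc19 transcript numbering."""
    if gene_exon == 1:
        return 1                     # exon 1 is shared by both transcripts
    if gene_exon in MUC19_SPECIFIC:
        return gene_exon - 17        # gene exon 19 -> Muc19 exon 2, 60 -> 43
    raise ValueError(f"gene exon {gene_exon} is not part of Muc19 transcripts")

def predicted_fate(exons):
    """Classify a set of retained gene exons under the simplified model."""
    kept = set(exons)
    has_smgc_end = SMGC_LAST_EXON in kept
    has_muc19 = any(e in kept for e in MUC19_SPECIFIC)
    if has_smgc_end and has_muc19:
        # Stop codon in exon 18 would lie far upstream of the mRNA end.
        return "predicted NMD target"
    if has_smgc_end:
        return "Smgc-type mRNA"
    if has_muc19:
        return "Muc19 mRNA"
    return "unclassified"

if __name__ == "__main__":
    print(predicted_fate([1, 17, 18]))                     # t-Smgc exon set
    print(predicted_fate([1] + list(MUC19_SPECIFIC)))      # Muc19 exon set
    print(predicted_fate([1, 18] + list(MUC19_SPECIFIC)))  # hybrid exon set
```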
GRANTS
This study was supported by National Institutes of Health Grant DE-14730 and by Fellowship HL-07126 to M. A. Fallon.
ACKNOWLEDGMENTS
We thank Doug Weston for excellent technical assistance. We also thank Dr. Lily Mirels and Dr. Harold Smith for valuable suggestions.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: D. J. Culp, Center for Oral Biology, 601 Elmwood Ave., Box 611, Rochester, NY 14642-8611 (E-mail: david_culp{at}urmc.rochester.edu).
1 The Supplemental Material for this article is available online at http://physiolgenomics.physiology.org/cgi/content/full/00161.2004/DC1. 
REFERENCES
1. Altschul SF and Gish W. Local alignment statistics. Methods Enzymol 266: 460–480, 1996.
2. Ball WD, Hand AR, Moreira JE, and Johnson AO. A secretory protein restricted to type I cells in neonatal rat submandibular glands. Dev Biol 129: 464–475, 1988.
3. Bekhor I, Wen Y, Shi S, Hsieh CH, Denny PA, and Denny PC. cDNA cloning, sequencing and in situ localization of a transcript specific to both sublingual demilune cells and parotid intercalated duct cells in mouse salivary glands. Arch Oral Biol 39: 1011–1022, 1994.
4. Bhargava AK, Woitach JT, Davidson EA, and Bhavanandan VP. Cloning and cDNA sequence of a bovine submaxillary gland mucin-like protein containing two distinct domains. Proc Natl Acad Sci USA 87: 6798–6802, 1990.
5. Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, and Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4: 1633–1649, 2004.
6. Brunak S, Engelbrecht J, and Knudsen S. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J Mol Biol 220: 49–65, 1991.
7. Cartegni L, Chew SL, and Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3: 285–298, 2002.
8. Chen Y, Zhao YH, Kalaslavadi TB, Hamati E, Nehrke K, Le AD, Ann DK, and Wu R. Genome-wide search and identification of a novel gel-forming mucin MUC19/Muc19 in glandular tissues. Am J Respir Cell Mol Biol 30: 155–165, 2004.
9. Culp DJ, Graham LA, Latchney LR, and Hand AR. Rat sublingual gland as a model to study glandular mucous cell secretion. Am J Physiol Cell Physiol 260: C1233–C1244, 1991.
10. Denny PC, Mirels L, and Denny PA. Mouse submandibular gland salivary apomucin contains repeated N-glycosylation sites. Glycobiology 6: 43–50, 1996.
11. Eckhardt AE, Timpte CS, DeLuca AW, and Hill RL. The complete cDNA sequence and structural polymorphism of the polypeptide chain of porcine submaxillary mucin. J Biol Chem 272: 33204–33210, 1997.
12. Escande F, Porchet N, Bernigaud A, Petitprez D, Aubert JP, and Buisine MP. The mouse secreted gel-forming mucin gene cluster. Biochim Biophys Acta 1676: 240–250, 2004.
13. Fallon MA, Latchney LR, Hand AR, Johar A, Denny PA, Georgel PT, Denny PC, and Culp DJ. The sld mutation is specific for sublingual salivary mucous cells and disrupts apomucin gene expression. Physiol Genomics 14: 95–106, 2003; doi:10.1152/physiolgenomics.00151.2002.
14. Gupta R, Birch H, Rapacki K, Brunak S, and Hansen JE. O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins. Nucleic Acids Res 27: 370–372, 1999.
15. Kozak M. An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res 15: 8125–8148, 1987.
16. Liu B, Offner GD, Nunes DP, Oppenheim FG, and Troxler RF. MUC4 is a major component of salivary mucin MG1 secreted by human submandibular gland. Biochem Biophys Res Commun 250: 757–761, 1998.
17. Luo W, Latchney LR, and Culp DJ. G-protein coupling to M1 and M3 muscarinic receptors in sublingual glands. Am J Physiol Cell Physiol 280: C884–C896, 2001.
18. Man YG, Ball WD, Culp DJ, Hand AR, and Moreira JE. Persistence of a perinatal cellular phenotype in submandibular glands of adult rat. J Histochem Cytochem 43: 1203–1215, 1995.
19. Maquat LE. Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol 5: 89–99, 2004.
20. Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, and Bryant SH. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res 31: 383–387, 2003.
21. Moniaux N, Escande F, Batra SK, Porchet N, Laine A, and Aubert JP. Alternative splicing generates a family of putative secreted and membrane-associated MUC4 mucins. Eur J Biochem 267: 4536–4544, 2000.
22. Mount SM. A catalogue of splice junction sequences. Nucleic Acids Res 10: 459–472, 1982.
23. Nielsen H, Engelbrecht J, Brunak S, and von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 10: 1–6, 1997.
24. Nielsen PA, Bennett EP, Wandall HH, Therkildsen MH, Hannibal J, and Clausen H. Identification of a major human high molecular weight salivary mucin (MG1) as tracheobronchial mucin MUC5B. Glycobiology 7: 413–419, 1997.
25. Perez-Vilar J and Hill RL. Identification of the half-cystine residues in porcine submaxillary mucin critical for multimerization through the D-domains. Roles of the CGLCG motif in the D1- and D3-domains. J Biol Chem 273: 34527–34534, 1998.
26. Pinkstaff CA. The cytology of salivary glands. Int Rev Cytol 63: 141–261, 1980.
27. Sambrook J and Russell DW. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 2001.
28. Tabak LA. In defense of the oral cavity: structure, biosynthesis, and function of salivary mucins. Annu Rev Physiol 57: 547–564, 1995.
29. Taudien S, Rump A, Platzer M, Drescher B, Schattevoy R, Gloeckner G, Dette M, Baumgart C, Weber J, Menzel U, and Rosenthal A. RUMMAGE: a high-throughput sequence annotation system. Trends Genet 16: 519–520, 2000.
30. Troxler RF, Iontcheva I, Oppenheim FG, Nunes DP, and Offner GD. Molecular characterization of a major high molecular weight mucin from human sublingual gland. Glycobiology 7: 965–973, 1997.
31. Watson GE, Latchney LR, Luo W, Hand AR, and Culp DJ. Biochemical and immunological studies and assay of rat sublingual mucins. Arch Oral Biol 42: 161–172, 1997.
32. Zinzen KM, Hand AR, Yankova M, Ball WD, and Mirels L. Molecular cloning and characterization of the neonatal rat and mouse submandibular gland protein SMGC. Gene 334: 23–33, 2004.
Copyright © 2004 by the American Physiological Society.