Cloning of the Amino-terminal and 5'-Flanking Region of the Human MUC5AC Mucin Gene and Transcriptional Up-regulation by Bacterial Exoproducts*

Daizong Li‡, Marianne Gallup‡, Nancy Fan§, David E. Szymkowski§, and Carol B. Basbaum‡

From the ‡Department of Anatomy and Cardiovascular Research Institute, University of California, San Francisco, California 94143 and §Roche Bioscience, Palo Alto, California 94304

    ABSTRACT

To obtain gene regulatory sequence for the mucin gene MUC5AC, we have isolated the MUC5AC amino terminus cDNA and 5'-flanking region. This was possible through the use of rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) in which the 5' sequence of the human gastric mucin cDNA HGM-1 (1) was used to design the first MUC5AC-specific primer. Primers for subsequent rounds of RACE were designed from the 5'-ends of amplified RACE products. After five rounds of RACE-PCR, we could no longer generate upstream extensions of the cDNA and hypothesized that we had reached the 5'-end. Primer extension and RNase protection analysis confirmed this. Combined nucleotide sequence for the RACE-PCR products was 3.3 kb with an open reading frame encoding 1100 amino acids. A putative translation start site was found at nucleotide +48. This was followed by a 45 nucleotide putative signal sequence. This amino-terminal sequence contains no tandem repeats but is >60% similar to the amino-terminal nucleotide sequence of MUC2. The positions of cysteine residues in this MUC2-similar region are almost 100% conserved between the two genes. Northern analysis showed expression of cognate RNA in the stomach and airway but not muscle and esophagus. This pattern was the same as that obtained using previously reported 3'-MUC5AC sequences. We have cloned approximately 4 kb of genomic DNA upstream of the transcription start site and have sequenced 1366 nucleotides containing a TATA box, a CACCC box, and putative binding sites for NF-κB and Sp1. Within 4 kb of the transcription start site are elements mediating transcriptional up-regulation in response to bacterial exoproducts.

    INTRODUCTION

Mucin is a glycoprotein secreted from epithelial cells at many body surfaces. In the airways, mucin interacts with cilia to trap and clear pathogens and irritants. This mucociliary mechanism is impaired when mucin is produced excessively as in cystic fibrosis, chronic bronchitis, and asthma. Mucociliary impairment leads to airway mucus plugging, which promotes chronic infection, airflow obstruction, and sometimes death.

Nine mucin genes are known to be expressed in man: MUC1-4, MUC5AC, MUC5B, MUC6-8 (2-12). The mRNAs encoding two of them, MUC2 and MUC5AC, have been shown to be up-regulated in cystic fibrosis airways (13, 14)¹ and likely contribute to the airway mucus plugging characteristic of this disease. Insofar as DNA-RNA transcription is controlled by mechanisms amenable to pharmaceutical intervention, an understanding of mucin transcription may suggest ways of inhibiting mucin overproduction.

Both MUC2 and MUC5AC map to chromosome 11p15.5 and may have arisen from a common ancestral gene. The structure of MUC2 is known. Its central region, comprising >50% of the polypeptide, contains two tandem repeat sequences rich in threonine, serine, and proline (4, 17); this is flanked up- and downstream by cysteine-rich regions (17, 18). The threonine and serine residues represent O-glycosylation sites, whereas the cysteine residues are thought to mediate intermolecular interactions underlying mucus gel formation. The isolation of the amino terminus of the MUC2 cDNA by anchor PCR² provided sequence for probing a genomic library to obtain the 5'-flanking sequence (17). Using portions of this sequence in luciferase vectors, we identified DNA elements controlling the MUC2 response to the common cystic fibrosis pathogen Pseudomonas aeruginosa (14).

Much less information is available regarding MUC5AC. Understanding the transcriptional control of this gene will require isolation of the amino terminus and 5'-flanking region. To date, MUC5AC amino-terminal cDNAs have not been reported. Although PCR-based techniques can in principle extend existing cDNA fragments over long distances, the large size of the MUC5AC mRNA (10-12 kb) (8, 9), and the potential presence of a central repetitive region present obstacles to extending the existing cDNA sequences to the 5'-end.

A significant aid in this regard was provided by publication of the sequence of cDNA HGM-1 cloned from the human stomach (1). This cDNA likely derives from MUC5AC as nucleotides 1942-2281 are 99% similar to the MUC5AC clone JUL 32 (19) and nucleotides 2190-2541 are 92% similar to the 5'-end of MUC5AC clone NP3a (1). As noted by Klomp et al. (1), the ~8% discrepancy between HGM-1 and NP3a suggests that portions of HGM-1 are repeated twice in MUC5AC. HGM-1's similarity to NP3a would place one HGM-1-like sequence near the 3'-end since NP3a contains a polyadenylation signal; its ~60% similarity to the MUC2 D3-domain (1) would place another HGM-1-like sequence near the 5'-end since the MUC2 D3 domain is within 3 kb of the MUC2 transcription start site (17). Hypothesizing that HGM-1 itself is present near the MUC5AC 5'-end, we used an HGM-1 sequence as the first gene-specific primer in repetitive 5'-RACE-PCR reactions. This approach ultimately permitted amplification of a 3.3-kb upstream extension of HGM-1, which we call MUC5AC-5'-RACE product (MUC5AC-5'RP). Primer extension, RNase protection assays, and the presence of a translation start site and putative signal sequence indicate that this sequence is at the gene's 5'-end. Genomic DNA immediately upstream of MUC5AC-5'RP has the structural properties of a promoter and contains elements mediating transcriptional up-regulation in response to bacterial exoproducts. We conclude that our cloned sequences are the amino-terminal and 5'-flanking region of MUC5AC. The availability of these sequences should aid identification of the elements controlling MUC5AC overexpression in disease.

    MATERIALS AND METHODS

Cell Culture-- The human lung epithelial carcinoma cell line NCIH292 was grown in RPMI 1640 medium supplemented with 10% heat-inactivated fetal calf serum (Life Technologies, Inc.). The human colon carcinoma line HM3 (4) was grown in Dulbecco's modified Eagle's medium with high glucose and 10% fetal calf serum. In some experiments, cells were exposed for 6 or 24 h to P. aeruginosa.

Bacterial Culture and Preparation of Cell-free Supernatants-- P. aeruginosa strain PAO1 was grown in M9 buffer (20) for 72 h at 37 °C (to late log phase). Cell-free supernatant was obtained by centrifugation at 10,000 rpm for 60 min at 4 °C and by filtration through a 0.22-µm filter (Corning). Supernatant was aliquoted and stored at -80 °C until used.

Exposure of Tissues and Cells to Bacterial Cell-free Filtrates-- To look at the effects of P. aeruginosa on MUC5AC steady state mRNA, incubation was as described (21). Briefly, cells were washed twice with phosphate-buffered saline at 37 °C. Samples were then incubated with bacterial supernatant or buffer (M9) diluted 1:4 with mammalian cell culture medium for 6 h. Total RNA was obtained from pelleted cells scraped from the culture dish (22). Lactate dehydrogenase release was measured (LDH 320, Sigma) to detect any cell lysis.

cDNA Synthesis and 5'-RACE-PCR-- Sources known to contain abundant MUC5AC mRNA (P. aeruginosa-exposed NCIH292 cells or human stomach) were subjected to RNA extraction (22). Total RNA (3 µg) was used to generate double-stranded cDNA using the Marathon cDNA Amplification kit (CLONTECH). The double-stranded cDNA was ligated with the Marathon cDNA adaptor and purified on a chromaspin +TE-1000 column (CLONTECH) in a total volume of 100 µl. 5'-RACE was performed using the double-stranded cDNA as template with one HGM-1 gene-specific primer (Gm1) and the adaptor primer AP1 or AP2. Additional gene-specific primers (Gm5, Gm9, Gm9G, and Gm9H) were generated based on the sequences of progressively amplified 5'-RACE products.

Northern Blot Analysis of Tissue Distribution of MUC5AC mRNA, Results Using MUC5AC-5'RP and NP3a Probes-- Total RNA was extracted from human tissues according to previously described methods (22). RNA samples (20 µg) were separated on 1.0% agarose gels containing 2.2 M formaldehyde and then transferred to a positively charged nylon membrane (Gene Screen, NEN Life Science Products). cDNA probes were labeled with [α-32P]dCTP using a Life Technologies, Inc. random primer labeling kit. For the MUC5AC 3'-end, a cDNA fragment was amplified from tissue mRNA using primers NP3a3' and NP3a5'. The insert for the probe was gel-purified from a construct made by TA cloning the PCR product into pCRII vector (Invitrogen). For the new sequence MUC5AC-5'RP, probes were made from amplified fragments using primers TER and GM9. Labeled probe was added to 10 ml of hybridization buffer containing 50% formamide, 10% dextran sulfate, 0.2% Denhardt's, 50 mM TRIS-HCl, pH 7.5, 1 M NaCl, and 0.1% sodium pyrophosphate to give a concentration of 2-5 × 10^6 cpm/ml. Membrane hybridization and washing were performed using conditions described previously (23).

Primers-- Primers used for 5'-RACE, construction of Northern blot probes, genomic library screening and DNA walking, primer extension, and RNase protection assays are shown in Table I.

DNA Cloning and Sequencing-- After RACE-PCR, amplified fragments were purified by low-melting point agarose gel electrophoresis, cut with appropriate restriction enzymes and cloned into pBluescript II SK(-) (Stratagene) or sequenced directly. Escherichia coli (SURE strain, Stratagene) was transformed with plasmids containing these fragments. Transformants were grown at 37 °C or 30 °C. Both sense and antisense strands were sequenced. Sequencing reactions were carried out using SequiTherm Long-Read cycle sequencing kits (Epicentre Technologies) and Thermo Sequenase fluorescent labeled primer cycle sequencing kits (Amersham Pharmacia Biotech) with the IRD41 (Li-cor) labeled primers. Sequence data were assembled by Lasergene software (DNAstar). Homology and transcription factor binding site searches were performed using MatInspector release 2.1 and Transcription Element Search Software (TESS, University of Pennsylvania) and MacVector software (IBI).

Chromosome Localization of PCR-amplified DNA Fragments-- Two mouse/human hybrid cell line DNA panels were purchased from Bios. Cell line 1049 contained human chromosomes 5 and 11. Cell line 1079 contained human chromosomes 2 and 5. DNA from each cell line was used as a PCR template with RACE product primers to determine the chromosomal location of RACE products.

Primer Extension Analysis of Transcription Start Site-- When progressive 5'-RACE reactions could no longer amplify additional sequence from either the stomach tissue or airway cell cDNA templates, we performed primer extension using primer Gm9H (approximately 100 bp from the putative 5'-end of the mRNA) to confirm that we had reached the transcription start site. Primer extension was done using the Promega avian myeloblastosis virus reverse transcriptase primer extension system. Briefly, 0.1 pmol of 32P-end labeled primer Gm9H was incubated with 5 µl (40-50 µg) total RNA from tissue or cells and 5 µl of 2 × PE buffer at 58 °C for 20 min. After cooling to room temperature, 9 µl of a master mix containing 2 × PE buffer, 6.25 mM sodium pyrophosphate and 1 µl of avian myeloblastosis virus reverse transcriptase was added to each sample. After 30 min of incubation at 42 °C, the samples were diluted with 20 µl of loading dye, denatured by heating for 10 min at 90 °C, and run on a 6% acrylamide, 7 M urea, TBE gel, along with sequencing ladder and size markers.

RNase Protection Analysis of Transcription Start Site-- To confirm transcription start site location as determined by RACE-PCR and primer extension assays, we performed RNase protection assays. The labeled RNA probe required for this assay was generated from a PCR product designed to incorporate the T7 promoter. This PCR fragment was amplified from a 12-kb genomic clone (7"A) derived from screening a human genomic library in the Lambda FIX II vector (Stratagene) and was known from sequencing data to contain the putative exon I of MUC5AC. The library was screened with a probe generated from PCR of a 5'-RACE product with primers GM9 and GM2.6 using methods described in Ref. 23. The primers used to generate the RNA probe template from the genomic clone were RPA-T7 containing sequence from exon I and the T7 promoter and primer RPA-5' containing upstream genomic sequence (see Table I). This enabled us to generate high specific activity [32P]UTP-labeled RNA probes using RNA polymerase. For RPA analysis of MUC5AC mRNA levels in cells exposed to P. aeruginosa, primers NP3a5' and NP3a3' were used to PCR-amplify a 294-bp fragment that was then cloned into pCRII vector (Invitrogen). To monitor amounts of RNA used in each reaction, we used p-TRI-cyclophilin or p-TRI GAPDH vectors (Ambion) to generate antisense RNA probes. For the assay, total RNA was hybridized with 5 × 10^5 cpm of probe overnight at 42 °C. The RNA:RNA template was digested for 15 min at room temperature with 0.5 units of RNase A and 20 units of RNase T1, precipitated and run on a 6% polyacrylamide/urea-sequencing gel with a sequencing ladder for size determination.

                              
Table I
Primers used for various applications as described in the text

5'-Genomic DNA Walking-- Genomic DNA was amplified from DNA provided in the human PromoterFinder™ DNA walking kit (CLONTECH) according to instructions provided by the manufacturer. For long sequence amplifications, we used the LA PCR kit (TaKaRa) and high fidelity expand PCR kit (Boehringer Mannheim) using primers GM9H5' and adaptor primers AP1 and AP2.

Construction of a Cell Line Stably Transfected with the MUC5AC 5'-Flanking Region and Determination of Luciferase Activity After Treatment with P. aeruginosa Exoproducts-- A DNA fragment extending from -4.0 kb to +68 bp was cloned into the MluI/SmaI site of pGL3 basic vector (Promega). This construct, referred to as M4-2, was co-transfected with pcDNA3 into the epithelial cell line HM3. G418-selected colonies were pooled, expanded, and used in luciferase reporter assays. Stably transfected HM3 cells were seeded at 10^5 cells/well in 96-well tissue culture plates (Dynatech) in Dulbecco's modified Eagle's medium with high glucose, 10% fetal bovine serum and 200 µg/ml G418 (Life Technologies, Inc.). Six days later (1 day post-confluence), cells were exposed for 24 h to P. aeruginosa supernatant diluted at 5, 25, or 50% into culture medium. Cells were washed once with phosphate-buffered saline and stored frozen at -80 °C. After thawing, cells were assayed for luciferase activity using LucLite reagent (Packard) and a TopCount luminometer (Packard).

    RESULTS

RACE-PCR, cDNA Cloning and Sequence Determination of the MUC5AC Amino Terminus-- Based on the 99% sequence identity between MUC5AC clone JUL 32 (19) and HGM-1 nucleotides 1947-2278 (1), we hypothesized that HGM-1 is a part of MUC5AC. Based on >60% similarity between HGM-1 and the amino-terminal cysteine-rich domain of MUC2 (D-domain 3), we hypothesized that HGM-1 is an amino-terminal sequence. This led us to initiate 5'-RACE-PCR experiments aimed at extending HGM-1 to the MUC5AC transcription start site (Fig. 1).


Fig. 1.   RACE-PCR cloning and sequencing strategy for MUC5AC-5'RP. In panel A, the relative positions of the five primers used for RACE are shown. Their relationship to the transcription start site is indicated by the numbers above them. The hatched box represents the overlap homology between MUC5AC-5'RP and HGM-1. In panel B, the arrows represent the length and direction of individual sequencing reactions. The sizes in bp of the five 5'-RACE-PCR products are indicated above.

In the first round of 5'-RACE-PCR, we used an HGM-1-specific primer (GM1) and an adaptor primer (AP1). This yielded a 900-bp PCR fragment. Sequence data showed that this fragment was the 5' extension of human gastric mucin (HGM) (1) and was >65% similar to the MUC2 D-domain 3 just 5' to the central repeat region. Primer GM5 was designed based on the 5'-end of this fragment and was used in a second round of 5'-RACE-PCR. This generated an 1100-bp PCR fragment whose 5'-end was used to design primer GM9. When used in a third round of 5'-RACE-PCR, GM9 generated a 700-bp fragment. Primer GM9G was designed based on the 5'-end of this fragment and was used in a fourth round of RACE-PCR to generate a 600-bp fragment. Primer Gm9H, 103-bp downstream of the 5'-end of the fourth round RACE-PCR product, was used in a fifth round of RACE-PCR and generated a 110-bp product. Repeated efforts to generate larger products with primer GM9H from both gastric tissue and NCIH292 (airway) cell cDNA yielded PCR products with identical sequence that were ~100 bp in length. This suggested that GM9H was approximately 100 bp from the 5'-end of the mRNA as processed in both gastric tissue and NCIH292 cells.

The overall cDNA sequence obtained by 5'-RACE is about 3.3-kb (Fig. 2). There is an open reading frame of 3300 nucleotides, 290 of which directly overlap and are in frame with those encoding human gastric mucin. At +48 is an ATG codon embedded in a Kozak consensus sequence (24). This is a putative translation start site. Following this is a putative secretory protein signal sequence. The entire open reading frame encodes 1100 amino acids. The nucleotide sequence is approximately 65% similar to the MUC2 amino-terminal sequence (1-3500, Fig. 3A). No tandem repeat sequence is present, but there are three cysteine-rich domains (D1-D3) in which the cysteine positions correspond almost exactly to those previously described for the amino terminus of human MUC2 (Fig. 3B).
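The arithmetic linking open reading frame length to protein length, and the idea of testing an ATG for Kozak context, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the toy sequence is invented (it is not MUC5AC-5'RP sequence), the function names are hypothetical, and the Kozak test is a deliberately simplified rule (purine at -3 and G at +4 relative to the A of the ATG) rather than the criteria used in this study.

```python
# Illustrative sketch only. The toy sequence below is invented and is NOT
# MUC5AC sequence; function names and the simplified Kozak rule are assumptions.
CODON_LEN = 3


def codons_to_residues(orf_nt: int) -> int:
    """Number of amino acids encoded by orf_nt nucleotides of open reading frame."""
    if orf_nt % CODON_LEN != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // CODON_LEN


def has_strong_kozak(seq: str, atg_index: int) -> bool:
    """Simplified Kozak test: purine at -3 and G at +4 (A of ATG is +1)."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        return False
    # Both flanking positions must exist within the sequence.
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False
    return seq[atg_index - 3] in "AG" and seq[atg_index + 3] == "G"


def first_kozak_atg(seq: str):
    """Return the 0-based index of the first ATG in strong Kozak context, or None."""
    seq = seq.upper()
    pos = seq.find("ATG")
    while pos != -1:
        if has_strong_kozak(seq, pos):
            return pos
        pos = seq.find("ATG", pos + 1)
    return None


if __name__ == "__main__":
    print(codons_to_residues(3300))  # ORF length reported in the text
    print(codons_to_residues(45))    # signal sequence length reported in the text
    print(first_kozak_atg("GGCTATGCCCGCCACCATGGCGTAA"))  # invented toy sequence
```

With the lengths reported in the text, 3300 nucleotides correspond to 1100 codons and the 45-nucleotide signal sequence to 15 codons.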


Fig. 2.   Nucleotide and deduced amino acid sequence of MUC5AC-5'RP. The positions of nucleotides are indicated by the numbers on the left and amino acids on the right. The hydrophobic putative secretory protein signal sequence is underlined.


Fig. 3.   Similarity matrix of cDNA and alignment of protein for MUC5AC-5'RP and MUC2. A, the cDNAs were compared using the MacVector similarity matrix routine with a cutoff of 65% similarity. Lines indicate regions of similarity. B, alignment of amino acids was done using MacVector alignment program. Solid lines represent identity and double dots (:) indicate conservative substitutions. Conserved cysteine residues are indicated by asterisks (*) and the D-domains are indicated by bent lines.
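For readers unfamiliar with similarity (dot) matrices, the general idea of a windowed comparison with a percent-identity cutoff can be sketched as follows. This is a generic illustration, not the MacVector routine: the window size, the function names, and the toy sequences are all assumptions, and only the 65% cutoff is taken from the legend above.

```python
# Generic dot-matrix sketch. NOT the MacVector algorithm; window size,
# names, and sequences are hypothetical. Only the 0.65 cutoff comes from the text.

def window_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length windows."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def similarity_dots(seq1: str, seq2: str, window: int = 10, cutoff: float = 0.65):
    """Return (i, j) window start pairs whose identity meets the cutoff."""
    dots = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            if window_identity(seq1[i:i + window], seq2[j:j + window]) >= cutoff:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Two invented 10-nt sequences differing at one position (90% identity).
    print(similarity_dots("ACGTACGTAC", "ACGTTCGTAC"))
```

Runs of dots along a diagonal in such a matrix correspond to the lines of similarity described in panel A.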

Northern Blot Analysis of Tissue Distribution of RNA Corresponding to Newly Cloned Sequence-- Our interpretation that the 5' extension of HGM-1 (MUC5AC-5'RP) is at the 5'-end of MUC5AC rests primarily on the 99% similarity between a portion of HGM-1 and the MUC5AC cDNA JUL 32 (19). Further confirmation of the identity between our new sequence and MUC5AC was provided by Northern blot analysis in which we observed that a probe from our new sequence showed tissue-specific hybridization identical to that obtained using a probe from the previously described MUC5AC C-terminal cDNA NP3a (8) (Fig. 4).


Fig. 4.   Northern blot analysis. Total RNA samples (20 µg) were analyzed as described under "Materials and Methods" using a cDNA probe from MUC5AC-5'RP (A) and a cDNA probe from NP3a (8) (B). Lanes are as follows: 1, stomach; 2, esophagus; 3, muscle; 4, bronchus; 5, lung. An RNA ladder was used for size markers as indicated. The blot was stripped and reprobed with GAPDH to assess amount and quality of RNA loaded.

Chromosome Mapping-- Human chromosome 11p15 contains a mucin gene cluster currently known to include MUC5AC as well as MUC5B, MUC6 and MUC2. To obtain further supporting evidence that the newly cloned RACE-PCR sequence is part of MUC5AC, we performed chromosomal mapping experiments. As shown in Fig. 5, MUC5AC-5' primers amplified a product from mouse-human hybrid cell line 1049, but not from cell line 1079. As both cell lines contained DNA from chromosome 5 but only 1049 contained DNA from chromosome 11, the results clearly show that our RACE product MUC5AC-5'RP maps to chromosome 11. This is consistent with identification of this product as part of MUC5AC.
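The logic of this assignment is a simple set operation, sketched below in Python. The chromosome content of each cell line is taken from the text; the function name and data layout are hypothetical and serve only to make the reasoning explicit.

```python
# Human chromosome content of each hybrid cell line, as stated in the text.
HYBRID_PANEL = {
    "1049": {5, 11},
    "1079": {2, 5},
}


def candidate_chromosomes(panel, pcr_positive):
    """Chromosomes present in every PCR-positive line and absent from every negative line."""
    positives = [chroms for line, chroms in panel.items() if line in pcr_positive]
    negatives = [chroms for line, chroms in panel.items() if line not in pcr_positive]
    if not positives:
        return set()
    # Start from chromosomes shared by all positive lines (a new set, so the
    # panel itself is not modified), then discard any found in a negative line.
    candidates = set.intersection(*positives)
    for chroms in negatives:
        candidates -= chroms
    return candidates


if __name__ == "__main__":
    # A product was amplified from line 1049 but not from line 1079.
    print(candidate_chromosomes(HYBRID_PANEL, {"1049"}))
```

Chromosome 5 is excluded because it is present in the PCR-negative line 1079, leaving chromosome 11 as the only candidate.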


Fig. 5.   PCR for chromosomal localization. PCR products were run on a 1% agarose gel and stained with ethidium bromide. Cell line 1049 contains human chromosomes 5 and 11. Cell line 1079 contains human chromosomes 2 and 5. DNA from each cell line was used as a PCR template. DNA from cell line 1049 yields a band of the predicted size (650 bp), as indicated by the arrow.

Primer Extension and RNase Protection Analysis-- MUC5AC-5'RP contains a putative translation start site and signal sequence near its 5'-end (Fig. 2), suggesting that its 5'-end is at or near the transcription start site. To investigate this, we performed primer extension and RNase protection analysis. For primer extension, we used primer GM9H, which is approximately 100 bp downstream of the 5'-end of our RACE-PCR product as estimated from agarose gels. The primer extension reaction yielded a product of 114 bp (Fig. 6A) when RNA from gastric tissue or airway cells was used as a template, supporting the view suggested by RACE-PCR that the transcription start site was approximately 100 bp upstream of primer GM9H.


Fig. 6.   Primer extension and RNase protection assay. A, autoradiography showing a 114-bp primer extension product obtained from human stomach RNA (lane 2). Lane 1 shows a sequencing ladder, and lane 3 is a 32P-labeled φX174 HinfI marker with band sizes as indicated. B, autoradiography after electrophoresis of the protected fragment shows a major band and two minor bands. Lane 1 is a sequencing ladder, lane 2 is total RNA from NCIH292 cells exposed to P. aeruginosa, lane 3 is total RNA from gastric tissue, and lane 4 is total RNA from HM3 cells exposed to P. aeruginosa. The arrows extend from individual nucleotides in the sequence of MUC5AC-5'RP that correspond to the protected bands. The nucleotide corresponding to the primer extension (P.E.) result and the nucleotide predicted by the computer search to be the start of transcription are also indicated.

For RNase protection analysis, we used as probe a portion of the genomic clone 7"A containing the putative exon I and upstream sequence (see "Materials and Methods"). We examined a total of three RNA samples (Fig. 6B). These were taken from gastric tissue, colon carcinoma cells (HM3) and lung carcinoma cells (NCIH292). RNA from each sample protected the same three probe fragments, indicating putative start sites at 1, 6, and 8 bp upstream of the start site predicted by primer extension. The start site predicted by computer program NNPP (promoter prediction by neural network, Lawrence Berkeley National Laboratory, Human Genome Center) was at 4 bp upstream of the site indicated by primer extension. As it fell approximately in the middle of the range of possible start sites, we designated the computer-predicted start site as +1.
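The positional reasoning behind the +1 designation can be made explicit with a few lines of Python. The offsets are those stated in the paragraph above, expressed as base pairs upstream of the primer extension site; treating the primer extension site itself as offset 0 and including it in the range is an assumption made for this sketch, as are the variable and function names.

```python
# Offsets in bp upstream of the primer extension (PE) site, from the text.
PRIMER_EXTENSION_SITE = 0           # assumption: PE site used as the origin
RPA_SITES_UPSTREAM_OF_PE = [1, 6, 8]
NNPP_SITE_UPSTREAM_OF_PE = 4


def span_and_midpoint(offsets):
    """Return (lowest offset, highest offset, midpoint) of the candidate sites."""
    lo, hi = min(offsets), max(offsets)
    return lo, hi, (lo + hi) / 2


if __name__ == "__main__":
    candidates = [PRIMER_EXTENSION_SITE] + RPA_SITES_UPSTREAM_OF_PE
    lo, hi, mid = span_and_midpoint(candidates)
    print(lo, hi, mid)
    print(lo <= NNPP_SITE_UPSTREAM_OF_PE <= hi)
```

Under this assumption the experimentally indicated sites span 0 to 8 bp, and the computer-predicted site at 4 bp lies within that span at its midpoint.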

Cloning and Sequencing of DNA Upstream of the Transcription Start Site-- To obtain DNA immediately flanking the transcription start site, we performed 5'-genomic DNA walking using the gene-specific primer GM9H5' (+68/-39) and two adaptor primers, AP1 and AP2 (see "Materials and Methods"). This yielded a 4-kb genomic DNA fragment (M4-2), the sequence of which is shown in Fig. 7. We have confirmed the sequence -300/+1 as well as downstream sequence through exon 1 (+1 to +120) by sequencing a subclone of genomic clone 7"A. The upstream sequence contains a TATA box at -23/-29, further supporting the view that our RACE-PCR product MUC5AC-5'RP is at the 5'-end of the mRNA and that the designated transcription start site, +1, is accurate. Present in the putative promoter region are NF-κB, Sp-1, GRE, AP-2, and CACCC box sites.


Fig. 7.   Nucleotide sequence of genomic clone M4-2. The positions of potential transcription factor binding sites including a TATA box are underlined. The transcription start site (+1) is indicated by a bent arrow.

Up-regulation of MUC5AC Transcriptional Activity by P. aeruginosa-- Availability of the upstream regulatory region permits analysis of potential abnormalities in MUC5AC transcription in disease models. We observed large inductions of MUC5AC RNA in epithelial cells exposed to P. aeruginosa or its exoproducts in cell-free supernatants (Fig. 8A). That this was controlled at the transcriptional level was indicated by 15-20-fold induction of transcriptional activity in epithelial cells stably transfected with MUC5AC-luciferase reporter constructs and exposed to P. aeruginosa (Fig. 8B). These findings indicate the presence of elements responsive to P. aeruginosa in the 4-kb DNA fragment immediately upstream of the MUC5AC transcription start site. Analysis of deletion mutants will permit precise identification of these elements and open the way to identification of cognate transcription factors.


Fig. 8.   Up-regulation of MUC5AC promoter activity in response to bacterial exoproducts. A, RNase protection assay showing up-regulation of the endogenous MUC5AC mRNA in response to P. aeruginosa. In both the airway epithelial cell line NCIH292 and the colon epithelial cell line HM3, MUC5AC-protected mRNA fragments are more abundant in the presence than in the absence of P. aeruginosa. All cells were studied after 6 h of incubation. Lane 1, low serum medium; lane 2, PAO1 supernatant:low serum medium, 1:4; lane 3, PAO1 live bacteria. GAPDH- and cyclophilin-protected RNA fragments reflect the amount of RNA incubated per lane. B, luciferase reporter assays. The MUC5AC upstream flanking region (-4000/+68) and a luciferase reporter were cloned into pGL3 basic vector and transfected into HM3 cells. G418-selected colonies were pooled and expanded before 24 h of exposure to P. aeruginosa supernatant at the concentrations shown on the x-axis. Fold induction (mean of data from three separate bacterial colonies ± S.E.) in luciferase activity is shown. The data indicate that the 4-kb flanking region contains one or more elements responsive to P. aeruginosa exoproducts.
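How a fold induction expressed as mean ± S.E. is typically computed from reporter readings can be sketched as follows. This is a generic calculation, not the analysis performed for Fig. 8B: the luminescence values are invented placeholders, the function names are hypothetical, and the use of the sample standard deviation divided by the square root of n is an assumption about how S.E. is defined.

```python
from math import sqrt
from statistics import mean, stdev


def fold_induction(treated, control_mean):
    """Divide each treated reading by the mean untreated (control) reading."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return [t / control_mean for t in treated]


def mean_and_se(values):
    """Return (mean, standard error), with S.E. = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate S.E.")
    return mean(values), stdev(values) / sqrt(n)


if __name__ == "__main__":
    # Hypothetical luminometer counts for three independent supernatants;
    # these numbers are placeholders, not data from this study.
    treated_counts = [1500.0, 1800.0, 2100.0]
    control_mean_counts = 100.0
    folds = fold_induction(treated_counts, control_mean_counts)
    m, se = mean_and_se(folds)
    print(f"fold induction = {m:.1f} ± {se:.2f} (n = {len(folds)})")
```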

    DISCUSSION

In this series of studies, we isolated the amino terminus and 5'-flanking region of the MUC5AC mucin gene as a first step toward understanding the dysregulation of mucin mRNA production in the airways of cystic fibrosis patients. Hypothesizing that the previously reported cDNA HGM-1 was relatively upstream in the MUC5AC sequence, we performed progressive RACE-PCR amplifications that eventually reached the transcription start site. We used a similar approach to isolate the 5'-flanking region from genomic DNA.

Evidence That HGM-1 and Its 5'-RACE Extended Product Are Part of MUC5AC-- HGM-1 is a human gastric mucin cDNA (1) containing cysteine clusters interspersed with threonine-, serine-, and proline-rich domains (1). Cysteine-rich domains are considered to be typical of mucin sequences, having been reported in many mucins including MUC2 (18, 19), MUC5AC (8), MUC5B (7), and MUC6 (10) as well as in rat (23), pig (26), cow (27), and frog (28) mucins. The cysteine-rich domains in mucins show varying degrees of similarity to the D-domains of von Willebrand factor.

The evidence that HGM-1 is part of MUC5AC essentially rests on the observation that HGM-1 nucleotides 1942-2281 are 99% similar to the MUC5AC clone JUL 32 (19) and nucleotides 2190-2541 are 92% similar to the 5'-end of MUC5AC clone NP3a (1). By the same reasoning, HGM-1's extended 5' sequence MUC5AC-5'RP is also part of MUC5AC.

Evidence That the MUC5AC RACE-PCR Product Contains the Gene's 5'-End-- The observation that 5'-RACE-PCR yielded products with identical 5'-ends after several successive amplifications, regardless of whether stomach or airway cDNA was used as a template, first suggested that we had reached the 5'-end. The results of subsequent primer extension and RNase protection assays supported this. Further support was provided by characteristics of the DNA both upstream and downstream of the putative transcription start site: 25 bp upstream of the start site is a TATA box, and 48 bp downstream is a putative translation start codon (ATG) embedded in a Kozak consensus sequence (24) followed by a 45-bp signal sequence.

Current Model of MUC5AC-- The overall structure of MUC5AC, as pieced together from evidence currently available, is compared with the structure of MUC2 in Fig. 9. The structure of the MUC5AC carboxyl terminus has been known since the cloning of NP3a, a cDNA isolated from a nasal polyp library. Its identification as part of MUC5AC rests on the fact that cDNAs containing part of the NP3a sequence had previously been designated as MUC5 (25) and were later designated as MUC5AC (19). The recognition that NP3a comprises the gene's 3'-end rests on its containing a polyadenylation signal and poly(A) tail. It also contains a homologue of the MUC2 D-domain 4 (8). A similar cDNA, L31, was isolated from an HT29 (colon carcinoma) cell library (9).


Fig. 9.   Diagram comparing elements of the MUC2 structure with those known so far of the MUC5AC structure. Positions of representative MUC5AC cDNAs are shown. The four cysteine-rich D-domains are >60% similar between MUC2 and MUC5AC. The size of the MUC2 repeat region is 8.4 kb. The corresponding region for MUC5AC is estimated at ≥6 kb (see "Discussion"). Regions with vertical hatching are threonine/serine/proline-rich repeats, and regions with oblique hatching are cysteine-rich domains.

Other than the positioning of NP3a and L31 at the 3'-end, it has not been possible to assign any of the other known MUC5AC cDNAs to particular positions in the coding sequence. cDNAs containing threonine/serine/proline-rich repetitive sequences (JER 47, JER 58, Mar 2, 10, 11, and CEL 2) from a tracheobronchial library (19) and cDNA 4F from a stomach library (15) are assumed to occupy positions in a central part of the gene based on comparisons with MUC2. Our Northern blots (Fig. 4) suggest that the size of the full-length MUC5AC mRNA is 12-14 kb and possibly larger. Subtracting the amount of currently known sequence at the 5'- and 3'-ends, we estimate the size of the repeat region in MUC5AC to be at least 6 kb. The size of the repeat region in MUC2 is 8.4 kb.

Despite considerable interest in the gene 5'-end and promoter, prior to this report no 5' cDNAs had been conclusively identified. Klomp et al. (1) had noted, however, that HGM-1, a cDNA isolated from a gastric cDNA library, is >60% similar to the MUC2 D-domain 3, which is approximately 3 kb downstream of the MUC2 5' transcription start site. Taken together with evidence that HGM-1 is part of MUC5AC (see above), its similarity to this upstream region of MUC2 suggested that HGM-1 might comprise an upstream region of MUC5AC. Our 5'-RACE-PCR studies yielded a 3.3-kb 5' extension of HGM-1. Within the MUC5AC sequence reported here are homologues of MUC2 D-domains 1 and 2 and the 5'-end of D-domain 3. Upstream of D-domain 1 is a signal peptide and translation start site.

Sequence similarity between MUC2 and MUC5AC cDNAs has suggested a common ancestral origin. The extension of the 5'-end of MUC5AC reveals a domain structure that is conserved between the two genes, which lends this hypothesis further support.

The cloning work described here has provided insights not only into mucin gene structure and evolution but also into the mechanisms by which mucin is overproduced in human disease. Although MUC5AC has been recognized for some time to encode an airway mucin, it was only recently discovered that its expression is up-regulated in airway disease. The work reported here is the first to establish that this up-regulation is controlled at the transcriptional level and that key cis-acting elements, together with the trans-acting factors that bind them, operate within 4 kb upstream of the transcription start site. Availability of the newly cloned sequence will permit precise identification of transcriptional control mechanisms and will also facilitate elucidation of upstream signaling pathways.

    ACKNOWLEDGEMENTS

We thank Jian-Dong Li, M.D., Ph.D. for helpful discussion and providing the P. aeruginosa supernatants.

    FOOTNOTES

* This work was supported by National Institutes of Health Public Health Service Grants HL 24136 and HL 43762 and a grant from the state of California Tobacco Research and Development Program. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers AF015521 and AF016834.

To whom correspondence should be addressed. Tel.: 415-476-3835; Fax: 415-476-4845; E-mail: cbas{at}itsa.ucsf.edu.

1 Dohrman, A., Miyata, S., Gallup, M., Li, J.-D., Chapelin, C., Coste, A., Escudier, E., Nadel, J., and Basbaum, C. (1998) Biochim. Biophys. Acta, in press.

2 The abbreviations used are: PCR, polymerase chain reaction; RACE, rapid amplification of cDNA ends; HGM, human gastric mucin; RPA, RNase protection assay; kb, kilobase(s); bp, base pair(s).

    REFERENCES

  1. Klomp, L., van Rens, L., and Strous, G. (1995) Biochem. J. 308, 831-838
  2. Gendler, S. J., Lancaster, C. A., Taylor-Papadimitriou, J., Duhig, T., Peat, N., Burchell, J., Pemberton, L., Lalani, E. N., and Wilson, D. (1990) J. Biol. Chem. 265, 15286-15293
  3. Gendler, S., Spicer, A., Lalani, E. N., Duhig, T., Peat, N., Burchell, J., Pemberton, L., Boshell, M., and Taylor-Papadimitriou, J. (1991) Am. Rev. Respir. Dis. 144, S42-S47
  4. Gum, J. R., Byrd, J. C., Hicks, J. W., Toribara, N. W., Lamport, D. T. A., and Kim, Y. S. (1989) J. Biol. Chem. 264, 6480-6487
  5. Gum, J. R., Hicks, J. W., Swallow, D. M., Lagace, R. L., Byrd, J. C., Lamport, D. T. A., Siddiki, B., and Kim, Y. S. (1990) Biochem. Biophys. Res. Commun. 171, 407-415
  6. Porchet, N., Van Cong, N., Dufosse, J., Audie, J. P., Guyonnet-Duperat, V., Gross, M. S., Denis, C., Degand, P., Bernheim, A., and Aubert, J. P. (1991) Biochem. Biophys. Res. Commun. 175, 414-422
  7. Desseyn, J.-L., Guyonnet-Duperat, V., Porchet, N., Aubert, J.-P., and Laine, A. (1997) J. Biol. Chem. 272, 3168-3178
  8. Meerzaman, D., Charles, P., Daskal, E., Polymeropoulos, M., Martin, B., and Rose, M. (1994) J. Biol. Chem. 269, 12932-12939
  9. Lesuffleur, T., Roche, F., Hill, A., Lacasa, M., Fox, M., Swallow, D., Zweibaum, A., and Real, F. (1995) J. Biol. Chem. 270, 13665-13673
  10. Toribara, N., Ho, S. B., Gum, E., Gum, J. R., Jr., Lau, P., and Kim, Y. S. (1997) J. Biol. Chem. 272, 16398-16403
  11. Bobek, L., Tsai, H., Biesbrock, A., and Levine, M. (1993) J. Biol. Chem. 268, 20563-20569
  12. Shankar, V., Gilmore, M., Elkins, R., and Sachdev, G. (1994) Biochem. J. 300, 295-298
  13. Gendler, S., Madsen, C., Aust, M., Yankaskas, J., Jennings, J., and Kasperbauer, J. (1996) Pediatric Pulmonology 13S, 290 (abstr.)
  14. Li, J.-D., Dohrman, A., Gallup, M., Miyata, S., Gum, J., Kim, Y., Nadel, J., Prince, A., and Basbaum, C. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 967-972
  15. Ho, S., Roberton, A., Shekels, L., Lyftogt, C., Niehans, G., and Toribara, N. (1995) Gastroenterol. 109, 735-747
  16. Deleted in proof
  17. Gum, J. R., Jr., Hicks, J., Toribara, N., Siddiki, B., and Kim, Y. (1994) J. Biol. Chem. 269, 2440-2446
  18. Gum, J., Hicks, J., Toribara, N., Rothe, E., Lagace, R., and Kim, Y. (1992) J. Biol. Chem. 267, 21375-21383
  19. Guyonnet-Duperat, V., Audie, J., Debailleul, V., Laine, A., Buisine, M., Galiegue-Zouitina, S., Pigny, P., Degand, P., Aubert, J.-P., and Porchet, N. (1995) Biochem. J. 211-219
  20. Jackowski, J., Szepfalusi, Z., Wanner, D., Seybold, Z., Sielczak, M., Lauredo, I., Adams, T., Abraham, W., and Wanner, A. (1991) Am. J. Physiol. 260, L61-L67
  21. Massion, P., Inoue, H., Richman-Eisenstat, Grunberger, D., Jorens, P., Housset, B., Pittet, J.-F., Wiener-Kronish, J., and Nadel, J. (1994) J. Clin. Invest. 93, 26-32
  22. Chomczynski, P., and Sacchi, N. (1987) Anal. Biochem. 162, 156-159
  23. Ohmori, H., Dohrman, A. F., Gallup, M., Tsuda, T., Kai, H., Gum, J. R., Jr., Kim, Y., and Basbaum, C. (1994) J. Biol. Chem. 269, 17833-17840
  24. Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870
  25. Aubert, J.-P., Porchet, N., Crepin, M., Duterque-Coquillaud, M., Vergnes, G., Mazzuca, M., Debuire, B., Petiprez, D., and Degand, P. (1991) Am. J. Respir. Cell Mol. Biol. 5, 175-185
  26. Eckhardt, A. E., Timpte, C. S., Abernethy, J. L., Zhao, Y., and Hill, R. L. (1991) J. Biol. Chem. 266, 9678-9686
  27. Bhargava, A. K., Woitach, J. T., Davidson, E. A., and Bhavanandan, V. P. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 6798-6802
  28. Probst, J., Gertzen, E.-M., and Hoffmann, W. (1990) Biochemistry 29, 6240-6244


Copyright © 1998 by The American Society for Biochemistry and Molecular Biology, Inc.