From the Department of Anatomy and Cardiovascular
Research Institute, University of California, San Francisco, California
94143 and § Roche Bioscience, Palo Alto, California
94304
ABSTRACT
To obtain gene regulatory sequence for the mucin
gene MUC5AC, we have isolated the MUC5AC amino
terminus cDNA and 5'-flanking region. This was possible through the
use of rapid amplification of cDNA ends-polymerase chain reaction
(RACE-PCR) in which the 5' sequence of the human gastric mucin cDNA
HGM-1 (1) was used to design the first MUC5AC-specific
primer. Primers for subsequent rounds of RACE were designed from the
5'-ends of amplified RACE products. After five rounds of RACE-PCR, we
could no longer generate upstream extensions of the cDNA and
hypothesized that we had reached the 5'-end. Primer extension and RNase
protection analysis confirmed this. Combined nucleotide sequence for
the RACE-PCR products was 3.3 kb with an open reading frame encoding
1100 amino acids. A putative translation start site was found at
nucleotide +48. This was followed by a 45 nucleotide putative signal
sequence. This amino-terminal sequence contains no tandem repeats but
is >60% similar to the amino-terminal nucleotide sequence of
MUC2. The positions of cysteine residues in this
MUC2-similar region are almost 100% conserved between the
two genes. Northern analysis showed expression of cognate RNA in the
stomach and airway but not muscle and esophagus. This pattern was the
same as that obtained using previously reported 3'-MUC5AC
sequences. We have cloned approximately 4 kb of genomic DNA upstream of
the transcription start site and have sequenced 1366 nucleotides
containing a TATA box, a CACCC box, and putative binding sites for
NF-κB and Sp1. Within 4 kb of the transcription start site are
elements mediating transcriptional up-regulation in response to
bacterial exoproducts.
INTRODUCTION
Mucin is a glycoprotein secreted from epithelial cells at many body surfaces. In the airways, mucin interacts with cilia to trap and clear pathogens and irritants. This mucociliary mechanism is impaired when mucin is produced excessively as in cystic fibrosis, chronic bronchitis, and asthma. Mucociliary impairment leads to airway mucus plugging, which promotes chronic infection, airflow obstruction, and sometimes death.
Nine mucin genes are known to be expressed in man: MUC1-4, MUC5AC, MUC5B, MUC6-8 (2-12). The mRNAs encoding two of them, MUC2 and MUC5AC, have been shown to be up-regulated in cystic fibrosis airways (13, 14)¹ and likely contribute to the airway mucus plugging characteristic of this disease. Insofar as DNA-RNA transcription is controlled by mechanisms amenable to pharmaceutical intervention, an understanding of mucin transcription may suggest ways of inhibiting mucin overproduction.
Both MUC2 and MUC5AC map to chromosome 11p15.5 and may have arisen from a common ancestral gene. The structure of MUC2 is known. Its central region, comprising >50% of the polypeptide, contains two tandem repeat sequences rich in threonine, serine, and proline (4, 17); this is flanked up- and downstream by cysteine-rich regions (17, 18). The threonine and serine residues represent O-glycosylation sites, whereas the cysteine residues are thought to mediate intermolecular interactions underlying mucus gel formation. The isolation of the amino terminus of the MUC2 cDNA by anchor PCR² provided sequence for probing a genomic library to obtain the 5'-flanking sequence (17). Using portions of this sequence in luciferase vectors, we identified DNA elements controlling the MUC2 response to the common cystic fibrosis pathogen Pseudomonas aeruginosa (14).
Much less information is available regarding MUC5AC. Understanding the transcriptional control of this gene will require isolation of the amino terminus and 5'-flanking region. To date, MUC5AC amino-terminal cDNAs have not been reported. Although PCR-based techniques can in principle extend existing cDNA fragments over long distances, the large size of the MUC5AC mRNA (10-12 kb) (8, 9) and the potential presence of a central repetitive region present obstacles to extending the existing cDNA sequences to the 5'-end.
A significant aid in this regard was provided by publication of the sequence of cDNA HGM-1 cloned from the human stomach (1). This cDNA likely derives from MUC5AC as nucleotides 1942-2281 are 99% similar to the MUC5AC clone JUL 32 (19) and nucleotides 2190-2541 are 92% similar to the 5'-end of MUC5AC clone NP3a (1). As noted by Klomp et al. (1), the ~8% discrepancy between HGM-1 and NP3a suggests that portions of HGM-1 are repeated twice in MUC5AC. HGM-1's similarity to NP3a would place one HGM-1-like sequence near the 3'-end since NP3a contains a polyadenylation signal; its ~60% similarity to the MUC2 D3-domain (1) would place another HGM-1-like sequence near the 5'-end since the MUC2 D3 domain is within 3 kb of the MUC2 transcription start site (17). Hypothesizing that HGM-1 itself is present near the MUC5AC 5'-end, we used an HGM-1 sequence as the first gene-specific primer in repetitive 5'-RACE-PCR reactions. This approach ultimately permitted amplification of a 3.3-kb upstream extension of HGM-1, which we call MUC5AC-5'-RACE product (MUC5AC-5'RP). Primer extension, RNase protection assays, and the presence of a translation start site and putative signal sequence indicate that this sequence is at the gene's 5'-end. Genomic DNA immediately upstream of MUC5AC-5'RP has the structural properties of a promoter and contains elements mediating transcriptional up-regulation in response to bacterial exoproducts. We conclude that our cloned sequences are the amino-terminal and 5'-flanking region of MUC5AC. The availability of these sequences should aid identification of the elements controlling MUC5AC overexpression in disease.
MATERIALS AND METHODS
Cell Culture-- The human lung epithelial carcinoma cell line NCIH292 was grown in RPMI 1640 medium supplemented with 10% heat-inactivated fetal calf serum (Life Technologies, Inc.). The human colon carcinoma line HM3 (4) was grown in Dulbecco's modified Eagle's medium with high glucose and 10% fetal calf serum. In some experiments, cells were exposed for 6 or 24 h to P. aeruginosa.
Bacterial Culture and Preparation of Cell-free Supernatants-- P. aeruginosa strain PAO1 was grown in M9 buffer (20) for 72 h at 37 °C (to late log phase). Cell-free supernatant was obtained by centrifugation at 10,000 rpm for 60 min at 4 °C and by filtration through a 0.22-µm filter (Corning). Supernatant was aliquoted and stored at −80 °C until used.
Exposure of Tissues and Cells to Bacterial Cell-free Filtrates-- To look at the effects of P. aeruginosa on MUC5AC steady state mRNA, incubation was as described (21). Briefly, cells were washed twice with phosphate-buffered saline at 37 °C. Samples were then incubated with bacterial supernatant or buffer (M9) diluted 1:4 with mammalian cell culture medium for 6 h. Total RNA was obtained from pelleted cells scraped from the culture dish (22). Lactate dehydrogenase release was measured (LDH 320, Sigma) to detect any cell lysis.
cDNA Synthesis and 5'-RACE-PCR-- Sources known to contain abundant MUC5AC mRNA (P. aeruginosa-exposed NCIH292 cells or human stomach) were subjected to RNA extraction (22). Total RNA (3 µg) was used to generate double-stranded cDNA using the Marathon cDNA Amplification kit (CLONTECH). The double-stranded cDNA was ligated with the Marathon cDNA adaptor and purified on a chromaspin +TE-1000 column (CLONTECH) in a total volume of 100 µl. 5'-RACE was performed using the double-stranded cDNA as template with one HGM-1 gene-specific primer (Gm1) and the adaptor primer AP1 or AP2. Additional gene-specific primers (Gm5, Gm9, Gm9G, and Gm9H) were generated based on the sequences of progressively amplified 5'-RACE products.
Northern Blot Analysis of Tissue Distribution of MUC5AC mRNA, Results Using MUC5AC-5'RP and NP3a Probes-- Total RNA was extracted from human tissues according to previously described methods (22). RNA samples (20 µg) were separated on 1.0% agarose gels containing 2.2 M formaldehyde and then transferred to a positively charged nylon membrane (Gene Screen, NEN Life Science Products). cDNA probes were labeled with [α-³²P]dCTP using a Life Technologies, Inc. random primer labeling kit. For the MUC5AC 3'-end, a cDNA fragment was amplified from tissue mRNA using primers NP3a3' and NP3a5'. The insert for the probe was gel-purified from a construct made by TA cloning the PCR product into pCRII vector (Invitrogen). For the new sequence MUC5AC-5'RP, probes were made from amplified fragments using primers TER and GM9. Labeled probe was added to 10 ml of hybridization buffer containing 50% formamide, 10% dextran sulfate, 0.2% Denhardt's, 50 mM Tris-HCl, pH 7.5, 1 M NaCl, and 0.1% sodium pyrophosphate to give a concentration of 2-5 × 10⁶ cpm/ml. Membrane hybridization and washing were performed using conditions described previously (23).
Primers-- Primers used for 5'-RACE, construction of Northern blot probes, genomic library screening and DNA walking, primer extension, and RNase protection assays are shown in Table I.
DNA Cloning and Sequencing-- After RACE-PCR, amplified fragments were purified by low-melting point agarose gel electrophoresis, cut with appropriate restriction enzymes, and cloned into pBluescript II SK(−) (Stratagene) or sequenced directly. Escherichia coli (SURE strain, Stratagene) was transformed with plasmids containing these fragments. Transformants were grown at 37 °C or 30 °C. Both sense and antisense strands were sequenced. Sequencing reactions were carried out using SequiTherm Long-Read cycle sequencing kits (Epicentre Technologies) and Thermo Sequenase fluorescent labeled primer cycle sequencing kits (Amersham Pharmacia Biotech) with the IRD41 (Li-cor) labeled primers. Sequence data were assembled with Lasergene software (DNAstar). Homology and transcription factor binding site searches were performed using MatInspector release 2.1, Transcription Element Search Software (TESS, University of Pennsylvania), and MacVector software (IBI).
Chromosome Localization of PCR-amplified DNA Fragments-- Two mouse/human hybrid cell line DNA panels were purchased from Bios. Cell line 1049 contained human chromosomes 5 and 11. Cell line 1079 contained human chromosomes 2 and 5. DNA from each cell line was used as a PCR template with RACE product primers to determine the chromosomal location of RACE products.
Primer Extension Analysis of Transcription Start Site-- When progressive 5'-RACE reactions could no longer amplify additional sequence from either the stomach tissue or airway cell cDNA templates, we performed primer extension using primer Gm9H (approximately 100 bp from the putative 5'-end of the mRNA) to confirm that we had reached the transcription start site. Primer extension was done using the Promega avian myeloblastosis virus reverse transcriptase primer extension system. Briefly, 0.1 pmol of ³²P-end-labeled primer Gm9H was incubated with 5 µl (40-50 µg) total RNA from tissue or cells and 5 µl of 2 × PE buffer at 58 °C for 20 min. After cooling to room temperature, 9 µl of a master mix containing 2 × PE buffer, 6.25 mM sodium pyrophosphate and 1 µl of avian myeloblastosis virus reverse transcriptase was added to each sample. After 30 min of incubation at 42 °C, the samples were diluted with 20 µl of loading dye, denatured by heating for 10 min at 90 °C, and run on a 6% acrylamide, 7 M urea, TBE gel, along with sequencing ladder and size markers.
RNase Protection Analysis of Transcription Start Site-- To confirm transcription start site location as determined by RACE-PCR and primer extension assays, we performed RNase protection assays. The labeled RNA probe required for this assay was generated from a PCR product designed to incorporate the T7 promoter. This PCR fragment was amplified from a 12-kb genomic clone (7"A) derived from screening a human genomic library in the Lambda FIX II vector (Stratagene) and was known from sequencing data to contain the putative exon I of MUC5AC. The library was screened with a probe generated from PCR of a 5'-RACE product with primers GM9 and GM2.6 using methods described in Ref. 23. The primers used to generate the RNA probe template from the genomic clone were RPA-T7 containing sequence from exon I and the T7 promoter and primer RPA-5' containing upstream genomic sequence (see Table I). This enabled us to generate high specific activity [³²P]UTP-labeled RNA probes using RNA polymerase. For RPA analysis of MUC5AC mRNA levels in cells exposed to P. aeruginosa, primers NP3a5' and NP3a3' were used to PCR-amplify a 294-bp fragment that was then cloned into pCRII vector (Invitrogen). To monitor amounts of RNA used in each reaction, we used p-TRI-cyclophilin or p-TRI GAPDH vectors (Ambion) to generate antisense RNA probes. For the assay, total RNA was hybridized with 5 × 10⁵ cpm of probe overnight at 42 °C. The RNA:RNA template was digested for 15 min at room temperature with 0.5 units of RNase A and 20 units of RNase T1, precipitated, and run on a 6% polyacrylamide/urea-sequencing gel with a sequencing ladder for size determination.
5'-Genomic DNA Walking-- Genomic DNA was amplified from DNA provided in the human PromoterFinder™ DNA walking kit (CLONTECH) according to instructions provided by the manufacturer. For long sequence amplifications, we used the LA PCR kit (TaKaRa) and high fidelity expand PCR kit (Boehringer Mannheim) using primers GM9H5' and adaptor primers AP1 and AP2.
Construction of a Cell Line Stably Transfected with the MUC5AC 5'-Flanking Region and Determination of Luciferase Activity After Treatment with P. aeruginosa Exoproducts-- A DNA fragment extending from −4.0 kb to +68 bp was cloned into the MluI/SmaI site of pGL3 basic vector (Promega). This construct, referred to as M4-2, was co-transfected with pcDNA3 into the epithelial cell line HM3. G418-selected colonies were pooled, expanded, and used in luciferase reporter assays. Stably transfected HM3 cells were seeded at 10⁵ cells/well in 96-well tissue culture plates (Dynatech) in Dulbecco's modified Eagle's medium with high glucose, 10% fetal bovine serum and 200 µg/ml G418 (Life Technologies, Inc.). Six days later (1 day post-confluence), cells were exposed for 24 h to P. aeruginosa supernatant diluted at 5, 25, or 50% into culture medium. Cells were washed once with phosphate-buffered saline and stored frozen at −80 °C. After thawing, cells were assayed for luciferase activity using LucLite reagent (Packard) and a TopCount luminometer (Packard).
RESULTS
RACE-PCR, cDNA Cloning and Sequence Determination of the MUC5AC Amino Terminus-- Based on the 99% sequence identity between MUC5AC clone JUL 32 (19) and HGM-1 nucleotides 1947-2278 (1), we hypothesized that HGM-1 is a part of MUC5AC. Based on >60% similarity between HGM-1 and the amino-terminal cysteine-rich domain of MUC2 (D-domain 3), we hypothesized that HGM-1 is an amino-terminal sequence. This led us to initiate 5'-RACE-PCR experiments aimed at extending HGM-1 to the MUC5AC transcription start site (Fig. 1).
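The similarity figures quoted here (99%, >60%) are percentages computed over aligned regions. As a minimal sketch of how such a figure can be computed, assume two sequences that have already been aligned to equal length, with `-` marking gaps. The function name `percent_identity` and the toy sequences are illustrative assumptions; the source does not state which alignment settings or gap conventions were used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap columns in which both sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (one convention; others count gaps as mismatches)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example with hypothetical sequences: 9 of 10 compared columns match.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
print(percent_identity(toy_a, toy_b))  # 90.0
```

Whether gap columns are skipped or counted as mismatches changes the reported percentage, so the convention should be stated alongside any quoted value.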
Northern Blot Analysis of Tissue Distribution of RNA Corresponding to Newly Cloned Sequence-- Our interpretation that the 5' extension of HGM-1 (MUC5AC-5'RP) is at the 5'-end of MUC5AC rests primarily on the 99% similarity between a portion of HGM-1 and the MUC5AC cDNA JUL 32 (19). Further confirmation of the identity between our new sequence and MUC5AC was provided by Northern blot analysis in which we observed that a probe from our new sequence showed tissue-specific hybridization identical to that obtained using a probe from the previously described MUC5AC C-terminal cDNA NP3a (8) (Fig. 4).
Chromosome Mapping-- Human chromosome 11p15 contains a mucin gene cluster currently known to include MUC5AC as well as MUC5B, MUC6 and MUC2. To obtain further supporting evidence that the newly cloned RACE-PCR sequence is part of MUC5AC, we performed chromosomal mapping experiments. As shown in Fig. 5, MUC5AC-5' primers amplified a product from mouse-human hybrid cell line 1049, but not from cell line 1079. As both cell lines contained DNA from chromosome 5 but only 1049 contained DNA from chromosome 11, the results clearly show that our RACE product MUC5AC-5'RP maps to chromosome 11. This is consistent with identification of this product as part of MUC5AC.
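The mapping argument above is a small exercise in set logic: a candidate chromosome must be present in every PCR-positive hybrid line and absent from every PCR-negative one. The sketch below encodes the chromosome content of the two cell lines as given in "Materials and Methods"; the function name `candidate_chromosomes` is an illustrative assumption.

```python
# Human chromosomes carried by each hybrid cell line (from "Materials and Methods").
PANEL = {
    "1049": {5, 11},
    "1079": {2, 5},
}


def candidate_chromosomes(panel, pcr_positive):
    """Chromosomes consistent with the PCR results.

    A candidate must be present in every PCR-positive line and
    absent from every PCR-negative line.
    """
    positives = [chroms for line, chroms in panel.items() if pcr_positive[line]]
    negatives = [chroms for line, chroms in panel.items() if not pcr_positive[line]]
    if not positives:
        return set()
    candidates = set.intersection(*positives)  # returns a new set; PANEL is not modified
    for chroms in negatives:
        candidates -= chroms
    return candidates


# Observed result: product amplified from line 1049 but not from line 1079.
observed = {"1049": True, "1079": False}
print(candidate_chromosomes(PANEL, observed))  # {11}
```

With only two lines in the panel, the result depends on the product mapping to a chromosome represented in the panel; a larger panel would be needed to exclude chromosomes absent from both lines.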
Primer Extension and RNase Protection Analysis-- MUC5AC-5'RP contains a putative translation start site and signal sequence near its 5'-end (Fig. 2), suggesting that its 5'-end is at or near the transcription start site. To investigate this, we performed primer extension and RNase protection analysis. For primer extension, we used primer GM9H, which is approximately 100 bp downstream of the 5'-end of our RACE-PCR product as estimated from agarose gels. The primer extension reaction yielded a product of 114 bp (Fig. 6A) when RNA from gastric tissue or airway cells was used as a template, supporting the view suggested by RACE-PCR that the transcription start site was approximately 100 bp upstream of primer GM9H.
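The arithmetic behind a primer extension result can be sketched as follows. The extension product includes the primer itself, so its length equals the distance from the primer's 5'-end to the first nucleotide of the transcript, inclusive. The function name and the second set of numbers are hypothetical, and the sketch assumes a contiguous template with no intron between the primer and the start site.

```python
def start_site_from_extension(primer_5p_end_pos: int, product_length: int) -> int:
    """1-based template position of the transcript's first nucleotide.

    primer_5p_end_pos: 1-based position, on the sense strand, of the base
        complementary to the primer's 5'-most nucleotide.
    product_length: length of the extension product in nucleotides,
        primer included.
    """
    if product_length < 1:
        raise ValueError("product length must be at least 1")
    return primer_5p_end_pos - product_length + 1


# In transcript numbering, a 114-nt product places the primer's 5'-end at +114.
print(start_site_from_extension(114, 114))  # 1

# Hypothetical genomic numbering: primer 5'-end at position 150, 114-nt product.
print(start_site_from_extension(150, 114))  # 37
```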
Cloning and Sequencing of DNA Upstream of the Transcription Start Site-- To obtain DNA immediately flanking the transcription start site, we performed 5'-genomic DNA walking using the gene-specific primer GM9H5' (+68/39) and two adaptor primers, AP1 and AP2 (see "Materials and Methods"). This yielded a 4-kb genomic DNA fragment (M4-2), the sequence of which is shown in Fig. 7. We have confirmed the sequence −300/+1 as well as downstream sequence through exon 1 (+1 to +120) by sequencing a subclone of genomic clone 7"A. The upstream sequence contains a TATA box at −23/−29, further supporting the view that our RACE-PCR product MUC5AC-5'RP is at the 5'-end of the mRNA and that the designated transcription start site, +1, is accurate. Present in the putative promoter region are NF-κB, Sp-1, GRE, AP-2, and CACCC box sites.
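Promoter elements are reported in coordinates relative to the transcription start site, where +1 is the first transcribed base and −1 the base immediately upstream (there is no position 0). The sketch below shows one way to scan a sequence for a motif and report hits in that numbering. The regular expression is a commonly cited TATA-box consensus used here as an assumption, and the toy sequence is hypothetical; it is not the MUC5AC sequence, and this is not the MatInspector or TESS method used in the study.

```python
import re

# Commonly cited TATA-box consensus TATA(A/T)A(A/T); an assumption for illustration.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def to_promoter_coord(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to promoter numbering (..., -2, -1, +1, +2, ...)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_motif(seq: str, tss_index: int, pattern=TATA_PATTERN):
    """Return (first, last) promoter coordinates of each non-overlapping match."""
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((to_promoter_coord(m.start(), tss_index),
                     to_promoter_coord(m.end() - 1, tss_index)))
    return hits


# Hypothetical toy sequence; the start site (+1) is the single "A" at index 40.
toy = "G" * 11 + "TATAAAA" + "G" * 22 + "A" + "C" * 9
print(find_motif(toy, tss_index=40))  # [(-29, -23)]
```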
Up-regulation of MUC5AC Transcriptional Activity by P. aeruginosa-- Availability of the upstream regulatory region permits analysis of potential abnormalities in MUC5AC transcription in disease models. We observed large inductions of MUC5AC RNA in epithelial cells exposed to P. aeruginosa or its exoproducts in cell-free supernatants (Fig. 8A). That this was controlled at the transcriptional level was indicated by 15-20-fold induction of transcriptional activity in epithelial cells stably transfected with MUC5AC-luciferase reporter constructs and exposed to P. aeruginosa (Fig. 8B). These findings indicate the presence of elements responsive to P. aeruginosa in the 4-kb DNA fragment immediately upstream of the MUC5AC transcription start site. Analysis of deletion mutants will permit precise identification of these elements and open the way to identification of cognate transcription factors.
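Fold induction in a reporter assay is the ratio of the signal in treated wells to the signal in control wells. A minimal sketch of that calculation follows; the well counts are hypothetical values chosen for illustration and are not data from this study, and the function name `fold_induction` is an assumption.

```python
from statistics import mean


def fold_induction(treated, control):
    """Ratio of mean treated signal to mean control signal."""
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return mean(treated) / control_mean


# Hypothetical luminometer counts (arbitrary units), not data from the study.
control_wells = [1000, 1100, 900]
treated_wells = [17000, 18500, 18500]
print(fold_induction(treated_wells, control_wells))  # 18.0
```

In practice, background luminescence from untransfected cells would typically be subtracted from both groups before taking the ratio.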
DISCUSSION
In this series of studies, we isolated the amino terminus and 5'-flanking region of the MUC5AC mucin gene as a first step toward understanding the dysregulation of mucin mRNA production in the airways of cystic fibrosis patients. Hypothesizing that the previously reported cDNA HGM-1 was relatively upstream in the MUC5AC sequence, we performed progressive RACE-PCR amplifications that eventually reached the transcription start site. We used a similar approach to isolate the 5'-flanking region from genomic DNA.
Evidence That HGM-1 and Its 5'-RACE Extended Product Are Part of MUC5AC-- HGM-1 is a human gastric mucin cDNA (1) containing cysteine clusters interspersed with threonine-, serine-, and proline-rich domains (1). Cysteine-rich domains are considered to be typical of mucin sequences, having been reported in many mucins including MUC2 (18, 19), MUC5AC (8), MUC5B (7), and MUC6 (10) as well as in rat (23), pig (26), cow (27), and frog (28) mucins. The cysteine-rich domains in mucins show varying degrees of similarity to the D-domains of von Willebrand factor.
The evidence that HGM-1 is part of MUC5AC essentially rests on the observation that HGM-1 nucleotides 1942-2281 are 99% similar to the MUC5AC clone JUL 32 (19) and nucleotides 2190-2541 are 92% similar to the 5'-end of MUC5AC clone NP3a (1). By the same reasoning, HGM-1's extended 5' sequence MUC5AC-5'RP is also part of MUC5AC.
Evidence That the MUC5AC RACE-PCR Product Contains the Gene's 5'-End-- That 5'-RACE-PCR yielded products with identical 5'-ends after several successive amplifications, regardless of whether stomach or airway cDNA was used as a template, first suggested that we had reached the 5'-end. The results of subsequent primer extension and RNase protection assays supported this. Further support was provided by characteristics of the DNA both upstream and downstream of the putative transcription start site: 25 bp upstream of the start site is a TATA box, and 48 bp downstream is a putative translation start codon (ATG) embedded in a Kozak consensus sequence (24) followed by a 45-bp signal sequence.
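A start codon "embedded in a Kozak consensus sequence" can be screened for programmatically. The sketch below flags ATG codons that have a purine at position −3 and a G at position +4 (numbering the A of ATG as +1), which are the two positions generally regarded as most important in the Kozak context. The function name and the toy sequence are hypothetical; the toy sequence was built to place the ATG at nucleotide 48 and is not the MUC5AC sequence.

```python
def kozak_start_sites(mrna: str):
    """1-based positions of ATG codons with a purine at -3 and a G at +4."""
    seq = mrna.upper().replace("U", "T")
    sites = []
    # i is the 0-based index of the A in ATG; both i-3 and i+3 must exist.
    for i in range(3, len(seq) - 3):
        if seq[i:i + 3] == "ATG" and seq[i - 3] in "AG" and seq[i + 3] == "G":
            sites.append(i + 1)
    return sites


# Hypothetical toy transcript: 44 filler bases, then "ACC ATG G".
toy_mrna = "C" * 44 + "ACC" + "ATGG" + "C" * 10
print(kozak_start_sites(toy_mrna))  # [48]

# A 45-nucleotide signal sequence corresponds to 45 // 3 = 15 codons.
print(45 // 3)  # 15
```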
Current Model of MUC5AC-- The overall structure of MUC5AC, as pieced together from evidence currently available, is compared with the structure of MUC2 in Fig. 9. The structure of the MUC5AC carboxyl terminus has been known since the cloning of NP3a, a cDNA isolated from a nasal polyp library. Its identification as part of MUC5AC rests on the fact that cDNAs containing part of the NP3a sequence had previously been designated as MUC5 (25) and were later designated as MUC5AC (19). The recognition that NP3a comprises the gene's 3'-end rests on its containing a polyadenylation signal and poly(A) tail. It also contains a homologue of the MUC2 D-domain 4 (8). A similar cDNA, L31, was isolated from an HT29 (colon carcinoma) cell library (9).
ACKNOWLEDGEMENTS
We thank Jian-Dong Li, M.D., Ph.D. for helpful discussion and providing the P. aeruginosa supernatants.
FOOTNOTES
* This work was supported by National Institutes of Health Public Health Service Grants HL 24136 and HL 43762 and a grant from the state of California Tobacco Research and Development Program. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) AF015521, AF016834.
¶ To whom correspondence should be addressed. Tel.: 415-476-3835; Fax: 415-476-4845; E-mail: cbas{at}itsa.ucsf.edu.
¹ Dohrman, A., Miyata, S., Gallup, M., Li, J.-D., Chapelin, C., Coste, A., Escudier, E., Nadel, J., and Basbaum, C. (1998) Biochim. Biophys. Acta, in press.
² The abbreviations used are: PCR, polymerase chain reaction; RACE, rapid amplification of cDNA ends; HGM, human gastric mucin; RPA, RNase protection assay; kb, kilobase(s); bp, base pair(s).
REFERENCES