Alternative Core Promoters Regulate Tissue-specific Transcription from the Autoimmune Diabetes-related ICA1 (ICA69) Gene Locus*

Robert P. Friday‡§, Susan L. Pietropaolo‡, Jennifer Profozich‡, Massimo Trucco‡§, and Massimo Pietropaolo‡||

From the ‡Division of Immunogenetics, Department of Pediatrics, Diabetes Institute, Rangos Research Center, Children's Hospital of Pittsburgh, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213 and §Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania 15261

Received for publication, October 4, 2002, and in revised form, October 25, 2002

    ABSTRACT

Islet cell autoantigen 69-kDa (ICA69), protein product of the human ICA1 gene, is one target of the immune processes defining the pathogenesis of Type 1 diabetes. We have characterized the genomic structure and functional promoters within the 5'-regulatory region of ICA1. 5'-RNA ligase-mediated rapid amplification of cDNA ends evaluation of ICA1 transcripts expressed in human islets, testis, heart, and cultured neuroblastoma cells reveals that three 5'-untranslated region exons are variably expressed from the ICA1 gene in a tissue-specific manner. Surrounding the transcription initiation sites are motifs characteristic of non-TATA, non-CAAT, GC-rich promoters, including consensus Sp1/GC boxes, an initiator element, cAMP-responsive element-binding protein (CREB) sites, and clusters of other putative transcription factor sites within a genomic CpG island. Luciferase reporter constructs demonstrate that the first two ICA1 exon promoters reciprocally stimulate luciferase expression within islet- (RIN 1046-38 cells) and brain-derived (NMB7) cells in culture; the exon A promoter exhibits greater activity in islet cells, whereas the exon B promoter more efficiently activates transcription in neuronal cells. Mutation of a CREB site within the ICA1 exon B promoter significantly enhances transcriptional activity in both cell lines. Our basic understanding of expression from the functional core promoter elements of ICA1 is an important advance that will not only add to our knowledge of the ICA69 autoantigen but will also facilitate a rational approach to discover the function of ICA69 and to identify relevant ICA1 promoter polymorphisms and their potential associations with disease.

    INTRODUCTION

Islet cell autoantigen 69 kDa (ICA69) is identified with a group of Type 1 diabetes-related islet autoantigens considered to be specific protein targets of the diabetogenic autoimmune response. By using sera from pre-diabetic individuals, Pietropaolo and co-workers (1) first identified ICA69 through immunoscreening of a human islet cDNA expression library. The 1785-bp nucleotide sequence of the full-length clone and its deduced 483-amino acid protein coding region demonstrated no overt homology to known molecules at the time of its discovery, and nucleic acid and protein analyses revealed that the molecule is primarily expressed in neuroendocrine tissues (1, 2). More recent subcellular fractionation studies of murine brain tissue have shown that the majority of ICA69 protein is cytosolic and soluble, although a subfraction appears to be membrane-bound and associates with synaptic-like microvesicles (3). Although of unknown significance, smaller isoforms of the protein may be expressed from at least three human transcript variants (1, 4-6), representing truncated cDNAs that arise from alternative splicing of coding region exons (4).

ICA69 is encoded on human chromosome 7p22 by the ICA1 gene (7), which is composed of 14 coding exons and three 5'-untranslated region (UTR) sequences, each of which splices with exon 1 in a mutually exclusive manner (4). In addition, multiple cDNA coding region splice variants from human, mouse, and rat have been identified within islet and brain bacteriophage λ cDNA libraries by immunoscreening with human serum or by DNA probe hybridization (1, 4-6). Intron-exon boundaries were established for the human and murine ICA1 genes using a combination of λ phage genomic DNA library screening and PCR experiments (4). Collectively, these data argue for a high level of evolutionary conservation of the ICA1 gene, not only upon comparison of the human protein to the rat (6) and mouse (5) homologues but also in terms of exon/intron partitioning (4).

Recently a protein from the nematode Caenorhabditis elegans, termed ric-19, was reported to exhibit amino acid homology with ICA69 (3). Based on functional studies of ric-19 in C. elegans, these authors have proposed that ICA69/ric-19 participates in the process of neuroendocrine secretion through an association with secretory vesicles. These data suggest that ICA69 may be involved in the insulin secretory pathway in islet β cells, as the molecule is known to be specifically expressed within islets (8) and by insulin-producing cell lines maintained in culture (2). However, the true cellular function of ICA69 and its importance to normal mammalian pancreatic islet physiology remain unknown.

Most basic and clinical research investigations concerning ICA69 have focused on the importance of the molecule as an autoimmune target in Type 1 diabetes. Three lines of evidence from at least four independent groups substantiate a role for ICA69 autoimmunity in diabetes. 1) First degree relatives of diabetic patients who developed the disease during follow-up have detectable serum levels of ICA69 autoantibodies (1, 9). 2) T-cells isolated from newly diagnosed diabetic patients and from non-obese diabetic (NOD) mice demonstrate reactivity against the recombinant ICA69 molecule (10-12). 3) T-cells specific for the ICA69 peptide Tep-69 play a driving role in the acceleration of islet cell destruction in the NOD mouse model of Type 1 diabetes (13), whereas intraperitoneal injection of Tep-69 is associated with apparent immune toleration and decreased diabetes incidence in NOD mice (14). It has been reported that a majority of patients with recent onset Type 1 diabetes shows evidence of autoreactive T-cells and/or autoantibodies with immune specificity for the ICA69 molecule (11), but it must also be acknowledged that some investigators have questioned the significance of ICA69 autoantibodies based on their own studies (15).

Motivated by an interest in understanding how autoantigen expression in key body tissues relates to autoimmunity and as a prerequisite to searching for functional polymorphisms in the promoter region of the gene encoding ICA69, we have defined the basic structure and functional characteristics of the ICA1 promoter. Sequences adjacent to the multiple ICA1 transcription initiation sites contain motifs typical of a non-TATA, non-CAAT, GC-rich regulatory region, including consensus Sp1/GC box sites, Inr (initiator) elements, and CREB sites. The major alternative transcription initiation sites associate with independent 5'-UTR exons, and a detailed analysis of ICA1 transcripts from different tissues provides evidence for a tissue-specific utilization of the distinct initiation sites consistent with the observed 5'-UTR heterogeneity of mature protein-coding ICA69 transcripts. In vitro luciferase reporter gene assays of promoter function correlate with the observed preferential transcription initiation site usage within different tissues, whereas site-directed mutagenesis of promoter reporter constructs demonstrates the importance of an Sp1/GC box site and a CREB site to the regulation of expression from exons A and B, respectively. The significance and potential implications of the ICA1 promoter structure and function are discussed in the context of understanding ICA69 biology and its role as a Type 1 diabetes autoantigen.

    EXPERIMENTAL PROCEDURES

Cell Lines-- Two adherent cell lines were maintained in culture in order to provide RNA for transcript analysis and for testing promoter activity of cloned ICA1 5'-flanking sequences in a firefly luciferase reporter assay. The human neuroblastoma cell line NMB7 was grown in RPMI containing 10% fetal bovine serum, supplemented with L-glutamine and penicillin/streptomycin. Rat insulinoma (RIN 1046-38) cells were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, L-glutamine, and penicillin/streptomycin. Cells were incubated at 37 °C in a 5% CO2 atmosphere. Passage of cells was conducted at 70-80% confluence as necessary. All cell culture reagents were obtained from Invitrogen and lot-certified in-house. NMB7 cells were provided by Dr. Ira Bergman and Judi Griffin (Children's Hospital of Pittsburgh, Pittsburgh).

Nucleic Acid Purification-- Human total genomic DNA was purified from whole blood using the Qiagen Genomic DNA Extraction Kit (Qiagen, Inc., Valencia, CA). Briefly, 10 ml of heparinized blood was successively subjected to cellular lysis, nuclear isolation, nuclear lysis, and anion-exchange chromatography using the buffers and prepared columns supplied with the Qiagen kit. Typical yields ranged from 150 to 350 µg of total genomic DNA per 10 ml of whole blood.

Bacterial plasmid DNA was isolated by one of two methods, depending upon the quantity and concentration desired. For DNA sequencing and restriction enzyme analysis of subcloned DNA, the QIAprep Spin Miniprep Kit (Qiagen) was used to isolate 10-20 µg of plasmid DNA from 3 to 5 ml of an overnight bacterial culture. Alternatively, plasmid DNA used for transfection of cultured cells was prepared from 100 to 200 ml of overnight bacterial culture using the Qiagen Maxiprep DNA Isolation Kit. Typical DNA yields ranged from 150 to 450 µg.

For BAC clone DNA, a Qiagen Maxiprep protocol modified for use with BACs was followed (protocol available from manufacturer). Major modifications to the basic protocol included the use of a larger culture volume (500 ml) and elution of BAC clone DNA from the column with buffer warmed to 50 °C. Yields of BAC clone DNA ranged from 100 to 150 µg.

YAC clone DNA was co-purified along with yeast genomic DNA according to a standard protocol (16) with some modifications. Briefly, following 2000 × g centrifugation of 100 ml of fresh yeast cell culture at room temperature for 10 min, the cell pellet was resuspended in SCE buffer (0.9 M sorbitol, 0.1 M sodium citrate, 0.06 M EDTA) with freshly added 0.3 M β-mercaptoethanol and lyticase enzyme (Sigma). Formation of yeast spheroplasts was allowed to proceed for 1-2 h at 37 °C with gentle shaking. Spheroplasted cells were then pelleted by centrifugation at 1000 × g for 10 min. Yeast cell lysis was achieved by suspension of spheroplasts in lysis buffer (0.5 M Tris-Cl, pH 8.0, 3% N-lauroylsarcosine, 0.2 M EDTA, 1 mg/ml proteinase K) and incubation for 30-45 min at 65 °C in a water bath. Overnight treatment of cell lysate with RNase PLUS (5 Prime → 3 Prime, Inc., Boulder, CO) at 37 °C adequately removed contaminating yeast RNA from the sample. Isolation of the DNA fraction was achieved through two successive organic extractions with an equal volume of 50:50 phenol/chloroform, followed by 3-4 extractions of the aqueous phase with chloroform only. YAC and yeast DNA were co-precipitated from the aqueous phase with 0.1 volumes of 3 M NaCl and 2.5 volumes of ice-cold 100% ethanol. After gentle spooling, the precipitated DNA was washed in 70% ethanol and air-dried. Purified DNA was resuspended in TE buffer, pH 8.0, with a typical preparation yielding 300-700 mg of nucleic acid as measured by A260.

PCR Amplification-- Due to the nature of the sequences being amplified, the PCR technology used in an experiment was adapted to each DNA target and template. Reaction components were used in amounts and concentrations as recommended by the manufacturer unless otherwise noted. Suggested annealing temperatures for each PCR kit were adjusted based on the sequence identity of the amplification primers and the sequence composition of the amplification target with the assistance of Oligo 4.0 (Molecular Biology Insights, Inc., Cascade, CO) primer design software. Reactions were cycled 30-35 times unless otherwise specified.

For PCR amplification of simple target DNAs (i.e. <5 kb with moderate to low GC content) AmpliTaq DNA polymerase enzyme and buffers (PerkinElmer Life Sciences/Applied Biosystems) were used. GC-rich regions of ICA1 sequence from human genomic, YAC, and BAC DNA samples were amplified using reagents from the Advantage GC Genomic PCR kit (Clontech, Palo Alto, CA), whereas the Advantage GC cDNA Enzyme (Clontech) facilitated amplification of GC-rich plasmid inserts and cDNA templates generated for 5'-RACE analysis. In cases of long PCR (>5 kb), or when other PCR methods failed, the eLONGase long PCR enzyme mix (Invitrogen) was employed. Oligonucleotides were synthesized in the DNA Sequencing and Synthesis Core Facilities of the Diabetes Institute, Children's Hospital of Pittsburgh.

YAC and BAC Library Screening-- PCR primers designed from the ICA1 exon 2 intron-exon boundary sequences were used to screen the Centre d'Etude du Polymorphisme Humain (CEPH) Mega-YAC Human DNA Library by systematic amplification of YAC clone DNA pools (primer sequences: 5'-CCTGGGACTTACAGGATCGA-3' and 5'-GACAGCAATAAAGAGCTCAC-3', annealing temperature 55 °C, 178-bp PCR product). The California Institute of Technology BAC Library (CITB Release IV, Research Genetics, Inc., Pasadena, CA) was similarly screened using a PCR approach. The PCR amplimer used for BAC library screening was a microsatellite (CA repeat) centered 1830 bp upstream of the ICA1 translation initiation codon (primer sequences: 5'-TATGAAACAGTGTTATTCTGGACCT-3' and 5'-GTACAGTATAGTAGTGCTAACA-3', annealing temperature 55 °C, 540-bp PCR product). Stab vials or frozen aliquots of each PCR-positive YAC and BAC library clone identified through screening were obtained, and purified DNA extracted from their respective cultures was retested under PCR conditions similar to those used for library screening to verify that the target ICA1 sequences were present.

Subcloning of PCR Products-- If necessary, amplified products from GenomeWalker-PCR, RT-PCR, and 5'-RACE experiments were gel-purified using the Qiagen Gel Extraction Kit (Qiagen) before subcloning, or they were subcloned by direct ligation of an aliquot of the PCR. The gel-purified or neat PCRs were subcloned into the pCR 2.1 vector (Original TA Cloning Kit, Invitrogen). When PCR primers were designed to include restriction sites, the PCR products were digested with the appropriate restriction enzyme(s) and ligated into an overhang-compatible aliquot of the pGL3 basic luciferase reporter vector. All ligation reactions were transformed into competent Escherichia coli (Invitrogen) and plated on 100-mm LB agar plates containing 50 µg/ml ampicillin or kanamycin and 50 µg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) for blue-white color selection of transformants (if applicable).

In situations where T/A PCR product ligation proved to be inefficient because of 3'→5'-exonuclease activity from the PCR enzyme or enzyme mix used for amplification, the PCR product(s) were "tailed" with dA-overhangs prior to ligation. Briefly, 1 unit of AmpliTaq DNA polymerase was added to the completely cycled PCRs, incubated for 15 min at 37 °C, and immediately extracted with an equal volume of phenol/chloroform. The tailed products were then ethanol-precipitated, resuspended in a small volume (~1/2 of original reaction volume), and directly used in the T/A ligation step.

5'-Rapid Amplification of cDNA Ends (5'-RACE)-- The FirstChoice RLM-RACE Kit (Ambion, Austin, TX) was used for RNA ligase-mediated (RLM) RACE analysis of ICA1 transcripts, because it permits selective amplification of capped RNA molecules from non-poly(A)-selected RNA. Briefly, total cellular RNA is treated with calf intestinal alkaline phosphatase to remove the 5'-phosphate group from uncapped mRNA precursors, tRNA, rRNA, and small nuclear RNA molecules, followed by phenol/chloroform extraction and recovery of the dephosphorylated RNA by ethanol precipitation. Dephosphorylated RNA is then incubated with tobacco acid pyrophosphatase to remove m7Gpp from the cap structure of the 5' end of capped RNAs, leaving a single 5'-terminal phosphate group. Ligation of a synthetic RNA adapter of known sequence to the calf intestinal alkaline phosphatase- and tobacco acid pyrophosphatase-treated RNA proceeds in the presence of E. coli RNA ligase. Adapter-ligated RNA is reverse-transcribed into cDNA using Moloney murine leukemia virus-reverse transcriptase enzyme in the presence of random decamer primers. The resultant single-stranded cDNA then serves as template in nested PCRs using adapter sequence-specific primers (provided with the RLM-RACE kit) and gene-specific primers (GSP1 and GSP2) designed from ICA1 exons 1 and 2. The sequences of these latter primers are as follows: GSP1 (antisense exon 2), 5'-TGCATCTTATTTACAACTGACTTATCTTGAG-3' and GSP2 (antisense exon 1/2 boundary), 5'-TGTAAGTCCCAGGGATAACTGCATTTGTGTCCTGA-3'. The Advantage GC cDNA enzyme was used in all nested RLM-RACE PCRs.

Cloning of ICA1 Sequences into pGL3 Basic-- To clone segments of the ICA1 5'-flanking region and UTR exons, a 1028-bp genomic segment spanning the entire region was amplified from CITB-503D2 DNA via PCR, followed by T/A ligation of the product into the pCR2.1 vector (amplification primers, 5'-TAGGAAGCAGCTATGCCAACACT-3' and 5'-CAGAGAAGGCAGCTCCTACCA-3'). Excision of various segments of the cloned PCR product using pairs of restriction endonucleases recognizing sites found in the pCR2.1 vector arms, internal restriction sites of the insert, or a combination of the two allowed for directional cloning of defined ICA1 sequences into a pGL3 Basic vector having compatible overhangs. A second strategy for cloning ICA1 sequences into pGL3 Basic involved the design of ICA1-specific primers with restriction endonuclease sites added at the 5' end. After spin column chromatography purification of a PCR product amplified with these primers, the product was digested with one or more restriction enzymes to create overhangs compatible with those generated on an aliquot of the pGL3 Basic vector. Heat inactivation and gel purification or spin column purification of the digested PCR product was then followed by ligation into pGL3 Basic.

Site-directed Mutagenesis-- Two pGL3 promoter reporter constructs, ExA -957 and ExB -440, were modified using the QuikChange Site-directed Mutagenesis Kit (Stratagene, Cedar Creek, TX) to introduce mutations at suspected key sites within these promoters. Sequence mutations were introduced to the Sp1/GC box, Inr, (Sp1/GC box + Inr), and CREB sites within ExA -957 using the following oligonucleotides, respectively (mutated bases are shown in lowercase; mutant name and positions mutated are in parentheses): 5'-CCTGCCGGAGAGCAGGGtattGGTCACTCTGGGCGGCG-3' (ExA-GC, -564 to -561), 5'-CGGAGAGCAGGGGCGGGGTggaggTGGGCGGCGGATCCG-3' (ExA-Inr, -557 to -553), 5'-CCTGCCGGAGAGCAGGGtattGGTggaggTGGGCGGCGGATCCGAGC-3' (ExA-GC/Inr, mutations of -564 to -561 and -557 to -553), and 5'-CCTGTCCGCCAGGTCATcggcACGCAAACGCTATGGCCACGTGG-3' (ExA-CREB, -612 to -609). For the ExB -440 construct, the Sp1/GC box and CREB sites were modified with the following oligonucleotides, respectively: 5'-CCGGTTCCTGCGCTCCCCaataCCCTTTCCCTCGCCTTCG-3' (ExB-GC, -196 to -193) and 5'-CCCTTTCCCTCGCCTTCGatccACGCTGACGTCGGATGAGTG-3' (ExB-CREB, -174 to -171). The mutation strategy of the QuikChange protocol was adhered to for all site-directed mutation reactions, using the above oligonucleotides in combination with a reverse complement sequence primer in each PCR-based mutagenesis reaction. After digestion of the reactions with DpnI to remove non-mutated, methylated DNA, each mutated plasmid reaction was used to transform XL1-Blue supercompetent cells. Plasmid DNA from the resultant colonies was then miniprepped and screened via automated fluorescent sequencing for successful mutation incorporation.
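Each mutagenic oligonucleotide above is paired with its reverse complement. As a minimal sketch of how that partner primer can be derived in silico, the following helper keeps the lowercase marking of the mutated bases. The function and variable names are illustrative assumptions and are not part of the original methods.

```python
# Illustrative helper (an assumption, not the authors' software): derive the
# reverse-complement partner primer for a mutagenic oligonucleotide.
# Lowercase letters (the mutated bases) stay lowercase in the output.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(oligo: str) -> str:
    """Return the reverse complement of a DNA oligo written 5'->3', preserving case."""
    return oligo.translate(_COMPLEMENT)[::-1]


# ExB-CREB mutagenic oligonucleotide as listed in the text (mutated bases in lowercase).
exb_creb_forward = "CCCTTTCCCTCGCCTTCGatccACGCTGACGTCGGATGAGTG"
exb_creb_reverse = reverse_complement(exb_creb_forward)

if __name__ == "__main__":
    print("forward 5'-" + exb_creb_forward + "-3'")
    print("reverse 5'-" + exb_creb_reverse + "-3'")
```

Because case is preserved, the mutated positions remain visible in the partner primer, which makes it easier to confirm by eye that both primers carry the same substitution.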

Luciferase Assays-- For luciferase transfection experiments, NMB7 and RIN 1046-38 cells were plated at a density of 0.8 × 10^5 cells/well of a 12-well plate the day before transfection. Growth in the appropriate complete medium for 20-24 h generally resulted in 50-70% cellular confluence in each well at the time of transfection. Transient transfection of luciferase constructs and mutants thereof into the various cultured cell lines using Effectene Transfection Reagent (Qiagen) was followed by incubation of the transfected cells at 37 °C and cellular lysis 35-45 h after transfection. Luminescence assays of cellular lysates allowed for a semi-quantitative measure of luciferase production driven by each cloned segment of the ICA1 5'-flanking region. Within a given assay, plate wells were set up in triplicate for each transfected construct or control vector. The amount of DNA transfected was held constant for each construct and cell line, with a total amount of 0.3 µg/well of a 12-well plate. Each Effectene reagent was used in the amount recommended by the manufacturer's protocol in proportion to the amount of DNA applied to each well. The strength of the promoting activity for each construct was assessed by comparison to basal luciferase expression from the promoterless pGL3 Basic vector transfected into triplicate samples of the same cell type within the same assay. To allow for normalization of firefly luciferase values based on transfection efficiency, a co-reporter vector expressing Renilla luciferase from the thymidine kinase promoter (pRL-TK) was included at a ratio of 1:10 of co-reporter plasmid to experimental promoter construct (or control vector) in the transfection mixture. Careful optimization of transfection conditions to maximize transfection efficiency provided an assay system yielding consistent results from repeated experiments.

Transfected cells were lysed by adding 100 µl of Passive Lysis Buffer (Promega, Madison, WI) to each well of a 12-well plate, followed by vigorous pipetting of the detached cells. Cell lysates were subjected to two freeze-thaw cycles (liquid N2 and 20 °C H2O) and either immediately assayed for luciferase activity or stored at -70 °C for analysis the following day. Firefly and Renilla luciferase activities of each lysate were measured sequentially via manual reagent injection in a Monolight 2010 luminometer using the Dual-Luciferase Reporter Assay System (Promega).

In order to compare inter-construct firefly (FF) luciferase activity values, the raw data relative light unit (RLU) readings were corrected by normalizing each sample according to transfection efficiency. One Renilla luciferase RLU (R-RLU) measurement from the pGL3 Basic control transfectant samples in a given experiment was selected and used to normalize each measured FF luciferase value as follows: ((normalizing R-RLU) ÷ (sample R-RLU)) × (sample FF-RLU) = (Nml sample FF activity). The normalized (Nml) triplicate values for each construct were then averaged to arrive at a relative measure of luciferase activity for that ICA1 promoter reporter construct. Fold increases in promoter activity over the pGL3 Basic vector were calculated from the following formula: (Avg Nml sample FF activity) ÷ (Avg Nml pGL3 Basic control) = (sample fold increased activity over pGL3 Basic); where Avg is average. These calculations were performed independently for each transfection experiment data set (n = 3-5), and the average of all results obtained for a given ICA1 promoter reporter construct was used as a measure of relative promoter strength. Where indicated, statistical analysis of luciferase reporter data was performed using the Mann-Whitney U test.
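The normalization and fold-increase calculations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and all RLU readings are hypothetical, invented only to show the arithmetic, and the normalizing Renilla reading is passed in explicitly because the text does not specify which pGL3 Basic well was selected.

```python
from statistics import mean


def normalize_ff(sample_ff, sample_renilla, normalizing_renilla):
    """(normalizing R-RLU / sample R-RLU) x sample FF-RLU = Nml sample FF activity."""
    return (normalizing_renilla / sample_renilla) * sample_ff


def fold_over_basic(construct_wells, basic_wells, normalizing_renilla):
    """Average normalized FF activity of a construct divided by that of pGL3 Basic.

    Each well is a (firefly RLU, Renilla RLU) pair; wells are typically triplicates.
    """
    construct_avg = mean(
        normalize_ff(ff, renilla, normalizing_renilla) for ff, renilla in construct_wells
    )
    basic_avg = mean(
        normalize_ff(ff, renilla, normalizing_renilla) for ff, renilla in basic_wells
    )
    return construct_avg / basic_avg


# Hypothetical readings for one experiment (not data from the paper).
basic_wells = [(100.0, 1000.0), (120.0, 1200.0), (90.0, 900.0)]
construct_wells = [(1000.0, 2000.0), (600.0, 1000.0), (400.0, 800.0)]
normalizing_renilla = basic_wells[0][1]  # one pGL3 Basic Renilla reading, chosen arbitrarily

fold = fold_over_basic(construct_wells, basic_wells, normalizing_renilla)

if __name__ == "__main__":
    print(f"fold increase over pGL3 Basic: {fold:.2f}")
```

As in the text, the calculation would be repeated independently for each transfection experiment, and the per-experiment fold values would then be averaged.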

Oligonucleotide Synthesis-- All oligonucleotides used as primers in the various PCR-based methods were synthesized on an ABI 394 DNA Synthesizer (Applied Biosystems, Inc.) using solid phase synthesis and phosphoramidite nucleoside chemistry, unless a primer was provided with a particular molecular biology kit.

Automated Fluorescent Sequencing-- Automated fluorescent sequencing of plasmid DNA or purified PCR products was performed using an ABI 377 Automated DNA Sequence Analyzer (Applied Biosystems, Inc.) with either the dRhodamine or BigDye Terminator Cycle Sequencing Kits (Applied Biosystems, Inc.). Typically, TA vector-cloned PCR products were sequenced using the universal -21 M13 and M13 reverse primers or internal primers designed from insert sequences. Direct sequencing of PCR products involved centrifugal filtration purification of the amplified DNA on Amicon Microcon YM-50 columns (Millipore Corp.) followed by sequencing with the same primers used for amplification. Inserts contained within the pGL3 Basic vector were sequenced with primers designed from regions flanking the multiple cloning site of the vector (pGL3-upstream, 5'-AGTGCAAGTGCAGGTGCCAGAA-3' and pGL3-downstream, 5'-CTTTATGTTTTTGGCGTCTTCCAT-3') or with primers internal to the cloned ICA1 sequence.

Sequencing of YAC and BAC Clone Ends-- To sequence YAC clone ends, the junctions between the two YAC vector arms and the insert sequence were first amplified and subcloned from a YAC fragment library constructed using the Universal GenomeWalker (GW) kit (Clontech Laboratories, Inc.). A gene-specific primer set (GSP1 and GSP2) was designed for each of the YAC vector arms to be used in combination with the set of nested GenomeWalker adapter primers provided with the GenomeWalker kit: HYAC-C, 5'-GCTACTTGGAGCCACTATCGACTACGCGAT-3' and LS-2, 5'-TCTCGGTAGCCAAGTTGGTTTAAGG-3' (left YAC arm); HYAC-D, 5'-GGTGATGTCGGCGATATAGGCGCCAGCAAC-3' and RA-2, 5'-TCGAACGCCCGATCTCAAGATTAC-3' (right YAC arm). Reaction conditions suggested by the GenomeWalker kit were employed without modification. Any PCR products amplified from the five YAC GenomeWalker library reactions were subcloned into the pCR2.1 vector for plasmid-based automated fluorescent sequencing.

The ends of each BAC clone were sequenced directly using 1 µg of purified BAC clone DNA in each of two automated fluorescent sequencing reactions extended from the two universal primers -21 M13 and M13 reverse.

Sequence Homology Analyses-- Homology searches of nucleic acid and protein amino acid sequences were conducted through the Basic Local Alignment Search Tool (BLAST) server available on the National Center for Biotechnology Information (NCBI) internet website (www.ncbi.nlm.nih.gov).

GenBank™ Sequence Submissions-- Novel ICA1 cDNA and promoter function-associated genomic regions were submitted to the GenBank™ data base using the BankIt submission tool available through the NCBI website. YAC and BAC clone end sequences were submitted to the GSS data base via electronic mail to the address: batch-sub@ncbi.nlm.nih.gov.

Transcription Factor Binding Site Analysis of 5'-Flanking Sequences-- To assess the 5'-flanking and UTR regions of ICA1 for potential regulatory sequences, genomic DNA sequences of interest were analyzed using the public domain MatInspector version 2.2 software program available on the internet (genomatix.gsf.de/cgi-bin/matinspector/matinspector.pl). A core similarity of ≥0.900 from the MatInspector analysis was used as a cut-off for consideration of potential query sequence matches with known transcription factor recognition sequences.
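The cut-off described above amounts to a simple filter on predicted sites. The sketch below assumes the predictions have already been parsed into records. The record layout, field names, and example values are hypothetical and do not reflect the actual MatInspector output format or any result from this study.

```python
# Hypothetical post-processing sketch: keep only predicted sites whose core
# similarity meets the 0.900 cut-off stated in the text.
CORE_SIMILARITY_CUTOFF = 0.900


def filter_matches(matches, cutoff=CORE_SIMILARITY_CUTOFF):
    """Return the matches whose core similarity is at or above the cut-off."""
    return [m for m in matches if m["core_similarity"] >= cutoff]


# Invented example records (factor names and scores are placeholders only).
example_matches = [
    {"factor": "Sp1", "core_similarity": 1.000},
    {"factor": "CREB", "core_similarity": 0.900},
    {"factor": "hypothetical_factor", "core_similarity": 0.850},
]

kept = filter_matches(example_matches)

if __name__ == "__main__":
    for match in kept:
        print(match["factor"], match["core_similarity"])
```

The comparison is inclusive, so a site scoring exactly 0.900 is retained, consistent with the ≥0.900 criterion.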

    RESULTS

Genomic Organization of the ICA1 5'-UTR Exons-- We identified a single ICA1-containing CEPH Mega-YAC clone (CEPH-813G2) in a PCR-based YAC library screen. By using DNA purified from this YAC clone, we then initiated a cloning strategy involving primer-based genome walking in the 5' direction from ICA1 exon 1, but we failed to identify any ICA1 5'-untranslated sequence within ~12 kb of genomic DNA sequence upstream of the translation initiation codon (data not shown). We did, however, identify a microsatellite (18-20 adjacent CA dinucleotides) within this interval (~1830 bp upstream of the ATG), for which we designed flanking primers that uniquely amplify this marker locus from CEPH-813G2 DNA as well as from human genomic DNA. The microsatellite flanking primers were used in a successful PCR-based screen of the California Institute of Technology BAC (CITB) library. Two BAC clones, CITB-426N6 and CITB-503D2, were identified as PCR-positive for the expected microsatellite amplimer band.

Sequence from a Human Genome Project (HGP) PAC clone (RP11-560C1, GenBank™ accession number AC007009, R.H. Waterston, Genome Sequencing Center, Washington University School of Medicine, St. Louis, MO) that encompasses all three of the known ICA1 5'-UTR sequences became publicly available shortly after we had identified the two ICA1-positive BAC clones. We confirmed these data (Fig. 1) by amplifying and directly sequencing PCR products spanning the region of interest from our YAC and BAC clone DNA using flanking primers. It must be noted that we have labeled the ICA1 5'-UTR exon sequences as exons A, B, C rather than adopt the -3, -2, -1 exon notation proposed by Gaedigk et al. (4). This modification to the exon identifiers was made because the genomic alignment of the UTR exons reported by these authors is in error, as they had localized the true leading exon (exon A or exon -2) between exons B (exon -3) and C (exon -1) in their report of the exon-intron boundary data. We feel that maintaining the negative integer notation for these exons would lead to confusion regarding the overall genomic organization and promoter structure of the ICA1 gene.


Fig. 1.   Organization of the ICA1 5'-UTR exons. Flanking PCR primers were used to amplify the genomic region containing ICA1 5'-UTR exons from YAC and BAC clone DNA. Because of the high GC content of the region, specialized GC-rich PCR conditions were employed. Amplified PCR products were directly sequenced using the indicated amplification primers, confirming the UTR exon arrangement shown. The three 5'-UTR exons localize >26 kb upstream of the translation initiation codon in exon 1 yet span a genomic interval of <600 bp. The three exons are not found spliced to one another in any combination within isolated ICA1 cDNAs or ESTs, indicating that the splicing to exon 1 is mutually exclusive.

The ICA1 gene locus within its chromosomal context is summarized in Fig. 2, demonstrating that the ICA1 gene is transcribed from 7p22 in a Tel→Cen orientation. The BAC clones CITB-426N6 and CITB-503D2 that we identified share significant overlap with respect to their genomic sequence content (according to BAC end sequencing results), and together they span a gap that had once existed between HGP clones RP11-560C1 and RP4-594A5. It is also notable that the RP11-560C1 PAC clone insert begins ~10 kb to the 5' side of ICA1 exon A and extends downstream to include every ICA1 exon with the exception of exon 14. Sequence from the YAC clone ends has allowed us to map our data onto HGP sequence data, with the YAC insert spanning 1.09 Mb of the HGP chromosome 7 working draft sequence (Fig. 2). The entire ICA1 locus is contained within this YAC clone, along with nearly 1 Mb of downstream sequence extending centromeric from 7p22.


Fig. 2.   Chromosomal context of isolated YAC and BAC clones at the ICA1 locus. CEPH(Mega YAC)-813G2 intercepts a 1.1-Mb sequence mapping to HGP chromosome 7 clone NT_007844.6. The two PAC clones RP4-733B9 (accession number AC005532) and RP4-594A5 (accession number AC007128) define this interval, as they yielded positive BLAST hits for left (L) and right (R) CEPH-813G2 end sequences, respectively. The relationship of the two BAC clones from our library screen (CITB-426N6 and CITB-503D2) to PAC clone RP11-560C1 (accession number AC007009) and the ICA1 locus was determined by BAC end sequencing and is also illustrated. The only other defined gene encompassed by the YAC sequence is RPA3 (replication protein A3), although C1GALT1 (UDP-galactose:N-acetylgalactosamine-α-R β1,3-galactosyltransferase) lies within 1 Mb telomeric to the start of ICA1 immediately beyond the end of the YAC clone insert. The direction of transcription for each of these genes is indicated by an arrow below the chromosome 7 clone NT_007844.6 sequence. Hypothetical genes derived from computer analysis of genome data are not included in the diagram; however, it should be noted that a high concentration of putative protein coding segments flanks the left end of the YAC clone, whereas the remainder of the interval is rather sparsely populated by potential genes or ESTs found in the data base.

Analysis of cDNAs and ESTs from the ICA1 5'-UTR-- To characterize the ICA1 gene transcription initiation site(s), we first examined ICA69 cDNA and EST sequences available in the NCBI GenBank™ data bases for messages with potentially full-length 5' ends. As shown in Fig. 3, publicly available human ICA1 transcripts generally do not agree with respect to the sequence content and/or length of their 5'-untranslated leader regions. Three species of mRNA differing in 5' end sequence content are found, corresponding to the splicing of the known exon A, B, and C sequences downstream to either of two splice acceptor sites in ICA1 exon 1 common to all transcripts. Splicing to exon 1 is variable, with some exon B transcripts using a splice acceptor farther downstream than the splice site found in all exon A transcripts.


Fig. 3.   cDNA clones and ESTs containing ICA1 5'-UTR exon sequence. A BLAST search of the non-redundant (NR) and human EST data bases using genomic sequence from the 5'-UTR region and exon 1 returned 19 matches for sequences demonstrating splice patterns consistent with ICA1 transcripts as indicated by downstream splicing to exon 2 (not shown). Alignment of these cDNA and EST clones does not clearly define a consensus for initiation sites among the three UTR exons. Although three of the exon A clones approach the size for this exon as determined from our data (as per Figs. 4 and 5), these sequences were not obtained with the purpose of defining transcription initiation sites for the gene. For exon B containing transcripts, the apparent agreement of sequences terminating at -230 is likely an artifact related to the existence of a NotI restriction site at this location. The remaining exon B clones exhibit little agreement in their 5' termini. The numerical annotation adopted for this figure is consistent with that used in the text and is arbitrarily defined by designating the last base of exon C as -1. All of the untranslated sequences are numbered negatively from this point upstream along the genomic sequence continuum. The first base of exon 1 is numbered +1. An internal exon 1 alternative splice acceptor site occurs around +66, and an asterisk denotes the location of the translation initiation codon at +80 relative to this splice acceptor site.

Additionally, there is great variability in the 5' termini of sequences from each of the exons depicted in Fig. 3. Notably, the 5' exon B untranslated sequences from cDNA clone IS4 and ESTs AW583029, BG484463, and BI754058 are short by comparison with the ESTs truncating at the NotI restriction site, whereas the exon A clones variably terminate over a range of 56 bp. The termination of exon B transcripts at the NotI restriction site is likely to be an artifact of the EST library preparation method, which commonly employs this rare cutter to generate sticky ends for cloning, so that the true 5' termini of these clones have been cleaved off.

RLM-RACE Localization of the ICA1 Transcription Initiation Sites-- We identified transcription initiation sites and determined the lengths of ICA1 exons A-C by analyzing islet, testis, heart, and NMB7 total cellular RNA via RNA ligase-mediated 5'-RACE (RLM-RACE). Gene-specific primers (GSPs) for amplification of ICA1 5'-UTRs were designed from the sequences of exons 1 and 2. Because these exons are common to all ICA69-encoding transcripts, different ICA1 5' end sequence clones could be sequenced individually regardless of the amplified UTR flanked by the known RLM-RACE adapter and exon 1/2 sequence primers.

The results of our RLM-RACE analysis are summarized in Fig. 4. For each RNA sample analyzed at least 20 RACE-PCR-generated clones were independently sequenced. Overall, significant agreement of the first nucleotide from the sequenced exon A and B transcripts is noted both within and among the different tissues. Although every sequenced clone beginning with exon A or exon B did not start at exactly the same nucleotide, a cluster of start sites within a 2-4-bp interval was consistently detected for the major species from each exon. Additional variability in the exon A starting nucleotide was observed for testis transcripts, as two clones extended beyond the major start site and three clones proved to be shorter. The alternative starting nucleotides for exon B transcripts appear to be utilized in a somewhat tissue-specific manner. Most notably, every RLM-RACE clone derived from heart tissue transcripts agrees with regard to 5'-UTR exon length and starting 5'-nucleotide, extending ~35 nucleotides upstream of the common exon B start site identified from testis, islet, and neuroblastoma RNA samples.


Fig. 4.   Summary of RLM-RACE results for ICA1 initiation. Four different sources of RNA were assayed to determine the frequency of use for the 5'-UTR exons in ICA1 transcription initiation. Four major transcript variants (exons A, B1, B2, and C) that include sequence from the 5'-UTR exons were identified. Exon A and exon B transcripts were more highly represented overall as compared with exon C transcripts. Notably, exon A transcripts were absent from the set of neuroblastoma and heart ICA1 transcripts sequenced, whereas the heart transcript set was dominated by the expression of the longer exon B2 transcript variant. The asterisk indicates that for transcripts appearing to begin in exon 1, there was no one dominant starting nucleotide and that this region is the same region in which alternative splice acceptance of 5'-UTR exons occurs. The dagger (†) signifies that the "Total n" reflects a larger number of sequences than those included in the table. The remaining few sequences variably terminated at points within exons A and B but without an apparent pattern.

Complete Genomic Organization of the ICA1 Gene-- Having defined the sizes of the ICA1 5'-UTR exons, our knowledge of ICA1 exon-intron organization is now complete (Table I). The genomic interval from the RLM-RACE determined transcription initiation site for exon A through coding exon 14 spans >148.7 kb as calculated from available chromosome 7 HGP data. Several of the intron distances determined from these data differ significantly in comparison to the PCR-amplified size data reported by Gaedigk et al. (4). Of note, the length of the intron between exon C and exon 1 as well as the lengths of introns 2, 6, and 8 were previously unknown. Remarkably, intron 6 is 59,651 bp long based on clone RP11-560C1 data, comprising 40.1% of total ICA1 gene length. Significant refinements of intron lengths over the previous report are noted for introns 3, 7, 12, and 13. For example, the length reported for intron 3 by Gaedigk et al. (4) was 6.5 kb; however, we confirmed the length of this intron to be only 749 bp by amplifying and directly sequencing the intron with primers designed from exons 3 and 4 (data not shown).
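The proportion quoted for intron 6 can be checked with a short calculation. The sketch below assumes the intron 6 length is 59,651 bp (the unit implied by the 40.1% figure) and uses 148.7 kb as the gene span, which the text gives only as a lower bound; the function name is illustrative and not part of the original analysis.

```python
# Values quoted in the text; GENE_SPAN_BP is a lower bound (">148.7 kb").
INTRON6_BP = 59_651
GENE_SPAN_BP = 148_700


def percent_of_gene(intron_bp: int, gene_bp: int) -> float:
    """Return the intron length as a percentage of the total gene span."""
    return 100.0 * intron_bp / gene_bp


pct = percent_of_gene(INTRON6_BP, GENE_SPAN_BP)
print(f"Intron 6 accounts for about {pct:.1f}% of the ICA1 gene span")
```

Under these assumptions the result rounds to the 40.1% reported above.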

                              
Table I
Summary of ICA1 exon-intron lengths and genomic organization

Defining ICA1 Promoter Function-- Our approach to analyzing the ICA1 5'-flanking sequences for promoter activity was instructed by the confirmation of initiation sites for transcription of the ICA1 gene and assisted in part by a computer analysis of the flanking sequences for potential transcription activator sequences and transcription factor (TF)-binding sites. The sequence of this region and the locations of TF clusters are presented in Fig. 5. Fig. 5 includes the entire sequence of interest from the functional ICA1 5'-flanking region, with the UTR exon sequences and major transcription initiation sites identified. Clusters of high scoring sequence matches to TF-binding sites (MatInspector version 2.2 results) are arrayed schematically in Fig. 6A to emphasize the high density of potential regulatory elements in proximity to the three major ICA1 transcription initiation sites. A close inspection and analysis of the sequence also reveal that it meets criteria to define a CpG island, specifically being a sequence tract of >200 bp with GC content of >50% and an observed:expected ratio for the occurrence of the dinucleotide CG [O:E(CpG)] of >0.6 (17). For a 1000-bp segment of ICA1 5'-UTR flanking region extending downstream from base -680, the sequence is composed of 73.9% GC bases and has an O:E(CpG) ratio of 0.81. 
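The three CpG island criteria listed above can be expressed as a small test on any sequence. The following is a minimal sketch, assuming the commonly used definition of the observed:expected ratio, (number of CG dinucleotides x sequence length) / (number of C x number of G); whether this matches the exact formula of Ref. 17 is an assumption, and the function names and toy sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed:expected CpG ratio, assuming O:E = (#CG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    return seq.count("CG") * len(seq) / (n_c * n_g)


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.50, min_oe: float = 0.60) -> bool:
    """Apply the thresholds stated in the text: >200 bp, >50% GC, O:E > 0.6."""
    return (len(seq) > min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)


# Toy sequences for illustration only (not ICA1 sequence).
print(is_cpg_island("CG" * 150))  # CpG-rich 300-bp tract
print(is_cpg_island("AT" * 150))  # AT-only 300-bp tract
```

Applied to the 1000-bp ICA1 segment described above, such a test would be expected to pass, given the reported 73.9% GC content and O:E(CpG) ratio of 0.81.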


Fig. 5.   Sequence of the ICA1 5'-UTR exons and TF-binding sites in the 5'-flanking region. The entire sequence of the ICA1 5'-UTR exons and flanking regions is depicted with exon sequences displayed in capital letters and intron sequences in all lowercase letters. The consensus Inr sequence at the major initiation site for exon A is in boldface, as are the major initiating nucleotides for exons B and C. Although the exon A Inr is an ideal match to the pyrimidine (Py)-rich consensus Inr sequence (PyPyA+1N(T/A)PyPy), the initiating nucleotides for exon B transcription also exist within pyrimidine-rich tracts that associate closely with Sp1-binding site motifs. Other close matches to TF consensus sequences are indicated as underlined segments. The strand polarity of TF recognition sequences is indicated by either a + for sense or - for antisense orientation.
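The pyrimidine-rich Inr consensus given in the legend, PyPyA+1N(T/A)PyPy, can be written as a pattern and scanned along a sequence. This is a minimal sketch under the assumption that Py means C or T and N means any base; the function name and the toy sequence are hypothetical and are not taken from the ICA1 locus.

```python
import re

# PyPyA(+1)N(T/A)PyPy with Py = C/T and N = any base.
# The lookahead lets overlapping candidate sites be reported as well.
INR_PATTERN = re.compile(r"(?=([CT][CT]A[ACGT][AT][CT][CT]))")


def find_inr_sites(seq: str):
    """Return (index of the +1 adenine, matched 7-mer) for each Inr match.

    Indices are 0-based positions in the supplied sequence.
    """
    seq = seq.upper()
    return [(m.start() + 2, m.group(1)) for m in INR_PATTERN.finditer(seq)]


# Toy sequence for illustration only.
print(find_inr_sites("GGTCAGTCCGG"))
```

A scan of this kind only flags sequence matches; it says nothing about whether a site is used for initiation, which is what the RLM-RACE data address.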


Fig. 6.   A, TF-binding site sequence matches identified by the MatInspector version 2.2 program are illustrated in a schematic depiction of the ICA1 5'-UTR exons and flanking region. Clustering of potential TF-binding sites surrounding exon A and around -900 is indicated by dotted outlines on the figure. Each of the three exons is preceded by an Sp1/GC box site, whereas exon A has two additional Sp1/GC box sites at its 3' end. Of the remaining TFs indicated on the figure, many are constitutively expressed in cells, but others are noted to be somewhat more tissue-specific (e.g. MyoD and OLF1). B, summary of luciferase reporter vector constructs used to assess ICA1 5'-flanking region promoter activity. Shown are the basic structures of the four luciferase reporter vectors constructed using portions of ICA1 flanking region sequence and 5'-UTR exons. Dashed lines indicate where sequence is missing, such that the joined segments would be directly juxtaposed in the plasmid vector. The ExA -957 and ExB -440 constructs were specifically designed to omit the native exon A and B splice donors, respectively, so that cryptic splicing events would be less likely to affect luciferase gene translation. C, results of luciferase reporter assays for ICA1 promoter constructs. Each of the four experimental constructs and the control (no insert) vector were transfected separately into each of the two cell lines indicated. Fold increase in luciferase activity was calculated from the raw data set as described in the text. Transfection of the exon A and B only constructs (ExA -957 and ExB -440) resulted in opposite expression profiles in the islet- versus neuron-derived cell lines. This observation is consistent with a tissue-specific pattern of expression from the ICA1 exons A and B. Expression from the exon C construct was less efficient, perhaps due to the inclusion of the native exon C splice donor in the luciferase reporter vector. Results are expressed as mean ± S.E. of 3-5 independent sets of transfection experiments performed in triplicate.

The delineation of independent ICA1 5'-UTR exons required that our promoter cloning strategy address the possibility that independent promoter activities are associated with the three separate exons. Thus, we constructed four basic luciferase promoter reporter plasmids (Fig. 6B). The first includes upstream flanking sequence contiguous with a downstream sequence interval encompassing all three 5'-UTR exons (ExABC -1012, total plasmid insert length 1031 bp). The three remaining reporter plasmids contain sequences from only one of the 5'-UTR exons, along with a reasonable amount of upstream flanking sequence truncated so as not to include any portion of the preceding exon (ExA -957, 453 bp; ExB -440, 293 bp; and ExC -90, 109 bp). The ExA -957 and ExB -440 constructs were engineered to exclude the splice donor site at their 3' termini so that cryptic splicing events would be minimized during transcription of the luciferase reporter gene in vivo. The two cell lines used in our promoter reporter assays were chosen based on evidence for cellular ICA69 expression assessed by RT-PCR and on their similarities with tissues having the highest levels of ICA69 expression, namely pancreatic islets (rat insulinoma cells, RIN 1046-38) and neuronal tissue (human neuroblastoma cells, NMB7).

The results of our ICA1-luciferase promoter reporter assays are summarized in Fig. 6C. Transfection of construct ExABC -1012 resulted in 1.3- and 3.9-fold increases in reporter gene (luciferase enzyme) activity in RIN 1046-38 and NMB7 cells, respectively, as measured in a dual luciferase assay. Interestingly, however, transfection of cells with the independent exon A and B constructs (ExA -957 and ExB -440) demonstrated a difference in promoting activity that was dependent upon the transfected cell type. The ExA -957 construct was more active in RIN 1046-38 (rat insulinoma) cells, exhibiting a 2.5-fold increase in luciferase activity, whereas the ExB -440 construct showed a greatly enhanced, 12-fold level of activity in NMB7 (neuroblastoma) cells. The augmentation of luciferase activity for the isolated exon A and B constructs over the ExABC -1012 construct is thought to result from the exclusion of splice donors from the exon 3' ends that may be facilitating cryptic splicing in the luciferase transfection system. The construct designed to isolate exon C from the other 5'-UTR exons (ExC -90) includes the exon C splice donor signal and 19 bp of 3'-flanking sequence. Transfection of this plasmid did not result in any significant increase in luciferase activity as compared with the empty pGL3 Basic control vector in either of the cell lines tested.
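The insert lengths quoted for the constructs that retain the 3' end of exon C can be reconciled with the numbering scheme by simple addition. The sketch below assumes that the construct name gives the number of bases upstream of position -1 (inclusive) and that the 19 bp of 3'-flanking sequence stated for ExC -90 is also present in ExABC -1012; the second point is an inference from the reported lengths rather than something stated in the text.

```python
FLANK_3PRIME_BP = 19  # stated for ExC -90; assumed here for ExABC -1012 as well


def insert_length(upstream_start: int, flank_bp: int = FLANK_3PRIME_BP) -> int:
    """Length of an insert running from a negative coordinate through -1,
    plus a fixed number of 3'-flanking bases (the numbering has no position 0)."""
    if upstream_start >= 0:
        raise ValueError("upstream_start must be a negative coordinate")
    return -upstream_start + flank_bp


print(insert_length(-90))    # ExC -90, reported as 109 bp
print(insert_length(-1012))  # ExABC -1012, reported as 1031 bp
```

Both values agree with the insert sizes given for these two plasmids, which is consistent with the assumption that they share the same 3' boundary.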

Functional Impact of Site-directed Mutagenesis of ICA1 Promoter Elements-- The results of luciferase assays involving the mutated ExA -957 (Fig. 7A) and ExB -440 (Fig. 7C) constructs are summarized in Fig. 7, B and D. In NMB7 cells, it is again obvious that there is very little transcriptional activity from exon A, as the parent ExA -957 construct and all four mutants showed essentially no activity over background pGL3 Basic transcription (Fig. 7B). This finding correlates with the results of our transcript analysis detailed above. For RIN 1046-38 cells, however, the ExA -957 mutants exhibit increases in promoter activity for mutation of the Sp1/GC box and the Inr independently, as well as for the double mutant combination of the two (Fig. 7B). The ExA-CREB mutant shows no difference in promoting activity over the parent ExA -957 construct (Fig. 7B).


Fig. 7.   Site-directed mutations of promoter elements in transiently transfected RIN 1046-38 and NMB7 cells. A, alignment of the parent exon A (WT) and mutant sequences (Mut) introduced to the Sp1/GC box (ExA-GC, -564 to -561), Inr (ExA-Inr, -557 to -553), and CREB sites within exon A (ExA-CREB, -612 to -609). B, for RIN 1046-38 cells, the ExA -957 mutants exhibit increases in promoter activity for mutation of the Sp1/GC box and the Inr independently, as well as for the double mutant combination of the two [Sp1/GC box + Inr]. The ExA-CREB mutant shows no difference in promoting activity over the parent ExA -957 construct. C, alignment of the parent exon B (WT) and mutant sequences (Mut) introduced to the Sp1/GC box (ExB-GC, -196 to -193), and CREB sites within exon B (ExB-CREB -174 to -171). D, mutation of the exon B CREB site leads to a significant increase in luciferase expression in both RIN 1046-38 and NMB7 cells, corresponding to 7.7- (p < 0.05) and 2.0-fold enhancement in activity over the parent ExB -440 vector. Results are expressed as mean ± S.E. of at least 3 independent sets of transfection experiments performed in triplicate. By using Mann-Whitney test, *, p < 0.05 comparing the mean of results for the mutation of the exon B CREB site with the mean of results for the parent exon B. p values < 0.05 were deemed statistically significant.

For exon B mutants, there are some very dynamic changes seen in both cell lines, particularly with mutation of the CREB site (Fig. 7D). There is a 6.3-fold increase in luciferase activity in RIN 1046-38 cells when transfected with the ExB-GC mutant, whereas a similar increase is not seen when ExB-GC is transfected into NMB7 cells (Fig. 7D). Interestingly, however, mutation of the exon B CREB site results in a marked augmentation of luciferase expression in both RIN 1046-38 and NMB7 cells, with 14.7- and 22.9-fold increases in luciferase activity over pGL3 Basic, respectively, corresponding to 7.7- (p < 0.05) and 2.0-fold increases in activity over the parent ExB -440 vector (Fig. 7D).
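Because the CREB-mutant activities are reported both relative to pGL3 Basic and relative to the parent ExB -440 vector, the parent activity implied by each pair of numbers can be back-calculated as a consistency check. This is a sketch using only the fold values quoted above; the function name is illustrative, and the inputs are rounded published figures, so small discrepancies are expected.

```python
def implied_parent_fold(mutant_vs_control: float, mutant_vs_parent: float) -> float:
    """Parent-construct fold activity over the empty vector implied by
    (mutant / control) divided by (mutant / parent)."""
    return mutant_vs_control / mutant_vs_parent


rin_parent = implied_parent_fold(14.7, 7.7)   # RIN 1046-38 cells
nmb7_parent = implied_parent_fold(22.9, 2.0)  # NMB7 cells

print(f"Implied ExB -440 activity in RIN 1046-38: {rin_parent:.1f}-fold")
print(f"Implied ExB -440 activity in NMB7: {nmb7_parent:.1f}-fold")
```

The NMB7 value of roughly 11.5-fold is in line with the 12-fold activity reported for ExB -440 in these cells, and the RIN 1046-38 value of roughly 1.9-fold is consistent with the lower exon B promoter activity observed in islet-derived cells.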

    DISCUSSION

In the present study, we have explored the complex structure and functional characteristics of the diabetes-related autoantigen gene ICA1. Initiation of ICA1 transcription is found to originate from any of three distinct 5'-untranslated exons having independent transcription initiation signals characteristic of non-TATA, non-CAAT, GC-rich promoters. Transcripts utilizing each of the three 5'-UTR exons coexist in many ICA1-expressing tissues, although the 5'-UTR exon sequences are never included together in the same transcript. We present evidence, however, that exon A transcripts predominate in islets and testis RNA samples as compared with exon B or C transcripts, whereas exon B transcripts are the major expressed form in neuronal and cardiac tissue RNA samples. No additional ICA1 5'-UTR sequences were detected. An earlier report (4) had suggested that the three 5'-UTR exons are alternatively spliced, but our data conclusively demonstrate that the identified 5' termini of these exons lack the appropriate splice acceptor consensus sequences (Fig. 5). Furthermore, the 5'-RLM-RACE procedure identified the same 5' end initiating nucleotides among ICA1 transcripts amplified from different tissues, suggesting that these initiation sites are common starting points for transcription rather than artifacts of the amplification procedure.

Parallel functional studies to screen cloned ICA1 5'-flanking sequences for transcription promoting activity further support the conclusion that utilization of ICA1 5'-UTR exons for transcription initiation is tissue-specific. Specifically, a promoter reporter construct containing only exon A and its upstream flanking sequence showed greater activity in islet-derived RIN 1046-38 cells than in neuroblastoma cells, whereas a second construct designed from exon B and its upstream flanking sequence was preferentially active in the neuron-derived cells rather than islets. Based on sequence surrounding the identified transcription initiation sites, we originally hypothesized that the Sp1/GC box-Inr element pairing was likely to play a role in the activation of transcription from exon A, given the common association of these consensus sequences reported in the literature (18). However, when promoter sequences were mutated at these sites, there was an associated increase in the activity of luciferase transcription, at least as measured in islet-derived RIN 1046-38 cells. Although these experiments do not provide definitive evidence to suggest a precise molecular mechanism to account for this observation, they imply that the Sp1/GC box site plays an inhibitory role in controlling exon A transcription. It seems reasonable to postulate that Sp3, the Sp transcription factor family member associated with transcription inhibition (19-21), rather than Sp1, is the major transcription factor recognizing and binding to the GC box site in RIN 1046-38 cells. With mutation of the exon A GC box, Sp3 cannot bind as well to the sequence and, therefore, has less chance to inhibit expression from exon A. Similarly, an increase in transcriptional activity within RIN 1046-38 cells is noted when the exon B Sp1/GC box site is mutated, perhaps additional evidence that Sp3 plays a role in controlling ICA1 transcription within RIN 1046-38 cells, in contrast to NMB7 (neuronal) cells where these mutations have no effect. Further experimentation, such as electrophoretic mobility shift assays, designed to explore the potential for Sp1/3 binding to the GC box sites will be necessary to provide support for these hypotheses (22).

An understanding of the role for CREB site binding proteins in controlling transcription from ICA1 exon B will also benefit from additional studies of transcription factor binding to ICA1 promoter sequences. Mutation analysis of the exon B CREB site, resulting in marked increases in luciferase activity for both RIN 1046-38 and NMB7 cells, suggests a negative regulatory role for CREB on gene expression through binding to regulatory elements in the ICA1 promoter (23-26). Likewise, Reusch et al. (23, 27) reported that CREB plays a pivotal role in adipocyte survival, likely by regulating the expression of specific pro- and anti-apoptotic genes such as Akt. Thus, our data suggest that this CREB-related influence may be less cell type-specific than the effects of mutation at the GC box sites, but the effect on expression in pancreatic islet-derived cells does appear to be of a greater magnitude than that noted for neuron-derived cells.

The transcription factor CREB, with its wide profile of inducibility, has mainly been implicated in glucose homeostasis, growth factor-dependent cell survival, and T-cell receptor signaling (28). CREB was the first transcription factor for which it was demonstrated that phosphorylation regulates its activity; the molecule is activated by cAMP and a variety of other signals. Its family members consist of the activating transcription factor 1 (ATF1) and the cAMP-response element modulator. CREB is a substrate for a host of cellular kinases including AKT (29), p38/Ras (30), MAP-KAP-2 (31), protein kinase C (32), pp90rsk (33), and calcium-calmodulin kinases II (34) and IV (35). Although CREB is perhaps one of the most studied phosphorylation-dependent transcription factors, relatively little is known about the physiological role of this protein in different cellular microenvironments. There is still discussion on how signal discrimination is achieved within the CREB system. Even though several signals have been shown to promote phosphorylation of CREB at Ser-133, it is assumed that CREB can distinguish cAMP from non-cAMP signals at the level of co-activator CREB-binding protein recruitment (28). The characterization of cofactors modulating CREB-binding protein commitment to a specific signaling pathway from a wide array of cellular stimuli is currently under investigation.

Knowing that mutation at the CREB site within the ICA1 exon B promoter enhances ICA1 transcriptional activity will be of importance in completing ongoing experiments to augment expression of ICA69 in islet cell lines that, in turn, may aid in elucidating the role for ICA69 in trafficking between the trans-Golgi network and immature secretory granules of pancreatic β-cells that has been proposed (36).

Two additional elements of ICA1 promoter structure are likely to provide important clues toward understanding the tissue-specific aspects of ICA1 promoter function. First, the high density of potential TF-binding sites surrounding the three UTR exons provides fodder for mechanisms of transcription activation or repression based on TF availability in different tissues and on the potential involvement of known tissue-specific factors with potential binding sites in the ICA1 promoter region, like GKLF (gastrointestinal tract), OLF1 (olfactory neuroepithelium), MyoD (myogenic cells), or MZF1 (myeloid cells). Second, existence of the ICA1 promoter within a genomic CpG island hints at a role for DNA methylation/demethylation in the control of ICA1 expression. The methylation state of genomic DNA, most often within CpG islands, plays a central role in the epigenetics of gene imprinting (38-41) and has been implicated, although perhaps not proven (37), as an on/off switch for cell type-specific gene expression during cellular differentiation (41-47). The observed differences in ICA1 promoter activity between RIN 1046-38 and NMB7 cells will be better understood from studies investigating TF recognition and modulation of the ICA1 promoter and through an analysis of DNA methylation within native genomic DNA of the ICA1 CpG island sequence.

The variability of ICA1 transcripts, with respect to both the identity of 5'-UTR sequences and coding region splicing events, is likely to impact ICA69 protein expression. The potential contribution of alternative ICA1 splice forms to ICA69 translation is obvious (4), although no studies to date have provided definitive evidence for the existence of different ICA69 protein isoforms. On the other hand, 5'-UTR sequence identity could influence in vivo ICA69 mRNA stability and translation efficiency, depending upon the secondary structure of the 5'-UTR (48) and upon the number of ATGs contributed by a given 5'-UTR upstream of the accepted ICA69 translation start site in exon 1 (49). Although some variability of the exon 1 splice acceptor for 5'-UTRs has been detected, all human ICA1 transcripts known to us splice the 5'-UTR to one of a few closely related exon 1 consensus acceptors upstream of the translation initiation codon (Ref. 4 and data not shown). Therefore, it is unlikely that truly alternative translation initiation can occur. Identifying which of these factors further enhances the tissue- and cell type-specific differences in ICA1 expression that we have observed and to what extent they influence ICA69 protein translation remain important unanswered questions in our quest to understand the biology of this molecule.

The sum of available ICA1 and ICA69 structural and functional genetic data points to a complex process of transcript generation from an expansive gene locus. Of note, the large introns (largest almost 60 kb) and overall gene size (>150 kb) by nature require a high fidelity of RNA processing to command the expression of a comparatively small protein (483 amino acids). From the standpoint of understanding basic biology, the significance of introns and intron size continues to be a subject of discussion in the literature (50-52). Available cDNA sequences suggest that opportunities for alternative intron/exon processing of ICA1 transcripts may contribute to isoform variation of ICA69 protein expression (53), although the particular influence that extremely large introns would contribute to this process is unknown. That these large introns contain other genes or genetic regulatory elements is a strong possibility (41, 51), especially considering that two ICA1 introns measure ~26 and ~60 kb, larger than many whole genes and easily large enough to envelop genetic regulatory elements of importance to ICA1 gene function. More simply, considering the remarkable cross-species conservation of the ICA69 coding region, perhaps large introns merely confer a relative level of protection against coding region mutations and disruptive recombination events because these processes would be more likely to affect non-coding intron sequences on a statistical basis (54, 55). It is difficult to draw any firm conclusions regarding the significance of such huge introns to the ICA1 gene; we anticipate that additional collective gene structure data from the various ongoing genome projects will enhance our understanding of these observations.

The basic structure and function of the ICA1 core promoter units contributed by this work and the suggestion of candidate regions harboring additional ICA1 5'-regulatory sequences will facilitate the targeted screening of genomic regions for polymorphisms that could alter ICA1 promoter function. There is already evidence that a polymorphic variable number of tandem repeats in the insulin promoter is functionally important to variations of insulin expression and correlates with diabetes susceptibility (56-60). In addition, multiple research groups have recently reported the expression of ICA69, insulin, and the other major diabetes autoantigens glutamic acid decarboxylase (GAD65) and IA-2 within cells of the thymus (60-65) and in peripheral lymphoid organs (66). The centrality of these lymphoid tissues to the maintenance of immune self-tolerance suggests that perhaps at least one determinant of which proteins are targeted as autoantigens is the level or nature of their expression in tissues other than those where they exert their characteristic biological effects. Immunohistochemical and molecular analyses of autoantigen-containing cells within lymphoid tissues have identified features of a dendritic cell phenotype (65-67), implicating these powerful antigen presenting cells in the process of establishing and maintaining immune tolerance via de novo expression of self-protein antigens. These observations lead us to postulate that inheritance of functional polymorphisms within promoters controlling the expression of identified autoantigens will ultimately be correlated with the occurrence of autoimmunity and autoimmune diseases. With our fundamental understanding of the structure and function of the ICA1 gene promoters in hand, a rational approach to identify relevant ICA1 promoter polymorphisms and to investigate disease associations is now feasible.

    ACKNOWLEDGEMENTS

We thank Drs. William Rudert, Robert Ferrell, Timothy Wright, and Michael Gorin for helpful discussions and insights; Dr. Ram Menon and Angel Shaufl for assistance with luciferase assays; Dr. Alessandro Doria (Joslin Diabetes Center, Boston) for providing facilities and instruction in BAC library screening; Dr. David Patterson (Eleanor Roosevelt Institute, Denver, CO) for YAC library screening; Dr. Christopher Newgard (Duke University, Durham, NC) for providing the RIN 1046-38 cell line; and Chip Scheide for computer support.

    FOOTNOTES

* This work was supported by the University of Pittsburgh M.D., Ph.D. Program (to R. P. F.), the Henry Hillman Endowment Chair in Pediatric Immunology (to M. T.), National Institutes of Health Grants R01 DK53456 and R01 DK56200 (to M. P.), and an American Diabetes Association Career Development award (to M. P.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF146364, BZ286433, BZ286436, BAC503D2, BZ286434, BZ286435, YAC813G2, BZ286437, YAC813G2, and BZ286438.

Present address: Medical Services-GRB 740, Massachusetts General Hospital, 55 Fruit St., Boston, MA 02114-2696.

|| To whom correspondence should be addressed: Division of Immunogenetics, Diabetes Institute, Rangos Research Center, Children's Hospital of Pittsburgh, University of Pittsburgh School of Medicine, 3460 Fifth Ave., Pittsburgh, PA 15213. Tel.: 412-692-6491; Fax: 412-692-8131; E-mail: pietroma+@pitt.edu.

Published, JBC Papers in Press, October 29, 2002, DOI 10.1074/jbc.M210175200

    ABBREVIATIONS

The abbreviations used are: UTR, untranslated region; RACE, rapid amplification of cDNA ends; RT, reverse transcriptase; RLM, RNA ligase-mediated; CITB, the California Institute of Technology BAC Library; CEPH, the Centre d'Etude du Polymorphisme Humain; Inr, initiator; CREB, cAMP-responsive element-binding protein; NOD, non-obese diabetic; FF, firefly; RLU, relative light unit; NCBI, National Center for Biotechnology Information; HGP, Human Genome Project; GSP, gene-specific primers; TF, transcription factor.

    REFERENCES

1. Pietropaolo, M., Castaño, L., Babu, S., Buelow, R., Kuo, Y.-L., Martin, S., Martin, A., Powers, A. C., Prochazka, M., Naggert, J., Leiter, E. H., and Eisenbarth, G. S. (1993) J. Clin. Invest. 92, 359-371
2. Karges, W., Pietropaolo, M., Ackerley, C., and Dosch, H.-M. (1996) Diabetes 45, 513-521
3. Pilon, M., Peng, X. R., Spence, A. M., Plasterk, R. H., and Dosch, H.-M. (2000) Mol. Biol. Cell 11, 3277-3288
4. Gaedigk, R., Karges, W., Hui, M. F., Scherer, S. W., and Dosch, H.-M. (1996) Genomics 38, 382-391
5. Karges, W., Gaedigk, R., Hui, M. F., Cheung, R. K., and Dosch, H.-M. (1996) Biochim. Biophys. Acta 1360, 97-101
6. Miyazaki, I., Gaedigk, R., Hui, M. F., Cheung, R. K., Morkowski, J., Rajotte, R. V., and Dosch, H.-M. (1994) Biochim. Biophys. Acta 1227, 101-104
7. Gaedigk, R., Duncan, A. M. V., Miyazaki, I., Robinson, B. H., and Dosch, H.-M. (1994) Cytogenet. Cell Genet. 66, 274-276
8. Stassi, G., Schloot, N., and Pietropaolo, M. (1997) Diabetologia 40, 120-122
9. Martin, S., Kardorf, J., Schulte, B., Lampeter, E. F., Gries, F. A., Melchers, I., Wagner, R., Bertrams, J., Roep, B. O., Pfutzner, A., Pietropaolo, M., and Kolb, H. (1995) Diabetologia 38, 351-355
10. Roep, B. O. (1996) Diabetes 45, 1147-1156
11. Roep, B. O., Duinkerken, G., Schreuder, G. M. Th., Kolb, H., DeVries, R. R. P., and Martin, S. (1996) Eur. J. Immunol. 26, 1285-1289
12. Miyazaki, I., Cheung, R. K., Gaedigk, R., Hui, M. F., Van der Meulen, J., Rajotte, R. V., and Dosch, H.-M. (1995) J. Immunol. 154, 1461-1469
13. Winer, S., Gunaratnam, L., Astsatourov, I., Cheung, R. K., Kubiak, V., Karges, W., Hammond-McKibben, D., Gaedigk, R., Graziano, D., Trucco, M., Becker, D. J., and Dosch, H.-M. (2000) J. Immunol. 165, 4086-4094
14. Karges, W., Hammond-McKibben, D., Gaedigk, R., Shibuya, N., Cheung, R., and Dosch, H. M. (1997) Diabetes 46, 1548-1556
15. Lampasona, V., Ferrari, M., Bosi, E., Pastore, M. R., Bingley, P. J., and Bonifacio, E. (1994) J. Autoimmun. 7, 665-674
16. Chaplin, D. D., and Brownstein, B. H. (1992) in Current Protocols in Molecular Biology (Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K., eds), Unit 6.10, John Wiley and Sons, New York
17. Gardiner-Garden, M., and Frommer, M. (1987) J. Mol. Biol. 196, 261-282
18. Carey, M., and Smale, S. T. (2000) Transcriptional Regulation in Eukaryotes: Concepts, Strategies, and Techniques, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
19. Braun, H., Koop, R., Ertmer, A., Nacht, S., and Suske, G. (2001) Nucleic Acids Res. 29, 4994-5000
20. De Luca, P., Majello, B., and Lania, L. (1996) J. Biol. Chem. 271, 8533-8536
21. Hagen, G., Muller, S., Beato, M., and Suske, G. (1994) EMBO J. 13, 3843-3851
22. LeVan, T. D., Bloom, J. W., Bailey, T. J., Karp, C. L., Halonen, M., Martinez, F. D., and Vercelli, D. (2001) J. Immunol. 167, 5838-5844
23. Reusch, J. E., and Klemm, D. J. (2002) J. Biol. Chem. 277, 1426-1432
24. Choi, R. C., Siow, N. L., Zhu, S. Q., Wan, D. C., Wong, Y. H., and Tsim, K. W. (2001) Mol. Cell. Neurosci. 17, 732-745
25. Cibelli, G., Jungling, S., Schoch, S., Gerdes, H. H., and Thiel, G. (1996) Eur. J. Biochem. 236, 171-179
26. Della Fazia, M. A., Servillo, G., and Sassone-Corsi, P. (1997) FEBS Lett. 410, 22-24
27. Reusch, J. E., Colton, L. A., and Klemm, D. J. (2000) Mol. Cell. Biol. 20, 1008-1020
28. Mayr, B., and Montminy, M. (2001) Nat. Rev. Mol. Cell. Biol. 2, 599-609
29. Du, K., and Montminy, M. (1998) J. Biol. Chem. 273, 32377-32379
30. Pugazhenthi, S., Nesterova, A., Sable, C., Heidenreich, K. A., Boxer, L. M., Heasley, L. E., and Reusch, J. E. (2000) J. Biol. Chem. 275, 10761-10766
31. Tan, Y., Rouse, J., Zhang, A., Cariati, S., Cohen, P., and Comb, M. J. (1996) EMBO J. 15, 4629-4642
32. Yamamoto, K. K., Gonzalez, G. A., Biggs, W. H., III, and Montminy, M. R. (1988) Nature 334, 494-498
33. Xing, J., Ginty, D. D., and Greenberg, M. E. (1996) Science 273, 959-963
34. Sun, P., Enslen, H., Myung, P. S., and Maurer, R. A. (1994) Genes Dev. 8, 2527-2539
35. Matthews, R. P., Guthrie, C. R., Wailes, L. M., Zhao, X., Means, A. R., and McKnight, G. S. (1994) Mol. Cell. Biol. 14, 6107-6116
36. Solimena, M., Spitzenberger, F., Pietropaolo, S., Verkade, P., Habermann, B., Lacas-Gervais, S., Mziaut, H., Trucco, M., and Pietropaolo, M. (2002) Diabetes Metab. Rev. 18 Suppl. 4, 32 (abstr.)
37. Jones, P. A., and Takai, D. (2001) Science 293, 1068-1070
38. Paulsen, M., and Ferguson-Smith, A. C. (2001) J. Pathol. 195, 97-110
39. Pfeifer, K. (2000) Am. J. Hum. Genet. 67, 777-787
40. Siegfried, Z., Eden, S., Mendelsohn, M., Feng, X., Tsuber, B.-Z., and Cedar, H. (1999) Nat. Genet. 22, 203-206
41. Beohar, N., and Kawamoto, S. (1998) J. Biol. Chem. 273, 9168-9178
42. Cao, Y.-X., Jean, J.-C., and Williams, M. C. (2000) Biochem. J. 350, 883-890
43. Lübbert, M., Tobler, A., and Daskalakis, M. (1999) Leukemia (Baltimore) 13, 1420-1427
44. Newell-Price, J., King, P., and Clark, A. J. (2001) Mol. Endocrinol. 15, 338-348
45. Persengiev, S. P., and Kilpatrick, D. L. (1996) Neuroreport 8, 227-231
46. Schwab, J., and Illges, H. (2001) Int. Immunol. 13, 705-711
47. Takizawa, T., Nakashima, K., Namihira, M., Ochiai, W., Uemura, A., Yanagisawa, M., Fujita, N., Nakao, M., and Taga, T. (2001) Dev. Cell 1, 749-758
48. Ross, J. (1995) Microbiol. Rev. 59, 423-450
49. Kozak, M. (1991) J. Cell Biol. 115, 887-903
50. Dibb, N. J. (1993) FEBS Lett. 325, 135-139
51. Duret, L. (2001) Trends Genet. 17, 172-175
52. Hurst, L. D., Brunton, C. F. A., and Smith, N. G. C. (1999) Trends Genet. 15, 437-439
53. Hanke, J., Brett, D., Zastro, I., Aydin, A., Delbrück, S., Lehmann, G., Luft, F., Reich, J., and Bork, P. (1999) Trends Genet. 15, 389-390
54. Carvalho, A. B., and Clark, A. G. (1999) Nature 401, 344
55. Comeron, J. M., and Kreitman, M. (2000) Genetics 156, 1175-1190
56. Davies, J. L., Kawaguchi, Y., Bennett, S. T., Copeman, J. B., Cordell, H. J., Pritchard, L. E., Reed, P. W., Gough, S. C., Jenkins, S. C., Palmer, S. M., Balfour, K. M., Rowe, B. R., Farrall, M., Barnett, A. H., Bain, S. C., and Todd, J. A. (1994) Nature 371, 130-136
57. German, M. (2000) in Diabetes Mellitus: A Fundamental and Clinical Text (LeRoith, D., Taylor, S. I., and Olefsky, J. M., eds), 2nd Ed., pp. 11-19, Lippincott Williams & Wilkins, Philadelphia
58. Lucassen, A. M., Screaton, G. R., Julier, C., Elliott, T. J., Lathrop, M., and Bell, J. I. (1994) Hum. Mol. Genet. 4, 501-506
59. Kennedy, G. C., German, M. S., and Rutter, W. J. (1995) Nat. Genet. 9, 292-298
60. Pugliese, A., Zeller, M., Fernandez, A., Jr., Zalcberg, L. J., Bartlett, R. J., Ricordi, C., Pietropaolo, M., Eisenbarth, G. S., Bennett, S. T., and Patel, D. D. (1997) Nat. Genet. 15, 293-297
61. Egwuagu, C. E., Charukamnoetkanok, P., and Gery, I. (1997) J. Immunol. 159, 3109-3112
62. Sospedra, M., Ferrer-Francesch, X., Dominguez, O., Juan, M., Foz-Sala, M., and Pujol-Borrell, R. (1998) J. Immunol. 161, 5918-5929
63. Smith, K. M., Olson, D. C., Hirose, R., and Hanahan, D. (1997) Int. Immunol. 9, 1355-1365
64. Vafiadis, P., Bennett, S. T., Todd, J. A., Nadeau, J., Grabs, R., Goodyer, C. G., Wickramasinghe, S., Colle, E., and Polychronakos, C. (1997) Nat. Genet. 15, 289-292
65. Werdelin, O., Cordes, U., and Jensen, T. (1998) Scand. J. Immunol. 47, 95-100
66. Pugliese, A., Brown, D., Garza, D., Murchison, D., Zeller, M., Redondo, M., Diez, J., Eisenbarth, G. S., Patel, D. D., and Ricordi, C. (2001) J. Clin. Invest. 107, 555-564
67. Pietropaolo, M., Giannoukakis, N., and Trucco, M. (2002) Nat. Immunol. 3, 335


Copyright © 2003 by The American Society for Biochemistry and Molecular Biology, Inc.