(Received for publication, December 12, 1996)
From the Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111
Our analysis of cDNA and genomic clones unexpectedly revealed that the chicken gata-5 gene is differentially expressed from alternative first exons. Moreover, we show that the respective transcripts are differentially processed to yield mRNAs for two distinct isoforms of GATA-5. The major isoform, which we described previously, has two CXNCX17CNXC zinc fingers typical of a vertebrate GATA factor. The minor isoform, on the other hand, has only one such zinc finger. We show that this novel isoform localizes within the nuclei of transfected cells and can bind to a consensus GATA site. This truncated isoform of GATA-5 is compromised in its ability to transactivate a simple target gene, however, and thus is functionally distinct from the major isoform of GATA-5.
The identification of cis-acting elements in erythroid-specific promoters and enhancers that conform to the consensus WGATAR sequence led to the cloning of GATA-1, which proved to be the founding member of a family of transcription factors (1, 2). Five additional members of this GATA factor family have thus far been identified from various vertebrate species using low stringency hybridization screens and other methods (3-6). These vertebrate GATA factors display the greatest degree of conservation over their DNA-binding domains, which are composed of two related zinc fingers of the general form CXNCX17CNXC. Factors with zinc fingers that conform to this consensus motif have also been found in lower eukaryotes. However, in contrast to the previously reported vertebrate GATA factors, some of the invertebrate GATA factors have only one zinc finger (7-10). These single fingers are most similar to the second zinc fingers of vertebrate GATA factors, and accordingly, the latter have been shown to be necessary and sufficient for most of the DNA binding specificity of vertebrate GATA factors (11, 12).
Vertebrate GATA factors can be grouped into two subfamilies on the basis of sequence and expression pattern similarities. For example, GATA-1/2/3 are all expressed (albeit not exclusively) in hematopoietic cells, whereas GATA-4/5/6 are all expressed (albeit not exclusively) in the heart. Insights into the biological relevance of GATA factor expression have recently been obtained using a variety of approaches, including gene disruption experiments in mice. In particular, mice that fail to express GATA-1 or GATA-2 exhibit lethal hematopoietic defects (13, 14), whereas mice that fail to express GATA-3 display severe nervous system and liver hematopoietic defects (15). Similarly, it has been determined that GATA-4 expression is required for embryonic stem cells to differentiate either into cardiac myocytes or into primitive endoderm in cell culture (16, 17).
As noted above, the gata-4/5/6 genes have overlapping, but not identical, expression patterns. Whereas all three of these genes are expressed in myocardial and endocardial cells, the gata-5/6 genes are also robustly expressed in gut epithelial cells, and the gata-6 gene is additionally expressed in the liver, lung, and ovary (6). As an initial step toward determining the molecular basis for the tissue-specific expression of GATA-5 in the heart and gut, we cloned and characterized the chicken gata-5 gene. Somewhat surprisingly, the structure of this gene was found to differ markedly from the conserved structure of the gata-1/2/3 genes, which highlights yet another distinction between these two subgroups of vertebrate GATA factors.
Even more surprisingly, we found that the chicken gata-5 gene is differentially expressed from two alternative first exons. Moreover, the respective transcripts yield mRNAs for two distinct isoforms of GATA-5. One of these isoforms is novel for a vertebrate GATA factor in that it has a DNA-binding domain composed of a single zinc finger. The functional properties of this novel GATA-5 isoform were assayed in some detail and are compared with the properties of the major isoform of GATA-5 that we previously described (6).
Two gata-5 genomic clones were isolated by screening a chicken genomic library with a GATA-5 cDNA probe (6) using standard protocols (18). Fragments harboring gata-5 coding exons were identified by Southern blot analysis, cloned into plasmids, and sequenced with primers directed against GATA-5 cDNA sequences.
RNase Mapping Experiments
A tissue lysate RNase protection kit (Amersham Corp.) was used to map gata-5 exons 1b and 2. The riboprobes that were used for this analysis spanned either a genomic SmaI fragment (exon 1b; see Fig. 4) or a genomic BamHI/NcoI fragment (exon 2; see Fig. 3). These riboprobes were prepared using commercial reagents (Promega).
PCR Protocols
Poly(A)+ mRNA was isolated from various tissues as described previously (6). Anchor and nested antisense primers for 5′-RACE assays were directed against unique sequences located within the 5′-untranslated region of GATA-5 cDNA: anchor primer, 5′-GTCCTGGGCACGTAGACG-3′; and nested primer, 5′-GATACATGTTCCGTCCTCG-3′. These primers were used in conjunction with generic sense primers from a 5′-RACE kit (CLONTECH). The PCR products were cloned into the pCRII plasmid (Invitrogen), and the inserts were sequenced using SP6 and T7 primers and Sequenase reagents (U. S. Biochemical Corp.).
Poly(A)+ mRNAs from various tissues (see above) were used to prepare cDNAs as described previously (19). These cDNAs were then used as templates in RT-PCR assays. The sense primers specific for exons 1a and 1b and the antisense primer for the 3′-untranslated region of the GATA-5 cDNA were as follows: exon 1a sense primer, 5′-AATTGCCACCCTCCCGACG-3′; exon 1b sense primer, 5′-CATGGTCTGAGCGCAGC-3′; and antisense primer, 5′-GGGATGCGTTTATTTGCT-3′. Forty PCR cycles (1 min at 95 °C, 1 min at 58 °C, and 1.5 min at 72 °C) were followed by a 4-min incubation at 72 °C. The PCR buffer (Perkin-Elmer) was supplemented with 4% dimethyl sulfoxide. The resultant PCR products were cloned into the pCRII plasmid and sequenced using SP6 and T7 primers and Sequenase reagents.
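As a quick sanity check on the five primers listed above, their lengths and GC contents can be tabulated. The Python sketch below does only that; it is not the procedure used to design these primers, and GC content is just one of several properties (melting temperature, self-complementarity, specificity) that would normally be examined. The dictionary labels are hypothetical names introduced for illustration.

```python
# Primer sequences copied from the PCR protocols above (all written 5'->3').
# The dictionary keys are illustrative labels, not names used by the authors.
PRIMERS = {
    "5'-RACE anchor (antisense)": "GTCCTGGGCACGTAGACG",
    "5'-RACE nested (antisense)": "GATACATGTTCCGTCCTCG",
    "exon 1a sense": "AATTGCCACCCTCCCGACG",
    "exon 1b sense": "CATGGTCTGAGCGCAGC",
    "3'-UTR antisense": "GGGATGCGTTTATTTGCT",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Print a small table: primer label, length, and GC content.
for name, seq in PRIMERS.items():
    print(f"{name:28s} {len(seq):3d} nt   GC = {gc_fraction(seq):.0%}")
```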
The RT-PCR products described in the preceding paragraph were shuttled into a cytomegalovirus-driven eukaryotic expression vector (pcDNA3, Invitrogen) for cotransfection assays. Recombinant plasmids with inserts in the correct orientation were sequenced to verify that the inserts were devoid of mutations.
The reporter plasmid for cotransfection assays (see below) was made by inserting a consensus WGATAR binding site (12) into the unique BamHI site that resides immediately upstream of the minimal liver/bone/kidney alkaline phosphatase promoter in the pTATA/CAT reporter plasmid (20). The consensus binding site that was cloned into this reporter plasmid was also used as a probe for the gel shift assays described below.
Transfection Assays
COS-7 cells (American Type Culture Collection) were grown to 70-90% confluency on 60-mm dishes in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum and were cotransfected with varying amounts of expression and reporter plasmids (see Fig. 9) using 12 µg of Lipofectamine (Life Technologies, Inc.). Standard protocols were used to make extracts and to assay protein concentrations and chloramphenicol acetyltransferase activities (21). The amounts of radiolabeled substrate and product were quantified using an AMBIS radioanalytic imaging system.
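Once substrate and product have been quantified, CAT activity is commonly expressed as the percentage of radiolabeled chloramphenicol converted to acetylated forms, and transactivation as the ratio of that percentage to a baseline. The sketch below assumes the baseline is the reporter cotransfected with an empty expression vector and that extracts are compared at equal protein input; the function names and the counts in the usage lines are hypothetical placeholders, not data from Fig. 9.

```python
def percent_acetylated(product_counts: float, substrate_counts: float) -> float:
    """Percent of radiolabeled chloramphenicol converted to acetylated product."""
    total = product_counts + substrate_counts
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * product_counts / total


def fold_induction(sample_pct: float, baseline_pct: float) -> float:
    """Activity relative to a baseline (assumed: reporter plus empty vector)."""
    return sample_pct / baseline_pct


# Placeholder counts for illustration only; these are not measured values.
baseline = percent_acetylated(product_counts=200, substrate_counts=19_800)
activated = percent_acetylated(product_counts=2_000, substrate_counts=18_000)
print(baseline, activated, fold_induction(activated, baseline))
```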
Gel Shift Assays
Nuclear extracts were prepared from transfected COS-7 cells (see above) essentially as described previously (22) except that leupeptin (1 µg/ml) and aprotinin (1 µg/ml) were added to all buffers. These extracts were used to program gel shift assays in combination with a consensus GATA binding site probe (12) that was prepared by annealing the following pair of oligonucleotides: 5′-GATCTGCGGATAAAAGGCCGGAATTCG-3′ and 5′-GATCCGAATTCCGGCCTTTTATCCGCA-3′. The mutated site that was used as a competitor fragment to demonstrate binding site specificity (see Fig. 8) differs from the consensus site in that GAT was changed to CCC and ATC was changed to GGG. The gel shift binding buffer contained 0.1% bovine serum albumin, 25 mM KCl, 10 mM Tris (pH 8.0), 1 mM EDTA, 1 mM dithiothreitol, and 4 µg of poly(dI·dC) per 30-µl reaction. Binding reactions were carried out at 4 °C for 30 min. The samples were then supplemented with loading buffer and resolved on a 6% acrylamide gel cast in 0.25 × Tris borate/EDTA (19). The buffer was circulated manually at 30-min intervals. Following electrophoresis, the gels were dried under vacuum and then exposed to X-Omat film (Eastman Kodak Co.) at −70 °C between intensifying screens.
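The two probe oligonucleotides are designed to anneal into a duplex with 4-nt 5′-GATC overhangs on both ends, which are compatible with a BamHI-cut site such as the one used to build the reporter plasmid. The Python sketch below checks that design and shows one way a matching mutant duplex could be generated. The position of the mutated bases is an assumption (the GAT of the GGATAA core on the top strand); the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(top: str, bottom: str, k: int = 4) -> bool:
    """True if equal-length strands pair over all but a k-nt 5' overhang on each."""
    return len(top) == len(bottom) and top[k:] == revcomp(bottom[k:])


def mutate_duplex(top: str, bottom: str, start: int, new: str, k: int = 4):
    """Replace top[start:start+len(new)] and the paired bottom-strand bases.

    Assumes the overhang geometry checked above and that the edited span lies
    inside the base-paired region (start >= k).
    """
    n = len(new)
    b_start = len(top) + k - start - n  # bottom index paired with top[start+n-1]
    new_top = top[:start] + new + top[start + n:]
    new_bottom = bottom[:b_start] + revcomp(new) + bottom[b_start + n:]
    return new_top, new_bottom


TOP = "GATCTGCGGATAAAAGGCCGGAATTCG"
BOTTOM = "GATCCGAATTCCGGCCTTTTATCCGCA"
print(anneals_with_overhangs(TOP, BOTTOM), TOP[:4], BOTTOM[:4])

# Assumed mutation site: the GAT of the GGATAA core on the top strand.
mut_top, mut_bottom = mutate_duplex(TOP, BOTTOM, TOP.find("GATA"), "CCC")
print(mut_top, mut_bottom, anneals_with_overhangs(mut_top, mut_bottom))
```

Under this assumption, changing GAT to CCC on the top strand automatically changes the paired ATC to GGG on the bottom strand, so the mutant oligonucleotides still anneal with the same overhangs.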
We previously sequenced a cDNA clone that spans the chicken GATA-5 open reading frame and includes 64 and 168 bp of 5′- and 3′-untranslated sequences, respectively (6). This cDNA insert was used as a probe to isolate two overlapping gata-5 genomic phage clones. Fragments containing gata-5 coding exons were mapped using Southern blots, cloned into plasmids, and sequenced with gata-5-specific primers. The GATA-5 open reading frame was thus revealed to span six exons (Figs. 1 and 2). As expected, consensus splice donor and acceptor sites flank each of the coding exons (i.e. exons 2-7; the noncoding first exons are discussed below) (Fig. 2).
We next used an RNase protection assay to map the upstream end of the first coding exon. As shown in Fig. 3, an 89-nucleotide fragment was protected when this assay was programmed with RNA from adult heart or adult gut (which express GATA-5). No protected fragments were obtained when brain or skeletal muscle (which do not express GATA-5) was instead used as the source of RNA. This exon boundary mapped precisely to a consensus splice acceptor site (denoted site 3 in Fig. 2). Together with the fact that several other gata genes have been shown to have noncoding first exons (14, 23, 24), this finding suggested that the chicken gata-5 gene might similarly contain one or more noncoding first exons.
Evidence for Two Distinct Noncoding GATA-5 First Exons
Since exhaustive screens of several cDNA libraries failed to provide evidence for such a noncoding first exon, we resorted to a directed RACE/PCR analysis of embryonic heart and adult gut mRNAs (two tissues in which GATA-5 is expressed robustly). Unexpectedly, two distinct cDNA sequences were found to lie immediately upstream of the presumptive second exon sequence. We determined that the genomic copies of these two novel sequences were located 3.5 and 1.5 kilobases, respectively, upstream of the common second exon and that these sequences were flanked by consensus splice donor sites (see sites 1 and 2, respectively, in Fig. 2). These results indicate that the gata-5 gene has two alternative (presumably first) exons, which we will refer to henceforth as exons 1a and 1b.
The RNase protection assay shown in Fig. 4 further revealed that exon 1b is 270-285 bp in length. We infer that this is a first exon for two reasons. First, the predominant 5′-ends map to sequences that are typical of RNA polymerase II transcriptional start sites (25, 26), namely, purines embedded within pyrimidine-rich tracts (Fig. 5). Second, no consensus splice acceptor sites map in the vicinity of these 5′-ends (Fig. 5). Although primer extension assays are often used to confirm the assignment of transcriptional start sites, we have been unable to synthesize cDNA copies of this sequence, even from in vitro transcribed templates. We presume that this technical limitation is attributable to the extremely high GC content of this exon.
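The start-site criterion described above (a purine embedded within a pyrimidine-rich tract) can be expressed as a simple scan. The Python sketch below is a crude, hypothetical heuristic for illustration only: the flank width and pyrimidine threshold are arbitrary choices, it is not a validated promoter or initiator predictor, and the test sequence is made up because the exon 1b sequence (Fig. 5) is not reproduced here.

```python
PURINES = ("A", "G")
PYRIMIDINES = ("C", "T")


def pyrimidine_fraction(window: str) -> float:
    """Fraction of C/T bases in a non-empty window."""
    return sum(base in PYRIMIDINES for base in window) / len(window)


def purines_in_pyrimidine_tracts(seq: str, flank: int = 4, min_frac: float = 0.75):
    """Indices of purines whose `flank` bases on each side are mostly pyrimidines.

    Purines closer than `flank` bases to either end are skipped; flank must be >= 1.
    """
    seq = seq.upper()
    hits = []
    for i in range(flank, len(seq) - flank):
        if seq[i] not in PURINES:
            continue
        left = seq[i - flank:i]
        right = seq[i + 1:i + 1 + flank]
        if pyrimidine_fraction(left) >= min_frac and pyrimidine_fraction(right) >= min_frac:
            hits.append(i)
    return hits


# Made-up sequence: a single A embedded in a C/T-rich stretch.
print(purines_in_pyrimidine_tracts("TTCCTCACTCTTC"))
```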
Since RNase protection and primer extension assays designed to detect exon 1a-containing transcripts in embryonic heart yielded negative results (data not shown), we infer that these transcripts are relatively rare in this tissue. Consistent with this inference, we note that only 5 of the 44 RACE cDNA clones that were obtained using an antisense primer from the second exon (described above) contained exon 1a sequences; all of the others contained exon 1b sequences (data not shown). However, as discussed below, we have been able to deduce that exon 1a is at least 256 bp in length.
Differential Promoter Usage and Alternative Splicing
The fact that exon 1a RACE cDNA clones were obtained from embryonic heart (but not from adult heart or gut) suggested that this first exon might be expressed in a development-specific or tissue-specific manner. We explored this possibility by carrying out RT-PCR assays with sense oligonucleotides specific for either exon 1a or 1b in combination with a common antisense oligonucleotide specific for the 3′-untranslated region of GATA-5 mRNA (Fig. 6). The cDNA templates that were used for this analysis were derived from embryonic (day 10) heart, adult heart, and adult skeletal muscle. As predicted, exon 1b-containing RT-PCR products of the expected size and sequence (1516 bp; data not shown) were obtained using heart (both embryonic and adult), but not skeletal muscle, cDNA templates. Exon 1a-containing transcripts, in contrast, were detected in embryonic (but not adult) heart, and the predominant exon 1a RT-PCR product (indicated by the lower arrow in Fig. 6) was smaller than expected (943 bp instead of 1540 bp). An analysis of this 943-bp RT-PCR product revealed that exon 1a was precisely juxtaposed to exon 3 rather than to exon 2 (Fig. 7); all of the other exons were spliced normally (data not shown). We also sequenced the minor (1540-bp) exon 1a-specific RT-PCR product and verified that it included the exon 2 sequence as expected (data not shown). By carrying out similar RT-PCR assays with primers that map upstream of the exon 1a primer used for the analysis presented in Fig. 6, we determined that exon 1a is at least 256 bp in length and contains termination codons in all three reading frames (Fig. 7).
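Two small calculations follow from this paragraph. First, whether a sequence "contains termination codons in all three reading frames" can be checked by scanning each frame for TAA, TAG, or TGA. Second, if the 943-bp and 1540-bp exon 1a products differ only by the presence of exon 2 (as the sequencing indicates), their size difference gives an inferred exon 2 length. The Python sketch below shows both; the test sequence is made up, since the exon 1a sequence (Fig. 7) is not reproduced here, and the exon 2 figure is an arithmetic inference rather than a measured length.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def frames_with_stop(seq: str):
    """Return the reading frames (0, 1, 2) of this strand that contain a stop codon."""
    seq = seq.upper()
    return [
        frame
        for frame in range(3)
        if any(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))
    ]


# Sizes from the RT-PCR analysis: skipping exon 2 shortens the exon 1a
# product from the expected 1540 bp to the observed 943 bp.
EXPECTED_1A_BP, OBSERVED_1A_BP = 1540, 943
inferred_exon2_bp = EXPECTED_1A_BP - OBSERVED_1A_BP

# Made-up sequence with a stop codon in each of the three frames.
print(frames_with_stop("TAAGTAGCTGA"), inferred_exon2_bp)
```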
Characterization of a Single-zinc Finger Isoform of GATA-5
The predominant splicing pathway for the exon 1a-containing transcripts yields mRNAs that lack the previously reported translational initiation site for GATA-5. Based on Kozak rules (27), translation initiation is predicted to occur at the first methionine codon that is embedded in a favorable sequence context, which, in the case of this novel GATA-5 mRNA, lies within the exon 3 sequence (Fig. 7). Indeed, this ATG codon functions as an efficient translational initiation site in vitro as predicted (data not shown). Since this methionine residue is located within the coding region for the first zinc finger, the resultant GATA-5 isoform contains only one (i.e. the second) zinc finger. This raised three obvious questions. First, can the predicted single-zinc finger isoform of GATA-5 localize to the nuclei of transfected cells? Second, can this truncated isoform bind specifically to a consensus GATA site? And third, can this novel isoform transactivate a simple GATA-dependent target gene?
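The Kozak-based prediction above can be sketched as a scan for the first ATG whose context is favorable. The Python example below uses a simplified rule built on the two positions most often cited as key determinants of Kozak's consensus: a purine at −3 and a G at +4 (numbering the A of the ATG as +1). Treating either feature alone as "adequate" is an assumption of this sketch, as are the function names; the test sequence is made up rather than taken from Fig. 7.

```python
def kozak_context(seq: str, i: int) -> str:
    """Classify the context of the ATG whose A is at index i (position +1)."""
    seq = seq.upper()
    purine_minus3 = i >= 3 and seq[i - 3] in ("A", "G")
    g_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


def first_favorable_atg(seq: str, accept=("strong", "adequate")) -> int:
    """Index of the first ATG in an accepted context, or -1 if there is none."""
    seq = seq.upper()
    i = seq.find("ATG")
    while i != -1:
        if kozak_context(seq, i) in accept:
            return i
        i = seq.find("ATG", i + 1)
    return -1


# Made-up mRNA fragment: the first ATG is in a weak context, the second is strong.
demo = "CCCATGCCCACCATGG"
print(kozak_context(demo, 3), first_favorable_atg(demo))
```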
To address these questions and to compare the properties of the full-length and truncated GATA-5 isoforms, we cloned RT-PCR products that span the respective open reading frames (Fig. 6) into a eukaryotic expression plasmid and transfected these plasmids into COS-7 cells, which do not express endogenous GATA factors. Nuclear extracts from these transfected cells were used to program the gel shift assay shown in Fig. 8. This analysis revealed that both isoforms can be stably expressed in vivo and that both isoforms can bind a consensus GATA site in vitro (lanes 1 and 6, respectively). These protein-DNA interactions are sequence-specific since an excess of the unlabeled consensus site competed for binding (lanes 3, 4, and 8), whereas a similar excess of a mutated site did not (lanes 2 and 7). The distinct mobilities of these complexes are consistent with the fact that the full-length isoform is 391 amino acids long, whereas the truncated isoform is only 190 amino acids long.
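For a rough sense of scale, chain length can be converted to an approximate mass with the common rule of thumb of about 110 Da per amino acid residue. The Python sketch below applies it to the two isoform lengths given above. This is only an order-of-magnitude comparison: the rule-of-thumb constant is an assumption, and gel shift mobility depends on the whole protein-DNA complex (including shape and charge), not on protein mass alone.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass of an amino acid residue


def approx_mass_kda(n_residues: int) -> float:
    """Very rough protein mass estimate (kDa) from chain length alone."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


for name, length in {"full-length GATA-5": 391, "truncated GATA-5": 190}.items():
    print(f"{name}: {length} aa, ~{approx_mass_kda(length):.0f} kDa")
```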
We next addressed whether this truncated GATA-5 isoform can function as a transcriptional activator. Expression vectors for the two isoforms of GATA-5 (see above) were cotransfected into COS-7 cells along with a reporter plasmid that has a consensus GATA site in the promoter region (12). The results of this analysis are presented in Fig. 9. As expected, the full-length isoform of GATA-5 was able to transactivate this GATA-dependent reporter plasmid. Note that the fold induction decreased when an excess of this full-length isoform was expressed, presumably due to squelching. The truncated isoform was also able to transactivate this reporter construct, albeit much less efficiently than the full-length isoform.
The six GATA factors that have been identified from vertebrate species can be grouped into two distinct subfamilies (i.e. GATA-1/2/3 and GATA-4/5/6) on the basis of cDNA sequence comparisons as well as expression profile comparisons. Thus, it is perhaps not surprising to find that a member of the GATA-4/5/6 subfamily has a gene structure that is distinct from the conserved gata-1/2/3 gene structure. On the other hand, the extent to which these gene structures differ is rather remarkable. Indeed, only two features are conserved across these two subfamilies, i.e. noncoding first exons and comparable second zinc finger exons (Fig. 10). Assuming that the two GATA subfamilies were founded by the duplication of an ancestral gene, the fact that the gata-4/5/6 gene structures are similar to each other but distinct from the gata-1/2/3 gene structures implies that a total of three introns must have been lost or gained from the gata-1/2/3 ancestral gene and/or from the gata-4/5/6 ancestral gene prior to the expansions of the respective subfamilies. Thus, the ancestral gata-1/2/3 and/or gata-4/5/6 genes presumably existed for a long period of evolution before each spawned multiple progeny.
Two of the introns that are unique to the gata-4/5/6 gene subfamily (introns 5 and 6; see Figs. 2 and 10) appear to coincide with the boundaries of functional domains. For example, based on structural studies carried out with GATA-1 (28), we infer that intron 5 maps precisely to the carboxyl-terminal end of the minimal DNA-binding domain of the second zinc finger of GATA-5. It is also interesting that introns 5 and 6 delimit a domain that is rich in PEST residues (66% of the residues in exon 6 are Pro, Glu, Ser, Thr, or Asp), which suggests that this domain may be a determinant of GATA-5 instability (29). In support of this conjecture, we note that the PEST-rich amino acid composition (but not primary sequence) for this exon is conserved within the GATA-4/5/6 subfamily.
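The composition figure quoted above (the percentage of residues that are Pro, Glu, Ser, Thr, or Asp) is a simple count. The Python sketch below computes it for a one-letter peptide string. It measures composition only and is not the full PEST-region scoring procedure described in the PEST literature; the example peptide is made up, since the exon 6 sequence is not reproduced here.

```python
PEST_RESIDUES = set("PESTD")  # Pro, Glu, Ser, Thr, Asp, as listed in the text


def pest_fraction(peptide: str) -> float:
    """Fraction of residues in a one-letter peptide string that are P, E, S, T, or D."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    return sum(aa in PEST_RESIDUES for aa in peptide) / len(peptide)


# Made-up peptide: 5 of its 8 residues are P, E, S, T, or D.
print(f"{pest_fraction('PESTDAAA'):.1%}")
```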
As noted above, noncoding first exons are a common feature of vertebrate gata genes. Moreover, alternative first exons have been identified for both the mouse gata-1 gene (23) and the chicken gata-5 gene (this report). The gata-1 gene first exons are differentially transcribed in erythroid cells and in the testis. Since these alternative noncoding exons are each spliced to a common second exon, the same GATA-1 protein is encoded in both cell types. In the case of the gata-5 gene, however, transcripts that include the distal first exon are preferentially spliced to the third exon, which results in an mRNA that encodes a novel single-zinc finger isoform of GATA-5. So far as we are aware, this is the first evidence of a single-zinc finger GATA factor being encoded in a vertebrate species. On the other hand, a novel (albeit two-zinc finger) GATA-1 isoform has been reported to result from the use of an internal translational initiation site (30).
Since mutational studies have revealed that the second zinc fingers of other vertebrate GATA factors are necessary and sufficient for binding to consensus GATA sites (12, 31), it is not surprising that the truncated isoform of GATA-5 can also bind to these sites. However, since the DNA binding specificities of normal and mutant (single-zinc finger) GATA factors are not identical, we presume that the two naturally occurring GATA-5 isoforms will similarly be found to have somewhat distinct binding specificities. We are in the process of using a site selection protocol to test this prediction.
Based on the results of cotransfection assays (Fig. 9) and an in situ assay of epitope-tagged GATA-5 isoforms (data not shown), we infer that the truncated isoform has a nuclear localization signal. Whereas GATA-1 and GATA-3 appear to have multiple nuclear localization signals (12, 31, 32), the short stretches of basic amino acids that resemble consensus nuclear localization signals are not conserved between these GATA factors and GATA-4/5/6. For example, the RPKKR and KGKKK motifs that flank the second zinc finger of GATA-3 are replaced by KPQKR and KGKTS, respectively, in GATA-5. Conversely, a presumptive nuclear localization signal for GATA-5 (RKRKPK; located in the carboxyl-terminal region of the second zinc finger) is not conserved for GATA-1/2/3. Furthermore, based on structural studies (28), we infer that this RKRKPK motif probably also functions as an essential determinant for binding to consensus WGATAR sites.
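The motif comparison above can be illustrated with a simple sliding-window count of basic residues. The Python sketch below flags any 5-residue window containing at least four Lys or Arg residues; this window size and threshold are arbitrary, hypothetical choices, not an established nuclear localization signal predictor, and real signals depend on flanking sequence and structural context. Applied to the motifs named in the text, it flags the GATA-3 motifs and the presumptive GATA-5 signal, but not the GATA-5 motifs that replace the GATA-3 ones.

```python
BASIC = ("K", "R")


def basic_clusters(peptide: str, window: int = 5, min_basic: int = 4):
    """Start indices of windows containing at least `min_basic` K/R residues."""
    peptide = peptide.upper()
    return [
        i
        for i in range(len(peptide) - window + 1)
        if sum(aa in BASIC for aa in peptide[i:i + window]) >= min_basic
    ]


# Motifs quoted in the text (GATA-3: RPKKR, KGKKK; GATA-5: KPQKR, KGKTS, RKRKPK).
for motif in ["RPKKR", "KGKKK", "KPQKR", "KGKTS", "RKRKPK"]:
    print(motif, bool(basic_clusters(motif)))
```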
We have shown that the single-zinc finger isoform of GATA-5 is compromised with respect to its ability to transactivate a simple reporter construct. Whereas single-zinc finger mutants of GATA-1 are also compromised with respect to their ability to transactivate simple target genes in cotransfection assays, these mutant factors can still cause early myeloid cells to differentiate into megakaryocytes in cell culture (32). Similarly, a single-zinc finger GATA factor from Aspergillus nidulans can rescue erythroid differentiation in GATA-1-deficient embryonic stem cells (33). Thus, we presume that the single-zinc finger GATA-5 isoform can regulate critical subsets of GATA target genes in the tissues in which it is expressed.
Finally, it may be noteworthy that the mRNAs for both isoforms of GATA-5 contain short open reading frames in their 5′-untranslated regions (Figs. 5 and 7). Based on the ribosome scanning model (27), these short open reading frames would be expected to impair the efficiency of translation initiation at the downstream (GATA-5) open reading frames. This may provide yet another level of regulation for the differential expression of these GATA-5 isoforms (34, 35). On the other hand, since these upstream open reading frames were included in the respective GATA-5 expression vectors (Figs. 8 and 9), it is clear that they do not preclude expression of these isoforms.
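Short upstream open reading frames of the kind discussed above can be located by scanning a 5′-untranslated region for ATG codons followed, in the same frame, by a stop codon. The Python sketch below reports only upstream open reading frames that terminate within the supplied sequence; open reading frames that run into or overlap the main coding region are not reported, and the function name, minimum length, and test sequence are hypothetical rather than taken from Figs. 5 and 7.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def upstream_orfs(utr: str, min_codons: int = 1):
    """(start, end) pairs for ATG-initiated ORFs that stop within `utr`.

    `end` is exclusive and includes the stop codon; the codon count used for
    `min_codons` includes the ATG but not the stop codon.
    """
    utr = utr.upper()
    orfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        for j in range(start + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOP_CODONS:
                if (j - start) // 3 >= min_codons:
                    orfs.append((start, j + 3))
                break  # stop at the first in-frame stop codon
    return orfs


# Made-up 5'-UTR fragment containing ATG AAA TAG.
print(upstream_orfs("GCATGAAATAGCC"))
```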
We thank Randy Strich, Leonard Cohen, and members of our laboratory for critically reading the manuscript; Jon Chernoff and Maryann Sells for advice on nuclear localization studies; Robert Muhlhauser (Oligonucleotide Synthesis Facility, Fox Chase Cancer Center) for oligonucleotides; and Vicki Sayer (Secretarial Services, Fox Chase Cancer Center) for help in preparing the manuscript.