(Received for publication, January 17, 1995; and in revised form, June 29, 1995)
GATA transcription factors are DNA-binding proteins that recognize the core consensus sequence, WGATAR. Previous studies indicated that GATA factors play an important role in the development of tissue-specific functions in vertebrates. Here we report the identification of a new Drosophila melanogaster GATA factor, dGATAc, which displays a distinct expression pattern in embryos. The local concentration of dGATAc transcripts varies at different stages, being most prominent in the procephalic region at stages 6-10 and in the posterior spiracles, the gut, and the central nervous system at stages 11-13. On the basis of its predicted sequence, DNA-binding assays were performed to confirm that the dGATAc gene encodes a zinc finger protein that can bind the GATA consensus motif with predicted specificity. Two independent mutants carrying a P-element insertion at the dGATAc gene promoter region were identified that are homozygous lethal at the embryonic stage. Using a genetic scheme, it was demonstrated that the lack of dGATAc function can block normal embryonic development. Our results suggest that the dGATAc protein is a tissue-specific transcription factor that is vital to the development of multiple organ systems in D. melanogaster.
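The WGATAR core consensus is written in IUPAC nucleotide codes, in which W stands for A or T and R stands for A or G. As a minimal sketch that is not part of the original study, the following Python function scans a DNA string for this motif on both strands. The function name and the two short test sequences are hypothetical and serve only as an illustration.

```python
import re

# WGATAR on the forward strand: W = A or T, R = A or G.
FORWARD = re.compile(r"[AT]GATA[AG]")
# Reverse complement of WGATAR is YTATCW: Y = C or T, W = A or T.
REVERSE = re.compile(r"[CT]TATC[AT]")

def find_wgatar(seq):
    """Return sorted (position, strand, matched_text) tuples, 0-based positions."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group()) for m in REVERSE.finditer(seq)]
    return sorted(hits)

# Synthetic sequences made up for illustration only.
forward_hits = find_wgatar("CCAGATAGCC")
reverse_hits = find_wgatar("TTCTATCAGG")
```

Positions for reverse-strand hits are reported in forward-strand coordinates, which keeps the two lists directly comparable.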
The GATA-1 transcription factor was originally identified as a DNA-binding protein that recognizes regulatory elements in the promoter and enhancer regions of several erythroid-expressed genes (1, 2, 3, 4, 5). Using cloned mouse GATA-1 (4) as a probe, Yamamoto et al. (6) isolated from a chicken cDNA library three different GATA factor genes that encode amino acid sequences homologous to the mouse GATA-1 DNA binding domain. Although these genes were shown to be expressed in a tissue-specific manner, their protein products apparently share similar DNA binding specificities (6). Subsequently, complementary DNA clones corresponding to these three major GATA factors were identified for humans (7, 8, 9, 10, 11, 12, 13, 14), mice (12), and Xenopus laevis (15). At the 1990 Globin Switching Conference (Airlie House, Virginia), different members of the GATA gene family were named GATA-1, GATA-2, and GATA-3, and a prefix was given to denote their species of origin (16). Recently, a new member, GATA-4, was added to the vertebrate GATA gene family (17, 18).
Except for mouse GATA-1 (mGATA-1) and GATA-2 (mGATA-2), which have been shown clearly to play critical roles in controlling erythroid differentiation in the hematopoietic system (for review, see Ref. 19), the exact functions of other mammalian GATA factors remain largely unknown. Because only certain tissues express mRNA for the individual GATA factor genes, and because tissue-specific functions are regulated by selective GATA factors, it is generally accepted that GATA proteins, as transcription factors, are crucial for both the initial decision and subsequent development of lineage-specific functions. As shown by gene-targeting experiments with the mGATA-1 and mGATA-2 genes (20, 21), disruption of their normal function leads to the failure of blood cell development.
The GATA finger domain is not limited to vertebrates; transcription factors bearing similar amino acid sequences were reported for Aspergillus nidulans (22), Neurospora crassa (23), Saccharomyces cerevisiae (24), and Caenorhabditis elegans (25). In these species, proteins were identified that are highly conserved in the so-called "finger domain," which contains a characteristic amino acid sequence: Cys-X-Asn-Cys-(X)17-Cys-Asn-Ala-Cys. Recently, three groups (26, 27, 28) have reported the isolation of genomic and cDNA clones encoding the GATA factors in Drosophila melanogaster.
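The characteristic finger sequence can be expressed as a pattern for scanning predicted protein sequences. The sketch below is illustrative only: it assumes a single residue at the first X position and a 17-residue spacer at the second, as is typical of GATA-type fingers, and the function name and the synthetic test sequence are hypothetical.

```python
import re

# Assumed spacing: one residue at the first X, 17 residues at the second X.
GATA_FINGER = re.compile(r"C.NC.{17}CNAC")

def find_gata_fingers(protein):
    """Return (start, matched_text) for each finger-like motif, 0-based start."""
    return [(m.start(), m.group()) for m in GATA_FINGER.finditer(protein.upper())]

# Synthetic sequence built for illustration only (not a real protein).
synthetic = "MK" + "CTNC" + "Q" * 17 + "CNAC" + "GL"
hits = find_gata_fingers(synthetic)
```

A pattern this strict will miss fingers with a different spacer length, so it is better suited to a first-pass screen than to a definitive classification.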
To extend the search for evolutionarily related GATA factors to invertebrates, and to study the function of these factors in a model organism, we set out to isolate homologous sequences encoding Drosophila GATA factors. Initially, we took advantage of the conserved nature of the GATA zinc finger domain and used the GATA finger sequence as a probe to isolate two different Drosophila GATA genes (originally named dGATA-I and dGATA-II). Subsequently, using a cloned dGATA cDNA fragment as a probe, we analyzed the RNA distribution in developing embryos. One of the isolated genes, dGATA-I, is identical to the previously reported dGATAa/pannier (27, 28). In contrast, dGATA-II is unique and was found to be expressed specifically in the head region, the gut, the posterior spiracles, and the central nervous system. Following the nomenclature used in previous work (26, 27) for the Drosophila GATA factors, we hereafter refer to dGATA-II as dGATAc.
Based on sequence comparison of the Drosophila GATA factors, the analysis of the dGATAc gene expression, and the genetic studies of the dGATAc mutants, we conclude that 1) GATA factor genes are also present in multiple forms in invertebrate animals; 2) the expression of the dGATAc gene is limited to defined tissues, although it can participate in the development of multiple organ systems; and 3) the function of dGATAc protein is essential to the development of Drosophila embryos.
A 102-bp genomic sequence from the positive clone was labeled by radioactive PCR to screen cDNA libraries made from either Drosophila embryos or I and II instar larvae (30). In summary, 2 × 10^[…] clones were screened at a density of 40,000 plaques/150-mm Petri dish. Hybridization was carried out with 5 × SSPE, 5 × Denhardt's solution, 0.1% SDS, 100 µg/ml heat-denatured salmon sperm DNA, and 1 × 10^[…] cpm/ml probe at 65 °C for 16 h. Stringent washing was done with 0.1 × SSC and 0.1% SDS at 65 °C for 2 h. Autoradiography was carried out using Kodak XAR film with one intensifying screen for 16 h.
Screening for Drosophila GATA genes was carried out with cosmid libraries using a reference system developed by Hoheisel et al. (31). A Drosophila genomic library constructed in the bacteriophage lambda Charon 4A vector (32) was also used. Hybridization, washing, and autoradiography were carried out under the same conditions as described above for screening cDNA libraries.
Two approaches were used to determine the flanking sequences of the P-insertional mutants. To rescue the 5′-junctional sequence of the l(3)5930 integration site, genomic DNA isolated from 12 adult flies was digested with XbaI, heat-inactivated (65 °C for 20 min), and then self-ligated. After transformation into DH5α, kanamycin-resistant colonies were isolated for DNA sequencing. PCR cloning was carried out to isolate the 3′-end junctional fragment of the P812 integration site. A white gene-specific primer (5′-GCATATATACCCTTCTGAATGC-3′) and a dGATAc gene-specific primer (5′-CCGGGAATTCCCATGGCGGTCTAGAGCACACTGTTTCAATCACTC-3′) were used in a 100-µl PCR reaction containing 1 × PCR buffer with 1.5 mM MgCl2, 50 µM dNTP, 0.5 µg of each primer, and 2.5 units of Taq polymerase. Thirty-five cycles of reaction were carried out in an MJ Research PTC-100 machine. Annealing was at 55 °C for 1 min, and extension was at 72 °C for 2 min. The PCR product was gel-purified and cloned into the pGEM-T vector (Promega, Wisconsin) for sequence determination.
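The two primer sequences given above can be inspected with a few lines of Python. The sketch below reports primer length and G+C count and scans each primer for the recognition sequences of three common restriction enzymes (EcoRI, GAATTC; NcoI, CCATGG; XbaI, TCTAGA). The choice of enzymes and all function names are assumptions made for illustration; the text does not state the purpose of the 5′ portion of the dGATAc primer, and the result comes only from scanning the sequence.

```python
WHITE_PRIMER = "GCATATATACCCTTCTGAATGC"
DGATAC_PRIMER = "CCGGGAATTCCCATGGCGGTCTAGAGCACACTGTTTCAATCACTC"

# Recognition sequences of three common restriction enzymes (illustrative choice).
SITES = {"EcoRI": "GAATTC", "NcoI": "CCATGG", "XbaI": "TCTAGA"}

def gc_count(primer):
    """Count G and C bases in a primer."""
    return sum(base in "GC" for base in primer)

def site_positions(primer):
    """Map each enzyme to the 0-based index of its site, or -1 if absent."""
    return {name: primer.find(site) for name, site in SITES.items()}

for name, primer in (("white", WHITE_PRIMER), ("dGATAc", DGATAC_PRIMER)):
    print(name, len(primer), gc_count(primer), site_positions(primer))
```

In this scan, the dGATAc primer contains all three recognition sequences within its first 25 bases, whereas the white primer contains none of them.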
To isolate l(3)5930/l(3)5930 homozygous embryos, the original l(3)5930 line (l(3)5930/TM6B, Tb) was crossed with the CS wild-type strain (+/+). The heterozygous progeny with l(3)5930/+ genotype were collected and intercrossed. Embryos were collected for 48 h, and those that failed to develop were isolated for DNA extraction. Genotype determination was performed by Southern analysis as described above using BamHI restriction and a 3.0-kb SacI/EcoRI genomic fragment as a probe.
We then screened Drosophila embryonic cDNA libraries (30) corresponding to 0-3 h and 3-12 h embryonic, and I and II instar larval stages of development. Altogether, eight positive clones were isolated from the 2 × 10^[…] phage plaques screened. Of them, three correspond to dGATAa and were not studied further. Two clones (4 and 11) together represent 3.2 kb of dGATAc cDNA sequence. Another clone, 13.1, is related to dGATAc cDNA, but its 5′ 519-bp sequence is not identical to that of the other dGATAc clones (Fig. 1A). Further analysis of this sequence revealed that the sequence deviation is due to alternative splicing of the dGATAc transcripts.
Figure 1: Restriction analysis of dGATAc cDNA clones and organization of the dGATAc gene. A, four dGATAc clones were mapped with BamHI. Another clone (13.1) is related to dGATAc but differs in the 5′-half of its cDNA sequence (indicated by a stippled box). B, BamHI. B, the position of dGATAc exons is shown relative to the EcoRI restriction map of the gene. The positions of two cosmid clones (c78A9 and c41A3) and of the probes used for Southern analysis are indicated. The P1 clone (DS01580) mentioned in this paper spans the entire dGATAc gene, but its two ends were not mapped. A detailed description of the isolation of genomic clones and the determination of the cap site and the exon/intron boundaries will be published elsewhere.
Through genomic cloning, restriction mapping, and DNA sequencing, the organization of the dGATAc gene was determined (Fig. 1B). The entire transcription unit covers approximately 36 kb of DNA sequence.
Figure 2: DNA sequence of dGATAc and its predicted amino acid sequence. The ATG at nucleotide position 224-227 is assigned as the translation initiation site. An in-frame termination codon upstream of the initiator ATG is underlined. The cysteine residues of the zinc fingers are in boldface.
As the ATGs at nucleotide positions 224, 230, and 476 all have an immediate 5′-flanking sequence that is favorable for translation initiation in eukaryotic genes (50), we performed in vitro transcription and translation to analyze the translation potential of these open reading frames within the context of the entire cDNA. After subcloning into the pBluescript SK plasmid vector, the DNA template was linearized by SpeI at nucleotide 2500, downstream from the stop codon at nucleotide 1682. In vitro transcription was carried out with T7 polymerase, and the RNA template was then used for in vitro translation with [35S]methionine. As shown in Fig. 3A, SDS-polyacrylamide gel electrophoresis of the synthesized proteins gave a prominent band with an apparent Mr of 52 × 10^3, a size consistent with that predicted by the open reading frame starting at nucleotide position 224 or 230. Since the gel system we used is unlikely to resolve the small size difference between proteins translated from these two start sites, we provisionally assign the translation start site to the first AUG at nucleotide position 224.
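The size argument can be checked with simple arithmetic on the nucleotide coordinates. The sketch below rests on two assumptions that are not stated in the text: that nucleotide 1682 is the first base of the stop codon, and that an average residue mass of about 110 Da is an adequate approximation. The function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # rough average residue mass; an assumption
STOP_CODON_POS = 1682      # assumed to be the first base of the stop codon

def orf_residues(start_pos, stop_pos=STOP_CODON_POS):
    """Number of codons from the start ATG up to, but not including, the stop."""
    span = stop_pos - start_pos
    if span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3

def approx_mass_kda(start_pos):
    """Approximate protein mass in kDa from the residue count."""
    return orf_residues(start_pos) * AVG_RESIDUE_MASS_DA / 1000

for start in (224, 230, 476):
    print(start, orf_residues(start), round(approx_mass_kda(start), 1))
```

Under these assumptions, initiation at position 224 or 230 predicts 486 or 484 residues (roughly 53 kDa in either case), whereas initiation at position 476 predicts 402 residues (roughly 44 kDa). The two-residue difference between the first two start sites corresponds to only about 0.2 kDa.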
Figure 3: Analysis of dGATAc protein. A, in vitro transcribed RNA was used to direct the synthesis of dGATAc protein using a rabbit reticulocyte lysate system. The protein product was labeled with [35S]methionine and detected by autoradiography. The sizes of the molecular weight marker proteins are indicated to the left. B, specific DNA binding of dGATAc protein to the target site. The entire dGATAc protein was expressed using the pGEX expression system. E. coli lysates were prepared from uninduced (lane 2) and isopropyl-1-thio-β-D-galactopyranoside-induced (lane 3) cultures. Increasing concentrations of the GATA-specific competitor (lanes 4-6: 5, 25, and 125 ng) and the control CACCC competitor (lanes 7-9: 5, 25, and 125 ng) were added to the DNA binding reaction containing approximately 10 ng of end-labeled probe. Lane 1 contains no protein. C, binding of the dGATAc finger domain to the target site. The finger domain of dGATAc was expressed using the pGEX vector. Increasing concentrations of the GATA-specific competitor (lanes 1-6: 6.3, 12.5, 25, 50, 100, and 200 ng) or the CACCC competitor (lanes 8-13: 6.3, 12.5, 25, 50, 100, and 200 ng) were added to the DNA binding reaction containing approximately 10 ng of end-labeled probe. Lane 7 contains no competitor.
Expression of dGATAc transcripts in early Drosophila embryos is not detectable until the cellular blastoderm stage. Initially, the RNA transcripts are evenly distributed and concentrated at the basal end of the cells (Fig. 4A). Within a short period of time, the transcripts become localized to three regions along the dorsal portion of the embryo (Fig. 4, B and C). In the procephalic region, the dGATAc gene is abundantly expressed and the transcripts are widely distributed, properly reflecting its later role in the development of the head region. The expressed transcripts are also detectable in the posterior third (15-25% egg length) and middle third (40-60% egg length) of the dorsal embryo. These regions give rise to the precursors of the posterior spiracles and the dorsal epidermis, respectively. In addition, a very faint signal can be seen in a small region of the ventral embryo (Fig. 4B, between two arrows).
Figure 4: Expression of dGATAc transcript during early Drosophila development. Embryos collected from early cellular blastoderm (A) and late cellular blastoderm (B) are shown in lateral view. Left is anterior and top is dorsal. An embryo of early gastrulation stage (C) is shown in dorsolateral view to reveal the distribution of signals in the dorsal portion of the embryo.
As embryonic development reaches stage 11 and beyond, three organ systems clearly stain positive with the dGATAc probe. The developing posterior spiracles are most prominent, and our probe could serve as a useful marker to trace the development of this structure (Fig. 5, A, C, and E). Notably, as germ band shortening occurs, the posterior spiracles move backward and outward toward their final position. Similarly, the strong but relatively diffuse signals of the anterior and posterior midgut primordia become discrete and approach the middle portion of the embryo (Fig. 5B). Expression of the dGATAc gene in the developing central nervous system is also seen after stage 11. Distinct signals corresponding to each segment of the embryo become evident (Fig. 5, B, D, and G) at stages 12-13. From the ventral view (Fig. 5D), the probe-positive cells for each segment are distributed along both sides of the midline. In the head region, the brain and the developing optic lobes (detail not shown), as well as the anterior tip of the clypeolabrum, are also clearly stained.
Figure 5: Expression of dGATAc transcripts in the developing midgut, posterior spiracles, and the central nervous system. Embryos of stage 12 (A) and stage 13 (B) are shown in lateral view. The same sample shown in B is focused on the dorsal region (C) to reveal staining in the head region, gut, and posterior spiracles, and on the ventral region (D) to reveal the developing central nervous system. In E, an embryo was focused on one of the posterior spiracles. Additional embryos were dissected to reveal staining of the anterior midgut (F) and the central nervous system (G).
Figure 6: Mapping of dGATAc gene and analysis of P1 clones. A, nonradioactive detection of the dGATAc chromosomal gene. A prominent band can be seen in the lower right corner. B, a 420-bp probe from the BamHI/EcoRI fragment of dGATA cDNA clone 13.3 detected two EcoRI bands of 2.2 kb (exon V) and 1.7 kb (exon VII) for the P1 clone. Another 6.4-kb (exon VI) band is seen with longer exposure. C, a 5.3-kb genomic probe (3′ probe, see Fig. 1B) derived from the dGATAc cosmid clone detected a 5.3-kb EcoRI band for P1. D, a 3.0-kb genomic probe (5′ probe, see Fig. 1B) detected a 14-kb EcoRI band for cosmid DNA (lane 1) and an 8.5-kb EcoRI band for the P1 clone (lane 2). E, the same 5′ probe detected a 3.0-kb band for SacI/EcoRI-digested cosmid (lane 1) and P1 clone (lane 2), and a 5.0-kb band for SacI-digested cosmid (lane 3) and P1 clone (lane 4).
Figure 7: Analysis of dGATAc gene mutants. A, identification of P-element insertion mutants of the dGATAc gene. Genomic DNA isolated from eight different Drosophila lines was digested with BamHI and probed with a 3.0-kb SacI/EcoRI dGATAc genomic fragment. Lanes 1-8 represent CS wild-type, l(3)5930, P971, P227, P609, P614, P812, and 2352, respectively. B, correlation of dGATAc mutant genotype with embryonic lethal phenotype. A l(3)5930/+ × l(3)5930/+ cross was set up to obtain dGATAc homozygous mutants. Southern analysis was performed as in A for the original l(3)5930 stock (lanes 1 and 2), embryos that failed to hatch (lane 3), the P812 mutant (lane 4), and the CS wild-type strain (lane 5).
Both P812 and l(3)5930 were originally reported to be homozygous lethal. We collected embryos for 48 h after fertilization to examine their phenotypes. Nearly half of the embryos failed to hatch, and, among these, two groups of dead embryos with distinct morphologies were identified. We presumed that one group was lethal due to the balancer chromosome and the other due to the P-element insertion at the dGATAc gene. To isolate dGATAc homozygous mutant embryos and to confirm that a developmental block can be caused by the dGATAc gene mutation, we generated a l(3)5930/+ line by crossing the original stock with the CS wild-type strain. This new line was then self-crossed to obtain a homogeneous pool of mutant embryos carrying two l(3)5930 alleles. We presumed that, by eliminating the balancer chromosome, which also gives a homozygous lethal phenotype, we could reliably identify the embryonic lethal phenotype due to the l(3)5930 mutant chromosome. Indeed, all of the embryos that failed to develop (stopped at stage 17) are homozygous for the mutant allele. As shown in Fig. 7B, the genotype of the lethal embryos is distinctly different from that of the original stock and the CS wild-type strain. The 6-kb BamHI band that is diagnostic of the wild-type dGATAc allele is absent in the homozygous mutants (lane 3). We conclude from this study that the lethal phenotype of these mutant embryos can be attributed to P-element insertion at the dGATAc gene.
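The expected proportions of lethal embryos in the two crosses follow from simple Mendelian segregation. The sketch below enumerates the four equally likely offspring genotypes of each cross; it assumes that any embryo homozygous for either the l(3)5930 chromosome or the TM6B balancer dies, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def lethal_fraction(parent1, parent2, lethal_when_homozygous):
    """Fraction of offspring homozygous for a chromosome listed as lethal."""
    offspring = list(product(parent1, parent2))
    dead = sum(1 for a, b in offspring if a == b and a in lethal_when_homozygous)
    return Fraction(dead, len(offspring))

LETHAL = {"l(3)5930", "TM6B"}  # both assumed lethal when homozygous

# Original balanced stock crossed to itself.
balanced = lethal_fraction(("l(3)5930", "TM6B"), ("l(3)5930", "TM6B"), LETHAL)
# Outcrossed line, with the balancer replaced by a wild-type chromosome.
outcrossed = lethal_fraction(("l(3)5930", "+"), ("l(3)5930", "+"), LETHAL)
print(balanced, outcrossed)
```

Under these assumptions, half of the embryos from the balanced stock are expected to die (one quarter from each homozygous class), in line with the observation that nearly half failed to hatch, whereas only the quarter homozygous for l(3)5930 are expected to die after the outcross.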
The results of Southern analysis indicated that the P-element insertions in P812 and l(3)5930 occurred at the dGATAc promoter region. To precisely map the integration sites, we used PCR and plasmid rescue to clone and determine their flanking sequences. As shown in Fig. 8A, the 5′ junctional fragment of the l(3)5930 insertion was obtained through a plasmid rescue experiment. In addition, PCR cloning and sequencing using primers designed for the dGATAc promoter region and the P-element terminal repeat also confirmed the integration site from the 3′ direction. For P812, PCR was carried out using the promoter sequence plus the white gene sequence included in the P-element construct (48) to amplify the 3′ junctional sequence. The results of these studies are summarized in Fig. 8B. Interestingly, the two independent P-element insertions are very close to each other, and they are only 26 and 34 bp upstream of the cap site of the dGATAc gene for l(3)5930 and P812, respectively.
Figure 8: Determination of the P-element integration site. A, scheme for cloning the junctional fragments of l(3)5930 and P812 P-element insertion strains. B, the integration sites for l(3)5930 and P812 are marked relative to the dGATAc transcription initiation site. A detailed description of the dGATAc gene structure and the analysis of its promoter will be reported elsewhere.
To validate the assertion that the embryonic lethal phenotype is indeed linked to the lack of dGATAc protein function, we crossed P812 with l(3)5930 to determine whether the two mutations are allelic. We expected that the two mutant alleles would not complement each other and that the two insertions, if present in trans, would be lethal to the embryos. The different balancers for P812 and l(3)5930 strains provide a convenient way to identify the genotypes of the surviving offspring. All of them, in fact, carry the balancer derived from either one parental strain or both, thus indicating that none of the double mutants were viable (data not shown). We conclude, therefore, that the recessive embryonic lethal mutations on each of the mutant chromosomes belong to the same complementation group. Furthermore, the embryonic lethal phenotype seen in these strains is consistent with the finding that dGATAc transcripts are distributed in multiple yet specific organ systems in the developing embryos (Fig. 5).
Biochemical studies (52, 53, 56) and NMR structural analysis (51) defined the basic unit of the GATA zinc finger domain that is required for binding to its DNA target sequence. The recent isolation of a single-finger GATA finger protein from D. melanogaster(26) strengthened previous findings from areA(22) , nit-2(23) , and gln3(24) that DNA binding by the single-finger protein requires this 60-amino-acid conserved domain. Comparison of all the available Drosophila GATA finger sequences (Fig. 9) reveals that 43 of the 48 (90%) amino acids are identical for the N-terminal fingers of the dGATAa/pannier and dGATAc proteins. In contrast, a slightly longer sequence is conserved for the C-terminal dGATAa and C-terminal dGATAc fingers or the single finger of dGATAb. Of the 54 aligned amino acid residues of this region, 38 (70%) are identical, and of the remainder show only conservative change. Interestingly, the four point mutations identified as pannier null mutants are restricted to the conserved residues of the N-terminal finger(28) .
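The identity figures quoted above are simple ratios of identical positions to aligned positions. The sketch below recomputes the two percentages from the counts given in the text and includes a small helper for computing identity from a pair of aligned sequences. The helper name and the five-residue test strings are hypothetical and are not taken from Fig. 9.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)

# Counts taken from the text: identical residues / aligned residues.
n_finger_pct = round(100 * 43 / 48)
c_finger_pct = round(100 * 38 / 54)

# Made-up five-residue strings used only to exercise the helper.
toy_pct = percent_identity("CTNCQ", "CTNCA")
print(n_finger_pct, c_finger_pct, toy_pct)
```

Both quoted percentages are rounded to the nearest whole number, so 43 of 48 is reported as 90% and 38 of 54 as 70%.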
Figure 9: Comparison of the amino acid sequences of dGATA zinc fingers. The predicted zinc finger sequences of dGATAc are aligned with those of the dGATAa (Winick et al. (27); identical to pannier of Ramain et al. (28)) and dGATAb (Abel et al. (26)) cDNA clones. The consensus sequence for each finger is shown in the bottom row. Identical amino acid residues are indicated by the single-letter abbreviation, and the dash represents a conservative change.
Using the expression screening approach, Abel et al.(26) isolated the single-finger dGATAb gene from a 9-12-h embryonic cDNA library. Expression of dGATAb is found in the mesoderm-derived fat body, and the encoded protein, ABF, appears to act as an activator of the larval promoter of the alcohol dehydrogenase (Adh) gene.
Finally, members of the dGATA factor gene family are expressed in unique but slightly overlapping patterns during embryonic development. Together they could play important roles in tissue-specific gene regulation. However, this raises the question of whether there is any functional redundancy or cross-interaction among different dGATA factors. Notably, dGATAc is expressed in several tissues that were also found to be positive in previous whole mount in situ hybridization studies using dGATAa (27) and dGATAb (26) probes. In the late cellular blastoderm and early gastrulation stages, the weak dGATAc signal in the dorsal embryo (40-60% of the egg length) is also distributed in a pattern of stripes (Fig. 4, B and C), similar to those reported for dGATAa and dGATAb. Interestingly, the dorsal epidermis arising from this region of the embryo is clearly positive with the dGATAc probe at stage 13 of development (data not shown). The signal is particularly strong in segments T1-T3. In addition, dGATAa and dGATAc transcripts are colocalized to the posterior spiracles (Ref. 27 and this work), and the dGATAb gene is expressed transiently in the anterior and posterior midgut primordia (26). We excluded, with two experiments, the possibility that the apparent distribution of dGATAc transcripts in these tissues is an artifact due to cross-hybridization with other homologous dGATA sequences.
Using a shorter probe that lacks the conserved finger sequence, we observed the same pattern of tissue distribution (data not shown). Moreover, we analyzed the enhancer trap line l(3)5930 and confirmed by antibody staining that the expression of the lacZ reporter gene is confined to the same multiple organ systems that express dGATAc transcripts. From a biochemical point of view, different dGATA factors share similar DNA binding structures and, through the function of the zinc finger domain alone, there would be limited discriminating power to determine target site specificity. Other domains of the dGATA proteins must contribute to the subtle binding site preferences and differential activity necessary for their specific regulatory functions. This could be achieved by intrinsic affinity to different target sequences or by interacting with other regulatory proteins. It remains to be seen whether different dGATA factors can regulate a common gene with different effects. Because Drosophila is convenient for genetic studies, the functional role of each dGATA factor can be definitively identified by analyzing the phenotypes of the mutants. The cloning of multiple dGATA genes, the precise mapping of their chromosomal locations, and the identification and characterization of the dGATA gene mutants should facilitate our future studies in this direction.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) D50542.