We have previously shown that a tandem pair of (A/T)GATA(A/G)
sequences in the promoter region of the Caenorhabditis elegans gut esterase gene (ges-1) controls the tissue specificity
of ges-1 expression in vivo. The ges-1 GATA
region was used as a probe to screen a C. elegans cDNA
expression library, and a gene for a new C. elegans GATA-factor (named elt-2) was isolated. The longest open
reading frame in the elt-2 cDNA codes for a protein of M_r
The DNA sequence motif (A/T)GATA(A/G), hereafter referred to
only as GATA, is known to be involved in lineage-specific gene
expression in other organisms, most prominently in the control of
globin gene transcription in vertebrates (reviewed in Refs. 1-3).
``GATA-binding proteins'' or ``GATA factors'' are a
class of transcriptional activators that bind to GATA sequences in DNA.
Six classes of GATA factors (termed GATA-1 through GATA-6) have now
been defined in vertebrates (4), and all have a pair of
distinctive zinc finger domains. Several one-finger GATA factors have
been identified in fungi (5-7), and two GATA factors have been
identified in Drosophila, one with a single zinc finger (8) and one with a pair of zinc fingers (9, 10).
One GATA factor (named elt-1, standing for erythroid-like transcription factor 1) has
been isolated from the nematode Caenorhabditis elegans on the
basis of sequence homology with chicken GATA-1 (11).
We have
been studying the C. elegans ges-1 (gut esterase 1) gene as an example of a terminally differentiated
gene whose expression is restricted to the intestine. The ges-1 gene is expressed via transcription from the zygotic genome,
beginning when the developing gut has only four to eight
cells (12). At this stage, the entire embryo has only
150-200 cells, and ges-1 is one of the earliest markers
of tissue-specific differentiation to appear in the C. elegans embryo. Our previous deletion-transformation analysis identified a
region within the ges-1 promoter that appears to function both
as a gut activator and as a pharynx/tail repressor (13). A more
detailed analysis of this region has revealed that the 30-base pair
region controlling the spatial expression of ges-1 contains a
tandem pair of (A/T)GATA(A/G) sequences (14). Furthermore,
nuclear extracts prepared from C. elegans embryos contain a
factor that binds to these motifs (15).
Judging from the
sequence of the ges-1 control region (14-16), expression
of the ges-1 gene is likely to involve a GATA factor. The
obvious candidate for such a factor is the product of the elt-1 gene (11). However, homozygous deficiency embryos lacking
the elt-1 gene still express ges-1 (15),
suggesting that elt-1 is not required zygotically for ges-1 expression. With the aim of identifying a factor that
does control ges-1, we have probed a C. elegans cDNA
expression library with multimerized ges-1 GATA sequences. The
present report describes the result of this screen: the identification
of a second C. elegans GATA factor, elt-2.
Genomic clones were isolated from a C. elegans
The longest elt-2 cDNA clone is
1560 bp (excluding the poly(A) tail) and is essentially full-length.
Reverse transcriptase-PCR (23) showed that elt-2 mRNA is trans-spliced to the SL1 leader sequence (24), and the
last three nucleotides of the SL1 leader sequence can indeed be
identified at the 5' end of the longest cDNA clone. Reverse
transcriptase-PCR using the SL2 trans-splice primer (25) instead of SL1 did not produce a product (data not shown).
As will be shown below, the length of the longest cDNA agrees well with
the estimated size of the elt-2 message detected by Northern
blotting.
Twelve examples of the sequence
(A/T)GATA(A/G) can be identified in the elt-2 5'-flanking
region (not shown), approximately twice as many as would be expected by
chance. During erythropoiesis in vertebrates, expression of GATA-1 is
autoregulated through flanking GATA motifs (28, 29), and elt-2 may be similarly autoregulated.
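The comparison with chance made in this paragraph can be illustrated with a short calculation. The sketch below (Python; the function names are illustrative and are not part of this work) counts (A/T)GATA(A/G) matches on both strands of a sequence and estimates the number expected at random. It assumes equal and independent base frequencies and a flanking region of about 3.5 kilobase pairs; both are simplifying assumptions, and the AT-rich C. elegans genome would shift the expectation somewhat.

```python
import re

# (A/T)GATA(A/G) written as a regular expression; the lookahead
# allows overlapping matches to be counted.
GATA_PATTERN = re.compile(r"(?=([AT]GATA[AG]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_gata_sites(seq):
    """Count (A/T)GATA(A/G) matches on both strands of a DNA string."""
    seq = seq.upper()
    forward = len(GATA_PATTERN.findall(seq))
    reverse = len(GATA_PATTERN.findall(reverse_complement(seq)))
    return forward + reverse


def expected_gata_sites(length, motif_len=6):
    """Expected matches on both strands for a random sequence.

    Assumes each base occurs with probability 1/4, independently
    (an assumption made here for illustration only).
    """
    # Two allowed bases at the first and last positions, one at each of the
    # four central positions (G, A, T, A).
    p_site = (2 / 4) * (1 / 4) ** 4 * (2 / 4)
    positions = length - motif_len + 1
    return 2 * positions * p_site


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the elt-2 flanking region.
    toy = "CCTGATAAGGCTTATCTCC"
    print(count_gata_sites(toy))
    print(round(expected_gata_sites(3500), 1))
```

Under these assumptions roughly seven sites would be expected in 3.5 kilobase pairs, so twelve observed sites is a little under twice the random expectation.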
In
summary, the upstream putative finger does not appear to bind GATA
sequences by itself nor does it appear necessary for GATA-sequence
binding by the downstream true finger.
Sequence analysis and DNA binding assays have shown that
ELT-2 clearly belongs to the GATA factor family of DNA-binding
proteins. ELT-2 appears slightly more closely related to GATA-5 than to
any of the other GATA factors (84% identity, 92% similarity in the DNA
binding domain). Considering that elt-2 was isolated by
binding to an endoderm specific gene in nematodes (i.e.
ges-1), it is interesting that GATA-5 also appears to be expressed
in the endoderm of vertebrates (4).
The majority of GATA
factors contain two highly conserved zinc finger domains, but ELT-2
contains only one such domain. Although the diverged sequence
C-X
Our central concern is
whether elt-2 does indeed control expression of the ges-1 gene in vivo. Although both in vitro expressed
ELT-2 and a factor present in C. elegans embryonic extracts (15) bind GATA sequences in a similar manner, this is only
indirect evidence that the two proteins could be identical. Thus a suitably
cautious statement is that elt-2 has not yet been eliminated
as a candidate for ges-1 control. In any case, the necessary
proof must come from a genetic experiment that assays ges-1 expression in an elt-2 mutant. We have identified a Tc1
transposon insertion into the fifth intron of the elt-2 gene,
using the system of Zwaal et al. (35). It should now be
possible to produce an elt-2 null mutation by imprecise
transposon excision events. Until such a mutant has been identified, we
leave open the question of how (or even whether) elt-2 is involved in ges-1 control.
elt-2 is the second GATA factor
that has been cloned from C. elegans. Our low stringency
screens of genomic DNA, using elt-2 cDNA as a probe, have not
yet revealed obviously cross-hybridizing species (data not shown), nor
did previous attempts using the elt-1 gene as a
probe (11). However, both elt-1 and elt-2 are
expressed predominantly in embryos, and we would be surprised if C.
elegans did not have additional GATA factors involved in gene
expression during other stages of the life cycle.
The nucleotide sequence(s) reported in this paper has been
submitted to the GenBank
We thank Drs. Peter Okkema and Robert Barstead for
providing cDNA libraries, Dr. Chris Link for providing the C.
elegans genomic library, Barbara Goszczynski for providing total
RNA from staged populations of worms, and Fran Allen for her help in
sequencing the elt-2 genomic clone.
47,000 with a single zinc finger domain, similar
(approximately 75% amino acid identity) to the C-terminal fingers of
all other two-fingered GATA factors isolated to date. A similar degree
of relatedness is found with the single-finger DNA binding domains of
GATA factors identified in invertebrates. An upstream region in the
ELT-2 protein with the sequence C-X-C-X-C-X-C
has some of the characteristics of a zinc finger domain but is highly
diverged from the zinc finger domains of other GATA factors. The elt-2 gene is expressed as an SL1 trans-spliced
message, which can be detected at all stages of development except
oocytes; however, elt-2 message levels are 5-10-fold
higher in embryos than in other stages. The genomic clone for elt-2 has been characterized and mapped near the center of the C.
elegans X chromosome. ELT-2 protein, produced by in vitro transcription-translation, binds to ges-1 GATA-containing
oligonucleotides in a manner similar to a factor previously identified in C.
elegans embryo extracts, as assayed both by electrophoretic
migration and by competition with wild type and mutant
oligonucleotides. However, there is as yet no direct evidence that elt-2 does or does not control ges-1.
General Materials and Methods
Unless otherwise
stated, all DNA manipulations were carried out as described in Ausubel et al. (17) and Sambrook et al. (18).
Sequencing reactions were performed using Taq DyeDeoxy
Terminator Cycle sequencing kits (Applied Biosystems) in a GeneAmp 9600
PCR system (Perkin Elmer) and analyzed on an
automated sequencer (Applied Biosystems model 373A). Routine handling
of C. elegans was performed as described by
Brenner (19).
Isolation of cDNA and Genomic Clones
A mixed stage C. elegans λgt11 cDNA library (obtained from Dr. P.
Okkema, Carnegie Institute of Washington, Baltimore, MD) was screened
with a 32P-labeled multimerized GATA-containing
double-stranded oligonucleotide using the method of Vinson et
al. (20); the exact oligonucleotide sequences are given
under ``Results.'' A single positive clone was isolated and
subsequently used as a probe to isolate further clones from an
independent C. elegans mixed stage cDNA library prepared in
λZAP (21). The probe was radioactively labeled with
[α-32P]dCTP using a Prime-it II kit
(Stratagene), and hybridizations were performed at 65 °C for 18 h
(6× SSC, 1% SDS, 5× Denhardt's solution, 100
µg/ml sonicated salmon sperm DNA). The filters were washed at high
stringency (0.1× SSC, 1% SDS, 65 °C) and exposed to x-ray
film overnight (X-Omat; Eastman Kodak Co.). The longest such cDNA clone
was sequenced on both strands using a combination of unidirectional
nested deletions (22) and custom synthesized primers (Regional
Oligonucleotide Synthesis Unit, University of Calgary). The 5' and 3'
ends of all the other purified λZAP clones were also sequenced.
λEMBL-4
library (obtained from Dr. C. Link, University of Colorado, Boulder,
CO). The library was screened with a full-length elt-2 cDNA
fragment, radioactively labeled using random oligonucleotide primers;
hybridization and washing conditions were as described above. Once the
gene sequence had been determined (see below), Southern blotting
confirmed that restriction fragments of the lengths predicted from the
sequence could indeed be identified in the appropriate digests of C. elegans genomic DNA.
Amplification of 5' End of mRNA Transcript
Poly(A)+ mRNA was prepared from a mixed population of N2 worms using a
Micro-Fastrack mRNA isolation kit (Invitrogen), and cDNA was
synthesized as described by Frohman et al. (23). PCR
amplifications were performed using the C. elegans
trans-spliced primers SL1 (24) or SL2 (25) and several elt-2-specific antisense internal primers. PCR products
generated were cloned and sequenced.
In Vitro Transcription-Translation of cDNA Clones and Gel
Mobility Shifts
ELT-2 protein was produced from elt-2 cDNA contained in the vector pBluescript SK (Stratagene) using the TNT-coupled transcription-translation
system (Promega). Deletion constructs of ELT-2, in which either amino
acids 1-174 or amino acids 243-433 had been removed, were
also produced by in vitro transcription and translation. DNA
binding activity was analyzed by gel mobility shift assays, essentially
as described by Stroeher et al. (15). Each reaction
typically contained 2 ng of double-stranded oligonucleotide probe, 100
ng of poly[d(I-C)] competitor DNA (unless specified
otherwise), and either 1 µl of in vitro transcription/translation reaction (full-length ELT-2 or deletion
construct as appropriate) or 10 µg of nuclear extract, prepared
from fluorodeoxyuridine-blocked embryos as described by Stroeher et
al. (15).
Northern Blotting
RNA was isolated from staged
worms using a total RNA purification kit (Clontech) and analyzed for
levels of elt-2 message by Northern hybridization using a
Hybond N+ nylon membrane (Amersham International plc.) according
to the manufacturer's instructions. The full-length elt-2 cDNA, random oligo-labeled as described above, was used as a
probe. After hybridization and washing (using the same conditions
described above), filters were exposed to X-Omat x-ray film (Kodak)
with intensifying screens at -70 °C for 1-30 days.
Isolation of a cDNA Clone by Probing C. elegans
Expression Libraries with GATA Sequences from the Gut Activator Region
of the ges-1 Gene
We have identified a tandem pair of GATA
sequences required for the gut-specific expression of the ges-1 gene in the C. elegans embryo (14). Both single
GATA motifs and the double GATA motif (see Fig. 1) appear to bind
the same protein present in embryo extracts (15). The downstream
GATA sequence (with terminal extensions added to ensure directional
cloning; see Fig. 1) was ligated into 11 tandem copies,
kinase-labeled with [γ-32P]ATP, and used as a
probe to screen an expression library prepared in λgt11 from mixed
stage C. elegans RNA (26). One positive clone was isolated
from approximately 300,000 plaques screened; partial sequence analysis
revealed that the clone did indeed code for a ``GATA factor''
protein, as will be discussed in detail below. The gene has been named elt-2 in accordance with C. elegans nomenclature
conventions (27).
Figure 1:
DNA
sequence of the C. elegans ges-1 gut activator region.
Nucleotides 2170-2208 from GenBank entry no. M96145 (16)
are shown, along with the oligonucleotides used for probing cDNA
libraries and gel mobility shift assays. Terminal extensions (atgc) were added to ensure directional cloning (see
``Results''). Complementary oligonucleotides were synthesized
as appropriate, and double-stranded oligonucleotides were used in all
experiments.
The insert from this original elt-2 cDNA clone was amplified by PCR and used as a probe to screen a
mixed-stage cDNA library prepared in λZAP (21). Eleven
positive clones were isolated from approximately 350,000 plaques
screened under high stringency conditions; partial sequence analysis
and restriction mapping indicated that all of these clones were derived
from the same transcript.
Position of the elt-2 Gene on the Physical Map of the C.
elegans Genome
The isolated insert from the longest cDNA clone
was used as a probe to screen a C. elegans genomic library
under high stringency conditions. Six positive phage were isolated and
restriction mapping showed them all to be identical or overlapping. A
total of 5433 bp of elt-2 genomic sequence, including the
entire gene, 3.5 kilobase pairs of upstream flanking region and 0.3
kilobase pair of downstream flanking region, has been obtained and
deposited in GenBank (accession no. U25175). The genomic clones
were used to locate the elt-2 gene on the physical map of the C. elegans genome (Dr. A. Coulson, MRC, Cambridge, UK). elt-2 lies just right of center on the X chromosome (cosmid
C12H7 on the C. elegans physical map), close to the gene mab-18. No obvious genetic candidate for elt-2 has
been identified in this vicinity.
Alignment of elt-2 Genomic, cDNA, and Protein
Sequences
Comparison of the genomic and cDNA sequences allows
unambiguous assignment of the intron-exon structure of the elt-2 gene, as shown schematically in Fig. 2A. The elt-2 cDNA and protein sequences are aligned in Fig. 2B. The longest open reading frame (433 amino acid
residues, producing a predicted protein of M_r 47,000) begins with a putative initiator methionine residue 1
base pair downstream of the SL1 trans-splice acceptor site;
this reading frame includes the distinctive GATA-factor DNA binding
domain, as will be discussed in detail in the next section. A potential
poly(A) addition site (AATAAA) is present 237 nucleotides downstream of
the proposed translation termination codon and 11 nucleotides upstream
of the poly(A) sequence found in the cDNA clones.
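As a rough consistency check on the predicted size, protein mass can be approximated from the residue count. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from this work; the function names are illustrative only, and an exact value would require summing the masses of the actual residues.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This constant is an assumption for illustration, not a value from the paper.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approximate_protein_mass(n_residues, residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Approximate protein mass (Da) from the number of residues."""
    return n_residues * residue_mass


def orf_length_nt(n_residues, include_stop=True):
    """Length in nucleotides of an open reading frame encoding n_residues."""
    return 3 * n_residues + (3 if include_stop else 0)


if __name__ == "__main__":
    residues = 433  # length of the longest open reading frame reported above
    print(approximate_protein_mass(residues))  # 47630.0
    print(orf_length_nt(residues))             # 1302
```

The approximation gives about 47,600 Da for 433 residues, in line with the predicted M_r of 47,000, and the coding region accounts for about 1.3 kilobases of the cDNA.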
Figure 2:
Alignment of the elt-2 genomic,
cDNA, and protein sequences. A, schematic alignment of the elt-2 genomic and cDNA structures. The elt-2 gene
contains nine exons; the length of each exon in base pairs is noted
above the genomic clone. The lengths of the eight introns (in base
pairs) are listed in italics below the 1.6-kilobase pair
schematic diagram of the elt-2 cDNA. Intron 6 interrupts the
conserved zinc finger domain (shaded). B, nucleotide
sequence of the elt-2 cDNA and deduced amino acid sequence. Numbers on the right refer to the amino acid and nucleotide
sequences, respectively. The longest open reading frame is 433 amino
acids, beginning at the first ATG codon after the SL1 trans-splice leader sequence (underlined). A putative
polyadenylation site (aataaa) is double-underlined and is 237 nucleotides downstream of the proposed translation
termination codon. The conserved zinc finger DNA binding domain (amino
acids 237-261) is shaded. Intron positions are indicated
by the arrowheads.
The elt-2 Gene Encodes a Distinctive Zinc Finger Domain
of the Type Found in GATA Factors
The ELT-2 protein contains a
region whose sequence is clearly related to the zinc finger DNA binding
domains of all GATA factors yet reported. The majority of GATA factors
have two such domains, but ELT-2 appears to only have one (as do
several factors isolated from fungi and from Drosophila, as
noted earlier). Fig. 3A aligns the sequence of the
single zinc finger domain of ELT-2 (amino acids 237-261) with the
sequence of the C-terminal finger of the previously isolated GATA
factors elt-1 from C. elegans (11) and pannier from Drosophila (9, 10) and
with the sequence of the C-terminal fingers found in each of the six
classes of chicken GATA factors (4, 30, 31);
four examples of single zinc finger factors from invertebrates (5, 6, 7, 8) are also included in the
alignment. Levels of sequence matches are in the range of 72-84%
amino acid identity (76-92% similarity) when compared to
vertebrate GATA factors and in the range of 56-72% identity
(68-85% similarity) when compared to invertebrate factors. A
similar alignment of the zinc finger domain of ELT-2 with the
N-terminal zinc fingers of two-finger GATA factors reveals 48-56%
identity (68-72% similarity). Unlike other known GATA factors,
the zinc finger domain of ELT-2 is contained on two separate exons.
As shown in Fig. 3B, all of the proteins listed above,
including ELT-2, share a highly conserved region (36-75%
identity, 54-88% similarity) extending 25 amino acids to the
C-terminal side of the zinc finger; this region has been shown to be
necessary for DNA binding (32).
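The identity percentages quoted above follow from position-by-position comparison of aligned sequences. A minimal sketch of that calculation is given below (Python; the function name and the toy sequences are hypothetical and are not the sequences of Fig. 3). Similarity scores additionally require a rule for grouping conservative substitutions, which is not specified here and is therefore not implemented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues between two aligned sequences.

    Both strings must already be aligned and of equal length. Columns in
    which either sequence has a gap are excluded from the comparison; this
    is one common convention and is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy five-residue strings, for illustration only.
    print(percent_identity("ACDEF", "ACDQF"))  # 80.0
```

For a 25-residue finger each matching position contributes 4 percentage points, so 84% identity corresponds to 21 identical residues out of 25 and 72% to 18 out of 25.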
Figure 3:
The elt-2 gene encodes a
distinctive zinc finger domain of the type found in GATA factors. A, amino acid sequence alignment of the zinc finger domain of
ELT-2 (residues 237-261) aligned with the C-terminal zinc finger
domains from a number of two-finger GATA factors (elt-1, C. elegans, residues 272-296) (11); pannier, Drosophila, residues 226-250 (9, 10); GATA-1 to GATA-6
from chicken, residues 164-188 (cGATA-1) (30),
335-359 (cGATA-2) (31), 317-342 (cGATA-3)
(31), 211-235 (cGATA-4) (4), 239-263 (cGATA-5) (4), 235-259 (cGATA-6) (4), and zinc
finger domains from several single-finger GATA factors (ABF, Drosophila, residues 318-342 (8); nit-2, Neurospora crassa, residues 743-767 (6); areA, Aspergillus nidulans, residues 516-540 (5); GLN3, Saccharomyces cerevisiae, residues
306-330 (7)). Residues identical to aligned residues in ELT-2 are shaded, and the percentage amino acid identity to ELT-2 is
listed to the right of each alignment. B, amino acid
sequence alignment of the 25 amino acids immediately C-terminal to the
zinc finger domain of ELT-2 (residues 262-286) and equivalent
residues from other GATA-factors; references to individual sequences
are listed in A. Residues identical to aligned residues in
ELT-2 are shaded, and the percentage of amino acid identity to
ELT-2 is listed to the right of each
alignment.
ELT-2 Protein Produced in Vitro Binds to the GATA
Sequences of the C. elegans ges-1 Gene
ELT-2 protein was
produced in a coupled transcription-translation reaction and used in
electrophoretic mobility shift experiments with double-stranded
oligonucleotide probes. As shown in Fig. 4A, ELT-2
protein binds to the downstream GATA sequence of the ges-1 gene (the sequence used as probe for the library screening), and
this binding is resistant to a large excess of the nonspecific
competitor poly[d(I-C)]. Formation of the complex is competed
effectively with the unlabeled oligonucleotide, but is competed
ineffectively by an oligonucleotide in which the TGATAA sequence has
been mutated to GTCGCC (Fig. 4B). This specific binding
behavior mimics the binding properties of the GATA-binding factor
identified in nuclear extracts prepared from C. elegans embryos (see Fig. 5B of Ref. 15). Fig. 4C shows that elt-2 also binds to the upstream GATA motif
from ges-1. Fig. 4D shows that an elt-2-GATA oligonucleotide complex has an electrophoretic
mobility very close to that of the complex containing the binding
protein present in embryonic nuclear extracts (15). Numerous
repetitions of this experiment have never reliably distinguished
between the two complexes. Any migration differences seen in individual
experiments are slight, and we judge them to lie within the uncertainty
associated with comparing crude extracts and in vitro translation products.
Figure 4:
Gel mobility shift analysis of in
vitro expressed ELT-2 protein binding to the putative gut
activator region of the C. elegans ges-1 gene. A,
double-stranded downstream GATA probe (see Fig. 1 for oligonucleotide
sequences) incubated with in vitro expressed full-length ELT-2
protein in reactions containing X-fold excess (by weight) of
nonspecific competitor DNA. B, double-stranded downstream GATA
probe incubated with in vitro expressed full-length ELT-2
protein in reactions containing X-fold molar excess of
unlabeled double stranded wild-type (downstream GATA oligo) or mutant
competitor oligonucleotide. C, in vitro expressed
full-length ELT-2 protein, incubated in reactions with either
double-stranded upstream or downstream GATA oligonucleotides. D, double-stranded downstream GATA oligonucleotide incubated
in reactions with either nuclear extracts from
fluorodeoxyuridine-blocked embryos (15) or in vitro expressed
full-length ELT-2 protein.
Figure 5:
A possible second zinc finger domain in
the elt-2 gene. A, amino acid sequence of the
putative upstream zinc finger domain of ELT-2 (residues 153-176)
aligned with the N-terminal zinc finger domains from several two-finger
GATA factors; for references to individual sequences, see legend to
Fig. 3A. Residues that are identical to the aligned residues
in ELT-2 are shaded; overall percentage identity is listed
on the right of each example. B, gel mobility shift
assay of reactions containing double-stranded gut activator
oligonucleotides binding to in vitro expressed full-length
ELT-2 protein, putative zinc finger construct (amino acids
243-433 deleted) and true zinc finger construct (amino acids
1-174 deleted). Left-hand three lanes use the upstream
GATA oligonucleotide as the probe; the right-hand three lanes use the tandem GATA oligonucleotide as the probe (see Fig. 1 for
sequences).
A Possible Second Zinc Finger Domain in the ELT-2
Protein
Standard alignment programs identify only a single zinc
finger domain in the ELT-2 protein. However, visual inspection reveals
the sequence C-X-C-X-C-X-C
(amino acids 153-176), which looks sufficiently like a second
zinc finger domain to warrant further investigation. Fig. 5A shows the best alignments that could be obtained between this
region of ELT-2 and the N-terminal fingers of other GATA factors
(25-30% identity; 42-48% similarity). This upstream region
lacks the distinctive residues usually found between the two cysteine
pairs in GATA factors, but it could nonetheless be involved in DNA
binding. To test this possibility, ELT-2 mutant proteins were
produced by in vitro transcription and translation and then
assayed for DNA binding ability by gel mobility shift. Three constructs
were assayed: (i) full-length ELT-2 protein; (ii) a truncated version
containing the upstream putative zinc finger domain, but lacking the
downstream ``true'' zinc finger; and (iii) a truncated
version containing the downstream true zinc finger but lacking the
upstream putative finger. The results of this experiment are shown in Fig. 5B. The protein containing only the upstream
putative finger does not bind to the GATA-containing oligonucleotides;
in contrast, the protein containing only the downstream
``true'' finger shows easily detectable binding.
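The visual inspection described above could be complemented by a simple pattern scan for four-cysteine arrangements. The following is a minimal sketch in Python, not the procedure used in this work: the function name is hypothetical, and the spacing parameters are assumptions based on the commonly cited GATA-type arrangement of two cysteine pairs separated by two residues each, with a loop of about 17 residues between the pairs. The loop range is deliberately widened so that diverged candidates are also reported.

```python
import re


def find_four_cysteine_motifs(protein, pair_spacing=2, loop_min=15, loop_max=20):
    """Return (start, end, sequence) for each C-X-C ... C-X-C candidate.

    start and end are 1-based, inclusive residue positions. pair_spacing is
    the number of residues between the cysteines of each pair; loop_min and
    loop_max bound the number of residues between the two pairs. All three
    defaults are assumptions chosen for illustration.
    """
    pattern = re.compile(
        r"(?=(C.{%d}C.{%d,%d}C.{%d}C))"
        % (pair_spacing, loop_min, loop_max, pair_spacing)
    )
    hits = []
    for match in pattern.finditer(protein.upper()):
        hits.append((match.start(1) + 1, match.end(1), match.group(1)))
    return hits


if __name__ == "__main__":
    # Artificial sequence built for the example; not the ELT-2 sequence.
    toy = "MA" + "CAAC" + "G" * 17 + "CAAC" + "KK"
    for start, end, seq in find_four_cysteine_motifs(toy):
        print(start, end, seq)
```

A scan of this kind only flags cysteine spacing. It says nothing about the residues between the cysteine pairs, which is where the upstream region of ELT-2 differs from other GATA factors, so any candidate still has to be tested for DNA binding as was done here.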
Expression of elt-2 during C. elegans
Development
Total RNA was isolated from the different stages of
the C. elegans life cycle and analyzed by Northern blotting,
using the full-length cDNA as a probe. As shown in Fig. 6, elt-2 mRNA sequences can be detected at all stages of
development, except oocytes (even after 1-month exposure of the
autoradiograph). However, the highest level of elt-2 mRNA is
present in embryos, 5-10-fold higher than in other stages.
Approximately equal amounts of RNA were loaded for each developmental
stage, at least as judged by the hybridization intensity produced by
probing with the rp21 ribosomal protein gene (11).
Finally, the size of the elt-2 message, as shown in Fig. 6, is within experimental error of the size predicted from
the full-length elt-2 cDNA.
Figure 6:
Northern blot analysis of elt-2 expression during C. elegans development. Five µg of
total RNA isolated from staged populations of worms were
electrophoresed on each gel lane, blotted, and probed with the
full-length elt-2 cDNA clone as described under
``Experimental Procedures'' (7-day exposure). The blot was
also probed with a ribosomal protein gene (rp21, obtained from
Dr. J. Spieth, Indiana University, Bloomington, IN) to confirm RNA
loading quantities (overnight exposure). The size of the elt-2 mRNA was estimated from RNA standards run on the same gel (not
shown). Stages are represented as follows: Oo =
oocytes, prepared as in Stroeher et al. (15); Em = embryos, isolated by alkaline hypochlorite digestion of
gravid adults; L1, L2, L3, and L4 = larval stages; Gr = gravid adults; and Mx = mixed stage population.
-C-X-C-X-C
was detected in ELT-2 just upstream of the true finger, there was no
evidence that this region bound GATA-containing oligonucleotides. On
the other hand, the upstream fingers of two-finger GATA factors also do
not appear to bind DNA in the absence of the downstream
finger (33, 34). Thus it remains possible that this
upstream domain in ELT-2 somehow contributes to binding strength or
binding specificity. A different view is that this motif is an
evolutionary relic of a former finger.
/EMBL Data Bank with accession
number(s) U25175.
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.