(Received for publication, August 9, 1995)
The second intron of the human β globin gene (βIVS2) has been previously identified as a region required for proper expression of β globin. To further characterize this region, we have footprinted the entire βIVS2 and have analyzed regions of
interest by electrophoretic mobility shift assay. Through these studies
we have identified four utilized binding sites for the erythroid
regulatory factor GATA-1, two sites bound by general transcription
factor Oct-1, two sites bound by the nuclear matrix attachment DNA
binding protein special A-T-rich binding protein 1, and a site bound by
a potential homeobox protein. Additionally, we have found several factors displaying temporal or tissue specificity by electrophoretic mobility shift assay, which may be involved in the regulation of β globin expression. These proteins are not
supershifted by antibodies to factors important in erythroid regulation
such as GATA-1, NFE-2, or YY1, or by antibodies against more general
transcription factors.
The 70-kb human β globin gene complex has been
extensively studied as a model of gene regulation. The region consists
of 20 kb known as the locus control region, or LCR, which can confer
erythroid-specific expression and position independence on any gene of
interest (1, 2, 3), followed by a series of β family genes, five of which are temporally expressed (embryonic ε, fetal Gγ, Aγ, adult δ and β) (4, 5). A number of nuclear proteins have been identified that play a role in transcription of one or more β family genes, through binding to the LCR, promoter, and/or enhancer sequences of these genes (6, 7, 8, 9, 10, 11, 12).
Some of these factors may have a role in the normal temporal switching
of globin, as switching has been shown to be regulated at the level of
transcription (4, 5) . Despite these many findings,
the details of the molecular mechanisms regulating β family gene expression and switching are still unclear.
The adult human β globin gene has been shown to include enhancers 3′ to the structural gene and within the gene itself, specifically in the region spanning the 3′-end of IVS2 and the beginning of exon 3 (13, 14, 15). Previous work has indicated that βIVS2 is required for β globin expression (16).
Two DNase I hypersensitive sites have been identified within the β globin structural gene, a stronger site in exon 3 and a weaker site in the center of IVS2 (17). The βIVS2 intronic enhancer region has a utilized binding site for erythroid transcription factor GATA-1. We have previously shown in murine erythroleukemia
(MEL) cells that the IVS2 sequences from the
globin gene will not
substitute for βIVS2 in human β globin expression. The replacement of βIVS2 with
IVS2 leads to a substantial
decline in the expression of β globin and renders the gene uninducible (18). βIVS2 has also been shown to be a nuclear matrix attachment region (MAR), one of 10 found in the β globin complex (19, 20). These MARs may be involved in transcription, splicing, and replication of DNA (21, 22, 23).
In order to identify DNA binding proteins that may play a role in the expression of human β globin, we have characterized the entire human βIVS2 region by DNase I footprint analysis. Based on the footprint pattern and on a computer-generated map of potential transcription factor binding sites, sites were chosen for analysis by electrophoretic mobility shift assay (EMSA). We report here evidence that three of nine potential GATA-1 binding sites are utilized, as well as a fourth degenerate GATA-1 site. Additional proteins seen to bind βIVS2 are Oct-1, a ubiquitous factor binding to a homeobox consensus site, several stage- or tissue-specific factors, and the nuclear matrix binding protein SATB1. The binding of this latter protein to βIVS2 suggests that βIVS2 and nuclear matrix attachment may be involved in the regulation of β globin transcription.
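The computer-generated map of potential GATA-1 sites mentioned above can be approximated by a simple motif scan. The following is a minimal sketch, not the authors' procedure: it assumes the commonly quoted GATA consensus WGATAR, (A/T)GATA(A/G), and the function names and the toy sequence in the usage note are hypothetical. A consensus match only marks a potential site; as the footprinting and EMSA results show, many such sites are not utilized.

```python
import re

# Assumption: the GATA-1 consensus is taken as WGATAR, i.e. (A/T)GATA(A/G).
# The lookahead lets overlapping matches be reported.
GATA_CONSENSUS = re.compile(r"(?=([AT]GATA[AG]))")
MOTIF_LENGTH = 6
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gata_sites(seq):
    """Scan both strands for WGATAR.

    Returns a sorted list of (position, strand, motif), where position is the
    1-based coordinate of the leftmost base of the site on the sense strand.
    """
    seq = seq.upper()
    hits = []
    # Sense strand: report the match start directly.
    for match in GATA_CONSENSUS.finditer(seq):
        hits.append((match.start() + 1, "+", match.group(1)))
    # Antisense strand: scan the reverse complement, then map back.
    antisense = reverse_complement(seq)
    for match in GATA_CONSENSUS.finditer(antisense):
        left = len(seq) - (match.start() + MOTIF_LENGTH)
        hits.append((left + 1, "-", match.group(1)))
    return sorted(hits)
```

For example, `find_gata_sites("CCAGATAAGGTTATCTCC")` reports one site on each strand of this made-up sequence.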
Figure 1: βIVS2 subclones for footprint analysis. Restriction map of the three subclones (I, II, and III) used in footprinting βIVS2, as described under "Materials and Methods." βIVS2 has been numbered 1-850, with 1 corresponding to the start of IVS2 (base number 62684 in the GenBank human globin gene complex sequence, accession number J00179).
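The numbering convention in this legend (IVS2 position 1 corresponds to base 62684 of GenBank J00179, and the intron is numbered 1-850) amounts to a fixed offset. A minimal sketch of the conversion follows; the function names are hypothetical and are introduced only for illustration.

```python
# Values taken from the figure legend: IVS2 position 1 corresponds to base
# 62684 of GenBank J00179, and the intron is numbered 1-850.
IVS2_START_GENBANK = 62684
IVS2_LENGTH = 850


def ivs2_to_genbank(position):
    """Convert an IVS2 position (1-850) to its GenBank J00179 base number."""
    if not 1 <= position <= IVS2_LENGTH:
        raise ValueError("IVS2 position must be between 1 and 850")
    return IVS2_START_GENBANK + position - 1


def genbank_to_ivs2(base):
    """Convert a GenBank J00179 base number to an IVS2 position (1-850)."""
    position = base - IVS2_START_GENBANK + 1
    if not 1 <= position <= IVS2_LENGTH:
        raise ValueError("base number lies outside IVS2")
    return position
```

With these definitions, the last IVS2 position (850) maps to base 63533.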
All three subclones were entirely footprinted at least
twice each in both orientations, using nuclear extracts from
CEM(24) , HEL-92(25) , HeLa(26) ,
K562(27) , and MEL (28) cells. CEM is a human
T-lymphocyte cell line; HEL-92 and K562 are human fetal-embryonic
hematopoietic lines; and MEL is a murine adult hematopoietic line. HeLa
cells are human cervical carcinoma cells and are not hematopoietic.
Linearized subclones were dephosphorylated and end-labeled with [γ-³²P]ATP using T4 polynucleotide kinase.
Following a second restriction digest, probes were polyacrylamide gel
electrophoresis-purified on a 5% nondenaturing polyacrylamide gel (29) . Some of each probe was subjected to Maxam-Gilbert
sequencing (29) to provide a sequence ladder for footprint
gels. Labeled probe was footprinted by DNase I digestion as described
previously (30) using 50 µg of nuclear extract and 160 ng
of DNase I/reaction, except where indicated.
Figure 2:
Characterization of nuclear factor
binding sites in IVS2. All experiments were performed using CEM,
HEL-92, HeLa, K562, and MEL nuclear extracts.
βIVS2 has been numbered 1-850, with 1 corresponding to the start of βIVS2 (base number 62684 in GenBank β globin complex sequence).
Included as reference points are the BamHI site in exon 2 and
the EcoRI site in exon 3. A, sites of footprints
observed by DNase I digestion of end-labeled probe. Thicker bands are footprints observed on both DNA strands; thinner bands are footprints observed on one strand. Sites have been labeled as
to which extracts generated the footprint and are numbered 1-14 for easy reference. s, sense; a,
antisense; C, CEM; E, erythroid (HEL-92, K562, and
MEL); A, all extracts; EF, embryonic-fetal (HEL-92
and K562); hs, hypersensitive site. B, consensus
sites for binding of GATA-1(33, 34) . C,
sites found experimentally to bind GATA-1.
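The footprint labels defined in this legend (C, E, EF, and A) depend only on which of the five nuclear extracts produced a footprint. As an illustration, the sketch below maps a set of extracts to the corresponding label. The helper name is hypothetical, and combinations that the legend does not name return None rather than a guessed label.

```python
# The five nuclear extracts used for footprinting, as listed in the legend.
ALL_EXTRACTS = frozenset({"CEM", "HEL-92", "HeLa", "K562", "MEL"})

# Label definitions from the legend: C, CEM; E, erythroid (HEL-92, K562, MEL);
# EF, embryonic-fetal (HEL-92 and K562); A, all extracts.
FOOTPRINT_LABELS = {
    frozenset({"CEM"}): "C",
    frozenset({"HEL-92", "K562", "MEL"}): "E",
    frozenset({"HEL-92", "K562"}): "EF",
    ALL_EXTRACTS: "A",
}


def footprint_label(extracts):
    """Return the legend label for a set of extracts, or None if unnamed."""
    key = frozenset(extracts)
    unknown = key - ALL_EXTRACTS
    if unknown:
        raise ValueError("unknown extract(s): " + ", ".join(sorted(unknown)))
    return FOOTPRINT_LABELS.get(key)
```

A footprint seen with CEM, HEL-92, and K562 extracts, as reported for site 12, has no single-letter label in this scheme, so the helper returns None for it.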
Figure 3:
EMSA of site 2 with the 42-bp site 2
oligonucleotide. Gels are 5% polyacrylamide. A, EMSA with
nuclear extracts as marked. Lane 1, control with no nuclear
extract. Bands 1a and b are seen only in erythroid
cell lines HEL-92, K562, and MEL. B, competition EMSA of site
2; competitor oligonucleotides as marked. GATAm is the mutant GATA
oligonucleotide. Lanes 2, 5, and 7 include a 10-fold molar excess of unlabeled competitor probe; lanes 3, 6, and 8 include a 100-fold molar excess of unlabeled competitor probe.
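The fold molar excess of competitor quoted in the EMSA legends can be turned into a mass of unlabeled oligonucleotide once the mass and length of the labeled probe are known. The sketch below assumes that probe and competitor are both double-stranded DNA with the same average mass per base pair, so that moles scale as mass divided by length. The probe masses in the usage note are hypothetical; the text does not state them.

```python
def competitor_mass_ng(probe_ng, probe_bp, competitor_bp, fold_excess):
    """Mass (ng) of unlabeled competitor giving the requested molar excess.

    Assumption: probe and competitor are double-stranded DNA with the same
    average mass per base pair, so moles are proportional to mass / length
    and the per-base-pair mass cancels out.
    """
    if min(probe_ng, probe_bp, competitor_bp, fold_excess) <= 0:
        raise ValueError("all arguments must be positive")
    return fold_excess * probe_ng * competitor_bp / probe_bp
```

For a hypothetical 0.5 ng of the 42-bp site 2 probe, a 100-fold molar excess of a competitor of the same length is `competitor_mass_ng(0.5, 42, 42, 100)`, that is, 50 ng. A competitor half as long needs half the mass for the same molar excess.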
Figure 8:
EMSA of site 7 with the 40-bp site 7
oligonucleotide. Reactions include 10 µg of the indicated nuclear
extract and 5 µg of poly(dIdC) as nonspecific competitor.
Gels are 5% polyacrylamide. A, lane 1, control with
no nuclear extract; lane 8, MEL nuclear extract with
anti-GATA-1 antibody (Ab). Band 1, SATB1; band
2, GATA-1; band 3, ubiquitous band. B,
competition EMSA with CEM (lanes 1-8) and K562 (lanes 9-16) nuclear extracts, and competitor
oligonucleotide as marked. Lanes 3 and 11 include a 100-fold molar excess of unlabeled competitor oligonucleotide; all other competition lanes include a 1000-fold molar excess of unlabeled
competitor oligonucleotide. Band 4 is a faint band that is
seen in CEM and K562 nuclear extracts when probe has been labeled to a
high specific activity. C, supershift EMSA with CEM (lanes
1-4) and K562 (lanes 5-8) nuclear extracts,
and antibodies (Ab) as marked.
Figure 4:
DNase I footprints of sites 5 and 6 of βIVS2. Lane 1, T+C lane of Maxam-Gilbert sequencing reaction; lane 2, C lane of Maxam-Gilbert sequencing reaction; lanes 3 and 9,
control with no nuclear extract and 20 ng of DNase I. Reactions include
50 µg of the nuclear extract indicated and 160 ng of DNase
I.
Figure 5:
EMSA of site 5 with the 60-bp site 5 oligonucleotide. Gels are 4%
polyacrylamide. A, EMSA with nuclear extracts as marked. Lane 1, control with no nuclear extract. Reactions include 10
µg of nuclear extract and 5 µg of poly(dIdC) as a
nonspecific competitor. Band 1, ubiquitous Oct-1 band; band 2, ubiquitous band; bands 3a and b,
bands specific to HEL-92 and K562 nuclear extracts; band 4,
band specific to MEL; band 5, band specific to CEM and K562. B, competition and supershift EMSA of site 5, with competitor
oligonucleotide or antibody (Ab) as marked. Lanes
1-6, K562 nuclear extract; lanes 7-11, MEL
nuclear extract. Lanes 8 and 10 include a 100-fold molar excess of unlabeled competitor oligonucleotide; all other oligonucleotide competition lanes include a 1000-fold molar excess of unlabeled competitor oligonucleotide.
Figure 6:
EMSA of site 6 with the 41-bp site 6
oligonucleotide. Gels are 3.5% polyacrylamide, with 5 or 10 µg of
each nuclear extract as marked and 5 µg of poly(dIdC) as
nonspecific competitor/reaction. A, lane 1, control
with no nuclear extract. Band 1, ubiquitous band; band
2a, HEL-92-, K562-, and possibly MEL-specific band; band 2b, HEL-92- and K562-specific band; band 3, SATB1; and band 4, ubiquitous "homeobox" band. Lower bands on the gel
are likely proteolytic degradation products and are not reproducible. B, competition and supershift EMSA of site 6 with CEM (lanes 1-4) and K562 (lanes 5-8) nuclear
extracts, with competitor oligonucleotide or antibody (Ab) as
marked. Lanes 2 and 6 include a 100-fold molar excess of unlabeled competitor oligonucleotide; all other competition lanes include a 1000-fold molar excess of unlabeled competitor
oligonucleotide. C, competition EMSA of site 6 with CEM (lanes 1-7) and K562 (lanes 8-13) nuclear
extracts and competitor oligonucleotides as marked. GATAm and oct-1m are the GATA and Oct-1 mutant oligonucleotides,
respectively. Lane 3 includes a 100-fold molar excess of unlabeled competitor oligonucleotide; all other competition lanes include a 1000-fold molar excess of unlabeled competitor
oligonucleotide. D, competition EMSA of site 6 with HEL-92
nuclear extract. ANT is the Antennapedia consensus oligonucleotide. Lanes 2 and 5 include a 10-fold molar excess of unlabeled competitor oligonucleotide; lanes 3 and 6, a 100-fold molar excess of unlabeled competitor oligonucleotide; lanes 4 and 7, a 1000-fold molar excess of unlabeled competitor oligonucleotide.
Interestingly, the nuclear matrix attachment DNA binding protein
SATB1 was found to bind to the site 6 probe in CEM nuclear extract (Fig. 6A, band 3). IVS2 has been described as one of nine sites containing a MAR in the 90 kb of globin gene sequence studied (19). Additionally, it has been noted that
MARs from the human
globin gene can bind SATB1(21) . A
SATB1 consensus oligonucleotide inhibits band 3 formation with the site
6 probe (Fig. 6C, lane 3), and the SATB1 band
is supershifted by an anti-SATB1 antibody (Fig. 6B, lane 4). SATB1 is a 103-kDa protein (39), and the SATB1 band runs very slowly on EMSA. It is also possible that the SATB1 band seen in Fig. 6B might be a complex of SATB1 and some
other protein. SATB1 complexes have been suggested in a recent paper
describing the binding of SATB1 to an
globin
regulatory region (20) . As was done for footprinted site 5, a
supershift assay was performed using the same panel of antibodies
against general factors, erythroid factors, and ets proteins. No
additional supershifts were seen by EMSA.
Figure 7:
DNase I footprint of IVS2 site 7. Lane 1, C lane of Maxam-Gilbert sequencing reaction; lane 2, control with 10 ng of DNase I; lane 3,
control with 20 ng of DNase I. Reactions with nuclear extract include
50 µg of nuclear extract except as marked, and were treated with
160 ng of DNase I.
A 28-bp oligonucleotide was synthesized to characterize a footprint (site 12, Fig. 9) seen with nuclear extracts CEM, HEL-92, and K562. This oligonucleotide generated a complex gel shift pattern. One band was seen only with CEM, HEL-92, K562 and NIH 3T3 embryonic fibroblast nuclear extracts (41) (data not shown). Further characterization of binding to this site 12 probe showed that the bands are not competed by a general factor binding oligonucleotide (AP-1), and no supershifts were observed with the anti-SATB1 antibody (data not shown).
Figure 9:
DNase I footprint of site 12 of βIVS2. Lane 1, C lane of Maxam-Gilbert sequencing
reaction; lane 2, control with no nuclear extract and 10 ng
DNase I; lanes 3 and 9, control with no nuclear
extract and 20 ng DNase I. Reactions include 50 µg of the nuclear
extract indicated and 160 ng DNase I.
Human globin IVS2 has been entirely footprinted and
further characterized by EMSA. Previous data have indicated that this
region has several interesting structural and functional features; it
contains a 3′-enhancer region (13, 14, 15) and two DNase I hypersensitive sites (17), and it is required for proper expression of the β globin gene (16). We have
previously analyzed the expression in MEL cells of
constructs in
which
IVS2 has been replaced by
or
globin IVS2, and
have found that these globin IVSs are not interchangeable. When
IVS2 is replaced with
IVS2, the base-line expression of
is
greatly decreased, and the cells are not inducible with Me₂SO (18). In addition, constructs in which
IVS2 has been replaced with
IVS2 produce
transcripts that
are improperly initiated in K562 cells(42) . Comparison of
,
,
, and
IVS2 using restriction maps and maps
generated by the tfsites data base reveals no significant sequence
conservation on the nucleotide level, and few conserved potential
transcription factor binding sites, except for two GATA-1 binding sites
that are conserved in position. The first is the second intronic GATA-1
site (Fig. 2), which is conserved in position between
and
globin. The second is the seventh GATA-1 site, which is conserved
in position between
and
globin. Neither of these sites was found to bind GATA-1 in our experiments, and neither appears to be functionally important, at least in the expression of β globin.
The footprint pattern of IVS2 and those areas studied
by EMSA have revealed a very dense and complex pattern of protein
binding (Table 1). Previous studies in which only the DNase I
hypersensitive site of murine
IVS2 was characterized also
revealed a complex gel shift pattern(43) . These gel shift
analyses covered about one-third of the total sequence of murine
IVS2. Two proteins were identified as binding to murine
IVS2,
GATA-1 and Spi-1/Pu.1, an Ets family protein(44) . These murine
IVS2 binding sites are not conserved in human
βIVS2. Human βIVS2 does contain four potential Ets binding sites, but only one of these is footprinted in human βIVS2 (site 14), and this one site has been shown to bind GATA-1 only.
The complexity of the binding pattern seen in IVS2 suggests that it is an area of complex regulatory function and might be involved in the regulation of β expression in the adult, and perhaps in earlier stages of
erythropoiesis. Certainly a regulatory function is supported by the
extensive binding of erythroid transcription factor GATA-1 to this
region. The redundancy of the GATA-1 consensus sequences alone (nine
sites), unique among the globin genes, indicates that some function is
likely. By comparison, the human
globin gene has only three
GATA-1 consensus sequences, and the human
gene only two GATA-1
consensus sequences. Three of the nine consensus sites in βIVS2 are bound by GATA-1, as is a fourth related sequence. It is interesting
that although binding sites for GATA-1 have been extensively
characterized(34, 35, 36) , one still cannot
predict with certainty which sites are utilized in vivo or in vitro.
Several stage- or tissue-specific bands were seen
on EMSA of IVS2. The gel shift analyses on site 5 in particular
revealed several bands of interest (Fig. 5, A and B), none of which could be supershifted by antibodies to
general transcription factors or known factors important in the
regulation of globin expression such as NFE-2, YY1, or GATA-1. Of
particular interest is a band seen only with MEL (adult erythroid) cells (Fig. 5A, band 4), as this could represent a potential factor for positive expression of β globin.
The two bands seen with HEL-92 and K562 (bands 3, a and b) and
faintly in CEM and the band seen only with CEM and K562 nuclear
extracts (band 5) are also intriguing. Perhaps these proteins
are not seen in murine MEL nuclear extract due to the species
difference, or possibly they play a role in embryonic-fetal
erythropoiesis, or in lymphoid cells. The HEL-92-, K562-, and possibly
MEL-specific band bound to the site 6 oligonucleotide (Fig. 6A, band 2a) can be approximately sized
due to the presence of SATB1 (band 3) binding to this
oligonucleotide in CEM nuclear extract. The HEL-92 and K562 specific
band runs more slowly than SATB1 which has a molecular mass of 103 kDa.
This size would be larger than known erythroid regulatory proteins,
with the exception of
PE.
PE is a 108-kDa protein that binds
to sites near human
globin(45). Its broad pattern of tissue
distribution argues against it being any of the uncharacterized
proteins we have found. The particular pattern of expression of the HEL-92- and K562-specific band, i.e. only in embryonic-fetal erythroid cells, could be relevant to down-regulation of β globin expression early in development. This could be of particular importance if GATA-1 binding to βIVS2 is indeed important in positive regulation of β globin expression, as GATA-1 is certainly present in embryonic-fetal erythroid cells. Another differentially expressed
band at site 12 (Fig. 9), seen with CEM, HEL-92, K562, and NIH
3T3 nuclear extracts, but not with the adult HeLa or MEL cells, is of
unknown significance.
We have found one potential homeobox protein
binding site in IVS2 (site 6, Fig. 6A, band
4). Previous data have shown that homeobox proteins may be
important in erythroid differentiation(46) . Eight of nine
genes in the HOX 2 cluster are expressed in erythroid cells, but rarely
in B or T cells(46, 47, 48, 49) .
There is also indirect evidence that HOX 3C may be necessary for adult
hematopoiesis(50) . A band found in all extracts at site 6 was
competed by an oligonucleotide with the Oct-1 consensus sequence
(ATTTGCAT) (Fig. 6C, band 4, lanes 6 and 12) and by an oligonucleotide with the Antennapedia
consensus sequence (CAATTAAA) (Fig. 6D, band
4, lane 7). The Antennapedia sequence is within the site
6 footprint and site 6 probe and is listed in the tfsites data base as
the engrailed consensus. This is the core consensus for many HOX
proteins including HOX B6, which may have a role in erythroid
differentiation(51) . However, we do not see any erythroid cell
specificity of the particular protein binding at this site. Although
the site 7 footprinted region and oligonucleotide contain the consensus
sequence for the homeobox protein bicoid, all bands seen with K562
nuclear extracts could be competed by GATA-1, SATB1, or Oct-1 sequence
oligonucleotides, and so all bands in erythroid cell extracts are
accounted for. There seems to be no erythroid-related homeobox binding at this site in βIVS2.
We have found that the nuclear matrix-associated DNA binding protein SATB1 binds to site 6 of βIVS2 with CEM nuclear extract and more intensely to site 7 of βIVS2 with CEM and K562 nuclear extracts (Fig. 6A, band 3; Fig. 8A, band 1). MARs are
postulated to play an important role in the functional organization of
chromatin loop domains. There is evidence that replication and
transcription occur at the interface of DNA and the nuclear matrix and
that the nuclear matrix is involved in RNA
splicing(21, 22, 23) . Recent reports have
indicated that DNA binding of some transcription factors is associated
with the nuclear matrix (52, 53, 54, 55) . MARs have a
strong potential for extensive unpairing or unwinding. Although MARs
often contain or reside close to enhancer sequences(21) , their
role is not clear as yet.
SATB1 is one of the characterized MAR
binding proteins. It is a 103-kDa protein that binds as a monomer and
is expressed primarily in thymus (21, 39) . It binds
selectively to MARs with well mixed ATC sequences (21, 39). IVS2 has been previously characterized as one of nine sites in the 90 kb of the human β globin gene locus that function as MARs (19). MARs are regions of DNA at least 200 bp in length and are generally 70% AT-rich (21, 22). The areas binding SATB1 in βIVS2 are about 73% AT-rich and do consist of a well mixed ATC sequence. Two distinct sites, in footprints 6 and 7, bind SATB1; two sites seem to be required for a strong SATB1 interaction to occur (21, 39). The site 6 and 7 oligonucleotides are, respectively, 83 and 75% AT-rich. The bands run very slowly on EMSA and may consist of a complex of SATB1 and some other protein, as has been suggested (20).
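The AT-richness figures quoted above (about 70% for MARs in general, 83 and 75% for the site 6 and site 7 oligonucleotides) are simple base-composition percentages. A minimal sketch of that calculation follows. The function names are hypothetical, the 70% default threshold is taken from the general MAR description in the text, and the sequence in the usage note is made up rather than one of the actual probes.

```python
def at_percent(seq):
    """Return the percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    at_count = seq.count("A") + seq.count("T")
    return 100.0 * at_count / len(seq)


def is_at_rich(seq, threshold=70.0):
    """True if the sequence meets the AT-content threshold (default 70%)."""
    return at_percent(seq) >= threshold
```

For instance, the made-up 10-base sequence `"AATTAATTGC"` contains eight A or T bases, so `at_percent` gives 80.0 and `is_at_rich` returns True at the default threshold.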
Preliminary data show that SATB1 is a
suppressor of transcription based on transient cotransfection assays
with a reporter gene(21) . One regulatory region to which SATB1
binds is the Igµ heavy chain intronic enhancer, which is flanked by
MARs. In this context SATB1 may help to repress expression in non-B
cells(56) . SATB1 has also been observed to bind the
globin gene(21) . Also, SATB1 has been recently reported to
bind to the human
3`-regulatory region at sites I and
IV(20) . These sites had been previously characterized as
binding HOX protein 2.8 (2H)(57, 58) . Besides being
highly expressed in CEM cells, SATB1 was also found in heart, skeletal
muscle, fetal liver, K562 cells, and B and T cells (20) . It
was proposed that the
regulatory region might
influence gene expression through interaction with the nuclear matrix.
The regulatory region was also found to be an MAR, and this group
speculated that promoter/enhancer interaction is mediated by SATB1
binding of MARs. However, they found no MAR near the
promoter(20) .
MARs and SATB1 binding in IVS2 could
have any of several functions. Previous data seem to indicate a
correlation between MARs and enhancer
regions(21, 22) . Also, each
family gene (except
) harbors an MAR, while by comparison, no such sites
exist in the large
globin gene complex(19) . MARs might
mediate an attachment between individual
globin family genes and
the
globin LCR (which also contains MARs)(19) , possibly
having some role in globin switching. The βIVS2 MAR might increase expression mediated by the βIVS2 enhancer, as the β 3′-MAR (19), situated 500 bp downstream of the β 3′-enhancer, might facilitate expression mediated by this enhancer. Alternatively, the βIVS2 MAR, in combination with GATA-1 binding or binding of other factors, might function as an independent enhancer in βIVS2. Possibly, MARs could mediate interaction between IVS2 and the β promoter or 3′-enhancer.
From the complexity of DNase I footprint
and EMSA results we have obtained, it is clear that there are many
interactions between human IVS2 sequences and nuclear factors,
both known factors and those yet to be characterized. The biological significance of the protein-DNA interactions in βIVS2, and of any interactions between βIVS2 and other regulatory sequences 5′ or 3′ to the human β gene, other β family genes, or the LCR, remains to be determined. The details of the relationships between DNA binding factors and globin gene switching also remain to be elucidated. Deletion analysis and site-directed mutagenesis of human βIVS2 trans-acting factor binding sites may provide new insights into the relationship between protein binding to this region and human β globin gene function.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) J00179.