A total of seven DNA bend sites were mapped in the 4.4-kilobase human β-globin gene region by the circular permutation assay. The periodicity of these sites (except one) was about every 700 (average 685.5 ± 267.7) base pairs. All of the sites contained the sequence feature of short poly(dA) tracts, which are typical of DNA bending. The positions of the sites relative to the cap site were identical to those in the ε-globin gene region, suggesting that the bend sites were conserved during the molecular evolution of the two globin genes. To explain this periodicity and the conservation of the sites within the evolutionarily unstable noncoding regions, we focused on the appearance of a potential bend core sequence, A₂N₈A₂N₈A₂ (A/A/A), and its complement, T₂N₈T₂N₈T₂ (T/T/T). These sequences appeared in or very close to most of the bend sites of the globin gene regions, whereas other A+T-rich sequences or candidates for DNA bending did not. The distances between any two of the core sequences in the entire β-globin locus showed a strong bias toward a length of about 700 base pairs and its multiples, suggesting that the periodicity exists throughout the locus. The data presented here strengthen the idea of sequence-directed nucleosome phasing.
The human β-globin locus consists of five active genes in the order ε-Gγ-Aγ-δ-β on chromosome 11 (1, 2). These active genes and several pseudogenes, as well as the genes in the α-globin locus, are considered to be derived from a single ancestral gene. The divergence of the ancestral globin gene first occurred about 500 million years ago (3, 4). Since then, the number of globin genes has increased through a series of duplications and subsequent diversification, which enabled the globin gene family to achieve its coordinated function, typified by the switching of globin genes during embryonic development to accommodate oxygen transport under different environmental conditions. In the course of globin gene evolution, duplication created two identical genes, including their coding regions. Although the noncoding regions experienced extensive mutation and insertion of transposable elements, which diversified their nucleotide sequences, the coding regions retained their homology through gene conversion (5, 6). The conservation of the coding regions can be explained by functional constraint: sequences that bear important functions tolerate few mutations and are therefore conserved during evolution.
Besides the coding regions, conserved sequences also exist in the noncoding regions, including the TATA, CAAT, and CAC boxes and the binding motifs for transcription factors such as GATA-1 (7). Gumucio et al. (8) reported that 12 sequence motifs in the region upstream of a β-like globin gene were conserved in several species as protein binding sites.
The sequence that confers DNA bending was first reported for kinetoplast DNA, and bent DNA has since been identified in the genes of a wide variety of species (9). Biological effects of DNA bending have been reported in connection with recombination, transcription, and replication (10, 11, 12, 13, 14, 15, 16, 17, 18, 19). The molecular structure that causes DNA bending is not yet fully understood, but repeated short poly(dA) tracts adopt an unusual conformation that shows extensive bending, especially when the tracts are distributed at roughly 10-base intervals (20, 21).
We reported that DNA bend sites are located in a very organized manner, at an interval of about 700 bp, in the noncoding region around the human ε-globin gene over 7 kb in length (22). The regularity of these bend sites in an evolutionarily unstable noncoding region suggests that they are significant, possibly in the organization of chromatin structure. We therefore proposed that the sites act as a signal for nucleosome phasing (22). We present here evidence that DNA bend sites have been conserved during the molecular evolution of the genomic sequence, which strengthens their significance in genome organization as well as function.
The positions of the bend sites are summarized in Table I, which also compares them with the bend sites in the human ε-globin gene region (22). The sequences in the region of the bend sites are shown in Fig. 2. All of them contained stretches of short poly(dA) tracts (shadowed in the figure), which are typical of DNA bending; apart from this, the sites had no apparent sequence features in common.
We previously proposed that the bend sites that appear at a 700-bp interval are used as a signal for nucleosome phasing and can facilitate efficient and accurate folding of the chromatin structure during chromosome condensation (22). Although it is still a matter of discussion whether DNA bending actually occurs in vivo, A+T-rich sequences tend to be excluded from nucleosomes and could therefore phase the nucleosomes and facilitate chromatin folding. Our present findings suggest that the periodicity of the bend sites is mainly directed by regions that include sequences containing the consensus A/A/A and that the periodicity is universal among eukaryotic genomes. Periodicity, or long range correlation, of nucleotide bases in eukaryotic genomes has been reported by several groups (28, 29, 30, 31, 32, 33). Peng et al. (32) suggested a long range correlation of nucleotide sequences in genes with introns and found that the correlation was absent in genes without introns and in mRNAs. The periodicity of nucleotide bases has also been shown by digesting genomic DNA with less base-specific nucleases such as DNase I (33). From the results of in vitro reconstitution of nucleosomes, van Holde and co-workers (34, 35) postulated that nucleosome positioning is an inherent property of nucleotide sequences. It has also been shown that nucleosomes can be phased by DNA sequences containing the dinucleotide AA/TT or the trinucleotide AAA/TTT (36, 37).
Materials
Restriction enzymes were purchased
from Takara (Kyoto) or New England Biolabs.
Plasmid Construct
Plasmids were constructed by subcloning fragments from a plasmid containing the region of the human β-globin gene between positions -2282 (PstI) and 2161 (PstI) in the pUC8 vector. The fragment in each plasmid was as follows: pBA41, -2256 (BamHI) to -1461 (BamHI); pβDU, -1844 (DraI) to -1029 (DraI); pβMHA30, -1444 (HincII) to -817 (HincII); pβMHB12, -817 (HincII) to -79 (MscI); pβBBA24, -406 (BstYI) to 321 (BstYI); pβR65, -129 (RsaI) to 580 (RsaI); pβH22, 403 (HaeIII) to 1300 (HaeIII); pβHHc113, 403 (HaeIII) to 996 (DraI); pβHE41, 918 (MslI) to 1300 (HaeIII); pβDL7, 996 (DraI) to 1616 (DraI); pβE81, 1394 (EcoRI) to 2161 (PstI); and pβDES14, 1394 (EcoRI) to 1616 (DraI). Tandem duplicates of each fragment were inserted into the multiple cloning site of pBluescript SK(-).
Assay for DNA Bend Sites
The circular permutation assay for DNA bend sites originally described by Wu and Crothers (23) was performed as described previously (22). Briefly, about 1 µg of plasmid DNA containing duplicates of the region of interest was digested with the restriction enzymes shown in Fig. 1. After mixing with an internal calibration marker (PvuII or AvaI digests of M13mp18), the DNA was electrophoresed at 4 °C on 8% polyacrylamide gels (acrylamide:bisacrylamide = 19:1) in 50 mM Tris borate, 5 mM EDTA (TBE) buffer under the following conditions: 1 V/cm for 60-70 h for fragments longer than 600 bp, or 1.5 V/cm for 30-40 h for those shorter than 600 bp. Electrophoresis at 55 °C (3 V/cm for 7 h) was performed with at least one clone for each bend site to confirm that the bending was abolished at high temperature.
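The reasoning behind the permutation assay can be illustrated with a simple geometric model. This is a sketch under stated assumptions rather than the authors' analysis: it assumes that gel retardation depends only on how far the bend lies from the center of the permuted fragment, and the unit length, bend position, and site names in the example are hypothetical.

```python
def bend_offset(unit_len: int, bend_pos: int, cut_pos: int) -> int:
    """Distance (bp) of the bend from the left end of the permuted fragment.

    Cutting a tandem duplicate once per unit releases a unit-length fragment
    that starts at the cut site, i.e. a circular permutation of the unit.
    """
    return (bend_pos - cut_pos) % unit_len


def distance_from_center(unit_len: int, bend_pos: int, cut_pos: int) -> float:
    """How far the bend sits from the middle of the permuted fragment."""
    return abs(bend_offset(unit_len, bend_pos, cut_pos) - unit_len / 2)


def rank_by_expected_mobility(unit_len: int, bend_pos: int, cuts: dict) -> list:
    """Order cut sites from fastest to slowest expected band.

    Model assumption: a bend near a fragment end retards migration least,
    and a bend at the fragment center retards it most.
    """
    return sorted(
        cuts,
        key=lambda name: distance_from_center(unit_len, bend_pos, cuts[name]),
        reverse=True,
    )


if __name__ == "__main__":
    unit = 800  # hypothetical unit length (bp)
    bend = 300  # hypothetical bend center, in unit coordinates
    cuts = {"site_a": 290, "site_b": 500, "site_c": 700}  # hypothetical cut sites
    print(rank_by_expected_mobility(unit, bend, cuts))
    # ['site_a', 'site_b', 'site_c']: the cut 10 bp from the bend gives the
    # fastest band; the cut that puts the bend mid-fragment gives the slowest.
```

Reading a gel works in the opposite direction: the enzyme whose fragment migrates fastest points to a cut site close to the bend.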
Figure 1:
Mapping of DNA bend sites in the human β-globin gene region. a, the circular permutation assay performed at 4 °C or at 55 °C with the plasmid pβDU, which contains the region between -1844 and -1029. U, unit length fragments; M, 422-bp AvaI fragments from M13mp18 DNA. b, summary of mapping. The plasmid DNA containing tandem duplicates of the region shown as a thick horizontal line in each panel was digested with the enzymes shown below it. The migration distance relative to the fastest migrating band was adjusted using marker DNA fragments (422-bp AvaI fragments or 322-bp PvuII fragments from M13mp18 DNA, whichever was closer in size to the unit fragments), which are shown in the panel. The vertical bar in the panel indicates the average thickness of the bands. The bend sites, tentatively and operationally defined as the regions between the second and third nearest restriction sites (see Table I), are shown as shadowed boxes. An additional bend site (B-2′) between B-2 and B-1 is also indicated in the figure. The restriction enzymes were: 1, AccI; 2, AciI; 3, AlwNI; 4, ApaI; 5, BamHI; 6, BfaI; 7, BglII; 8, BsaHI; 9, BsmI; 10, BsmAI; 11, Bsp1286I; 12, BspHI; 13, BsrGI; 14, BstXI; 15, BstYI; 16, DdeI; 17, DraI; 18, DraIII; 19, EarI; 20, EcoO109I; 21, EcoRI; 22, EcoT14I; 23, HincII; 24, HindIII; 25, HphI; 26, MaeIII; 27, MscI; 28, MslI; 29, MunI; 30, MvaI; 31, NcoI; 32, NlaIII; 33, NlaIV; 34, NspI; 35, Sau96I; 36, SnaBI; 37, SspI. The restriction sites derived from the cloning vector are in parentheses.
Computer Analysis
DNA sequences were analyzed with the program supplied by Software Development Co., using the sequence of the human β-globin gene locus (73,326 bp; GenBank entry name HSHBB) or other sequences from the GenBank database.
Mapping DNA Bend Sites
Fig. 1 shows the results of mapping the DNA bend sites in the human β-globin gene region by the circular permutation assay (23, 24, 25). First, clones containing tandem duplicates of a DNA fragment of about 500 bp to 1 kb were constructed. Then, the plasmid DNA was digested with a restriction enzyme that cuts the unit sequence once, producing fragments of identical length but with the unit sequence in circularly permuted order. The positions of the bend sites were detected by polyacrylamide gel electrophoresis at 4 °C from the fastest migrating band among the fragments created by the various restriction enzymes, since a bent fragment migrates fastest when the bend lies near one of its ends, that is, near the cut site. The bend centers were likely to be located between the second and third nearest sites and close to the (first) nearest site, and we defined the DNA bend sites as the regions between the second and third nearest sites (22). Fig. 1a shows the results of mapping the bend site with pβDU. In this assay, the bend site is likely to be located between EcoT14I and HincII and close to BglII. The anomalous migration of the restriction fragments detected at 4 °C was abolished at 55 °C, confirming that the anomaly was due to DNA bending (22). A total of seven sites (B-3 to B+3 and B-2′) were mapped in the 5′ and 3′ regions as well as in the second intron (Fig. 1b). The positions of the bend sites relative to the canonical cap site were -1675 to -1461 for B-3, -1054 to -817 for B-2, -664 to -406 for B-2′, -373 to -210 for B-1, 714 to 918 for B+1, 996 to 1192 for B+2, and 1687 to 2032 for B+3, and the average interval between the sites (excluding B-2′) was 685.5 ± 267.7 bp. As shown in the figure, the presence of a bend site in the middle of a fragment generally resulted in 7-20% retardation upon electrophoresis. When a fragment contains no bend site, however, all permuted fragments have identical mobility (see pβR65). A clear bend site, B-2′, between -664 (Bsp1286I) and -406 (BstYI), mapped between sites B-2 and B-1 and was deduced to be an additional site from the comparative alignment of the sites between the two globin gene regions (described below).
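The reported average interval can be reproduced if each site is represented by the midpoint of its mapped range and B-2′ is left out. Using midpoints is an assumption, since the text does not say how intervals were measured. The mean matches the reported 685.5 bp; the spread depends on a convention the text does not state, so the population standard deviation printed here need not equal the reported ±267.7 exactly.

```python
from statistics import mean, pstdev

# Bend-site boundaries (bp relative to the cap site) from the mapping results;
# B-2' is excluded, as in the reported average.
SITES = {
    "B-3": (-1675, -1461),
    "B-2": (-1054, -817),
    "B-1": (-373, -210),
    "B+1": (714, 918),
    "B+2": (996, 1192),
    "B+3": (1687, 2032),
}


def midpoint(bounds):
    """Represent a mapped site by the center of its range (assumption)."""
    start, end = bounds
    return (start + end) / 2


mids = sorted(midpoint(b) for b in SITES.values())
intervals = [b - a for a, b in zip(mids, mids[1:])]

print(intervals)        # [632.5, 644.0, 1107.5, 278.0, 765.5]
print(mean(intervals))  # 685.5
print(round(pstdev(intervals), 1))  # population SD of the five intervals
```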
The bend sites in the human ε-globin gene region (B-3 to B+3) were mapped previously (22). The positions of the bend sites in the β- and ε-globin gene regions relative to their cap sites are close to each other, suggesting that these sites were conserved during the molecular evolution of the globin gene locus.
Figure 2:
Nucleotide sequences of the bend sites. The nucleotide sequences of bend sites B-3 to B+3 of the β-globin gene region are aligned in 10-nucleotide intervals to reveal the periodicity of short (at least three consecutive) poly(dA) tracts. The regions of potential bend centers are bracketed on the left.
Sequence Features of DNA Bend Sites
Since most of the bend sites in the β- as well as the ε-globin gene regions have three to more than 10 repeats of short (dA)ₙ (n ≥ 2) tracts at intervals of roughly 10, or multiples of 10, nucleotides, we focused on the appearance of short poly(dA) tracts with the consensus A₂N₈A₂N₈A₂ (A/A/A) and its complementary sequence T₂N₈T₂N₈T₂ (T/T/T). Based on the observation that DNA fragments with (dA)ₙ (n ≥ 4) tracts at 10-nucleotide intervals show substantial bending (26), these sequences are potential bend core sequences. We chose them because they encompass a wide variety of potential bend core sequences. A more stringent screen, one requiring longer A runs for example, would exclude many bending sequences and would not provide a reliable result. As shown in Fig. 3a, the A/A/A sequence appeared roughly once every few hundred bases and matched the bend sites in the human β-globin gene region well. In particular, A/A/A sequences were found in the regions B-2, B-1, B+1, and B+2 (see Fig. 3a) and in B-2′. Other sequences known for bending, of the AₓNᵧTₓNᵧ or GₓNᵧCₓNᵧ type (27), did not show such periodicity (data not shown). Similarly, other combinations of AA or TT dinucleotides at 10-nucleotide intervals, A/T/T (and A/A/T), T/T/A (and T/A/A), and A/T/A (and T/A/T), could not explain the bend sites (data not shown). Periodicity of the A/A/A sequence was also observed in the human ε-globin gene region (Fig. 3b).
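A minimal sketch of the consensus search, assuming A/A/A denotes the 22-nt pattern A₂N₈A₂N₈A₂ (two 8-nt spacers giving the 16 N positions used for the A+T filter) and that T/T/T is scanned on the same strand. The function names are hypothetical; this is not the program the authors used.

```python
import re

# A/A/A consensus A2N8A2N8A2: AA, 8 bases, AA, 8 bases, AA (22 nt in total).
# T/T/T is its complement, T2N8T2N8T2, searched on the same strand.
# The lookahead lets finditer report overlapping matches.
A_A_A = re.compile(r"(?=(AA[ACGT]{8}AA[ACGT]{8}AA))")
T_T_T = re.compile(r"(?=(TT[ACGT]{8}TT[ACGT]{8}TT))")


def spacer_at_count(core: str) -> int:
    """Count A or T among the 16 N positions (N1-N8 and N1'-N8') of a 22-nt match."""
    spacers = core[2:10] + core[12:20]
    return sum(base in "AT" for base in spacers)


def find_core_sequences(seq: str, max_at: int = 16) -> list:
    """Return sorted (start, core) hits for A/A/A and T/T/T on one strand.

    max_at=9 corresponds to the A+T <= 9/16 filter; the default keeps all hits.
    """
    seq = seq.upper()
    hits = []
    for pattern in (A_A_A, T_T_T):
        for m in pattern.finditer(seq):
            core = m.group(1)
            if spacer_at_count(core) <= max_at:
                hits.append((m.start(), core))
    return sorted(hits)


if __name__ == "__main__":
    core = "AA" + "GCGCTGCG" + "AA" + "GCGCTGCG" + "AA"
    print(find_core_sequences("CC" + core + "CC"))  # one hit at position 2
    print(find_core_sequences("A" * 22, max_at=9))  # [] because all 16 Ns are A
```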
Figure 3:
The appearance of a potential bend core sequence A/A/A (and its complementary T/T/T) for DNA bending in the human β- (a) and ε- (b) globin gene regions. Six bend sites in each of the β-globin (B-3 to B+3) and ε-globin (B-3 to B+3) gene regions are shown by shadowed boxes. The nearest A/A/A position for each bend site is shown by a filled circle, and A/A/A sequences that are part of poly(dA) sequences are shown by open circles.
Periodicity of A/A/A Sequences in Human β-Globin Locus
To examine the periodicity of the A/A/A sequence in the entire human β-globin locus, we searched for the sequence in more than 70 kb of the locus and scored the distance between every pair of occurrences. Fig. 4a shows the distribution of the distances in the range of 1-2000 bp. The distribution was biased, with two major peaks centered at 701-800 and 1301-1400 bp. Because some variants of the consensus A/A/A are too A+T-rich to confer bending, such as A₂(A₈)A₂(A₈)A₂ (all Ns are As), we removed the A₂N₈A₂N₈A₂ sequences in which the number of A and T among the 16 Ns exceeded nine; the remaining set is referred to as A+T ≤ 9/16 (Fig. 4b). This subtraction was also justified by the fact that most of the A+T-rich sequences matching the consensus A/A/A were a mix of adenines and thymines or the poly(dA) tail of retroposons or pseudogenes, rather than long runs of consecutive adenines or thymines of the kind that also confers bending (data not shown). More stringent conditions, A+T ≤ 8/16 for example, gave similar results, although the total numbers of sites were too small for statistical analysis. As shown in the figure, the two peaks that appeared in Fig. 4a remained predominant. This tendency was expressed as a deviation from a random distribution by a simple calculation: the total frequency of distances in the 701-800 bp and 1301-1400 bp bins was divided by the total number of distances in the 1-2000 bp range and multiplied by 10. Because these are 2 of the 20 100-bp bins, the value should be 1.00 if the distribution is random. In practice, a value below 1.00 is expected in the absence of periodicity, because A+T-rich sequences tend to cluster. The value for all A/A/A sequences in the β-globin gene locus was 1.36, and that for the A/A/A sequences with A+T ≤ 9/16 was 1.68. This indicated that the appearance of the A/A/A sequences, especially the less A+T-rich ones, was biased toward a periodicity of about 700 bp and its multiples, which explains the periodicity of the bend sites in both globin gene regions. Furthermore, as shown in Fig. 4c, the base distribution in the variants of the A/A/A (A+T ≤ 9/16) sequences exhibited the following features that enhance bending efficiency (21): a preference for a longer A stretch (A at N7 and N8, resulting in 3-4 consecutive As), interruption of the A stretches by G (G at N1, N1′, and N8′), and a preference for T in the middle (N3, N5, N3′, and N5′). We also noted peaks at 1001-1100 and 1701-1800 bp, located midway between the peaks at multiples of about 700 bp. A similar survey of human and other eukaryotic genomes, eukaryotic mRNAs, and Escherichia coli and viral genomes revealed that the bias toward multiples of about 700 bp is universal among eukaryotic genomes (data not shown), whereas eukaryotic mRNAs and E. coli and viral genes had no such bias.
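The distance survey and the deviation score can be written out as follows, assuming the motif positions along the locus are already available as integers. The helper names are hypothetical, and merging nearby hits into their mean by chaining is one reading of the Fig. 4 legend rather than a stated procedure.

```python
from itertools import combinations


def merge_close(positions, window=30):
    """Collapse hits within `window` bp of the previous hit into their mean.

    Chaining consecutive hits is an assumption; the Fig. 4 legend only says that
    hits within 30 bp are represented by the middle one or the mean.
    """
    merged, group = [], []
    for p in sorted(positions):
        if group and p - group[-1] > window:
            merged.append(sum(group) / len(group))
            group = []
        group.append(p)
    if group:
        merged.append(sum(group) / len(group))
    return merged


def distance_histogram(positions, max_dist=2000, bin_size=100):
    """Count pairwise distances in bins 1-100, 101-200, ..., 1901-2000 bp."""
    bins = [0] * (max_dist // bin_size)
    for a, b in combinations(sorted(positions), 2):
        d = b - a
        if 1 <= d <= max_dist:
            bins[int((d - 1) // bin_size)] += 1
    return bins


def periodicity_score(bins, peak_bins=(7, 13)):
    """10 x (701-800 bp + 1301-1400 bp counts) / all counts in 1-2000 bp.

    Two of the 20 bins hold 1/10 of the distances under a uniform distribution,
    so the score is about 1.00 when there is no bias.
    """
    total = sum(bins)
    if total == 0:
        return float("nan")
    return 10 * sum(bins[i] for i in peak_bins) / total


if __name__ == "__main__":
    hits = merge_close([100, 110, 850, 1450])  # hypothetical motif positions
    bins = distance_histogram(hits)
    print(hits, periodicity_score(bins))
```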
Figure 4:
Distribution of the distances between any two A/A/A sequences in the range of 1-2000 bp in the human β-globin locus. a, a survey of all A/A/A sequences. b, a survey of the A/A/A sequences with A+T ≤ 9/16. The positions of 701-800 bp and 1301-1400 bp are indicated by filled arrows, and those of 1001-1100 bp and 1701-1800 bp by open arrows. Two or more A/A/A sequences appearing within 30 bp are represented by the one in the middle or by their mean. c, the relative appearance (in %) of each base in the A/A/A sequences (A+T ≤ 9/16). The predominant bases (at least 5% higher than the others) are shadowed.
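Panel c tabulates base usage at each spacer position. A sketch of that tabulation, assuming the positions are labeled N1-N8 and N1′-N8′ in the order they occur within A₂N₈A₂N₈A₂, and reading "at least 5% higher" as five percentage points above every other base (an interpretation); the function names are hypothetical.

```python
from collections import Counter

# Spacer labels in the order they occur in A2N8A2N8A2 (assumed numbering).
SPACER_LABELS = [f"N{i}" for i in range(1, 9)] + [f"N{i}'" for i in range(1, 9)]


def spacer_composition(cores):
    """Percentage of each base at the 16 N positions of 22-nt A/A/A matches."""
    table = {}
    for idx, label in enumerate(SPACER_LABELS):
        pos = 2 + idx if idx < 8 else 4 + idx  # skip the central AA (indices 10-11)
        counts = Counter(core[pos] for core in cores)
        table[label] = {base: 100 * counts[base] / len(cores) for base in "ACGT"}
    return table


def predominant_bases(table, margin=5.0):
    """Bases exceeding every other base at a position by at least `margin` points."""
    result = {}
    for label, freqs in table.items():
        best = max(freqs, key=freqs.get)
        if all(freqs[best] - v >= margin for b, v in freqs.items() if b != best):
            result[label] = best
    return result
```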
Sequence Features for DNA Bending at the B-2 Sites
To examine whether the A/A/A sequences are responsible for the observed DNA bending, we focused on the A/A/A sequences at the B-2 sites of the β- and ε-globin gene regions. These sites were selected because both are well conserved between the β- and ε-globin genes and contain A/A/A sequences. We constructed a total of four serial deletions encompassing the B-2 region of each globin gene and compared the relative migration of fragments cut at the BamHI site, which was introduced at both ends (the common upstream position and the downstream positions N, S, M, or L) and served as a control site, with that of fragments cut at the ApaI (for β-globin) or HphI (for ε-globin) site, which should show a bending characteristic if a bend site is included (Fig. 5a). We took this approach because, in the circular permutation assay, the locations of other regions that potentially affect DNA bending are also permuted and mapping depends entirely on the availability of appropriate restriction sites, both of which complicate the mapping process. As shown in Fig. 5, b and c, when part of the B-2 site was present (the S, M, and L constructs for β-globin and the M and L constructs for ε-globin), DNA bending was significantly recovered. In both cases, although relatively large regions were required for full-scale bending, the regions containing the A/A/A sequences (underlined in the lower part of Fig. 5a) seemed to be partly responsible for the DNA bending. This was confirmed by the bending assay with concatenated oligonucleotides carrying the A/A/A sequences from these regions (Fig. 5, d and e), which indicates that fragments with these A/A/A sequences actually bend.
Figure 5:
Fine mapping of DNA bend sites at the B-2 sites of the β- and ε-globin genes. a, maps of the clones containing the B-2 regions. The downstream positions (N, S, M, or L) of the deletion constructs are indicated. Deletion constructs (clones pβN1-9, -1384 to -1097; pβS2-4, -1384 to -1047; pβM3-36, -1384 to -997; and pβLa-16, -1384 to -947, for β-globin; pεN4-7, -1367 to -985; pεS5-60, -1367 to -935; pεM6-33, -1367 to -885; and pεLb-13, -1367 to -835, for ε-globin) containing tandem duplicates of the indicated region (unit length fragment) were created by polymerase chain reaction using 28-nucleotide primers containing the BamHI site (GGATCCGC) at the 3′ end and cloned into the BamHI site of pBluescript SK(-). Therefore, each deletion construct has BamHI sites at the ends and produces unit length fragments when digested with this enzyme. The A/A/A sequences (locations -1056 to -1015 for β-globin and -944 to -923 for ε-globin) are indicated by shaded boxes (upper) and underlined (lower). The regions used for the bending assay with concatenated oligonucleotides are bracketed at the bottom. b, the bending assay with the deletion clones. Approximately 0.5 µg of plasmid DNA was digested with BamHI for control fragments (C), or with ApaI (for β-globin) or HphI (for ε-globin) for bend fragments (B), and electrophoresed on 8% polyacrylamide gels at 4 °C for 48 h. BamHI was replaced by BfaI for pβMHA30 (see Fig. 1b) and by ClaI for p200LE2III (22). The unit length fragments are indicated by dots (the sizes are shown in a). c, the migration of the bend fragments relative to the control fragments was calculated and plotted for each deletion construct (N, S, M, and L) and the original clones (All). The assay was repeated twice, and the standard deviations are indicated. d, the bending assay with concatenated oligonucleotides. The 20-base-long oligonucleotides containing the A/A/A sequences (-1056 to -1037 of the β-globin region and -944 to -925 of the ε-globin region) and the control oligonucleotides, 20-nucleotide-long poly(dA) and poly(dT) (A₂₀+T₂₀), which show normal migration (see Ref. 26), were annealed and ligated as described previously (22) and then electrophoresed on an 8% polyacrylamide gel at 4 °C for 23 h. e, the R values (ratio of apparent length to real length; see Ref. 26) for the A/A/A sequences of the β- and ε-globin regions obtained from d. The control A₂₀+T₂₀ was used as a size standard.
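The R value in panel e can be estimated as sketched here. The sketch assumes that apparent lengths are read from a log-linear interpolation between the two size standards bracketing a band, a common way of calibrating gels that the legend does not specify; the function names are hypothetical.

```python
import math


def apparent_length(distance, ladder):
    """Interpolate fragment length from migration distance on a log scale.

    ladder: (length_bp, migration_distance) pairs for size standards, e.g. the
    ligated A20+T20 multimers; the band must lie between two standards.
    """
    pts = sorted(ladder, key=lambda p: p[1])  # shortest migration (longest DNA) first
    for (len_a, d_a), (len_b, d_b) in zip(pts, pts[1:]):
        if d_a <= distance <= d_b:
            frac = (distance - d_a) / (d_b - d_a)
            log_len = math.log10(len_a) + frac * (math.log10(len_b) - math.log10(len_a))
            return 10 ** log_len
    raise ValueError("band lies outside the range of the size standards")


def r_value(distance, real_length, ladder):
    """R = apparent length / real length; R > 1 means the fragment runs anomalously slowly."""
    return apparent_length(distance, ladder) / real_length
```

For ligated 20-base oligonucleotides, `real_length` would be the length of the concatemer, a multiple of 20 bp.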
We previously identified a total of 10 bend sites in the 7-kb region around the human ε-globin gene (from -4615 to +2382 relative to its cap site), which appeared every 680 bp on average (22). Similar findings in the 5′ flanks of other eukaryotic genes prompted us to investigate the presence and positions of bend sites in the human β-globin gene region. As summarized in Fig. 1, a total of seven bend sites were mapped in the region. The positions of six of them, B-3 to B+3, relative to the cap site were strikingly similar to those in the ε-globin gene region, and the average distance between the sites was also about 700 bp. This raised the question of how and why these sites were conserved during evolution.
The human β- and ε-globin genes diverged about 200 million years ago; since then, although the coding regions (exons) have retained sequence homology, random mutations in the noncoding regions (introns and 5′- and 3′-flanks) have left sequences with almost no homology (3). This was typically shown by Harr-plot analysis, in which no stretches of conserved sequence, as judged by the presence of perfect matches eight or nine nucleotides long, were identified in the noncoding regions around the two globin genes (data not shown). Furthermore, analysis under less strict conditions, eight matches out of 10 nucleotides for example, also failed to reveal sequence conservation between the regions (data not shown). An Alu family sequence lies between sites B-3 and B-2 in one of the two globin gene regions (see Fig. 3). Although this sequence is absent from the corresponding interval (between B-3 and B-2) of the other region, the distance between the sites was conserved, suggesting that the structure of DNA bending, rather than the nucleotide sequence of the bend sites, was conserved during evolution. It is therefore reasonable to assume that the bend sites have been conserved because of their biological significance.
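The Harr-plot criteria used here (perfect matches of eight or nine nucleotides, or eight matches within 10 nucleotides) correspond to a windowed dot plot. A brute-force sketch, assuming a same-strand comparison only; the function name is hypothetical, and because the running time grows with the product of the two sequence lengths, it is meant for illustration rather than for whole loci.

```python
def dot_plot(seq_a, seq_b, window=10, min_matches=8):
    """Return (i, j) pairs where seq_a[i:i+window] and seq_b[j:j+window]
    agree at >= min_matches positions (a simple Harr-style dot plot).

    window=8, min_matches=8 is the 'perfect 8-nt match' criterion;
    window=10, min_matches=8 is the '8 out of 10' criterion.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    hits = []
    for i in range(len(seq_a) - window + 1):
        wa = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            wb = seq_b[j:j + window]
            if sum(x == y for x, y in zip(wa, wb)) >= min_matches:
                hits.append((i, j))
    return hits
```

An empty result for the noncoding regions under both settings would correspond to the absence of conserved stretches described above.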
Given that nucleosomes can be phased by DNA sequences containing the dinucleotide AA/TT or the trinucleotide AAA/TTT (36, 37), and that the bend sites in the intergenic regions have been maintained throughout genome evolution, the periodicity of the bend sites in eukaryotic genomes is unlikely to be a result of nucleosome phasing; instead, it could be actively involved in forming and arranging the nucleosomes.
Table I:
Comparison of the bend sites between the human β- and ε-globin gene regions
-globin Domain, in Tissue-specific Gene Expression. (Renkawitz, R., ed) VCH Publishers, Weinheim, Germany
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.