(Received for publication, September 11, 1995; and in revised form, December 20, 1995)
We have previously shown that the G-rich sequence GCG(GGT)GG in the promoter region of the chicken β-globin gene poses a formidable barrier to DNA synthesis in vitro (Woodford et al., 1994, J. Biol. Chem. 269, 27029-27035). The K+ requirement, template-strand specificity, template concentration independence, and involvement of Hoogsteen bonding suggested that the underlying basis of this new type of DNA synthesis arrest site might be an intrastrand tetrahelical structure. However, the arrest site lacks the four G-rich repeats that are a hallmark of previously described intramolecular tetraplexes and contains a number of noncanonical bases that would be expected to greatly destabilize such a structure. Here we report evidence for an unusual K+-dependent intrastrand "cinched" tetraplex. This structure has several unique features, including the incorporation of bases other than guanine into the stem of the tetraplex, interaction between loop bases and bases in the flanking region, and base pairing between bases 3' and 5' of the tetrahelix-forming region to form a molecular "cinch." This finding extends the range of sequences capable of tetraplex formation as well as our appreciation of the conformational complexity of the chicken β-globin promoter.
Sequences that cause arrest of DNA synthesis have been identified in plasmids, viruses, and chromosomes. Some of these arrest sites signal the point of replication termination in plasmids and chromosomes (1). Others are associated with phenomena such as the amplification of genomic sequences (2), strand switching during replication (3, 4, 5, 6), or mutational hotspots (7). Some of these sequences act by binding specific proteins that then block progression of the polymerase (8, 9, 10), while others block DNA synthesis by forming DNA structures that are sufficient in and of themselves to impede the progress of the polymerase (2, 6, 11, 12, 13, 14, 15, 16).
Previously identified structures implicated in DNA synthesis arrest include hairpins (14) and triplexes (2, 17).
We have recently described a strong K+-dependent DNA synthesis arrest site in a G-rich region of the chicken β-globin gene promoter (18). The sequence of the arrest site is shown in boldface type in Fig. 1A. The arrest site is composed of three independent blocks to DNA synthesis (K1, K2, and K3), suggesting that three different structures impede the polymerase. The first block is the strongest, and under some conditions no chain extension is seen beyond this site (Fig. 1B). The characteristics of this region are not consistent with any previously defined category of DNA synthesis arrest site.
Figure 1:
A, sequence of a portion of the chicken β-globin promoter (GenBank locus: CHKHBBA) (32), indicating the sequence of the previously identified DNA synthesis arrest sites and some of the bases flanking the arrest site (18). The arrest site sequence is shown in boldface type, with boldface numbers above the sequence indicating the numbering scheme for arrest site bases used in this report. The arrest site is thus labeled 1-26, with G1 being the 5'-most base and G26 the base at the 3' end of the arrest site. G1 corresponds to the residue 195 bases upstream of the start of transcription. The positions of the previously described DNA synthesis arrest sites, K1-K3, are also marked, and the relative strength of each arrest site is indicated by the number of filled arrowheads at that position. B, arrest of DNA synthesis by the templates GCG(GGT)GG (chicken β-globin promoter) and G (an uninterrupted G run of the same length) in the presence (+) and absence (-) of 40 mM K+. DNA synthesis arrest assays were performed as described previously (18). C, diagram of a generic intrastrand tetraplex.
We have previously shown that the underlying physical basis of this block to DNA synthesis is the formation of a series of intrastrand DNA structures that involve Hoogsteen base interactions between guanines (18). It is known that some G-rich sequences associate into higher-order structures via guanine tetrad formation. Four DNA strands containing sequences with a single G-rich motif can associate to form an intermolecular tetraplex referred to as G4 DNA (19). Sequences containing two G-rich repeats can form G-G hairpins that can then dimerize to form tetraplexes made up of two DNA strands (20), and sequences with four G-rich repeats or long G runs (21) can fold back to form an intrastrand tetraplex. An example of a generic intrastrand tetraplex is shown in Fig. 1C.
The properties of the chicken β-globin DNA synthesis arrest site are consistent with the formation of an intrastrand tetraplex. Arrest is template concentration-independent, is specific to the G-rich strand, is stable at elevated temperatures, requires K+, and involves non-Watson-Crick base interactions between guanines. The K+ specificity is particularly compelling. The binding constants of alkali metal ions to the phosphate groups in DNA are known to decrease slightly with increasing ionic radius, i.e. Li+ > Na+ > K+ > Rb+ > Cs+, so it is difficult to rationalize K+ specificity in terms of a hairpin or other similar structure. It has been suggested that the K+ specificity of tetraplexes results from a size constraint for which K+ ions are particularly well suited (20). Hydrogen bonding between the four strands of a tetraplex creates an internal cavity that would exclude large ions such as Cs+. Small ions such as Li+ can fit inside the cavity but are too small to form stable complexes with multiple ligand binding sites within it. It has been claimed that the K+ ion is both small enough to fit inside the cavity and large enough to bridge multiple binding sites within it (20). It would thus form octahedral coordination complexes with the O-6 atoms of adjacent tetrads and thereby stabilize the tetraplex. However, the chicken β-globin promoter arrest site sequence GCG(GGT)GG lacks the repeated motif normally associated with tetraplexes. It also contains a number of non-guanine bases that might be expected to reduce the stability of any tetraplex it forms.
Data presented here indicate that the chicken β-globin promoter DNA synthesis arrest site does indeed form an intrastrand tetrahelical structure in the presence of K+. However, this structure differs from conventional tetraplexes in a number of important respects. Several non-guanine bases are incorporated into the stem of the tetraplex. The structure is also stabilized by interactions between a loop guanine and a guanine in the flanking region. Finally, hydrogen bonding between bases in the 5'- and 3'-flanking regions forms a "cinch" that holds one end of the tetraplex together. We suggest that this stabilizing effect is due to duplex formation by the G-rich flanking sequence, which effectively closes off the "open" end of the tetraplex. We also demonstrate that in the absence of K+, part of this region is able to form a hairpin containing a mixture of G-G and G-C base pairs. Our findings, together with those describing the triplex-forming ability of this same sequence, demonstrate the structural complexity of the chicken β-globin promoter. This conformational complexity may have implications for the transcriptional regulation of this gene. Because the absence of four perfect G-motifs does not preclude tetraplex formation, our data also indicate that the number of potential tetrahelix-forming sequences is much larger than previously thought. Our observations demonstrate a clear link between K+-dependent DNA synthesis arrest sites and tetrahelix formation. This suggests that K+-dependent blocks to DNA synthesis might be a general feature, and a useful diagnostic property, of this class of structures.
Modification of labeled oligonucleotides with dimethyl sulfate (DMS) was performed with reagents from a Maxam-Gilbert sequencing kit (Sigma), using a modified version of the manufacturer's procedure. Briefly, the oligonucleotide suspensions were diluted with 180 µl of DMS reaction buffer. DMS was added (1 µl to tubes with KCl and 0.5 µl to tubes without KCl), and the reactions were incubated at 18 °C for 1 min.
Bromoacetaldehyde (BAA) modification was carried out as follows. BAA was prepared as described previously (22). Labeled oligonucleotides were brought to a volume of 49 µl with distilled H2O, mixed with 1 µl of BAA, and incubated for 10 min at 37 °C. The volume was then brought to 100 µl with distilled H2O. The mixture was extracted with 100 µl of phenol:chloroform:isoamyl alcohol (25:24:1) and then with 100 µl of ether.
Formic acid modification was carried out using a final concentration of 36% formic acid for 40 s at 18 °C. Osmium tetroxide (OsO4; Sigma) modification was carried out as described by Palecek (23).
All reactions were stopped by precipitating the oligonucleotides with 1 ml of butanol. Pellets were washed with 70% ethanol, dried under vacuum, and resuspended in 100 µl of 1 M piperidine. The cleavage reaction was carried out for 30 min at 90 °C. Reactions were removed from heat and butanol-precipitated. Samples were resuspended in 6-10 µl of distilled H2O to which a volume of Sequenase Stop buffer (U.S. Biochemical Corp., Amersham Corp.) had been added. A portion of each reaction was run on a 20% polyacrylamide gel containing 7 M urea.
The image from the gel autoradiograph of the BAA chemical modification assays was captured using a CCD camera. The relative density of each band on the autoradiograph was determined using NIH Image (24).
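Numerically, the densitometric comparison behind Fig. 2 compares each band's density between the lanes run without and with K+. The sketch below illustrates that comparison under stated assumptions. It is not the analysis actually performed with NIH Image. It assumes one density value per position has already been exported for each lane. It normalizes each lane to its total signal to correct for loading, and it flags positions whose normalized density changes at least 2-fold. The function names and the threshold are hypothetical.

```python
def normalize(lane):
    """Scale band densities so that each lane sums to 1 (a simple loading correction)."""
    total = sum(lane)
    if total <= 0:
        raise ValueError("lane has no signal")
    return [d / total for d in lane]


def fold_change(plus_k, minus_k, floor=1e-6):
    """Per-position ratio of normalized density, (+K+ lane) / (-K+ lane)."""
    if len(plus_k) != len(minus_k):
        raise ValueError("lanes must cover the same positions")
    p, m = normalize(plus_k), normalize(minus_k)
    # floor avoids division by zero for bands that are absent without K+
    return [pi / max(mi, floor) for pi, mi in zip(p, m)]


def flag_positions(plus_k, minus_k, threshold=2.0):
    """Return 1-based positions whose reactivity rises or falls >= threshold-fold with K+."""
    enhanced, protected = [], []
    for pos, ratio in enumerate(fold_change(plus_k, minus_k), start=1):
        if ratio >= threshold:
            enhanced.append(pos)   # more reactive with K+ (e.g. an unpaired C)
        elif ratio <= 1.0 / threshold:
            protected.append(pos)  # less reactive with K+ (e.g. newly paired)
    return enhanced, protected
```

Consider hypothetical densities of [10, 10, 40, 10] with K+ and [10, 10, 10, 30] without. Here position 3 is flagged as enhanced and position 4 as protected. Normalizing to total lane signal is only one choice. Normalizing to bands outside the arrest site would be an equally reasonable alternative.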
We have previously shown that the chicken β-globin gene promoter contains a G-rich sequence, GCG(GGT)GG, that forms a strong DNA synthesis arrest site in the presence of K+ (18). The location of this arrest site in the promoter is shown in Fig. 1A. The individual bases in the arrest site are labeled 1-26, with base 1 being the 5'-most guanine in the arrest site (G1). In fact, this arrest site consists of a series of three successive blocks to DNA synthesis. Under some conditions, three stops are seen opposite successive T residues in the template (see the arrows labeled K1, K2, and K3 in Fig. 1A). Polymerase arrest is significantly more efficient at K1 than at K2 and K3, and under some conditions almost no read-through is seen past K1 (Fig. 1B). The amount of DNA synthesis arrest by the chicken β-globin promoter sequence is similar to that seen for a run of uninterrupted guanines of the same length (Fig. 1B). These blocks to DNA synthesis are eliminated if some of the guanines in this sequence are blocked at the N-7 position. This suggests that a series of structures involving non-Watson-Crick base interactions is responsible for DNA synthesis arrest. Arrest of DNA synthesis is independent of the anion present and is not seen in the presence of other cations such as Li+, NH4+, Rb+, or Cs+ (18). This K+-specific effect is thus not simply a general ion-screening effect. We have also previously shown that a hairpin-forming sequence, GC, of the same length as the β-globin arrest site does not form a K+-specific block to DNA synthesis. This suggests that the K+-specific effect seen in the chicken β-globin promoter is not due to hairpin formation. Neither the pattern of DNA synthesis arrest nor the ion specificity is consistent with the formation of triplexes (2).
Previous data showed that the arrest site is seen only when the G-rich strand serves as the template. Arrest is also independent of template concentration and occurs even when only femtomoles of template are present (18). These findings suggested that unusual intrastrand tetraplex-like structures might form the basis of the blocks to DNA synthesis.
The intrastrand nature of these structures was confirmed by the observation that even at very low oligonucleotide concentrations, corresponding to template concentrations at which the blocks to DNA synthesis are still clearly visible, no intermolecular associations of an oligonucleotide containing the arrest site were observed by nondenaturing polyacrylamide gel electrophoresis (data not shown). However, in gels containing KCl, this oligonucleotide migrates slightly faster than an oligonucleotide containing the complement of the arrest site, suggesting that it can form a more compact intrastrand structure. While the difference in mobility is small, it is reproducible and is consistent with mobility differences that we have found for known tetraplex-forming sequences(26) .
Figure 2:
Densitometric analysis of the bromoacetaldehyde modification of the arrest site oligonucleotide. The oligonucleotide was reacted with bromoacetaldehyde in the absence (A) and presence (B) of 40 mM K+ and treated with hot piperidine as described under "Materials and Methods." The products were resolved on a 20% sequencing gel. The gel was autoradiographed, and the autoradiographic image was captured using a CCD camera and imported into NIH Image for analysis. The C residue in the arrest site is marked with an arrow.
Figure 3:
Chemical modification of the GCG(GGT)GG sequence with DMS and OsO4 in the absence and presence of K+. The arrest site oligonucleotide was reacted with either DMS or OsO4 in the absence (0) and presence (K+) of 40 mM K+ as described under "Materials and Methods." The lanes labeled C represent control reactions in which no DMS or OsO4 was used.
Chemical modification reactions carried out under the same conditions as the DNA synthesis arrest assay produce a result that reflects the sum of the modification of all the structures in the mixture. A comparison of prematurely terminated chains with full-length products shows which species predominate. Most molecules form the structures that cause arrest at K1, with minor contributions from the structures that cause arrest at K2 and K3 (see Fig. 1). For practical purposes, therefore, the chemical modification data reflect the K1 structure.
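The comparison of terminated and full-length chains can be made quantitative. The sketch below computes the fraction of chains ending at each stop and the read-through at each site in a hypothetical lane. It assumes an end-labeled primer, so that band density is proportional to the number of chains. It also assumes that every chain reaches the first arrest site, which ignores any pausing upstream. The function name and example densities are illustrative, not data from this study.

```python
def arrest_profile(stops, full_length):
    """Fraction of chains ending at each arrest site, plus per-site read-through.

    stops: list of (site_name, band_density) in the order the polymerase meets them.
    full_length: density of the full-length product band.
    Assumes density is proportional to chain number (end-labeled primer).
    """
    total = sum(d for _, d in stops) + full_length
    if total <= 0:
        raise ValueError("no signal in lane")
    fractions = {name: d / total for name, d in stops}
    fractions["full-length"] = full_length / total

    read_through = {}
    reaching = total  # assumption: every chain reaches the first site
    for name, d in stops:
        passing = reaching - d
        read_through[name] = passing / reaching if reaching > 0 else 0.0
        reaching = passing
    return fractions, read_through
```

Consider hypothetical densities of 60, 15, and 5 at K1, K2, and K3 and 20 for full-length product. In that case 60% of chains stop at K1 and only 40% of polymerases read through it. A species distribution like this would make the modification data dominated by the K1 structure.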
To examine whether the cytosine residue at position 17 was base-paired, we first reacted the oligonucleotide with BAA and then treated it with formic acid, followed by piperidine. BAA reacts with the N-3 and N-4 positions of unpaired C residues. Treatment of a BAA-modified cytosine with formic acid enhances β-elimination by piperidine. Fig. 2 shows the results obtained for the BAA/formic acid modification of the arrest site oligomer. In the presence of 40 mM KCl, a strong band corresponding to the C17 residue is seen on a sequencing gel, which translates into a tall peak for this residue on densitometric analysis. In the absence of KCl, however, the density of this band is much reduced. These data suggest that C17 is modified by BAA in the presence of KCl, i.e. it is unpaired. In the absence of KCl, it is resistant to BAA modification and is thus involved in a hydrogen-bonding interaction. As expected, the C17 residue was not reactive in the presence or absence of K+ when treated only with formic acid (data not shown).
To examine the thymidine residues in the arrest site, the arrest site oligonucleotide was modified with OsO4 in the presence of pyridine. Under these conditions, addition across the C-5=C-6 double bond of thymidine residues forms osmium esters that are susceptible to cleavage by hot piperidine. OsO4 is significantly more reactive with unpaired residues. It has been used successfully as a probe for junctions between DNA conformations and for identifying loop regions in cruciforms (27 and references therein). Our results are shown in Fig. 3. All thymidine residues in the sequence were reactive in both the presence and absence of K+. However, modification of the T residues in the G tract was markedly more intense in the presence of 40 mM K+, with T being particularly susceptible to OsO4 modification in comparison with T and T.
DMS treatment of DNA methylates G residues at the N-7 position, which makes them susceptible to cleavage by piperidine. In the absence of KCl, cleavage with piperidine is significantly above background at all positions. G is most reactive, followed by G-G, G-G, and the guanines outside the arrest site (Fig. 3). Under these conditions, BAA modification indicated some protection of C17, suggesting that it is base-paired. The bases in the middle of the arrest site are slightly protected from DMS, and G is hyperreactive. This pattern is consistent with the formation of a stem-loop structure with G in the loop. In such a hairpin, the N-7 of each G in a G-G base pair would be available for DMS modification about 50% of the time. The base that constitutes the hairpin loop, G, would be the only base consistently available for DMS modification and would therefore appear hyperreactive. Since no arrest of DNA synthesis is seen under these conditions, this hairpin structure does not appear to block DNA synthesis. This is consistent with our observation that a G-C hairpin of the same length also does not block DNA synthesis under these conditions.
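The hairpin argument can be made explicit with a small expected-exposure model that yields the 50% figure. The model assumes, as the argument implies, that only one partner of each G-G pair uses its N-7 at any moment. It also assumes that the two partners swap this role equally often. The state labels, function names, and parameter `p_engaged` are hypothetical.

```python
def expected_n7_exposure(states, p_engaged=0.5):
    """Expected fraction of time each guanine's N-7 is free for DMS methylation.

    states: sequence of "pair" (G in a G-G pair) or "loop" (unpaired loop G).
    p_engaged: assumed fraction of time a paired G uses its own N-7
               (0.5 if the two partners swap roles equally often).
    """
    exposure = {"pair": 1.0 - p_engaged, "loop": 1.0}
    return [exposure[s] for s in states]


def hyperreactivity(states, p_engaged=0.5):
    """Ratio of mean loop-G exposure to mean paired-G exposure."""
    exp = expected_n7_exposure(states, p_engaged)
    paired = [e for s, e in zip(states, exp) if s == "pair"]
    loop = [e for s, e in zip(states, exp) if s == "loop"]
    if not paired or not loop:
        raise ValueError("need at least one paired and one loop guanine")
    mean_paired = sum(paired) / len(paired)
    if mean_paired == 0:
        raise ValueError("paired guanines are never exposed under this model")
    return (sum(loop) / len(loop)) / mean_paired
```

Consider a hairpin with four G-G pairs on each side of a single loop G (`["pair"] * 4 + ["loop"] + ["pair"] * 4`). Under this model the loop G is exposed twice as often as any stem G. This is the sense in which it "appears hyperreactive" while the stem shows only slight protection.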
In contrast, some residues were almost completely protected from DMS modification in the presence of 40 mM KCl (Fig. 3). In the presence of K+, DMS modification at G was similar to that of guanines outside the arrest site, and intermediate reactivity was observed at G, G, G, and G. DMS hyperreactivity was observed at position G. The reactivity of the remaining G residues was reduced to close to background levels. Protection of the N-7 position of guanine residues is diagnostic of structures containing G-G Hoogsteen base interactions. Some of the guanine residues appear completely protected from DMS modification. This indicates that they are involved in hydrogen-bonding interactions in which they act as N-7 donors almost all of the time. The DMS reactivity pattern observed in 40 mM NaCl was identical to that observed without potassium. The DMS protection observed in the presence of K+ is therefore not simply a general cation effect (data not shown).
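The categories used above are protected, intermediate, similar to flanking guanines, and hyperreactive. They can be expressed as a background-corrected comparison with the guanines outside the arrest site. A minimal sketch follows. The cut-offs are hypothetical and would have to be chosen from the gel data. The inputs are assumed to be band densities from a single lane, with the background taken from the no-DMS control lane.

```python
def classify_dms(densities, flanking_g, background,
                 protected_max=0.2, intermediate_max=0.8, hyper_min=1.5):
    """Label positions by DMS reactivity relative to guanines outside the arrest site.

    densities: band densities for the positions of interest (same lane).
    flanking_g: mean density of guanines outside the arrest site (reference = 1.0).
    background: density of the same region in the no-DMS control lane.
    All cut-offs are illustrative assumptions, not values from this study.
    """
    scale = flanking_g - background
    if scale <= 0:
        raise ValueError("flanking guanines must react above background")
    labels = []
    for d in densities:
        rel = (d - background) / scale  # 0 = background, 1 = like flanking G
        if rel <= protected_max:
            labels.append("protected")
        elif rel < intermediate_max:
            labels.append("intermediate")
        elif rel < hyper_min:
            labels.append("like flanking G")
        else:
            labels.append("hyperreactive")
    return labels
```

Consider a background of 5 and a flanking-G density of 25. Bands of 5, 15, 25, and 50 are then classified as protected, intermediate, like flanking G, and hyperreactive, respectively.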
Figure 4:
A sequence with an interrupted guanine motif can arrest DNA synthesis in vitro. The ability of two sequences, (TG) and Cstem ((TG)TGCG(TG)), to arrest DNA synthesis was tested. Sequencing reactions were conducted on plasmids bearing these sequences in the absence (0) and presence (K+) of 40 mM K+. The location where DNA synthesis arrest occurred in both sequences is shown by the arrow.
However, the arrest site in the chicken β-globin locus seems to contain at least three non-G interruptions. That this region still forms such a strong block to DNA synthesis indicates that additional stabilizing factors must be present that compensate in some way for these interruptions.
Figure 5:
DNA synthesis arrest patterns of mutants with altered sequence in the region proposed to form a molecular cinch. Construction of mutants of the chicken β-globin sequence that forms a block to DNA synthesis is described in the text. Positions 1-26 correspond to -195 through -169 of the chicken β-globin sequence (GenBank locus: CHKHBBA). Bases that differ from the wild-type arrest site are shown in outline. The position and number of triangles denote the position and relative strength of each arrest site seen in the presence of 40 mM KCl.
We have previously shown that the chicken β-globin promoter contains a strong composite arrest site for DNA synthesis in vitro (18). DNA synthesis arrest is template concentration-independent and is seen on only one strand. These facts suggested that the underlying physical basis was the formation of a series of intrastrand structures. The arrest site is G-rich, and the sequence 5'-GCG(GGT)GG-3' is necessary and sufficient to cause synthesis arrest. This suggested that arrest might involve G-G base interactions. The site is relatively short, lacks four clearly identifiable G-repeats, and contains a number of non-canonical bases. Despite this, the K+ specificity suggested that arrest was due to a series of intrastrand tetrahelical structures of some kind.
These conclusions are supported by the experiments shown here. In gel electrophoresis of oligonucleotides containing the arrest site in the presence of K+, a high-mobility species consistent with intrastrand folding was observed. A hairpin-forming sequence (GC) of the same length as the arrest site produces no K+-dependent block to DNA synthesis. This suggests that arrest of DNA synthesis by the chicken promoter is not due to hairpin formation.
Our chemical modification data are consistent with the major DNA synthesis arrest site being due to the formation of a novel intramolecular tetrahelical structure in the presence of K+. The complete protection of bases G-G from DMS modification indicates that guanine tetrads are involved. The hyperreactivity of G suggests that it might be located at the junction between the tetraplex and the bases 5' of it. The OsO4 hyperreactivity of the T just 3' of the arrest site defines the 3' limit of the bases involved in the structure. A tetraplex of the length defined by the distance between these two bases, i.e. 23 bases, would have three roughly equally spaced loops, at around G-G, G-G, and G-T. The DMS reactivity seen for bases G-G is confined to bases G, G, and G. It is hard to fit all of these reactive bases into the loops of the tetraplex. It therefore seems likely that the loop bases are not in fact DMS-reactive and that the reactivity at G, G, and G results from some other structural feature. The lack of reactivity of loop bases may be due to stacking interactions, transient base pairing within or between loops, or binding to K+. The reactivity of G, G, and G can perhaps be accounted for by placing them adjacent to some of the non-G bases. The reactivity of C17 with BAA and the OsO4 hyperreactivity of T and T suggest that these bases are all unpaired. Some G residues would lie in the same plane as G, G, and G, and their DMS protection is consistent with G-G-G base triplets. In each triplet, the DMS-reactive base would act as an N-7 acceptor but not an N-7 donor. The non-G base adjacent to the reactive G presumably fills the space that would normally be occupied by the fourth G in the tetrad but does not participate in hydrogen bonding. One possible structure that accounts for this pattern of reactivity is shown in Fig. 6B. In this structure, G is shown in a loop on the same side of the structure as G and G. Interaction among G, G, and G in a G-G-G triplet in which G and G act as N-7 donors would explain the DMS protection of G and G.
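A toy model can illustrate why a non-G base in the fourth position of a tetrad should leave exactly one guanine DMS-reactive, the one next to the non-G base. The model assumes the standard cyclic Hoogsteen arrangement of a G-quartet. In that arrangement, each guanine donates N1-H and N2-H hydrogen bonds to the O-6 and N-7 of its neighbor. The model further assumes that a non-G base in the fourth slot donates no hydrogen bond to the adjacent guanine's N-7. The direction of the cycle is also an assumption and would be mirrored for the opposite strand polarity. This scores only whether each N-7 is hydrogen-bonded. It is not a structural model of the arrest site, and it does not use the donor/acceptor terminology of the text above.

```python
def free_n7(plane):
    """Return indices of guanines whose N-7 is not hydrogen-bonded in a cyclic plane.

    plane: bases around the tetrad in cyclic order, e.g. ["G", "G", "G", "C"].
    Model assumption: base i receives an N2-H...N-7 hydrogen bond from base i-1
    (cyclically), and only a guanine can donate it.
    """
    n = len(plane)
    free = []
    for i, base in enumerate(plane):
        if base != "G":
            continue  # only guanine N-7 reactivity is scored here
        neighbour = plane[(i - 1) % n]
        if neighbour != "G":
            free.append(i)  # no donor: this N-7 stays exposed to DMS
    return free
```

In this model a complete tetrad, `["G", "G", "G", "G"]`, leaves no N-7 free, which corresponds to full DMS protection. A G-G-G triplet with a C in the fourth slot, `["G", "G", "G", "C"]`, leaves only index 0 exposed. That index is the guanine adjacent to the non-G base in the cycle.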
Figure 6:
Structures formed by the chicken β-globin sequence in the absence and presence of K+. Structural models were generated on the basis of chemical modification data as described in the text. In the absence of K+, the sequence CGCG(GGT)GG forms a hairpin structure (A) that does not present a block to DNA synthesis in vitro. In the presence of K+, the sequence GCG(GGT)GG forms a cinched tetrahelix (B). Bases adjacent to the four-stranded tetraplex structure interact to stabilize the structure. This structure presents a formidable block to DNA synthesis in vitro. The G residues shown in outline are those modified by DMS.
The pattern of DNA synthesis arrest by sequence variants confirms various details of the structure shown in Fig. 6B. Replacement of G-G with the residues T-C eliminates the second arrest site (K2) and reduces the extent of arrest at K1. Replacement of these residues together with a substitution of A for G eliminates the first arrest site altogether. On the other hand, replacement of G by C reduces but does not eliminate this arrest site. This might indicate that G is involved in hydrogen bonding in a context in which a C can substitute at least partially. We interpret the hydrogen-bonding contribution made by G in terms of a molecular cinch that holds the end of the tetraplex closed, making it more difficult for the polymerase to traverse this region. A G-to-A substitution at G eliminates the stop at K1, whereas a C at that position partially restores it. This might be because, according to this model, the top portion of the stem becomes folded back. A C at that position would permit interaction with this folded-back portion. An A would instead hydrogen bond to the T at the same end of the stem and provide no stabilization of the fold-back structure.
All of these variants block DNA synthesis considerably less effectively than the wild type. This suggests that bases G-G also contribute to the stability of the structure, perhaps through stacking interactions on the single strand or through pairing with bases outside the tetraplex-forming region. The DMS protection of G is consistent with a G-G interaction between G and G in which G acts as the N-7 acceptor. Deletion of T (Tout in Fig. 5) abolished the original pattern of DNA synthesis arrest and produced two new arrest sites, both located 3' of the original arrest site. This shows that T plays an important role in the structure of the arrest site. The arrest pattern in this mutant is also consistent with the structure shown in Fig. 6B. This holds if a G-G base pair with bases immediately flanking the tetrahelix is assumed to be an important stabilizing factor. In this case G would move into the tetrahelix, and G would be available for hydrogen bonding to G and G. However, because G has no hydrogen-bonding partner 3' of G, this interaction might not be stable, resulting in the stop at G. Complete tetrads are scarce in the wild-type arrest site structure. Even so, K+ may still be able to bind to guanines in adjacent rungs of the structure, since the internal dimension of the channel might still resemble that of a more conventional tetraplex.
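The variants discussed above combine substitutions with a deletion (Tout). Because the numbering scheme is defined on the wild-type arrest site, it helps to apply changes in wild-type coordinates and to record which positions differ. These are the bases drawn in outline in Fig. 5. The sketch below does this for an arbitrary sequence. The function name and the example sequence `GGGGCGGTGGTGG` are hypothetical and are not the chicken β-globin sequence.

```python
def make_variant(wild_type, substitutions=None, deletions=()):
    """Apply substitutions and deletions given in 1-based wild-type coordinates.

    Returns (variant_sequence, changed_positions), where changed_positions lists
    the wild-type positions that differ in the variant.
    """
    substitutions = dict(substitutions or {})
    deletions = set(deletions)
    for pos in set(substitutions) | deletions:
        if not 1 <= pos <= len(wild_type):
            raise ValueError(f"position {pos} is outside 1..{len(wild_type)}")
    if set(substitutions) & deletions:
        raise ValueError("a position cannot be both substituted and deleted")

    variant = []
    for pos, base in enumerate(wild_type, start=1):
        if pos in deletions:
            continue  # deleted base: downstream bases shift left in the variant
        variant.append(substitutions.get(pos, base))

    # Only count substitutions that actually change the base.
    changed = {p for p, b in substitutions.items() if wild_type[p - 1] != b}
    return "".join(variant), sorted(changed | deletions)
```

For example, `make_variant("GGGGCGGTGGTGG", {2: "T", 3: "C"})` returns `("GTCGCGGTGGTGG", [2, 3])`. Deleting position 8 shortens the sequence by one, so every downstream base shifts in the variant's own numbering. Reporting changes in wild-type coordinates keeps the variants comparable.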
Direct evidence that non-G bases can be accommodated in the stems of tetraplexes came from comparing two sequences. The first was a known tetraplex-forming sequence, (TG). The second, TGTGCG(TG), is identical except for a single C residue that disrupts one of the four G-rich repeat motifs. Both the ability to block DNA synthesis and the DMS protection of stem guanines were markedly decreased in the template containing the interrupted motif. Even so, clear evidence for tetraplex formation was still visible. The extent of DNA synthesis arrest by the full-length arrest site is comparable with that of a pure G tract of the same length as the chicken arrest site (Fig. 1B), despite the presence of three non-G bases. Given the effect of a single interruption in these experiments, the relative efficiency of the chicken β-globin DNA synthesis arrest site is all the more remarkable.
Tetraplex-forming sequences studied to date have not shown evidence for incorporation of bases other than G into the tetraplex stems. In telomere sequences with a (TAG) repeated motif, the A bases have been shown to reside in the loops of the intramolecular tetraplex. Other variants of this sequence, such as (TGA), (TAGAG), and (TGAGA), were unable to form stable intramolecular tetraplexes (28). We observe that a number of non-G bases can be accommodated within the stem of a tetraplex. This is especially so if flanking G-rich sequences provide additional stability by hydrogen bonding to form a cinch. This observation greatly extends the range of sequences that could potentially form tetrahelical structures that block DNA synthesis.
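One way to see how relaxing the four-G-run requirement widens the set of candidate sequences is a pattern search. The sketch below compares a commonly used regular-expression definition of an intramolecular G4 motif with a relaxed variant. The standard definition requires four runs of at least three G separated by loops of 1-7 nt. The relaxed variant also accepts a run made of two G doublets bridged by one non-G base. Both patterns and the test sequences are illustrative assumptions, not the criterion used in this work. A sequence match says nothing about K+ dependence, stability, or the flanking "cinch" interactions described here.

```python
import re

LOOP = "[ACGT]{1,7}"                        # loop of 1-7 nt (assumed convention)
STRICT_RUN = "G{3,}"                        # uninterrupted run of >= 3 G
RELAXED_RUN = "(?:G{3,}|G{2,}[ACT]G{2,})"   # ...or two G doublets bridged by one non-G


def motif(run):
    """Compile a four-run intramolecular G4 pattern from a run definition."""
    return re.compile(f"{run}(?:{LOOP}{run}){{3}}")


STRICT = motif(STRICT_RUN)
RELAXED = motif(RELAXED_RUN)


def classify(seq):
    """Report which motif definitions find a candidate in seq (DNA, any case)."""
    seq = seq.upper()
    return {
        "strict": STRICT.search(seq) is not None,
        "relaxed": RELAXED.search(seq) is not None,
    }
```

A (TTGGGG)4 repeat matches both definitions. The same repeat with one run changed to GGCGG (`TTGGGGTTGGCGGTTGGGGTTGGGG`) matches only the relaxed definition. Such a sequence would be overlooked by a search that requires four uninterrupted G runs.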
In theory, the structure we have described could form in vivo whenever the duplex region containing the sequence becomes unpaired. Replication or transcription would provide such an opportunity, as would local melting of the duplex or formation of the triplex found in this region (29). Indeed, a large region of the chicken β-globin gene promoter is known to be susceptible to chemical modification in vivo (30, 31). One possible role for the hairpin or cinched tetraplex could be in modulating expression of the β-globin gene, with this region perhaps acting as a K+-sensitive switch. The structure might act by binding conformation-specific factors that affect transcription or by occluding a binding site.