(Received for publication, May 18, 1995; and in revised form, August 1, 1995)
From the
The properties of an intramolecular triplex formed in vitro at the 5'-flanking region of the human γ-globin genes were
studied by chemical and physical probes. Chemical modifications
performed with osmium tetroxide, chloroacetaldehyde, and diethyl
pyrocarbonate revealed the presence of non-paired nucleotides on the
"coding strand" at positions -209 through -217.
These reactivities were induced by negative supercoiling, low pH, and
magnesium ions. Downstream point mutations associated with hereditary
persistence of fetal hemoglobin (HPFH) altered the extent of the
modifications and some of the patterns. Specifically, C-202→G and C-202→T significantly
decreased the reactivities, whereas the patterns were increased and
altered in T-198→C. C-196→T and C-195→G caused local
decreases in reactivity. Modifications at the upstream flanking duplex
were modulated by the composition of the vector sequence. In summary,
our data indicate the formation of an intramolecular triplex between
nucleotides -209 to -217 of the "non-coding
strand" and the downstream sequence containing the HPFH
mutations. All of the HPFH point mutations altered the structure. More
than one sequence alignment is possible for each of the triplexes. In
addition, a consequence of some of the point mutations may be to
facilitate slippage of the third strand relative to the Watson-Crick
duplex.
Human hemoglobin is synthesized from two sets of clustered genes
designated the α-cluster, located on chromosome 16, and the
β-cluster, on chromosome 11. Expression of the genes follows a
developmentally regulated, tissue-specific program that allows ζ and ε
to be transcribed during early embryonic life in placental yolk
sac-derived red cells. At subsequent stages of development, globin
expression shifts to the α and γ genes in the red cells of
hepatic origin. At the time of birth,
β-chains are predominantly
expressed, and erythropoiesis shifts to the bone
marrow (1, 2). This pattern of expression can be
altered by mutations affecting any of the transcribed
genes (1, 2, 3), but a condition that has
attracted particular attention is the hereditary persistence of fetal
hemoglobin (HPFH) caused by point mutations at the
5'-flanking region of either one of the γ-genes. These single
nucleotide changes cause the affected allele to permanently express
high levels of γ-chains in adult red cells.
Although the
molecular mechanisms responsible for this protracted expression are not
known, alterations in the recognition of cis-acting elements
by regulatory factors and/or in the supramolecular assembly of
chromatin have been invoked (4-7 and references therein). Ulrich et al. suggested that selected mutations in the -200
region destabilize a non-B DNA structure formed during the course of
the γ-to-β switching (4). This hypothesis was supported
by the finding that some of the mutations abolished an S1
nuclease-hypersensitive site (S1-HSS) located just upstream, which
suggested the formation of an intramolecular triplex (I.T.). This
occurrence is interesting for several reasons. First, it is becoming
increasingly evident that various types of sequence motifs, including
simple repeating defined-order-sequences, have the potential of
adopting non-B conformations and that these structural transitions
occur in biological
systems (8, 9, 10, 11). Second, the
isolation of DNA-binding proteins specific for pyrimidine- or
purine-rich motifs is intriguing (12-16 and references cited
therein) since these sequences are known to undergo conformational
polymorphisms. Finally, diseases inherited by non-Mendelian genetic
mechanisms have been recently associated with the expansion of tandemly
repeated DNA motifs (17, 18, 19, 20, 21, 22, 23) .
The mechanism(s) through which such aberrant expansions are carried out
is unknown, but the propensity of defined-order-sequences to adopt
multiple conformations suggests that these properties may be directly
involved in the process(24, 25, 26) .
Intramolecular triplexes have been well characterized in recent years and are known to form at mirror repeat oligopurine-oligopyrimidine tracts under the influence of negative supercoiling and low pH. Sequences of this type, but usually with imperfect mirror repeat symmetries, are often found at regulatory regions in eukaryotic genomes and have been proposed to participate in the regulation of physiological processes such as transcription and recombination. Additionally, some of these sequences have been shown to adopt I.T. structures under appropriate conditions(27, 28, 29, 30, 31, 32, 33) .
Here we extend the previous studies on the human γ-globin
5'-flanking sequence, which identified the formation of an I.T. based
upon S1 nuclease and oligomer binding assays. By applying chemical
probe analyses and two-dimensional agarose gel electrophoresis, we now
identify the bases associated with the Hoogsteen-paired third strand
and describe the structural alterations introduced by the HPFH point
mutations.
Figure 1:
Sequence of the human
γ-globin 5'-flanking region and model for I.T. formation. A, the sequences of human
γ-globin 5'-flanking regions
from bp -228 to -189 were cloned in pUC9 as described;
single point mutations leading to HPFH are shown in boldface and indicate the bp change on the top (coding) strand as well as
the name of the plasmids carrying the respective mutations. The S1
nuclease hypersensitive site is indicated by S1-HSS. A stretch
of bp containing two adjacent purine-rich motifs centered on S1-HSS and
HPFH is underlined. B, schematic representation of
the I.T. structure proposed to be adopted by the -200 region. The
structure forms under conditions of negative supercoiling and low pH
and is stabilized by Hoogsteen-type hydrogen bonds between the two
adjacent purine-rich motifs, represented by a thicker line.
The structure leaves unpaired pyrimidine residues on the top strand
which become a substrate for S1 nuclease (S1-HSS). The model shows how
residues affected by the HPFH point mutations may destabilize the I.T.
structure. The 5' terminus on each of the DNA strands is indicated by a filled circle.
Fig. 1 shows the sequences analyzed in this study, the
location of the S1-HSS, and a schematic model for the I.T. based on
previous data (4). We have now extended these data using
chemical probe analyses (OsO4, CAA, and DEPC) in order to
detect perturbations at the bp
level (27, 28, 29, 37, 38, 39).
Figure 2:
Chemical modifications on the coding
strand of pγ-200. A, plasmid pγ-200, containing the
wild type sequence, was reacted with topoisomerase I and ethidium
bromide to give topoisomers at mean superhelical density -σ
of 0 (R) and 0.137 (S). These were modified by
OsO4 (lanes 1 and 2), CAA (lanes 3 and 4), and DEPC (lanes 5 and 6) and
processed under the conditions described under "Experimental
Procedures." Cherenkov radiation was counted and 100,000
counts/min/lane were loaded. The sequence reported on the left represents the vector in lowercase and the insert in uppercase. Strong and weak signals are denoted by filled and open circles, respectively. B, gels
containing the modifications from OsO4, CAA, and DEPC were
scanned and quantitated ("Experimental Procedures").
Signals corresponding to individual bands were expressed as percentage
of the total integrated areas. In the case of CAA and DEPC, the values
for the relaxed lane were subtracted from those for the supercoiled lane. The graph shows
the mean (± S.D.) of two experiments for OsO4
(filled bars) and CAA (stippled
bars).
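The quantitation described in the legend to Fig. 2B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the band areas are hypothetical, not software or values from this study, and each lane is assumed to be supplied as a mapping from band label to integrated area that accounts for all of the signal in the lane.

```python
from statistics import mean, stdev


def percent_of_total(band_areas):
    """Express each band's integrated area as a percentage of the lane total."""
    total = sum(band_areas.values())
    return {band: 100.0 * area / total for band, area in band_areas.items()}


def supercoil_induced(supercoiled_pct, relaxed_pct):
    """Subtract relaxed-lane percentages from supercoiled-lane percentages."""
    return {band: supercoiled_pct[band] - relaxed_pct.get(band, 0.0)
            for band in supercoiled_pct}


def mean_and_sd(replicates):
    """Per-band mean and sample S.D. across replicate experiments."""
    return {band: (mean([r[band] for r in replicates]),
                   stdev([r[band] for r in replicates]))
            for band in replicates[0]}


# Hypothetical integrated areas (arbitrary units); "rest" is all other signal.
relaxed = {"band_1": 2.0, "band_2": 1.0, "rest": 997.0}
supercoiled = {"band_1": 12.0, "band_2": 6.0, "rest": 982.0}
induced = supercoil_induced(percent_of_total(supercoiled),
                            percent_of_total(relaxed))
```

Because every band is divided by its own lane total before subtraction, unequal loading between the relaxed and supercoiled lanes does not bias the difference.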
CAA forms etheno adducts with unpaired cytosines,
adenines, and, to a lesser extent, guanines (37).
Supercoiling-induced cleavages were found at three cytosines and one adenine (Fig. 2A, lanes 3 and 4).
However, these bands accounted for only 0.5-1.5% of the total
radioactivity (Fig. 2B). Acid treatment of the samples,
before piperidine cleavage, did not improve the signal-to-noise ratio.
The sites of modification complemented those detected by OsO4
and suggested that the 5' end of the single-stranded region was
3' of an adenine residue.
The character of the
flanking nt was determined by DEPC, a probe specific for unpaired
adenines and guanines (37). As shown in lanes 5 and 6 of Fig. 2A, supercoiling-induced
reactivities extended over a stretch bounded by two adenines,
the strongest band corresponding to an adenine
(0.94 ± 0.05% of the total
radioactivity). Taken together, these data define two sets of
accessible nt: a major site that extends from T-209 to T-217,
which we interpret as single-stranded, and a minor
one, from nt -218 to -228, that we consider to be weakly
bonded (filled and open circles in Fig. 2A and Fig. 4C). These structural transitions are
influenced by supercoiling, protonation at specific residues, and
Mg2+ ions.
Figure 4:
Chemical modifications on the non-coding
strand. A, DNAs were prepared and reacted with CAA as
described ("Experimental Procedures" and Fig. 2).
pγ-200S was used in this case instead of pγ-200 in order to
align the wild type sequence with that of -198C, -196T, and
-195G. The nt changes carried by the mutant plasmids are shown on
the left. Open circles indicate reactivities common
to more than one DNA, whereas asterisks identify sites of
modification specific for -198C. B, DNAs were reacted with
OsO4 as described. Signals above background levels were
quantitated and shown as the percentage of the total radioactivity. C, summary of the CAA (C, A, and G residues), OsO4
(T residues), and DEPC (A and G residues) modifications. Filled and open circles identify strong and weak
cleavage sites, respectively, on the wild type, -198C,
-196T, and -195G. Boxed open circles indicate the
reactivities common to the wild type, -196T, and -195G, but
not -198C. Cleavages specific for this mutant are shown by asterisks. A line exterior to the open circles denotes those residues in which the modifications were affected by
the flanking vector sequences.
Reactions with OsO4, CAA, and
DEPC were conducted and visualized on the coding strand of plasmids
containing point mutations associated with HPFH (Fig. 1A). Relaxed and supercoiled DNA (-σ
= 0.137) were treated under the same conditions as pγ-200.
The results demonstrated that the modifications occurred at the same
residues as the wild type (not shown) but that there were quantitative
differences, which are summarized in Fig. 3. Here the columns
represent the percentages of the signals from OsO4 and CAA
normalized to the wild type sequence. The major changes were caused by
the -202G and -202T mutations, which
produced a general reduction in modification; the consequences of the
-202G mutation were more severe than those of
-202T. -198C exhibited a 2-fold increase in
modification at two thymine residues,
whereas -196T and -195G displayed a modest reduction in
modification at the middle residues. Normalization at
the adenine probed with DEPC gave the following values: 0.00 for
-202G, 0.38 for -202T, 1.19 for -198C, 0.91 for
-196T, and 1.32 for -195G.
Figure 3:
Chemical modifications on the coding
strand of HPFH mutant plasmids. Plasmids bearing point mutations
leading to HPFH were reacted with OsO4 or CAA and processed as
described in the legend to Fig. 2. After quantitation, the data
were normalized by dividing the results obtained at each residue by
that of pγ-200 at the corresponding position. The amount of
modification at Ts is derived from the data from OsO4
reactions, whereas the values at Cs and A are derived from the
data from CAA reactions. The data represent the average from two or
more experiments with each probe. Coefficients of variation were
comparable to those of pγ-200 shown in Fig. 2B.
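The normalization used for Fig. 3 amounts to a per-residue ratio of mutant to wild type signal. A minimal sketch follows, assuming the percentages of modification are held in dictionaries keyed by nucleotide position; the function name and all numbers are hypothetical and serve only to show the arithmetic.

```python
def normalize_to_wild_type(mutant_pct, wild_type_pct):
    """Divide the mutant signal at each residue by the wild type signal there."""
    normalized = {}
    for position, value in mutant_pct.items():
        reference = wild_type_pct.get(position)
        if reference:  # skip residues with no (or zero) wild type signal
            normalized[position] = value / reference
    return normalized


# Hypothetical percentages of modification, keyed by nucleotide position.
wild_type = {-217: 8.0, -215: 10.0, -212: 12.0}
mutant = {-217: 4.0, -215: 10.0, -212: 24.0}
ratios = normalize_to_wild_type(mutant, wild_type)
```

A ratio of 1 means the mutant is modified to the same extent as the wild type at that residue; values below or above 1 indicate reduced or enhanced reactivity, respectively.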
Figure 7:
Sequence alignments for the I.Ts. Bases
complementary to the reactive residues -217 to -209 were
aligned with the downstream purine-rich sequence, which contains the
HPFH point mutations (-203 to -194), giving the two I.T.
models illustrated for pγ-200. The structures for all models on the
right half (B) of the figure have the third strand displaced
by one position as compared to the models on the left half (A). Closed circles indicate 5' ends. Dotted
lines designate hydrogen bonds. For the mutant plasmids, only the
composition of the triplex stem is shown, where the triplets affected
by the HPFH point mutations are boxed. + indicates
protonated residues.
The data with DEPC showed
the accessibility of one adenine and two guanines (not shown). OsO4 modified the
thymines spanning positions -216 to -226 to a moderate
extent (Fig. 4B), confirming the chemical accessibility
of the I.T. flanking sequences as well as that of the triplex-duplex
junction (nt -217 and -218).
-202G and -202T
showed a marked reduction in reactivities (Fig. 4B)
relative to the wild type sequence, confirming their destabilizing
effect. -196T and -195G displayed patterns of modifications
with CAA and DEPC qualitatively identical to that of the wild type.
These mutants also showed a considerable reduction in reactivity at
one thymine (Fig. 4, A and B).
These results, together with those on the coding strand (Fig. 3), suggest that these two mutations perturb the overall
geometry of the triplex structure, but do not change the sequence
alignment. In contrast, the CAA-induced cleavages seen at two cytosines
in the wild type were not observed in the -198C
mutant. Instead, signals were detected at one cytosine and
two guanines (denoted by asterisks in Fig. 4, A and C),
indicating an alteration in the loop structure. This may be
accomplished by a slippage of the purine-rich third strand relative to
the Watson-Crick duplex or by multiple adjustments in the interactions
among nt that retain their sequence alignment. In either case, it is
clear that this mutation has profound consequences on the triplex
structure.
The increase in OsO4 modification at
two thymines in mutants
-198C, -196T, and -195G relative to pγ-200 (Fig. 4B) was quite significant (3-4-fold).
However, this behavior is not due to differences in the I.T.
structures. A detailed analysis of these reactivities will be given in
the last section under "Results."
Figure 5:
Supercoiling-dependent titration of
OsO4 modification. DNAs were reacted with topoisomerase I in
the presence of different amounts of ethidium bromide (from 0 to 4.0
µg/ml). The mean superhelical densities were derived as reported
under "Experimental Procedures." Topoisomers were modified
by OsO4 and processed as described for the coding strand.
For each lane, the percentages of modification corresponding to thymines
-209, -210, -212, -215, and -217 were
summed and expressed as a single (y) value. The experimental data
were fitted with a four-parameter logistic
function. Experiments were performed in duplicate. A,
pγ-200, -202G, and -202T; B, -198C; C, -196T; D,
-195G. Dotted line represents pγ-200; droplines indicate the inflection points (c), namely the -σ
value at 50% of maximal OsO4 modification. Values of the asymptotic
maximum (a) and c were as follows: pγ-200 (a) 52.7 ± 1.3, (c) 0.078 ± 0.001;
-202G (a) 6.4 ± 0.3, (c) 0.098 ±
0.002; -202T (a) 11.1 ± 1.0, (c) 0.093
± 0.005; -198C (a) 65.2 ± 1.2, (c) 0.068 ± 0.001; -196T (a) 51.7
± 1.3, (c) 0.081 ± 0.001; -195G (a) 57.2 ± 2.5, (c) 0.084 ±
0.002.
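To make the meaning of the fitted parameters concrete, the following sketch evaluates a four-parameter logistic curve at its inflection point. The exact parameterization used for the fits is not given in the legend, so the Hill-type form below is an assumption, as are the slope and the zero baseline; only the values of a and c for the wild type plasmid are taken from the legend to Fig. 5.

```python
def four_parameter_logistic(x, a, b, c, d=0.0):
    """Rise from d (at x = 0) toward a (large x), with the midpoint at x = c.

    Written in the Hill-type form, which is algebraically the same curve as
    d + (a - d) / (1 + (x / c) ** -b) but is also defined at x = 0.
    """
    return d + (a - d) * x ** b / (c ** b + x ** b)


# a and c for the wild type plasmid come from the legend to Fig. 5.
# The slope b and the baseline d are NOT reported there and are assumed here.
A_WT, C_WT = 52.7, 0.078
ASSUMED_SLOPE = 8.0

half_max = four_parameter_logistic(C_WT, A_WT, ASSUMED_SLOPE, C_WT)
at_zero = four_parameter_logistic(0.0, A_WT, ASSUMED_SLOPE, C_WT)
```

Whatever slope is chosen, the curve passes through (a + d)/2 at x = c, which is why c can be read as the superhelical density at half-maximal modification and compared directly between plasmids.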
We
also conducted two-dimensional gel electrophoresis to assay for the
relaxation associated with the DNA structural transition. Topoisomers
of pγ-200 and all five mutant plasmids were separated on agarose
gels in the same buffer solution used for the chemical modification
experiments. Chloroquine concentrations of 4 and 30 µM were employed in the second dimension, which afforded the
resolution of topoisomers up to 23 negative superhelical turns. A
transition centered at 15 negative superhelical turns was observed;
however, this was attributed to the formation of a non-B DNA structure
in the vector. No transitions due to the duplex-triplex conversion were
observed. Two explanations are possible: first, the transition may be
too small to be detected in this range of topoisomers or, second, the
I.T. structure may be too unstable to survive the electrophoretic
conditions.
Fig. S1 represents a bp in conventional
duplex DNA. As bp opening depends on the rate constants for opening and
closing, the two constants regulate the extent of
modification at both residues. However, if Fig. S2 occurs, in
which T may interact with a second partner (X) once in an open
conformation, the reactivities will depend on the rate constants for
association with and dissociation from X, in addition to those for
bp opening and closing. This is
expected to selectively increase modification at A, since this residue
remains in an open conformation for a longer time than its partner T.
If the percentages of cleavage are expressed as a ratio of T/A, values
for A·T pairs involved in a type 2 process will be smaller than
those progressing through Fig. S1.
Figure S1:
Figure S2:
Table 1 shows the
results for A·T bp -226, -225, -222, -219,
and -216 for pγ-200 and mutant plasmids. The values varied
considerably, from 0.7 to 66, but, with the exception of -216,
they were relatively homogeneous at a given locus. The variations
observed may be interpreted in terms of modulation in the accessibility
to the reactants. In fact, not shown in the schemes are intermediate
states in which a given bp may adopt distorted conformations and/or
alterations in the stacking interactions with neighbor residues. These
changes, which are favored by high levels of supercoiling and flanking
non-B DNA structures such as the I.T., not only facilitate bp opening,
but also increase the rate of chemical attack on partially unpaired
conformations(42) .
Locus -216 appears to be a
different case. Here the low values, which spanned a greater range
(0.7-6.3), were determined by an increased modification at
A-216 associated with normal or low
cleavages at T-216 (Fig. 4B and 6A) and are
appropriately accounted for by a type 2 process. Thus, the ratio of T/A
modifications is a measure of the relative chemical accessibility and
hence the extent to which the A or the T residue has reassociated with
another partner.
Figure 6:
Effect of the cloning site on chemical
modification at the I.T. flanking sequence. A, plasmids
pγ-200 and pγ-200S contain the wild type γ-globin insert
cloned at the HincII or SmaI site of pUC9,
respectively. The DNAs were reacted with DEPC at -σ
of 0 (R) and 0.137 (S) and processed as described for the
coding strand. Quantitations were performed as follows. Pixel values
for each line graph were converted to percentage of the total signal
(sum of pixel values for each lane). Percentages of the R sample were then subtracted from those of S after R and S were aligned on their highest pixel value. The
range of the axes was adjusted so as to align the two relevant
sets of peaks. Areas were calculated by cutting and weighing the peaks.
Values (mg × 10) at selected locations are reported. Since the
amount of material in one adenine band was
identical (±9%) for both DNAs, this supports the quantitative methodology. B, supercoiling-dependent OsO4 modification at
two thymines. The percentages of
modification at the two thymines were
summed and expressed as a single (y) value. Interpolation was
conducted as explained in the legend to Fig. 5. For clarity,
standard deviations were omitted.
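The line-graph quantitation in the legend to Fig. 6A can be illustrated as follows. This is a sketch under assumptions, not the procedure actually used: "aligned on their highest pixel value" is interpreted here as shifting the R trace along the lane until its tallest peak coincides with that of S, the function names and pixel values are hypothetical, and peak areas (obtained in the study by cutting and weighing) are not computed.

```python
def to_percent(trace):
    """Convert raw pixel values along a lane to percentages of the lane total."""
    total = sum(trace)
    return [100.0 * value / total for value in trace]


def supercoil_minus_relaxed(relaxed_pixels, supercoiled_pixels):
    """Subtract the relaxed (R) trace from the supercoiled (S) trace.

    Both traces are first converted to percentages; R is then shifted along
    the lane so that its highest point coincides with the highest point of S.
    Positions of S with no overlapping R value are treated as having no R signal.
    """
    relaxed = to_percent(relaxed_pixels)
    supercoiled = to_percent(supercoiled_pixels)
    shift = supercoiled.index(max(supercoiled)) - relaxed.index(max(relaxed))
    difference = []
    for i, s_value in enumerate(supercoiled):
        j = i - shift
        r_value = relaxed[j] if 0 <= j < len(relaxed) else 0.0
        difference.append(s_value - r_value)
    return difference


# Hypothetical pixel values; the tallest peak is the one used for alignment.
relaxed_trace = [1, 1, 10, 1, 1, 1, 0]
supercoiled_trace = [0, 1, 2, 20, 2, 6, 9]
difference_trace = supercoil_minus_relaxed(relaxed_trace, supercoiled_trace)
```

Positive values in the difference trace mark positions where the supercoiled sample carries proportionally more signal than the relaxed one.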
These chemical probe analyses on the 5'-flanking region of
the human γ-globin genes enable a molecular description of the
I.Ts. at a level of detail previously not possible. The duplex to
triplex transition characteristic of oligopurine-oligopyrimidine
sequences is accomplished by the purine residues simultaneously
engaging in Watson-Crick and Hoogsteen hydrogen
bonds (27, 28, 29, 30). In general,
the bound third strand may occupy a parallel, or antiparallel,
orientation relative to the purine residues, depending on its sequence
composition. Accordingly, a purine-rich third strand will be
accommodated in the major groove in an antiparallel orientation,
whereas a pyrimidine-rich third strand will occupy the parallel
orientation. Therefore, stable hydrogen bonds may form between G:G, A:A,
and A:T in the former case, and G:C+
and A:T in the
latter (30). Low pH is required in order to stabilize a
pyrimidine-rich strand containing cytosine residues in this
arrangement.
The structures formed at the 5'-flanking
region of the human γ-globin genes, both in the wild type and
in the HPFH point mutants, deviate considerably from this general
scheme. In fact, these and previous data (4) indicate that the
third strand is purine-rich, yet low pH is required for stabilization.
Our results show that the third strand AAGAGGATA is hybridized in an
antiparallel orientation to the downstream GGGGAAGGGG containing the
sites of mutations. Since these two sequences are 9 and 10 bases long,
their interaction leads to two possible alignments (Fig. 7). In
no case can homogeneous G:G, A:A, or A:T Hoogsteen base pairing take
place. Rather, mismatches of the G:A, A:G, and G:T type must also be
considered. In both of the reported models, the most abundant triplet
is C·G:A+, a combination that has recently been
observed in other I.Ts. (43, 44). The stabilization
induced by the protonated reversed-Hoogsteen-bound adenines agrees well
with our observations. The Hoogsteen G:T pair has been described by NMR
only in the parallel orientation (45), where T(H3) shares one
hydrogen bond with G(N7). Since parallel and antiparallel thymine
displays a 2-fold symmetry about N3 (30), it is possible that
the antiparallel G:T pair maintains this type of hydrogen bond.
Antiparallel A:G has not been documented; however, close interactions
are possible.
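The counting argument above (a 9-nt third strand on a 10-nt tract gives two registers, neither of which is built only from G:G, A:A, or A:T pairs) can be checked with a short enumeration. The sketch assumes that both sequences are written 5' to 3' as quoted in the text and that antiparallel binding is modeled simply by reversing the third strand; pairs are written as duplex purine:third-strand base. It does not reproduce the detailed models of Fig. 7, and the function name is hypothetical.

```python
from collections import Counter

PURINE_TRACT = "GGGGAAGGGG"   # downstream tract on the non-coding strand (10 nt)
THIRD_STRAND = "AAGAGGATA"    # third strand as written in the text (9 nt)


def antiparallel_registers(tract, third):
    """Yield (offset, pairs) for every register of the third strand on the tract."""
    reversed_third = third[::-1]  # antiparallel: read the third strand backwards
    for offset in range(len(tract) - len(third) + 1):
        window = tract[offset:offset + len(third)]
        yield offset, [f"{t}:{h}" for t, h in zip(window, reversed_third)]


for offset, pairs in antiparallel_registers(PURINE_TRACT, THIRD_STRAND):
    print(offset, " ".join(pairs), dict(Counter(pairs)))
```

Under these assumptions G:A is the most frequent pairing in both registers, in line with the predominance of the C·G:A+ triplet noted above.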
Overall, the paucity of stable C·G:G and
T·A:A triplets, together with the short length of the I.T. stem,
accounts for the observed requirement of high levels of supercoiling
and the inability to detect a duplex to triplex transition by
two-dimensional agarose gel electrophoresis. Finally, both models are
consistent with the data that locate two cytosines
in the loop.
All of the HPFH point mutations
alter the normal I.T. structure, some slightly (-196T and -195G), others profoundly
(-198C, -202G, and -202T). The destabilizing effects of -202G
and -202T, observed previously (4), are
confirmed. Both of these mutations disrupt a GGGCCC motif, a sequence
that has been shown to acquire an induced bend upon complexation of
Mg2+ ions (46). The effect of these nt changes
may be that of abolishing this induced bend, which may represent the
nucleation step for I.T. formation, or that of disrupting critical
Hoogsteen hydrogen bonds. Our results favor the first interpretation.
The stabilization mediated by -198C was suspected
from an earlier work (47). However, the previous studies (4, 47) did not anticipate that this mutation may
alter the sequence alignment of the triplex by inducing a slippage of
the third strand relative to the Watson-Crick duplex.
The results
for -196T and -195G are unexpected.
Here, we find subtle changes in the overall I.T. structures, whereas
substantial destabilizations were predicted from former
assays (4). It is likely that these discrepancies originate
from the experimental conditions used. In fact, we found no
modifications at pH 5.0, whereas S1 nuclease cleavages were previously
detected at this pH. Since I.T. formation occurs in the pH range of
4.5-5.0, slight variations are likely to affect the
stabilities greatly. Also, the results of the oligomer binding assays
may have been influenced by the distortion of the DNA flanking the
I.T., as well as by the difference in the cloning site between
pγ-200 and mutant plasmids (Fig. 6). The two sets of schemes
in Fig. 7 predict stable triplexes for -198C,
-196T, and -195G (48, 49); our data do not
permit a discrimination between these alternatives.
From a
physiological standpoint, the combination of low pH and elevated
superhelical density required to induce these structures in vitro may raise concerns about their stability in a cellular
environment. However, base protonation may occur, and be maintained, in
polynucleotides at several pH units above the pK of the free
base(50, 51, 52, 53) . Also,
divalent metal ions,
polyamines(54, 55, 56, 57, 58) ,
and the aforementioned single-strand-specific binding proteins may
cooperate in lowering the activation energy needed for the I.T.
transition.
In vivo, the chromatin in the 5'-flanking
region of the human γ-globin genes has been shown to be
hypersensitive to DNase I digestion or restriction enzyme cleavage in
cells where the γ-globin genes are actively
transcribed (59, 60). This behavior, which is also
observed in other systems, is likely to be correlated with the loss of
positioned nucleosomes along the DNA, and the acquisition of new
interactions between cis regulatory sequences and cognate
transcription factors (61, 62). Indeed, γ-globin
regulatory elements such as the CACCC and CCAAT boxes appear to be
selectively occupied in K562 cells, which express these
genes (63, 64).
In vivo, no protein
complex has been identified to date that interacts with the upstream
region that contains the HPFH point mutations. However, experiments
conducted in transgenic mice have demonstrated that, at least in the
case of one C→G substitution, a strong correlation exists between
this point mutation and the HPFH phenotype (7). Therefore, a
macromolecular complex may assemble at -200; this complex could
be involved in γ-globin gene silencing. A polypeptide that binds
and stabilizes the I.T. structure induced at this location in vitro has been found (14). The interaction of this protein with
the I.T. might be altered by any of the HPFH point mutations. It
remains to be established whether such an interaction reflects the
formation of an I.T. complex that operates in vivo to
temporally regulate the expression of the human
γ-globin genes.