©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
An Intramolecular Triplex in the Human γ-Globin 5′-Flanking Region Is Altered by Point Mutations Associated with Hereditary Persistence of Fetal Hemoglobin (*)

(Received for publication, May 18, 1995; and in revised form, August 1, 1995)

Albino Bacolla, Michael J. Ulrich (1)(§), Jacquelynn E. Larson, Timothy J. Ley (1), Robert D. Wells (¶)

From the Institute of Biosciences and Technology, Texas A&M University, Texas Medical Center, Houston, Texas 77030-3303 and the Division of Bone Marrow Transplantation and Stem Cell Biology, Departments of Medicine and Genetics, Washington University Medical Center, St. Louis, Missouri 63110

ABSTRACT

The properties of an intramolecular triplex formed in vitro at the 5′-flanking region of the human γ-globin genes were studied by chemical and physical probes. Chemical modifications performed with osmium tetroxide, chloroacetaldehyde, and diethyl pyrocarbonate revealed the presence of non-paired nucleotides on the "coding strand" at positions -209 through -217. These reactivities were induced by negative supercoiling, low pH, and magnesium ions. Downstream point mutations associated with hereditary persistence of fetal hemoglobin (HPFH) altered the extent of the modifications and some of the patterns. Specifically, the C→G and C→T substitutions at -202 significantly decreased the reactivities, whereas the patterns were increased and altered by the T→C substitution at -198. The C→T substitution at -196 and the C→G substitution at -195 caused local decreases in reactivity. Modifications at the upstream flanking duplex were modulated by the composition of the vector sequence. In summary, our data indicate the formation of an intramolecular triplex between nucleotides -209 to -217 of the "non-coding strand" and the downstream sequence containing the HPFH mutations. All of the HPFH point mutations altered the structure. More than one sequence alignment is possible for each of the triplexes. In addition, a consequence of some of the point mutations may be to facilitate slippage of the third strand relative to the Watson-Crick duplex.


INTRODUCTION

Human hemoglobin is synthesized from two sets of clustered genes designated the α-cluster (ζ-ψζ-ψα-α(2)-α(1)), located on chromosome 16, and the β-cluster (ε-^Gγ-^Aγ-ψβ-δ-β), on chromosome 11. Expression of the genes follows a developmentally regulated, tissue-specific program that allows ζ and ε to be transcribed during early embryonic life in placental yolk sac-derived red cells. At subsequent stages of development, globin expression shifts to the α and γ genes in the red cells of hepatic origin. At the time of birth, β-chains are predominantly expressed, and erythropoiesis shifts to the bone marrow (1, 2). This pattern of expression can be altered by mutations affecting any of the transcribed genes (1, 2, 3), but a condition that has attracted particular attention is the hereditary persistence of fetal hemoglobin (HPFH) (^1) caused by point mutations at the 5′-flanking region of either one of the γ-genes. These single nucleotide changes cause the affected allele to permanently express high levels of γ-chains in adult red cells.

Although the molecular mechanisms responsible for this protracted expression are not known, alterations in the recognition of cis-acting elements by regulatory factors and/or in the supramolecular assembly of chromatin have been invoked (4-7 and references therein). Ulrich et al. suggested that selected mutations in the -200 region destabilize a non-B DNA structure formed during the course of the γ-to-β switching (4). This hypothesis was supported by the finding that some of the mutations abolished an S1 nuclease-hypersensitive site (S1-HSS) located just upstream, which suggested the formation of an intramolecular triplex (I.T.). This occurrence is interesting for several reasons. First, it is becoming increasingly evident that various types of sequence motifs, including simple repeating defined-order-sequences, have the potential of adopting non-B conformations and that these structural transitions occur in biological systems (8, 9, 10, 11). Second, the isolation of DNA-binding proteins specific for pyrimidine-rich or purine-rich motifs is intriguing (12-16 and references cited therein), since these sequences are known to undergo conformational polymorphisms. Finally, diseases inherited by non-Mendelian genetic mechanisms have recently been associated with the expansion of tandemly repeated DNA motifs (17, 18, 19, 20, 21, 22, 23). The mechanism(s) through which such aberrant expansions are carried out is unknown, but the propensity of defined-order-sequences to adopt multiple conformations suggests that these properties may be directly involved in the process (24, 25, 26).

Intramolecular triplexes have been well characterized in recent years and are known to form at mirror repeat oligopurine-oligopyrimidine tracts under the influence of negative supercoiling and low pH. Sequences of this type, but usually with imperfect mirror repeat symmetries, are often found at regulatory regions in eukaryotic genomes and have been proposed to participate in the regulation of physiological processes such as transcription and recombination. Additionally, some of these sequences have been shown to adopt I.T. structures under appropriate conditions (27, 28, 29, 30, 31, 32, 33).

Here we extend the previous studies on the human γ-globin 5′-flanking sequence, which identified the formation of an I.T. based upon S1 nuclease and oligomer binding assays. By applying chemical probe analyses and two-dimensional agarose gel electrophoresis, we now identify the bases associated with the Hoogsteen-paired third strand and describe the structural alterations introduced by the HPFH point mutations.


EXPERIMENTAL PROCEDURES

Plasmid DNA

Plasmid p-200 contains the sequence of the 5′-flanking region of human γ-globin genes from bp -228 to -189 inserted at the HincII site of pUC9. Plasmids -202G, -202T, -198C, -196T, and -195G harbor the same γ-globin fragment with the sequence diverging at the indicated position to reproduce the HPFH point mutations. The mutated inserts were cloned at the HincII site in -202G and -202T, and at the SmaI site in -198C, -196T, and -195G. The cloning was described previously (4). Plasmid p-200S is identical to p-200, except that the γ-globin insert was cloned at the SmaI site of pUC9. This was performed using synthetic oligonucleotides and standard procedures (34). Plasmid DNA was amplified in the Escherichia coli strain DH5α and purified twice through CsCl banding. Na+ ions were exchanged for Cs+ by dialysis in 10 mM Tris·HCl, pH 8.0, 50 mM NaCl, 1 mM EDTA. DNA was stored in 10 mM Tris·HCl, pH 8.0, 10 mM NaCl, 1 mM EDTA.

Preparation of Topoisomers at Defined Superhelical Density

12 µg of plasmid DNA was incubated with various concentrations of ethidium bromide from 0.1 to 4.0 µg/ml for 90 min at 37 °C in 300 µl of J-1 buffer (10 mM Tris·HCl, pH 7.6, 50 mM KCl, 1 mM EDTA, 10 mM dithiothreitol) in the presence of 26 units of topoisomerase I prepared from chicken erythrocytes according to the procedure of Germond et al. (35). Ethidium bromide was allowed to intercalate into the DNA for 30 min at room temperature before topoisomerase I was added. DNA isomers differing in their linking numbers were resolved on 1.2% agarose gels containing increasing concentrations of chloroquine from 0.1 to 400 µM (36). The average number of superhelical turns (y) introduced at each concentration of ethidium bromide (x) was described by the function y = 0.05 + 10.25x - 0.32x^2 (r^2 = 0.9990). Mean superhelical densities (-σ) were derived from -σ = yh/d, where h was the DNA helical repeat (10.5 bp) and d the size of the plasmid (2,750 bp).
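The calibration described above can be written out as a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and it simply evaluates the fitted polynomial and the relation -σ = yh/d with the values stated in the text (h = 10.5 bp, d = 2,750 bp).

```python
HELICAL_REPEAT_BP = 10.5   # h, DNA helical repeat given in the text
PLASMID_SIZE_BP = 2750     # d, plasmid size given in the text


def superhelical_turns(etbr_ug_per_ml: float) -> float:
    """Average number of superhelical turns (y) from the fitted polynomial."""
    x = etbr_ug_per_ml
    return 0.05 + 10.25 * x - 0.32 * x ** 2


def mean_superhelical_density(etbr_ug_per_ml: float) -> float:
    """Mean superhelical density, returned as the positive quantity -sigma = y*h/d."""
    return superhelical_turns(etbr_ug_per_ml) * HELICAL_REPEAT_BP / PLASMID_SIZE_BP


if __name__ == "__main__":
    for conc in (0.1, 1.0, 2.0, 4.0):
        y = superhelical_turns(conc)
        neg_sigma = mean_superhelical_density(conc)
        print(f"{conc:.1f} ug/ml: y = {y:.2f} turns, -sigma = {neg_sigma:.3f}")
```

At the highest ethidium bromide concentration (4.0 µg/ml) the polynomial gives about 35.9 turns, which corresponds to -σ of about 0.137, the value quoted for the supercoiled samples.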

Chemicals

The concentration of ethidium bromide was determined by optical absorption using ε = 40,000 M^-1 cm^-1. Chloroacetaldehyde (CAA) (Fluka) was prepared as described (37). The concentration of OsO(4) was determined using ε = 1,738 M^-1 cm^-1 and ε = 3,116 M^-1 cm^-1. OsO(4) was mixed with equimolar amounts of 2,2′-bipyridine (Sigma) just before use.

Chemical Modifications

2.4 µg of plasmid DNA, corresponding to 72 µM of phosphate-DNA, was reacted with chemical probes in 100 µl of 50 mM Tris acetate, pH 4.5, 0.1 mM EDTA, 2 mM MgCl(2), 150 mM NaCl. DNA was allowed to equilibrate at 22 °C for 30 min before the modifications were started. OsO(4) was used at 1.8 mM for 30 min at 22 °C, CAA at 4% (v/v) for 3 h at 25 °C, and diethyl pyrocarbonate (DEPC) at 1.5% (v/v) for 15 min at 22 °C. The modified DNA was separated from the unreacted chemical by filtration through spin columns of Sephadex G-50 (Pharmacia Biotech Inc.) equilibrated in 10 mM Tris·HCl, pH 8.0, 50 mM NaCl, 1 mM EDTA, 5.4 µg tRNA/100 µl. To visualize the modifications on the coding strand (top strand in Fig. 1A), DNA was cleaved with EcoRI, 3′-end labeled with the Klenow fragment of E. coli DNA polymerase I and [α-32P]ddATP (3,000 Ci/mmol; 1 Ci = 37 GBq), and then cleaved with HindIII. For the modifications on the non-coding strand, DNA was cleaved with BsrBI (located about 40 bp 5′ of the HindIII site), 3′-end labeled with terminal deoxynucleotidyl transferase and [α-32P]ddATP, and then cleaved with EcoRI. The restriction fragments containing the γ-globin insert were separated and purified through 10% polyacrylamide gels. Sites of modification were cleaved by 1 M piperidine for 30 min at 90 °C and resolved on 8% sequencing gels. Gels were fixed, dried, exposed to Fuji RX films, and quantitated on a PhosphorImager (Molecular Dynamics, Sunnyvale, CA) using the ImageQuant software. Data were processed using the SigmaPlot program (Jandel Scientific, San Rafael, CA).
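As a quick consistency check on the DNA concentration quoted above, the mass-to-molarity conversion can be sketched as follows. The average mass of about 330 g/mol per nucleotide is an assumption introduced here (it is not stated in the text), and the function name is hypothetical.

```python
# Assumed average mass of one nucleotide residue in DNA (g/mol); not from the text.
AVG_NUCLEOTIDE_MASS = 330.0


def phosphate_concentration_uM(dna_ug: float, volume_ul: float,
                               nt_mass: float = AVG_NUCLEOTIDE_MASS) -> float:
    """Convert a DNA mass in a given volume to micromolar DNA phosphate."""
    moles = dna_ug * 1e-6 / nt_mass      # grams divided by g/mol
    liters = volume_ul * 1e-6
    return moles / liters * 1e6          # mol/L converted to micromolar


if __name__ == "__main__":
    conc = phosphate_concentration_uM(2.4, 100.0)
    print(f"{conc:.1f} uM DNA phosphate")
```

With 2.4 µg in 100 µl this gives roughly 73 µM, in line with the 72 µM stated for the reaction.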


Figure 1: Sequence of the human γ-globin 5′-flanking sequence and model for I.T. formation. A, the sequences of human γ-globin 5′-flanking regions from bp -228 to -189 were cloned in pUC9 as described; single point mutations leading to HPFH are shown in boldface and indicate the bp change on the top (coding) strand as well as the name of the plasmids carrying the respective mutations. The S1 nuclease hypersensitive site is indicated by S1-HSS. A stretch of bp containing two adjacent purine-rich motifs centered on S1-HSS and HPFH is underlined. B, schematic representation of the I.T. structure proposed to be adopted by the -200 region. The structure forms under conditions of negative supercoiling and low pH and is stabilized by Hoogsteen-type hydrogen bonds between the two adjacent purine-rich motifs, represented by a thicker line. The structure leaves unpaired pyrimidine residues on the top strand which become a substrate for S1 nuclease (S1-HSS). The model shows how residues affected by the HPFH point mutations may destabilize the I.T. structure. The 5′ terminus on each of the DNA strands is indicated by a filled circle.



Two-dimensional Agarose Gel Electrophoresis

Two-dimensional agarose gel electrophoresis was conducted as reported previously (38) in the buffer described for the chemical modifications. Chloroquine at 4 or 30 µM was used in the second dimension.


RESULTS

Fig. 1 shows the sequences analyzed in this study, the location of the S1-HSS, and a schematic model for the I.T. based on previous data (4). We have now extended these data using chemical probe analyses (OsO(4), CAA, and DEPC) in order to detect perturbations at the bp level (27, 28, 29, 37, 38, 39).

Chemical Modifications on the Coding Strand

We examined the reactivities on the coding strand of the wild type sequence (p-200) by using OsO(4), which reacts with the C5-C6 double bond of unpaired thymines (39). Lanes 1 and 2 of Fig. 2A show a typical result. While relaxed (R) p-200, lane 1, did not show any strong reactivity, introduction of supercoiling (S, lane 2, -σ = 0.137) induced strong cleavages at thymines -209, -210, -212, -215, and -217. The percentage of modifications ranged from 7 to 14 (Fig. 2B). Weaker bands were observed corresponding to thymines -227 and -228 (less than 1%). Modifications were pH dependent, being observed only at pH 4.5. At pH 5.0 or higher (up to 7.5 was tested), no appreciable signals were detected. In addition, reactivities were increased by the addition of up to 10 mM Mg2+ ions in a concentration-dependent manner (not shown).


Figure 2: Chemical modifications on the coding strand of p-200. A, plasmid p-200, containing the wild type sequence, was reacted with topoisomerase I and ethidium bromide to give topoisomers at mean superhelical density -σ of 0 (R) and 0.137 (S). These were modified by OsO(4) (lanes 1 and 2), CAA (lanes 3 and 4), and DEPC (lanes 5 and 6) and processed under the conditions described under "Experimental Procedures." Cherenkov radiation was counted and 100,000 counts/min/lane was loaded. The sequence reported on the left represents the vector in lowercase and the insert in uppercase. Strong and weak signals are denoted by filled and open circles, respectively. B, gels containing the modifications from OsO(4), CAA, and DEPC were scanned and quantitated ("Experimental Procedures"). Signals corresponding to individual bands were expressed as percentage of the total integrated areas. In the case of CAA and DEPC, the values for the relaxed lane were subtracted from those for the supercoiled lane. The graph shows the mean (± S.D.) of two experiments for OsO(4) (filled bars) and CAA (stippled bars).



CAA forms etheno adducts with unpaired cytosines, adenines, and, to a lesser extent, guanines (37). Supercoiling-induced cleavages were found at C, C, C, and A (Fig. 2A, lanes 3 and 4). However, these bands accounted for only 0.5-1.5% of the total radioactivity (Fig. 2B). Acid treatment of the samples, before piperidine cleavage, did not improve the signal-to-noise ratio. The sites of modification complemented those detected by OsO(4) and suggested that the 5′ end of the single-stranded region was 3′ of A.

The character of the flanking nt was determined by DEPC, a probe specific for unpaired adenines and guanines (37). As shown in lanes 5 and 6 of Fig. 2A, supercoiling-induced reactivities extended from A to A, the strongest band corresponding to A (0.94 ± 0.05% of the total radioactivity). Taken together, these data define two sets of accessible nt: a major site that spans thymines -209 to -217, which we interpret as single-stranded, and a minor one, from nt -218 to -228, which we consider to be weakly bonded (filled and open circles in Fig. 2A and Fig. 4C). These structural transitions are influenced by supercoiling, protonation at specific residues, and Mg2+ ions.


Figure 4: Chemical modifications on the non-coding strand. A, DNAs were prepared and reacted with CAA as described ("Experimental Procedures" and Fig. 2). p-200S was used in this case instead of p-200 in order to align the wild type sequence with that of -198C, -196T, and -195G. The nt changes carried by the mutant plasmids are shown on the left. Open circles indicate reactivities common to more than one DNA, whereas asterisks identify sites of modification specific for -198C. B, DNAs were reacted with OsO(4) as described. Signals above background levels were quantitated and shown as the percentage of the total radioactivity. C, summary of the CAA (C, A, and G residues), OsO(4) (T residues), and DEPC (A and G residues) modifications. Filled and open circles identify strong and weak cleavage sites, respectively, on the wild type, -198C, -196T, and -195G. Boxed open circles indicate the reactivities common to the wild type, -196T, and -195G, but not -198C. Cleavages specific for this mutant are shown by asterisks. A line exterior to the open circles denotes those residues in which the modifications were affected by the flanking vector sequences.



Reactions with OsO(4), CAA, and DEPC were conducted and visualized on the coding strand of plasmids containing point mutations associated with HPFH (Fig. 1A). Relaxed and supercoiled DNA (-σ = 0.137) were treated under the same conditions as p-200. The results demonstrated that the modifications occurred at the same residues as in the wild type (not shown) but that there were quantitative differences, which are summarized in Fig. 3. Here the columns represent the percentages of the signals from OsO(4) and CAA normalized to the wild type sequence. The major changes were caused by the -202G and -202T mutations, which produced a general reduction in modification; the consequences of the -202G mutation were more severe than those of -202T. -198C exhibited a 2-fold increase in modification at T and T, whereas -196T and -195G displayed a modest reduction in modification at the middle residues. The normalization at A with DEPC gave the following values: 0.00 for -202G, 0.38 for -202T, 1.19 for -198C, 0.91 for -196T, and 1.32 for -195G.


Figure 3: Chemical modifications on the coding strand of HPFH mutant plasmids. Plasmids bearing point mutations leading to HPFH were reacted with OsO(4) or CAA and processed as described in the legend to Fig. 2. After quantitation, the data were normalized by dividing the results obtained at each residue by that of p-200 at the corresponding position. The amount of modification at Ts is derived from the OsO(4) reactions, whereas the values at Cs and A are derived from the CAA reactions. The data represent the average from two or more experiments with each probe. Coefficients of variation were comparable to those of p-200 shown in Fig. 2B.



Chemical Modifications on the Non-coding Strand

Plasmids carrying the wild type sequence or the HPFH point mutations were reacted with OsO(4), CAA, or DEPC, and the sites of cleavage were monitored on the non-coding strand. In response to supercoiling (-σ = 0.137), CAA modified C and C in the wild type sequence (Fig. 4A, lanes 1 and 2). A second set of reactive nt was present from A to C. Since modifications at these latter positions were also detected on the coding strand (Fig. 4C), they support the interpretation of a locally distorted duplex DNA. However, in contrast to the coding strand, no cleavages were seen between purines -209 and -215. This strand-selective pattern of reactivity, together with the modifications at C and C, is indicative of an I.T. Accordingly, the protected purines constitute the third strand of the structure, whereas C and C would be located in the loop (Fig. 1B and Fig. 7) (40, 41).


Figure 7: Sequence alignments for the I.Ts. Bases complementary to the reactive residues -217 to -209 were aligned with the downstream purine-rich sequence, which contains the HPFH point mutations (-203 to -194), giving the two I.T. models illustrated for p-200. The structures for all models on the right half (B) of the figure have the third strand displaced by one position as compared to the models on the left half (A). Closed circles indicate 5′ ends. Dotted lines designate hydrogen bonds. For the mutant plasmids, only the composition of the triplex stem is shown, where the triplets affected by the HPFH point mutations are boxed. + indicates protonated residues.



The data with DEPC showed the accessibility of A, G, and G (not shown). OsO(4) modified the thymines spanning positions -216 to -226 to a moderate extent (Fig. 4B), confirming the chemical accessibility of the I.T. flanking sequences as well as that of the triplex-duplex junction (nt -217 and -218).

-202G and -202T showed a marked reduction in reactivities (Fig. 4B) relative to the wild type sequence, confirming their destabilizing effect. -196T and -195G displayed patterns of modifications with CAA and DEPC qualitatively identical to that of the wild type. These mutants also showed a considerable reduction in reactivity at T (Fig. 4, A and B). These results, together with those on the coding strand (Fig. 3), suggest that these two mutations perturb the overall geometry of the triplex structure, but do not change the sequence alignment. In contrast, the CAA-induced cleavages at C and C were not observed in the -198C mutant. Instead, signals were detected at C, G, and G (denoted by asterisks in Fig. 4, A and C), indicating an alteration in the loop structure. This may be accomplished by a slippage of the purine-rich third strand relative to the Watson-Crick duplex or by multiple adjustments in the interactions among nt that retain their sequence alignment. In either case, it is clear that this mutation has profound consequences on the triplex structure.

The increase in OsO(4) modification at T and T in mutants -198C, -196T, and -195G relative to p-200 (Fig. 4B) was quite significant (3-4-fold). However, this behavior is not due to differences in the I.T. structures. A detailed analysis of these reactivities will be given in the last section under "Results."

Supercoiling-dependent Titration of OsO(4) Modification

Since the previous experiments were conducted at very high superhelical densities, they left open the question of whether mutations -198C, -196T, and -195G had a significant influence on I.T. formation. To address this issue, OsO(4) was reacted with a set of topoisomers spanning superhelical densities from 0 to -0.137. At each supercoil density, the percentages of cleavage on the coding strand (thymines -209 to -217) were totaled. Fig. 5A shows the results with p-200, -202G, and -202T; a substantial amount of free energy from supercoiling was required to form the I.T. even in the wild type sequence. In fact, at a superhelical density of -0.06, typical of plasmid DNA isolated from E. coli, only about 10% reactivity was detected. The results from -202G and -202T confirm their strong inhibitory effect. Panel B shows the results with -198C and p-200 (as a dotted line). Whereas the increase in modification for -198C may be attributed to the stronger reactivity at T and T (Fig. 3), the significant shift in -σ at 50% modification (0.068 ± 0.001 versus 0.078 ± 0.001) indicates that this mutation decreases the amount of free energy required for the duplex to triplex transition. On the other hand, the data for -196T and -195G, reported in panels C and D, respectively, do not show dramatic perturbations.


Figure 5: Supercoiling-dependent titration of OsO(4) modification. DNAs were reacted with topoisomerase I in the presence of different amounts of ethidium bromide (from 0 to 4.0 µg/ml). The mean superhelical densities were derived as reported under "Experimental Procedures." Topoisomers were modified by OsO(4) and processed as described for the coding strand. For each lane, the percentages of modification corresponding to thymines -209, -210, -212, -215, and -217 were added and expressed as a single (y) value. The experimental data were interpolated with a four-parameter logistic function. Experiments were performed in duplicate. A, p-200, -202G, and -202T (●); B, -198C; C, -196T; D, -195G. The dotted line represents p-200; droplines indicate the inflection points (c), namely the value of -σ at 50% OsO(4) modification. Values of the asymptotic maximum (a) and c were as follows: p-200 (a) 52.7 ± 1.3, (c) 0.078 ± 0.001; -202G (a) 6.4 ± 0.3, (c) 0.098 ± 0.002; -202T (a) 11.1 ± 1.0, (c) 0.093 ± 0.005; -198C (a) 65.2 ± 1.2, (c) 0.068 ± 0.001; -196T (a) 51.7 ± 1.3, (c) 0.081 ± 0.001; -195G (a) 57.2 ± 2.5, (c) 0.084 ± 0.002.
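The role of the fitted parameters a and c can be illustrated with a generic four-parameter logistic curve. The sketch below is not the fitting procedure used for Fig. 5: the exact parameterization, the slope factor b, and the lower asymptote y_min are assumptions made here for illustration, since only a and c are reported in the legend. The function and dictionary names are hypothetical.

```python
# Reported asymptotic maximum (a) and inflection point (c) from the Fig. 5 legend.
FITS = {
    "p-200": (52.7, 0.078),
    "-202G": (6.4, 0.098),
    "-202T": (11.1, 0.093),
    "-198C": (65.2, 0.068),
    "-196T": (51.7, 0.081),
    "-195G": (57.2, 0.084),
}


def four_param_logistic(x: float, a: float, c: float,
                        b: float = 10.0, y_min: float = 0.0) -> float:
    """Generic 4-parameter logistic: rises from y_min to a, half-way at x = c.

    b (steepness) and y_min are illustrative assumptions, not reported values.
    """
    if x <= 0:
        return y_min
    return y_min + (a - y_min) / (1.0 + (c / x) ** b)


if __name__ == "__main__":
    for plasmid, (a, c) in FITS.items():
        half = four_param_logistic(c, a, c)
        print(f"{plasmid}: modification at -sigma = {c:.3f} is {half:.2f} (half of a = {a})")
```

Whatever the slope, the curve passes through half of its span at x = c, which is why c can be read as the -σ value at 50% modification and why the lower c for -198C (0.068 versus 0.078 for p-200) indicates a transition at lower superhelical density.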



We also conducted two-dimensional gel electrophoresis to assay for the relaxation associated with the DNA structural transition. Topoisomers of p-200 and all five mutant plasmids were separated on agarose gels in the same buffer solution used for the chemical modification experiments. Chloroquine concentrations of 4 and 30 µM were employed in the second dimension, which afforded the resolution of topoisomers up to 23 negative superhelical turns. A transition centered at 15 negative superhelical turns was observed; however, this was attributed to the formation of a non-B DNA structure in the vector. No transitions due to the duplex-triplex conversion were observed. Two explanations are possible: first, the transition may be too small to be detected in this range of topoisomers or, second, the I.T. structure may be too unstable to survive the electrophoretic conditions.

Kinetics of Modifications at A·T Base Pairs Support the I.T. Model

Reactivities to OsO(4) on the non-coding strand and DEPC at the corresponding adenines were interpreted from a dynamic standpoint.

Fig. S1 represents a bp in conventional duplex DNA. As bp opening depends on k(1) and k(-1), the two constants regulate the extent of modification at both residues. However, if the process in Fig. S2 occurs, in which T may interact with a second partner (X) once in an open conformation, the reactivities will depend on k(2) and k(-2), in addition to k(1) and k(-1). This is expected to selectively increase modification at A, since this residue remains in an open conformation for a longer time than its partner T. If the percentages of cleavage are expressed as a ratio of T/A, values for A·T pairs involved in a type 2 process will be smaller than those progressing through Fig. S1.
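The qualitative argument above can be made concrete with a minimal equilibrium model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes the states are at equilibrium, that chemical reactivity is proportional to the fraction of time a base is exposed, and that T bound to the second partner X is protected while its former partner A stays exposed. The names k_minus1 and k_minus2 stand for the reverse rate constants, and all function names are hypothetical.

```python
def exposure_fractions(k1: float, k_minus1: float,
                       k2: float = 0.0, k_minus2: float = 1.0):
    """Return (T exposed, A exposed) fractions at equilibrium.

    States: closed <-> open (k1, k_minus1) <-> T bound to X (k2, k_minus2).
    With k2 = 0 the model reduces to simple bp opening (type 1 process).
    """
    K1 = k1 / k_minus1            # opening equilibrium constant
    K2 = k2 / k_minus2            # capture of open T by partner X
    total = 1.0 + K1 + K1 * K2    # closed + open + T-bound-to-X populations
    open_frac = K1 / total
    bound_frac = K1 * K2 / total
    t_exposed = open_frac                 # T is reactive only while open and free
    a_exposed = open_frac + bound_frac    # A stays unpaired in both states
    return t_exposed, a_exposed


def t_over_a_ratio(k1, k_minus1, k2=0.0, k_minus2=1.0) -> float:
    """Idealized T/A exposure ratio, equal to 1 / (1 + k2 / k_minus2)."""
    t_exposed, a_exposed = exposure_fractions(k1, k_minus1, k2, k_minus2)
    return t_exposed / a_exposed


if __name__ == "__main__":
    print("type 1 process:", t_over_a_ratio(0.1, 1.0))
    print("type 2 process:", t_over_a_ratio(0.1, 1.0, k2=3.0, k_minus2=1.0))
```

In this idealized model the ratio is 1 for a type 1 process and falls below 1 as soon as T can be captured by X. Measured T/A values also depend on the intrinsic reactivities of the two different probes, so only the relative decrease at a given locus, not the absolute number, is meaningful.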


Figure S1:



Figure S2:


Table 1 shows the results for A·T bp -226, -225, -222, -219, and -216 for p-200 and mutant plasmids. The values varied considerably, from 0.7 to 66, but, with the exception of -216, they were relatively homogeneous at a given locus. The variations observed may be interpreted in terms of modulation in the accessibility to the reactants. In fact, not shown in the schemes are intermediate states in which a given bp may adopt distorted conformations and/or alterations in the stacking interactions with neighbor residues. These changes, which are favored by high levels of supercoiling and flanking non-B DNA structures such as the I.T., not only facilitate bp opening, but also increase the rate of chemical attack on partially unpaired conformations (42).



Locus -216 appears to be a different case. Here the low values, which spanned a greater range (0.7-6.3), were determined by an increased modification at A(open) (A) associated with normal or low cleavages at T (Fig. 4B and 6A) and are appropriately accounted for by a type 2 process. Thus, the ratio of T/A modifications is a measure of the relative chemical accessibility and hence the extent to which the A or the T residue has reassociated with another partner.

Reactivities at the I.T. Flanking Sequence Are Affected by the Adjacent Cloning Site

We observed variations in chemical modifications among the mutants that did not correlate with the stability of the respective triplex structures. For example, the OsO(4) modifications at T and T were 3-4-fold higher in -198C, -196T, and -195G relative to p-200 (Fig. 4B), whereas an increase in stability was apparent only for -198C (Fig. 5). Since this behavior reflected the differences in the cloning sites for the mutants, a comparison of OsO(4), CAA, and DEPC reactivities between p-200 and p-200S was performed to resolve this issue. Fig. 6A shows the results from DEPC treatment of the coding strand. The tracings represent the signal from the supercoiled plasmids (-σ = 0.137) less the modification found for the relaxed DNA. Quantitation of the peak areas revealed that A and A acquired an approximately 2-fold increase when subcloned at the SmaI site (p-200S), whereas smaller differences were associated with the other residues. Also, the supercoiling-dependent modifications with OsO(4) at T and T (Fig. 6B) revealed that greater signals were associated with the plasmids containing the γ-globin insert at the SmaI site. Hence, these data show that the flanking vector sequences are responsible for the reactivities at these positions, rather than the HPFH point mutations.


Figure 6: Effect of the cloning site on chemical modification at the I.T. flanking sequence. A, plasmids p-200 and p-200S contain the wild type γ-globin insert cloned at the HincII or SmaI site of pUC9, respectively. The DNAs were reacted with DEPC at -σ of 0 (R) and 0.137 (S) and processed as described for the coding strand. Quantitations were performed as follows. Pixel values for each line graph were converted to percentage of the total signal (sum of pixel values for each lane). Percentages of the R sample were then subtracted from those of S after R and S were aligned on their highest pixel value. The range of the x axes was adjusted so as to align the two relevant sets of peaks. Areas were calculated by cutting and weighing the peaks. Values (mg × 10) at selected locations are reported. Since the amount of material in A for both DNAs was identical (±9%), this reinforces the quantitative methodology. B, supercoiling-dependent OsO(4) modification at T and T. The percentages of modification at T and T were added and expressed as a single (y) value. Interpolation was conducted as explained in the legend to Fig. 5. For clarity, standard deviations were removed.




DISCUSSION

These chemical probe analyses on the 5′-flanking region of the human γ-globin genes enable a molecular description of the I.Ts. at a level of detail previously not possible. The duplex to triplex transition characteristic of oligopurine-oligopyrimidine sequences is accomplished by the purine residues simultaneously engaging in Watson-Crick and Hoogsteen hydrogen bonds (27, 28, 29, 30). In general, the bound third strand may occupy a parallel, or antiparallel, orientation relative to the purine residues, depending on its sequence composition. Accordingly, a purine-rich third strand will be accommodated in the major groove in an antiparallel orientation, whereas a pyrimidine-rich third strand will occupy the reverse position. Therefore, stable hydrogen bonds may form between G:G, A:A, and A:T in the former case, and G:C and A:T in the latter (30). Low pH is required in order to stabilize a pyrimidine-rich strand containing cytosine residues in this arrangement. (^2)

The structures formed at the 5′-flanking region of the human γ-globin genes, both in the wild type and in the HPFH point mutants, deviate considerably from this general scheme. In fact, these and previous data (4) indicate that the third strand is purine-rich, yet low pH is required for stabilization. Our results show that the third strand AAGAGGATA is hybridized in an antiparallel orientation to the downstream GGGGAAGGGG containing the sites of mutations. Since these two sequences are 9 and 10 bases long, their interaction leads to two possible alignments (Fig. 7). In no case can homogeneous G:G, A:A, or A:T Hoogsteen base pairing take place. Rather, mismatches of the G:A, A:G, and G:T type must also be considered. In both of the reported models, the most abundant triplet is C·G:A, a combination that has recently been observed in other I.Ts. (43, 44). The stabilization induced by the protonated reversed-Hoogsteen-bound adenines agrees well with our observations. The Hoogsteen G:T pair has been described by NMR only in the parallel orientation (45), where T(H3) shares one hydrogen bond with G(N7). Since parallel and antiparallel thymine displays a 2-fold symmetry about N3 (30), it is possible that the antiparallel G:T pair maintains this type of hydrogen bond. Antiparallel A:G has not been documented; however, close interactions are possible.
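The two alignments mentioned above can be enumerated mechanically. The sketch below is only an illustration and does not reproduce Fig. 7: it assumes that both sequences are written 5′ to 3′ as they appear in the text, that the third strand runs antiparallel to the purine tract, and that each pair is written as duplex purine:third-strand base. The function name is hypothetical.

```python
from collections import Counter

THIRD_STRAND = "AAGAGGATA"   # 9 nt, as written in the text
PURINE_TRACT = "GGGGAAGGGG"  # 10 nt, downstream tract carrying the HPFH sites


def antiparallel_alignments(third: str, tract: str):
    """Yield (offset, pairs) for every register of the third strand on the tract.

    The third strand is reversed so that it pairs antiparallel to the tract.
    Each pair is formatted as 'duplex purine:third-strand base'.
    """
    reversed_third = third[::-1]
    for offset in range(len(tract) - len(third) + 1):
        window = tract[offset:offset + len(third)]
        pairs = [f"{d}:{t}" for d, t in zip(window, reversed_third)]
        yield offset, pairs


if __name__ == "__main__":
    for offset, pairs in antiparallel_alignments(THIRD_STRAND, PURINE_TRACT):
        print(f"register {offset}: {' '.join(pairs)}")
        print(f"  counts: {dict(Counter(pairs))}")
```

Under these assumptions the 9-mer fits the 10-mer in exactly two registers, G:A is the most frequent pairing in both, and neither register consists solely of G:G, A:A, and A:T pairs, in line with the statements in the text.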

Overall, the paucity of stable C·G:G and T·A:A triplets, together with the short length of the I.T. stem, accounts for the observed requirement of high levels of supercoiling and the inability to detect a duplex to triplex transition by two-dimensional agarose gel electrophoresis. Finally, both models are consistent with the data which locate C and C in the loop.

All of the HPFH point mutations alter the normal I.T. structure, some slightly (-196T and -195G), others profoundly (-198C, -202G, and -202T). The destabilizing effects of -202G and -202T, observed previously (4), are confirmed. Both of these mutations disrupt a GGGCCC motif, a sequence that has been shown to acquire an induced bend upon complexation of Mg2+ ions (46). The effect of these nt changes may be that of abolishing this induced bend, which may represent the nucleation step for I.T. formation, or that of disrupting critical Hoogsteen hydrogen bonds. Our results favor the first interpretation.

The stabilization mediated by C was suspected from earlier work (47). However, the previous studies (4, 47) did not anticipate that this mutation may alter the sequence alignment of the triplex by inducing a slippage of the third strand relative to the Watson-Crick duplex.

The results for T and G are unexpected. Here, we find subtle changes in the overall I.T. structures, whereas substantial destabilizations were predicted from former assays (4). It is likely that these discrepancies originate from the experimental conditions used. In fact, we found no modifications at pH 5.0, whereas S1 nuclease cleavages were previously detected at this pH. Since I.T. formation occurs in the pH range of 4.5-5.0, slight variations are likely to affect the stabilities greatly. Also, the results of the oligomer binding assays may have been influenced by the distortion of the DNA flanking the I.T., as well as by the difference in the cloning site between p-200 and mutant plasmids (Fig. 6). The two sets of schemes in Fig. 7 predict stable triplexes for C, T, and G (48, 49); our data do not permit a delineation between these alternatives.

From a physiological standpoint, the combination of low pH and elevated superhelical density required to induce these structures in vitro may raise concerns about their stability in a cellular environment. However, base protonation may occur, and be maintained, in polynucleotides at several pH units above the pKa of the free base (50, 51, 52, 53). Also, divalent metal ions, polyamines (54, 55, 56, 57, 58), and the aforementioned single-strand-specific binding proteins may cooperate in lowering the activation energy needed for the I.T. transition.

In vivo, the chromatin in the 5`-flanking region of the human γ-globin genes has been shown to be hypersensitive to DNase I digestion or restriction enzyme cleavage in cells where the γ-globin genes are actively transcribed (59, 60). This behavior, which is also observed in other systems, is likely to be correlated with the loss of positioned nucleosomes along the DNA and the acquisition of new interactions between cis regulatory sequences and cognate transcription factors (61, 62). Indeed, γ-globin regulatory elements such as the CACCC and CCAAT boxes appear to be selectively occupied in K562 cells, which express these genes (63, 64).

In vivo, no protein complex has been identified to date that interacts with the upstream region that contains the HPFH point mutations. In addition, experiments conducted in transgenic mice have demonstrated that, at least in the case of G, a strong correlation exists between this point mutation and the HPFH phenotype (7). Therefore, a macromolecular complex may assemble at -200; this complex could be involved in γ-globin gene silencing. A polypeptide that binds and stabilizes the I.T. structure induced at this location in vitro has been found (14). The interaction of this protein with the I.T. might be altered by any of the HPFH point mutations. It remains to be established whether such an interaction reflects the formation of an I.T. complex that operates in vivo to temporally regulate the expression of the human γ-globin genes.


FOOTNOTES

*
This work was supported by National Institutes of Health Grants GM52982 (to R. D. W.) and DK49786 (to T. J. L.), National Science Foundation Grant DMB-9103942, and the Robert A. Welch Foundation (to R. D. W.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked ``advertisement'' in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
Present address: Dept. of Biology, 2345 Campus Box, Elon College, Elon College, NC 27244.

¶
To whom correspondence should be addressed. Tel.: 713-677-7651; Fax: 713-677-7689; E-mail: rwells@ibt.tamu.edu.

(^1)
The abbreviations used are: HPFH, hereditary persistence of fetal hemoglobin; S1-HSS, S1 nuclease-hypersensitive site; I.T., intramolecular triplex; bp, base pair(s); CAA, chloroacetaldehyde; DEPC, diethyl pyrocarbonate; nt, nucleotide(s).

(^2)
A colon designates the association between two bases by Hoogsteen or reversed Hoogsteen pairs, whereas a center dot designates the interaction between bases in a Watson-Crick pairing motif.


ACKNOWLEDGEMENTS

We thank Dr. Richard R. Sinden and Erna Baum for topoisomerase I and Drs. Xiaolian Gao, Richard Bowater, and Vladimir N. Potaman for suggestions.


REFERENCES

  1. Karlsson, S., and Nienhuis, A. W. (1985) Annu. Rev. Biochem. 54, 1071-1108
  2. Collins, F. S., and Weissman, S. M. (1984) Prog. Nucleic Acids Res. Mol. Biol. 31, 315-462
  3. Huisman, T. H. J. (1992) Hemoglobin 16, 237-258
  4. Ulrich, M. J., Gray, W. J., and Ley, T. J. (1992) J. Biol. Chem. 267, 18649-18658
  5. Berry, M., Grosveld, F., and Dillon, N. (1992) Nature 358, 499-502
  6. Jane, S. M., Gumucio, D. L., Ney, P. A., Cunningham, J. M., and Nienhuis, A. W. (1993) Mol. Cell. Biol. 13, 3272-3281
  7. Starck, J., Sarkar, R., Romana, M., Bhargava, A., Scarpa, A. L., Tanaka, M., Chamberlain, J. W., Weissman, S. M., and Forget, B. G. (1994) Blood 84, 1656-1665
  8. Sinden, R. R. (1994) DNA Structure and Function, pp. 134-284, Academic Press, Inc., San Diego, CA
  9. Lukomski, S., and Wells, R. D. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 9980-9984
  10. Lee, L. J., Latimer, L. J., Haug, B. L., Pulleyblank, D. E., Skinner, D. M., and Burkholder, G. D. (1989) Gene (Amst.) 82, 191-199
  11. Bowater, R. P., Chen, D., and Lilley, D. M. (1994) Biochemistry 33, 9266-9275
  12. Yamazoe, M., Shirahige, K., Rashid, M. B., Kaneko, Y., Nakayama, T., Ogasawara, N., and Yoshikawa, H. (1994) J. Biol. Chem. 269, 15244-15252
  13. Giffin, W., Torrance, H., Saffran, H., MacLeod, H. L., and Haché, R. J. G. (1994) J. Biol. Chem. 269, 1449-1459
  14. Horwitz, E. M., Maloney, K. A., and Ley, T. J. (1994) J. Biol. Chem. 269, 14130-14139
  15. Ito, K., Sato, K., and Endo, H. (1994) Nucleic Acids Res. 22, 53-58
  16. Cockell, M., Frutiger, S., Hughes, G. J., and Gasser, S. M. (1994) Nucleic Acids Res. 22, 32-40
  17. Brook, J., McCurrach, M. E., Harley, H. G., Buckler, A. J., Church, D., Aburatani, H., Hunter, K., Stanton, V. P., Thirion, J.-P., Hudson, T., Sohn, R., Zemelman, B., Snell, R. G., Rundle, S. A., Crow, S., Davies, J., Shelbourne, P., Buxton, J., Jones, C., Juvonen, V., Johnson, K., Harper, P. S., Shaw, D. J., and Housman, D. E. (1992) Cell 68, 799-808
  18. Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C., Narang, M., Barceló, J., O'Hoy, K., Leblond, S., Earle-Macdonald, J., de Jong, P. J., Wieringa, B., and Korneluk, R. G. (1992) Science 255, 1253-1255
  19. La Spada, A. R., Wilson, E. M., Lubahn, D. B., Harding, A. E., and Fischbeck, K. H. (1991) Nature 352, 77-79
  20. Orr, H. T., Chung, M.-Y., Banfi, S., Kwiatkowski, T. J., Jr., Servadio, A., Beaudet, A. L., McCall, A. E., Duvick, L. A., Ranum, L. P. W., and Zoghbi, H. Y. (1993) Nature Genet. 4, 221-226
  21. Huntington's Disease Collaborative Research Group (1993) Cell 72, 971-983
  22. Koide, R., Ikeuchi, T., Onodera, O., Tanaka, H., Igarashi, S., Endo, K., Takahashi, H., Kondo, R., Ishikawa, A., Hayashi, T., Saito, M., Tomoda, A., Miike, T., Naito, H., Ikuta, F., and Tsuji, S. (1994) Nature Genet. 6, 9-13
  23. Nagafuchi, S., Yanagisawa, H., Sato, K., Shirayama, T., Ohsaki, E., Bundo, M., Takeda, T., Tadokoro, K., Kondo, I., Murayama, N., Tanaka, Y., Kikushima, H., Umino, K., Kurosawa, H., Furukawa, T., Nihei, K., Inoue, T., Sano, A., Komure, O., Takahashi, M., Yoshizawa, T., Kanazawa, I., and Yamada, M. (1994) Nature Genet. 6, 14-18
  24. Fry, M., and Loeb, L. A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4950-4954
  25. Kang, S., Jaworski, A., Ohshima, K., and Wells, R. D. (1995) Nature Genet. 10, 213-218
  26. Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S., and Wells, R. D. (1995) J. Biol. Chem., in press
  27. Wells, R. D., Collier, D. A., Hanvey, J. C., Shimizu, M., and Wohlrab, F. (1988) FASEB J. 2, 2939-2949
  28. Lu, G., and Ferl, R. J. (1993) Int. J. Biochem. 25, 1529-1537
  29. Mirkin, S. M., and Frank-Kamenetskii, M. D. (1994) Annu. Rev. Biophys. Biomol. Struct. 23, 541-576
  30. Radhakrishnan, I., and Patel, D. J. (1994) Biochemistry 33, 11405-11416
  31. Bacolla, A., and Wu, F. Y.-H. (1991) Nucleic Acids Res. 19, 1639-1647
  32. Rooney, S. M., and Moore, P. D. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 2141-2144
  33. Behe, M. J. (1995) Nucleic Acids Res. 23, 689-695
  34. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning, a Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  35. Germond, J. E., Hirt, B., Oudet, P., Gross-Bellard, M., and Chambon, P. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 1843-1847
  36. Keller, W. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 4876-4880
  37. Kohwi-Shigematsu, T., and Kohwi, Y. (1992) Methods Enzymol. 212, 155-180
  38. Collier, D. A., Griffin, J. A., and Wells, R. D. (1988) J. Biol. Chem. 263, 7397-7405
  39. Palecek, E. (1992) Methods Enzymol. 212, 139-155
  40. Shimizu, M., Kubo, K., Matsumoto, U., and Shindo, H. (1994) J. Mol. Biol. 235, 185-197
  41. Hanvey, J. C., Shimizu, M., and Wells, R. D. (1989) J. Biol. Chem. 264, 5950-5956
  42. Buckle, M., Fritsch, A., Roux, P., Geiselmann, J., and Buc, H. (1991) Methods Enzymol. 208, 236-258
  43. Malkov, V. A., Voloshin, O. N., Veselkov, A. G., Rostapshov, V. M., Jansen, I., Soyfer, V. N., and Frank-Kamenetskii, M. D. (1993) Nucleic Acids Res. 21, 105-111
  44. Klysik, J. (1995) J. Mol. Biol. 245, 499-507
  45. Macaya, R. F., Gilbert, D. E., Malek, S., Sinsheimer, J. S., and Feigon, J. (1991) Science 254, 270-274
  46. Brukner, I., Susic, S., Dlakic, M., Savic, A., and Pongor, S. (1994) J. Mol. Biol. 236, 26-32
  47. Tate, V. E., Wood, W. G., and Weatherall, D. J. (1986) Blood 68, 1389-1393
  48. Durland, R. H., Rao, T. S., Revankar, G. R., Tinsley, J. H., Myrick, M. A., Seth, D. M., Rayford, J., Singh, P., and Jayaraman, K. (1994) Nucleic Acids Res. 22, 3233-3240
  49. Dittrich, K., Gu, J., Tinder, R., Hogan, M., and Gao, X. (1994) Biochemistry 33, 4111-4120
  50. Inman, R. B. (1964) J. Mol. Biol. 9, 624-637
  51. Wells, R. D., and Larson, J. E. (1972) J. Biol. Chem. 247, 3405-3409
  52. Carbonnaux, C., van der Marel, G. A., van Boom, J. H., Guschlbauer, W., and Fazakerley, G. V. (1991) Biochemistry 30, 5449-5458
  53. Leonard, G. A., Booth, E. D., and Brown, T. (1990) Nucleic Acids Res. 18, 5617-5623
  54. Kohwi, Y., and Kohwi-Shigematsu, T. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 3781-3785
  55. Tung, C.-H., Breslauer, K. J., and Stein, S. (1993) Nucleic Acids Res. 21, 5489-5494
  56. Malkov, V. A., Voloshin, O. N., Soyfer, V. N., and Frank-Kamenetskii, M. D. (1993) Nucleic Acids Res. 21, 585-591
  57. Washbrook, E., and Fox, K. R. (1994) Nucleic Acids Res. 22, 3977-3982
  58. Kang, S., Wohlrab, F., and Wells, R. D. (1992) J. Biol. Chem. 267, 1259-1264
  59. Gimble, J. M., Max, E. E., and Ley, T. J. (1988) Blood 72, 606-612
  60. Bresnick, E. H., and Felsenfeld, G. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 1314-1317
  61. Lewin, B. (1994) Cell 79, 397-406
  62. Benezra, R., Cantor, C. R., and Axel, R. (1986) Cell 44, 697-704
  63. Ikuta, T., and Kan, Y. W. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 10188-10192
  64. Reddy, P. M. S., Stamatoyannopoulos, G., Papayannopoulou, T., and Shen, C.-K. J. (1994) J. Biol. Chem. 269, 8287-8295
