(Received for publication, March 20, 1995; and in revised form, August 9, 1995)
Several human hereditary neuromuscular disease genes are
associated with the expansion of CTG or CGG triplet repeats. The DNA
syntheses of CTG triplets ranging from 17 to 180 and CGG repeats from 9
to 160 repeats in length were studied in vitro. Primer
extensions using the Klenow fragment of DNA polymerase I, the modified
T7 DNA polymerase (Sequenase), or the human DNA polymerase paused
strongly at specific loci in the CTG repeats. The pausings were
abolished by heating at 70 °C. As the length of the triplet repeats
in duplex DNA, but not in single-stranded DNA, was increased, the
magnitude of pausings increased. The location of the pause sites was
determined by the distance between the site of primer hybridization and
the beginning of the triplet repeats. CGG triplet repeats also showed
similar, but not identical, patterns of pausings. These results
indicate that appropriate lengths of the triplets adopt a non-B
conformation(s) that blocks DNA polymerase progression; the resultant
idling polymerase may catalyze slippages to give expanded sequences and
hence provide the molecular basis for this non-Mendelian genetic
process. These mechanisms, if present in human cells, may be related to
the etiology of certain neuromuscular diseases such as myotonic
dystrophy and Fragile X syndrome.
CTG and CGG trinucleotide repeat expansions are associated with the genes of a number of human hereditary diseases, including myotonic dystrophy (1, 2, 3), Kennedy's disease (4), spinocerebellar ataxia type I (5), Huntington's disease (6), dentatorubral-pallidoluysian atrophy (7, 8), Haw River syndrome (9), Machado-Joseph disease (10), and Fragile X and XE syndromes (11, 12, 13). The triplet repeat sequences occur at different locations relative to the coding regions. For example, autosomal dominant myotonic dystrophy, which is characterized mainly by myotonia and progressive muscle weakness, has CTG triplet repeats in the 3'-untranslated region of the myotonin kinase gene (3, 14). Fragile X syndrome, the most frequent inherited form of mental retardation, contains multiple CGG repeats within the mRNA noncoding region of a Fragile X associated gene (FMR-1), while, for the other hereditary diseases, the CTG repeat is located within the genes and encodes a tract of glutamines (4, 5, 6, 7, 8, 9, 10, 11, 12, 13). These highly polymorphic triplet repeats have been shown to range from 5 to 37 copies on normal chromosomes, whereas carriers and affected individuals have more than 39 copies (15), and the largest expansion observed is 2000 or more copies of the repeats in some Fragile X syndrome and myotonic dystrophy patients (14). These diseases have a common property known as anticipation, in which the severity of the disease is increased and the age of onset is reduced with each successive generation. These behaviors correlate with the massive expansions of triplet repeats, a non-Mendelian genetic process.
DNA structural investigations have shown evidence that a number of
repeating DNA sequences including triplets such as GAA, GGA, and TTA
adopt non-B conformations under appropriate environmental conditions (15, 16, 17, 18). Cruciform structures form within regions of inverted
repeat symmetry in negatively supercoiled DNA, and left-handed Z-DNA is
formed at regions of alternating purines and
pyrimidines (20, 21). Pur·pyr tracts (i.e. (A-G) or (G runs)) can form
intramolecular triplexes (15, 22, 23, 24, 25). In
addition, other alternative conformations including bent DNA, slipped
structures, and nodule DNA are known to exist in microsatellite-type
DNA sequences (15, 17, 26).
Repeating
sequences can cause DNA synthesis aberrations by the cellular
replication machinery. Inverted repeats generate frequent deletions,
and simple repeats promote multiple slippages in templates, which give
rise to deletions and duplications as well as strand switching during
replication in cells (27, 28, 29). Previous
studies indicated that pur·pyr sequences may be pause (arrest)
sites for DNA replication and amplification, and replication pause
sites are potentially mutagenic (30, 31, 32).
Recently, the replication of CGG triplet repeats of Fragile X syndrome
was found to be delayed, compared with the replication of alleles from
normal males(33) .
Herein, we show that CTG and CGG triplet repeat sequences, which originated from hereditary genetic disease patients, have unorthodox properties; DNA polymerases pause at specific locations in these sequences. The pausings are dependent on the length of the repeat tract and temperature. Our results suggest that non-B conformations of the triplet repeats may be responsible for the pausing and that these properties are related to the etiology of some hereditary genetic diseases.
pRW3306 containing 160 CGG triplet repeats was constructed
as follows. A DNA fragment, which contains (CGG), was
isolated from pTM10 (a gift of B. A. Oostra, Erasmus University, The
Netherlands) by digestion with BstUI and HaeIII. The
insert was ligated to generate multimers using T4 DNA ligase. A
head-to-tail dimer of (CGG) was cloned into the HincII site of the polylinker of pUC19. The CGG repeat
sequences in pTM10, which were derived from the cDNA of Fragile X
patients (11), contain mutations of the perfect repeat at the
12th repeat (AGG) and at the 73rd repeat (CAG). pRW3306 also contains a
non-CGG repeat sequence (CTGGG) at the junction of the two blocks of
(CGG). Plasmids were grown in SURE cells (Stratagene) and
were isolated by the methods described above for the CTG-containing
plasmids.
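Interruptions of this kind can be located mechanically once the sequence of a repeat tract is in hand. The following Python sketch is illustrative only: the function name `find_interruptions` and the 80-repeat example block are assumptions introduced here, not details of the construction of pRW3306.

```python
def find_interruptions(tract, unit="CGG"):
    """Return (1-based repeat number, triplet) for every non-`unit` triplet.

    The tract is read in frame from its first base; trailing bases that do
    not fill a complete triplet are ignored.
    """
    k = len(unit)
    usable = len(tract) - len(tract) % k
    triplets = [tract[i:i + k] for i in range(0, usable, k)]
    return [(n, t) for n, t in enumerate(triplets, start=1) if t != unit]


# Hypothetical 80-repeat block carrying the two interruptions described for
# pTM10 (AGG at repeat 12, CAG at repeat 73); the block length is an assumption.
units = ["CGG"] * 80
units[11] = "AGG"
units[72] = "CAG"
example_block = "".join(units)
print(find_interruptions(example_block))
```

Reading the tract in a fixed frame keeps the repeat numbering aligned with the convention used in the text (12th repeat, 73rd repeat); a tract with insertions or deletions that shift the frame would need an alignment-based approach instead.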
Thus, we
investigated CTG repeat sequences for their ability to be replicated in vitro using primer extension methods with the Klenow
fragment of DNA polymerase I and a modified T7 DNA polymerase
(Sequenase). Supercoiled pRW1981, which contains (CTG),
was denatured with alkali and then annealed with primers. After
renaturation, the primer extension mixture was preincubated for 10 min
at 37 or 50 °C, and then either the Klenow fragment or the
Sequenase was added and incubated for 10 min at 37 °C. Fig. 1A shows that, unexpectedly, the DNA polymerases
do not progress uniformly through the triplet repeat sequences; instead,
pause sites were found for both Sequenase (S) and the
DNA polymerase I (Klenow fragment) (K) in both strands of
(CTG). In the primer extension of the bottom strand with
preincubation at 37 °C, the Sequenase (lane 2) and the DNA
polymerase I (Klenow fragment) (lane 3) paused at the
beginning region of the multiple CTG insert (triplets numbered
118-126, Fig. 1B) and at 37 CTG triplets away
(triplets numbered 88-94, Fig. 1B); two brackets on the left side in Fig. 1A indicate
these regions. Sequenase paused much more strongly in the beginning region
than in the distal region 37 CTG triplets away (very weak pausing), but
the DNA polymerase I (Klenow fragment) showed more pausing in the
distal region (37 CTG triplets away) than in the beginning region. At
50 °C (lane 1), the amount of pausing of Sequenase in the
bottom strand was greatly reduced in the beginning region compared with
the 37 °C study.
Figure 1: DNA polymerase pause sites in CTG repeat sequence in plasmids. A, DNA sequencing gel analysis of pausing of DNA polymerase in CTG repeats. S and K represent Sequenase and the Klenow fragment of DNA polymerase I, respectively. The arrows represent the beginning site of the multiple CTG insert. The principal pause sites are marked by the brackets. B, summary of the pausing in 130-CTG repeat sequences. The length of the bars represents an approximate visual estimate of the amount of pausing. The data at both 50 and 37 °C are included to show all of the pause sites. C, pause sites for the human DNA polymerase. DNA was preincubated at 37 °C for 10 min, and reactions were performed at 37 °C for 20 min. Data for the bottom strand are shown. The enzyme was isolated as described previously (19) (generous gift of Drs. R. K. Singhal and S. H. Wilson). The arrow and bracket have the same designations as in panel A. D, effect of temperature on the pausings. Primer extensions using Sequenase were performed with various sizes of CTG repeating sequences (130, 75, and 26 repeats) as described under "Materials and Methods" except using the DNAs and the conditions designated at the top. The pausings in the beginning of the bottom strand are shown. The arrows designate the beginning of the triplet repeats.
Also, a second type of pausing, albeit weak, was observed. The DNA polymerases paused throughout the CTG repeat sequences with a base pair periodicity of 12 + 9 (i.e. an average of 10.5 bp) (lane 1). This phenomenon was more pronounced with preincubation at 50 than at 37 °C.
In the top strand, Sequenase (sequencing lanes G, A, T, and C) paused at the beginning region of the insert and at the 30th CTG triplet (indicated by the brackets on the right side of Fig. 1A). However, the pausing was abolished in sequencing experiments by preincubation of the DNA at 70 °C, indicating that the observed bands were pause sites of the DNA polymerases and not due to the sequencing reactions. The DNA polymerase I (Klenow Fragment) (lane 4) paused strongly at about the 30th CTG triplet on the top strand.
Fig. 1B summarizes the overall pausing sites; the 3'-half of both strands could not be investigated since they were too far from the primer binding sites to be studied by these methods.
That DNA polymerases are arrested at specific locations in the CTG repeat sequences suggests that DNA secondary structures exist at CTG repeat sequences that inhibit polymerase progression. Also, the properties of secondary structures in the proximal region and the distal region (the 37th triplet from the 5'-end of the bottom strand and the 30th triplet from the 5'-end of the top strand) seem to be different from each other, since the Klenow fragment and the Sequenase polymerases paused in different manners.
Studies were also performed with the human DNA polymerase
(collaboration with Drs. R. K. Singhal and S. H. Wilson, University of
Texas Medical Branch, Galveston, TX). As found for the two prokaryotic
polymerases, the human enzyme also paused (Fig. 1C) at
the vector-triplet repeat interface, but the principal pause began
33 repeats away on the bottom strand. Hence the pausing behavior
was found for prokaryotic as well as eukaryotic DNA polymerases. The
majority of the studies described below were conducted with the model
prokaryotic enzymes due to convenience and to the reaction
similarities.
Figure 2: Effect of CTG repeat length on the pausings. Plasmids containing various CTG repeat sequences were constructed as described under "Materials and Methods." Primer extensions of the bottom strand of plasmids containing 50, 80, 100, 130, 140, and 180 repeats of CTG sequences, respectively, were performed with the Klenow fragment and M13 reverse primer(-24). The amounts of newly synthesized DNA at the major pausing site (triplet repeats numbered 88-94) were measured. The amount of pausing is expressed relative to the amount of (CTG) as 100. Reproducibility is 9%. The rectangle represents pRW3234, which contains two (CTG) repeats flanking a 100-bp nontriplet repeat interruption sequence.
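The normalization described in this legend, in which pausing is expressed relative to one reference construct set to 100, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `relative_pausing`, the choice of the 130-repeat construct as reference, and all band amounts are placeholders introduced here and are not data from this study.

```python
def relative_pausing(band_amounts, reference_length):
    """Scale pause-band amounts so the reference construct equals 100."""
    reference = band_amounts[reference_length]
    if reference <= 0:
        raise ValueError("reference band amount must be positive")
    return {length: 100.0 * amount / reference
            for length, amount in band_amounts.items()}


# Placeholder numbers for illustration only; they are NOT measurements
# from this study. Keys are repeat lengths, values are arbitrary band units.
placeholder_amounts = {50: 2.0, 100: 10.0, 130: 40.0}
print(relative_pausing(placeholder_amounts, reference_length=130))
```

Expressing each band relative to a single reference lane allows gels with different exposures to be compared, provided the reference construct is run on every gel.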
Figure 3: Pausings require double-stranded CTG repeats but not supercoiling. A, primer extension experiments were performed for single-stranded (S-S) and double-stranded (D-S) pRW3111. T7 and SP6 promoter primers were used. S, Sequenase; K, Klenow. The arrows designate the beginning of the triplet repeats. B, various topoisomers of pRW1981 were used for primer extension. Lanes 1-4 have average supercoil density 0, -0.02, -0.04, and -0.06, respectively. Lane 6 is linear DNA that was prepared by digestion with SacI, and lane 5 is a mixture of linear DNA (1 µg) and supercoiled DNA (average supercoil density = -0.053) (1 µg). The arrow represents the beginning site of the multiple CTG insert. C, pRW3219, which contains (CTG), was denatured with 0.2 N NaOH in the presence of P-labeled primer and renatured by adding 0.3 M sodium acetate. The DNAs were resuspended in a buffer containing 10 mM Tris·Cl (pH 7.9), 10 mM MgCl2, 1 mM dithiothreitol, and 50 or 100 mM NaCl and incubated for 10 min at 37 °C. 10 units of EcoRI was added to the samples as indicated at the top and then incubated further for 20 min. 5 or 0.5 units of Sequenase or DNA polymerase I (Klenow fragment) were added and incubated for 10 min. After the reactions were stopped, the DNAs were run on a 12% polyacrylamide gel.
Usually, non-B DNA structures
are underwound relative to the normal
B-DNA (17, 20, 37). Hence, negative
supercoiling stabilizes unusual DNAs such as cruciforms, Z-DNA, and
triplexes (15, 17, 20, 37). We
examined the pausing of the Klenow fragment with various topoisomers of
pRW1981. Fig. 3B shows that the DNA polymerase paused
at all supercoil densities tested (average, 0 to -0.06) (lanes 1-4), indicating that negative supercoil density
does not have a major influence on the pausing. However, linear DNA
does not display the same pausing pattern as circular DNAs, whereas a
mixture of linear DNA and circular DNA shows half the amount of pausing
at the same loci as in the closed circular DNA. Linear
DNA probably shows no pausing after treatment with heat and alkali
because of strand dissociation, which disfavors renaturation in the
annealing step compared with the circular DNA.
The extension of primers by DNA polymerases with closed circular DNAs may generate waves of positive supercoils in front of the synthesizing polymerases, as shown in the translocation of RNA polymerase along duplex DNA (twin-supercoiled domain)(38, 39) . The accumulated torsional stress may cause the pausing of DNA polymerases. To test this, we performed a primer extension experiment for DNA that was linearized with EcoRI after the supercoiled pRW3219 and the primer were annealed and preincubated for 10 min at 37 °C. Fig. 3C shows that the DNA polymerases paused in DNAs linearized after the preincubation with the same pattern as the supercoiled DNA (no EcoRI treatment); the bands at the tops of the gels (top arrow) (lanes 1-4 and 7-10) indicate that a high percentage of the DNAs were linearized with EcoRI. Also, Fig. 3C shows that pausings in the distal region are increased, although not dramatically, in the presence of 100 mM NaCl compared with 50 mM NaCl. In addition, the DNA polymerases pause more at the beginning region (bottom arrow) of the CTG repeats with low concentration of the enzymes and more in the distal region (37th CTG) (bracket) with high concentration. Hence, these results indicate that the pausing was not induced by an accumulated torsional stress resulting from the progression of the DNA polymerases.
Figure 4: The pausing sites are determined by the location of the primer binding site. Primer extension experiments were performed with various primers whose 5'-ends have various distances from the first CTG repeats in pRW1981 and pRW3262. pRW3262 was used in place of pRW1981; pRW3262 is pRW1981 with a 38-bp deletion (on the left side of the triplets) and has (CTG) in the BamHI site instead of the HincII site. Thus, the primer(-20) was effectively moved 26 bp closer to the (CTG) tract. The primers were GTAAAACGACGGCCAGT(-20), AACAGCTATGACCATG(-21), and CCTGGCCGAAAGAAAT (p19). Other experimental details were as described under "Materials and Methods." A, lanes 1 and 2, 5'-end of the primer is 70 bp from the first CTG triplets; lanes 3 and 4, 63 bp; lanes 5 and 6, 89 bp. The arrows designate the beginning of the triplet repeats. B, lanes 1 and 2, 89 bp distance; lanes 3 and 4, 36 bp. S, Sequenase; K, DNA polymerase I (Klenow Fragment). The arrow represents the beginning site of the multiple CTG insert. C, summary of relationships between the locations of 5'-ends of primers (open bars) and pause sites by the Klenow fragment of DNA polymerase I (filled bars).
A summary of these observations is shown in Fig. 4C. The extension of primers with a longer distance from the initiation site of the CTG repeat stopped at triplets located farther from the initiation site; the distance between the pausing site in the distal region and the first CTG is about 20 bp longer than the distance between the first CTG and the 5'-end of primers. This phenomenon occurs in both strands, although the extent of the pausings seemed to be different. The lengths of the filled bars in Fig. 4C represent an approximate visual estimate of the amounts of pausing of the DNA polymerase I (Klenow fragment) occurring in the distal regions; the closest primer (e) showed the least amount of pausing. These results imply that the length of DNA synthesized influences the conformation that causes the pausing and hence the location of the pause sites.
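The approximate relationship stated above (distal pause distance is about 20 bp longer than the primer-to-repeat distance) can be turned into a small Python sketch. The function name `predicted_distal_pause` and the treatment of the 20-bp offset as a fixed constant are assumptions made for illustration; the text gives the offset only as "about 20 bp," so the output should be read as a rough estimate, not a fitted model.

```python
import math

TRIPLET_LEN = 3   # bp per CTG repeat
OFFSET_BP = 20    # "about 20 bp" in the text; treated as exact here (assumption)


def predicted_distal_pause(primer_distance_bp, offset_bp=OFFSET_BP):
    """Estimate the distal pause position inside the repeat tract.

    primer_distance_bp: distance from the primer 5'-end to the first CTG.
    Returns (distance into the tract in bp, triplet number containing it).
    """
    pause_bp = primer_distance_bp + offset_bp
    triplet = math.ceil(pause_bp / TRIPLET_LEN)
    return pause_bp, triplet


# Primer distances listed in the Figure 4 legend (36, 63, 70, and 89 bp).
for distance in (36, 63, 70, 89):
    print(distance, predicted_distal_pause(distance))
```

Under these assumptions, primer distances of 70 and 89 bp give estimates at roughly the 30th and 37th triplets, respectively, which is in the same range as the distal pause positions described for the two strands.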
Figure 5: Capacity of dITP or 7-deaza-dGTP to replace dGTP. A, dGTP (G) as a substrate was replaced by dITP (I) or 7-deaza-dGTP (D) in primer extension experiments. The preincubation was done at 37 or 50 °C. The arrows represent the beginning site of the multiple CTG insert. B, primer extensions were performed with dNTPs containing either dGTP or dITP as substrates for plasmids containing 130, 26, or 17 CTG repeats. The preincubation was done at 37 °C. The arrow represents the beginning site of the multiple CTG insert.
The progression of the DNA polymerase I (Klenow
fragment) in the presence of dITP was arrested at the five dIs at the
beginning region of the CTG repeats. Although the pattern is quite
different from that of dGTP, the pausing is length-dependent, as also
occurred in the primer extension with dGTP. The progression of the
Klenow fragment of DNA polymerase I with dITP (Fig. 5B)
was arrested in the beginning region of (CTG) (lane 4) but not with (CTG) (lane 5) and (CTG) (lane 6).
Our results indicate that the substitution of 7-deaza-dGTP for dGTP does not change the pattern of pausing, whereas the substitution of dITP does, suggesting that the conformation(s) that causes the pausings is not a tetraplex or a triplex in which the N-7 atom is involved. The details of the structure remain to be elucidated.
Figure 6: DNA polymerases pause in long CGG triplet repeat sequences. Primer extension of pRW3306 was performed using the M13 primer(-40). Lanes 1-4, dideoxy termination reactions using Sequenase, G, A, T, and C, respectively. Lane 5, primer extension with the DNA polymerase I (Klenow fragment). DNA was preincubated at 37 °C for 10 min before the extension reaction. The arrow on the left side indicates the CGG repeat start site. Polymerase pausing was observed at the 29-31st CGG repeats as indicated on the right side of lane 5.
Our in vitro experiments show aberrant DNA synthesis in massive CTG and CGG repeat sequences; pausings of DNA polymerases occur at specific loci. The distance between the primer binding site and the initiation of the triplet repeats determines the pause location. The pausing is dependent on the length of the contiguous CTG insert and is temperature-dependent. These data suggest that DNA secondary structures, probably stabilized by H-bonds, exist in long CTG and CGG repeat sequences that inhibit DNA polymerase movement.
Our
experiments indicate that CTG triplet repeats have unorthodox
properties. If these features are present in vivo in human
cells, they may play a role in the expansion and the subsequent
molecular pathology of myotonic dystrophy, Huntington's disease,
Kennedy's disease, spinocerebellar ataxia I, and
dentatorubral-pallidoluysian atrophy. Other studies also
show the non-B properties of longer CTG repeats that are different from
previously characterized unusual DNA conformations (15, 17, 20, 37) (left-handed Z-DNA,
cruciforms, triplexes, nodule DNA, etc.) and formation of tetraplexes
for CGG oligomers (40).
DNA polymerases as well as chemical probes and physical analyses have been used to study unusual DNA structures(30, 31, 32) . Mirkin and co-workers (32) showed that DNA polymerases pause at intramolecular triplex forming sequences and that the pause sites are dependent on the type of triplex isomer(32) . In addition, polymerases may be able to recognize sites such as smoothly curved DNAs that chemical probes cannot detect since there are no perturbed (unpaired) bases. Thus, DNA polymerases may recognize non-B DNA structures that preexist or that are formed in the course of DNA polymerization. The observed pausing of DNA polymerases in the CTG repeats is apparently caused by a non-B DNA conformation. Since no pausing was found in single-stranded triplet repeats, double-stranded DNA must be a prerequisite for the structure. However, it is not clear whether the structure preexisted in the duplex DNA, resulted from a misalignment created after the denaturation-renaturation process, and/or was formed by the process of polymerase elongation.
Expansion
and contraction of CTG triplet repeats occur in E.
coli(35) . Expansion may be related to the pausing of DNA
polymerase by the following explanation. An insertion and a deletion of
a few bases by a slippage mechanism have been
reported (41, 42). As proposed
previously (15, 43), slippage might be promoted by an
"idling polymerase" at a strong block such as a DNA
structure or bound proteins such as nucleosomes; the result could be
multiple slippages, which cause the expansion of larger sequences.
Other studies showed that nucleosomes are preferentially positioned at
CTG triplets and that this behavior is more pronounced for longer
repeats (34, 44). We proposed that the non-B
conformation adopted by the CTG repeats is a toroid; this
structure would be expected to serve as a superior histone binding
site, as observed(34, 44) . This unorthodox DNA
structure may block polymerase movement and transiently cause the
dissociation of the template and the newly synthesized strand (Fig. 7). A primer reassociation in a misaligned configuration
may generate a hairpin structure in the newly synthesized strand and
hence elicit expansions.
Figure 7: A model for the relationship between the pausing of DNA polymerases and expansion of triplet repeat sequences as mediated by primer realignment. The third process (Expansion) is a multistep reaction. n is the number of triplet repeats and n+s represents the repeat numbers expanded by the slipped (s) increment.
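The bookkeeping behind the n to n+s notation in this model can be illustrated with a toy Python calculation. It is a deliberately simplified sketch, not a simulation of the biochemistry: the function name `replicate_with_slippage`, the single slippage event, and the example numbers are assumptions introduced here.

```python
def replicate_with_slippage(n_repeats, pause_at, slip):
    """Count repeats in the nascent strand after one backward slippage.

    n_repeats: repeats in the template (n).
    pause_at:  template repeats already copied when the polymerase pauses.
    slip:      repeats by which the nascent 3'-end re-anneals upstream (s).
    """
    if not 0 <= slip <= pause_at <= n_repeats:
        raise ValueError("require 0 <= slip <= pause_at <= n_repeats")
    copied = pause_at                     # repeats made before the pause
    template_position = pause_at - slip   # misaligned re-annealing point
    # The polymerase resumes and copies the rest of the template, so the
    # `slip` repeats between the two positions are copied a second time.
    copied += n_repeats - template_position
    return copied


# Hypothetical example: 130-repeat template, pause after 37 repeats,
# nascent strand slips back by 5 repeats.
print(replicate_with_slippage(130, 37, 5))
```

In this simplified picture the nascent strand always ends with n+s repeats regardless of where the pause occurs; the pause position matters only in that slippage cannot exceed the number of repeats already synthesized. Repeated rounds of pausing and realignment would add further increments.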
The pausing phenomenon is general for both eukaryotic and prokaryotic polymerases. The replication of eukaryotic chromosomes requires the participation of a multi-component ensemble that includes DNA polymerases, helicases, ligases, and single-stranded binding proteins. These proteins may influence the pausing of polymerases. Whereas further studies are required to determine the relationship of these observations to the behavior in living human cells, our studies provide a basis for exploring the genetic and biochemical mechanisms of expansion and deletion in a well characterized simple organism as related to expansion in human neuromuscular genetic diseases.