(Received for publication, October 10, 1995; and in revised form, November 29, 1995)
The relative ability of the 10 triplet repeat sequences to be expanded in Escherichia coli was determined. Surprisingly, CTG tracts are expanded at least 8 times more frequently than any of the other nine triplets. Low levels of expansion were found also for CGG, GTG, and GTC. Thus, the structure of the CTG repeats and/or their utilization by the DNA synthetic systems in vivo must be quite different from the other triplets. These data further validate this genetically defined system for elucidating molecular mechanisms of expansion and may explain why most triplet repeat hereditary neuromuscular and neurodegenerative disease genes contain CTG repeats.
The molecular basis of several human genetic disorders involving CTG triplet repeat expansions, including myotonic dystrophy (1, 2), Kennedy's disease (3), spinocerebellar ataxia type I (4), Huntington's disease (5), dentatorubral-pallidoluysian atrophy/Haw River syndrome (6, 7, 8), and Machado-Joseph disease (9), has been partially established. The CTG triplet repeats may be within the genes or in the 3′ untranslated regions. The CTG expansion is associated with anticipation, whereby the penetrance of the disease increases in successive generations. Thus, this expansion process, possibly due to the slippage of complementary DNA strands, is a novel type of mutation and shows non-Mendelian genetic properties (10, 11). Furthermore, the genetic instability of repetitive microsatellite sequences (12) has been implicated in certain cancers (13, 14, 15, 16).
Whereas the lengths of CTG repeats have been correlated with several diseases, expansion of only one of the other triplet repeats (CGG) has been linked with other diseases, i.e. the fragile-X and fra-E mental retardation syndromes, which are inherited as X-linked dominant traits (17, 18). None of the other eight possible triplet repeats has been associated with hereditary disease genes. Hence, CTG is the most frequently observed triplet repeat to date.
Since an understanding of the expansion of CTG repeats is critical for elucidating the etiology of these syndromes, we (19) have established an Escherichia coli system for studying the molecular mechanisms of this process. The frequency of expansions versus deletions is strongly influenced by the direction of replication in vivo across these sequences; a model was proposed based on strand slippage coupled with non-classical DNA structures for explaining these behaviors (19).
This laboratory has been evaluating the physical, biochemical, and genetic properties of all 10 repeating triplet sequences. More than 175 plasmids containing the 10 triplet repeat sequences in lengths ranging from 4 to 300 repeats have been prepared and sequenced. In this study, we compare the relative capacity of the 10 repeating triplet sequences to be expanded in vivo in our E. coli expansion system (19). Herein, we demonstrate that the CTG triplet repeat is the dominant expansion product among all 10 triplet repeats. Moreover, direct comparative analyses between CTG and its sequence isomer, GTC, reveal that the CTG repeat is preferentially expanded.
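The count of 10 repeating triplet sequences follows from simple combinatorics: a duplex repeat tract can be named by any cyclic rotation of its unit, read on either strand. The following Python sketch illustrates that counting argument. The function names are hypothetical, and the grouping rule (rotations plus reverse complement, with the four homopolymers excluded) is the standard one assumed here rather than a procedure taken from this study. The representative chosen for each class is simply the alphabetically first member, so it may differ from the names used in the tables.

```python
from itertools import product

# Watson-Crick complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit):
    """All cyclic rotations of a repeat unit, e.g. CTG -> CTG, TGC, GCT."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def repeat_class(unit):
    """All triplets that name the same duplex repeat tract as `unit`."""
    return frozenset(rotations(unit) | rotations(reverse_complement(unit)))


def distinct_triplet_repeats():
    """Group the 60 non-homopolymer triplets into duplex repeat classes."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) == 1:  # AAA, CCC, GGG, TTT are mononucleotide runs
            continue
        classes.add(repeat_class(unit))
    return sorted(classes, key=min)


classes = distinct_triplet_repeats()
print(len(classes))                                 # 10
print(repeat_class("CTG") == repeat_class("CAG"))   # True: same duplex tract
print(repeat_class("CTG") == repeat_class("GTC"))   # False: sequence isomers
```

Under this grouping CTG and CAG denote the same duplex repeat, as do CGG and CCG, whereas CTG and GTC fall into different classes even though they share the same base composition.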
For this study, a mixture of plasmids containing all 10 triplet repeats (Table 1) was digested with SacI and HindIII, and the products were analyzed on an agarose gel; all inserts had been cloned into the polylinker of pUC19. In the case of the shorter triplet repeats (Table 1), only fragments containing 45-62 repeats were observed along with the vector. However, regions of the gel above this fragment (larger by approximately 50 bp) that contained no visually detectable DNA were eluted, and the putative fragments were recloned into pUC19. The sizes of the "recloned" inserts were determined by dideoxy sequencing and by restriction enzyme analysis.
When all 10 of the plasmids containing the shorter triplet repeats (Table 1) were mixed in equimolar amounts and investigated by this method, we surprisingly found (Fig. 1A) that virtually all of the colonies contained CTG repeats (38 of the 43 colonies investigated, 88.4%). In addition, a smaller number of these colonies contained CGG (7%) and GTC (4.7%) repeats (Fig. 1A). Expansion of the other repeats was not observed. As a control, we eluted the DNA from the major observable band on the gel at the expected repeat size (containing all 10 triplet sequences) and determined the relative ability of each of the 10 triplet repeat inserts to be recloned; the inset in Fig. 1A shows that all triplet repeats were recloned in approximately equal proportion indicating that the frequencies of ligation and/or transformation were not influenced by the triplet repeat sequences.
Figure 1: Distribution of expansion products from recloning of the plasmid inserts in the 10 shorter triplet repeats (Table 1) (A) and the nine shorter triplet repeats without the CTG repeat (B). The recloning of the expansion regions of the gel and of the inserts of the expected size is described under "Materials and Methods." An expansion product is defined as a clone that contains at least 48 bp more than the average length of the starting triplet repeat insert; the range of the expanded products was from 48 bp to 150 bp. The numbers of expanded colonies found were: CTG, 38; CGG, 3; and GTC, 2 (panel A); and CGG, 10; GTG, 9; GTC, 11; and AAG, 1 (panel B). The filled bars indicate the frequency of the expanded products, whereas the hatched bars (insets) indicate the frequency of the products with the expected lengths (controls).
Other studies were conducted in the absence of the CTG triplet repeat (Fig. 1B) to more rigorously analyze the extent of expansion of the other nine triplet repeat sequences. Fig. 1B shows that CGG (32.3%), GTG (29.0%), and GTC (35.5%) were also expanded, whereas the other repeats were expanded to the extent of 3% or less. The reason for observing the expansion of GTG in this study but not in the experiment shown in Fig. 1A is uncertain. Again, a control experiment of recloning the principal observable band at the expected length of inserts (Fig. 1B, inset) showed that all of the nine triplet repeat sequences could be recloned, eliminating the potential artifact of problems in ligation, transformation, or other steps during the process.
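The percentages quoted for Fig. 1 can be checked directly from the colony counts given in the figure legend. The short Python sketch below does only that arithmetic; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Expanded-colony counts as listed in the legend to Fig. 1.
PANEL_A = {"CTG": 38, "CGG": 3, "GTC": 2}             # all 10 repeats mixed
PANEL_B = {"CGG": 10, "GTG": 9, "GTC": 11, "AAG": 1}  # CTG omitted


def expansion_frequencies(counts):
    """Percentage of expanded colonies carrying each repeat (one decimal)."""
    total = sum(counts.values())
    return {repeat: round(100.0 * n / total, 1) for repeat, n in counts.items()}


for panel, counts in (("A", PANEL_A), ("B", PANEL_B)):
    print(panel, sum(counts.values()), expansion_frequencies(counts))
# A 43 {'CTG': 88.4, 'CGG': 7.0, 'GTC': 4.7}
# B 31 {'CGG': 32.3, 'GTG': 29.0, 'GTC': 35.5, 'AAG': 3.2}
```

The totals are 43 colonies for panel A and 31 for panel B. The single AAG colony in panel B corresponds to about 3.2%, which the text reports as 3% or less.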
Additionally, studies were conducted with the triplet repeats containing 68-81 repeats (Table 2). The results were similar to those for the shorter triplet repeats (Fig. 1A). Although the expansion frequency increases with repeat number (19), GAT, GTA, and TTA repeat expansions were still not observed.
To further evaluate the extraordinary finding that CTG is the dominant expansion product, we performed direct comparative analyses of this triplet repeat with its sequence isomer, GTC. Table 3 shows that both the 57-mer and the 71-mer of CTG were expanded much more readily than similar lengths of GTC. The two triplet repeats have the same base composition but are sequence-isomeric. Hence, the type or stability of the structure of the CTG repeats, or the manner in which this sequence is utilized by the DNA synthetic enzymes in vivo, must be very different from that of the GTC repeat.
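What "sequence-isomeric" means in practice can be illustrated by comparing the two tracts base by base and step by step. The Python sketch below is illustrative only: the helper names are hypothetical, the tract length of 10 repeats is arbitrary, and no energy values are computed. It shows that the two single strands have identical base composition but share no nearest-neighbor (dinucleotide) step, which is the level at which stacking interactions differ.

```python
from collections import Counter


def tract(unit, n):
    """Build a single-stranded repeat tract of n copies of `unit`."""
    return unit * n


def base_composition(seq):
    """Count each base in the sequence."""
    return Counter(seq)


def dinucleotide_steps(seq):
    """Count each adjacent base pair step (5' to 3') along the strand."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


N_REPEATS = 10  # arbitrary illustrative length, not a value from the tables
ctg = tract("CTG", N_REPEATS)
gtc = tract("GTC", N_REPEATS)

print(base_composition(ctg) == base_composition(gtc))  # True
print(sorted(dinucleotide_steps(ctg)))                 # ['CT', 'GC', 'TG']
print(sorted(dinucleotide_steps(gtc)))                 # ['CG', 'GT', 'TC']
```

The step sets are disjoint for any tract length of two or more repeats, so the comparison does not depend on the particular value of `N_REPEATS`.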
GTC repeats have different stacking free energies from CTG repeats, which result in less stable single-strand hairpin structures (22, 23). We proposed that expansion occurs in vivo during DNA replication due to the formation of a region of single-stranded CTG repeats (19). This hypothesis was based on the relative rate of expansion versus deletion as a function of the direction of DNA replication (19) as well as on oligonucleotide stability studies as determined by NMR spectroscopy (24, 25). CTG oligomers form antiparallel helical duplexes with the formation of TT base pairs, whereas CAG oligomers form only metastable duplexes (25). Also, computer stability analyses (22, 24) suggest that single-stranded CTG repeats as well as CGG repeats form antiparallel hairpin structures that are more stable than those of any of the other repeats. Our model (19) requires that, during lagging strand replication, a stable hairpin be formed in the newly synthesized DNA strand to achieve expansion, whereas a stable hairpin structure in the lagging strand template is likely to produce deletions.
A consideration of the effect of interruptions (polymorphisms) in the repeats is important since they have been statistically correlated with the stability of triplets; the loss of interruptions increases the genetic instability (26, 27). These interruptions consisted of AGG repeats within the CGG triplets for the FMR-1 gene and CAT polymorphisms in the CAG repeats for the SCA1 gene. The majority of the triplet repeats studied herein (Table 1) did contain interruptions. We have evaluated the effect of interruptions on the frequency of expansion; comparison of an uninterrupted (CTG) tract with a sequence of identical length that contains one CTA interruption (at the 28th repeat), derived from the myotonic dystrophy gene, showed that the uninterrupted sequence was expanded 5 times more frequently than the interrupted sequence. Thus, interruptions decrease the frequency of expansion in these investigations. For this study, we used uninterrupted CTG repeats for the experiments shown in Fig. 1A as well as Table 3, while the GTC repeats have one interruption in the center of the tract (Fig. 1A); in contrast, the GTC repeats used in Table 3 contain no interruptions. Both sets of data showed no influence of these interruptions on the expansion frequencies, indicating that the differences in the frequencies observed between the expansion of the CTG and GTC repeats (Table 3) are caused by the sequences and/or structures rather than the interruptions per se.
At present, the genetic functions responsible for the expansions and deletions are uncertain; the establishment of a genetic system will be important for these studies.
To date, only two triplet repeat sequences, CTG (CAG) and CGG (CCG), have been associated with human genetic diseases due to their massive expansions (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 17, 18). Herein, we have shown that GTG and GTC repeats also have the capacity to be expanded, at least in E. coli. Although other long triplet repeats have been found in the human genome (28, 29) in addition to CTG and CGG repeats, their expansion has not been correlated with human hereditary diseases. Since GTG repeats have similar base composition characteristics to CGG and CTG repeats (30), our determinations make GTG repeats a likely candidate for disease involvement. The close relationship between our studies in E. coli (expansion-deletion data (19), DNA polymerase pausing (31), and mismatch repair (32)) and eucaryotic observations (see Refs. 19, 31, and 32) suggests that the dominant expansion of CTG described herein may reveal significant molecular insights into human systems.
The surprising discovery that CTG triplet repeats are the dominant expansion products in E. coli, as found (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 17, 18) in clinical samples from human hereditary diseases, suggests the importance of DNA structural properties. Other investigations have revealed that duplex CTG and CGG repeats have unorthodox properties, including nucleosome assembly (33), their capacity to cause DNA polymerases to pause within the repeat sequences (31), as well as conformational features as revealed by helical repeat and polyacrylamide gel migrations (34). The further elucidation of the involvement of DNA conformational features in the etiology of human hereditary neuromuscular and neurodegenerative diseases will be intriguing.