©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
CTG Triplet Repeats from Human Hereditary Diseases Are Dominant Genetic Expansion Products in Escherichia coli(*)

(Received for publication, October 10, 1995; and in revised form, November 29, 1995)

Keiichi Ohshima Seongman Kang (§) Robert D. Wells (¶)

From the Institute of Biosciences and Technology, Center for Genome Research, Texas A & M University, Texas Medical Center, Houston, Texas 77030-3303

ABSTRACT
INTRODUCTION
MATERIALS AND METHODS
RESULTS AND DISCUSSION
FOOTNOTES
ACKNOWLEDGEMENTS
REFERENCES

ABSTRACT

The relative ability of the 10 triplet repeat sequences to be expanded in Escherichia coli was determined. Surprisingly, CTG tracts are expanded at least 8 times more frequently than any of the other nine triplets. Low levels of expansion were also found for CGG, GTG, and GTC. Thus, the structure of the CTG repeats and/or their utilization by the DNA synthetic systems in vivo must be quite different from those of the other triplets. These data further validate this genetically defined system for elucidating the molecular mechanisms of expansion and may explain why most triplet repeat hereditary neuromuscular and neurodegenerative disease genes contain CTG repeats.


INTRODUCTION

The molecular basis of several human genetic disorders involving CTG triplet repeat expansions, including myotonic dystrophy (1, 2), Kennedy's disease (3), spinocerebellar ataxia type I (4), Huntington's disease (5), dentatorubral-pallidoluysian atrophy/Haw River syndrome (6, 7, 8), and Machado-Joseph disease (9), has been partially established. The CTG triplet repeats may lie within the coding sequences of the genes or in their 3′ untranslated regions. CTG expansion is associated with anticipation, whereby the penetrance of the disease increases in successive generations. This expansion process, possibly due to slippage of complementary DNA strands, is thus a novel type of mutation with non-Mendelian genetic properties (10, 11). Furthermore, the genetic instability of repetitive microsatellite sequences (12) has been implicated in certain cancers (13, 14, 15, 16).

Whereas the lengths of CTG repeats have been correlated with several diseases, expansion of only one other triplet repeat (CGG) has been linked with disease, namely the fragile-X and fra-E mental retardation syndromes, which are inherited as X-linked dominant traits (17, 18). None of the remaining eight possible triplet repeats has been associated with a hereditary disease gene. Hence, CTG is the triplet repeat most frequently found in disease genes to date.

Since an understanding of the expansion of CTG repeats is critical for elucidating the etiology of these syndromes, we (19) have established an Escherichia coli system for studying the molecular mechanisms of this process. The frequency of expansions versus deletions is strongly influenced by the direction of replication in vivo across these sequences; we proposed a model based on strand slippage coupled with non-classical DNA structures to explain these behaviors (19).

This laboratory has been evaluating the physical, biochemical, and genetic properties of all 10 repeating triplet sequences. More than 175 plasmids containing the 10 triplet repeat sequences in lengths ranging from 4 to 300 repeats have been prepared and sequenced.(^1) In this study, we compare the relative capacity of the 10 repeating triplet sequences to be expanded in vivo in our E. coli expansion system (19). We demonstrate that CTG is the dominant expansion product among all 10 triplet repeats. Moreover, direct comparisons between CTG repeats and repeats of its sequence isomer, GTC, reveal that the CTG repeat is preferentially expanded.
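The count of 10 follows from the fact that a long triplet repeat tract can be read in any of three frames on either strand, so six triplet names describe the same duplex: for example, (CTG)n, (TGC)n, and (GCT)n, and on the complementary strand (CAG)n, (AGC)n, and (GCA)n. The Python sketch below is our illustration, not a procedure used in this study; it groups the 60 non-homopolymeric triplets into such duplex classes. The alphabetical "canonical" name it assigns (AGC for the CTG class) is an arbitrary convention of the sketch; the text names each class by one representative, such as CTG or GTC.

```python
from itertools import product

# Watson-Crick complement for each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit: str) -> list[str]:
    """All cyclic rotations of a repeat unit (CTG -> CTG, TGC, GCT)."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def canonical(unit: str) -> str:
    """Alphabetically smallest of the six names for the duplex repeat (unit)n.

    The name is an illustrative convention of this sketch only.
    """
    names = rotations(unit) + rotations(reverse_complement(unit))
    return min(names)


def triplet_repeat_classes() -> dict[str, list[str]]:
    """Group the 60 non-homopolymeric triplets into duplex repeat classes."""
    classes: dict[str, list[str]] = {}
    for bases in product("ACGT", repeat=3):
        unit = "".join(bases)
        if len(set(unit)) == 1:  # AAA, CCC, ... are mononucleotide repeats
            continue
        classes.setdefault(canonical(unit), []).append(unit)
    return classes


if __name__ == "__main__":
    print(len(triplet_repeat_classes()))          # 10
    print(canonical("CTG") == canonical("CAG"))   # True: same duplex
    print(canonical("CTG") == canonical("GTC"))   # False: sequence isomers
```

Each class contains exactly six triplets (60 / 6 = 10), because for a three-base unit no frame of the complementary strand can coincide with a frame of the original strand.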


MATERIALS AND METHODS

Plasmids

The triplet repeat inserts were cloned into the HincII site of pUC19, except for pRW3292 and pRW3293, which had their inserts in the BamHI site of pUC19. pRW3292 and pRW3293 were deletion products of pRW3262, which was constructed as follows: pRW1981 (19) was digested with Sau3AI, and the fragment containing (CTG) was recloned into the BamHI site of pUC19 (31). pRW3106 was a deletion product of pRW3311, which contains 81 repeats of CGG from the FMR-1 gene.(^2) pRW3144 and pRW3148 were produced by deletion from pRW3143, which contains 90 repeats of TTA derived from Trypanosoma brucei.(^3) pRW3464 and pRW3465 were constructed by expansion of pRW3413, which contains (GTC). Other triplet repeats were prepared by cloning synthetic oligonucleotides.(^3) All plasmid DNAs were isolated from E. coli HB101 by alkaline lysis (20).

Competition Analysis for Expansion

Purified plasmids (0.5 µg each) were mixed and digested with SacI and HindIII. The digest was separated on a 1.8% agarose gel, which resolved the vector from a band containing the mixture of 10 insert fragments. Two kinds of gel region were then eluted by the phenol method: the region from 50 bp(^4) above the normal insert size up to about 2.5 times the normal insert size, and the region at the normal insert size. The eluted DNA was ethanol-precipitated with 500 ng of an oligomer as carrier. The eluted DNA fragments were ligated into SacI/HindIII-digested pUC19 and transformed into E. coli DH10B by electroporation (transformation efficiency, >1 × 10^9 transformants/µg) (21). The transformants were grown in LB medium containing ampicillin (150 µg/ml), and plasmids were isolated by the alkaline lysis method (20). To amplify the expanded components, the above procedure was repeated. After this transformation, the cells were spread on ampicillin plates. Colonies were picked and grown in LB medium containing 75 µg/ml ampicillin. The plasmids were isolated, and the population of triplet repeats was determined by restriction enzyme digestion and/or DNA sequencing.
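The size window eluted as the expansion region can be expressed as a small calculation. In this sketch, `plan_elution` and `insert_length` are hypothetical helpers, and the polylinker flank length passed to `insert_length` is an assumed placeholder rather than a value given in the text; only the +50 bp offset and the roughly 2.5-fold upper limit come from the protocol above.

```python
from dataclasses import dataclass


@dataclass
class ElutionPlan:
    """Gel regions (in bp) to excise in one competition experiment."""
    control_bp: int       # band at the expected, unexpanded insert size
    expanded_min_bp: int  # lower edge of the expansion region
    expanded_max_bp: int  # upper edge of the expansion region


def insert_length(repeats: int, flank_bp: int) -> int:
    """SacI-HindIII insert length: 3 bp per triplet plus an ASSUMED flank length."""
    return 3 * repeats + flank_bp


def plan_elution(insert_bp: int, offset_bp: int = 50, max_fold: float = 2.5) -> ElutionPlan:
    """Expansion region: from insert_bp + offset_bp up to max_fold * insert_bp."""
    lower = insert_bp + offset_bp
    upper = round(max_fold * insert_bp)
    if lower >= upper:
        raise ValueError("insert too short for a non-empty expansion window")
    return ElutionPlan(insert_bp, lower, upper)


if __name__ == "__main__":
    # Hypothetical example: 50 repeats plus 60 bp of flanking polylinker.
    print(plan_elution(insert_length(50, flank_bp=60)))
    # ElutionPlan(control_bp=210, expanded_min_bp=260, expanded_max_bp=525)
```

For this hypothetical 210-bp insert, the expansion region would span 260 to 525 bp, while the control band is taken at 210 bp.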

E. coli Strains

HB101 is F⁻ mcrB mrr hsdS20(rB⁻ mB⁻) recA13 supE44 galK2 lacY1 proA2 rpsL20(Sm^r) xyl-5 leu mtl-1. DH10B is F⁻ mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara, leu)7697 galU galK rpsL endA1 nupG.


RESULTS AND DISCUSSION

For this study, a mixture of plasmids containing all 10 triplet repeats (Table 1) was digested with SacI and HindIII, and the products were analyzed on an agarose gel; all inserts had been cloned into the polylinker of pUC19. For the shorter triplet repeats (Table 1), only the vector and the fragments containing 45-62 repeats were visible. Nevertheless, regions of the gel larger than this fragment (by approximately 50 bp or more) that contained no visually detectable DNA were eluted, and the putative fragments were recloned into pUC19. The sizes of the recloned inserts were determined by dideoxy sequencing and by restriction enzyme analysis.



When all 10 plasmids containing the shorter triplet repeats (Table 1) were mixed in equimolar amounts and analyzed by this method, we surprisingly found (Fig. 1A) that the great majority of the colonies contained CTG repeats (38 of the 43 colonies investigated, 88.4%). The remaining colonies contained CGG (3 colonies, 7.0%) or GTC (2 colonies, 4.7%) repeats (Fig. 1A). Expansion of the other seven repeats was not observed. As a control, we eluted the DNA from the major visible band at the expected insert size (containing all 10 triplet sequences) and determined how efficiently each of the 10 triplet repeat inserts was recloned. The inset in Fig. 1A shows that all of the triplet repeats were recloned in approximately equal proportions, indicating that the efficiencies of ligation and/or transformation were not influenced by the triplet repeat sequence.
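The percentages quoted above follow directly from the colony counts given in the legend to Fig. 1 (38 + 3 + 2 = 43 expanded colonies). This short Python check is included only to make the arithmetic explicit; `percentages` is a hypothetical helper name.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of all expanded colonies (in %) contributed by each triplet repeat."""
    total = sum(counts.values())
    return {repeat: 100 * n / total for repeat, n in counts.items()}


# Expanded colonies from the Fig. 1A legend (all 10 shorter repeats competed).
panel_a = {"CTG": 38, "CGG": 3, "GTC": 2}

# Prints one line per repeat: CTG: 88.4%, CGG: 7.0%, GTC: 4.7%
for repeat, pct in percentages(panel_a).items():
    print(f"{repeat}: {pct:.1f}%")

# CTG colonies relative to the next most frequent repeat (CGG).
print(round(panel_a["CTG"] / panel_a["CGG"], 1))  # 12.7
```

The CTG-to-CGG ratio of about 12.7 in this experiment is consistent with the statement in the Abstract that CTG tracts are expanded at least 8 times more frequently than any other triplet.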


Figure 1: Distribution of expansion products from recloning of the plasmid inserts of the 10 shorter triplet repeats (Table 1) (A) and of the nine shorter triplet repeats without the CTG repeat (B). The recloning of the expansion regions of the gel and of the region at the expected insert size is described under "Materials and Methods." An expansion product is defined as a clone that contains at least 48 bp more than the average length of the starting triplet repeat insert; the expansions ranged from 48 bp to 150 bp. The numbers of expanded colonies were: CTG, 38; CGG, 3; and GTC, 2 (panel A); and CGG, 10; GTG, 9; GTC, 11; and AAG, 1 (panel B). The filled bars indicate the frequency of the expanded products, whereas the hatched bars (insets) indicate the frequency of the products with the expected lengths (controls).
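The operational definition of an expansion product in the legend can be written as a simple rule. In this sketch, `classify_clone` and `gained_repeats` are hypothetical helper names, and the insert lengths used are illustrative; only the 48 bp (16-triplet) threshold comes from the legend.

```python
TRIPLET_BP = 3
MIN_GAIN_BP = 48  # threshold from the Fig. 1 legend (16 extra triplets)


def gained_repeats(insert_bp: float, mean_start_bp: float) -> float:
    """Length gain of a recloned insert, expressed in triplets."""
    return (insert_bp - mean_start_bp) / TRIPLET_BP


def classify_clone(insert_bp: float, mean_start_bp: float) -> str:
    """Score a clone as an expansion product if it gained at least 48 bp."""
    if insert_bp - mean_start_bp >= MIN_GAIN_BP:
        return "expanded"
    return "not expanded"


if __name__ == "__main__":
    mean_start = 180  # hypothetical average starting insert length (bp)
    for clone_bp in (227, 228, 330):
        print(clone_bp, classify_clone(clone_bp, mean_start),
              gained_repeats(clone_bp, mean_start))
```

With a hypothetical mean starting insert of 180 bp, a 228-bp clone (a gain of 16 triplets) is scored as expanded, whereas a 227-bp clone is not; the largest expansion observed (150 bp) corresponds to 50 additional triplets.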



To analyze the expansion of the other nine triplet repeat sequences more rigorously, further experiments were conducted in the absence of the CTG triplet repeat (Fig. 1B). Fig. 1B shows that CGG (32.3%), GTG (29.0%), and GTC (35.5%) were also expanded, whereas the only other expanded repeat, AAG, was found in a single colony (3.2%). The reason GTG expansion was observed in this experiment but not in the experiment shown in Fig. 1A is uncertain. Again, a control experiment of recloning the principal visible band at the expected insert length (Fig. 1B, inset) showed that all nine triplet repeat sequences could be recloned, ruling out artifacts arising from ligation, transformation, or other steps of the procedure.
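Likewise, the panel B percentages can be recomputed from the counts in the Fig. 1 legend (10 + 9 + 11 + 1 = 31 expanded colonies). This is an illustrative Python check only, with a hypothetical `percentages` helper; note that the single AAG clone corresponds to 3.2% of the total.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of all expanded colonies (in %) contributed by each triplet repeat."""
    total = sum(counts.values())
    return {repeat: 100 * n / total for repeat, n in counts.items()}


# Expanded colonies from the Fig. 1B legend (nine repeats, CTG omitted).
panel_b = {"CGG": 10, "GTG": 9, "GTC": 11, "AAG": 1}

# Prints one line per repeat: CGG 32.3%, GTG 29.0%, GTC 35.5%, AAG 3.2%
for repeat, pct in percentages(panel_b).items():
    print(f"{repeat}: {pct:.1f}%")
```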

Additionally, experiments were conducted with triplet repeats containing 68-81 repeats (Table 2), with results similar to those for the shorter triplet repeats (Fig. 1A). Although the expansion frequency increases with repeat number (19), expansion of GAT, GTA, and TTA repeats was still not observed.



To further evaluate the striking finding that CTG is the dominant expansion product, we directly compared this triplet repeat with its sequence isomer, GTC. Table 3 shows that both the 57-mer and the 71-mer of CTG were expanded much more readily than GTC tracts of similar length. The two triplet repeats have the same base composition and differ only in the order of the bases (they are sequence isomers). Hence, the type or stability of the structure formed by CTG repeats, or the manner in which this sequence is used by the DNA synthetic enzymes in vivo, must differ greatly from that of GTC repeats.
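The sense in which CTG and GTC are sequence isomers can be stated precisely: the two units contain the same bases, but neither is a cyclic rotation (a reading-frame shift) of the other, so the tracts are genuinely different sequences. A minimal Python sketch of this check follows; the function names are our own.

```python
from collections import Counter


def rotations(unit: str) -> set[str]:
    """Cyclic rotations of a repeat unit, i.e. the same tract read in another frame."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def are_sequence_isomers(a: str, b: str) -> bool:
    """True if two triplet units share base composition but are different repeats.

    For triplets, a unit read on the complementary strand can never match the
    composition of the original (that would need equal A/T and equal C/G
    counts in three bases), so only rotations need to be excluded.
    """
    same_composition = Counter(a) == Counter(b)
    same_repeat = b in rotations(a)  # e.g. CTG vs TGC: one tract, two frames
    return same_composition and not same_repeat


if __name__ == "__main__":
    print(are_sequence_isomers("CTG", "GTC"))  # True: same bases, different order
    print(are_sequence_isomers("CTG", "TGC"))  # False: same repeat, shifted frame
    print(are_sequence_isomers("CTG", "CAG"))  # False: complementary strand, different bases
```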



GTC repeats have different stacking free energies from CTG repeats, which result in less stable single-stranded hairpin structures (22, 23). We proposed that expansion occurs in vivo during DNA replication through the formation of a region of single-stranded CTG repeats (19). This hypothesis was based on the relative rates of expansion versus deletion as a function of the direction of DNA replication (19), as well as on oligonucleotide stability studies by NMR spectroscopy (24, 25). CTG oligomers form antiparallel helical duplexes containing T·T base pairs, whereas CAG oligomers form only metastable duplexes (25). Also, computer stability analyses (22, 24) suggest that single-stranded CTG repeats, as well as CGG repeats, form antiparallel hairpin structures that are more stable than those of any of the other repeats. Our model (19) proposes that, during lagging-strand replication, formation of a stable hairpin in the newly synthesized DNA strand leads to expansion, whereas a stable hairpin in the lagging-strand template is likely to produce deletions.
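The directional logic of the slippage model can be summarized as bookkeeping on repeat number. The Python sketch below is a deliberately simplified toy, not the model of Ref. 19 itself: `SlippageEvent` and `product_length` are hypothetical names, and the sketch ignores hairpin stability, loop size, and the positions of Okazaki fragments, tracking only which strand carries the hairpin and how many triplets it sequesters.

```python
from dataclasses import dataclass


@dataclass
class SlippageEvent:
    hairpin_repeats: int  # number of triplets folded into the hairpin
    strand: str           # "nascent" (newly synthesized) or "template"


def product_length(start_repeats: int, event: SlippageEvent) -> int:
    """Repeat count of the daughter strand after one slippage event (toy model).

    Nascent-strand hairpin: the looped-out triplets are copied again -> expansion.
    Template-strand hairpin: the polymerase bypasses the hairpin -> deletion.
    """
    if not 0 <= event.hairpin_repeats <= start_repeats:
        raise ValueError("hairpin must fit within the repeat tract")
    if event.strand == "nascent":
        return start_repeats + event.hairpin_repeats
    if event.strand == "template":
        return start_repeats - event.hairpin_repeats
    raise ValueError("strand must be 'nascent' or 'template'")


if __name__ == "__main__":
    print(product_length(57, SlippageEvent(16, "nascent")))   # 73
    print(product_length(57, SlippageEvent(16, "template")))  # 41
```

For a hypothetical 57-repeat tract and a 16-triplet hairpin, the nascent-strand case yields 73 repeats and the template case 41 repeats, which is the asymmetry the model uses to link replication direction with expansion versus deletion.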

Interruptions (polymorphisms) within the repeats must also be considered, since they have been statistically correlated with the stability of triplet repeats; the loss of interruptions increases genetic instability (26, 27). These interruptions consist of AGG triplets within the CGG repeats of the FMR-1 gene and CAT triplets within the CAG repeats of the SCA1 gene. The majority of the triplet repeats studied herein (Table 1) did contain interruptions. We have evaluated the effect of interruptions on the frequency of expansion;(^5) comparison of an uninterrupted (CTG) with a sequence of identical length containing one CTA interruption (at the 28th repeat), derived from the myotonic dystrophy gene, showed that the uninterrupted sequence was expanded 5 times more frequently than the interrupted sequence. Thus, interruptions decrease the frequency of expansion in this system. For the present study, we used uninterrupted CTG repeats in the experiments shown in Fig. 1A and in Table 3. The GTC repeat used in Fig. 1A had one interruption in the center of the tract, whereas the GTC repeats used in Table 3 contained no interruptions. The two data sets gave the same result, indicating that the interruption did not influence the expansion frequencies and that the differences between the expansion frequencies of the CTG and GTC repeats (Table 3) are caused by their sequences and/or structures rather than by interruptions per se.
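Locating an interruption such as the CTA at the 28th repeat amounts to scanning the tract triplet by triplet. This Python sketch assumes the tract begins in frame with the repeat unit; `find_interruptions` is a hypothetical helper, and the 50-repeat tract in the example is an illustrative length, not the length of the construct used in this study.

```python
def find_interruptions(tract: str, unit: str) -> list[tuple[int, str]]:
    """Return (1-based repeat number, triplet) for each triplet that differs from unit.

    Assumes the tract starts in frame with unit.
    """
    k = len(unit)
    if len(tract) % k:
        raise ValueError("tract length must be a multiple of the unit length")
    triplets = (tract[i:i + k] for i in range(0, len(tract), k))
    return [(n, t) for n, t in enumerate(triplets, start=1) if t != unit]


if __name__ == "__main__":
    # Illustrative 50-repeat CTG tract with one CTA at the 28th repeat.
    tract = "CTG" * 27 + "CTA" + "CTG" * 22
    print(find_interruptions(tract, "CTG"))  # [(28, 'CTA')]
```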

At present, the genetic functions responsible for the expansions and deletions are uncertain; the establishment of a genetic system(^6) will be important for these studies.

To date, only two triplet repeat sequences, CTG (CAG) and CGG (CCG), have been associated with human genetic diseases caused by their massive expansion (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 17, 18). Herein, we have shown that GTG and GTC repeats can also be expanded, at least in E. coli. Although long triplet repeats other than CTG and CGG have been found in the human genome (28, 29), their expansion has not been correlated with human hereditary diseases. Since GTG repeats have base composition characteristics similar to those of CGG and CTG repeats (30), our results suggest that GTG is a likely candidate for involvement in disease. The close correspondence between our studies in E. coli (expansion-deletion data (19), DNA polymerase pausing (31), and mismatch repair (32)) and eucaryotic observations (see Refs. 19, 31, and 32) suggests that the dominant expansion of CTG described herein may provide significant molecular insights into human systems.

The surprising discovery that CTG triplet repeats are the dominant expansion products in E. coli, as they are in clinical samples from human hereditary diseases (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 17, 18), suggests the importance of DNA structural properties. Other investigations have revealed that duplex CTG and CGG repeats have unorthodox properties, including their behavior in nucleosome assembly (33), their capacity to cause DNA polymerases to pause within the repeat sequences (31), and conformational features revealed by helical repeat measurements and polyacrylamide gel migration (34).(^7) Further elucidation of the involvement of DNA conformational features in the etiology of human hereditary neuromuscular and neurodegenerative diseases will be intriguing.


FOOTNOTES

*
This work was supported by National Institutes of Health Grant GM52982, National Science Foundation Grant DMB-9103942, and a grant from the Robert A. Welch Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
Present address: Laboratory of Genetic Disease Research, National Center of Human Genome Research, National Institutes of Health, Bldg. 49, Bethesda, MD 20892.

¶
To whom correspondence should be addressed: Institute of Biosciences and Technology, Center for Genome Research, Texas A & M University, Texas Medical Center, 2121 W. Holcombe Blvd., Houston, TX 77030-3303. Tel.: 713-677-7651; Fax: 713-677-7689; E-mail: rwells@ibt.tamu.edu.

(^1)
K. Ohshima, M. Shimizu, R. Gellibolian, S. Kang, and R. D. Wells, unpublished data.

(^2)
M. Shimizu, R. Gellibolian, and R. D. Wells, manuscript in preparation.

(^3)
K. Ohshima, S. Kang, J. E. Larson, and R. D. Wells, manuscript in preparation.

(^4)
The abbreviation used is: bp, base pair(s).

(^5)
S. Kang, K. Ohshima, A. Jaworski, and R. D. Wells, manuscript in preparation.

(^6)
P. Parniewski, R. P. Bowater, R. R. Sinden, and R. D. Wells, unpublished work.

(^7)
R. Gellibolian, M. Shimizu, S. Amirhaeri, S. Kang, K. Ohshima, J. E. Larson, Y.-H. Fu, C. T. Caskey, B. A. Oostra, and R. D. Wells, manuscript in preparation.


ACKNOWLEDGEMENTS

We thank Drs. Adam Jaworski and Richard P. Bowater for helpful discussions and Ela Klysik for technical assistance.


REFERENCES

  1. Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C., Narang, M., Barceló, J., O'Hoy, K., Leblond, S., Earle-MacDonald, J., de Jong, P. J., Wieringa, B., and Korneluk, R. G. (1992) Science 255, 1253-1255
  2. Fu, Y.-H., Pizzuti, A., Fenwick, R. G., Jr., King, J., Rajnarayan, S., Dunne, P. W., Dubel, J., Nasser, G. A., Ashizawa, T., de Jong, P., Wieringa, B., Korneluk, R., Perryman, M. B., Epstein, H. F., and Caskey, C. T. (1992) Science 255, 1256-1258
  3. La Spada, A. R., Wilson, E. M., Lubahn, D. B., Harding, A. E., and Fischbeck, K. H. (1991) Nature 352, 77-79
  4. Orr, H. T., Chung, M.-Y., Banfi, S., Kwiatkowski, T. J., Jr., Servadio, A., Beaudet, A. L., McCall, A. E., Duvick, L. A., Ranum, L. P. W., and Zoghbi, H. Y. (1993) Nat. Genet. 4, 221-226
  5. The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971-983
  6. Koide, R., Ikeuchi, T., Onodera, O., Tanaka, H., Igarashi, S., Endo, K., Takahashi, H., Kondo, R., Ishikawa, A., Hayashi, T., Saito, M., Tomoda, A., Miike, T., Naito, H., Ikuta, F., and Tsuji, S. (1994) Nat. Genet. 6, 9-13
  7. Nagafuchi, S., Yanagisawa, H., Sato, K., Shirayama, T., Ohsaki, E., Bundo, M., Takeda, T., Tadokoro, K., Kondo, I., Murayama, N., Tanaka, Y., Kikushima, H., Umino, K., Kurosawa, H., Furukawa, T., Nihei, K., Inoue, T., Sano, A., Komure, O., Takahashi, M., Yoshizawa, T., Kanazawa, I., and Yamada, M. (1994) Nat. Genet. 6, 14-18
  8. Burke, J. R., Wingfield, M. S., Lewis, K. E., Roses, A. D., Lee, J. E., Hulette, C., Pericak-Vance, M. A., and Vance, J. M. (1994) Nat. Genet. 7, 521-524
  9. Kawaguchi, Y., Okamoto, T., Taniwaki, M., Aizawa, M., Inoue, M., Katayama, S., Kawakami, H., Nakamura, S., Nishimura, M., Akiguchi, I., Kimura, J., Narumiya, S., and Kakizuka, A. (1994) Nat. Genet. 8, 221-228
  10. Nelson, D. L. (1993) Genome Analysis: Genome Rearrangement and Stability (Davies, K. E., and Warren, S. T., eds) Vol. 7, pp. 1-24, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  11. Wieringa, B. (1994) Hum. Mol. Genet. 3, 1-7
  12. Weber, J. L. (1990) Genomics 7, 524-530
  13. Aaltonen, L. A., Peltomäki, P., Leach, F. S., Sistonen, P., Pylkkänen, L., Mecklin, J.-P., Järvinen, H., Powell, S. M., Jen, J., Hamilton, S. R., Petersen, G. M., Kinzler, K. W., Vogelstein, B., and de la Chapelle, A. (1993) Science 260, 812-816
  14. Thibodeau, S. N., Bren, G., and Schaid, D. (1993) Science 260, 816-819
  15. Ionov, Y., Peinado, M. A., Malkhosyan, S., Shibata, D., and Perucho, M. (1993) Nature 363, 558-561
  16. Wooster, R., Cleton-Jansen, A.-M., Collins, N., Mangion, J., Cornelis, R. S., Cooper, C. S., Gusterson, B. A., Ponder, B. A. J., von Deimling, A., Wiestler, O. D., Cornelisse, C. J., Devilee, P., and Stratton, M. R. (1994) Nat. Genet. 6, 152-156
  17. Verkerk, A. J. M. H., Pieretti, M., Sutcliffe, J. S., Fu, Y.-H., Kuhl, D. P. A., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., Zhang, F., Eussen, B. E., van Ommen, G.-J. B., Blonden, L. A. J., Riggins, G. J., Chastain, J. L., Kunst, C. B., Galjaard, H., Caskey, C. T., Nelson, D. L., Oostra, B. A., and Warren, S. T. (1991) Cell 65, 905-914
  18. Knight, S. J. L., Flannery, A. V., Hirst, M. C., Campbell, L., Christodoulou, Z., Phelps, S. R., Pointon, J., Middleton-Price, H. R., Barnicoat, A., Pembrey, M. E., Holland, J., Oostra, B. A., Bobrow, M., and Davies, K. E. (1993) Cell 74, 127-134
  19. Kang, S., Jaworski, A., Ohshima, K., and Wells, R. D. (1995) Nat. Genet. 10, 213-218
  20. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  21. Dower, W. J., Miller, J. F., and Ragsdale, C. W. (1988) Nucleic Acids Res. 16, 6127-6145
  22. Mitas, M., Yu, A., Dill, J., Kamp, T. J., Chambers, E. J., and Haworth, I. S. (1995) Nucleic Acids Res. 23, 1050-1059
  23. Yu, A., Dill, J., Wirth, S. S., Huang, G., Lee, V. H., Haworth, I. S., and Mitas, M. (1995) Nucleic Acids Res. 23, 2706-2714
  24. Gacy, A. M., Goellner, G., Juranic, N., Macura, S., and McMurray, C. T. (1995) Cell 81, 533-540
  25. Smith, G. K., Jie, J., Fox, G. E., and Gao, X. (1995) Nucleic Acids Res. 23, 4303-4311
  26. Chung, M.-Y., Ranum, L. P. W., Duvick, L. A., Servadio, A., Zoghbi, H. Y., and Orr, H. T. (1993) Nat. Genet. 5, 254-258
  27. Eichler, E. E., Holden, J. J. A., Popovich, B. W., Reiss, A. L., Snow, K., Thibodeau, S. N., Richards, C. S., Ward, P. A., and Nelson, D. L. (1994) Nat. Genet. 8, 88-94
  28. Beckmann, J. S., and Weber, J. L. (1992) Genomics 12, 627-631
  29. Lindblad, K., Zander, C., Schalling, M., and Hudson, T. (1994) Nat. Genet. 7, 124
  30. Han, J., Hsu, C., Zhu, Z., Longshore, J. W., and Finley, W. H. (1994) Nucleic Acids Res. 22, 1735-1740
  31. Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S., and Wells, R. D. (1995) J. Biol. Chem. 270, 27014-27021
  32. Jaworski, A., Roche, W. A., Gellibolian, R., Kang, S., Shimizu, M., Bowater, R. P., Sinden, R. R., and Wells, R. D. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 11019-11023
  33. Wang, Y.-H., Amirhaeri, S., Kang, S., Wells, R. D., and Griffith, J. D. (1994) Science 265, 669-671
  34. Chastain, P. D., II, Eichler, E. E., Kang, S., Nelson, D. L., Levene, S. D., and Sinden, R. R. (1995) Biochemistry 34, 16125-16131
