©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
Purification of Nuclear Proteins from Human HeLa Cells That Bind Specifically to the Unstable Tandem Repeat (CGG) in the Human FMR1 Gene (*)

(Received for publication, September 28, 1995)

Heidrun Deissler, Annett Behn-Krappa (§), and Walter Doerfler (¶)

From the Institute of Genetics, University of Cologne, D-50931 Köln, Germany

ABSTRACT

Autonomous expansions of trinucleotide repeats with the general structure 5`-d(CNG)-3` are associated with several human genetic diseases. We have characterized nuclear proteins binding to the unstable 5`-d(CGG)-3` repeat. Its expansion in the human FMR1 gene leads to the fragile X syndrome, one of the most frequent causes of mental retardation in human males. Electrophoretic mobility shift assays using nuclear extracts from several human and other mammalian cell lines and from primary human cells demonstrated specific binding to double-stranded DNA fragments containing only a 5`-d(CGG)-3` repeat or the repeat and flanking genomic sequences of the human FMR1 gene. Protein binding was inhibited by complete methylation of the trinucleotide repeat. The complex formed with crude nuclear extract apparently did not contain the human transcription factor Sp1 that binds to a characteristic GC-rich sequence. A 20-kDa protein involved in specific binding to the double-stranded 5`-d(CGG)-3` repeat was purified from HeLa nuclear extracts by DNA affinity chromatography.


INTRODUCTION

The autonomous, mechanistically still unexplained expansion of naturally occurring trinucleotide tandem repeats in the human genome has been recognized to be related to a number of serious human diseases: the fragile X syndrome (FRAXA locus), myotonic dystrophy, spinal and bulbar muscular atrophy, Huntington disease, mental retardation associated with the fragile site FRAXE on the human X chromosome, spinocerebellar ataxia type I, and dentatorubral-pallidoluysian atrophy (for reviews, see (1, 2, 3, 4, 5, 6, 7)). Fragile sites, also known as folate-sensitive sites, are chromosomal aberrations that condense poorly during metaphase and can break under specific experimental conditions (8). Several such sites have been identified on the X chromosome (FRAXA, FRAXE, FRAXF; (9)) and on the autosomes 11 (FRA11B; (10)) and 16 (FRA16A; (11, 12)). All fragile sites identified so far have been found to be associated with amplifications of the simple unstable tandem repeat 5`-d(CGG)-3`.

In the fragile X syndrome, the expanded tandem repeat 5`-d(CGG)-3` is located in the 5`-untranslated region (UTR) (^1) of the FMR1 gene in the human chromosomal location Xq27.3 (13). The number of repeat units varies between 6 and 54 in normal individuals, whereas more than 200 and up to 2000 repeat units can be found in affected individuals. Expansion of the repeat is accompanied by extensive methylation of the 5`-dCG-3` dinucleotides in the repeat (14, 15, 16) and is associated with transcriptional silencing of the FMR1 gene (17, 18, 19). The function of the FMR1 protein is not yet known. The de novo methylation of the expanded trinucleotide repeat can be interpreted as a cellular defense against the invasion of foreign DNA or against unusual DNA structures (20, 21).
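The repeat-number ranges quoted above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and counts that fall outside the two ranges given in the text are deliberately left unclassified rather than assigned to a category the text does not define.

```python
def classify_cgg_repeat(n_units: int) -> str:
    """Classify a CGG repeat count using only the ranges quoted in the text."""
    if n_units < 0:
        raise ValueError("repeat count cannot be negative")
    if 6 <= n_units <= 54:
        return "normal range"               # 6 to 54 units in normal individuals
    if n_units > 200:
        return "expanded (affected range)"  # more than 200 units in affected individuals
    return "not classified in the text"     # e.g. 55 to 200 units


for n in (30, 54, 120, 250):
    print(n, classify_cgg_repeat(n))
```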

The cellular mechanism of triplet repeat amplification is not understood. Interestingly, procaryotic DNA polymerases are capable of expanding short synthetic oligodeoxyribonucleotides containing simple tandem repeat sequences to DNA stretches of several thousand nucleotides in length even in the absence of template DNA (21, 22). This finding suggests a slippage mechanism (23, 24) for the expansion of trinucleotide repeats, presumably involving specific DNA-binding proteins. In transgenic mice, for instance, a 5`-d(CAG)-3` repeat in the androgen receptor gene is stable upon transmission, whereas it expands upon transmission in humans (25). The authors suggest the involvement of sequence-specific, probably species-specific, DNA-binding proteins in the amplification reaction. Experiments with crude nuclear extracts from human HeLa cells indeed have shown binding of proteins to tandem repeat sequences (26). In addition, an amplified 5`-d(CTG)-3` repeat is a preferential target for nucleosome assembly (27, 28).

We have initiated experiments to characterize and purify human nuclear proteins that bind specifically to the double-stranded 5`-d(CGG)-3` repeat. Such proteins are present in a variety of human and other mammalian cell lines, as well as in primary cells.


EXPERIMENTAL PROCEDURES

Cells and Cell Lines

Human HeLa cells were purchased from Gesellschaft für Biotechnologische Forschung, Braunschweig, Germany. Human KB and Jurkat cells, BHK21 hamster cells, and fat head minnow (FHM) fish cells (29) were propagated by standard methods. Primary human lymphocytes were prepared and grown as reported (30). Hamster cell line T637, an adenovirus type 12 (Ad12)-transformed BHK21 cell line, and its revertants TR3 and TR12, with no detectable and about one genome equivalent of integrated Ad12 DNA, respectively (31), were all grown in Dulbecco's medium supplemented with 10% fetal calf serum. Cell lines 293 and HEK12, human embryonic kidney cells transformed with parts of adenovirus type 5 (Ad5) (32) and Ad12 (33), respectively, cell lines A549 (human lung cancer) and C4/I (human cervix carcinoma), a permanent cell line isolated from a human amnion tumor, monkey Vero cells, and the Ad12-transformed rat embryo fibroblast line REF12 were gifts of the Institute of Cell Biology or of Molecular Biology (University of Essen, Medical School, Essen, Germany).

Oligodeoxyribonucleotides and DNA Fragments

Oligodeoxyribonucleotides were synthesized in an Applied Biosystems 381A DNA synthesizer. Hybridization to form double-stranded oligodeoxyribonucleotides was carried out in a polymerase chain reaction thermal cycler (Perkin Elmer Cetus) under the following conditions: 10 min at 95 °C, cooling to 70 °C for 60 min, 60 min at 70 °C, cooling to 58 °C for 60 min, 60 min at 58 °C, cooling to 17 °C for 90 min, 60 min at 17 °C. Oligodeoxyribonucleotides were subsequently purified by electrophoresis on polyacrylamide gels according to standard procedures. The compositions of the synthetic oligodeoxyribonucleotides used in this study and the abbreviations to designate them were summarized in Table 1.
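The hybridization program can be tabulated to check its overall length and cooling rates. The sketch below assumes that each "cooling to X °C for Y min" step is a linear ramp lasting Y min; the text does not state the ramp shape, and the names used here are illustrative only.

```python
# Each step: (description, duration in minutes), transcribed from the text.
ANNEALING_PROGRAM = [
    ("hold at 95 C", 10),
    ("ramp 95 -> 70 C", 60),
    ("hold at 70 C", 60),
    ("ramp 70 -> 58 C", 60),
    ("hold at 58 C", 60),
    ("ramp 58 -> 17 C", 90),
    ("hold at 17 C", 60),
]


def total_minutes(program):
    """Sum the durations of all steps in the program."""
    return sum(minutes for _, minutes in program)


def ramp_rate(start_c, end_c, minutes):
    """Average cooling rate in degrees C per minute (assumes a linear ramp)."""
    return (start_c - end_c) / minutes


print("total run time (min):", total_minutes(ANNEALING_PROGRAM))
print("95 -> 70 C rate (C/min):", round(ramp_rate(95, 70, 60), 3))
print("58 -> 17 C rate (C/min):", round(ramp_rate(58, 17, 90), 3))
```

Under this reading the program takes 400 min in total, so slow annealing dominates the preparation time of the double-stranded probes.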



DNA fragments were isolated from the plasmid pE5.1, which was a gift from Stephen T. Warren, Emory University School of Medicine, Atlanta, GA. This plasmid contained a 5`-d(CGG)-3` repeat in exon 1 of the human FMR1 gene and flanking genomic DNA sequences (13). The plasmid was cut with NarI, and the excised 441-bp fragment was isolated. This fragment was subsequently treated with RsaI or BfaI to yield a 198-bp (198ds) or a 126-bp (126ds) fragment, respectively. To obtain the 248-bp (248ds) fragment, the plasmid was first cleaved with RsaI, and the resulting fragment was isolated and cut with DdeI. A restriction map illustrating the derivation of these fragments was presented in Fig. 1.


Figure 1: Survey of DNA fragments used in EMSA. DNA fragments were isolated from exon I of the human FMR1 gene cloned in the plasmid pE5.1. These fragments contained the double-stranded trinucleotide repeat 5`-d(CGG)-3` (gray boxes) flanked by genomic sequences of the 5`-untranslated region. The nucleotide numbers corresponded to the sequence published in GenBank (accession number X61378). The start codon AUG is located approximately 70 bp downstream of the trinucleotide repeat.



Oligodeoxyribonucleotides were 5`-end labeled with T4-polynucleotide kinase (New England Biolabs, Beverly, MA) and [γ-32P]ATP. DNA fragments were labeled at the 3`-end with the Klenow fragment of DNA polymerase I (Boehringer Mannheim) and α-[32P]dATP or α-[32P]dCTP according to standard procedures. The specific activity of the DNA probes was 10^7 cpm/pmol.
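The quoted specific activity links counts to molar amounts of probe. As a rough worked example (the function names are illustrative, not from the paper), the binding assays described in this paper use 30,000 cpm per reaction, stated as equivalent to 2 fmol, which implies a specific activity of about 1.5 × 10^7 cpm/pmol, the same order of magnitude as the 10^7 cpm/pmol given here.

```python
def cpm_to_fmol(cpm: float, specific_activity_cpm_per_pmol: float) -> float:
    """Convert measured counts to femtomoles of labeled DNA."""
    pmol = cpm / specific_activity_cpm_per_pmol
    return pmol * 1000.0  # 1 pmol = 1000 fmol


def implied_specific_activity(cpm: float, fmol: float) -> float:
    """Specific activity (cpm/pmol) implied by a cpm value and a molar amount."""
    return cpm / (fmol / 1000.0)


# 30,000 cpm at the nominal 10^7 cpm/pmol
print(cpm_to_fmol(30_000, 1e7), "fmol")
# Specific activity implied by "30,000 cpm, equivalent to 2 fmol"
print(implied_specific_activity(30_000, 2.0), "cpm/pmol")
```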

Preparation of Nuclear Extracts and Purification of Proteins Binding to the Double-stranded 5`-d(CGG)(n)-3` Repeat

All procedures were carried out at 4 °C, unless stated otherwise. Nuclei were isolated from cells according to Dignam et al. (34) and Barrett et al. (35) by lysing the cells either in hypotonic buffer A (20 mM HEPES, 10 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.1 mM EGTA, 0.5 mM dithiothreitol, 0.5 M sucrose, and protease inhibitors, pH 7.9) in 0.25-0.5% Triton X-100 or by disintegrating the cells in a tight fitting glass Dounce homogenizer followed by centrifugation at 600 × g for 15 min. Nuclei were washed 3 times with Triton-free buffer B (same as buffer A, except 0.35 M sucrose) and extracted on ice for 30 min in buffer C (buffer A, without sucrose, containing 420 mM NaCl and 20% glycerol). The supernatant of the subsequent centrifugation at 100,000 × g for 60 min was dialyzed for 3 h against buffer W (buffer A without sucrose, containing 80 mM KCl and 20% glycerol). The dialysate was centrifuged at 100,000 × g for 10 min, frozen in liquid nitrogen, and stored at -80 °C. Under these conditions, DNA binding activity of proteins was stable. Protein concentrations were measured by standard procedures (36). Nuclear extracts from KB and BHK21 cells infected with Ad12 were gifts from Sabine Huppertz, those from the insect cell line IPLBSF21 (SF21) were from Andreas Kremer, and those of FHM cells were from Mark Munnes, all at the Institute of Genetics in Cologne.
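Because buffers B, C, and W are each defined relative to buffer A, the recipes can be expressed as derived records. The sketch below is one possible reading and makes two assumptions that the text leaves open: the 420 mM NaCl of buffer C replaces the 10 mM NaCl of buffer A, and the 80 mM KCl of buffer W is added alongside the 10 mM NaCl. pH and protease inhibitors are omitted, and the helper name `derive` is hypothetical.

```python
# Buffer A as listed in the text (pH 7.9 and protease inhibitors omitted).
BUFFER_A = {
    "HEPES_mM": 20, "NaCl_mM": 10, "MgCl2_mM": 1, "spermine_mM": 0.15,
    "EDTA_mM": 0.1, "EGTA_mM": 0.1, "DTT_mM": 0.5, "sucrose_M": 0.5,
}


def derive(base, remove=(), **changes):
    """Return a new recipe: copy `base`, drop `remove`, then apply `changes`."""
    recipe = {k: v for k, v in base.items() if k not in remove}
    recipe.update(changes)
    return recipe


# "same as buffer A, except 0.35 M sucrose"
BUFFER_B = derive(BUFFER_A, sucrose_M=0.35)
# "buffer A, without sucrose, containing 420 mM NaCl and 20% glycerol"
# (assumption: 420 mM replaces the 10 mM NaCl of buffer A)
BUFFER_C = derive(BUFFER_A, remove=("sucrose_M",), NaCl_mM=420, glycerol_pct=20)
# "buffer A without sucrose, containing 80 mM KCl and 20% glycerol"
# (assumption: KCl is added; the 10 mM NaCl is retained)
BUFFER_W = derive(BUFFER_A, remove=("sucrose_M",), KCl_mM=80, glycerol_pct=20)

for name, recipe in [("A", BUFFER_A), ("B", BUFFER_B), ("C", BUFFER_C), ("W", BUFFER_W)]:
    print(name, recipe)
```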

For the purification of HeLa cell proteins (designated CGGBP(s) = 5`-d(CGG)(n)-3`ds binding proteins) that bind to the double-stranded 5`-d(CGG)(n)-3` repeat, crude nuclear extracts isolated from 2 × 10^9 cells (20 mg of protein) were equilibrated in buffer QA (10 mM Tris-HCl, 100 mM KCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween-20, and protease inhibitors, pH 7.9) using NAP-10 columns (Pharmacia Biotech Inc.) or Econo DP10-columns (Bio-Rad) and subsequently loaded on a 1-ml Resource Q column (Pharmacia) equilibrated in buffer QA. Proteins binding to the oligodeoxyribonucleotide (CGG)ds (see Table 1) eluted in the flow-through (fraction I, see Fig. 4). DNA affinity Sepharose was prepared by coupling 400 µg of the 3`-amino modified oligodeoxyribonucleotides (CGG)ds, CGG8Ads, or (CAG)ds covalently to 1 ml of N-hydroxysuccinimide-activated Sepharose beads (HiTrap; Pharmacia) according to the manufacturer's protocol. The material was equilibrated in buffer QA immediately before use. Proteins were bound and eluted in a batch procedure; washing and elution were performed in spin columns (Biometra, Göttingen, Germany). Active fraction I was incubated with CGG8Ads-Sepharose (250 µl) in the presence of 200 µg of poly(dA·dT) for 1 h. Unbound proteins containing CGGBP(s) (fraction II) were then incubated with 100 µl of (CGG)ds-Sepharose either at 4 °C for 4 h or at room temperature for 1 h. The material was centrifuged at 600 × g for 10 min, washed twice with 1 ml of buffer W 100 (20 mM HEPES, 100 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween-20, and protease inhibitors, pH 7.9), and subsequently washed twice with 1 ml of buffer W 150 (same as W 100 but with 150 mM NaCl and 100 pmol of an unrelated oligodeoxyribonucleotide). CGGBP(s) were eluted as fraction III from the resin in 100 µl of buffer E 750 and partly in 100 µl of buffer E 1000 (same as W 100 but with 750 mM and 1 M NaCl, respectively). After equilibration of fraction III in buffer W 100 supplemented with 0.4% Tween-20, proteins were again bound to 20 µl of (CGG)ds-Sepharose. Binding, washing, and elution were carried out as described above, but smaller volumes of the buffers W 100 (1 ml) and W 150 (100 µl) were used. CGGBP(s) eluted in 20 µl of buffer E 750 to yield fraction IV. Only low activity remained after elution with buffer E 1000 (fraction IV). Active fractions I to IV were analyzed by SDS-polyacrylamide gel electrophoresis (37) followed by silver staining.


Figure 4: Purification scheme for the isolation of CGGBP(s) from HeLa nuclear extracts. Details of the purification procedure were described in the text and under ``Experimental Procedures.''



Electrophoretic Mobility Shift Assay, Sodium Deoxycholate Treatment of DNA Protein Complexes, and Antibody Displacement/Supershift Assay

End-labeled oligodeoxyribonucleotides or DNA fragments (30,000 cpm, equivalent to 2 fmol) and unspecific DNA (poly(dA·dT) or poly(dI·dC), 1 µg) were incubated for 30 min at room temperature in 20 mM HEPES, 50-100 mM NaCl, 0.5 mM dithiothreitol, 10% glycerol, pH 7.9, with 0.5-2 µg of protein from crude nuclear extracts or 1 µl of fractions I to IV. DNA-protein complexes were separated by electrophoresis on polyacrylamide gels (T% = 5% for the separation of oligodeoxyribonucleotides, T% = 4% for DNA fragments, C% = 5%) in 1 × TEB (89 mM Tris, 89 mM H3BO3, 2 mM EDTA, pH 8.4) without the addition of loading dye. T% was acrylamide + bis-acrylamide per volume; C% was bis-acrylamide per acrylamide + bis-acrylamide. Gels were dried and exposed for 2-48 h on Kodak XAR films.
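The T% and C% definitions translate directly into monomer amounts. The following sketch assumes T% is a weight-per-volume percentage (g per 100 ml) and uses a hypothetical 50-ml gel volume, which is not specified in the text.

```python
def gel_monomer_masses(t_percent: float, c_percent: float, volume_ml: float):
    """Return (acrylamide_g, bis_g) for a gel defined by T% and C%.

    T% = (acrylamide + bis-acrylamide) per volume, taken here as g per 100 ml.
    C% = bis-acrylamide as a percentage of (acrylamide + bis-acrylamide).
    """
    total_g = t_percent / 100.0 * volume_ml
    bis_g = c_percent / 100.0 * total_g
    acrylamide_g = total_g - bis_g
    return acrylamide_g, bis_g


# Gel for oligodeoxyribonucleotides: T% = 5%, C% = 5% (50 ml is an assumed volume)
print(gel_monomer_masses(5, 5, 50))
# Gel for DNA fragments: T% = 4%, C% = 5%
print(gel_monomer_masses(4, 5, 50))
```

For the assumed 50-ml volume, the 5% gel works out to roughly 2.375 g of acrylamide and 0.125 g of bis-acrylamide.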

Influence of sodium deoxycholate on complex formation was tested as described previously (38). Crude nuclear extracts or fraction I were incubated with labeled DNA fragments as described above for 10 min. Different amounts of sodium deoxycholate were then added in the absence or presence of 0.6% Nonidet P-40; the mixture was incubated for another 30 min and subsequently analyzed by gel electrophoresis.

The monoclonal antibody against the human transcription factor Sp1 was purchased from Santa Cruz Biotechnology Inc. (Santa Cruz, CA). Crude nuclear extracts or fraction I or III were incubated with the specific DNA fragment as described above in the presence of the anti-Sp1 antibody (0.3-1 µg) for 60 min at room temperature. Complexes were separated by electrophoresis on polyacrylamide gels.


RESULTS

Nuclear Proteins from Several Mammalian Cells Bind Specifically to the Double-stranded 5`-d(CGG)-3` Repeat

We have examined nuclear extracts from various human and other mammalian cells by electrophoretic mobility shift assays (EMSA) for the presence of proteins that bind to the double-stranded 5`-d(CGG)(n)-3` repeat located in exon 1 of the human FMR1 gene (13). For this purpose, DNA fragments containing a 5`-d(CGG)-3` repeat and the flanking genomic sequences from the 5`-UTR of the human FMR1 gene (Fig. 1) or double-stranded, repetitive oligodeoxyribonucleotides (Table 1) were used. In order to ensure formation of the double strand, the repetitive single-stranded oligodeoxyribonucleotides were hybridized under controlled conditions at high annealing temperatures. Double-stranded oligodeoxyribonucleotides migrated according to their sizes in native polyacrylamide or NuSieve agarose gels, whereas the single-stranded 5`-d(CNG)(n)-3` oligodeoxyribonucleotides showed increased mobility (data not shown; (39)).

Binding of nuclear proteins to the synthetic oligodeoxyribonucleotide (CGG)ds and the FMR1 promoter derived DNA fragment 198ds was demonstrated by EMSA (Fig. 2 and Fig. 3, a and c). Specificity of binding was ascertained by competition experiments using the unlabeled homologous oligodeoxyribonucleotide (CGG)ds and additional synthetic products containing different tandem repeat sequences as competitors (Fig. 3, a and c). Nuclear proteins isolated from the established human cell lines HeLa, C4/I, KB, Jurkat, A549, 293, HEK12, an amnion tumor-derived cell line, as well as from primary human lymphocytes gave rise to the specific DNA-protein complex I (Fig. 2a, cI) after incubation with the oligodeoxyribonucleotide (CGG)ds. Formation of complex I could be competed by the oligodeoxyribonucleotide (CGG)ds in at least 75-fold excess, but not by several oligodeoxyribonucleotides with different sequences (Fig. 3a). Additional DNA-protein complexes apparent in EMSAs shown in Fig. 2a were not specific as shown by competition experiments (Fig. 3a). Extracts from non-human cells like hamster BHK21 cells and rat embryo fibroblasts REF12 produced the same patterns as those from human cells (Fig. 2b). However, proteins from monkey Vero cells, from nonmammalian FHM fish cells, and from the insect cell line SF21 generated specific DNA-protein complexes (Fig. 2b), which were different from those with proteins from human cell lines.


Figure 2: Binding of nuclear proteins isolated from various cell lines to the double-stranded trinucleotide repeat 5`-d(CGG)-3`. Crude nuclear extract (0.5-2 µg) was incubated with the oligodeoxyribonucleotide (CGG)ds in the presence of unspecific DNA. a, nuclear proteins isolated from a variety of human cell lines and human primary lymphocytes gave rise to the formation of one major complex I (cI). The same complex was observed with proteins isolated from various mammalian cell lines (b). Different complexes were detected with extracts from the fish cell line FHM and the insect cell line SF21, whereas no complex was detected with extracts from BHK21 cells grown in suspension. Experimental details were outlined in the text under ``Experimental Procedures.'' cI indicates the position of the specific complex I.




Figure 3: Specific binding of nuclear proteins to the double-stranded trinucleotide repeat 5`-d(CGG)-3`. Binding of nuclear proteins from HeLa cells to the oligodeoxyribonucleotide (CGG)ds (a), to the fully methylated oligodeoxyribonucleotide (MGG)ds (b), or to the promoter-derived DNA fragment 198ds (c) led to the formation of several specific DNA-protein complexes. a, complex I (cI) with the oligodeoxyribonucleotide (CGG)ds could be competed only with oligodeoxyribonucleotides of the general structures (CGG)(n)ds (n ≥ 12) and (CGGNGG)(8)CGGds (with N = T or 5-methyldeoxycytidine). Oligodeoxyribonucleotide (MGG)ds containing the fully methylated trinucleotide repeat 5`-d(^mCGG)-3` did not function as a competitor. b, CGGBP(s) did not bind to the fully methylated repeat 5`-d(^mCGG)-3` in the oligodeoxyribonucleotide (MGG)ds. Fraction I (see Fig. 4) was incubated with either (CGG)ds (lane 2) or (MGG)ds (lane 4); complex formation was only observed with (CGG)ds. Specific complexes MI and MIII were formed only with crude nuclear extract and (MGG)ds (lanes 5-8). c, DNA fragment 198ds contained the trinucleotide repeat 5`-d(CGG)-3` flanked by genomic sequences of the 5`-untranslated region from the human FMR1 gene. In binding experiments, it gave rise to the specific complexes 1, 3, and 4 (c1, c3, and c4). Their formation was competed only by oligodeoxyribonucleotides of the general structure (CGGNGG)(8)CGGds (with N = T or C). Complex 3 was not always detectable. Double-stranded competitor oligodeoxyribonucleotides were used at a 300-fold excess over the double-stranded binding fragment (2 fmol). Sequences of oligodeoxyribonucleotides and a summary of competition experiments were described in Table 1 and Table 2, respectively.





Infection of the permissive human cell lines HeLa and KB with Ad12 did not abolish CGG-binding activity (Fig. 2a). However, the abortive infection of hamster BHK21 cells with Ad12 gave rise to two additional bands showing slightly higher mobility in EMSA (Fig. 2b). In contrast, extracts from the Ad12-transformed BHK21 cell line derivative T637 or from its revertants TR3 or TR12 showed the same patterns as proteins from extracts of the parental BHK21 cells. Interestingly, CGGBP(s) were not detectable in extracts isolated from BHK21 cells grown in suspension cultures (Fig. 2b).

The biological significance of these data had to be ascertained by repeating the binding experiments with authentic DNA fragments from the 5`-UTR of the FMR1 gene. Fragment 198ds gave rise to the DNA-protein complexes 1-4 (Fig. 3c, c1-c4) when nuclear extracts from human HeLa cells were used. Similar or identical patterns were found when extracts from other human or non-human cell lines were investigated. Complex 3 was not always detectable. Complex 1 appeared to be specific for CGG binding, as its formation could be blocked by competition with the oligodeoxyribonucleotide (CGG)ds, but not with other oligodeoxyribonucleotides. The strong complex 4 seemed also to be formed by CGGBP(s), because its formation was partly competed by the oligodeoxyribonucleotide (CGG)ds (Fig. 3c) and also by 198ds. During the purification of CGGBP(s), complex 4 was the only detectable complex involving the 198ds fragment. Its formation could then be specifically competed by the oligodeoxyribonucleotide (CGG)ds and FMR1 promoter fragments 126ds, 198ds, and 248ds, but not by other oligodeoxyribonucleotides. Thus, complex 1 might contain additional factors that were probably associated with factors binding to flanking 3`-sequences. These additional factors could have been lost during purification and were no longer present in the CGGBP(s) in complex 4. Interestingly, the binding of proteins from nuclear extracts to the 126ds fragment with the same 5`-sequence as 198ds but a shorter 3`-end (Fig. 1) gave rise to only one complex and a pattern similar to that formed with the oligodeoxyribonucleotide (CGG)ds (data not shown). In contrast, binding of nuclear proteins to the 248ds fragment, which had the same 3`-sequence as 198ds but a longer 5`-sequence, produced the same pattern as the 198ds fragment.

It is concluded that several human and other mammalian cells express a (CGG)ds binding activity that gives rise to the same, strong complex I with the oligodeoxyribonucleotide (CGG)ds and to at least one specific complex with the authentic DNA fragments 198ds, 126ds, and 248ds from the 5`-UTR of the human FMR1 gene.

Specificity of Complex Formation as Assessed by Competition Experiments

The results of a series of competition experiments, which were performed to assess the specificity of complex I formation, were summarized in Table 2. The formation of complex I was impaired only by competition with the double-stranded oligodeoxyribonucleotides (CGG)(n)ds (8 < n ≤ 17) and with the authentic DNA fragments 126ds, 198ds, and 248ds from the 5`-UTR of the FMR1 gene (Fig. 3a). Single-stranded oligodeoxyribonucleotides (CCG)ss or (CGG)ss did not compete for binding.

Moreover, complex I was observed only with oligodeoxyribonucleotides (CGG)ds and (CGG)ds as binding probes, whereas (CGG)(8)ds gave rise to a very faint complex (data not shown). The oligodeoxyribonucleotide FraxF isolated from the human FRAXF locus (9) did not serve as a specific binding probe for CGGBP(s) and did not compete for binding to (CGG)ds. The FraxF oligodeoxyribonucleotide contained eight 5`-d(CGG)-3` repeats and alternating 5`-d(CAGCGG)-3`ds repeats (Table 1). Hence, effective binding of CGGBP(s) to the recognition sequence required more than 8 repeat units.
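To make the probe nomenclature concrete, the sketch below builds a (CGG)n strand, derives its complementary strand, and counts the CpG dinucleotides that serve as potential methylation sites. It is illustrative only: the actual probe sequences are listed in Table 1, which is not reproduced here, and the repeat numbers used (8, 12, 17) are simply values mentioned in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def repeat_oligo(unit: str, n: int) -> str:
    """Return the 5'->3' sequence of `unit` repeated n times."""
    return unit * n


def reverse_complement(seq: str) -> str:
    """Return the 5'->3' sequence of the complementary strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_cpg(seq: str) -> int:
    """Count 5'-CG-3' dinucleotides in a 5'->3' sequence."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


for n in (8, 12, 17):
    top = repeat_oligo("CGG", n)
    bottom = reverse_complement(top)
    # The complementary strand of (CGG)n reads as (CCG)n in the 5'->3' direction.
    print(n, len(top), bottom == repeat_oligo("CCG", n), count_cpg(top), count_cpg(bottom))
```

Each repeat unit contributes one CpG per strand, so the number of potential methylation sites grows linearly with the repeat number.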

Formation of complex I was only partly competed by the synthetic oligodeoxyribonucleotide CGG8Tds (Fig. 3a), whereas no competition was observed with the oligodeoxyribonucleotide (TGG)ds (nucleotide sequences, see Table 1). Likewise, complex I formation was not competed by the addition of oligodeoxyribonucleotides with other triplet repeat sequences (Fig. 3a). Moreover, binding of nuclear proteins to the 5`-d(CAG)-3`ds repeat was unspecific (data not shown). When the authentic DNA fragments 198ds or 126ds were used as binding probes, the 5`-d(CGG)-3`-specific complexes 1, 3, and 4 were competed by the oligodeoxyribonucleotide CGG8Tds (Fig. 3c) but not by other oligodeoxyribonucleotides.

Complex I and complexes 1-4 were destroyed after the addition of the anionic detergent sodium deoxycholate (≥0.03%), whereas the nonionic detergents Triton X-100 or Tween 20 (≤2%) did not have any effects on complex formation (data not shown). Complex disruption by sodium deoxycholate was reversed in the presence of 0.6% Nonidet P-40. Although it cannot be ruled out that sodium deoxycholate as an anionic detergent affects protein-DNA interaction, the sodium deoxycholate sensitivity of the binding of CGGBP(s) to the 5`-d(CGG)-3` repeat and the reversal by Nonidet P-40 suggest the involvement of protein-protein interactions in complex formation (38).

CGGBP(s) Do Not Bind to the Fully Methylated Trinucleotide Repeat

The results of experiments with crude nuclear extracts from HeLa cells suggested methylation sensitivity of proteins binding to the 5`-d(CGG)(n)-3` repeat (26). In order to investigate this problem further, oligodeoxyribonucleotides that contained partly or fully methylated trinucleotide repeats were used as binding probes or in competition experiments (Fig. 3, a and b). The completely methylated oligodeoxyribonucleotide (MGG)ds and the partly methylated oligodeoxyribonucleotides 8MCGGds and 4MCGGds (nucleotide sequences, see Table 1) were synthesized by incorporating 5-methyldeoxycytidine instead of C during chemical synthesis. Only weak competition for the formation of complex I (cI) was observed when the completely methylated double-stranded oligodeoxyribonucleotide (MGG)ds was added (Fig. 3a). Moreover, only proteins from crude nuclear extracts were capable of forming complexes with the methylated oligodeoxyribonucleotide (MGG)ds (Fig. 3b, lanes 5-8). These complexes MI to MIII were not formed with proteins from fractions enriched for CGGBP(s) (see below and Fig. 3b, lanes 3 and 4). The formation of complex MIII was weakly competed by the unmethylated oligodeoxyribonucleotide (CGG)ds, whereas complexes MI and MIII were not formed in the presence of (MGG)ds as competitor (Fig. 3b). In contrast, partly methylated oligodeoxyribonucleotides 8MCGGds and 4MCGGds formed the same complex I with crude nuclear extracts and purified CGGBP(s) as found with the unmethylated counterpart (CGG)ds (data not shown). These findings indicated methylation sensitivity of CGGBP(s). The binding of nuclear proteins to the fully methylated oligodeoxyribonucleotide (MGG)ds might be due to proteins that interacted specifically with highly methylated DNA sequences (40, 41).

It is concluded that proteins in nuclear extracts from primary human cells, from established human cell lines, and from several mammalian as well as from some nonmammalian cells form a specific complex with the synthetic double-stranded oligodeoxyribonucleotides (CGG)(n)ds, with 12 ≤ n ≤ 17. The oligodeoxyribonucleotide (CGG)(8)ds suffices for weak complex formation. The authentic DNA fragments 248ds, 198ds, or 126ds from the 5`-UTR of the human FMR1 gene can also form at least one 5`-d(CGG)-3`ds-specific complex and additional, probably less specific complexes. Some of the more complicated EMSA patterns (Fig. 3c) might be accounted for by additional complex formation with nucleotide sequences that flank the 5`-d(CGG)-3` repeat. Modifications of the specific 5`-d(CGG)-3`ds sequence are tolerated in competition experiments when no more than 8 C residues are exchanged and the replacements are the pyrimidines T or 5-methyldeoxycytidine. CGGBP(s) do not bind to the fully methylated trinucleotide repeat sequence. The ubiquitous expression of CGGBP(s) points to an important function of these proteins. This binding activity seems to be highly conserved, since similar proteins have been found in extracts from nonmammalian fish or insect cells.

Binding of Nuclear Proteins from Human Cells to the Single-stranded Oligodeoxyribonucleotides (CGG)ss and (CCG)ss Sequences Is Unspecific

Several reports suggested that single-stranded oligodeoxyribonucleotides 5`-d(CGG)(n)-3` and 5`-d(CCG)(n)-3` (n ≥ 4) might adopt unusual structures in vitro (39, 42). In fact, these oligodeoxyribonucleotides exhibited abnormally high electrophoretic mobility in polyacrylamide gels (data not shown). We therefore examined these oligodeoxyribonucleotides for their capacity to bind nuclear proteins from human cells. The oligodeoxyribonucleotide (CCG)ss led to the formation of several complexes that could, however, be prevented by competition with single-stranded oligodeoxyribonucleotides of the general sequence 5`-(CSGCSK)-3` (S could be G or C and K could be G or T), but not with double-stranded oligodeoxyribonucleotides. The oligodeoxyribonucleotide (CGG)ss did not give rise to any specific complex at all. It is therefore likely that the generation of complexes between nuclear proteins and the single-stranded repeat sequences is rather unspecific and probably due to a single-strand binding protein.
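The degenerate competitor sequence 5`-(CSGCSK)-3` stands for a small family of hexamer units. As an illustration (the function name is hypothetical), the following sketch expands the motif using the two ambiguity codes defined in the text, S = G or C and K = G or T.

```python
from itertools import product

# Ambiguity codes used in the text: S = C or G, K = G or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "K": "GT"}


def expand(motif: str) -> list:
    """List every concrete sequence matched by a degenerate motif."""
    return ["".join(bases) for bases in product(*(IUPAC[code] for code in motif))]


variants = expand("CSGCSK")
print(len(variants), variants)
```

The expansion yields eight hexamers, including CGGCGG and CCGCCG, so the general sequence covers both the 5`-d(CGG)-3` and the 5`-d(CCG)-3` repeat units as well as mixed and T-terminated variants.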

The Human GC Box Binding Transcription Factor Sp1 Is Not Part of the CGGBP(s)·(CGG)ds Complex

A possible candidate protein for complex formation with the double-stranded 5`-d(CGG)-3` repeat was the transcription factor Sp1, which recognized the consensus sequence 5`-dGGGCGG-3` (43). Therefore, an oligodeoxyribonucleotide Sp1ds containing the Sp1 binding sequence (Table 1) was tested for its capacity to compete for protein binding to the 5`-d(CGG)-3` repeat. It failed to function as a specific competitor (Fig. 3a).

In addition, we tried to assess the participation of Sp1 in the formation of the CGGBP(s)-(CGG)ds complex by testing the effect of an anti-Sp1 monoclonal antibody on complex formation. This antibody did not affect complex formation (data not shown).

It is therefore concluded that the transcription factor Sp1 is not part of the CGGBP(s)·(CGG)ds complex. In addition, putative Sp1 binding sites located in the 3`-flanking region of the genomic 5`-d(CGG)-3` repeat are not bound by Sp1, since the antibody against this factor did not affect the formation of any complex formed with the authentic 198ds fragment (data not shown).

Partial Purification of a Nuclear Protein (p20) Associated with the Binding to the Double-stranded 5`-d(CGG)-3` Repeat

CGGBP(s) participating in complex I formation were isolated from HeLa nuclear extracts by the purification scheme outlined in Fig. 4. Nuclear extracts were prepared from 2 × 10^9 HeLa cells, and the proteins were first fractionated by anion-exchange chromatography (Fig. 5a). Protein binding activity to the double-stranded oligodeoxyribonucleotide (CGG)ds was recovered in the flow-through designated as fraction I (Fig. 4 and Fig. 5a). About 60% of unrelated proteins and nucleic acids from the nuclear extracts were eliminated in this purification step. Fraction I was then incubated in a batch procedure with the double-stranded oligodeoxyribonucleotides CGG8Ads or (CAG)ds coupled to Sepharose beads to remove proteins that bound unspecifically to DNA of similar structure (Fig. 4). CGGBP(s) were recovered almost quantitatively in the supernatant. This material was designated as fraction II. Fraction II was subsequently adsorbed to a (CGG)ds-Sepharose matrix, and active fractions (fraction III) were eluted with >300 mM NaCl (Fig. 5b). After a second passage of fraction III over the (CGG)ds-matrix, a major band of 20 kDa was detected in the active fraction IV (Fig. 5, b and c) by SDS-polyacrylamide gel electrophoresis followed by silver staining. The 20 kDa band was accompanied by an additional faint band of 120 kDa. In order to determine which of the two bands was responsible for specific (CGG)ds binding, proteins of fraction I were bound to the mutated CGG8Ads-matrix (Fig. 4, dashed line). The material was washed and eluted as described above. Fraction III` eluting with buffer E 750 did not show (CGG)ds-binding activity (Fig. 5b). Analyses by SDS-polyacrylamide gel electrophoresis followed by silver staining revealed that this fraction contained several bands at 120, 70, and 55 kDa. However, a band around 20 kDa was not detected (Fig. 5c).


Figure 5: Isolation of a nuclear protein (p20) from HeLa cells involved in binding to the double-stranded trinucleotide repeat 5`-d(CGG)-3`. a, nuclear proteins were separated by anion-exchange chromatography (Resource Q). CGGBP(s) were detected in the flow-through, whereas accompanying proteins and nucleic acids eluted at higher salt concentrations. The inserts showed the results of EMSA experiments with the individual fractions. Only complex I was shown. b, fraction I was separated by DNA affinity chromatography as outlined in Fig. 4. (CGG)ds-binding activity was detected in fractions III and IV, eluting from the specific DNA affinity matrix (CGG)ds-Sepharose at high salt concentration after a first and a second loading, respectively (left panels). Almost all (CGG)ds-binding activity was found in the flow-through (fraction II) when the unspecific DNA affinity matrix CGG8Ads-Sepharose was used (right panel). c, proteins in fractions I-IV were separated by SDS-polyacrylamide gel electrophoresis (left panel). After silver staining, fraction IV gave rise to one dominant band with an apparent molecular mass of 20 kDa (p20) and a band at 120 kDa (left panel, lane 6). The band at 20 kDa was not present in fractions III` eluted with high salt isolated from the unspecific DNA affinity matrix CGG8Ads-Sepharose (right panel, lane 6), whereas it was detectable in high salt eluates (fraction III) from the specific DNA affinity matrix (CGG)ds-Sepharose (right panel, lanes 3 and 4). M, molecular mass (kDa) markers.



It is concluded that the protein p20 is involved in the formation of complex I and also of complex 4 established with the repetitive oligodeoxyribonucleotide (CGG)ds and the authentic DNA fragment 198ds, respectively. However, participation of additional proteins in complex I and complex 4 cannot be ruled out, since their amounts might be below the detection limit of silver staining.


DISCUSSION

This research was initiated on the premise that the size stability of trinucleotide repeats in the human genome and their controlled replication may be regulated by factors that are encoded at chromosomal sites far remote from the locus of the trinucleotide repeats, e.g. the FRAXA locus on Xq27.3 in the case of the fragile X syndrome (13). Alterations in such regulatory proteins might be implicated in eliciting the repeat expansions that are causally related to a number of serious genetic diseases in humans. In addition, it needs to be investigated whether the trinucleotide repeat itself might influence the regulation of the expression of adjacent genes.

Whatever the ultimate mechanisms underlying these striking trinucleotide repeat amplifications or the function of the repeat itself may turn out to be, we have considered it interesting to study cellular proteins that can bind specifically to these sequences. The 5`-d(CGG)(n)-3` repeat in the 5`-untranslated region of the human FMR1 gene has been chosen as a system of considerable theoretical and medical importance.

We have partly purified a protein that is involved in specific binding to the double-stranded form of the synthetic 5`-d(CGG)-3` repeat and its naturally occurring counterpart in the 5`-regulatory region of the human FMR1 gene. Further experiments will be focused on the isolation of a cDNA encoding this protein and on elucidating its function. Whether additional proteins are involved in complex I formation has to be investigated. However, the GC box binding protein Sp1 (43) does not participate in CGGBP(s)-(CGG)ds complex formation. This specific complex is sensitive to sodium deoxycholate treatment, and this sensitivity can be abrogated by sufficient concentrations of the nonionic detergent Nonidet P-40. This finding is indicative of a complex in which more than one protein is involved and which might be based in part on protein-protein interactions.

The protein-DNA complex investigated responds to specific 5`-d(CG)-3` methylation in the repeat sequences. This observation lends further credence to the biological significance of this complex formation since it has been demonstrated that in patients with the fragile X syndrome, the repeat sequence is hypermethylated (14, 15, 16). The biochemical functions of the protein(s) actually contained in the complex require further detailed analyses.


FOOTNOTES

*
This research was supported by the Deutsche Forschungsgemeinschaft through SFB274-A1 and by the Fritz-Thyssen-Stiftung, Köln. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
Present address: Howard Hughes Medical Inst., CMM, University of California at San Diego, 950 Gilman Dr., La Jolla, CA 92093-0649.

¶
To whom correspondence should be addressed: Institute of Genetics, University of Cologne, Weyertal 121, D-50931 Köln, Germany. Tel.: 49-221-470-2386; Fax: 49-221-470-5163.

(^1)
The abbreviations used are: UTR, untranslated region; FHM, fat head minnow; Ad12, adenovirus type 12; bp, base pair(s); EMSA, electrophoretic mobility shift assays; CGGBP(s), CGG binding protein(s); ^mC, 5-methyldeoxycytidine.


ACKNOWLEDGEMENTS

We thank Stephen T. Warren, Emory University Medical School, Atlanta, GA for a gift of the pE5.1 plasmid. We also thank Helmut Deissler and Hans-Christoph Kirch, Institute of Cell Biology or of Molecular Biology, University of Essen, Medical School, Essen, Germany, respectively, for cell lines and valuable comments on the manuscript, and Irmgard Hölker for the synthesis of oligodeoxyribonucleotides.

Note Added in Proof: Southwestern blotting analyses with purified protein fractions III, IV, or III` revealed that the p20 protein detected in fractions III or IV bound directly to the oligodeoxyribonucleotide (CGG)ds but not to the control oligodeoxyribonucleotide (CAG)ds. Proteins in fraction III` exhibited only unspecific binding to several different oligodeoxyribonucleotides. These results confirmed the conclusions drawn in this report that p20 bound specifically to the trinucleotide repeat 5`-d(CGG)ds-3`.


REFERENCES

  1. Caskey, C. T., Pizzuti, A., Fu, Y.-H., Fenwick, R. G., Jr., and Nelson, D. L. (1992) Science 256, 784-789
  2. Richards, R. I., and Sutherland, G. R. (1992) Cell 70, 709-712
  3. Riggins, G. J., Lokey, L. K., Chastain, J. L., Leiner, H. A., Sherman, S. L., Wilkinson, K. D., and Warren, S. T. (1992) Nature Genet. 2, 186-191
  4. Knight, S. J. L., Flannery, A. V., Hirst, M. C., Campbell, L., Christodoulou, Z., Phelps, S. R., Pointon, J., Middleton-Price, H. R., Barnicoat, A., Pembrey, M. E., Holland, J., Oostra, B. A., Bobrow, M., and Davies, K. E. (1993) Cell 74, 127-134
  5. Orr, H. T., Chung, M.-Y., Banfi, S., Kwiatkowski, T. J., Jr., Servadio, A., Beaudet, A. L., McCall, A. E., Duvick, L. A., Ranum, L. P. W., and Zoghbi, H. Y. (1993) Nature Genet. 4, 221-226
  6. The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971-983
  7. Koide, R., Ikeuchi, T., Onodera, O., Tanaka, H., Igarashi, S., Endo, K., Takahashi, H., Kondo, R., Ishikawa, A., Hayashi, T., Saito, M., Tomoda, A., Miike, T., Naito, H., Ikuta, F., and Tsuji, S. (1994) Nature Genet. 6, 9-13
  8. Sutherland, G. R. (1979) Am. J. Hum. Genet. 31, 125-135
  9. Parish, J. E., Oostra, B. A., Verkerk, A. J. M. H., Richards, C. S., Reynolds, J., Spikes, A. S., Shaffer, L. G., and Nelson, D. L. (1994) Nature Genet. 8, 229-235
  10. Jones, C., Penny, L., Mattina, T., Yu, S., Baker, E., Voullaire, L., Langdon, W. Y., Sutherland, G. R., Richards, R. I., and Tunnacliffe, A. (1995) Nature 376, 145-149
  11. Nancarrow, J. K., Kremer, E., Holman, K., Eyre, H., Doggett, N. A., Le Paslier, D., Callen, D. F., Sutherland, G. R., and Richards, R. I. (1994) Science 264, 1938-1941
  12. Nancarrow, J. K., Holman, K., Mangelsdorf, M., Hori, T., Denton, M., Sutherland, G. R., and Richards, R. I. (1995) Hum. Mol. Genet. 4, 367-372
  13. Verkerk, A. J. M. H., Pieretti, M., Sutcliffe, J. S., Fu, Y.-H., Kuhl, D. P. A., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., Zhang, F., Eussen, B. E., van Ommen, G.-J. B., Blonden, L. A. J., Riggins, G. J., Chastain, J. L., Kunst, C. B., Galjaard, H., Caskey, C. T., Nelson, D. L., Oostra, B. A., and Warren, S. T. (1991) Cell 65, 905-914
  14. Oberlé, I., Rousseau, F., Heitz, D., Kretz, C., Devys, D., Hanauer, A., Boué, J., Bertheas, M. F., and Mandel, J. L. (1991) Science 252, 1097-1102
  15. Hansen, R. S., Gartler, S. M., Scott, C. R., Chen, S.-H., and Laird, C. D. (1992) Hum. Mol. Genet. 1, 571-578
  16. Hornstra, I. K., Nelson, D. L., Warren, S. T., and Yang, T. P. (1993) Hum. Mol. Genet. 2, 1659-1665
  17. Heitz, D., Rousseau, F., Devys, D., Saccone, S., Abderrahim, H., Le Paslier, D., Cohen, D., Vincent, A., Toniolo, D., Della Valle, G., Johnson, S., Schlessinger, D., Oberlé, I., and Mandel, J.-L. (1991) Science 251, 1236-1239
  18. Pieretti, M., Zhang, F., Fu, Y.-H., Warren, S. T., Oostra, B. A., Caskey, C. T., and Nelson, D. L. (1991) Cell 66, 817-822
  19. Sutcliffe, J. S., Nelson, D. L., Zhang, F., Pieretti, M., Caskey, C. T., Saxe, D., and Warren, S. T. (1992) Hum. Mol. Genet. 1, 397-400
  20. Doerfler, W. (1991) Biol. Chem. Hoppe-Seyler 372, 557-564
  21. Behn-Krappa, A., and Doerfler, W. (1994) Hum. Mutat. 3, 19-24
  22. Schlötterer, C., and Tautz, D. (1992) Nucleic Acids Res. 20, 211-215
  23. Chamberlin, M., and Berg, P. (1962) Proc. Natl. Acad. Sci. U. S. A. 48, 81-88
  24. Kornberg, A., Bertsch, L.-R. L., Jackson, J. F., and Khorana, H. G. (1964) Proc. Natl. Acad. Sci. U. S. A. 51, 315-323
  25. Bingham, P. M., Scott, M. O., Wang, S., McPhaul, M. J., Wilson, E. M., Garbern, J. Y., Merry, D. E., and Fischbeck, K. H. (1995) Nature Genet. 9, 191-196
  26. Richards, R. I., Holman, K., Yu, S., and Sutherland, G. R. (1993) Hum. Mol. Genet. 2, 1429-1435
  27. Wang, Y.-H., Amirhaeri, S., Kang, S., Wells, R. D., and Griffith, J. D. (1994) Science 265, 669-671
  28. Wang, Y.-H., and Griffith, J. (1995) Genomics 25, 570-573
  29. Schetter, C., Grünemann, B., Hölker, I., and Doerfler, W. (1993) J. Virol. 67, 6973-6978
  30. Behn-Krappa, A., Hölker, I., Sandaradura de Silva, U., and Doerfler, W. (1991) Genomics 11, 1-7
  31. Eick, D., Stabel, S., and Doerfler, W. (1980) J. Virol. 36, 41-49
  32. Graham, F. L., Smiley, J., Russell, W. C., and Nairn, R. (1977) J. Gen. Virol. 36, 59-72
  33. Whittaker, J. L., Byrd, P. J., Grand, R. J. A., and Gallimore, P. H. (1984) Mol. Cell. Biol. 4, 110-116
  34. Dignam, J. D., Lebovitz, R. M., and Roeder, R. G. (1983) Nucleic Acids Res. 11, 1475-1489
  35. Barrett, P., Clark, L., and Hey, R. T. (1987) Nucleic Acids Res. 15, 2719-2735
  36. Bradford, M. M. (1976) Anal. Biochem. 72, 248-254
  37. Laemmli, U. K. (1970) Nature 227, 680-685
  38. Baeuerle, P. A., and Baltimore, D. (1988) Cell 53, 211-217
  39. Mitchell, J. E., Newbury, S. F., and McClellan, J. A. (1995) Nucleic Acids Res. 23, 1876-1881
  40. Meehan, R. R., Lewis, J. D., and Bird, A. P. (1992) Nucleic Acids Res. 20, 5085-5092
  41. Lewis, J. D., Meehan, R. R., Henzel, W. J., Maurer-Fogy, I., Jeppesen, P., Klein, F., and Bird, A. (1992) Cell 69, 905-914
  42. Fry, M., and Loeb, L. A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 4950-4954
  43. Briggs, M. R., Kadonaga, J. T., Bell, S. P., and Tjian, R. (1986) Science 234, 47-52
