(Received for publication, September 28, 1995)
Autonomous expansions of trinucleotide repeats with the general
structure 5`-d(CNG)-3` are associated with several
human genetic diseases. We have characterized nuclear proteins binding
to the unstable 5`-d(CGG)-3` repeat. Its expansion
in the human FMR1 gene leads to the fragile X syndrome, one of the most
frequent causes of mental retardation in human males. Electrophoretic
mobility shift assays using nuclear extracts from several human and
other mammalian cell lines and from primary human cells demonstrated
specific binding to double-stranded DNA fragments containing only a
5`-d(CGG)-3` repeat or the repeat and flanking genomic
sequences of the human FMR1 gene. Protein binding was inhibited by
complete methylation of the trinucleotide repeat. The complex formed
with crude nuclear extract apparently did not contain the human
transcription factor Sp1 that binds to a characteristic GC-rich
sequence. A 20-kDa protein involved in specific binding to the
double-stranded 5`-d(CGG)-3` repeat was purified from HeLa
nuclear extracts by DNA affinity chromatography.
The autonomous, mechanistically still unexplained expansion of
naturally occurring trinucleotide tandem repeats in the human genome
has been recognized to be related to a number of serious human
diseases: the fragile X syndrome (FRAXA locus), myotonic dystrophy,
spinal and bulbar muscular atrophy, Huntington disease, mental
retardation associated with the fragile site FRAXE on the human X
chromosome, spinocerebellar ataxia type I, and
dentatorubral-pallidoluysian atrophy (for reviews, see Refs. 1-7).
Fragile sites, also known as folate-sensitive sites, are chromosomal
aberrations that condense poorly during metaphase and can break under
specific experimental conditions (8). Several such sites have
been identified on the X chromosome (FRAXA, FRAXE, FRAXF; Ref. 9)
and on the autosomes 11 (FRA11B; Ref. 10) and 16 (FRA16A; Refs. 11 and 12). All fragile sites identified so far
have been found to be associated with amplifications of the simple
unstable tandem repeat 5`-d(CGG)-3`.
In the
fragile X syndrome, the expanded tandem repeat
5`-d(CGG)-3` is located in the 5`-untranslated
region (UTR) of the FMR1 gene in the human chromosomal
location Xq27.3 (13). The number of repeat units varies between
6 and 54 in normal individuals, whereas more than 200 and up to 2000
repeat units can be found in affected individuals. Expansion of the
repeat is accompanied by extensive methylation of the 5`-dCG-3`
dinucleotides in the repeat (14, 15, 16) and
is associated with transcriptional silencing of the FMR1
gene (17, 18, 19). The function of the FMR1
protein is not yet known. The de novo methylation of the
expanded trinucleotide repeat can be interpreted as a cellular defense
against the invasion of foreign DNA or against unusual DNA
structures (20, 21).
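The repeat-length ranges quoted above can be made concrete with a small sketch. The function name `classify_cgg_repeat_count` and the label used for counts outside the two quoted ranges are illustrative assumptions; the text itself specifies only the normal range (6 to 54 units) and the affected range (more than 200 units).

```python
def classify_cgg_repeat_count(units: int) -> str:
    """Classify a 5'-d(CGG)-3' repeat count using only the ranges quoted in the text.

    Hypothetical helper for illustration; not part of the original study.
    """
    if units < 0:
        raise ValueError("repeat count must be non-negative")
    if 6 <= units <= 54:
        # Range reported for normal individuals.
        return "normal"
    if units > 200:
        # Range reported for affected individuals (up to about 2000 units).
        return "affected"
    # Counts below 6 or between 55 and 200 are not characterized in the text.
    return "not specified in text"
```

Counts that fall between the two quoted ranges are deliberately left unlabeled here, because the text does not assign them to either group.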
The cellular mechanism of
triplet repeat amplification is not understood. Interestingly,
prokaryotic DNA polymerases are capable of expanding short synthetic
oligodeoxyribonucleotides containing simple tandem repeat sequences to
DNA stretches of several thousand nucleotides in length, even in the
absence of template DNA (21, 22). This finding
suggests a slippage mechanism (23, 24) for the
expansion of trinucleotide repeats, presumably involving specific
DNA-binding proteins. In transgenic mice, for instance, a
5`-d(CAG)-3` repeat in the androgen receptor gene is
stable upon transmission, whereas it expands upon
transmission in humans (25). The authors suggest the
involvement of sequence-specific, probably species-specific,
DNA-binding proteins in the amplification reaction. Experiments with
crude nuclear extracts from human HeLa cells have indeed shown binding
of proteins to tandem repeat sequences (26). In addition, an
amplified 5`-d(CTG)-3` repeat is a preferential target for nucleosome
assembly (27, 28).
We have initiated experiments to
characterize and purify human nuclear proteins that bind specifically
to the double-stranded 5`-d(CGG)-3` repeat. Such
proteins are present in a variety of human and other mammalian cell
lines, as well as in primary cells.
DNA fragments were isolated from the plasmid pE5.1,
which was a gift from Stephen T. Warren, Emory University School of
Medicine, Atlanta, GA. This plasmid contained a
5`-d(CGG)-3` repeat in exon 1 of the human FMR1 gene and
flanking genomic DNA sequences (13). The plasmid was cut with NarI, and the excised 441-bp fragment was isolated. This
fragment was subsequently treated with RsaI or BfaI
to yield a 198-bp (198ds) or a 126-bp (126ds) fragment, respectively.
To obtain the 248-bp (248ds) fragment, the plasmid was first cleaved
with RsaI, and the resulting fragment was isolated and cut
with DdeI. A restriction map illustrating the derivation of
these fragments is presented in Fig. 1.
Figure 1: Survey of DNA fragments used in EMSA. DNA fragments were isolated from exon 1 of the human FMR1 gene
cloned in the plasmid pE5.1. These fragments contain the
double-stranded trinucleotide repeat 5`-d(CGG)-3` (gray boxes) flanked by genomic sequences of the
5`-untranslated region. The nucleotide numbers correspond to the
sequence published in GenBank (accession number X61378). The start
codon AUG is located approximately 70 bp downstream of the
trinucleotide repeat.
Oligodeoxyribonucleotides were 5`-end labeled with T4 polynucleotide
kinase (New England Biolabs, Beverly, MA) and
[γ-32P]ATP. DNA fragments were labeled at the
3`-end with the Klenow fragment of DNA polymerase I (Boehringer
Mannheim) and [α-32P]dATP or [α-32P]dCTP according to standard procedures.
The specific activity of the DNA probes was 10
cpm/pmol.
For the purification of HeLa cell proteins (designated
CGGBP(s) = 5`-d(CGG)-3`ds binding proteins) that
bind to the double-stranded 5`-d(CGG)-3` repeat, crude
nuclear extracts isolated from 2
10
cells (20 mg of
protein) were equilibrated in buffer QA (10 mM Tris-HCl, 100
mM KCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA, 0.5 mM dithiothreitol,
20% glycerol, 0.01% Tween-20, and protease inhibitors, pH 7.9) using
NAP-10 columns (Pharmacia Biotech Inc.) or Econo DP10-columns (Bio-Rad)
and subsequently loaded on a 1-ml Resource Q column (Pharmacia)
equilibrated in buffer QA. Proteins binding to the
oligodeoxyribonucleotide (CGG)ds (see Table 1)
eluted in the flow-through (fraction I, see Fig. 4). DNA
affinity Sepharose was prepared by covalently coupling 400 µg of the 3`-amino
modified oligodeoxyribonucleotides (CGG)ds, CGG8Ads, or
(CAG)ds to 1 ml of N-hydroxysuccinimide-activated Sepharose beads (HiTrap;
Pharmacia) according to the manufacturer's protocol. The material
was equilibrated in buffer QA immediately before use. Proteins were
bound and eluted in a batch procedure; washing and elution were
performed in spin columns (Biometra, Göttingen,
Germany). Active fraction I was incubated with CGG8Ads-Sepharose (250
µl) in the presence of 200 µg of poly(dA·dT) for 1 h.
Unbound proteins containing CGGBP(s) (fraction II) were then incubated
with 100 µl of (CGG)ds-Sepharose either at 4 °C
for 4 h or at room temperature for 1 h. The material was centrifuged at
600 × g for 10 min, washed twice with 1 ml of buffer W
100 (20 mM HEPES, 100 mM NaCl, 1 mM MgCl2, 0.15 mM spermine, 0.1 mM EDTA,
0.5 mM dithiothreitol, 20% glycerol, 0.01% Tween-20, and
protease inhibitors, pH 7.9), and subsequently washed twice with 1 ml
of buffer W 150 (same as W 100 but with 150 mM NaCl and 100
pmol of an unrelated oligodeoxyribonucleotide). CGGBP(s) were eluted as
fraction III from the resin in 100 µl of buffer E 750 and partly in
100 µl of buffer E 1000 (same as W 100 but with 750 mM and
1 M NaCl, respectively). After equilibration of fraction III
in buffer W 100 supplemented with 0.4% Tween-20, proteins were again
bound to 20 µl of (CGG)ds-Sepharose. Binding, washing,
and elution were carried out as described above, but smaller volumes of
the buffers W 100 (1 ml) and W 150 (100 µl) were used. CGGBP(s)
eluted in 20 µl of buffer E 750 to yield fraction IV. Only low
activity remained after elution with buffer E 1000 (fraction IV).
Active fractions I to IV were analyzed by SDS-polyacrylamide gel
electrophoresis (37) followed by silver staining.
Figure 4: Purification scheme for the isolation of CGGBP(s) from HeLa nuclear extracts. Details of the purification procedure are described in the text and under ``Experimental Procedures.''
The influence of sodium deoxycholate on complex formation was tested as described previously (38). Crude nuclear extracts or fraction I were incubated with labeled DNA fragments as described above for 10 min. Different amounts of sodium deoxycholate were then added in the absence or presence of 0.6% Nonidet P-40; the mixture was incubated for another 30 min and subsequently analyzed by gel electrophoresis.
The monoclonal antibody against the human transcription factor Sp1 was purchased from Santa Cruz Biotechnology Inc. (Santa Cruz, CA). Crude nuclear extracts or fraction I or III were incubated with the specific DNA fragment as described above in the presence of the anti-Sp1 antibody (0.3-1 µg) for 60 min at room temperature. Complexes were separated by electrophoresis on polyacrylamide gels.
Binding
of nuclear proteins to the synthetic oligodeoxyribonucleotide
(CGG)ds and the FMR1 promoter-derived DNA fragment 198ds
was demonstrated by EMSA (Fig. 2 and Fig. 3, a and c). Specificity of binding was ascertained by
competition experiments using the unlabeled homologous
oligodeoxyribonucleotide (CGG)ds and additional synthetic
products containing different tandem repeat sequences as competitors (Fig. 3, a and c). Nuclear proteins isolated
from the established human cell lines HeLa, C4/I, KB, Jurkat, A549,
293, HEK12, an amnion tumor-derived cell line, as well as from primary
human lymphocytes gave rise to the specific DNA-protein complex I (Fig. 2a, cI) after incubation with the
oligodeoxyribonucleotide (CGG)ds. Formation of complex I
could be competed by the oligodeoxyribonucleotide (CGG)ds
in at least 75-fold excess, but not by several
oligodeoxyribonucleotides with different sequences (Fig. 3a). Additional DNA-protein complexes apparent in
the EMSAs shown in Fig. 2a were not specific, as shown by
competition experiments (Fig. 3a). Extracts from
non-human cells such as hamster BHK21 cells and rat embryo fibroblasts
REF12 produced the same patterns as those from human cells (Fig. 2b). However, proteins from monkey Vero cells,
from nonmammalian FHM fish cells, and from the insect cell line SF21
generated specific DNA-protein complexes (Fig. 2b)
that differed from those formed with proteins from human cell lines.
Figure 2: Binding of nuclear proteins isolated from
various cell lines to the double-stranded trinucleotide repeat
5`-d(CGG)-3`. Crude nuclear extract (0.5-2 µg)
was incubated with the oligodeoxyribonucleotide (CGG)ds in
the presence of unspecific DNA. a, nuclear proteins isolated
from a variety of human cell lines and human primary lymphocytes gave
rise to the formation of one major complex I (cI). The same
complex was observed with proteins isolated from various mammalian cell
lines (b). Different complexes were detected with extracts
from the fish cell line FHM and the insect cell line SF21, whereas no
complex was detected with extracts from BHK21 cells grown in
suspension. Experimental details are outlined under
``Experimental Procedures.'' cI indicates the
position of the specific complex I.
Figure 3: Specific binding of nuclear proteins to
the double-stranded trinucleotide repeat 5`-d(CGG)-3`.
Binding of nuclear proteins from HeLa cells to the
oligodeoxyribonucleotide (CGG)ds (a), to the
fully methylated oligodeoxyribonucleotide (MGG)ds (b), or to the promoter-derived DNA fragment 198ds (c) led to the formation of several specific DNA-protein
complexes. a, complex I (cI) with the
oligodeoxyribonucleotide (CGG)ds could be competed only
with oligodeoxyribonucleotides of the general structures
(CGG)nds (n ≥ 12) and (CGGNGG)CGGds (with N = T or
5-methyldeoxycytidine). Oligodeoxyribonucleotide (MGG)ds
containing the fully methylated trinucleotide repeat
5`-d(CGG)-3` did not function as a competitor. b, CGGBP(s) did not bind to the fully methylated repeat
5`-d(CGG)-3` in the oligodeoxyribonucleotide
(MGG)ds. Fraction I (see Fig. 4) was incubated with
either (CGG)ds (lane 2) or (MGG)ds (lane 4); complex formation was only observed with
(CGG)ds. Specific complexes MI and MIII were formed only
with crude nuclear extract and (MGG)ds (lanes
5-8). c, DNA fragment 198ds contained the trinucleotide
repeat 5`-d(CGG)-3` flanked by genomic sequences of the
5`-untranslated region from the human FMR1 gene. In binding
experiments, it gave rise to the specific complexes 1, 3, and 4 (c1, c3, and c4). Their formation was
competed only by oligodeoxyribonucleotides of the general structure
(CGGNGG)CGGds (with N = T or C). Complex 3
was not always detectable. Double-stranded competitor
oligodeoxyribonucleotides were used at a 300-fold excess
over the double-stranded binding fragment (2 fmol). Sequences of
oligodeoxyribonucleotides and a summary of competition experiments are
given in Table 1 and Table 2,
respectively.
Infection of the permissive human cell lines HeLa and KB with Ad12 did not abolish CGG-binding activity (Fig. 2a). However, the abortive infection of hamster BHK21 cells with Ad12 gave rise to two additional bands showing slightly higher mobility in EMSA (Fig. 2b). In contrast, extracts from the Ad12-transformed BHK21 cell line derivative T637 or from its revertants TR3 or TR12 showed the same patterns as proteins from extracts of the parental BHK21 cells. Interestingly, CGGBP(s) were not detectable in extracts isolated from BHK21 cells grown in suspension cultures (Fig. 2b).
The biological significance of these data
had to be ascertained by repeating the binding experiments with
authentic DNA fragments from the 5`-UTR of the FMR1 gene. Fragment
198ds gave rise to the DNA-protein complexes 1-4 (Fig. 3c, c1-c4) when nuclear extracts
from human HeLa cells were used. Similar or identical patterns were
found when extracts from other human or non-human cell lines were
investigated. Complex 3 was not always detectable. Complex 1 appeared
to be specific for CGG binding, as its formation could be blocked by
competition with the oligodeoxyribonucleotide (CGG)ds, but
not with other oligodeoxyribonucleotides. The strong complex 4 seemed
also to be formed by CGGBP(s), because its formation was partly
competed by the oligodeoxyribonucleotide (CGG)ds (Fig. 3c) and also by 198ds. During the purification of
CGGBP(s), complex 4 was the only detectable complex involving the 198ds
fragment. Its formation could then be specifically competed by the
oligodeoxyribonucleotide (CGG)ds and FMR1 promoter
fragments 126ds, 198ds, and 248ds, but not by other
oligodeoxyribonucleotides. Thus, complex 1 might contain additional
factors that were probably associated with factors binding to flanking
3`-sequences. These additional factors could have been lost during
purification and were no longer present in the CGGBP(s) in complex 4.
Interestingly, the binding of proteins from nuclear extracts to the
126ds fragment with the same 5`-sequence as 198ds but a shorter 3`-end (Fig. 1) gave rise to only one complex and a pattern similar to
that formed with the oligodeoxyribonucleotide (CGG)ds
(data not shown). In contrast, binding of nuclear proteins to the 248ds
fragment, which had the same 3`-sequence as 198ds but a longer
5`-sequence, produced the same pattern as the 198ds fragment.
It is
concluded that several human and other mammalian cells express a
(CGG)ds binding activity that gives rise to the same,
strong complex I with the oligodeoxyribonucleotide (CGG)ds
and to at least one specific complex with the authentic DNA fragments
198ds, 126ds, and 248ds from the 5`-UTR of the human FMR1 gene.
Moreover, complex I was observed only
with oligodeoxyribonucleotides (CGG)ds and
(CGG)
ds as binding probes, whereas (CGG)
ds
gave rise to a very faint complex (data not shown). The
oligodeoxyribonucleotide FraxF isolated from the human FRAXF locus (9) did not serve as a specific binding probe for CGGBP(s) and
did not compete for binding to (CGG)ds. The FraxF
oligodeoxyribonucleotide contained eight 5`-d(CGG)-3` repeats and
alternating 5`-d(CAGCGG)-3`ds repeats (Table 1). Hence, effective
binding of CGGBP(s) to the recognition sequence required more than 8
repeat units.
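The length argument above turns on the longest uninterrupted run of 5`-d(CGG)-3` units in a probe. A minimal sketch of that measurement follows; the function name `longest_cgg_run` and the example sequences in the usage note are hypothetical illustrations, since the actual oligodeoxyribonucleotide sequences are listed in Table 1 and are not reproduced here.

```python
import re


def longest_cgg_run(seq: str) -> int:
    """Return the largest number of consecutive tandem CGG units found in seq.

    Hypothetical helper for illustration; case-insensitive, top strand only.
    """
    # Each match is a maximal stretch of back-to-back CGG triplets.
    runs = re.findall(r"(?:CGG)+", seq.upper())
    # Convert each stretch length from nucleotides to repeat units.
    return max((len(run) // 3 for run in runs), default=0)
```

For an assumed probe built as `"CGG" * 8 + "CAGCGG" * 3`, the function reports a longest run of 8 units, whereas an uninterrupted `"CGG" * 12` reports 12, which is the kind of difference the binding data point to.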
Formation of complex I was only partly competed by the
synthetic oligodeoxyribonucleotide CGG8Tds (Fig. 3a),
whereas no competition was observed with the oligodeoxyribonucleotide
(TGG)ds (for nucleotide sequences, see Table 1).
Likewise, complex I formation was not competed by the addition of
oligodeoxyribonucleotides with other triplet repeat sequences (Fig. 3a). Moreover, binding of nuclear proteins to the
5`-d(CAG)-3`ds repeat was unspecific (data not shown).
When the authentic DNA fragments 198ds or 126ds were used as binding
probes, the 5`-d(CGG)-3`-specific complexes 1, 3, and 4
were competed by the oligodeoxyribonucleotide CGG8Tds (Fig. 3c) but not by other oligodeoxyribonucleotides.
Complex I and complexes 1-4 were destroyed after the addition
of the anionic detergent sodium deoxycholate (0.03%), whereas the
nonionic detergents Triton X-100 or Tween 20 (
2%) did not have any
effects on complex formation (data not shown). Complex disruption by
sodium deoxycholate was reversed in the presence of 0.6% Nonidet P-40.
Although it cannot be ruled out that sodium deoxycholate as an anionic
detergent affects protein-DNA interaction, the sodium deoxycholate
sensitivity of the binding of CGGBP(s) to the 5`-d(CGG)-3`
repeat and the reversal by Nonidet P-40 suggest the involvement of
protein-protein interactions in complex formation (38).
It is concluded that proteins
in nuclear extracts from primary human cells, from established human
cell lines, and from several mammalian as well as from some
nonmammalian cells form a specific complex with the synthetic
double-stranded oligodeoxyribonucleotides (CGG)nds, with 12 ≤ n ≤ 17. The oligodeoxyribonucleotide
(CGG)ds suffices for weak complex formation. The authentic
DNA fragments 248ds, 198ds, or 126ds from the 5`-UTR of the human FMR1
gene can also form at least one 5`-d(CGG)-3`ds-specific
complex and additional, probably less specific complexes. Some of the
more complicated EMSA patterns (Fig. 3c) might be
accounted for by additional complex formation with nucleotide sequences
that flank the 5`-d(CGG)-3` repeat. Modifications of the
specific 5`-d(CGG)-3`ds sequence are tolerated in
competition experiments when exchanges of the C are
limited to 8 and to the pyrimidines T or 5-methyldeoxycytidine.
CGGBP(s) do not bind to the fully methylated trinucleotide repeat
sequence. The ubiquitous expression of CGGBP(s) points to an important
function of these proteins. This binding activity seems to be highly
conserved, since similar proteins have been found in extracts from
nonmammalian fish or insect cells.
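To illustrate what full methylation of the double-stranded repeat involves, the following sketch counts the 5`-dCG-3` dinucleotides on both strands of a (CGG)n duplex. The helper names are hypothetical, n = 12 is used only because it is the lower bound quoted above, and the sketch assumes that "fully methylated" means every CpG cytosine on both strands carries a methyl group.

```python
# Translation table for Watson-Crick complements (DNA, upper case).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (hypothetical helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_cpg(seq: str) -> int:
    """Count 5'-CG-3' dinucleotides on a single strand."""
    return seq.upper().count("CG")


def cpg_sites_in_duplex(top_strand: str) -> int:
    """Count CpG cytosines on both strands of a duplex, given its top strand."""
    return count_cpg(top_strand) + count_cpg(reverse_complement(top_strand))
```

Under these assumptions a (CGG)12 duplex, built as `"CGG" * 12`, contains one CpG per repeat unit on each strand, so the count scales as 2n with the repeat number.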
In addition, we tried to
assess the participation of Sp1 in the formation of the
CGGBP(s)-(CGG)ds complex by testing the effect of an
anti-Sp1 monoclonal antibody on complex formation. This antibody did
not affect complex formation (data not shown).
It is therefore
concluded that the transcription factor Sp1 is not part of the
CGGBP(s)-(CGG)ds complex. In addition, putative Sp1
binding sites located in the 3`-flanking region of the genomic
5`-d(CGG)-3` repeat are not bound by Sp1, since the
antibody against this factor did not affect the formation of any
complex formed with the authentic 198ds fragment (data not shown).
Figure 5: Isolation of a nuclear protein (p20) from
HeLa cells involved in binding to the double-stranded trinucleotide
repeat 5`-d(CGG)-3`. a, nuclear proteins were
separated by anion-exchange chromatography (Resource Q). CGGBP(s) were
detected in the flow-through, whereas accompanying proteins and nucleic
acids eluted at higher salt concentrations. The insets show the
results of EMSA experiments with the individual fractions. Only complex
I is shown. b, fraction I was separated by DNA affinity
chromatography as outlined in Fig. 4.
(CGG)ds-binding activity was detected in fractions III and
IV, eluting from the specific DNA affinity matrix
(CGG)ds-Sepharose at high salt concentration after a first
and a second loading, respectively (left panels). Almost all
(CGG)ds-binding activity was found in the flow-through
(fraction II) when the unspecific DNA affinity matrix CGG8Ads-Sepharose
was used (right panel). c, proteins in fractions
I-IV were separated by SDS-polyacrylamide gel electrophoresis (left panel). After silver staining, fraction IV gave rise to
one dominant band with an apparent molecular mass of 20 kDa (p20) and a
band at 120 kDa (left panel, lane 6). The band at 20
kDa was not present in fraction III` eluted with high salt
from the unspecific DNA affinity matrix CGG8Ads-Sepharose (right
panel, lane 6), whereas it was detectable in high salt
eluates (fraction III) from the specific DNA affinity matrix
(CGG)ds-Sepharose (right panel, lanes 3 and 4). M, molecular mass (kDa)
markers.
It is concluded
that the protein p20 is involved in the formation of complex I and also
of complex 4 established with the repetitive oligodeoxyribonucleotide
(CGG)ds and the authentic DNA fragment 198ds,
respectively. However, participation of additional proteins in complex
I and complex 4 cannot be ruled out, since their amounts might be below
the detection limit of silver staining.
This research was initiated on the premise that the size stability of trinucleotide repeats in the human genome and their controlled replication may be regulated by factors that are encoded at chromosomal sites far remote from the locus of the trinucleotide repeats, e.g. of the FRAXA location on Xq27.3 in the instance of the fragile X syndrome (13). Alterations in such regulatory proteins might be implicated in eliciting the repeat expansions that are causally related to a number of serious genetic diseases in humans. In addition, it needs to be investigated whether the trinucleotide repeat itself might influence the regulation of the expression of adjacent genes.
Whatever the ultimate mechanisms underlying these striking
trinucleotide repeat amplifications or the function of the repeat
itself may turn out to be, we have considered it interesting to study
cellular proteins that can bind specifically to these sequences. The
5`-d(CGG)-3` repeat in the 5`-untranslated region of the
human FMR1 gene has been chosen as a system of considerable theoretical
and medical importance.
We have partly purified a protein that is
involved in specific binding to the double-stranded form of the
synthetic 5`-d(CGG)-3` repeat and its naturally occurring
counterpart in the 5`-regulatory region of the human FMR1 gene. Further
experiments will be focused on the isolation of a cDNA encoding this
protein and on elucidating its function. Whether additional proteins
are involved in complex I formation has to be investigated. However,
the GC box binding protein Sp1 (43) does not participate in
CGGBP(s)-(CGG)ds complex formation. This specific complex
is sensitive to sodium deoxycholate treatment, and this sensitivity can
be abrogated by sufficient concentrations of the nonionic detergent
Nonidet P-40. This finding is indicative of a complex in which more
than one protein is involved and which might be based in part on
protein-protein interactions.
The protein-DNA complex investigated responds to specific 5`-d(CG)-3` methylation in the repeat sequences. This observation lends further credence to the biological significance of this complex formation since it has been demonstrated that in patients with the fragile X syndrome, the repeat sequence is hypermethylated (14, 15, 16). The biochemical functions of the protein(s) actually contained in the complex require further detailed analyses.
Note Added in Proof. Southwestern blotting analyses with purified protein
fractions III, IV, or III` revealed that the p20 protein detected in
fractions III or IV bound directly to the oligodeoxyribonucleotide
(CGG)ds but not to the control oligodeoxyribonucleotide
(CAG)ds. Proteins in fraction III` exhibited only
unspecific binding to several different oligodeoxyribonucleotides.
These results confirmed the conclusions drawn in this report that p20
bound specifically to the trinucleotide repeat
5`-d(CGG)ds-3`.