(Received for publication, September 22, 1994; and in revised form, December 18, 1994)
The restriction endonuclease EcoRI binds and cleaves
DNA containing GAATTC sequences with high specificity. According to the
crystal structure, most of the specific contacts of the enzyme to the
DNA are formed by the extended chain region and the first turn of
α-helix α4 (amino acids 137-145). Here, we demonstrate
that a dodecapeptide (WDGMAAGNAIER), which is identical in the
underlined parts of its sequence to EcoRI amino acids
137-145, specifically binds to GAATTC sequences. The peptide
inhibits DNA cleavage by EcoRI but not by BamHI, BclI, EcoRV, HindIII, PacI, and XbaI. DNA cleavage by XbaI is slowed down at sites
that partially overlap with EcoRI sites. The peptide inhibits
cleavage of GAATTC sites by ApoI, which recognizes the
sequence RAATTY. It interferes with DNA methylation by the EcoRI methyltransferase but not by the BamHI
methyltransferase. It competes with EcoRI for DNA binding.
Based on these results, the DNA binding constant of the peptide to
GAATTC sequences was calculated to be 3 × 10⁴ M⁻¹. DNA binding is not
temperature-dependent, suggesting that binding of the peptide is
entropy-driven. As the peptide does not show any nonspecific binding to
DNA, its DNA binding specificity is similar to that of EcoRI,
in spite of the fact that the affinity is much smaller. These results
suggest that contacts to the phosphate groups in EcoRI mainly
provide binding affinity, whereas the specificity of EcoRI is
based to a large extent on sequence-specific base contacts.
One century ago the concept of complementarity between an enzyme
and its substrate was introduced by Emil Fischer (1894) to
explain the specificity of enzymes. This concept has proven to be one
of the most successful concepts in enzymology and has been demonstrated
to be applicable in numerous cases including very specific enzymes, for
example type II restriction endonucleases. These enzymes (for reviews
see Heitman (1993) and Roberts and Halford (1993)) recognize palindromic
sequences 4-8 bp in length and cleave the DNA within
these sequences. As has been shown, for example, for EcoRI
(recognition site, GAATTC), sequences differing in only 1 base pair
from the canonical sequence ("star" sites) are cleaved at
least 3 orders of magnitude more slowly (Lesser et al., 1990;
Thielking et al., 1990), and sites differing in more than 1
base pair are not cleaved at all (Gardner et al., 1982;
Rosenberg and Greene, 1982). Similarly, binding of star sites is
impaired by at least 2 orders of magnitude when compared with binding
of GAATTC sequences; other sites are bound at least 4 orders of
magnitude more weakly than GAATTC (Lesser et al., 1990;
Thielking et al., 1990). The structural basis of this high
specificity is explained by the x-ray structure analysis of a specific EcoRI-DNA co-crystal (see Fig. 1) (McClarin et
al., 1986; Kim et al., 1990). It demonstrated that EcoRI binds as a symmetrical dimer to the palindromic
recognition site and identified in the protein-DNA interface many
specific contacts between the protein and the DNA (Rosenberg, 1991; Kim et al., 1993). They are formed between the bases of the GAATTC
sequence and EcoRI (direct readout) as well as between the
phosphate groups of the DNA backbone and the protein (indirect
readout). All specific contacts to the bases of the recognition
sequence are compiled in Fig. 1B, namely nine hydrogen
bonds to the bases of the DNA (two of which are mediated by a water
molecule) and five hydrophobic contacts. Additionally, at least nine
phosphate contacts are observed. Interestingly, direct readout is
almost exclusively due to a short region of EcoRI, the
extended chain motif (Met¹³⁷-Ala¹⁴²), which is
deeply buried in the major groove of the DNA, and the amino-terminal
part of α-helix α4 (Ile¹⁴³-Arg¹⁴⁵). This
extended chain-α4 region (Met¹³⁷-Arg¹⁴⁵)
forms all direct (i.e. not water-mediated) hydrogen bonds and
three of five hydrophobic contacts. Taken together, on the basis of the
x-ray structure analysis and many biochemical studies (Brennan et
al., 1986; Fliess et al., 1986; McLaughlin et
al., 1987; Needels et al., 1989; King et al.,
1989; Alves et al., 1989a; Heitman and Model, 1990a, 1990b;
Osuna et al., 1990; Oelgeschläger et
al., 1990; Jeltsch et al., 1993a), it appears that the
specificity of EcoRI is based on an extensive complementarity
of the extended chain-α4 region and the major groove of the GAATTC
sequence. Here we have directly tested this assumption by investigating
the DNA binding properties of a short peptide with an amino acid
sequence identical to the extended chain-α4 region.
Figure 1: A, EcoRI-DNA co-crystal
structure (Kim et al., 1990; Brookhaven Data Bank entry 1R1E).
The region of the extended chain-α4 peptide is highlighted (thick line). B, schematic drawing of the specific
base contacts observed in the EcoRI-DNA co-crystal structure
(Rosenberg, 1991; Kim et al., 1993). Contacts of one subunit
to the GAATTC sequence are indicated, most of which are formed by the
extended chain-α4 region (Met¹³⁷-Arg¹⁴⁵).
The majority of the specific contacts between EcoRI
and DNA are formed by amino acids within a short region of the protein,
namely the extended chain-α4 region
(Met¹³⁷-Arg¹⁴⁵) (Fig. 1). Here we have
studied the DNA binding activity of a short dodecameric peptide
(H₂N-WDGMAAGNAIER-COOH), which is identical in sequence to
the amino acids Asp¹³⁵-Arg¹⁴⁵ in EcoRI
except for the amino-terminal Trp, which was added to allow for
spectroscopic determination of the concentration of the peptide, and for
a Leu¹³⁶ → Gly replacement, which was made in order to
increase the solubility of the peptide.
Figure 2: Cleavage of
oligoRI by EcoRI in the presence of 100 µM peptide (right ordinate) as well as in
the absence of peptide (left ordinate) in EcoRI cleavage buffer at 25 °C. In each case 0.5
µM oligoRI was cleaved with 14.8 nM EcoRI. The initial rates of cleavage (straight
line) differ by a factor of 1.4.
Figure 3: A, cleavage of λ-DNA in the absence of peptide (- peptide) as well as in the presence of 500
µM peptide (+ peptide) in EcoRI
cleavage buffer at 37 °C. 2 µg of λ-DNA were incubated with
0.5 nM EcoRI in 70 µl of reaction volume (c_sites = 5.4 nM).
After the times given (′, min),
aliquots were withdrawn and analyzed electrophoretically. λ-DNA
(48,502 bp) is cleaved by EcoRI into six fragments (21,225,
7,421, 5,804, 5,643, 4,878, and 3,531 bp). B, cleavage of
pUC8-DNA in the absence of peptide (- peptide) as well
as in the presence of 500 µM peptide (+ peptide) in EcoRI cleavage buffer at 37 °C. 1 µg
of pUC8-DNA was incubated with 0.1 nM EcoRI in 100
µl of reaction volume (c_sites = 5.7 nM). After the times given (′, min), aliquots were withdrawn and analyzed
electrophoretically. The supercoiled pUC8-DNA (sc) is cleaved
via an open circle intermediate (oc) to give the linear form (lin).
Figure 4: Rates of λ-DNA and pUC8-DNA cleavage
by EcoRI at various concentrations of peptide. All rates are
given relative to the rates of DNA cleavage of identical samples
measured in the absence of peptide. Rates were determined in EcoRI buffer at 21 °C. Values given are accurate within
±20%. The line is a theoretical curve corresponding to
the best fit of the data ($K_{ass}$ = 3 × 10⁴ M⁻¹).
The dashed lines are simulated curves for $K_{ass}$ = 1.5 × 10⁴ M⁻¹
(upper) and $K_{ass}$ = 6 × 10⁴ M⁻¹ (lower),
respectively, and are included to demonstrate that 2-fold variations of
$K_{ass}$ result in a significantly worse fit.
Figure 5: A, cleavage of pRVIS1 plasmid with XbaI in the absence of peptide (- peptide) as well as in the presence of 500 µM peptide (+ peptide). The supercoiled plasmid (sc) is cleaved via an open circle intermediate (oc) to give the linear DNA (lin). 5 µg of plasmid were incubated with 5 units of XbaI in EcoRI cleavage buffer at 37 °C. After the times given (′, min), aliquots were withdrawn and analyzed. B, cleavage of pRIF309+ plasmid with XbaI in the absence of peptide (- peptide) as well as in the presence of 500 µM peptide (+ peptide). The supercoiled plasmid is cleaved to produce two fragments 4153 and 915 bp in length. 3 µg of plasmid were incubated with 5 units of XbaI in EcoRI cleavage buffer at 37 °C. After the times given (′, min), aliquots were withdrawn and analyzed.
Figure 6: Gel electrophoretic mobility-shift assay of EcoRI in the absence of peptide (- peptide) as well as in the presence of 250 and 500 µM peptide. DNA was incubated with 0, 15, 25, 50, 75, or 150 nM EcoRI (left to right) at 21 °C in binding buffer.
Figure 7: Methylation of λ-DNA by the EcoRI methyltransferase. 2 µg of λ-DNA were incubated
with 5 units of methylase in reaction buffer at 37 °C in the
absence (- peptide) as well as in the presence (+ peptide) of 500 µM peptide. After the
times indicated (′, min), aliquots were withdrawn and analyzed
by digestion with EcoRI.
The concentration of both EcoRI and the
macromolecular DNA substrates used in the cleavage experiments
described above was low. Moreover, a high excess of nonspecific
competitor sites is present in the reaction mixture when macromolecular
DNAs are employed as substrates (Langowski et al., 1980).
Under these conditions substrate binding is the rate-limiting step of
the reaction (Langowski et al., 1981). Recently, we have shown
that EcoRI, when diffusing along the DNA, does not miss a
recognition site under conditions similar to those employed here
(Jeltsch et al., 1994). This implies that the rate of
formation of the enzyme-EcoRI site complex governs the
observed reaction rate. Therefore, the overall rate can be described by
a second order rate equation,

$$v = k \, c_{\mathrm{DNA}} \, c_{\mathrm{EcoRI}},$$

where $c_{\mathrm{DNA}}$ is the concentration of DNA and $c_{\mathrm{EcoRI}}$ is the
concentration of EcoRI. Assuming that EcoRI cannot
cleave the DNA in the DNA-peptide complex, under identical conditions
the observed reaction rate is proportional to the fraction of the DNA
not complexed with peptide,

$$\frac{v}{v_0} = \frac{c_{\mathrm{DNA}}^{\mathrm{free}}}{c_{\mathrm{DNA}}^{\mathrm{total}}},$$

where $v$ is the initial
cleavage velocity in the presence of peptide, $v_0$ is
the initial cleavage velocity in the absence of peptide,
$c_{\mathrm{DNA}}^{\mathrm{free}}$ is the concentration of
free DNA in the presence of peptide, and
$c_{\mathrm{DNA}}^{\mathrm{total}}$ is the total
concentration of DNA in the presence of peptide.
The relative rates
determined at various peptide concentrations, therefore, can be used
directly to calculate the DNA binding constant of the peptide. This
analysis was carried out with the computer program TITRAT, which
calculates an association constant with a least-squares fit method using
a multistep predictor/corrector module (VA05A) (Powell, 1965). The
binding constant turned out to be 3 × 10⁴ M⁻¹ (Fig. 4). With the same
analysis, the equilibrium binding constant for the binding of the
peptide to the sequence CAATTG, which was derived from inhibition of MunI-catalyzed DNA cleavage, was estimated to be 1
10
M
.
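The relation between relative cleavage rate and peptide concentration can be sketched as follows. This is a simplified illustration and not the TITRAT fitting procedure: it assumes a 1:1 peptide-site complex and a peptide concentration in large excess over EcoRI sites (micromolar peptide versus nanomolar sites), so that the free peptide concentration is approximately the total. Under these assumptions the fraction of free sites, and hence v/v0, equals 1/(1 + K·c_peptide). The function names are hypothetical.

```python
def relative_rate(k_ass, c_peptide):
    """Return v/v0, i.e. the fraction of sites not occupied by peptide.

    k_ass      association constant in M^-1
    c_peptide  total peptide concentration in M (assumed >> site concentration)
    """
    return 1.0 / (1.0 + k_ass * c_peptide)


def k_ass_from_relative_rate(v_rel, c_peptide):
    """Invert the relation above to estimate k_ass from a single measurement."""
    return (1.0 / v_rel - 1.0) / c_peptide


K_ASS = 3e4  # M^-1, the binding constant reported in the text

# Tabulate the expected inhibition over a range of peptide concentrations.
for c_um in (0, 10, 50, 100, 250, 500):
    v_rel = relative_rate(K_ASS, c_um * 1e-6)
    print(f"{c_um:>4} uM peptide: v/v0 = {v_rel:.3f}")
```

With the reported constant, 500 µM peptide would leave about 6% of the sites free, which illustrates why doubling or halving the constant shifts the curve noticeably in the mid-micromolar range.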
The cleavage of
oligoRI was carried out at relatively high substrate concentrations
(0.5 µM). Fluorescence stopped-flow studies with oligoRI
have shown that substrate binding of EcoRI occurs in a
pre-equilibrium kinetically separable from cleavage under these
conditions (Alves et al., 1989b). Therefore, the
Michaelis-Menten model is applicable to analyzing the kinetics of
oligoRI cleavage,

$$v = \frac{c_{\mathrm{EcoRI}} \, k_{cat} \, c_{\mathrm{oligoRI}}}{K_M + c_{\mathrm{oligoRI}}},$$

where $k_{cat}$ is the turnover number.
The effect of the peptide on the rate
of DNA cleavage by EcoRI under these conditions is to reduce
the concentration of free oligoRI, leading to a decrease in rate. The
ratio of rates measured in the presence and in the absence of 100
µM peptide ($v_{rel} = v/v_0 = 1/1.4$) (Fig. 2), together with
$K_M$ = 80 nM and $k_{cat}$ = 23 s⁻¹
(Jeltsch et al., 1993b), can be used to determine the concentration of
free oligoRI in the presence of peptide
($c_{\mathrm{oligoRI}}^{\mathrm{free}}$),

$$\frac{v}{v_0} = \frac{c_{\mathrm{oligoRI}}^{\mathrm{free}} / (K_M + c_{\mathrm{oligoRI}}^{\mathrm{free}})}{c_{\mathrm{oligoRI}}^{0} / (K_M + c_{\mathrm{oligoRI}}^{0})},$$

where $c_{\mathrm{oligoRI}}^{0}$ is the concentration of
oligoRI in the absence of peptide, which is equal to the total
concentration of oligoRI. With
$c_{\mathrm{oligoRI}}^{\mathrm{free}}$, the concentration of
oligoRI-peptide complexes and finally $K_{ass}$ for the
peptide-DNA complex can be calculated. With this procedure a $K_{ass}$
of 3 × 10⁴ M⁻¹ was obtained, in agreement with the
result of the analysis of the inhibition of λ-DNA and pUC8-DNA
cleavage.
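The arithmetic of this estimate can be retraced in a few lines. The sketch below uses only the numbers quoted in the text (rate ratio 1/1.4, Michaelis constant 80 nM, 0.5 µM oligoRI, 100 µM peptide) and assumes a 1:1 peptide-oligoRI complex that EcoRI cannot cleave; the enzyme concentration and turnover number cancel in the rate ratio. Function and variable names are illustrative only.

```python
K_M = 80e-9          # M, Michaelis constant quoted in the text
C_TOTAL = 0.5e-6     # M, total oligoRI concentration
C_PEPTIDE = 100e-6   # M, total peptide concentration
RATE_RATIO = 1 / 1.4 # v/v0, rate with peptide relative to rate without


def free_substrate(rate_ratio, c_total, k_m):
    """Solve the Michaelis-Menten rate ratio for the free substrate concentration.

    v/v0 = [c_free / (K_M + c_free)] / [c_total / (K_M + c_total)]
    """
    saturation_without = c_total / (k_m + c_total)
    saturation_with = rate_ratio * saturation_without
    return saturation_with * k_m / (1.0 - saturation_with)


def association_constant(c_free, c_total, c_peptide_total):
    """K = [complex] / ([free DNA] * [free peptide]) for a 1:1 complex."""
    c_complex = c_total - c_free
    c_peptide_free = c_peptide_total - c_complex
    return c_complex / (c_free * c_peptide_free)


c_free = free_substrate(RATE_RATIO, C_TOTAL, K_M)
k_ass = association_constant(c_free, C_TOTAL, C_PEPTIDE)
print(f"free oligoRI: {c_free * 1e9:.0f} nM")
print(f"K_ass:        {k_ass:.2e} M^-1")
```

Under these assumptions the free oligoRI concentration comes out near 130 nM and the association constant near 3 × 10⁴ M⁻¹, consistent with the value stated above.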
The DNA binding reaction of EcoRI in the presence
of peptide is governed by two coupled equilibria: EcoRI
binding to the DNA and peptide binding to the DNA. For a quantitative
analysis of the gel electrophoretic mobility-shift assays (Fig. 6), the lanes with the two highest EcoRI
concentrations were analyzed to estimate the DNA binding affinity of EcoRI and of the peptide. The fractions of DNA bound by EcoRI in the lanes without peptide yield a binding constant of EcoRI to the specific site of 1.5 10
M
under the conditions of the
experiments. This value, in combination with the concentration of the EcoRI-DNA complexes in the presence of peptide, can be used to
calculate the concentration of the free DNA in these mixtures. Then the
concentration of the peptide-DNA complexes and the equilibrium constant
for the binding of the peptide to the DNA can be estimated to be
2-4 × 10⁴ M⁻¹, which
is similar to the values derived in the other analyses.
Specific interactions of proteins with DNA have been
investigated in great detail in several cases. Often specific contacts
between the proteins and the DNA are formed by characteristic
structural elements (for reviews see Steitz (1990) and Harrison (1991)).
Frequently α-helices are positioned in the major groove of the DNA, e.g. by helix-turn-helix, basic region helix-loop-helix, basic
region leucine zipper, and zinc finger proteins, to provide a
structural framework for the recognition interactions (for a recent
review see Wolberger (1993)), but β-sheets, as in the MetJ- and
Arc-repressors, are also employed (for a recent review see Rauman et al. (1994)). In the EcoRI-DNA complex, a short
segment of the protein comprising amino acids
Met¹³⁷-Arg¹⁴⁵ forms nearly all specific contacts
to the bases of the recognition sequence. This segment has an extended
conformation and is deeply buried in the major groove of the DNA.
Often the DNA-binding regions are stable subdomains that
specifically interact with the DNA as demonstrated for basic region
leucine zipper, helix-turn-helix, basic region helix-loop-helix, and
zinc finger proteins. Indeed many of the available structures of
DNA-binding proteins were determined only with the DNA-binding domain
or subdomain of the protein. An independent folding of the DNA binding
module, however, could not be expected for EcoRI, because the
extended chain-α4 motif is held in place by several interactions to
other parts of the protein. Although the extended chain-α4 region
of EcoRI, therefore, cannot be considered to be a subdomain,
here we demonstrate by several lines of evidence that a dodecameric
oligopeptide that contains this sequence specifically binds to GAATTC
sequences. These lines of evidence are as follows. (i) The peptide
inhibits the EcoRI-catalyzed DNA cleavage of several different
substrates (13-mer oligodeoxynucleotide, pUC8-DNA, pRIF309+
plasmid DNA, and λ-DNA) in a concentration-dependent manner. (ii)
The relative cleavage rates of GAATTC sites by EcoRI, which
recognizes GAATTC sequences, and by ApoI (recognition sequence,
RAATTY) are equally reduced by the peptide. (iii) DNA cleavage by BamHI (GGATCC), BclI (TGATCA), EcoRV
(GATATC), HindIII (AAGCTT), PacI (TTAATTAA), and XbaI (TCTAGA) is not affected by the peptide. (iv) XbaI cleavage at sites that partially overlap with EcoRI sites (TCTAGAATTC) is inhibited by the
peptide. (v) DNA methylation by the EcoRI methyltransferase
but not by the BamHI methyltransferase is slowed down by the
peptide. (vi) The peptide competes with specific DNA binding by EcoRI.
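The sequence logic behind these lines of evidence can be checked with a small pattern-matching sketch. It treats each recognition sequence listed above as a pattern, expands the degenerate IUPAC codes R (A or G) and Y (C or T), and reports where each pattern occurs in a given DNA string. The helper names are hypothetical, and only one strand is scanned, which suffices here because the listed sites are palindromic.

```python
import re

# IUPAC codes needed for the recognition sequences quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "ApoI": "RAATTY",
    "BamHI": "GGATCC",
    "BclI": "TGATCA",
    "EcoRV": "GATATC",
    "HindIII": "AAGCTT",
    "PacI": "TTAATTAA",
    "XbaI": "TCTAGA",
}


def find_sites(site, dna):
    """Return start positions (0-based) of all, possibly overlapping, matches."""
    body = "".join(IUPAC[base] for base in site)
    # A lookahead finds matches that overlap each other.
    return [m.start() for m in re.finditer(f"(?={body})", dna)]


# Which enzymes recognize the peptide's target sequence GAATTC?
for name, site in RECOGNITION_SITES.items():
    print(f"{name:8s} recognizes GAATTC: {bool(find_sites(site, 'GAATTC'))}")

# The XbaI site that partially overlaps an EcoRI site.
overlap = "TCTAGAATTC"
print("XbaI at", find_sites("TCTAGA", overlap), "EcoRI at", find_sites("GAATTC", overlap))
```

Only the EcoRI and ApoI patterns match GAATTC, while in TCTAGAATTC the XbaI and EcoRI sites share two base pairs, in line with the inhibition pattern summarized in points (ii) to (iv).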
Interestingly, the discrimination of the peptide
between GAATTC and CAATTG is less stringent than the discrimination of EcoRI between these sequences. This observation is in
accordance with the recognition scheme, which is based on the structure
of the specific EcoRI-DNA co-crystal (Rosenberg, 1991),
because, in contrast to the AT base pairs, the two symmetry-related GC
base pairs are contacted mainly by amino acids outside of the extended
chain-α4 region rather than by amino acids within this region (Fig. 1). The close contact of Met
and Ala
to the GC base pair, on the other hand, appears sufficient to
discriminate a GC base pair from a TA base pair that contains a methyl
group in the major groove of the DNA at this position, because TTAATTAA
cleavage by PacI was not inhibited by the peptide.
Specific
binding of short peptides to nucleic acids is not a novel phenomenon.
For the basic region leucine zipper protein GCN4, it was shown that a
small peptide comprising 20 residues specifically interacts with DNA
(Talanian et al., 1992). Specific DNA interaction of minor
groove-binding peptides is observed with short peptides containing RGR
repeats, which resemble minor groove-binding drugs like netropsin or
distamycin (Geierstanger et al., 1994). Moreover, a peptide 17
amino acids in length containing the Arg-rich region of the HIV Rev
protein is able to bind to the Rev response element in RNA (Tan et
al., 1993). Other examples of RNA-binding protein motifs were
reviewed recently (Mattaj, 1993). The binding of the extended
chain-α4 peptide to DNA demonstrated in this work differs from all
of these examples, because this peptide does not contain a net positive
charge or positive charge clusters, which could support nonspecific
binding to DNA via electrostatic contacts to the phosphate backbone.
Consequently, in the EcoRI-DNA co-crystal structure, the
extended chain-α4 region is not involved in phosphate contacts to
the DNA.
The binding affinity of the peptide to the GAATTC sequence
was determined to be 3 × 10⁴ M⁻¹, which corresponds to an
interaction energy of
$\Delta G^0 = -RT \ln K_{ass}$ = -25.3 kJ mol⁻¹ (-6.05 kcal mol⁻¹). The
binding constant of the peptide to GAATTC turned out to be
temperature-independent between 4 and 37 °C, within the limits of
error of our experiments. This result shows that $\Delta H^0$
of the peptide-DNA association is small and, hence, that the
reaction is mainly entropy-driven. This is surprising at first, because
the peptide is most likely disordered in solution but presumably well
ordered in the complex with the DNA. One has to expect, therefore, the
existence of the unfavorable entropy term $\Delta S_{conf}$
upon complex formation. One
might speculate that this term is overcompensated by the favorable term
$\Delta S_{hydr}$ that arises because upon complex
formation 1450 Å² of solvent-accessible surface are
buried. Because 1 Å² contributes roughly -100 J mol⁻¹
(Chothia, 1974) to the interaction energy,
$\Delta G_{hydr}$ can be estimated to be around
-145 kJ mol⁻¹ (-34.7 kcal mol⁻¹). The peptide contains 43 rotatable bonds that
generate rotational isomers (24 freely rotatable backbone bonds and 19
aliphatic C-C, C-N, or C-S bonds of the side chains).
Assuming that in solution only three rotational states are populated (i.e.
torsion angles of 60, 180, and 300°) and that rotation
is completely frozen in the complex, the contribution of the reduced
conformational flexibility to the entropy change of complex formation
can be estimated using $S = k \ln W$,
where $W$ denotes the number of possible conformations,

$$\Delta S_{conf} = R \ln \frac{W_{complex}}{W_{free}} = R \ln \frac{1}{3^{43}} = -390\ \mathrm{J\,K^{-1}\,mol^{-1}}.$$

Then, $\Delta G_{conf} = -T \Delta S_{conf}$ can
be estimated to be 116 kJ mol⁻¹ (27.8 kcal mol⁻¹). Although
$\Delta G_{hydr}$ and $\Delta G_{conf}$ are only crude
estimates (in $\Delta S_{conf}$, for example, an
altered flexibility of the DNA is not taken into account), the sum of
both terms is close to the $\Delta G^0$ observed. This
estimation shows that the release of ordered water molecules could be
the thermodynamic driving force of the peptide-DNA association (Ha et al., 1989). What then is the function of the specific
hydrogen bonds? Before complex formation all hydrogen bond donors and
acceptors of the peptide and the DNA interact with water molecules. If
both surfaces match perfectly, all hydrogen bond donors and acceptors
are saturated after complex formation, too, resulting in a very small
net enthalpy change. If, however, the surfaces are not chemically
complementary to each other, some hydrogen bond donors or acceptors
would remain free but without access to water in the DNA-protein
interface, yielding a large and positive $\Delta H$. This would
prevent association to such (i.e. nonspecific) sites, because
the overall $\Delta G$ of the complex formation would become
positive.
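The back-of-the-envelope estimates in this paragraph can be reproduced numerically. The sketch below uses the figures given in the text (binding constant 3 × 10⁴ M⁻¹, 1450 Å² of buried surface at roughly -100 J mol⁻¹ per Å², 43 rotatable bonds with three rotational states each). The temperature of 21 °C is an assumption taken from the conditions quoted for the rate measurements, and the function names are illustrative only.

```python
import math

R = 8.314        # gas constant, J K^-1 mol^-1
T = 294.15       # K; assumed 21 degrees C, as used for the rate measurements


def binding_free_energy(k_ass, temp_k):
    """Standard free energy of association, -RT ln K, in J mol^-1."""
    return -R * temp_k * math.log(k_ass)


def conformational_entropy(n_bonds, states_per_bond=3):
    """Entropy change R ln(1 / states^n) when all rotatable bonds freeze, in J K^-1 mol^-1."""
    return -R * n_bonds * math.log(states_per_bond)


def hydrophobic_free_energy(buried_area, per_area=-100.0):
    """Free energy from burying solvent-accessible surface, in J mol^-1 (area in square angstroms)."""
    return buried_area * per_area


dg_obs = binding_free_energy(3e4, T)
ds_conf = conformational_entropy(43)
dg_conf = -T * ds_conf
dg_hydr = hydrophobic_free_energy(1450.0)

print(f"observed binding free energy: {dg_obs / 1000:7.1f} kJ/mol")
print(f"conformational entropy:       {ds_conf:7.1f} J/(K mol)")
print(f"conformational free energy:   {dg_conf / 1000:7.1f} kJ/mol")
print(f"hydrophobic free energy:      {dg_hydr / 1000:7.1f} kJ/mol")
print(f"sum of the two estimates:     {(dg_conf + dg_hydr) / 1000:7.1f} kJ/mol")
```

At the assumed temperature the two crude terms sum to roughly -30 kJ mol⁻¹, of the same order as the observed value of about -25 kJ mol⁻¹; the exact figures shift slightly with the temperature chosen.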
The binding affinity of the peptide to CAATTG sequences
has been estimated to be 10M
.
A nonspecific binding affinity to other sequences was not detectable.
Given the sensitivity of the experiments, the binding affinity to
nonspecific sites can be estimated to be below 10
M
. Because this peptide contains
neither a positive net charge nor positive charge clusters, there is no
structural basis for a nonspecific binding of the peptide to DNA. The
specificity of the peptide in the discrimination of GAATTC and
nonspecific sequences, hence, is in the order of
10
-10
. This value is similar to the
discrimination factor of EcoRI in binding GAATTC and
nonspecific sequences (e.g.K
(GAATTC)/K
(CTTAAG)
= 7
10
) (Lesser et al., 1990)
measured, however, in the absence of Mg²⁺. This
comparison demonstrates that the extended chain-α4 region in EcoRI provides the major contribution to the binding
specificity of the enzyme. This conclusion is further supported by the
finding that cleavage at EcoRI star sites is similarly
inhibited by the peptide as cleavage at canonical sites.
Most of the specific contacts of EcoRI to the bases
of its recognition sequence (GAATTC) are formed by a short continuous
peptide sequence, the extended chain-α4 motif. As demonstrated by
the crystal structure (Kim et al., 1990), this amino acid
motif is largely complementary to the major groove of the GAATTC
sequence, which enables it to form a variety of specific contacts to
the DNA; 8 amino acid residues (Met¹³⁷-Arg¹⁴⁵)
are involved in 10 specific interactions with the bases of the
recognition sequence. Here, we have demonstrated that a dodecapeptide
containing the sequence of the extended chain-α4 motif specifically
binds to GAATTC sequences. Therefore, binding specificity of EcoRI is based mainly on the specific contacts between a small
amino acid sequence motif and the bases of the DNA, whereas binding affinity is provided by contacts between amino acid residues
dispersed over the entire DNA binding site of the protein and the
phosphate groups of the DNA. Our data suggest that, at least for the EcoRI restriction endonuclease, direct readout is more
important to ensure binding specificity than indirect readout. It must
be kept in mind, however, that cleavage specificity of restriction
enzymes is only in part due to binding specificity. It might well be
that in the transition state contacts to the phosphate groups play an
important role and determine whether a sequence is cleaved or not
(Koziolkiewicz and Stec, 1992; Jeltsch et al., 1993c).