(Received for publication, January 31, 1997, and in revised form, April 17, 1997)
The principal determinant of the pyrimidine
preference by the c-Myb DNA-binding domain at the initial base of the
consensus sequence was investigated by mutation of both the protein and the DNA base pairs, with analysis by a filter binding assay. Amino acid
residue 187 was revealed to interact with the pyrimidine base position,
as estimated from our previous complex structure. Unexpectedly, the
pyrimidine preference is retained even in the Gly187 mutant, so the
base specificity should arise principally not from the direct-readout
mechanism but from an indirect-readout mechanism, namely the intrinsic
"bendability" of the pyrimidine-purine step of the DNA duplex.
A significant but rather small positive base pair roll is detectable in
the conformation of DNA in complex with the c-Myb DNA-binding domain.
Following the conventional chemical rules of the direct-readout
mechanism, amino acid mutagenesis at position 187 yielded several new
base preferences for the protein.
Specific interactions between proteins and DNA are critical to
gene expression and regulation, so a general readout mechanism of the
information encoded in DNA has been sought (1-3). However, a number of
complex structures of DNA duplexes and proteins determined at atomic
resolution have revealed that nature uses a great variety of readout
mechanisms (4-11).
In most complex structures, the direct-readout mechanism is mediated by
intermolecular hydrogen bond networks and hydrophobic interactions
between DNA duplexes and proteins. The interaction modes have been
classified into: (i) the intrinsic chemical features of bases and amino
acids (1, 3, 12), and (ii) the stereochemical relations between the
amino acids and the bases inside the DNA major grooves (12, 13). In
contrast, the indirect-readout mechanism works in several systems, such
as the trp repressor/operator (4, 5), where the DNA bases
are specifically recognized by proteins without the use of particular
hydrogen bonds or non-polar contacts. Instead, each
sequence-dependent deformation of the DNA conformation
stabilizes the characteristic geometry of the phosphate backbone, which
directly interacts with the protein through polar contacts (6). Water
molecules are often observed to mediate the specific interaction
through additional hydrogen bonds (4, 5). One common type of DNA
deformation is a steep kink of the duplex (7-9), which substantially
contributes to readout of the minor groove (8, 9).
In general, a combination of the direct- and indirect-readout
mechanisms results in specific base pair recognition. In other words,
both the specific binding affinity and the DNA bending contribute to
the free energy of complex formation (6, 14). This situation has made
it difficult to determine how each consensus base sequence is
recognized by the corresponding protein, even when the precise complex
structure is known.
The c-myb gene product (c-Myb) is a transcriptional
activator that specifically binds to DNA fragments containing the
consensus sequence PyAAC(G/T)G, where Py indicates a pyrimidine
(15-17). The DNA-binding domain (DBD) of
c-Myb consists of three imperfect 51- or 52-residue repeats (designated
R1, R2, and R3 from the N terminus) (18-20). The last two repeats, R2
and R3, are sufficient for the recognition of the specific DNA
sequences (20, 21). NMR analysis revealed that both R2 and R3 contain
three helices, and the third helix in each is a recognition helix
(22-24). R2 and R3 are closely packed in the major groove, so that the
two recognition helices directly contact each other to bind
cooperatively to the specific base sequence.
In the complex of c-Myb R2R3 with the Myb-binding DNA sequence (MBS-I),
the consensus A4, the counterpart guanine of C6, and the last G8
directly interact with Asn183 in R3, Lys182 in
R3, and Lys128 in R2, respectively (Fig. 1) (23). The
strong cooperativity between R2 and R3 originates from the putative
polar interactions between the side chains of Glu132 and
Asn179, and between those of Arg131 and
Asp178. However, it is not clear why the initial Py
corresponding to the third base position in the MBS-I fragment is
preferred by c-Myb R2R3, although this Py3 is less specific than the
other A4, A5, C6, and G8 sites in the consensus DNA sequence (17). In
our NMR structure shown in Fig. 1, Ser187 is the only residue positioned
to interact with the T3 base, as suggested in our previous paper (23).
The hydroxyl group in the Ser
side chain could form a hydrogen bond with the O4 oxygen of
the T3 base, either directly or through water molecules.
Thus far, the Myb-homologous DBD has been found in over 30 proteins
from many species. An alignment of the DBDs shows that the Ser at
position 187 is highly conserved in the animal sequences, whereas it is
variable in the plant sequences (23, 25).
Here, to investigate the role of Ser187 and the origin of
this pyrimidine preference at the third base position, both
Ser187 in the c-Myb R2R3 and the third T-A base pair in the
22-mer MBS-I fragment containing the Myb-binding site were substituted
by other amino acids and other base pairs, respectively. The
interactions between them were examined using a filter binding assay,
whose efficiency has already been shown (17, 26, 27). The recognition mechanism will be discussed.
Biomolecular Engineering Research Institute, Tsukuba Life Science Center,
Fig. 1. Specific interaction between the c-Myb R2R3 and the MBS-I fragment (23). The side chains of Lys128, Lys182, Asn183, and Ser187 are indicated by the thick black lines, and the consensus bases Py3(T3), A4, A5, C6, and G8 are indicated by the thick white lines. The black and white thin lines are the DNA double strands. The backbone of the protein is shown by a pipe model. The figure was drawn with the program InsightII (Molecular Simulations Inc., San Diego).
Plasmids and Site-directed Mutagenesis
A DNA fragment encompassing R2R3 (Leu90-Val193) in the DNA-binding domain of c-Myb was amplified by polymerase chain reaction, using pact-c-myb (28) as the template and two synthetic primers, to generate an NcoI site and a BamHI site at the 5′- and the 3′-end of the amplified fragment, respectively. After digestion with NcoI and BamHI, the DNA fragment was cloned into pAR2156NcoI (17) to yield the expression plasmid, pRP23. An additional Met-Glu- sequence was introduced at the N terminus of R2R3. Site-directed mutagenesis was performed by two-step polymerase chain reaction, as described by Higuchi (29). Here the name of each mutant protein is indicated as, for example, C130I/S187G for the simultaneous mutations that replace Cys130 with Ile and Ser187 with Gly.
Escherichia coli BL21(DE3) was transformed with the wild type and mutant plasmids (30). Freshly precultivated cells were inoculated into growth medium containing 100 µg/ml ampicillin and were grown at 37 °C. When the culture reached an A600 of about 0.4, isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 0.5 mM. The cells were cultured at 22 °C for another 12 h. The harvested cells were suspended in 50 mM Tris-HCl buffer (pH 7.8) containing 5 mM MgCl2, and were lysed by sonication at 4 °C. After the cell debris was removed by centrifugation, ammonium sulfate was added to the supernatant to 50% saturation. After an incubation at 4 °C for 1 h, the supernatant was dialyzed against 50 mM potassium phosphate buffer (pH 7.5) containing 200 mM NaCl, and was then applied to a phosphocellulose column (Whatman, P11). The purified fractions were pooled, and the buffer was exchanged to 100 mM potassium phosphate buffer (pH 7.5) containing 20 mM KCl. The protein concentrations were determined from UV absorption at 280 nm and were calculated by using the molar absorption coefficient of 3.7 × 10⁴ M⁻¹ cm⁻¹ (17).
Circular dichroism (CD) spectra were measured at 20 °C on a Jasco J-600 spectropolarimeter equipped with a water-circulating cell holder. The spectra were obtained in 100 mM potassium phosphate buffer (pH 7.5) containing 20 mM KCl, using a 0.2-cm optical path length cell. The protein concentration was 0.1 mg/ml. CD spectra between 200 and 250 nm were obtained using a scanning speed of 20 nm/min, a time response of 1 s, a bandwidth of 1 nm, and an average over 8 scans.
The 22-mer oligonucleotide CACCCTAACTGACACACATTCT, containing the Myb-binding site in the simian virus 40 enhancer sequence (MBS-I) (31), and the third base substituted variants were synthesized and purified by high performance liquid chromatography with a C18 reverse-phase column (Fig. 2). The purified DNA was suspended in STE (10 mM Tris-HCl (pH 8.0), 100 mM NaCl, 1 mM EDTA), and complementary strands were annealed and end-labeled with [γ-32P]ATP (Amersham) using T4 polynucleotide kinase (Toyobo, Osaka, Japan). The labeled DNAs were purified by passage through spin columns (Pharmacia Biotech Inc., HR-300). Here the name of each variant DNA is indicated as, for example, [C3]MBS-I, for the substitution of the T-A base pair at the third position by a C-G base pair.
Fig. 2. Sequences of the cognate 22-mer MBS-I and the third base pair substituted MBS-I fragments. The base numbering follows Ref. 23. The third base pair is denoted in bold letters, and the consensus base sequence is surrounded by a box.
All filter binding assays for the protein-DNA binding were carried out essentially as described (32-34). [32P]DNA and various amounts of the c-Myb R2R3 mutant proteins were incubated in 100 µl of binding buffer (100 mM potassium phosphate buffer (pH 7.5), 20 mM KCl, 0.1 mM EDTA, 500 µg/ml bovine serum albumin, and 5% (v/v) glycerol) on ice for 30 min. The final concentration of the [32P]DNA in binding buffer was 0.4 nM, which was always a lower concentration than the Kd value. The incubated samples were filtered through a nitrocellulose membrane (Schleicher & Schuell, BA-85, 0.45 µm) in approximately 10 s with suction. The filters were dried and counted by a liquid scintillation counter. The equilibrium dissociation constants Kd were obtained from the binding titration curve, based on the least square fitting to the normalized bound DNA (y) with the protein concentration (x) using the formula, y = x/(x + Kd).
Prior to the mutational analyses of Ser187, the Cys130 in R2, which is the only cysteine residue in the c-Myb R2R3 and is located at a position equivalent to an isoleucine in R3, was replaced with Ile, to facilitate the protein purification and the DNA-binding assay (35). It was reported that this mutation has little effect on DNA binding (36). The affinity of the C130I mutant was also measured in our own assay system, and it was shown to be almost equal to that of the wild type, and to maintain the pyrimidine preference at the third base position (Table I).
Table I. Dissociation constants for the cognate 22-mer MBS-I fragments and the third base-pair substituted variants with the Ser187-substituted mutants
Protein          Kd (nM)
                 T3      C3      A3      G3
Wild-type (a)    3.2     3.7     8.7     25
C130I            5.5     5.7     12      27
C130I/S187G      18      17      33      37
C130I/S187A      9.3     12      22      24
C130I/S187T      15      11      21      36
C130I/S187N      26      37      15      53
C130I/S187Q      36      39      54      39
C130I/S187V      13      8.9     37      34
C130I/S187L      130     73      103     72
C130I/S187K      61      22      103     36
C130I/S187R      43      22      103     21
C130I/S187D      103     103     103     103
(a) An additional Met-Ala- sequence was introduced at the N terminus of R2R3, which was used in the NMR experiment (23).
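The Kd values in Table I come from least-squares fitting of the normalized bound DNA to y = x/(x + Kd). The following is a minimal sketch of such a one-parameter fit, not the authors' procedure: the function names, the golden-section search, and the noise-free titration points are all hypothetical, and the input is assumed to be already normalized for filter retention.

```python
import math

def bound_fraction(x, kd):
    """Single-site binding isotherm y = x / (x + Kd)."""
    return x / (x + kd)

def fit_kd(protein_nM, fraction_bound, lo=1e-3, hi=1e5, tol=1e-10):
    """Least-squares estimate of Kd (nM) by golden-section search on log10(Kd)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((y - bound_fraction(x, kd)) ** 2
                   for x, y in zip(protein_nM, fraction_bound))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c) < sse(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 10.0 ** ((a + b) / 2.0)

# Hypothetical, noise-free titration generated with Kd = 5.5 nM
protein_nM = [0.5, 1, 2, 5, 10, 20, 50, 100]
fraction_bound = [bound_fraction(x, 5.5) for x in protein_nM]
kd_fit = fit_kd(protein_nM, fraction_bound)
print(f"fitted Kd = {kd_fit:.2f} nM")
```

Searching on a logarithmic scale keeps the fit well behaved over the several orders of magnitude spanned by the protein titration.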
A series of 10 amino acids, Gly, Ala, Thr, Asn, Gln, Val, Leu, Lys, Arg,
and Asp, were introduced into position 187 of the c-Myb R2R3, which is
a Ser residue in the wild type. The purity of each mutant protein was
about 95%, as monitored by SDS-polyacrylamide gel electrophoresis. All
of the mutant proteins have secondary structure contents similar to the
wild type, as confirmed by the CD spectra at the far UV region (Fig.
3). The perfect coincidence of all the spectra suggests
that the global tertiary structures of the mutant proteins were not
deformed.
The binding affinities of the mutants for the cognate 22-mer MBS-I fragment and the third base pair substituted variants were analyzed using the filter binding assay, and the results are summarized in Table I. All measurements were repeated at least twice, and typical experimental errors for the Kd values were less than 10%, although the retention efficiency was 20 ± 10%, depending on the experimental conditions. From the methylation interference experiments (17) and the NMR analyses (23), one DNA duplex is considered to bind per c-Myb molecule within the concentration range used in this assay. As already indicated in previous experiments (17, 26, 27), the filter binding assay is valid for this investigation.
The C130I/S187G mutant protein binds to the cognate MBS-I about one-third as strongly as the standard C130I mutant (Kd of 18 nM versus 5.5 nM). The relative binding free energy change for the replacement of Ser with Gly, calculated from the Kd values, is 0.65 kcal/mol. This should correspond to the free energy derived from the interaction between the Ser side chain and the T3 base. The Gly mutant preferentially binds to both the cognate [T3]MBS-I and the substituted [C3]MBS-I. That is, even when residue 187 has no side chain, the mutant protein prefers a pyrimidine at the third position, as do the wild type and the C130I mutant proteins.
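The free energy change quoted for the Ser-to-Gly replacement follows from the ratio of the two dissociation constants. As a rough check, assuming the temperature of the on-ice incubation (taken here as 273.15 K, an assumption not stated in the text) and a helper name chosen only for illustration:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_kcal(kd_mutant_nM, kd_reference_nM, temp_K=273.15):
    """Relative binding free energy RT ln(Kd_mutant / Kd_reference), in kcal/mol.

    temp_K defaults to 0 degrees C, an assumed value for incubation on ice.
    """
    return R_KCAL * temp_K * math.log(kd_mutant_nM / kd_reference_nM)

# Kd values for the cognate [T3]MBS-I from Table I
ddg_s187g = ddg_kcal(kd_mutant_nM=18.0, kd_reference_nM=5.5)
print(f"C130I/S187G vs C130I: {ddg_s187g:+.2f} kcal/mol")
```

Under this temperature assumption the result is about 0.64 kcal/mol, close to the 0.65 kcal/mol given in the text; a slightly higher assumed temperature gives a slightly larger value.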
The substitutions of Ser187 by Ala (C130I/S187A), Thr (C130I/S187T), or Val (C130I/S187V) result in slightly reduced binding affinities, although the sequence specificities are retained, as in the standard C130I. In contrast, the C130I/S187N mutant preferentially binds to the [A3]MBS-I. Its affinity for the A3 base is similar to that of the wild type, whereas its affinities for the other three bases (T, C, and G) are greatly reduced, to approximately one-half to one-sixth. The specific interaction between the Asn residue and the A3 base closely follows the intrinsic chemical features. Interestingly, for the substitution by Gln, which is one methylene group longer than Asn, the C130I/S187Q mutant loses the preference for the A3 base. Also, in the case of the C130I/S187L mutant, in which Leu is one methylene group longer than Val, the binding affinity is greatly reduced.
The mutant proteins C130I/S187K and C130I/S187R, in which basic amino acids were introduced at position 187, specifically prefer to bind to the [G3]MBS-I and [C3]MBS-I variants. In contrast, for the substitution of Ser187 by acidic Asp (C130I/S187D), the binding affinity is severely reduced and is no longer sequence-specific.
Thus far, many amino acid replacements in the c-Myb R2R3 have been created and assayed by specific DNA binding (37-39), and almost all of their effects have been explained by the specific polar contacts between the R2R3 and the DNA in the three-dimensional structure of the R2R3-DNA complex (23). The current mutational study clearly indicates that residue 187 in R2R3 is also able to interact with the T3 base, as estimated from the geometry of Ser187 in the NMR complex structure (23). This specific DNA-binding mode is very different from the telomeric DNA recognition by the yeast RAP1-DBD (40), whose amino acid sequence is weakly homologous to that of the c-Myb R2R3.
However, the substitution of Ser187 with Gly, Ala, or Val unexpectedly resulted in only about a 3-fold decrease in the binding affinity toward any base, which would be a consequence of a direct-readout mechanism, while the pyrimidine base preference at the third position in the MBS-I fragment was retained. Ser is thought to have weak specificity, because its side chain can act as either a hydrogen bond donor or an acceptor, and thus can bind to any base. Nevertheless, Ser187 of the R2R3 preferentially binds to the pyrimidine bases. If this interaction were attributable only to the direct-readout mechanism, then the substitution of Ser187 should have resulted in an over 100-fold reduction of the binding affinity and a loss of the sequence specificity, like the substitution of Lys128 by Ala (23), and those of Asn136 and Asn186 by Ala (38). These results suggest that the preference of the pyrimidine bases at the third position of MBS-I should occur primarily by an indirect-readout mechanism.
In our previous structural study, no distinct deformation of the global
DNA conformation was observed (23). However, when the local bending of
the DNA duplex was carefully analyzed in 25 NMR complex structures and
the refined average structure (Protein Data Bank codes 1MSF and 1MSE,
respectively), significantly positive roll angles were always observed
between the third pyrimidine and the fourth purine, as indicated by an
arrow in Fig. 4. Characteristic negative
slides (1.1 ± 0.3 Å) were also observed at the same pyrimidine-purine step, corresponding to positive rolling, while the
twist angles at this step were 34.1 ± 2.3°, nearly equal to the
twist angle in standard B-form DNA.
Similar significant, positive rolls at pyrimidine-purine steps are general phenomena (42), observed in many complex crystal structures of repressors and homeodomains with the helix-turn-helix motif, as summarized in Table II. In every case, as a part of the consensus base sequence, the base pair roll bends the DNA so that the recognition helix is wrapped by the DNA duplex in the major groove (3, 42). Consequently, a large contact area is created between the recognition helix and the DNA major groove, facilitating the preferable polar contacts between the protein side chains and the DNA phosphate backbone. The local roll in the MBS-I fragment may be associated with the small magnitude of observed bending in long DNA duplexes bound with the c-Myb R2R3 (52). This bending may be enhanced by other regions in the protein, like the transactivation domain.
Due to the intrinsic propeller-twist of the DNA base pairs, the pyrimidine-purine step has two stable conformations, with rolling of 0° and around 10° (53, 54), from the physical requirements of the base stacking (55). There is negligible additional free energy cost required for the 10° rolling at the pyrimidine-purine step, even for a free DNA duplex without a protein. This is the physical origin of the so-called "bendability" of kinked DNA duplexes, commonly observed in the minor groove readout mechanism (8, 9). At the other pyrimidine-pyrimidine, purine-purine, and purine-pyrimidine steps, no such tendency toward a strongly bistable step is observed (53). In fact, the binding free energy differences between the pyrimidine bases and the purine bases at the third base position for the current Gly187, Ala187, and Val187 mutants are 0.4 ± 0.1 kcal/mol, as calculated from the dissociation constants in Table I.
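The pyrimidine-versus-purine free energy difference can be estimated from the Table I dissociation constants. The sketch below is only one way to do this: it averages RT ln Kd over the two purines and over the two pyrimidines, and it assumes 273.15 K for the on-ice incubation. Both choices are assumptions, so the values need not match the quoted 0.4 ± 0.1 kcal/mol exactly.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMP_K = 273.15        # assumption: incubation on ice

# Kd values (nM) copied from Table I
KD_NM = {
    "C130I/S187G": {"T3": 18.0, "C3": 17.0, "A3": 33.0, "G3": 37.0},
    "C130I/S187A": {"T3": 9.3, "C3": 12.0, "A3": 22.0, "G3": 24.0},
    "C130I/S187V": {"T3": 13.0, "C3": 8.9, "A3": 37.0, "G3": 34.0},
}

def mean_ln_kd(kds, bases):
    """Average of ln(Kd) over the listed third-position bases."""
    return sum(math.log(kds[b]) for b in bases) / len(bases)

def purine_penalty(kds):
    """Mean RT ln Kd for purines minus that for pyrimidines, in kcal/mol."""
    diff = mean_ln_kd(kds, ("A3", "G3")) - mean_ln_kd(kds, ("T3", "C3"))
    return R_KCAL * TEMP_K * diff

penalty = {name: purine_penalty(kds) for name, kds in KD_NM.items()}
for name, value in penalty.items():
    print(f"{name}: {value:+.2f} kcal/mol")
```

Because only ratios of Kd values enter the difference, the concentration unit cancels and the nanomolar values can be used directly.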
Fig. 5 shows the relative binding free energy changes, $\Delta\Delta G$, with respect to the C130I/S187G mutant: $\Delta\Delta G = \Delta G_{bind}(\text{mutant against the third N base}) - \Delta G_{bind}(\text{C130I/S187G against the same third N base})$, where $\Delta G_{bind} = RT \ln K_d$. Here, the difference was calculated while keeping the same third position base pair. We can thereby separate the bendability effect from the total binding free energies between the c-Myb R2R3 mutants and the variety of DNA sequences, provided that the binding modes do not vary from that of the wild type. Positive and negative free energy changes correspond to a decrease and an increase of the binding affinity, respectively; with the DNA bending effect subtracted, they depend upon the intrinsic chemical features of the amino acids and the bases.
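The free energy changes relative to the Gly187 mutant can be tabulated from the Table I dissociation constants. The following sketch does so for the Ala and Asn mutants; the temperature (273.15 K, for incubation on ice) and all names in the code are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMP_K = 273.15        # assumption: incubation on ice

# Kd values (nM) copied from Table I
KD_NM = {
    "C130I/S187G": {"T3": 18.0, "C3": 17.0, "A3": 33.0, "G3": 37.0},
    "C130I/S187A": {"T3": 9.3, "C3": 12.0, "A3": 22.0, "G3": 24.0},
    "C130I/S187N": {"T3": 26.0, "C3": 37.0, "A3": 15.0, "G3": 53.0},
}
REFERENCE = "C130I/S187G"

def ddg_vs_reference(mutant):
    """RT ln(Kd_mutant / Kd_reference) for each third base, in kcal/mol.

    The same third-position base is used for mutant and reference, so the
    DNA-dependent (bendability) contribution cancels in the difference.
    """
    ref = KD_NM[REFERENCE]
    return {base: R_KCAL * TEMP_K * math.log(kd / ref[base])
            for base, kd in KD_NM[mutant].items()}

ddg_ala = ddg_vs_reference("C130I/S187A")
ddg_asn = ddg_vs_reference("C130I/S187N")
for label, table in (("S187A", ddg_ala), ("S187N", ddg_asn)):
    row = ", ".join(f"{base} {value:+.2f}" for base, value in table.items())
    print(f"{label}: {row}")
```

With these inputs the Ala mutant shows negative values (higher affinity than Gly187) for all four bases, while the Asn mutant is negative only for A3.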
For the Ala substitution, the binding affinity is increased as compared with Gly187, independent of the bases at the third position, probably due to the hydrophobic contacts. When the side chain volume is larger in the Val substitution, a similar binding affinity to the pyrimidines remains, but the affinity becomes neutral to the purines. Therefore, the volume of space created between residue 187 and the third base may allow at most the Val-pyrimidine pair, but the Val-purine pair would be slightly too large for the space. In fact, other amino acids, such as Leu and Gln, with larger side chain volumes than Val, significantly lack binding affinity, as indicated in Fig. 5. Moreover, the Val-, Leu-, and Gln-substituted mutants always have lower affinities for adenine than for guanine. This is also supported by the fact that the amino N6 of adenine occupies a larger volume than the oxygen O6 of guanine, which should be located at the position nearest to the side chain of residue 187.
From this consideration of the space volume around residue 187 and the third base, the native and the optimum interaction between Ser187 and T3 should be mediated by water molecules, as long as the binding mode is assumed to be the same in all of the mutant proteins and DNAs. In the Thr mutant, the disposition of the water molecules could be different from that in the wild type, thus yielding a slight decrease in the binding affinity. Since there is no possible conformation on the helix in which the methyl group of the Thr side chain would be able to access the methyl group in T3, as shown in a modeling study, a specific non-polar contact between the Thr mutant and T3 is not expected.
Following the conventional chemical rules for specific binding between amino acids and bases (1, 3, 12), the current Asn mutant specifically binds to the A3 base relative to the other bases, as indicated in Fig. 5. The Asn side chain size is less than that of Val, and there should be enough space for the Asn-adenine pair, resulting in the formation of direct hydrogen bonds with a free energy gain of about 0.5 kcal/mol. In addition, the Lys and Arg mutant proteins prefer to bind to the G3 base. From their intrinsic chemical nature, both basic amino acids can bind to the guanine base almost exclusively by electrostatic interaction. In contrast, these mutant proteins bind to the [A3]MBS-I and [T3]MBS-I bases with only weak affinity, probably because of the bulky side chains of the amino acids, like the Leu mutant. It is interesting that their long side chains seem to interact with the guanine base on the opposite side of C3. The acidic Asp substitution results in a severe reduction of its DNA binding, which is much lower than the Gly substitution, suggesting that the Asp side chain cannot interact with any base, including cytosine, in this geometry. Rather, the negative ionic charge may disturb other specific hydrogen bonds between the protein and the DNA.
The wild type protein and the C130I mutant with Ser187 bind to the cognate DNA most tightly among the mutant proteins, and their Kd values are in the nanomolar order. Generally, transcriptional regulator proteins bind to their target genes with greater affinity (57). These results are consistent with the conservation of Ser in position 187 of c-Myb among animal species (23). In contrast, among plant species, the amino acid in this position varies (25). This suggests that the recognition mode in the plant Myb homologues may be different from that of the c-Myb DBD from animal species. In fact, in the case of the yeast RAP1 domain 1, the corresponding Val409 residue does not interact with the DNA in the complex structure (40), although the free domain structure is similar to that of the c-Myb R3.
In conclusion, the current mutational analysis revealed that the pyrimidine preference of the native c-Myb DBD for the initial base of the consensus sequence originates principally in the intrinsic positive roll at the pyrimidine-purine step of the DNA duplex. For the purine-purine step, as much as 0.4 kcal/mol of additional free energy would be necessary, corresponding to the bendability. When these bending energies are separated, the conventional chemical rules between the amino acids and the bases are distinctively observed in the c-Myb R2R3 mutants.
It is still difficult to extract a definite "recognition code" from the variety of DNA information readout mechanisms. The situation becomes much more complicated when the DNA flexibility is considered. Only a screening technology, such as a phage display library (58-60), would be expected to reveal a novel, specific form of DNA recognition, instead of an artificial molecular design. However, based upon the complex structure and the mutational analysis, one may be able to dissect the sequence specific affinity into the DNA bendability and the specific interaction between the amino acids and the bases. Without this kind of precise analysis, we may never reach a complete understanding of the readout mechanism, nor produce any novel devices for molecular readout.