(Received for publication, December 4, 1996, and in revised form, January 28, 1997)
From the Waksman Institute and the Department of
Molecular Biology and Biochemistry, Rutgers University,
Piscataway, New Jersey 08855-0759
The MATα2 (α2) repressor interacts with the
Mcm1 protein to turn off a-cell type-specific genes in the yeast
Saccharomyces cerevisiae. We compared five natural
α2-Mcm1 sites with an α2-Mcm1 symmetric consensus site (AMSC) for
their relative strength of repression and found that the AMSC functions
slightly better than any of the natural sites. To further investigate
the DNA binding specificity of α2 in complex with Mcm1, symmetric
substitutions at each position in the α2 half-sites of AMSC were
constructed and assayed for their effect on repression in
vivo and DNA binding affinity in vitro. As expected,
substitutions at positions in which there are base-specific contacts
decrease the level of repression. Interestingly, substitutions at other
positions, in which there are no apparent base-specific contacts made
by the protein in the α2-DNA co-crystal structure, also significantly
decrease repression. As an alternative method of examining the DNA
binding specificity of α2, we performed in vitro α2
binding site selection experiments in the presence and absence of Mcm1.
In the presence of Mcm1, the consensus sequences obtained were extended
and more closely related to the natural α2 sites than the consensus
sequence obtained in the absence of Mcm1. These results demonstrate
that in the presence of Mcm1 the sequence specificity of α2 is
extended to these positions.
Homeodomain proteins are a family of transcription factors
involved in many developmental and cellular processes and have been
found in almost every eukaryotic organism (1-4). The natural target
sites for many homeodomain proteins are unknown; therefore, their
DNA-binding sites have been defined through site selection experiments
in vitro (5-9). Although these studies provide important information on homeodomain-DNA recognition in vitro, in some
cases it appears that the homeodomain proteins do not function well at
these sites in vivo (10-14). One possible explanation for
this discrepancy is that in vivo the DNA binding specificity
and affinity of some homeodomain proteins may be influenced by
interactions with cofactors (13, 15-19). Since many of the studies
which examine homeodomain binding sites have been done in the absence
of cofactors, this may explain why in some cases sites selected
in vitro may not be functional sites for the homeodomain
protein complexes in vivo. To address this issue and to
understand how homeodomain proteins recognize specific sites, we have
investigated the DNA binding specificity of α2, a yeast homeodomain
protein for which the natural target sites and cofactors are well
known.
The α2 protein is involved in the regulatory system that specifies
cell mating type in Saccharomyces cerevisiae (20-24). In diploid cells, the
α2 and a1 proteins form a heterodimer to repress expression of haploid-specific genes (25, 26). In haploid
α cells and a/α diploid cells, α2 interacts with a general transcription regulatory
factor, Mcm1, to repress expression of a-specific genes (asg) (27-29). DNase I
protection and deletion experiments of the promoter region of the
STE6 gene revealed an operator site required for
α2-mediated repression (23, 30). Sequences similar to this site were
also found in the promoters of four other asg,
BAR1, STE2, MFA1, and MFA2
(20, 31-33). On its own, α2 binds in vitro to the
STE6 operator site with moderate affinity (34). In the presence of Mcm1, the apparent
α2 DNA binding affinity increases at
least 100-fold, indicating that there are strong cooperative interactions between
α2 and Mcm1 (28, 29). α2 binds DNA as a dimer
with each monomer flanking an Mcm1 dimer, which binds to the center of
the α2-Mcm1 site. Mutations in either the α2 or Mcm1 binding site
of the STE6 operator dramatically reduce the level of
repression in vivo, demonstrating that binding by both
proteins is required for repression (25, 35, 36).
The α2 protein (25 kDa, 210 amino acids) contains two structural
domains (34). The N-terminal domain (residues 1-102) is required for
dimerization and repression (37-39). The C-terminal domain (residues
132-188) contains a homeodomain, which binds DNA on its own in
vitro but is not sufficient for repression (34, 37). Flexible
regions adjacent to the N terminus and C terminus of the α2
homeodomain are required for interaction with the Mcm1 and a1
cofactors, respectively (26, 40-44). The three-dimensional structure
of the α2 homeodomain has been determined by NMR and x-ray
crystallography studies (42, 45, 46). Although there is only 27%
sequence identity between the α2 and Drosophila engrailed homeodomains, their overall structures are very similar (46, 47).
Moreover, these proteins bind DNA in a similar manner, and most of the
conserved residues in the third helices of the homeodomains make
identical contacts with DNA (46, 47). The α2 protein therefore
provides a good model for studies of homeodomain protein-DNA
interactions.
In this paper, we have examined in detail the sequence requirements for
the α2 homeodomain protein in repression of asg in vivo
and in vitro. Our results indicate that, in complex with Mcm1, the sequence specificity of DNA binding by
α2 is apparently more extended than that of α2 on its own. These results suggest one explanation for why DNA recognition sites determined in vitro for proteins in
the absence of their cofactors may not function as optimal sites
in vivo.
Plasmids, which contain α2-Mcm1
binding sites (operators) found in the promoter regions of the
asg (STE6, STE2, BAR1,
MFA1, and MFA2 (21-24)), were constructed by
inserting double-stranded oligonucleotides containing the operators
with TCGA overhangs into the XhoI site between the TATA and
UAS elements in the CYC1-lacZ promoter of pTBA23 (44). The
α2-Mcm1 symmetric consensus site (AMSC) and mutant derivatives are
symmetric and therefore can self-anneal, leaving TCGA overhangs for
cloning into the reporter vector. The mutant sites are named by
describing the original nucleotide, position, and substituted
nucleotide in the top strand. For instance, T3A/A28T is a symmetric
mutant in which the T at position 3 is mutated to A and the A at
position 28 is mutated to T.
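
The naming convention can be illustrated with a short sketch. It assumes a 30-bp numbered site, which is inferred from the mutant names used in this paper (position 3 pairs with position 28) rather than stated explicitly, and the function name is purely illustrative.

```python
# Watson-Crick complements used to derive the partner-strand change.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def symmetric_mutant_name(original, position, substituted, site_length=30):
    """Build the name of a symmetric double mutant from one top-strand change.

    site_length=30 is an assumption inferred from names such as T3A/A28T;
    the dyad partner of position p is then site_length + 1 - p.
    """
    partner = site_length + 1 - position
    first = f"{original}{position}{substituted}"
    second = f"{COMPLEMENT[original]}{partner}{COMPLEMENT[substituted]}"
    return f"{first}/{second}"


print(symmetric_mutant_name("T", 3, "A"))
print(symmetric_mutant_name("G", 4, "T"))
```

Under this assumption, the partner position and the complementary base change follow directly from the substitution made in the first half-site.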
Plasmids containing different
asg operators or AMSC sites were transformed into the yeast
strain 246.1.1 (MATα trp1 leu2 ura3 his4) for
β-galactosidase assays (48). The assays were performed as described
(27). The level of repression was determined by comparing
lacZ expression from a plasmid containing an α2-Mcm1 site
with that from a plasmid lacking a site, pTBA23. Assays were performed with three
independent transformants of each construct, and the β-galactosidase
units were averaged. Standard deviations were within 20%.
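
As a minimal sketch of this calculation, fold repression can be taken as the ratio of the mean reporter units without an operator to the mean units with one. The exact formula is an assumption here, and the unit values below are hypothetical numbers chosen for illustration, not data from this study.

```python
from statistics import mean, stdev


def fold_repression(units_no_site, units_with_site):
    """Ratio of mean reporter units without an operator to mean units with one."""
    return mean(units_no_site) / mean(units_with_site)


def relative_sd(units):
    """Sample standard deviation expressed as a fraction of the mean."""
    return stdev(units) / mean(units)


# Hypothetical units from three independent transformants of each construct.
no_site = [100.0, 110.0, 90.0]    # reporter lacking a site (e.g. pTBA23)
with_site = [2.0, 2.5, 1.5]       # reporter carrying an operator

print(fold_repression(no_site, with_site))
print(relative_sd(no_site) <= 0.20)
```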
The relative
α2-Mcm1 DNA binding affinities for the AMSC and mutant operators were
determined by EMSA as described previously (43). Labeled DNA fragments
containing α2-Mcm1 sites were incubated with purified α2-(92-210)
and Mcm1-(1-96) proteins at the concentrations given in the figure
legends at room temperature for 3 h. The α2 and Mcm1 proteins
used in these experiments were purified as described previously and
were greater than 90% pure as judged on Coomassie-stained SDS-polyacrylamide gels (40). All EMSAs were electrophoresed through
a 6% native polyacrylamide gel in 0.5× TBE at 200 V for 2 h.
Gels were then dried and exposed to phosphor screens, and images were
scanned on a Molecular Dynamics PhosphorImager model 425.
Labeled DNA fragments used in EMSAs were synthesized by PCR
amplification. Oligonucleotides W340 (5′-CACGCCTGGCGGATCTGC) and W341
(5′-GCCCACGCGTAGGCAATC), which anneal to the sequences on either side
of the α2-Mcm1 site in the CYC1-lacZ promoter, were used
as primers in PCR amplifications. W340 was kinased with
[γ-32P]ATP and purified with a QIAquick spin column
(Qiagen). PCR amplification was performed as follows: 94 °C for 5 min; 35 cycles of 94 °C for 30 s, 48 °C for 1 min, and 72 °C for 1 min; and 72 °C for 10 min. PCR products were purified on a
10% native polyacrylamide gel. The relative α2 DNA binding affinity
for each site was calculated by comparing the percentage of fragment
bound at different α2 protein concentrations. The cooperativity
between α2 and Mcm1 was determined by comparing the α2 DNA binding
affinity in the presence of Mcm1 with that in the absence of Mcm1.
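
One way such a comparison could be made is sketched below. It assumes that the protein concentration giving 50% bound fragment is estimated by log-linear interpolation between titration points and that relative affinity is the ratio of these concentrations. The quantitation procedure is not specified beyond the description above, so both the method and the numbers are illustrative assumptions.

```python
import math


def half_bound_concentration(concentrations, fractions_bound):
    """Interpolate, on a log scale, the concentration giving 50% bound DNA.

    Assumes concentrations are increasing and that the fraction bound
    rises through 0.5 somewhere within the titration.
    """
    points = list(zip(concentrations, fractions_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi and f_hi > f_lo:
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_c = math.log(c_lo) + t * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_c)
    raise ValueError("fraction bound does not cross 50% in the titration")


def fold_change(reference_half, other_half):
    """Fold more protein needed for half-maximal binding than the reference."""
    return other_half / reference_half


# Hypothetical titrations (arbitrary concentration units).
concs = [1.0, 2.0, 4.0, 8.0]
reference = half_bound_concentration(concs, [0.2, 0.5, 0.8, 0.9])
mutant = half_bound_concentration(concs, [0.05, 0.1, 0.2, 0.5])
print(fold_change(reference, mutant))
```

The same ratio, applied to titrations performed without and with Mcm1, would give a rough estimate of cooperativity under these assumptions.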
The binding site selection
assays were performed with oligonucleotide W896
(5′-gcccacgcgtaggcaatcgaattcN8taaagtcgacgcagatccgccaggcgtg), where N signifies a random base and the underlined nucleotides represent the core Mcm1 binding sequence. Nucleotides flanking the Mcm1
core binding site were designed to be imperfectly symmetric to ensure
that the sites selected in the assay were not from contamination of a
wild type sequence ( ... CATGTAATtata ... ) that was used for
comparison of the DNA binding affinity. W896 was made double-stranded
by filling in with Sequenase using the end-labeled W340 as a primer
that anneals to the 3′ part of W896. EMSAs of the randomized
double-stranded oligonucleotides were performed with α2-(92-210) as
described above. DNA in the shifted band that was detectable at the
lowest protein concentration was extracted from the dried gel slice
(49). The isolated DNA was amplified by PCR using primer W341 and
end-labeled primer W340 and then purified for the next round of
selection. After six rounds, the purified DNA fragments were cloned
into a T-overhang vector (50). The inserted selected sites were
sequenced, and their DNA binding affinity was determined by EMSA and
quantitated on a PhosphorImager. The α2 binding site selection in the
presence of Mcm1 was performed by utilizing the same initial randomized
oligonucleotide pool. In each round of selection, the EMSA reactions contained a titration of α2
starting at 1 µM together with 0.5 µM Mcm1. After six rounds of selection, the
selected sites were cloned and sequenced, and DNA binding affinity was
measured as described above.
The α2-Mcm1 binding sites
have been identified in the promoter regions of five asg
(STE6, BAR1, STE2, MFA1,
and MFA2) (21-24). Although the natural sites are highly
conserved, there are variations at some positions that may result in
different levels of α2-Mcm1-mediated repression. To verify that these
α2-Mcm1 sites are all functional repressor sites and to measure their
relative strength for repression in the same promoter context,
oligonucleotides containing the sites were inserted into the promoter
region of a CYC1-lacZ reporter plasmid, and the level of
expression from the promoter was assayed by measuring β-galactosidase
activity (Fig. 1). The results indicate that although
there are small variations in the level of repression between the
different sites, all of the sites confer greater than 30-fold
repression of lacZ expression.
The STE6 operator was shown to function as a repressor site
in either orientation. There was, however, a small difference in the
level of repression between the two directions (23). Since all five
natural α2-Mcm1 sites are only partially symmetric dyads, we were
interested in whether the orientation of the sites with respect to the
start site of transcription affects the level of repression. To address
this question, we compared the level of repression conferred by these
sites in both orientations (Fig. 1). Although all five sites function
in either orientation, there were slight differences in the levels of
repression. These small differences may be due to the asymmetry of the
sites or to the flanking sequences.
Although there is some
asymmetry in the natural α2-Mcm1 sites, a consensus sequence of these
sites is highly symmetric. To examine whether a symmetric sequence may
function as a better repressor site, an α2-Mcm1 symmetric consensus
site, which we call AMSC, was assayed for the repression activity it
conferred in the context of the CYC1-lacZ reporter
construct. We found that AMSC is a slightly better repressor site
(1.3- to 4-fold) than any of the natural α2-Mcm1 sites (Fig. 1).
To determine whether the increase in repression conferred by the AMSC
site is due to stronger DNA binding affinity of the α2-Mcm1 complex
for the site, we compared the DNA binding affinity of α2-Mcm1 for AMSC
and STE6 sites by EMSA. Our results indicated that α2-Mcm1
binds to the AMSC site slightly better (1.3-fold) than to the
STE6 operator (Fig. 3). This result is consistent with our
in vivo repression data and shows that the AMSC site functions as a better site for α2-Mcm1 binding and repression than
the known natural asg operators.
Saturation Mutagenesis of the
The co-crystal structure of the α2 homeodomain bound to
DNA shows that the protein makes base-specific contacts with positions T3, G4 and T5 in the major groove
and with positions T8 (or A8) and
T9 in the minor groove (Fig. 2A)
(46). These positions are conserved in the α2 binding sites in each
of the known natural asg. However, a comparison of the
natural α2-Mcm1 sites also indicates that other positions such as
positions 1, 2, 6, and 7, in which there are no apparent base-specific
contacts in the co-crystal structure, are also highly conserved. This
strong conservation among the natural sites suggests that there is a
sequence specificity at these positions. To investigate whether there are
sequence requirements at these positions for α2 DNA binding and
repression, a series of AMSC operators with symmetric base pair
substitutions in both α2 half-sites was cloned into the
CYC1-lacZ reporter promoter, and their effects on repression
in vivo were measured using β-galactosidase assays (Fig.
2B). As expected, substitutions at positions T3,
G4 and T5 result in a large decrease in
repression of lacZ expression (approximately 100-fold).
Substitutions at positions in which there are base-specific contacts in
the minor groove, T8 and T9, also significantly
reduce the level of repression. Surprisingly, substitutions at
positions in which there are sugar-phosphate backbone contacts, but no
base-specific contacts in the co-crystal structure, also dramatically
reduce repression (Fig. 2B). For example, some substitutions
at positions C1, A2, and A7, and
most notably A6, reduce the level of repression
10- to 50-fold. These results show that there is additional sequence
specificity at these positions for α2-Mcm1-mediated repression
in vivo.
To correlate the repression data with DNA binding activity, we assayed
α2-Mcm1 DNA binding affinity for the mutant operator sites by EMSA.
Substitutions at positions in which there are base-specific contacts,
such as T3A/A28T, G4T/C27A, and T5G/A26C, cause a large reduction in
α2-Mcm1 DNA binding affinity (10-, 26-, and 40-fold, respectively)
(Fig. 3A). Substitutions at positions in
which there are no base-specific contacts in the co-crystal structure,
such as C1G/G30C, A2C/T29G, A6C/T25G, and A7C/T24G, also affect the
α2-Mcm1 DNA binding affinity but to a smaller degree (2-, 3-, 8-, and
3-fold, respectively) than substitutions at positions T3, G4, and T5 (Fig.
3B). Although there are some differences between the
absolute fold decreases of the in vivo and in
vitro assays, all of the substitutions have similar effects. For
example, those substitutions with large decreases in repression
in vivo have large decreases in DNA binding affinity
in vitro, and substitutions with small effects on the level
of repression in vivo have small decreases in DNA binding
affinity. The in vitro results therefore support the
in vivo observations and show that many of these
substitutions, even at those positions in which there are no apparent
contacts with bases, affect α2 DNA binding affinity and
repression.
We have found that substitutions at many positions in
the AMSC site affect repression in vivo. These results are
different from those of a previous study, which showed that changes at
positions 1, 2, 3, 6, 7, and 8 in the α2 homeodomain recognition
sequence of the natural STE6 operator have only small
effects on repression (36). One explanation for this discrepancy is
that we have constructed operators with symmetric substitutions in both
α2 half-sites of the AMSC instead of a single point mutation in one
α2 half-site of the natural STE6 operator. To investigate
this difference, we compared the effect on repression of single point
substitutions in one α2 half-site in AMSC with symmetric mutations in
both α2 half-sites (Fig. 4). An asymmetric
substitution at a position in which there is a base-specific contact,
such as G4A, leads to slightly higher repression than the symmetric
substitution G4A/C27T. Asymmetric substitutions at other positions,
such as A7, T8, and T9, have less
effect on repression than symmetric substitutions in both half-sites.
These results agree with the previous study and show that although the
single mutations only have a small effect on repression, there is a
sequence preference at these positions in the context of symmetric
substitutions.
The
natural DNA-binding sites of many homeodomain proteins are unknown. One
commonly used method to determine their target sites is through
in vitro DNA-binding site selection experiments utilizing
randomized oligonucleotides. One important question is how well sites
selected through in vitro selection experiments correlate
with the natural in vivo target sites. The mutagenesis experiments described above have precisely defined the sequence requirements for α2 binding. We therefore decided to use the site selection technique to determine whether it would identify a similar site. An oligonucleotide pool that contains an Mcm1 binding site adjacent to a randomized region was used in the site selection assay.
After six rounds of selection, the sites were cloned and sequenced
(Table I). An alignment of the sequences obtained in the
selection yielded a consensus site of TGT, which corresponds exactly to the positions contacted by α2 in the major groove in
the co-crystal structure (46). We assayed the α2 DNA binding
affinity for each site and found that sites with high DNA binding
affinity are closely related to the natural α2 recognition sequences
(Fig. 1). When only those sites with moderate affinity (++) or better
were considered, we obtained a consensus sequence of TGTAA, which
closely matches the natural α2 sites. These results indicate that
this in vitro technique can identify sites that correspond
well with the natural ones.
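
Deriving a consensus from aligned selected sites can be sketched as a per-column majority vote. The threshold, the function name, and the four example sequences below are illustrative assumptions and are not the sequences listed in Table I.

```python
from collections import Counter


def consensus(aligned_sites, min_fraction=0.5):
    """Return the most common base per column, or N below the threshold."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    result = []
    for column in zip(*aligned_sites):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_fraction else "N")
    return "".join(result)


# Hypothetical aligned sites sharing a TGT core.
sites = ["ATGTAA", "CTGTAT", "GTGTCA", "TTGTAA"]
print(consensus(sites, min_fraction=0.75))
```

Restricting the input to higher-affinity sites before calling such a function corresponds to the step above in which only sites of moderate affinity or better were considered.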
Although α2 binds DNA on its own in vitro, it must
interact with Mcm1 to repress asg in vivo. Previous studies
have shown that the cooperative DNA binding by α2 and Mcm1 requires a
specific spacing and orientation between their respective DNA-binding
sites (34, 35, 44). We therefore analyzed whether the sites that were
obtained from the in vitro selection experiments could be bound cooperatively by α2 and Mcm1 (Table I). Our results show
that only sites that have the proper spacing, orientation, and sequence
between the α2 and Mcm1 recognition sites, such as sequences 1, 2, and 9, are bound cooperatively by α2 and Mcm1. On the other hand,
those sites that do not meet these requirements are not bound
cooperatively by α2 and Mcm1, although on their own both proteins
bind to these sites with relatively high affinity. For example,
sequences 3, 5, and 7 are not bound cooperatively by α2 and Mcm1
because they do not have the proper orientation between the α2 and Mcm1
binding sites. Furthermore, sequences 6 and 8, which have the proper
orientation and spacing between the α2 and Mcm1 binding sites, are
not bound cooperatively by α2 and Mcm1 because a G or C is present at
positions that are important for the α2-Mcm1 complex binding in
vivo (Fig. 2B). These results indicate that the site
selection assay is also able to screen for additional sequence
requirements such as spacing and orientation between sites when a
protein binds DNA in a complex with other cofactors.
The in vitro selection experiment, using α2 alone, defined
a consensus sequence TGT that corresponds well to the α2 recognition core sequence in natural sites. This consensus site, however, does not
extend to some positions that are conserved in the natural α2-Mcm1
sites and that we have shown are important for repression in
vivo. In the presence of Mcm1, it appears that there are
base-specific preferences at these positions. We therefore performed
the α2 site selection experiment a second time with the same pool of random oligonucleotides in the presence of Mcm1. After six rounds of
selection, we obtained 50 sequences, which we aligned in two groups
according to the different spacing (5 or 6 bp) between the α2
recognition core site, TGT, and the core Mcm1 binding site, CCTAATTAGG
(Table II). In each group, the sequences are listed based on the observed binding affinity of the α2-Mcm1 complex. Most
of the selected α2 sites have the appropriate orientation and spacing
between the α2 and Mcm1 sites. In this selection, one-third of the
sites selected were sequence 1, which is not only the highest affinity
site selected from the pool, but also exactly matches the α2
half-site that was used in the AMSC site (Figs. 1 and 2). Sequences
that were selected from the pool (sequences 1, 4, 17, and 19) are
identical to some of the natural α2 half-sites shown in Fig. 1. We
obtained a consensus sequence of ATGTAAT for sites with 5-bp spacing
between the α2 TGT core site and the Mcm1 site. This sequence
perfectly matches the α2 half-site sequence in the AMSC site that we
derived from an alignment of the natural asg operators. A
slightly different consensus site, GTGTAADT (D represents A, G, or T),
was obtained from selected sites with 6-bp spacing between the α2 TGT
core site and the Mcm1 site. We have assayed one derivative of this
consensus (CGTGTAAAT) for its ability to repress transcription of the
CYC1-lacZ promoter in vivo and have found that
the operator containing this sequence in each α2 half-site strongly
represses the lacZ expression (45-fold repression).
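
The degenerate symbol D in the 6-bp spacing consensus can be handled by expanding it into a character class. The sketch below is illustrative only: the helper names are hypothetical, and only the ambiguity codes needed here are included.

```python
import re

# Subset of nucleotide ambiguity codes; D stands for A, G, or T.
AMBIGUITY = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus containing ambiguity codes into a compiled regex."""
    return re.compile("".join(AMBIGUITY[base] for base in consensus))


def contains_site(sequence, consensus):
    """True if the sequence contains a match to the consensus."""
    return consensus_to_regex(consensus).search(sequence) is not None


# The assayed derivative CGTGTAAAT carries an A at the degenerate position.
print(contains_site("CGTGTAAAT", "GTGTAADT"))
# A C at the degenerate position is excluded by D.
print(contains_site("CGTGTAACT", "GTGTAADT"))
```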
One notable difference between the selected sites with 5-bp spacing or
6-bp spacing between the α2 TGT sequence and the Mcm1 site is the
base preferences at position 2. Selected sites with 5-bp spacing
predominantly have an A at this position (23 of 34), while only 6 of 34 have a G at this position. On the other hand, selected sites with 6-bp
spacing predominantly contain a G (13 of 16) at this position. These
results suggest that there may be a difference in the base pair
specificity at this position that depends on the spacing between the
α2 and Mcm1 sites. The data in Fig. 2 show that in operators with
5-bp spacing an A at position 2 represses lacZ expression
2-fold better than a site with a G at this position. However, in sites
with 6-bp spacing, we find that the site with a G at position 2 represses lacZ expression about as well (45-fold) as the
site with an A at this position (51-fold). These results indicate that
the sequence-specific requirements at position 2 are less stringent for
sites with 6-bp spacing between the α2 and Mcm1 recognition sites
than for sites with 5-bp spacing.
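
The position 2 base frequencies quoted above can be expressed as fractions for easier comparison. The counts are those given in the text, while the data layout and function name are illustrative.

```python
# Base counts at position 2 among the selected sites, as reported in the text.
POSITION_2_COUNTS = {
    "5-bp spacing": {"A": 23, "G": 6, "total": 34},
    "6-bp spacing": {"G": 13, "total": 16},
}


def base_fraction(group, base):
    """Fraction of selected sites in a spacing group with the given base."""
    entry = POSITION_2_COUNTS[group]
    return entry[base] / entry["total"]


for group, base in [("5-bp spacing", "A"), ("5-bp spacing", "G"), ("6-bp spacing", "G")]:
    print(group, base, round(base_fraction(group, base), 2))
```

The two group totals sum to the 50 sequences obtained in the selection.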
The five natural α2-Mcm1 binding sites that have been identified
in the promoter regions of asg are highly conserved. We have examined the relative strength of these sites by comparing the level of
repression mediated by these sites in the same promoter context. All of
these sites confer strong repression of lacZ expression from
a heterologous CYC1-lacZ promoter, although there are some differences in the relative strength of repression, with
MFA1 > BAR1 > STE6 > MFA2 > STE2 (Fig. 1). The strength of
repression mediated by these sites correlates with the degree of
similarity to a consensus α2-Mcm1 binding site; i.e. the
higher the sequence similarity to the consensus site, the stronger the
repression. To further test this correlation, a symmetric consensus
site (AMSC) was assayed in the same context and was found to confer
better repression than any of the natural sites. The AMSC site was also bound cooperatively by the α2 and Mcm1 proteins with slightly higher
affinity than the STE6 operator. The level of repression is
therefore, at least in part, a function of the strength of α2-Mcm1
binding, and the higher the binding affinity, the greater the
repression. Although the natural α2-Mcm1 operators are not optimal
binding and repressor sites, it may not be biologically necessary for
these sites to function as well as the AMSC site. For example, the
transcriptional activator elements in the asg promoters may
be significantly weaker than the CYC1 UAS elements of the
reporter promoter used in this study. These weaker promoters would not
require a repressor site as strong as the AMSC site to completely turn
off expression of the genes. Alternatively, the weaker natural
repressor sites may enable the cells to respond faster to switches in
mating type and hence the cells would quickly derepress asg
and be able to mate with MATα cells.
The α2 half-sites in AMSC are identical to one of the α2 half-sites
used in determining the co-crystal structure (46). In the co-crystal
complex, residues Ser-50, Asn-51, and Arg-54 in the α2 homeodomain
make base-specific contacts in the major groove with T3,
G4, and T5, and Arg-7 in the N-terminal arm of
the homeodomain makes base-specific contacts in the minor groove with
T8 and T9. As expected, our mutagenesis results
show that mutations in T3, G4, T5,
T8 and T9 dramatically reduce the level of
α2-Mcm1-mediated repression in vivo. However, we also
observed that substitutions at other positions, such as C1,
A2, A6 and A7, in which there are
no base-specific contacts in the α2 co-crystal structure, also
significantly affect repression (Fig. 2B). These results
suggest that specific base pairs are also required at these
positions.
Recently, a ternary crystal structure of the a1 and α2 proteins bound
to DNA has been solved (42). This structure was determined at a higher
resolution than the previous α2 co-crystal structure, and portions of
the α2 protein, most notably the N-terminal arm and the C-terminal
tail extending from the homeodomain, are more ordered in the ternary
complex. The α2 half-site in the ternary complex is identical to the
α2 half-sites in the AMSC consensus sequence. In the ternary
structure, besides base-specific contacts at positions 3, 4, 5, 8, and
9 that are present in the co-crystal structure, there are also
additional base-specific contacts at positions 2, 4, 5, and 6. It is
possible that in complex with Mcm1, α2 may make similar contacts to
these positions, which would explain why substitutions of these base
pairs have an effect on α2-Mcm1-mediated repression. For example,
although there is no apparent base-specific contact to position 2 in
the co-crystal structure, it has been shown in the structure of the
a1-α2-DNA ternary complex that N-7 of A2 is contacted via
a water-mediated hydrogen bond by Ser-50 of the α2 homeodomain (42).
This position is strongly conserved among the α2-Mcm1 binding sites
found upstream of asg, and of the 10 natural α2
half-sites, 8 contain an A and 2 contain a G at this position (Fig. 1).
The observation that G, unlike C and T, functions almost as well as an
A at this position is consistent with a model in which, in complex with
Mcm1, α2 makes a base-specific contact to the N-7 group similar to that
observed in the a1-α2-DNA ternary complex.
We have found that substitutions to T or A at positions A7,
T8, and T9 have less effect on repression than
substitutions to G or C (Figs. 2B and 4). It has been
observed that A:T and T:A base pairs have a similar distribution of
hydrogen bond donors and acceptors in the minor groove (51). Since in
both crystal structures positions 8 and 9 are contacted in the minor
groove by Arg-7, it is possible that this extended side chain is able
to adjust to accommodate the slight alteration of the positions of the
hydrogen bond acceptors when an A:T base pair is substituted for T:A at
these positions. This model is supported by the observation that α2
binds on its own with almost equal affinity to sites with either T:A or
A:T at these positions. However, substitutions from T to A at position 8 or 9 cause more than a 5-fold reduction in the level of
α2-Mcm1-mediated repression (Fig. 2B). A portion of the
effects of these substitutions may be due to the slight decrease in
α2 DNA binding affinity. However, substitutions at these positions
also affect Mcm1 binding to the site (52), and this decrease in
affinity may account for most of the decrease that we observed in
α2-Mcm1-mediated repression. Although no contacts were observed at
position 7 in either structure, there is also an A or T preference at
this base pair. It is possible that there may be base-specific contacts at this position in the
α2-Mcm1-DNA complex. Alternatively, G or C
substitutions at this position may interfere with the minor groove
contacts at adjacent positions.
In the a1-α2-DNA ternary complex, there are only base-specific
contacts in the minor groove at position 6; therefore, we might expect
the A to T substitution at this position would not greatly affect
repression and DNA binding affinity. However, we observed that the T
substitution at this position reduces the level of repression over
30-fold. If the Arg-4 side chain makes similar contacts in the
α2-Mcm1-DNA complex as observed in the a1-α2-DNA ternary complex,
the position of the side chain may be fixed by its contacts with base
pairs 4 and 5 (42). Therefore, unlike Arg-7, the Arg-4 side chain may
not be able to alter its position to accommodate the small changes for
making a hydrogen bond with the T substitution at position 6. In
addition, the Gly-5 peptide backbone amide makes a hydrogen bond
contact to the O-2 of thymine on the bottom strand at position 6. To
maintain the hydrogen bond, the position of the peptide backbone would
have to be slightly altered in the A6 to T substitution.
The repositioning of the backbone may in turn weaken or destroy
multiple base-specific or sugar-phosphate backbone contacts that are
made by other side chains in the N-terminal arm and therefore
significantly reduce the level of repression. Alternatively, the
substitution may sterically interfere with the precise position of the
arm for making contacts with DNA. It has been shown that a small
hydrophobic region preceding the N-terminal arm of the α2
homeodomain is required for cooperative DNA binding and protein-protein
interactions with Mcm1 (44). It is possible that the interactions
between the proteins fix the position of residues in the N-terminal arm
so that additional contacts could be made in the minor groove that are
not observed in either crystal structure. If these additional contacts
are made, they may partially contribute to the increase in
α2 DNA binding specificity that is observed in the presence of Mcm1. In
summary, the high degree of sequence conservation at positions 1, 2, and 6 among the natural sites along with our mutational analysis at
these positions shows that they play an important role in α2 DNA
recognition. Our results are consistent with a model in which, in
combination with Mcm1, α2 makes contacts with the DNA that are
similar to contacts observed in the a1-α2-DNA ternary
complex.
We have analyzed the DNA binding specificity of 2 in complex with
Mcm1 by determining the effects of mutations within the AMSC site on
repression. As an alternative approach to investigate the
2 DNA
binding specificity, we have performed in vitro site selection experiments for the
2 homeodomain in the presence and absence of Mcm1. In the absence of Mcm1, we obtained an
2 binding consensus site of TGT, corresponding to positions in which there are
base-specific contacts by the
2 homeodomain in the major groove
(46). In the presence of Mcm1, we obtained two consensus sequences,
ATGTAAT and GTGTAADT (D represents A, G, or T), depending on the spacing
between the α2 and Mcm1 sites. These consensus sequences show
extended sequence specificity compared with the consensus sequence
obtained from the site selection of α2 on its own. Furthermore, most
sequences obtained from the second selection have the same orientation
and spacing for the α2 and Mcm1 binding sites as are found in the
natural α2-Mcm1 operators (Fig. 1, Table II). Among these selected
sequences, four different sequences are identical to the natural α2
half-sites shown in Fig. 1. Our results demonstrate that the in
vitro DNA site selection technique can be used not only to
identify binding sites of individual proteins but also to
screen for optimal binding sites for a protein complex.
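To make the degenerate notation concrete, the following minimal sketch shows how the two consensus sequences reported above could be matched against candidate sites, with D expanded to A, G, or T as defined in the text. The function names and the test sequences are hypothetical illustrations and are not taken from the selected pool.

```python
import re

# Subset of IUPAC nucleotide codes needed here; D = A, G, or T (as defined in the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}

# The two consensus sequences obtained in the presence of Mcm1.
CONSENSUS = ["ATGTAAT", "GTGTAADT"]


def consensus_to_regex(consensus):
    """Translate a consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def matching_consensus(seq):
    """Return the consensus strings that occur anywhere in seq."""
    return [c for c in CONSENSUS if consensus_to_regex(c).search(seq)]


# Hypothetical example sequences (assumptions for illustration only).
print(matching_consensus("CCATGTAATTGG"))  # ['ATGTAAT']
print(matching_consensus("CCGTGTAAGTGG"))  # ['GTGTAADT']
print(matching_consensus("CCGTGTAACTGG"))  # [] because C is excluded by D
```

Exact matching is a simplification: a selected pool would normally be scored position by position rather than accepted or rejected outright.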
Previous studies have shown that the relative positioning of the
α2 and Mcm1 binding sites is somewhat flexible; while large
changes in spacing are not functional, operators with 5 or 6 base pairs
between the sites are bound cooperatively by the proteins and function
as repressor elements in vivo (35, 44). The flexibility of
the spacing between the α2 and Mcm1 sites is evident among the
natural operators, since the STE2, STE6,
MFA1, and MFA2 sequences have 5-bp spacing
between the α2 and Mcm1 sites in one half-site and 6-bp spacing in
the other half-site (Fig. 1). The fact that, in the presence of Mcm1,
sites with both 5- and 6-bp spacing were selected from the random pool
further shows that binding by the α2-Mcm1 complex can
accommodate either spacing. In contrast, the spacing requirements
between the α2 and a1 binding sites of haploid-specific operators, as
well as the positions of the binding sites in other homeodomain
complexes such as the Drosophila Paired homodimer and the
Hox-Pbx heterodimer, are rigidly fixed (43, 53, 54). These results
suggest that either the protein-protein or protein-DNA interactions in
the α2-Mcm1-DNA complex can adjust, to some extent, to accommodate alterations in spacing between the binding sites.
Interestingly, in comparing the consensus sequences with 5- or 6-bp
spacing between the α2 and Mcm1 sites, we noticed that there is a
different preference for the base pair corresponding to position 2 in
the AMSC site. In sites with 5-bp spacing an A is preferred (23 of 34),
while sites with 6-bp spacing predominantly have a G at this position
(13 of 16). We have determined that in sites with 5-bp spacing an A at
position 2 results in 2-fold higher repression than a G, while in sites
with 6-bp spacing, G functions as well as A. These results suggest that
sites with 6-bp spacing have relaxed sequence specificity at this
position in comparison with sites with 5-bp spacing. It is possible
that, to make the proper contacts with Mcm1 on operators with 6-bp
spacing, α2 has to alter its contacts with position 2 of the
operator. In the a1-α2-DNA ternary complex structure, this base pair
is contacted by Ser-50 of the α2 homeodomain via a water-mediated hydrogen bond to N-7 (42). The preference for purines at this position
in operators with either 5- or 6-base pair spacing suggests that the
contact to N-7 is made in both sets of operators. However, the fact
that A is preferred to G in selected sites with 5-bp spacing indicates
that there may be another base-specific contact to the A:T base pair at
this position. In contrast, in operators with 6-bp spacing G functions
as well as A, which suggests that this contact is not made in this set
of operators. In other homeodomains, residue 50 makes either a direct
or water-mediated hydrogen bond with the base pair corresponding to
position 2, and this residue has been shown to have an important role
in determining homeodomain DNA binding specificity (9, 47, 53, 55,
56).
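As a quick arithmetic check on the position 2 preferences quoted above, the short sketch below converts the reported counts (23 of 34 sites with 5-bp spacing carrying an A, and 13 of 16 sites with 6-bp spacing carrying a G) into fractions. The variable and function names are illustrative assumptions, and no statistical test is implied.

```python
# Counts quoted in the text for position 2 of the selected sites:
# spacing -> (preferred base, sites with that base, total sites).
POSITION2_COUNTS = {
    "5-bp": ("A", 23, 34),
    "6-bp": ("G", 13, 16),
}


def fraction(hits, total):
    """Fraction of selected sites carrying the preferred base."""
    return hits / total


for spacing, (base, hits, total) in POSITION2_COUNTS.items():
    share = fraction(hits, total)
    print(f"{spacing} spacing: {base} at position 2 in {hits}/{total} sites ({share:.0%})")
```

The counts correspond to roughly two-thirds of the 5-bp sites and about four-fifths of the 6-bp sites.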
In summary, the in vitro site selection results support the
conclusions drawn from our mutagenesis data that, in complex with Mcm1,
the DNA binding specificity of the α2 protein extends to positions in
which there are no apparent base-specific contacts in the co-crystal
structure. Similar changes in binding specificity in the presence of
cofactors have also been observed for
other homeodomain proteins. For example, the optimal binding site for a
Hox protein in complex with Pbx1 appears to be slightly different from
the site selected for Hox binding on its own (54). Likewise, the DNA
binding specificity of Oct-1 appears to change upon interaction with
Bob1 (13). The fact that the α2 binding sites selected in the
presence of Mcm1 are extended and better defined than in the absence of
Mcm1 could explain why consensus sites for some DNA-binding proteins identified in the absence of their cofactors may not function well
in vivo. Site selection in the absence of the cofactor would therefore not be able to define the sequence requirements for binding
by a protein complex, such as the orientation and spacing between the
binding sites of each protein, as well as the sequence specificity from
additional contacts made by the proteins.
We thank T. Li and C. Wolberger for providing
the coordinates of the a1-α2-DNA ternary structure. We are grateful
to S. Parent, C. Abate-Shen, S. Zhang, H. Tang, and members of our lab
for helpful discussion and comments on the manuscript. We thank J. Mead
for pJM120, which contains the AMSC site. We also thank T. B. Acton for
pTBA23 and the purified Mcm1-(1-96) fragment.