Modular construction of extended DNA recognition surfaces: mutant DNA-binding domains of the 434 repressor as building blocks

Tiebing Liang1, Jinqiu Chen2, Marie-Louise Tjörnhammar, Sándor Pongor and András Simoncsits,3

1 Present address: Institute of Botany, Chinese Academy of Sciences, Xiang Shan, Hai Dian Qu, Beijing 100093, China
2 Present address: Department of Pathology and Laboratory Medicine, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
International Centre for Genetic Engineering and Biotechnology (ICGEB), Area Science Park, Padriciano 99, I-34012 Trieste, Italy


    Abstract
 
Single-chain derivatives of the 434 repressor containing one wild-type and one mutant DNA-binding domain recognize the general operator ACAA–6 base pairs–NNNN, where the ACAA operator subsite is contacted by the wild-type and the NNNN tetramer by the mutant domain. The DNA-binding specificities of several single-chain mutants were studied in detail and the optimal subsites of the mutant domains were determined. The characterized mutant domains were used as building units to obtain homo- and heterodimeric single-chain derivatives. The DNA-binding properties of these domain-shuffled derivatives were tested with a series of designed operators of NNNN–6 base pairs–NNNN type. It was found that the binding specificities of the mutant domains were generally maintained in the new environments and the binding affinities for the optimal DNA ligands were high (with Kd values in the range of 10⁻¹¹–10⁻¹⁰ M). Considering that only certain sequence motifs in place of the six base pair spacer can support optimal contacts between the mutant domains and their subsites, the single-chain 434 repressor mutants are highly specific for a limited subset of 14 base pair long DNA targets.

Keywords: HTH motif/protein/DNA interactions/protein engineering/434 repressor/single-chain proteins


    Introduction
 
Single-chain (sc) proteins that bind double-stranded DNA have recently been constructed from several transcription factors by covalent dimerization of DNA-binding domains (DBDs) or of the whole DNA-binding protein. Examples of covalent dimerization by recombinant peptide linkers include the 434 repressor (Percipalle et al., 1995; Simoncsits et al., 1997), the lac repressor headpiece (Gates et al., 1996), the Arc repressor (Robinson and Sauer, 1996a), the estrogen receptor (Kuntz and Shapiro, 1997), the bHLH domain of MASH-1 (Sieber and Alleman, 1998) and the lambda Cro repressor (Jana et al., 1998). The corresponding parent proteins function naturally as non-covalent dimers or higher oligomers of identical subunits and, in most cases, the covalent dimerization has been shown to cause a general increase in DNA-binding affinity. The sc molecules have been used to study various aspects of DNA recognition, such as the effect of covalent linkage on subunit association, DNA binding, protein folding and stability (Liang et al., 1993; Robinson and Sauer, 1996b, 1998; Jana et al., 1998; Ruiz-Sanz et al., 1999).

The sc arrangement can also provide new ways to engineer proteins of novel DNA-binding specificities. In contrast to the non-covalently associated natural dimers, the sc dimers can easily accommodate two different DBDs of either identical or different DNA-binding specificities. While the non-covalent dimers generally recognize DNA sequences containing palindromic half-sites, the engineered sc derivatives can recognize either palindromic or non-palindromic sequences. In the case of the 434 repressor, it was shown that covalent dimerization of DBDs did not change the wild-type DNA-binding specificity (Chen et al., 1997; Simoncsits et al., 1997). When rational changes (Wharton and Ptashne, 1985) were introduced into one of the DBDs, the heterodimeric sc repressor mutant recognized non-palindromic sequences (Chen et al., 1997). Combinatorial mutant libraries of the sc 434 repressor containing one wild-type DBD and one partially randomized DBD were also constructed and used in a genetic selection to isolate mutant DBDs that bind to predetermined target sites (Simoncsits et al., 1999). These studies showed that the heterodimeric sc 434 repressors recognize a general, 14 base pair (bp) DNA operator sequence of ACAA–6 bp–NNNN type and that strongly binding mutants can be isolated for defined NNNN targets. The ‘non-contacted’ 6 bp spacer region between the 4 bp contacted operator boxes was also shown to influence the binding affinity strongly, and similar consensus spacer sequences were found to support high-affinity binding by the natural, the sc and mutant sc 434 repressors (Chen et al., 1997).

In this study, we show that these findings can be utilized to construct long DNA recognition surfaces of novel specificities by combining previously isolated and characterized mutant DBDs in the sc arrangement. The building blocks used were a designed and previously characterized domain (Chen et al., 1997) as well as three mutant DBDs obtained in a protein selection experiment (Simoncsits et al., 1999). First, the DNA-binding properties of these mutant DBDs were characterized in detail by using binding site selection from randomized DNA pools and by binding affinity studies. In these specificity studies, the mutant DBDs were linked to the wild-type DBD. Several homo- and heterodimeric sc proteins were then constructed from these mutant domains and their DNA-binding properties were tested by using artificial operators. These operators were designed by considering the subsite recognition properties of the constituent mutant DBDs. It is shown that the binding specificities of the DBDs are generally maintained in the engineered, double-mutant sc dimers and in several cases specific, high-affinity interactions could be observed between the newly identified protein–DNA cognate pairs. Thus, the sc framework of the 434 repressor can accommodate selected and characterized, mutant DBDs to engineer novel reagents with defined DNA-binding specificities.


    Materials and methods
 
General techniques

Protein expression and HPLC purifications were performed as described (Simoncsits et al., 1999) by using a Resource S column (Pharmacia Biotech). Protein concentrations were determined spectrophotometrically using the extinction coefficient 12 660 M⁻¹ cm⁻¹ at 280 nm as described (Gill and von Hippel, 1989). The 32P-labeled DNA probes were obtained by PCR amplification of the operator regions of the corresponding pRIZ' or pCP8 plasmids as 95–125 bp fragments (Simoncsits et al., 1999). Electrophoretic mobility shift assay (EMSA), data collection and quantitative evaluations were performed as described (Simoncsits et al., 1999). Briefly, binding reactions were performed by using 2-fold serial protein dilutions and 32P-labeled DNA probe present in a concentration which is significantly lower than the protein concentration in the whole titration range. The binding buffer contained a large excess of nonspecific DNA over the probe DNA. Generally, eight protein concentrations were used and the binding reaction mixtures were analysed by EMSA as described (Simoncsits et al., 1997). Binding affinities (Kd) were calculated by plotting the fraction of bound DNA (Θ) as a function of the total protein concentration (Pt) and the binding isotherm Θ = 1/(1 + Kd/Pt) was evaluated using Kaleidagraph software as described (Robinson and Sauer, 1996a; Simoncsits et al., 1999). Bound DNA was derived from the shifted bands corresponding to 1:1 stoichiometry binding. Bands corresponding to higher stoichiometries can generally be observed with the sc proteins at significantly higher concentrations (starting between 10 and 40 nM) than the titration range and the Kd of the specific interactions of this study. Methylation protection by dimethyl sulfate (DMS) was performed by following a general protocol (Rhodes and Fairall, 1997).
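The isotherm evaluation described above can be illustrated with a minimal sketch. The original fits were done in Kaleidagraph; the Python below is only a stand-in under stated assumptions: the function names (`fraction_bound`, `fit_kd`, `serial_dilution`), the golden-section search on log10(Kd) and the synthetic data in the usage note are all hypothetical and are not taken from the paper.

```python
import math

def fraction_bound(pt, kd):
    """Binding isotherm from the text: theta = 1 / (1 + Kd / Pt)."""
    return 1.0 / (1.0 + kd / pt)

def serial_dilution(top, n=8, factor=2.0):
    """Protein concentrations (M) for an n-point serial dilution from `top`."""
    return [top / factor ** i for i in range(n)]

def fit_kd(protein_concs, fractions, lo=1e-13, hi=1e-6, iters=200):
    """Least-squares estimate of Kd (M) by golden-section search on log10(Kd).

    Assumption: the squared-error curve has a single minimum between
    `lo` and `hi`, which holds for clean single-site data.
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_concs, fractions))

    a, b = math.log10(lo), math.log10(hi)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)
```

For example, `fit_kd(concs, obs)` with `concs = serial_dilution(2e-9)` and `obs` generated from `fraction_bound(p, 5e-11)` recovers a Kd close to 5 × 10⁻¹¹ M. The approach relies on the condition stated in the methods that the probe concentration is far below the protein concentration, so that total protein approximates free protein.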

Selection of binding sites for RRTATG and RRTRPS

Selections with the two mutants were performed in parallel by using a nitrocellulose filtration technique (Chen et al., 1997). The oligonucleotide TCCGGCTCGTATGTTGCATACAATAAAAN9ATGAGGAAACAGCTATGACCTCC (AT500) contained nine randomized residues (N9) and the sequences corresponding to PCR primer sites (AT421 upstream and AT422 downstream) are underlined. Eight selection cycles were performed as described (Chen et al., 1997) with slight modifications and simplifications as follows. The binding reactions were performed in 200 µl of binding buffer (50 mM KCl, 2.5 mM MgCl2, 1 mM CaCl2, 0.1 mM EDTA, 25 mM Tris–HCl, pH 7.2) containing 10 µg/ml poly(dI–dC), mutant sc protein (25 nM in the first cycle, 10 nM in cycles 2 and 3, then 5 nM in cycles 4–8) and 1 pmol ds DNA for 1 h at room temperature. After filtration and washing with water (200 µl), the bound DNA was recovered by soaking the nitrocellulose filter in 200 µl of PCR buffer (10 mM Tris–HCl, 1.5 mM MgCl2, 50 mM KCl, 0.1% Triton X-100, pH 8.3) for 5 min at room temperature. The eluted DNA (10 µl) was used directly in the PCR mixture (100 µl) containing the buffer shown above supplemented with 2.5 µM primers, 0.2 mM dNTP and 3 units of Taq polymerase (Roche Molecular Biochemicals). Twelve amplification cycles (94, 58 and 72°C, 1 min each) followed by a final 10 min incubation at 72°C were performed. The amplified DNA was precipitated with ethanol and ~0.5–1 pmol was used without further purification in the subsequent binding step. After eight selection cycles, an additional enrichment was performed by using the selected populations as 32P-labeled probes and EMSA. The shifted bands obtained with 0.5 and 2 nM proteins were used in subsequent analyses.
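To give a feel for how well 1 pmol of ds DNA samples the N9 pool, the arithmetic can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the calculation assumes that all randomized sequences are equally represented in the synthesized pool, which the text does not state.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def pool_complexity(n_random):
    """Number of distinct sequences for n fully randomized positions."""
    return 4 ** n_random

def copies_per_sequence(pmol, n_random):
    """Average copies of each sequence in `pmol` of pool DNA.

    Assumption: uniform representation of all sequences.
    """
    molecules = pmol * 1e-12 * AVOGADRO
    return molecules / pool_complexity(n_random)
```

Under this assumption, `pool_complexity(9)` gives 262144 distinct sequences, and `copies_per_sequence(1, 9)` is on the order of a few million copies of each sequence in the first binding reaction.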

Selection of binding sites for RRTRES

The starting DNA pool (AT 586, N6 pool) containing six randomized bases had the central sequence –CATACAAGAAAGNNNNNNTTTATG– and the flanking PCR primer regions were identical to those shown above for AT500 (underlined). Four selection cycles based on EMSA were performed by using 2-fold serial protein dilutions in protein titrations. Shifted bands were isolated at protein concentrations at which ~5–10% of the 32P-labeled DNA was shifted. These concentrations were gradually lowered as the selection progressed: 1 nM in cycle 1, 0.4 nM in cycle 2, 0.1 nM in cycle 3, 12.5 and 25 pM in cycle 4.

Cloning of the selected sequences and designed operators

The operator regions of the selected sequences were cloned into the pRIZ'O(–) vector by loop insertion mutagenesis as described previously (Chen et al., 1997). The designed operators were obtained by annealing synthetic oligonucleotide pairs to form duplexes with 5'-TA overhangs which were cloned into the NdeI site of either pRIZ'O(–) (Simoncsits et al., 1997) or pCP8 (Simoncsits et al., 1999).

Construction of single-chain repressors containing one or two mutant DBDs

The genes coding for the RRTATG, RRTRPS and RRTRES mutants were cloned into pSET expression vector (Simoncsits et al., 1997) as described (Simoncsits et al., 1999), resulting in pSETRRTATG, pSETRRTRPS and pSETRRTRES. These vectors were used after XbaI–BamHI cleavage to replace the coding region of the R wild-type domain with that of the R* domain of pSETR*R*69 (Simoncsits et al., 1997) to obtain pSETR*RTATG, pSETR*RTRPS and pSETR*RTRES. The genes containing two selected mutant domains were also obtained in the pSET vector in two cloning steps. First, the pSETRRTATG, pSETRRTRPS and pSETRRTRES vectors were converted into pSETRTATG, pSETRTRPS and pSETRTRES, respectively, by complete EcoRI cleavage followed by vector re-ligation. These vectors were then cleaved with XhoI and HindIII and were ligated with the XhoI (partial)–HindIII fragments isolated from pSETR*RTATG, pSETR*RTRPS or pSETR*RTRES to obtain the pSETRTATGRTATG, pSETRTATGRTRPS, pSETRTRPSRTRPS and pSETRTRESRTRES clones.

The genes coding for substitution mutants of RRTRPS and RRTATG were constructed by replacing the α3 helix coding region of the R* domain in the pSETRR*69 with synthetic KpnI–XhoI linkers as described for the corresponding pRIZ' vectors (Simoncsits et al., 1997, 1999).


    Results
 
General experimental design and origin of the mutant DBDs

The sc derivatives of the 434 repressor contain tandem repeats of two DBDs of the natural repressor to form a single polypeptide chain of 158 amino acids (Percipalle et al., 1995; Simoncsits et al., 1997). The first 89 amino acid residues of the full 434 repressor are fused to a second copy of the first 69 amino acids to form the translational fusion (1–89)–(1–69). In this artificial protein, the two DBDs (residues 1–69) are joined by a peptide linker corresponding to the sequence of residues 70–89 of the full repressor. For simplicity, the prototype of this protein containing two wild-type DBDs in the functional (1–69)–(70–89)–(1–69) arrangement was abbreviated as RR69, where R stands for the DBD and the suffix 69 indicates the length of the second repeat (Simoncsits et al., 1997). Except for the first, designed mutant derivative RR*69 (Chen et al., 1997; Simoncsits et al., 1997), this suffix is not used in the abbreviations of the selected (Simoncsits et al., 1999) and constructed mutant hetero- and homodimeric sc molecules (this work).
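The (1–69)–(70–89)–(1–69) arrangement can be laid out along the single polypeptide chain with a few lines of arithmetic. The sketch below is illustrative only: the function name and the continuous chain numbering of the second DBD are assumptions made here, since the paper numbers residues according to the natural repressor.

```python
def sc_layout(first_len=89, second_len=69, dbd_len=69):
    """Segment boundaries (1-based, inclusive) of the (1-89)-(1-69) fusion.

    Assumption: positions are counted continuously along the fused chain.
    """
    total = first_len + second_len
    return {
        "DBD1": (1, dbd_len),
        "linker": (dbd_len + 1, first_len),
        "DBD2": (first_len + 1, total),
        "linker_length": first_len - dbd_len,
        "total": total,
    }
```

With the default values, the linker spans chain positions 70–89 (20 residues), the second DBD occupies positions 90–158 and the total length is 158 amino acids, in agreement with the figure given in the text.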

The general scheme of constructing double-mutant sc variants of the 434 repressor with new DNA-binding specificities is shown in Figure 1. In the first step, sc repressor libraries containing one wild-type DBD (empty oval) and one mutant DBD (grey shaded oval) with randomized amino acids at certain, DNA-contacting positions are constructed. The libraries are then selected for interaction with a DNA operator composed of a subsite for the wild-type domain (empty rectangle) and an arbitrarily chosen target subsite (grey shaded rectangle) for the mutant domain. The protein selection experiments provide directly or after further specificity studies a set of mutant DBDs with characterized DNA-binding specificities, i.e. a set of cognate protein–DNA pairs is identified (see Figure 1B, where the components of the cognate pairs are identically striped). These mutant DBDs are finally combined to obtain novel sc molecules which are expected to recognize DNA operators composed of the cognate subsites of the corresponding DBDs (Figure 1C).



 
Fig. 1. General strategy for building sc proteins that recognize novel DNA target sites. (A) A library of sc repressors containing a mutant DBD (shaded oval) with randomized residues is selected for a target DNA subsite (shaded rectangle). (B) Cognate DBD–subsite pairs are isolated. (C) The mutant DBDs are combined to form novel sc homo- or heterodimers.

 
The mutant DBDs used in this work were obtained previously by rational design or by selection. The designed DBD (R*) was obtained (Simoncsits et al., 1997) from the wild-type domain (R) by substituting amino acid residues 27, 28, 29 and 32 as shown in Figure 2B. When this domain was joined to the wild-type domain, the heterodimeric sc repressor RR*69 recognized the OR*1 type operator sequences and the optimal DNA target of the R* domain contained the TTAA sequence (Chen et al., 1997) between the 4' and 1' operator positions as shown in Figure 2A. The selected DBDs were obtained from heterodimeric sc repressor libraries containing random mutations in one of the DBDs at the above-indicated positions after a genetic selection for the OR*1 operator or for its 4'-TTAA-1' subsite (Simoncsits et al., 1999). The heterodimeric, selected sc proteins were abbreviated as RRXXXX, where R is the wild-type domain and RXXXX is the mutant domain containing amino acid substitutions X at the randomized positions. While the in vivo selected DBDs were shown to bind the selection target in vitro, some of them could also bind to other sequences with even higher affinities. For example, the RTATG domain showed preference for the OR1 subsite TTGT, the RTRPS bound the TTAA target and its close homolog TTAC, while the RTRES bound the TTAC with high affinity and specificity (Simoncsits et al., 1999). We chose these DBDs as building blocks to construct double-mutant sc proteins, and therefore their DNA-binding specificities were further studied by binding site selection from random ligand pools and by affinity studies.



 
Fig. 2. General numbering schemes of the operator bases (A) and of the amino acid residues (B) of the mutant sc 434 repressors. (A) OR1 is the natural OR1 of the 434 bacteriophage, OR*1 is a designed hybrid operator (Hollis et al., 1988; Simoncsits et al., 1997) and examples for their derivatives are also given. For clarity, only one strand of the double-stranded operators is shown. The regions of OR1 (bases 1–4 and the corresponding palindromic 4'–1') which were shown to participate in direct, specific amino acid side chain–base pair contacts in the complex of OR1 and the N-terminal DBD (Aggarwal et al., 1988) are underlined. The equivalent, contacted or putatively contacted regions (1–4 and 4'–1') of the listed operator derivatives are also underlined. (B) The amino acid sequences of the wild-type (R) and mutant DBDs used in this work are listed and the residues are numbered according to their positions in the α3 helix (above the residues) and in the full 434 repressor (under the residues). R* is a designed DBD (Wharton and Ptashne, 1985; Simoncsits et al., 1997); the other mutant DBDs were isolated previously (Simoncsits et al., 1999). Residues in the mutated positions are shown in bold.

 
Selection of binding sites for the RTRPS, RTATG and RTRES domains

Binding site selections for RTRPS and RTATG were performed by using the sc proteins RRTRPS and RRTATG, respectively, and a random DNA pool containing the ACAATAAAANNNNNNNNN sequence with nine randomized residues (N9 pool). The binding site of the R domain ACAA is underlined and the binding site of the mutant domain is expected to be 6 bp away from it, within the N9 region. Since this 6 bp spacer or non-contacted region strongly influences the operator binding affinities of the wild-type and mutant sc 434 repressors (Chen et al., 1997; Simoncsits et al., 1999), a major part (5 of 6 bp) of it was kept constant as an ‘OR*1-like’ spacer (see Figure 2A) to facilitate comparison between the sequences selected in the putative contacted regions. The selection conditions were not very stringent and this allowed for the isolation of both high and lower affinity binding sites. The average affinity of the random ligands was estimated to be around 100 and 200 nM (Figure 3A and B), while those of the selected ligand populations were at least 100-fold higher for the respective protein with relatively low cross-binding affinity (shown only for RRTRPS in Figure 3C and D).



 
Fig. 3. Selection of binding sites. EMSA performed with 2-fold serial protein dilutions (from right to left) shows binding site enrichment. (A) Unselected N9 library–RRTRPS interaction; (B) unselected N9 library–RRTATG interaction; (C) RRTRPS selected population–RRTRPS interaction; (D) RRTATG selected population–RRTRPS interaction.

 
In the ligand selection for RRTRES, a random DNA pool containing the ACAAGAAAGNNNNNNTTT sequence (N6 pool) was used. Here, higher affinity ligands can generally be expected owing to context effects. First, the OR1-like spacer (GAAAGN) confers high binding affinity for both wild-type and mutant sc 434 repressors, partly owing to the first G residue (Simoncsits et al., 1999). Second, operators with NTTT flanking sequence were often isolated in previous selections (Chen et al., 1997) and a 3-fold affinity increase compared with OR1 was observed with such an operator when tested with RR69 (Kd ≈ 5 pM, not shown). Correspondingly, the average affinity of the random N6 pool for RRTRES was ~4–8 nM (not shown). Four selection cycles were performed, but selected pools of earlier stages were also used to obtain lower affinity ligands.
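The ACAA–6 bp–NNNN architecture of the N9 and N6 pools suggests a simple way to pull the putative mutant-domain subsite out of a sequenced clone. The following is a minimal sketch, not the authors' procedure: the function name, the returned field names and the two example clones in the usage note are hypothetical, and the search simply takes the first ACAA occurrence in the supplied core sequence.

```python
def split_operator(seq, anchor="ACAA", spacer_len=6, subsite_len=4):
    """Split an operator core into wild-type subsite, spacer and 4'-1' subsite.

    Assumption: `seq` is the written (top) strand and the first occurrence of
    `anchor` is the subsite contacted by the wild-type domain.
    """
    seq = seq.upper()
    i = seq.find(anchor)
    if i < 0:
        raise ValueError("wild-type subsite not found")
    start = i + len(anchor)
    spacer = seq[start:start + spacer_len]
    subsite = seq[start + spacer_len:start + spacer_len + subsite_len]
    if len(spacer) < spacer_len or len(subsite) < subsite_len:
        raise ValueError("sequence too short for spacer and subsite")
    return {
        "wild_type_subsite": anchor,
        "spacer": spacer,
        "base_5prime": spacer[-1],  # base directly preceding the 4'-1' tetramer
        "subsite": subsite,
        "flank": seq[start + spacer_len + subsite_len:],
    }
```

For a hypothetical N6 clone `ACAAGAAAGCTTACGTTT`, the function returns the spacer `GAAAGC`, the subsite `TTAC` and the flank `GTTT`; for a hypothetical N9 clone `ACAATAAAACTTACATAA`, it returns the spacer `TAAAAC` and the subsite `TTAC`. In both pools the first randomized base completes the 6 bp spacer, so only the following four bases are read as the putatively contacted region.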

DNA-binding specificities of the RTRPS, RTATG and RTRES domains: binding site selection reveals consensus operator regions

The selected DNA pools were cloned by loop insertion mutagenesis (Chen et al., 1997). A number of clones were sequenced and their binding affinities for the corresponding protein were determined by EMSA. The results of the selection and affinity studies are summarized in Tables I (RTRPS), II (RTATG) and III (RTRES). When a certain sequence was obtained more than once, the number of occurrences is indicated (x). At the bottom part of each table, data obtained with several reference or designed operators are also included.


 
Table I. Sequences selected for RRTRPS and their binding affinities (nM)
 

 
Table II. Sequences selected for RRTATG and their binding affinities (nM)
 

 
Table III. Sequences selected for RRTRES and their binding affinities (pM)
 
Generally, consensus sequences can be found between positions 4' and 1', a region separated by a 6 bp spacer sequence from the ACAA binding site of the wild-type DBD. This region (underlined in the tables) is likely to be in direct contact with the mutant DBDs. At the right side of this region, A + T-rich sequences were selected (for RTRPS and RTATG), which is in accord with previous selections for the R* domain binding sites (Chen et al., 1997). The putative contacted regions generally contain the TT sequence in the 4'3' positions. For the RTRPS domain, TTAC and TTAA sequences (between the 4' and 1' operator positions) were selected most frequently, but several other sequences with high binding affinities were also obtained. Three main groups with TTTA, TTGA and TTGT sequences were obtained for RTATG, while the RTRES selections resulted in two high-affinity groups with TTAC and TTCC sequences. For a better understanding of the selection results, they were complemented by the following techniques: (i) binding studies with designed operator sequences to clarify certain context effects, mainly the effect of the base at the operator 5' position on the binding affinity, (ii) analysis of the effect of amino acid substitutions of putatively contacting residues on the binding affinities and (iii) methylation protection of DNA to identify amino acid–bp contacts.

Identification of the putatively contacted operator subsites at the 4' to 1' positions and effect of the 5' base on the binding affinity

The major recognition sites of the RTRPS domain are TTAA and TTAC (Table I). This domain was originally selected in vivo for the TTAA target of the OR*1 operator (Simoncsits et al., 1999). The TTAA containing sequences obtained in this study are higher affinity binding ligands than OR*1 (a12–a18, Table I). The other major group (TTAC sequences) also contained high or even higher affinity ligands (a1–a11) and it was also noted that the rarely selected TATC and TAAC containing ligands (a23–a25) were also high binders. While the highest binders in this small group (and in the TTAA group) contained C at the 5' operator position, this residue was not found in the TTAC group. To test the role of this residue, a set of operators containing all four possible bases in the 5' position was compiled for the TTAC, TAAC and TATC sequences by complementing the selected sequences with synthetic ones. In these collections, the other flanking base (downstream of the shown tetramer sequences) or a short flanking region was generally constant. Comparison of the binding affinities showed a general, strong preference for C at the 5' operator position (Table IV) in all three groups and the optimal pentamer sequence between the 5' and 1' positions was found to be CTTAC. The binding affinity could further be increased by performing a symmetrical change at the 5 position, i.e. by introducing a G residue next to the operator subsite (ACAA) contacted by the wild-type R domain of RRTRPS (compare OR*1–1'C and O571 in Table I).


 
Table IV. Effect of the 5' operator base (bold) on the binding affinity of RRTRPS for the TTAC, TAAC and TATC subsites (underlined)
 
The data in Table II show that the optimal operator subsite of the RTATG domain is among the TTTA, TTGA and TTGT tetramers. The affinities are not very high and a clear preference for any of these sites, even after considering the context effect by using several designed analogs (bottom part of Table II), could not be observed. The highest affinity was observed with the OcI(1) operator, which was selected for the wild-type domain (Chen et al., 1997). It should be noted that the TTGT subsite represents the specificity of the R domain, therefore the TTGT containing operators can form two differently oriented complexes with RRTATG. Thus the affinity data shown in Table II probably overestimate the intrinsic affinity of the RTATG domain for the TTGT subsite. These results together with binding data obtained with several substitution analogs of RRTATG (see below) show that RTATG is a relaxed specificity mutant.

The high-affinity ligands obtained in the RRTRES selection (Table III) contained the TTAC or the TTCC subsite. The affinities for the TTAC sequences were several-fold higher than those for the TTCC sequences when they were compared in identical sequence contexts (see the c1/c13, c2/c12 and c6/c14 pairs). Several other sequences containing the consensus 4'T and 1'C residues (c15–c22) exhibited substantially lower affinities. The best ligand in this group contained the TACC sequence (c15 in Table III), but its affinity was still about 30-fold lower than that of the corresponding (with the same flanking bases) TTAC ligand (c5). The affinities varied from 1 to 10 pM in the TTAC group (c1–c11) which again could be due to context effects. Examples in Table III show that the preferred 5' flanking residue is T or C, whereas the preferred residue at the other side of TTAC is G or A. Similar preference for 5' C was also observed (data not shown) when the operators contained the OR*1-like spacer sequence (see OR*1–1'C in Table III). Introducing G at the symmetrical 5 position again caused a significant affinity increase (see O571 in Table III).

Identification of possible amino acid–base pair contacts

To understand the results of the selection and binding affinity studies, we attempted to delineate possible interactions between certain amino acid residues of the mutant DBDs and the selected consensus operator sites. In one approach, the putative DNA-contacting residues were substituted and the effects on the binding affinity for a set of operators were tested. In the other, a footprinting technique was used to identify the contacted base which was protected in the complex against chemical modification. In addition, the data from a previous specificity study (Simoncsits et al., 1999) obtained with other protein and operator mutants were also used in the evaluations.

It was suggested previously (Simoncsits et al., 1999) that the Arg28 residue in the RTRPS, RTRES and other similar domains could contribute significantly to the binding affinity by forming a contact with the G residue of the 1'C–G base pair of the TTAC subsite. Such a contact is frequently observed in protein–DNA complexes (Seeman et al., 1976; Pabo and Sauer, 1992; Suzuki, 1994; Mandel-Gutfreund et al., 1995) and may explain the preferential recognition of TTAC over TTAA by such domains. We used the principle of the ‘loss of contact’ approach (Ebright, 1991) and constructed mutants of the RTRPS domain by substituting the Arg28 residue by Ala and by Gly to obtain RRTAPS and RRTGPS. The effects of these substitutions were tested by using a set of operators containing TTAN subsites, where N is A, C, G or T. The results (Table V) show that both mutants bound to the TTAC operator with about 10-fold reduced affinity and they also lacked the ability to discriminate the 1' base pairs. At the same time, these interactions were relatively strong, indicating that important contacts may be maintained or become even more pronounced between other, unchanged residues. For example, the Pro29 residue may make hydrophobic contacts with the 3' T–A and 2' A–T base pairs in a manner similar to that proposed for the Tet repressor (Baumeister et al., 1992). Support for this assumption can be provided by comparing the data from previous affinity studies (Simoncsits et al., 1999) which showed that RRTRPS bound the TTAA and TTAC sequences with higher affinities than its RRTRVS and RRTRSS homologs and that RRTRPS, compared with these homologs, also exhibited a significantly stronger preference for TTAC within the TTNC ligand series. It was also observed in other studies (Y.Lin, J.Gál and A.Simoncsits, unpublished data) that further substitutions of Pro29 of RRTAPS (by Val or Ala) caused a significant affinity decrease for the TTAA ligands (not shown).
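As a rough guide to what such fold changes in Kd mean energetically, the standard relation ΔΔG = RT ln(Kd,mutant/Kd,parent) can be evaluated as sketched below. This conversion is not part of the original analysis; the function name and the temperature of 298.15 K are assumptions made here for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def ddg_from_fold(fold, temp_k=298.15):
    """Free-energy change (kcal/mol) corresponding to a fold change in Kd.

    Assumption: temp_k = 298.15 K, chosen only as a nominal room temperature.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)
```

Under these assumptions, the roughly 10-fold affinity loss seen on replacing Arg28 corresponds to about 1.4 kcal/mol, which is the order of magnitude commonly associated with the loss of a single favourable contact.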


 
Table V. Effect of substitutions of Arg28 of RRTRPS on the binding affinities (nM) to OR*1 operator derivatives containing TTAN subsites
 
Similar substitution mutants of the RTATG domain were also constructed. Changing Ala28 to Gly and Thr29 to Gly resulted in RRTGTG and RRTAGG proteins, respectively, which were tested for binding with the b16 and the designed OR*1 derivatives of Table II. It was observed that the A28G substitution caused either a slight (up to 2-fold) increase or no change in the affinities, whereas the T29G mutation caused a general (5–10-fold) decrease. Thus Thr29 plays a certain role in the recognition of the 2'1' residues, but its combination with Ala28 results in relatively low affinity, relaxed specificity binding.

The present study confirms the results of previous specificity studies with the RTRES domain (Simoncsits et al., 1999), which suggested that the optimal binding site is TTAC followed by TTCC. We suppose that a contact between Arg28 and the 1'C–G base pair, as suggested for the RTRPS domain above, also exists in the RTRES interactions with both the TTAC and TTCC subsites. The Glu29 residue probably accepts an H-bond from either the 2'A or 2'C residue. This assumption could be supported by using Glu29 substitution analogs. Several mutants of the RTRES domain, RTRPS, RTRVS and RTRSS, were available and binding data obtained with them showed that the Val29 and Ser29 substitutions resulted in 4- and 10-fold reduced affinity for the TTAC subsite and all three substitutions led to at least a 20-fold affinity decrease for the TTCC subsite (Simoncsits et al., 1999). These data indicate the loss of a favourable contact between Glu29 and the 2'A or 2'C residue.

Methylation protection by dimethyl sulfate (DMS) was used to confirm the postulated contact between the G residue of the operator 1'C–G pair and the Arg28 of the RTRPS and RTRES domains. Figure 4 shows that the extent of this protection in both interactions is comparable to that observed for the 2G residue in the TTGT box, and this G is shown to be in contact with Gln29 of the wild-type domain in the OR1 complex (Aggarwal et al., 1988). The major protein–DNA contacts observed in the wild-type complex and proposed for the mutant RTRPS and RTRES domain interactions are shown in Figure 5.



Fig. 4. Footprint analysis showing protection of G residues from methylation by DMS. Lane 1, G + A reaction; lanes 2 and 3, RRTRPS complex (0.5 and 5 nM protein); lanes 4 and 5, RRTRES complex (0.5 and 5 nM); lane 6, no protein. Boxed regions indicate subsites for the R (upper) and the mutant DBDs (lower box).

 


Fig. 5. Major amino acid–bp contacts observed in the wild-type DBD–operator complex (A) and proposed for the interactions of the RTRPS (B) and RTRES (C, D) domains with their respective operator subsites.

 
Construction of sc proteins containing two mutant DBDs and characterization of their DNA-binding properties

The above-characterized RTRPS, RTATG and RTRES domains were used together with the previously designed R* domain as building blocks to construct several homo- and heterodimeric sc proteins. These proteins were purified and their DNA-binding properties were tested by using a collection of reference, selected or newly designed operators (the latter are labeled with subscript numbers), as shown in Table VI. The operator subsites, whether cognate to a given domain combination or not, are underlined. These sites are separated by a 6 bp spacer, the sequence of which was conserved within this group. The affinity data are underlined when they are considered to represent cognate interactions for both of the protein domain–operator subsite pairs. The cognate subsites are TTAA and TTTA for the R* domain, TTAC and TTAA for the RTRPS domain, TTGT, TTGA and TTTA for the RTATG domain and TTAC for the RTRES domain (operators containing the TTCC subsite of RTRES are shown in Table VII). Owing to operator symmetry, the corresponding palindromic sequences are shown for the left subsite of the operators.
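The NNNN–6 bp–NNNN operator layout described above can be illustrated with a minimal Python sketch. It assumes that subsites are written in the right-half (TTNN) orientation used in the text, that the left subsite appears in the operator as the reverse complement of that sequence, and that the first-named domain of a protein is paired with the left subsite. The function names, the wild-type R entry and the default GAAAGT spacer (the OR1 spacer) are illustrative choices; the sketch does not reproduce the operators of Table VI.

```python
# Cognate subsites of each domain, written in the right-half orientation.
# The mutant-domain entries follow the list given in the text; the wild-type
# R entry (TTGT, i.e. ACAA on the left) is added here for illustration.
COGNATE_SUBSITES = {
    "R": ["TTGT"],
    "R*": ["TTAA", "TTTA"],
    "RTRPS": ["TTAC", "TTAA"],
    "RTATG": ["TTGT", "TTGA", "TTTA"],
    "RTRES": ["TTAC"],
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_operator(left_subsite, right_subsite, spacer="GAAAGT"):
    """Assemble a 14 bp operator from two 4 bp subsites and a 6 bp spacer.

    Both subsites are given in the right-half orientation; the left one is
    written into the operator as its reverse complement.
    """
    if len(left_subsite) != 4 or len(right_subsite) != 4 or len(spacer) != 6:
        raise ValueError("expected 4 bp subsites and a 6 bp spacer")
    return reverse_complement(left_subsite) + spacer + right_subsite


def candidate_operators(left_domain, right_domain, spacer="GAAAGT"):
    """List operators pairing every cognate subsite of the two domains."""
    return [
        build_operator(left, right, spacer)
        for left in COGNATE_SUBSITES[left_domain]
        for right in COGNATE_SUBSITES[right_domain]
    ]
```

With these assumptions, `build_operator("TTGT", "TTGT")` gives `ACAAGAAAGTTTGT`, the ACAA–GAAAGT–TTGT arrangement of OR1, and `candidate_operators("R*", "RTATG")` enumerates the six subsite pairings for that heterodimer with a fixed spacer.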


Table VI. Summary of the DNA-binding affinities (nM) of mutant homo- or heterodimeric single-chain 434 repressors
 

Table VII. Binding of the RTRESRTRES homodimer to designed operators containing combinations of TTAC and TTCC subsites
 
The data in Table VI show that the subsite recognition preferences of the mutant domains, observed in combination with the wild-type R domain, are generally maintained in novel combinations with other mutant domains. The interpretation of the binding affinity data may be complicated in certain cases when the domains of the double mutant protein have overlapping specificities and a common subsite is present in the test operator. For easier comparison, some data obtained with RRTATG and RRTRPS are also included in Table VI. Comparison of data in the RRTATG and RTATGRTATG columns shows qualitatively similar tendencies. RTATGRTATG also prefers the operators containing two of the RTATG binding subsites over those which contain only one, but this homodimer does not seem to be capable of high-affinity (subnanomolar) interactions. The RTATG domain can, however, participate in higher affinity interactions in the R*RTATG and RTATGRTRPS heterodimers. The highest affinity ligands for R*RTATG are O528c and O538. The data obtained with R*RTATG also reflect the R* domain preference for the TTAA over the TTTA subsite (compare O528c and O528b). The RTATGRTRPS heterodimer bound most strongly to the O550, O568 and O571 operators, which agrees well with the established subsite recognition preferences of both domains and also with the previously observed effects of the 5 or 5' operator base on the binding affinity. Comparison of the data for common ligands in the RRTRPS and RTATGRTRPS columns also shows that the RTRPS domain exhibits similar subsite preferences in these two sc molecules. Both the R*RTRPS heterodimer and the RTRPSRTRPS homodimer were capable of high-affinity binding to operators containing the optimal subsites of the corresponding DBDs (O546A for R*RTRPS and O565 for RTRPSRTRPS).

Generally, the sc molecules containing two mutant DBDs exhibited lower affinities than the corresponding heterodimers containing one wild-type DBD. This was also the case when the RTRES domain was combined with R*, but the RTRESRTRES homodimer exhibited very strong binding to some of the test operators in Table VI. Several operators containing TTAC and/or TTCC subsites were constructed for RTRESRTRES and the affinities are summarized in Table VII. High-affinity binding was observed with operators containing any combination of these subsites (two TTAC or two TTCC or one TTAC and one TTCC) connected by GAAAGT (as in OR1) or GAAAAN type spacers. The data also show previously observed preferences for 5'C over 5'G in both the TTAC and the TTCC operator groups.

The role of the central, non-contacted operator bases in high-affinity binding has also been demonstrated for sc molecules containing two mutant DBDs (Table VI). When the 7' base was changed from A to C, a significant affinity decrease was observed for both hetero- and homodimeric mutants (compare operator pairs O538–O445 and O540–O498 for R*RTATG, R*RTRPS and R*R* interactions).


    Discussion
 
Artificial DNA-binding proteins that recognize long sequences and exhibit novel binding specificities have been constructed mainly from the Cys2His2 type of zinc finger motifs by using design and selection principles (Desjarlais and Berg, 1993; Choo and Klug, 1994; Jamieson et al., 1994; Rebar and Pabo, 1994; Wu et al., 1995; Greisman and Pabo, 1997). In a simplified view, the Zn fingers recognize contiguous 3 bp subsites or overlapping 4 bp subsites; therefore, a three-finger protein can target 9–10 bp long sequences. Recognition of even longer (continuous and/or discontinuous) sequences could be achieved by linking the Zn fingers to other DBDs (Pomerantz et al., 1995; Kim et al., 1997) or by linking two three-finger proteins (Liu et al., 1997; Kim and Pabo, 1998). Other DNA-binding motifs, such as the helix–turn–helix (HTH) motif, have received much less attention.

In this work, we used mutant DBDs of the phage 434 repressor, which belongs to the HTH protein family, as building blocks to construct sc proteins that recognize relatively long (up to 14 bp) DNA sequences with high affinity and specificity. Several sc mutants containing one wild-type and one mutant domain were used to determine the subsite recognition specificities of the mutant DBDs. These sc proteins were shown to recognize the ACAA–6 bp–NNNN general sequence, where the NNNN subsite is contacted by and characteristic of the mutant domain. Four characterized mutant DBDs (R*, RTATG, RTRPS and RTRES) were linked in several combinations to obtain homo- and heterodimeric sc proteins, which were shown to recognize the NNNN–6 bp–NNNN general operator sequence. The mutant DBDs exhibited their subsite specificities in all tested novel combinations, and the strong preference for the 5 or 5' operator base pair (the outer bases of the 6 bp spacer) was also maintained in the RTRPS and RTRES interactions. The two outer pentamers, a total of 10 of the 14 bp, are therefore the major determinants of the sequence specificity. Previous studies also showed that the central four, non-contacted operator base pairs influenced the binding affinities of both the natural (Koudelka et al., 1987) and the sc repressors (Chen et al., 1997), and that the spacers of the affinity-selected ligands generally contained either alternating A–T/T–A pairs or runs of at least three A–T pairs in these positions (Chen et al., 1997). The sc 434 repressors containing two mutant DBDs seem to share these properties: several such homo- and heterodimeric proteins of this study showed significantly weaker binding when one of the central (7 or 7') operator bases was changed from A to C. Thus, all 14 bp of the operator are important in the interaction with mutant sc repressors.
In addition, previous (Chen et al., 1997) and the present binding site selection results revealed that the mutant derivatives also prefer A + T-rich operator flanking regions. DNA recognition by the 434 repressor itself is a result of direct and indirect mechanisms. The double-mutant sc derivatives in this work, like the mutant prototype RR*69 (Chen et al., 1997), seem to combine the characteristic indirect effects with altered-specificity direct readout in their DNA recognition mechanism.
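The division of the 14 bp operator into two outer pentamers and four central base pairs can be made concrete with a short Python sketch. It assumes the position labels 1–7 followed by 7'–1', all read along one strand, and uses the phage 434 OR1 sequence ACAAGAAAGTTTGT as the worked example; the function names are hypothetical and chosen only for this illustration.

```python
# Position labels of a 14 bp operator, read 5' to 3' along one strand.
# Reading primed positions on the same strand is an assumption of this sketch.
POSITION_LABELS = [
    "1", "2", "3", "4", "5", "6", "7",
    "7'", "6'", "5'", "4'", "3'", "2'", "1'",
]


def dissect_operator(operator):
    """Split a 14 bp operator into its outer pentamers and central four bp."""
    if len(operator) != 14:
        raise ValueError("expected a 14 bp operator")
    return {
        "left_pentamer": operator[:5],    # positions 1-5
        "central_four": operator[5:9],    # positions 6, 7, 7', 6'
        "right_pentamer": operator[9:],   # positions 5'-1'
    }


def substitute(operator, label, base):
    """Return the operator with the base at the labeled position replaced."""
    index = POSITION_LABELS.index(label)
    return operator[:index] + base + operator[index + 1:]


OR1 = "ACAAGAAAGTTTGT"
parts = dissect_operator(OR1)
or1_7prime_c = substitute(OR1, "7'", "C")
```

Here `parts` separates the two pentamers (ACAAG and TTTGT) from the central AAAG, and `or1_7prime_c` carries an A to C change at position 7', the type of substitution that weakened binding in the experiments described above. The sketch only manipulates sequences; it makes no prediction about affinity.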

The mutant DBDs, apart from the relaxed-specificity RTATG, were able to form high-affinity sc proteins in combination with the wild-type R domain: the RR*69, RRTRPS and RRTRES heterodimers showed half-maximal binding to their corresponding optimal operators at or under 10 pM concentration, which is comparable to the data observed for the ‘wild-type’ RR69. When the mutant DBDs were combined, the affinities of the double-mutant sc derivatives were generally in the range 100–200 pM. These affinities are lower than expected and, of the tested combinations, only the homodimeric RTRESRTRES exhibited high-affinity binding (5–20 pM). The wild-type 434 DBD–operator complexes show a network of interactions besides the direct contacts between the α3 ‘recognition’ helix and the operator subsites and suggest important protein–protein interactions between the DBDs (Aggarwal et al., 1988; Rodgers and Harrison, 1993; Shimon and Harrison, 1993). The mutations introduced into the α3 helix may differentially influence these interactions, including those at the interface of the DBDs, and thereby the cooperativity of DBDs in DNA binding. It is reasonable to suppose that different DBD pairs cooperate to different extents and that the flexibilities of the different test operators also influence the cooperative binding process. Owing to such effects, quantitative interpretations of the binding data are complicated. Nevertheless, this work shows that altered-specificity mutant DBDs of the 434 repressor can be combined in the sc arrangement to engineer extended recognition surfaces of expected, novel specificities.
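To put the affinity differences quoted above on an energy scale, the following Python sketch converts a ratio of dissociation constants into a binding free energy difference using the standard relation ΔΔG = RT ln(Kd,weak/Kd,strong). It assumes that the half-maximal binding concentrations approximate Kd values and that the temperature is 25 °C (298.15 K); neither is stated in this section, so the numbers are only indicative.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal/(mol K)


def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from Kd in M: dG = RT ln(Kd)."""
    return GAS_CONSTANT_KCAL * temp_kelvin * math.log(kd_molar)


def free_energy_difference(kd_weak, kd_strong, temp_kelvin=298.15):
    """ddG (kcal/mol) lost on going from the stronger to the weaker complex."""
    return GAS_CONSTANT_KCAL * temp_kelvin * math.log(kd_weak / kd_strong)


# Double-mutant range (100-200 pM) compared with ~10 pM for the heterodimers
# that contain one wild-type domain; the 298.15 K temperature is assumed.
ddg_low = free_energy_difference(100e-12, 10e-12)
ddg_high = free_energy_difference(200e-12, 10e-12)
```

Under these assumptions, the 10- to 20-fold weaker binding of the double mutants corresponds to roughly 1.4–1.8 kcal/mol, which is of the order of a single favourable contact and is therefore compatible with the subtle cooperativity effects discussed above.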


    Notes
 
3 To whom correspondence should be addressed. E-mail: simoncs@icgeb.trieste.it


    Acknowledgments
 
We thank I.Törö and C.Guarnaccia for help with protein purifications. T.L. and J.C. were supported by fellowships from ICGEB and SISSA, respectively. M.-L.T. and A.S. are on leave from The Arrhenius Laboratories for Natural Sciences, Stockholm University, Stockholm, Sweden.


    References
 
Aggarwal,A.K., Rodgers,D.W., Drottar,M., Ptashne,M. and Harrison,S.C. (1988) Science, 242, 899–907.

Baumeister,R., Helbl,V. and Hillen,W. (1992) J. Mol. Biol., 226, 1257–1270.

Chen,J., Pongor,S. and Simoncsits,A. (1997) Nucleic Acids Res., 25, 2047–2054.

Choo,Y. and Klug,A. (1994) Proc. Natl Acad. Sci. USA, 91, 11168–11172.

Desjarlais,J.R. and Berg,J.M. (1993) Proc. Natl Acad. Sci. USA, 90, 2256–2260.

Ebright,R.H. (1991) Methods Enzymol., 208, 620–640.

Gates,C.M., Stemmer,W.P.C., Kaptein,R. and Schatz,P.J. (1996) J. Mol. Biol., 255, 373–386.

Gill,S.C. and von Hippel,P.H. (1989) Anal. Biochem., 182, 319–326.

Greisman,H.A. and Pabo,C.O. (1997) Science, 275, 657–661.

Hollis,M., Valenzuela,D., Pioli,D., Wharton,R. and Ptashne,M. (1988) Proc. Natl Acad. Sci. USA, 85, 5834–5838.

Jamieson,A.C., Kim,S.H. and Wells,J.A. (1994) Biochemistry, 33, 5689–5695.

Jana,R., Hazbun,T.R., Fields,J.D. and Mossing,M.C. (1998) Biochemistry, 37, 6446–6455.

Kim,J.-S. and Pabo,C.O. (1998) Proc. Natl Acad. Sci. USA, 95, 2812–2817.

Kim,J.-S., Kim,J., Cepek,K.L., Sharp,P.A. and Pabo,C.O. (1997) Proc. Natl Acad. Sci. USA, 94, 3616–3620.

Koudelka,G.B., Harrison,S.C. and Ptashne,M. (1987) Nature, 326, 886–888.

Kuntz,M.A. and Shapiro,D.J. (1997) J. Biol. Chem., 272, 27949–27956.

Liang,H., Sandberg,W.S. and Terwilliger,T.C. (1993) Proc. Natl Acad. Sci. USA, 90, 7010–7014.

Liu,Q., Segal,D.J., Ghiara,J.B. and Barbas,C.F.,III. (1997) Proc. Natl Acad. Sci. USA, 94, 5525–5530.

Mandel-Gutfreund,Y., Schueler,O. and Margalit,H. (1995) J. Mol. Biol., 253, 370–382.

Pabo,C.O. and Sauer,R.T. (1992) Annu. Rev. Biochem., 61, 1053–1095.

Percipalle,P., Simoncsits,A., Zakhariev,S., Guarnaccia,C., Sanchez,R. and Pongor,S. (1995) EMBO J., 14, 3200–3205.

Pomerantz,J.L., Sharp,P.A. and Pabo,C.O. (1995) Science, 267, 93–96.

Rebar,E.J. and Pabo,C.O. (1994) Science, 263, 671–673.

Rhodes,D. and Fairall,L. (1997) In Creighton,T.E. (ed.), Protein Function: a Practical Approach. 2nd edn. IRL Press, Oxford, pp. 215–244.

Robinson,C.R. and Sauer,R.T. (1996a) Biochemistry, 35, 109–116.

Robinson,C.R. and Sauer,R.T. (1996b) Biochemistry, 35, 13878–13884.

Robinson,C.R. and Sauer,R.T. (1998) Proc. Natl Acad. Sci. USA, 95, 5929–5934.

Rodgers,D.W. and Harrison,S.C. (1993) Structure, 1, 227–240.

Ruiz-Sanz,J., Simoncsits,A., Törö,I., Pongor,S., Mateo,P.L. and Filimonov,V.V. (1999) Eur. J. Biochem., 263, 246–253.

Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Proc. Natl Acad. Sci. USA, 73, 804–808.

Shimon,L.J. and Harrison,S.C. (1993) J. Mol. Biol., 232, 826–838.

Sieber,M. and Alleman,R.K. (1998) Nucleic Acids Res., 26, 1408–1413.

Simoncsits,A., Chen,J., Percipalle,P., Wang,S., Törö,I. and Pongor,S. (1997) J. Mol. Biol., 267, 118–131.

Simoncsits,A., Tjörnhammar,M.-L., Wang,S. and Pongor,S. (1999) Nucleic Acids Res., 27, 3474–3480.

Suzuki,M. (1994) Structure, 2, 317–326.

Wharton,R.P. and Ptashne,M. (1985) Nature, 316, 601–605.

Wu,H., Yang,W.P. and Barbas,C.F.,III. (1995) Proc. Natl Acad. Sci. USA, 92, 344–348.

Received June 8, 2000; revised April 17, 2001; accepted May 14, 2001.




