Identification of Oligomerizing Peptides*

Anjali Dhiman, Michael E. Rodgers, and Robert Schleif‡

From the Biology Department, Johns Hopkins University, Baltimore, Maryland 21218

Received for publication, March 13, 2001

    ABSTRACT

The AraC DNA binding domain is inactive in a monomeric form but can activate transcription from the arabinose operon promoters upon its dimerization. We used this property to identify plasmids encoding peptide additions to the AraC DNA binding domain that could dimerize the domain. We generated a high-diversity library of plasmids by inserting 90-base oligonucleotides of random sequence ahead of DNA coding for the AraC DNA binding domain in an expression vector, transforming, and selecting colonies containing functional oligomeric peptide-AraC DNA binding domain chimeric proteins by their growth on minimal arabinose medium. Six of seven Ara+ candidates were partially characterized, and one was purified. Equilibrium analytical centrifugation experiments showed that it dimerizes with a dissociation constant of ~2 µM.

    INTRODUCTION

Coiled-coils about 40 residues long seem to be the smallest naturally occurring oligomerization domains yet found in proteins. Does the absence of smaller oligomerization elements mean that none are possible or that natural selective pressures for such elements have not existed? If very strong selective pressures were exerted, might new and unique structural motifs be found? And if they were, would the resulting information prove useful in understanding protein structure and for protein engineering?

The identification and genetic selection of oligomerizing elements is possible with systems in which the monomeric form of a protein or protein domain is inactive but the dimeric or oligomeric form is active. Transcription regulators are particularly attractive for this application. Binding of their monomeric forms to DNA is often weak, and binding at physiological protein concentrations then requires dimeric protein and correspondingly repeated DNA sites. These systems also have the virtue that they can be adjusted to select for strong or weak dimerizing abilities of the monomers. If the affinity of the monomeric protein for its DNA site is relatively high, the active protein need not be a dimer in solution: monomers can occasionally bind independently to the DNA and dimerize there, using the additional interaction energy between the two monomers. The dimer thus formed remains on the DNA significantly longer than either monomer alone. If, on the other hand, the DNA binding affinity of the monomers is relatively low, the interaction energy between the monomers of the active protein must be higher, and the protein may well exist as a dimer in solution at physiological concentrations.

We have used the arabinose system of Escherichia coli to select dimerizing peptides (Fig. 1). Monomers of the AraC DNA binding domain cannot stimulate transcription of the genes required for L-arabinose catabolism, but attaching a 49-amino acid coiled-coil from C/EBP¹ converts the domain to a fully active form (1). We sought to isolate 30-amino acid peptides that dimerize at physiological concentrations. Therefore, in the C/EBP coiled-coil-AraC DNA binding domain construct, we replaced the region coding for the coiled-coil with 90 bases of random-sequence DNA and selected for products capable of activating transcription of the arabinose genes. While this work was in progress, three reports of similar selections appeared. Two of these used λ phage repressor (2, 3) and reported the genetic characterization of candidate elements encoded by fragments of cloned DNA. The third (4) fused random-sequence DNA coding for 15 amino acids to DNA coding for zinc finger domains, carried out a selection, and, after further improving dimerization, examined dimerization by centrifugation. In our work, candidate peptides 6 to 32 residues long were identified. We purified the chimeric peptide-AraC DNA binding domain product of a candidate with a 22-amino acid dimerizing peptide and found that it dimerized with a dissociation constant in the micromolar range.


Fig. 1.   Scheme for selecting dimerizing peptides. The AraC DNA binding domain was fused to a 30-residue peptide of random sequence to generate the peptide-AraC DNA binding domain protein chimera.


    EXPERIMENTAL PROCEDURES

General Methods-- Oligonucleotides encoding peptides of random sequence were cloned into the NcoI and BamHI sites of pGBO10, a previously constructed (1) derivative of pSE380 (Invitrogen, San Diego, CA) that contains the region coding for residues 169-292 of AraC (the DNA binding domain), and were transformed into SH321 (Δara-leu1022 Δlac74 galK strR thi1) (5). DNA inserted in the NcoI-BamHI region is transcribed under control of the lac regulatory system. A stop codon had previously been inserted at the end of the AraC DNA binding domain region of pGBO10 (1). Plasmid DNA was prepared using cesium chloride as described by Schleif and Wensink (6). Another pSE380-derived vector, pGBOO7 (1), which contains an in-frame coding region for the AraC DNA binding domain adjacent to the lac promoter of pSE380, was used as the AraC DNA binding domain control construct.

Arabinose isomerase levels were assayed as described (6). Cells were grown to an A600 of 0.5-0.8 in M10 minimal salts, 0.2% L-arabinose, 20 µg/ml leucine, 10 µg/ml thiamine, 20 µM CaCl2, and 10 µM MgCl2, and 1 ml was withdrawn. All manipulations at the DNA level were done by conventional molecular biology techniques. All candidates were sequenced using the SequiTherm EXCEL™ II DNA sequencing kit from Epicentre Technologies.

Construction, Isolation, and Characterization of Ara+ Candidates-- The sequence of the dimerization domain of the full-length leucine zipper-AraC DNA binding domain fusion construct is MAKQRNVETQQKVELTSDNDRLRKRVEQLSRELDTLRGIFRQLPESSL; the dimerization domain of the minimal leucine zipper-AraC DNA binding domain construct is its central segment, ELTSDNDRLRKRVEQLSRELDTLRGIFRQL. The amino acid sequence of the AraC linker region is ESLHPPMDNRV.

For the minimal leucine zipper-AraC DNA binding domain construct, an oligonucleotide, R1, with the following sequence was synthesized: CAGGAAACAGACCATGGAGTTGACCAGTGACAATGACCGCCTGCGCAAGCGGGTGGAACAGCTGAGCCGTGAACTGGACACGCTGCGGGGTATCTTCCGCCAGCTGGGATCCGAGTCGCTCCAT (containing the NcoI site, CCATGG, and the BamHI site, GGATCC). For generating the high-diversity library of peptide-AraC DNA binding domain constructs, a DNA oligonucleotide (R2) with the sequence CAGGAAACAGACCATGGAG(NNK)30GGATCCGAGTCGCTCCAT (flanked by the NcoI and BamHI cloning sites; the randomized region contains 30 repeats of NNK, where N represents A/T/G/C and K represents G/T) was synthesized at the 200 nM level by Integrated DNA Technologies Co. An oligonucleotide (R3), ATGGAGCGACTCGGATCC, complementary to the 3' end of R1 and R2, was also used.
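The oligonucleotide design can be checked mechanically. The Python sketch below is our own illustration, not the authors' procedure: the helper names (`reverse_complement`, `translate`, `check_design`) are hypothetical, it assumes the standard genetic code, and it reads R1 from the ATG embedded in the NcoI site. It checks that R3 pairs with the shared 3' end of R1 and R2, that R2 carries 90 randomized positions, and that R1 encodes the central 30-residue zipper segment followed by Gly-Ser from the BamHI site and the first residues of the AraC linker.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NCOI, BAMHI = "CCATGG", "GGATCC"

# Oligonucleotides as printed in the text; R2 is kept as a degenerate template.
R1 = ("CAGGAAACAGACCATGGAGTTGACCAGTGACAATGACCGCCTGCGCAAGCGGGTGGAACAG"
      "CTGAGCCGTGAACTGGACACGCTGCGGGGTATCTTCCGCCAGCTGGGATCCGAGTCGCTCCAT")
R2_TEMPLATE = "CAGGAAACAGACCATGGAG" + "NNK" * 30 + "GGATCCGAGTCGCTCCAT"
R3 = "ATGGAGCGACTCGGATCC"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate in frame 0, dropping any trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def check_design() -> dict:
    start = R1.find(NCOI) + 2  # the ATG embedded in the NcoI site
    primer_site = reverse_complement(R3)
    return {
        "r1_has_both_sites": NCOI in R1 and BAMHI in R1,
        "r2_has_both_sites": NCOI in R2_TEMPLATE and BAMHI in R2_TEMPLATE,
        "r2_randomized_positions": sum(R2_TEMPLATE.count(b) for b in "NK"),
        # R3 must pair with the shared 3' end of R1 and R2 to prime fill-in.
        "r3_pairs_with_r1": R1.endswith(primer_site),
        "r3_pairs_with_r2": R2_TEMPLATE.endswith(primer_site),
        "r1_peptide": translate(R1[start:]),
    }


if __name__ == "__main__":
    for name, value in check_design().items():
        print(f"{name}: {value}")
```

For the sequences as printed, the translated R1 product reads MELTSDNDRLRKRVEQLSRELDTLRGIFRQLGSESLH.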

For synthesis of the double-stranded insert, equimolar amounts of either R1 or R2 were mixed with R3 in buffer containing 50 mM KCl, 20 mM Tris-Cl (pH 8.3), 1.5 mM MgCl2, 0.01% gelatin, and 0.2 mM each dNTP. 0.2 units of Taq polymerase was added, and annealing and extension were performed with the following cycling parameters: 95 °C for 5 min followed by a 1 °C per 45 s drop in temperature to 25 °C and a final extension at 75 °C for 5 min. The double-stranded nature of the oligonucleotide was verified by polyacrylamide gel electrophoresis.
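The duration of the annealing program follows from simple arithmetic: a 70 °C drop at 1 °C per 45 s takes 52.5 min, whether it is run as discrete holds or as a continuous ramp. The sketch below totals the program; the helper names are hypothetical, and it ignores the time the instrument needs to move between holds and to reach 75 °C for the final extension.

```python
def stepped_ramp_minutes(start_c: float, end_c: float, step_c: float, hold_s: float) -> float:
    """Minutes spent in a cooling ramp of step_c degrees per hold_s seconds."""
    n_steps = round((start_c - end_c) / step_c)
    return n_steps * hold_s / 60.0


def annealing_program_minutes() -> dict:
    # Values from the text; transition times between holds are ignored (assumption).
    denature = 5.0                               # 95 °C for 5 min
    ramp = stepped_ramp_minutes(95, 25, 1, 45)   # 1 °C per 45 s down to 25 °C
    extension = 5.0                              # 75 °C for 5 min
    return {"ramp": ramp, "total": denature + ramp + extension}


if __name__ == "__main__":
    print(annealing_program_minutes())
```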

The double-stranded oligonucleotide of random sequence was treated with 5 µg/ml proteinase K in 0.01 M Tris-Cl (pH 7.8), 5 mM EDTA, and 0.5% SDS at 56 °C for 30 min. The sample was extracted with an equal volume of phenol, ethanol-precipitated, digested with BamHI and NcoI, and electrophoresed on a 0.8% agarose gel. The doubly digested fragment was purified from the gel using the Qiagen gel extraction kit, cloned into the NcoI and BamHI sites of pGBO10, and electroporated into SH321 host cells to obtain the randomized peptide-AraC DNA binding domain library.

Cells containing the randomized peptide library were plated on minimal salts, 2 g/liter L-arabinose, 20 µg/ml leucine, 10 µg/ml thiamine, 20 µM CaCl2, 10 µM MgCl2 and incubated at 37 °C for up to 3 days. Colonies that grew on minimal arabinose medium were isolated, and plasmid DNA was extracted and sequenced. Transcriptional activation by each isolated fusion protein was quantitated by assaying the level of arabinose isomerase produced from the chromosomal copy of the pBAD operon in SH321.

DNA Migration Retardation Assays-- DNA migration retardation assays were performed as described previously (7). End-labeled I1I2 DNA template, containing the two ara half-sites, was prepared by mixing equimolar amounts of CTTTGCTAGCCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGC and the complementary strand, GCGATAAAAAGCGTCAGGTAGGATCCGCTAATCTTATGGATAAAAATGCTATGGGCTAGCAAAG, in 10 mM Tris-Cl (pH 8.0), 1 mM EDTA (pH 8.0), 5 mM MgCl2, and 50 mM KCl, heating for 10 min at 94 °C, and cooling slowly to room temperature over 1 h.
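Because the I1I2 probe is made by annealing two synthetic strands, it is worth confirming that the printed strands pair over their full length. The sketch below uses a helper of our own, `duplex_mismatches` (hypothetical, not part of the original methods), which reports the positions along the top strand where the bottom strand fails to pair; for the sequences as printed, the list is empty.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Strands of the I1I2 probe as printed in the text.
TOP = "CTTTGCTAGCCCATAGCATTTTTATCCATAAGATTAGCGGATCCTACCTGACGCTTTTTATCGC"
BOTTOM = "GCGATAAAAAGCGTCAGGTAGGATCCGCTAATCTTATGGATAAAAATGCTATGGGCTAGCAAAG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_mismatches(top: str, bottom: str) -> list:
    """0-based positions along `top` where `bottom` (written 5'->3') fails to pair."""
    if len(top) != len(bottom):
        raise ValueError("strands differ in length")
    expected = reverse_complement(top)
    # Index i on the bottom strand pairs with index len(top) - 1 - i on the top strand.
    return [len(top) - 1 - i for i, (a, b) in enumerate(zip(expected, bottom)) if a != b]


if __name__ == "__main__":
    print(len(TOP), len(BOTTOM), duplex_mismatches(TOP, BOTTOM))
```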

Crude cell lysates for each Ara+ candidate were prepared from 1.5 ml of exponentially growing cells at an A600 of 0.7 in YT medium (6). After centrifugation, the cell pellet was resuspended in 0.3 ml of 100 mM KH2PO4 (pH 7.4), 50 mM KCl, 10% glycerol, 1 mM K-EDTA, 1 mM dithiothreitol, 1 mM phenylmethylsulfonyl fluoride and lysed by sonication. The sample was centrifuged at 8500 × g for 10 min at 4 °C, the pellet was discarded, and 60 µl of 100% glycerol was added to the supernatant. The cell lysate was stored at -70 °C and used for up to ~3 weeks. For each binding reaction, just enough lysate was added to bind essentially 100% of 1 ng (~10⁴ cpm) of end-labeled DNA. Binding of the Ara+ candidate proteins was allowed to proceed at 37 °C in 10 mM Tris acetate (pH 7.4), 1 mM EDTA, 50 mM KCl, 1 mM dithiothreitol, 5% glycerol for 10 min, after which 1.5 µg of calf thymus competitor DNA was added for another 10 min. Bound and free DNA were separated on a nondenaturing 6% polyacrylamide gel cross-linked with 0.1% methylene-bisacrylamide.

Protein Purification-- The hexahistidine tag was introduced into candidate 1 using the QuikChange™ protocol of Stratagene. His6-candidate 1 was purified following the QIAexpressionist™ protocol of Qiagen. One liter of cell culture was grown in YT medium to an A600 of 0.4 at 37 °C, and protein expression was induced for 5 h with 1 mM isopropyl-1-thio-β-D-galactopyranoside. Cells were harvested by centrifugation at 7000 × g at 4 °C for 10 min and lysed on ice by grinding 1 part (by weight) of cells with 2.5 parts of levigated alumina to a smooth consistency for 5 min. The lysate was then incubated on ice for 30 min with lysis buffer containing 15 mM Tris-Cl (pH 8.0), 100 mM NaCl, 5% glycerol, 1 mM β-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride, 10 µg/ml RNase, and 5 µg/ml DNase. After centrifugation at 3000 × g for 10 min, the supernatant was incubated with 4 ml of preequilibrated Ni2+-nitrilotriacetic acid-Sepharose beads with gentle rocking for 6-12 h at 4 °C. The sample was spun at low speed (<1000 × g), and the beads were washed 10 times with at least 10 volumes each of lysis buffer containing 10 mM imidazole. The protein was eluted from the beads with an imidazole gradient, and 0.5-ml fractions were collected. The protein concentration of each fraction was determined by the Bradford assay (8); fractions with the highest protein concentrations were pooled and analyzed for purity on a 14% SDS-polyacrylamide gel.

Equilibrium Centrifugation-- An aliquot of 0.1 mg/ml purified candidate 1 was dialyzed against 0.1 M NaCl, 15 mM Tris-Cl (pH 8.0), 1 mM K-EDTA (pH 7.0), 5% glycerol. Three dilutions of the sample were loaded into standard double-sector cells with charcoal-filled Epon centerpieces and quartz windows. On the sample side of each cell, 112 µl of sample was loaded with 12 µl of fluorocarbon-43; on the solvent side, 125 µl of the dialysis buffer was loaded. The samples were centrifuged in a Beckman XL-I analytical ultracentrifuge at 20 °C at 34,000, 41,000, 48,000, and 54,000 rpm. At each speed, equilibrium was assumed when successive scans taken 3 h apart were unchanged. Because absorbance readings at 280 nm were very low, data were collected at 230 nm. The partial specific volume of the monomer, v̄, was calculated as 0.7236 ml/g, and the solvent density, ρ, was estimated as 1.01728 g/ml, using the program SEDNTERP (9). The monomer molecular weight, M1, was calculated from the reduced molecular weight, σ, and the rotor angular velocity, ω (in radians per second), according to

$$M_1 = \frac{\sigma R T}{(1 - \bar{v}\rho)\,\omega^{2}}$$

The molar extinction coefficient at 230 nm, ε230, was assumed to be 50,880 M⁻¹ cm⁻¹. The Kd value depends on ε230 (the association constant Ka scales linearly with it), whereas the molecular weight value is insensitive to ε230.
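The relation between the reduced molecular weight and the monomer molecular weight can be sketched as follows, under stated assumptions: R is taken in cgs units (8.314 × 10⁷ erg mol⁻¹ K⁻¹) so that σ is in cm⁻², T is 293.15 K (20 °C), and v̄ and ρ are the SEDNTERP values quoted above. Because no fitted σ values are given in the text, the example works backward from the reported 17.9-kDa monomer mass to the σ it would produce at each rotor speed and then recovers M1. The function names are hypothetical.

```python
import math

R_CGS = 8.314e7  # gas constant, erg mol^-1 K^-1; with omega in rad/s, sigma is in cm^-2


def omega_rad_per_s(rpm: float) -> float:
    """Convert rotor speed from rpm to angular velocity in rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def buoyancy_term(vbar: float = 0.7236, rho: float = 1.01728) -> float:
    """(1 - vbar*rho); defaults are the SEDNTERP values quoted in the text."""
    return 1.0 - vbar * rho


def monomer_mw(sigma: float, rpm: float, temp_c: float = 20.0) -> float:
    """M1 = sigma*R*T / ((1 - vbar*rho) * omega^2), in g/mol."""
    T = temp_c + 273.15
    return sigma * R_CGS * T / (buoyancy_term() * omega_rad_per_s(rpm) ** 2)


def reduced_mw(mw: float, rpm: float, temp_c: float = 20.0) -> float:
    """Inverse relation: sigma expected for a species of molecular weight mw."""
    T = temp_c + 273.15
    return mw * buoyancy_term() * omega_rad_per_s(rpm) ** 2 / (R_CGS * T)


if __name__ == "__main__":
    # Illustration only: sigma values a 17.9-kDa monomer would give at each speed.
    for rpm in (34_000, 41_000, 48_000, 54_000):
        s = reduced_mw(17_900, rpm)
        print(f"{rpm} rpm: sigma = {s:.2f} cm^-2 -> M1 = {monomer_mw(s, rpm):.0f} g/mol")
```

Since σ grows with ω², the faster speeds give steeper concentration gradients for the same species, which is why fitting several speeds together constrains M1 well.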

All data sets were analyzed simultaneously using the global nonlinear least-squares program NONLIN (10). Although the variance of a single-species fit was adequate (2.31 × 10⁻⁵), fitting the data to a monomer-dimer model improved the variance (2.13 × 10⁻⁵), and the residuals of the monomer-dimer fit were uniformly scattered. The estimated monomer molecular mass was 17.9 kDa (16.2, 18.4). This value does not depend on the wavelength used to collect the data, provided that oligomerization does not alter ε. Estimating Kd in conventional concentration units is more difficult, because it requires knowledge of the extinction coefficient. A close approximation to ε280 is readily calculated from the amino acid composition (11); for candidate 1, it is 8480 M⁻¹ cm⁻¹. ε230 can be estimated from this value together with an experimentally measured A230/A280 ratio, but the protein concentrations were too low to measure this ratio accurately. We therefore assumed ε230 = 6 × ε280, an average value measured for several other proteins recently studied here in the analytical ultracentrifuge. Because Ka is linearly proportional to ε230 for a monomer-dimer system, its accuracy is limited by this assumption; we estimate Ka to be within a factor of 2 of the value reported here. The best-fit Ka in absorbance units was 49 OD⁻¹.
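The extinction-coefficient arithmetic above can be written out explicitly. The sketch uses the Pace et al. (11) residue values at 280 nm (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹); the composition of 1 Trp and 2 Tyr is a hypothetical one that happens to reproduce 8480 M⁻¹ cm⁻¹ and is not read from the candidate 1 sequence. The `ka_molar` conversion is our assumption about how an absorbance-scale constant maps to molar units (dimer absorbance taken as twice the monomer's, 1.2-cm optical path); NONLIN's internal convention may differ, so the function illustrates the linear dependence of Ka on ε230 rather than reproducing the reported Kd.

```python
# Molar absorption coefficients at 280 nm from Pace et al. (1995), M^-1 cm^-1.
EPS280_TRP = 5500
EPS280_TYR = 1490
EPS280_CYSTINE = 125


def epsilon_280(n_trp: int, n_tyr: int, n_cystine: int = 0) -> int:
    """Estimate epsilon_280 of a folded protein from its composition."""
    return n_trp * EPS280_TRP + n_tyr * EPS280_TYR + n_cystine * EPS280_CYSTINE


def epsilon_230(eps280: float, ratio: float = 6.0) -> float:
    """epsilon_230 from epsilon_280 via an assumed A230/A280 ratio (6 in the text)."""
    return ratio * eps280


def ka_molar(ka_od: float, eps_monomer: float, path_cm: float = 1.2) -> float:
    """Monomer-dimer Ka in M^-1 from Ka in OD^-1 (assumed convention).

    With A_monomer = eps*l*[M] and A_dimer = 2*eps*l*[D],
    Ka = [D]/[M]^2 = Ka_OD * eps * l / 2, which is linear in eps.
    """
    return ka_od * eps_monomer * path_cm / 2.0


if __name__ == "__main__":
    eps280 = epsilon_280(n_trp=1, n_tyr=2)  # hypothetical composition
    eps230 = epsilon_230(eps280)
    print(eps280, eps230)
    # Doubling the assumed extinction coefficient doubles Ka (and halves Kd).
    print(ka_molar(49.0, 2 * eps230) / ka_molar(49.0, eps230))
```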

    RESULTS

Development of Selection for Oligomerization-- Appending the 49-amino acid leucine zipper coiled-coil from C/EBP to the DNA binding domain of AraC yields an active protein (1). In this work, we sought peptide sequences of 30 amino acids or fewer that would similarly dimerize the AraC DNA binding domain. To optimize the genetic engineering and selection steps, we first worked with DNA coding for the central 30 amino acids of the C/EBP coiled-coil. This peptide dimerized the AraC DNA binding domain sufficiently for Ara+ colonies containing the chimeric protein to be selected (Table I). Transformants containing the truncated leucine zipper-AraC DNA binding domain fusion grew at rates comparable with those containing the full-length C/EBP leucine zipper fusion. The truncated leucine zipper presumably dimerized poorly, however, because, unlike the full-length zipper fusion, it showed no DNA binding activity in cell extracts.

                              
Table I
Characteristics of proteins used as controls

To identify new oligomeric peptides, we replaced the truncated C/EBP leucine zipper with a 30-residue peptide of random sequence by ligating a DNA fragment containing 90 bases of nearly random sequence into plasmid DNA carrying the AraC DNA binding domain. Had all four nucleotides been incorporated at random into the 90 bases, the probability that any codon is a chain terminator would have been 3/64. Using DNA in which every third nucleotide was only G or T reduced this probability somewhat, to 1/32. The resulting plasmids were electroporated into AraC⁻ cells, and Ara+ colonies were selected by growth on minimal arabinose medium. From an estimated 10,000 transformants, we found seven Ara+ candidates (Table II). The AraC DNA binding domain activates pBAD only about 5% as well as wild-type AraC, and cells containing the domain instead of wild-type AraC cannot grow into colonies on minimal arabinose plates within 5 days. All of the candidates we found grow into colonies in 3 days or less.
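The stop-codon probabilities quoted above can be verified by enumerating codons. The sketch assumes equimolar incorporation at every N and K position, the standard genetic code, and no synthesis errors; the helper names are hypothetical. It also estimates the fraction of 30-codon inserts free of stop codons and, as an idealized figure, how many of ~10,000 transformants that would correspond to.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

NNN = ["".join(c) for c in product("ACGT", repeat=3)]      # fully random codons
NNK = ["".join(c) for c in product("ACGT", "ACGT", "GT")]  # third base G or T only


def stop_probability(codons: list) -> Fraction:
    """Fraction of equally likely codons that are chain terminators."""
    return Fraction(sum(CODON_TABLE[c] == "*" for c in codons), len(codons))


def open_reading_probability(p_stop: Fraction, n_codons: int = 30) -> float:
    """Probability that n random codons contain no stop codon."""
    return float((1 - p_stop) ** n_codons)


if __name__ == "__main__":
    for name, codons in (("NNN", NNN), ("NNK", NNK)):
        p = stop_probability(codons)
        q = open_reading_probability(p)
        # Expected stop-free inserts among ~10,000 transformants (idealized).
        print(f"{name}: P(stop) = {p}, P(no stop in 30) = {q:.3f}, ~{10_000 * q:.0f} of 10,000")
    print(f"distinct NNK inserts: 32**30 = {32 ** 30:.2e}")
```

Under these assumptions about 39% of NNK inserts (versus about 24% of NNN inserts) lack a stop codon, and the 32³⁰ ≈ 1.4 × 10⁴⁵ possible NNK inserts vastly exceed the library size, so the selection sampled only a tiny fraction of sequence space.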

                              
Table II
Characteristics of the Ara+ candidates

Sequence of Candidates-- The DNA from the Ara+ candidates was sequenced, yielding the amino acid sequences of the presumed dimerizing peptides (Table III). Several candidates contain nonsense codons within the 30-residue peptide region. Because these candidates nevertheless express the AraC DNA binding domain, we expect that translation reinitiates at a start codon close to the stop codon (12). We tested this idea on candidate 1 by deleting the region upstream of the nonsense codon and presumptive start codon (Fig. 2); the resulting construct activated transcription from pBAD as well as the original. We attribute peptides longer than 30 residues to extra nucleotides mistakenly incorporated during chemical synthesis of the DNA oligomer. Such oligonucleotides were possibly present only at low levels in the synthesized DNA, with the selection enriching for longer elements.

                              
Table III
Amino acid sequence of the inserted peptides of the Ara+ candidates


Fig. 2.   Sequence of candidate 1. Bars above and below the sequence indicate the codons read when the original or the new translational start site, respectively, is used.

If the majority of the candidate oligomerizing peptides utilized a unique structure, we might expect to find an unusual distribution of amino acid content. Fig. 3 shows the amino acid composition expected from our "random" oligonucleotides and the amino acid composition found, excluding the initiating methionines. Ala, Phe, Met, and Asn are overrepresented, and Pro and Thr are underrepresented.
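The expected composition follows from the NNK codon set: each amino acid's expected frequency is its number of NNK codons divided by the 31 sense codons (Leu, Ser, and Arg have three; Ala, Gly, Pro, Thr, and Val have two; the remaining twelve have one). The sketch below assumes equimolar N and K positions and the standard genetic code, and its helper name is hypothetical; the text does not state exactly how the expected bars in Fig. 3 were computed.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def nnk_expected_frequencies() -> dict:
    """Expected amino acid frequencies for equimolar NNK codons, stops excluded."""
    encoded = [CODON_TABLE["".join(c)] for c in product("ACGT", "ACGT", "GT")]
    sense = [aa for aa in encoded if aa != "*"]
    counts = Counter(sense)
    return {aa: n / len(sense) for aa, n in sorted(counts.items())}


if __name__ == "__main__":
    for aa, f in nnk_expected_frequencies().items():
        # 151 residues (initiating Met excluded) is the total used in the Discussion.
        print(f"{aa}: {100 * f:.1f}%  (~{151 * f:.1f} of 151 residues)")
```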


Fig. 3.   Comparison of observed (dark shaded bar) and expected (lighter bar) frequencies of occurrence of amino acid residues in the peptide regions of the Ara+ candidates. The translation initiating methionine was not included in the analysis.

Overexpression of Candidates-- The peptide-AraC DNA binding domain fusion proteins should be highly expressed, because their synthesis is under control of a strong promoter and the coding region is preceded by a strong ribosome binding site. To assess the protease sensitivity and stability of the constructs, we examined fusion protein levels in crude cell extracts of the candidates. Candidate 3 overexpressed three truncated protein fragments instead of the expected 18-kDa protein, suggesting cleavage by a cellular protease. Apart from candidate 1 after addition of a hexahistidine tag, none of the candidates overexpressed a stable peptide-AraC DNA binding domain protein.

DNA Binding of the Chimeras-- Although none of the Ara+ candidates dramatically overexpressed stable protein, their DNA binding activity could be measured in cell extracts, because this assay requires only very small amounts of protein. The assay also allows estimation of the molecular weight of the DNA-binding protein and, hence, detection of several of the more likely artifacts that could masquerade as dimerizing peptides. Using whole cell extracts, we examined binding of all candidates except candidate 7 to the I1I2 DNA template (Fig. 4). No DNA binding activity was observed for the AraC DNA binding domain itself. The truncated leucine zipper-AraC DNA binding domain protein activates transcription to 90% of wild-type AraC levels in vivo but does not bind DNA in vitro. Candidate 2 also showed no DNA binding, whereas the other five Ara+ candidates tested bound stably to DNA in vitro.


Fig. 4.   In vitro DNA migration retardation assays assessing binding of the Ara+ candidates to DNA containing the AraC I1I2 binding site. 32P-end-labeled DNA was incubated with 3 µl of protein lysate. Control proteins were the 60-kDa full-length dimeric AraC protein and the 39-kDa full-length C/EBP leucine zipper-AraC DNA binding domain fusion protein.

In theory, a short peptide could oligomerize the AraC DNA binding domains by fortuitous association with an oligomeric protein. Such a possibility appears likely, since the interaction of two short peptides to form a stable oligomer seems difficult, whereas peptide-domain interactions are not infrequent (13). This association would create a high molecular weight protein complex that would cause an anomalously large shift of the DNA template in DNA migration retardation assays. Comparison of the DNA retardation rates of the candidates with the DNA retardation rates of other control proteins indicated, however, that none are associated with auxiliary proteins (Fig. 4).

Purification and Sedimentation-- We chose candidates 1 and 2 for further analysis and inserted hexahistidine tags at their N termini for Ni2+-His6 affinity purification. The first candidate then showed overexpression of a full-length protein. Apparently, the hexahistidine tag increased its overall stability, and we purified this candidate for further analysis. The protein behaved poorly, and only 20% of the total overexpressed protein was soluble. We were, however, able to obtain 0.2 mg of >95% pure peptide-AraC DNA binding domain protein from 1 liter of cell culture. The purified protein retained the same DNA binding activity as the unpurified and untagged protein.

Up to this point, no direct evidence of dimerization or oligomerization has been presented. One alternative to direct dimerization is that the chimeric monomers bind DNA independently but much more tightly than the AraC DNA binding domain alone. A simple test of this possibility is to progressively decrease the protein concentration in a binding assay: at some concentration, DNA carrying a single bound monomer should then appear. Experiments with purified candidate 1 (Fig. 5) showed no evidence of such DNA-monomer intermediates, nor did similar experiments with candidate 5 in crude extracts (data not shown). We therefore conclude that at least these two proteins dimerize.


Fig. 5.   DNA migration retardation experiments to distinguish monomeric from dimeric binding by titration of purified candidate 1. Binding was measured over a series of 2-fold dilutions starting from 0.3 µM. Control proteins were the 60-kDa full-length dimeric AraC protein and the 39-kDa full-length C/EBP leucine zipper-AraC DNA binding domain fusion protein.

We performed sedimentation equilibrium experiments to examine the strength and nature of oligomerization of the purified protein. The best fit of the sedimentation data to a single ideal nonassociating species yields an apparent molecular mass of 29 kDa, much larger than the 17.5-kDa molecular mass derived from the protein sequence. Fitting the data from three protein concentrations and four centrifugation speeds to a monomer-dimer equilibrium (Fig. 6) not only produced a good fit but also yielded a monomer molecular mass of 17.9 ± 1.1 kDa, very close to the value predicted from the protein sequence. The dimerization dissociation constant, Kd, was 1.8 µM, with a 67% probability of lying in the interval 0.8-3.4 µM. We did not fit other models, because the monomer-dimer model gave an excellent fit to the data.
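To see what a Kd of 1.8 µM implies at the loading concentrations of Fig. 6, one can solve the monomer-dimer mass balance. The sketch assumes an ideal 2M ⇌ D equilibrium with Kd = [M]²/[D], treats the 6, 3, and 1.5 µM loadings as total monomer concentrations, and ignores nonideality; the function names are hypothetical.

```python
import math


def free_monomer(total_uM: float, kd_uM: float) -> float:
    """Free monomer for 2M <-> D with Kd = [M]^2/[D] and total = [M] + 2[D].

    Substituting [D] = [M]^2/Kd gives 2[M]^2 + Kd[M] - Kd*total = 0;
    this returns the positive root.
    """
    return kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0


def fraction_in_dimer(total_uM: float, kd_uM: float) -> float:
    """Fraction of monomer units present as dimer at equilibrium."""
    return (total_uM - free_monomer(total_uM, kd_uM)) / total_uM


if __name__ == "__main__":
    for kd in (0.8, 1.8, 3.4):  # best fit and 67% interval from the text
        row = [fraction_in_dimer(c, kd) for c in (6.0, 3.0, 1.5)]  # Fig. 6 loadings
        print(kd, [f"{x:.2f}" for x in row])
```

At a total concentration equal to Kd, half of the monomer units are in dimers, which is a convenient check on the algebra.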


Fig. 6.   Sedimentation equilibrium data and the monomer-dimer fit for the 48,000 rpm run at 6 µM, 3 µM, and 1.5 µM concentrations of the protein.


    DISCUSSION

We replaced the dimerization domain of AraC with peptides of random sequence, and from the resulting library of randomized peptide-AraC DNA binding domain fusions we identified seven peptides of 32 residues or shorter that can confer activation from pBAD. One peptide could not be studied, but the remaining six appear to oligomerize.

In principle, the selection requiring growth on minimal arabinose could yield five classes of peptides: 1) peptides that themselves self-associate to form homodimers or higher order structures; 2) peptides that bind to the DNA binding domain but, for steric reasons, cannot bind the domain in cis, so that cross-binding between two chimeric molecules generates a dimer; 3) peptides that bind other cellular structures that provide an oligomeric framework; 4) peptides that bind nonspecifically to DNA; and 5) peptides that stabilize the normally rather unstable AraC DNA binding domain. Structurally, class 2 candidates appear to be incompatible with binding to direct repeat DNA half-sites. None of the candidates appears to belong to class 3, since the migration retardation assays showed no evidence of anomalously high retardation, and two of the candidates, 1 and 5, were shown not to belong to class 4 or 5, since titration experiments excluded monomeric binding. Except for candidate 2, the peptide-AraC chimeras tested all bound stably in vitro to DNA containing the AraC binding site formed by the I1 half-site, to which AraC binds tightly, and the I2 half-site, to which AraC binds weakly. Comparison of the DNA migration retardation rates with those of the control proteins confirmed that no oligomeric species other than dimers were formed.

We purified one of our dimerizing peptide-AraC DNA binding domain proteins. Although it was only sparingly soluble, equilibrium sedimentation data showed that it dimerizes rather tightly, with a Kd of 1.8 µM. In the selection experiments reported by Wang and Pabo (4), 15-residue dimerizing peptides were isolated. Their fusion proteins exist as monomers at solution concentrations up to 50-100 µM in the absence of DNA but dimerize at a concentration of 2.5 nM in the presence of DNA. Our selection for dimerization was probably more stringent than theirs, because we used the relatively weak binding site I1-I2 for the in vivo selections, so the candidates had to dimerize tightly.

The natural interdomain linker of AraC was retained in our constructs. These 8 amino acids might therefore interact with the 22 amino acids of the added peptide of candidate 1 and contribute to dimerization of the chimeric protein; experiments with the isolated peptide should resolve this issue. In the collection of oligomerizing peptides we found, Ala, Phe, Met, and Asn occurred more often than expected for random sequences, and Pro and Thr less often. The probabilities that these amino acids would individually deviate from expectation as much as or more than observed were 3.5, 1.02, 0.37, 5.6, 0.27, and 3%, respectively. These values follow from the fact that the peptides contain 151 amino acids in total, excluding the initiating methionines, and that the NNK scheme provides 31 sense codons. For example, the probability that the set contains n or more alanines, which are encoded by 2 of the 31 codons, is as follows.
$$P(N_{\mathrm{Ala}} \ge n) = \sum_{i=n}^{151} \frac{151!}{i!\,(151-i)!} \left(\frac{29}{31}\right)^{151-i} \left(\frac{2}{31}\right)^{i} \qquad \text{(Eq. 1)}$$
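Eq. 1 is a binomial upper tail and is straightforward to evaluate. The sketch below assumes, as the text does, 151 residues and 31 equally likely sense codons; the lower tail is our addition for underrepresented residues such as Pro and Thr, and the counts passed in the example are hypothetical because the observed counts appear only in Fig. 3. The text does not say whether one- or two-sided tails were used, so the sketch is not expected to reproduce the quoted percentages exactly.

```python
from math import comb


def binom_pmf(i: int, total: int, p: float) -> float:
    """Probability of exactly i successes in `total` trials."""
    return comb(total, i) * p ** i * (1.0 - p) ** (total - i)


def upper_tail(n: int, codons: int, total: int = 151, sense: int = 31) -> float:
    """Eq. 1: P(at least n residues of an amino acid with `codons` of `sense` codons)."""
    p = codons / sense
    return sum(binom_pmf(i, total, p) for i in range(n, total + 1))


def lower_tail(n: int, codons: int, total: int = 151, sense: int = 31) -> float:
    """P(at most n residues), the relevant tail for an underrepresented residue."""
    p = codons / sense
    return sum(binom_pmf(i, total, p) for i in range(0, n + 1))


if __name__ == "__main__":
    print(f"expected Ala count: {151 * 2 / 31:.1f}")
    # Hypothetical observed counts; the actual counts are shown in Fig. 3.
    for n in (12, 15, 18):
        print(n, f"{upper_tail(n, codons=2):.4f}")
```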
Since the abnormal distribution extends across most of the peptides and involves a number of amino acids, it probably reflects some general principles relevant to the association of short amino acid sequences. The actual sequences do not obviously reveal their secondary or tertiary structures. None looks like a leucine zipper coiled-coil, although the program PredictProtein (14-16) does predict that 16 contiguous residues of the 22 in the candidate 1 peptide form an α-helix. Clearly, additional studies on both the selection method and the peptides it yields should provide much interesting information.

    ACKNOWLEDGEMENT

We thank Ula Gryczynski for assistance.

    FOOTNOTES

* This work was supported by National Institutes of Health Grant GM18277 and National Science Foundation Grant DBI-9871456. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ To whom correspondence should be addressed: Biology Dept., Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218. Tel.: 410-516-5206; Fax: 410-516-5213; E-mail: bob@gene.bio.jhu.edu.

Published, JBC Papers in Press, March 26, 2001, DOI 10.1074/jbc.M102220200

    ABBREVIATIONS

The abbreviation used is: C/EBP, CCAAT/enhancer-binding protein.

    REFERENCES

1. Bustos, S. A., and Schleif, R. F. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 5638-5642
2. Zhang, Z., Murphy, A., Hu, J. C., and Kodadek, T. (1999) Curr. Biol. 9, 417-420
3. Jappelli, R., and Brenner, S. (1999) Biochem. Biophys. Res. Commun. 266, 243-247
4. Wang, B. S., and Pabo, C. O. (1999) Proc. Natl. Acad. Sci. U. S. A. 96, 9568-9573
5. Hahn, S., Dunn, T., and Schleif, R. F. (1984) J. Mol. Biol. 180, 60-72
6. Schleif, R. F., and Wensink, P. (1981) Practical Methods in Molecular Biology, Springer-Verlag, New York
7. Hendrickson, W., and Schleif, R. F. (1984) J. Mol. Biol. 178, 611-628
8. Bradford, M. (1976) Anal. Biochem. 72, 248-254
9. Laue, T. M., Shah, B. D., Ridgeway, T. M., and Pelletier, S. L. (1992) in Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S., and Rowe, A., eds), Royal Society of Chemistry
10. Johnson, M. L., Correia, J. J., Yphantis, D. A., and Halvorson, H. R. (1981) Biophys. J. 36, 575-588
11. Pace, C. N., Vajdos, F., Fee, L., Grimsley, G., and Gray, T. (1995) Protein Sci. 4, 2411-2423
12. Files, J. G., Weber, K., and Miller, J. H. (1974) Proc. Natl. Acad. Sci. U. S. A. 71, 667-670
13. Schleif, R. F. (1999) Proteins 34, 1-3
14. Rost, B., and Sander, C. (1993) J. Mol. Biol. 232, 584-599
15. Rost, B., and Sander, C. (1994) Proteins 20, 216-226
16. Rost, B. (1996) Methods Enzymol. 266, 525-539


Copyright © 2001 by The American Society for Biochemistry and Molecular Biology, Inc.