¹Genencor International, 925 Page Mill Road, Palo Alto, CA 94304, USA and ²Genencor International BV, Leiden, The Netherlands
³To whom correspondence should be addressed. E-mail: vschellenberger@genencor.com
Keywords: combinatorial mutagenesis/consensus mutation/lactamase/proteolysis/thermostability
Introduction
In nature, protein families have developed as a result of the continuous process of random mutagenesis, recombination and selection, which tends to eliminate most destabilizing mutations (Kimura, 1991). As a result, residues that stabilize a protein tend to be more prevalent than other amino acids at any given position in a protein family. It has been shown for several proteins that consensus mutations, which replace a particular amino acid of a specific protein with the most common amino acid that is present in that position among the family members, frequently lead to stabilized protein variants (Steipe et al., 1994). A particularly striking example is the recently described synthesis of a phytase gene that was based on the consensus sequence of 13 homologous phytase sequences (Lehmann et al., 2000). This consensus gene differed in 20% of the encoded amino acids from the closest natural phytase gene and the encoded protein was more stable towards thermal denaturation than any of its parent proteins. However, in a subsequent study, the same group showed that only a fraction of the consensus mutations contributed to protein stability and that several of the consensus mutations actually destabilized the protein (Lehmann et al., 2002). These results suggest that some consensus mutations should be avoided. Therefore, we developed a method that can rapidly identify stabilizing consensus mutations for a protein of interest. In this paper, we demonstrate that combinatorial consensus mutagenesis (CCM) allows the rapid introduction of multiple stabilizing consensus mutations into a protein while avoiding the introduction of destabilizing mutations. We have applied the process to β-lactamase from Enterobacter cloacae and showed that one-third of all variants in the CCM library were more stable than the parent protein. Regression analysis of sequence and stability information allowed us to identify the most stabilizing consensus mutations, which were added to the first-generation stabilized variant in a subsequent round of combinatorial mutagenesis, resulting in variants with further stabilization.
Materials and methods
All libraries were based on plasmid pCB04, which contains the gIII signal sequence of phage M13 fused to the mature sequence of BLA from E.cloacae driven by a lac promoter. The amino acid sequence of BLA from E.cloacae (Figure 1) was obtained from GenBank. The gene encoding this sequence, which was codon optimized for Escherichia coli, was synthesized by Aptagen. The sequence GGGSAETVEHHHHHH was added to the C-terminus of BLA as a purification handle. Plasmid pCB04 is based on pUC19 and contains a chloramphenicol resistance marker.
For each library, the reactions contained 50–100 ng of template plasmid pCB04, 1 µl of primer mix (100 or 10 µM stock of combined primers to give the desired primer concentrations mentioned above), 1 µl of dNTPs, 2.5 µl 10x QCMS reaction buffer, 18.5 µl of deionized water and 1 µl of enzyme blend (QCMS kit), for a total volume of 25 µl. The thermo-cycling program was 1 cycle at 95°C for 1 min, followed by 30 cycles of 95°C for 1 min, 55°C for 1 min and 65°C for 10 min. DpnI digestion was performed by adding 1 µl of DpnI (provided in the QCMS kit), incubation at 37°C for 2 h, addition of 0.5 µl of DpnI and incubation at 37°C for an additional 2 h. A 1 µl volume of each reaction was transformed into 50 µl of TOP10 electrocompetent cells (Invitrogen). A 250 µl volume of SOC was added after electroporation, followed by 1 h of incubation with shaking at 37°C. Following incubation, 10–50 µl of the transformation mix were plated on LA plates with 5 mg/l chloramphenicol (CMP) or LA plates with 5 mg/l CMP and 0.1 mg/l cefotaxime (CTX) for selection of active BLA clones. The number of colonies obtained on both types of plates was comparable for all libraries. For the preparation of libraries NA02 and NA03 we carried out a single QCMS reaction, which was subsequently split into two portions after DpnI digestion. One portion, NA02, was transformed directly into E.coli and the second portion, NA03, was heated at 95°C for 2 min before transformation into E.coli. This experiment was conducted to determine if denaturation of hemimethylated DNA after DpnI digestion would reduce the fraction of parent clones in the resulting library.
Screening for BLA stability
Libraries NA01, NA02 and NA03 were plated on to agar plates with LA medium containing 5 mg/l chloramphenicol. No CTX was added to these plates to allow sequencing of isolates without BLA activity. Thirty colonies from each library were transferred into the wells of one 96-well plate containing 200 µl of LB with 5 mg/l CMP. Four additional wells were inoculated with TOP10/pCB04, which served as the wild-type control during the assay. A master plate was generated by adding glycerol and was stored at −80°C.
A 96-well plate containing 200 µl per well of LB with 5 mg/l CMP and 0.1 mg/l CTX was inoculated from the master plate using a replication tool. The plate was incubated for 3 days at 25°C in a humidified incubator shaking at 225 r.p.m. The following operations were performed with each well of the cultured 96-well plate: 50 µl of culture were transferred into a plate that contained 50 µl of B-PER cell lysis reagent (Pierce). The suspension was incubated at room temperature for 90 min to lyse the cells and release BLA. The lysate was diluted 1000-fold and 10 000-fold into 100 mM citrate–200 mM phosphate buffer pH 7.0 containing 0.125% octyl glucopyranoside (Sigma). The diluted samples were heated at 56°C for 1 h with mixing at 650 r.p.m. Subsequently, 20 µl of the sample were transferred to 180 µl of nitrocefin assay buffer [0.1 mg/l nitrocefin (Oxoid) in 50 mM phosphate-buffered saline containing 0.125% octyl glucopyranoside] and the BLA activity was determined using a Spectramax Plus plate reader (Molecular Devices) at 490 nm. In parallel, a control sample was subjected to the same procedure but the heating step was omitted. Based on both activity readings the fraction of BLA activity that remained after the heat treatment was calculated for each of the 90 variants and four controls on the plate.
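The residual-activity calculation described above can be sketched as follows. This is a minimal illustration, not the authors' own worksheet: the function name, the well labels and the rate values are hypothetical, and the activity readings are assumed to be nitrocefin hydrolysis rates measured at 490 nm.

```python
def remaining_fraction(heated_rate, unheated_rate):
    """Fraction of BLA activity left after the heat step.

    Both arguments are activity readings for the same well, taken with
    and without the heating step (hypothetical units: absorbance/min).
    """
    if unheated_rate <= 0:
        raise ValueError("unheated control shows no measurable activity")
    return heated_rate / unheated_rate


# Hypothetical readings: (heated, unheated) for the parent and one variant.
wells = {
    "parent": (0.020, 0.100),
    "variant_A": (0.060, 0.120),
}

fractions = {name: remaining_fraction(h, u) for name, (h, u) in wells.items()}

# A variant counts as stabilized here if it retains more activity than the parent.
more_stable = [name for name, f in fractions.items() if f > fractions["parent"]]
```

Normalizing each heated reading by its own unheated control makes the comparison independent of differences in expression level between wells.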
The screen of library NA04 was done following the protocol given above, except that the samples were incubated at 46°C in 50 mM imidazole pH 7.0 containing 10 mM CaCl2, 0.005% Tween-20 and 0.1 mg/ml thermolysin (Sigma) to test the protease resistance of the variants. Proteolysis was quenched by transferring 40 µl of sample into a fresh plate containing 10 µl of 50 mM EDTA to inactivate thermolysin.
Statistical analysis
The contributions of all mutations to stability were calculated using MS Excel. A matrix, Mki, was established containing one row for each isolate and one column for each of the 28 observed mutations. An entry was set to one if the particular isolate carried the particular mutation and to zero otherwise. The logarithm of the remaining activity of each isolate, logRi, was entered alongside. In a separate column, the theoretical stability of each isolate was calculated using Equation 2 and parameters Pk and C. The squared differences between the measured and the calculated values of logRi were summed over all isolates in a single target cell. Using the solver function of Excel, we optimized the parameters Pk and C to minimize the value in the target cell. All parameters were set to zero as initial estimates.
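A comparable fit can be sketched outside a spreadsheet. The example below assumes that Equation 2 has the additive form logRi = C + Σk Mki·Pk, which is an assumption based on the description above, and it replaces the iterative Excel solver with a closed-form least-squares solution of the normal equations. The function names and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: some mutations cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_additive_model(matrix, log_remaining):
    """Least-squares fit of log R_i = C + sum_k M_ki * P_k (assumed form).

    matrix: one row per isolate, one 0/1 column per mutation.
    Returns (C, [P_1, ..., P_K]).
    """
    design = [[1.0] + [float(v) for v in row] for row in matrix]
    n_par = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(n_par)]
           for i in range(n_par)]
    xty = [sum(r[i] * y for r, y in zip(design, log_remaining))
           for i in range(n_par)]
    coef = solve_linear(xtx, xty)
    return coef[0], coef[1:]


# Hypothetical toy data: five isolates, two mutations, noise-free stabilities.
M = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0]]
true_C, true_P = -1.0, [0.5, -0.2]
log_R = [true_C + sum(m * p for m, p in zip(row, true_P)) for row in M]

C, P = fit_additive_model(M, log_R)
```

With real screening data the fit would not be exact, and mutations that always occur together in the sequenced isolates would make the system singular, so their individual contributions could not be resolved.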
Purification of BLA variants
Strains were inoculated into 1 l of Terrific Broth (TB) containing 5 mg/l CMP and incubated at 37°C overnight. Cells were harvested by centrifugation (6000 g for 15 min). The pellets were resuspended in 200 ml of phosphate-buffered B-PER solution (Pierce). The suspension was shaken for 1 h at room temperature until the pellets were solubilized. Cell debris and insoluble protein were removed by centrifugation (15 000 g for 15 min). The supernatant was stored at 4°C until purification.
Proteins were first purified using immobilized metal affinity chromatography (IMAC). The purification was done using a Bio-Cat (Applied Biosystems) and a Waters column (22 x 95 mm) packed with POROS 20MC media (Applied Biosystems). The column was loaded with 250 mM NiCl, washed with water and equilibrated with 10 mM HEPES, 0.5 M NaCl, pH 8.4. Samples were loaded on the column, washed with equilibration buffer and eluted with 10 mM HEPES, 0.5 M NaCl and a gradient of 200 mM imidazole.
The proteins were further purified by affinity chromatography using m-aminophenylboronic acid (PBA) resin (Sigma). This purification was done by gravity flow. A 15 ml amount of PBA resin was packed in a disposable column (15 x 120 mm) (Bio-Rad Laboratories) and equilibrated with 20 mM triethanolamine, 0.5 M NaCl, pH 7.0. After loading the sample, the columns were washed with four column volumes of equilibration buffer and subsequently BLA was eluted with 0.5 M sodium borate, 0.5 M NaCl, pH 7.0. The resulting proteins had a purity of 99% as judged by SDS–PAGE.
Thermostability analysis
Circular dichroism (CD) experiments were performed on an Aviv 62ADS spectrophotometer, equipped with a five-position thermoelectric cell holder supplied by Aviv. Buffer conditions were 10 mM adjusted to pH 7.1 and 50 mM citrate–phosphate at pH 7.0. The final protein concentration for each experiment was in the range 10–20 µM. Data were collected in a 0.1 cm pathlength cell. Thermal denaturation experiments were performed at 220 nm, the wavelength in the far-UV spectra with maximum signal difference, as expected for a mixed α-helix and β-sheet protein. The temperature was increased from 30 to 80°C with data collected every 2°C. The equilibration time at each temperature was 0.1 min and data were collected for 4 s per sample. The thermal denaturation data were fitted to a two-state transition (Chen et al., 1992) using Savuka software provided by Dr Osman Bilsel (University of Massachusetts, Worcester Campus, MA). The mid-point of the transition (Tm) is an apparent value because the thermal denaturation of all BLA variants studied was not reversible. Because of irreversibility and a slow kinetic component caused by aggregation, we chose sufficiently fast scan rates and experiments were carefully controlled for time spent in the thermal denaturation transition, so that small temporal variations did not lead to apparent Tm differences.
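For orientation, a generic two-state unfolding curve can be sketched as below. This is not the Savuka implementation or necessarily the exact model of Chen et al. (1992): it assumes temperature-independent baselines and no heat-capacity change on unfolding, and the function names and parameter values are hypothetical. Because the denaturation reported here was irreversible, any Tm obtained from such a fit is an apparent value only.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol K)


def fraction_unfolded(temp_c, tm_c, dh_m):
    """Two-state unfolded fraction at temp_c (deg C).

    tm_c: apparent transition mid-point (deg C).
    dh_m: unfolding enthalpy at Tm in J/mol (assumes delta Cp = 0).
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg = dh_m * (1.0 - t / tm)          # free energy of unfolding
    k = math.exp(-dg / (R_GAS * t))     # unfolding equilibrium constant
    return k / (1.0 + k)


def cd_signal(temp_c, tm_c, dh_m, native, unfolded):
    """CD signal as a population-weighted mix of flat native/unfolded baselines."""
    f = fraction_unfolded(temp_c, tm_c, dh_m)
    return native + (unfolded - native) * f


# Hypothetical curve sampled every 2 deg C from 30 to 80 deg C, as in the scan above.
curve = [(t, cd_signal(t, 55.0, 4.0e5, -20.0, -5.0)) for t in range(30, 81, 2)]
```

In a fit, tm_c, dh_m and the two baseline values would be adjusted until the computed curve matches the measured ellipticity at 220 nm.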
Protease sensitivity
A 1 µg amount of purified protein was incubated in quadruplicate with different concentrations of protease in 150 µl of 100 mM Tris–HCl, 10 mM CaCl2, 0.005% Tween-20, pH 7.9, in 0.2 ml strip tubes (MJ Research) for different time periods at 37°C. Trypsin, chymotrypsin and thermolysin (Sigma) were added at concentrations of 200–800 µg/ml. The BLA activity was measured for samples incubated with and without protease by monitoring the hydrolysis of the chromogenic substrate nitrocefin (Oxoid). The activity of the protease-treated sample relative to the untreated sample was calculated for each variant.
Results
Using the sequence of the parent gene β-lactamase from E.cloacae as a query, we identified 37 homologous sequences in GenBank. These sequences were aligned using the ClustalW algorithm in the Vector NTI software package (Informax). The resulting consensus sequence differed in 29 positions from the parent gene. The amino acid sequence of β-lactamase from E.cloacae and the 29 consensus mutations are depicted in Figure 1. Based on this information, we designed 29 mutagenic primers such that each primer encoded the replacement of one residue in the parent sequence with the corresponding consensus residue. Using these 29 mutagenic primers, we constructed combinatorial libraries using the QCMS method. To our knowledge, there are no prior reports of this method being applied with such a large number of mutagenic primers. Therefore, we decided to test several variations of the protocol as described in Materials and methods, yielding three libraries, NA01, NA02 and NA03.
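The step from a multiple sequence alignment to a list of consensus mutations can be sketched as follows. This is an illustrative reconstruction, not the Vector NTI procedure: the function name and the toy sequences are hypothetical, the sequences are assumed to be pre-aligned to equal length with '-' as the gap character, and the parent is assumed to be counted together with its homologs.

```python
from collections import Counter


def consensus_mutations(parent, homologs):
    """List substitutions that convert the parent into the alignment consensus.

    parent, homologs: pre-aligned sequences of equal length, '-' marks a gap.
    Positions are numbered along the parent, skipping its gaps.
    """
    mutations = []
    position = 0
    for col, parent_res in enumerate(parent):
        if parent_res == "-":
            continue  # no parent residue to replace at this column
        position += 1
        # Count residues in this column; the parent is listed first, so a
        # tie involving the parent residue resolves in its favor.
        counts = Counter(
            seq[col] for seq in [parent] + homologs if seq[col] != "-"
        )
        consensus_res, _ = counts.most_common(1)[0]
        if consensus_res != parent_res:
            mutations.append(f"{parent_res}{position}{consensus_res}")
    return mutations


# Hypothetical toy alignment: one parent and three homologs.
parent = "MKVLA"
homologs = ["MRVIA", "MRVIA", "MRVIG"]
planned = consensus_mutations(parent, homologs)
```

Each entry in the resulting list corresponds to one mutagenic primer in a design like the one described above.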
Screening first-generation libraries
We chose 30 clones randomly from each library for characterization. Sequence analysis showed that the individual isolates contained up to 11 of the planned consensus mutations. Only 24 of the 90 isolates contained no mutations. Two isolates contained significant deletions in the BLA gene and eight isolates contained unplanned point mutations which must have been introduced by the QCMS approach. Twenty-eight of the planned 29 consensus mutations were observed in at least two of the isolates. The only mutation not found in any of the isolates was T1A. Subsequent sequence analysis revealed an error in the design of the primer encoding this particular mutation and we did not consider mutation T1A during further analysis.
The frequency distribution of mutations in the libraries is shown in Figure 2. Libraries NA02 and NA03 contained a large fraction of combinatorial variants whereas almost half the isolates from NA01 did not contain any mutations. We conclude that reducing the concentration of mutagenic primers below the manufacturer's recommendation results in a more favorable distribution of mutations when constructing large combinatorial libraries. Subsequently, we measured the thermostability of BLA activity for each isolate. Figure 3 shows that about one-quarter of all isolates in the libraries were more stable than the parent protein. Hence the fraction of stabilized mutants in the CCM libraries is significantly higher than the fraction of stabilized variants in random libraries that can be obtained by PCR mutagenesis or similar methods. Such non-targeted mutagenesis libraries typically require the screening of hundreds to thousands of variants in order to identify a few stabilized mutants (Matsumura and Aiba, 1985; Kuchner and Arnold, 1997; Cherry et al., 1999). Therefore, our data support the hypothesis that a large fraction of consensus mutations leads to the stabilization of a protein. The most stable variant, NA03.8, contained three mutations: Q95E, A153S and I334L. This variant was chosen as parent for a second-generation library.
The CCM process allows one to add multiple stabilizing mutations to a protein in a single round of mutagenesis and screening. Because most variants contain multiple mutations, it is not immediately obvious which mutations are actually responsible for the increase in protein stability. However, we reasoned that it should be possible to estimate the stabilizing contribution of individual mutations by analyzing the correlation between the stability of a variant and its sequence. The set of 90 random isolates contained a total of 250 consensus mutations; thus each of the 28 mutations was on average observed nine times. We anticipated that stabilizing mutations should confer increased stability to most of the variants that contain them. Figure 4 shows the frequency of individual consensus mutations in the 20 most stable and the 20 least stable of the 90 random isolates from libraries NA01-3. It shows that many of the mutations are more prevalent in stabilized variants compared with less stable variants. The most prominent example is V284I, which occurred in 12/20 of the most stable variants but in only 1/20 of the least stable isolates. Other mutations that are prevalent in stabilized variants are Q95E, A153S, A208P, N232R and I262V. However, a few mutations occur more frequently in unstable isolates, Y170F, Q219E, V227A and P295A, which suggests that some of the consensus mutations may actually destabilize the protein.
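The comparison between the most and least stable isolates can be sketched as a simple tally. The function name and the four toy isolates below are hypothetical; the mutation names are taken from the text, but the activity values are invented for illustration.

```python
from collections import Counter


def mutation_enrichment(isolates, n_extreme=20):
    """Count each mutation among the most and least stable isolates.

    isolates: list of (remaining_activity, set_of_mutations) pairs.
    Returns {mutation: (count_in_most_stable, count_in_least_stable)}.
    """
    ranked = sorted(isolates, key=lambda item: item[0], reverse=True)
    top, bottom = ranked[:n_extreme], ranked[-n_extreme:]
    top_counts = Counter(m for _, muts in top for m in muts)
    bottom_counts = Counter(m for _, muts in bottom for m in muts)
    names = sorted(set(top_counts) | set(bottom_counts))
    return {m: (top_counts[m], bottom_counts[m]) for m in names}


# Hypothetical toy data: four isolates with invented remaining activities.
isolates = [
    (0.9, {"V284I", "Q95E"}),
    (0.8, {"V284I"}),
    (0.2, {"Y170F"}),
    (0.1, {"Y170F", "Q95E"}),
]
table = mutation_enrichment(isolates, n_extreme=2)
```

A mutation with a high first count and a low second count, such as V284I in this toy set, would be a candidate stabilizing mutation; the regression analysis is still needed to separate its effect from that of co-occurring mutations.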
![]() | (1) |
![]() | (2) |
The calculation considered all 80 BLA variants which did not contain any unplanned mutations. Table I shows the calculated stabilization parameters for all 28 consensus mutations. Figure 5 shows that the model describes most of the measured variations in BLA stabilities. The analysis identified several mutations that appear to have significant stabilizing effect on BLA: V11I, V25I, R91K, Q95E, A153S, N232R, S247T and I262V. In addition, we identified mutations that reduce BLA stability: Q219E, V227A, P295A and Y170F.
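Read together with the Statistical analysis section, the additive model behind these parameters can be written in the following form. This is an assumed form consistent with the description of Mki, Pk and C, not a transcription of Equations 1 and 2; the symbol S for the minimized sum of squares is introduced here for illustration.

```latex
% Assumed additive model: M_{ki} = 1 if isolate i carries mutation k, else 0
\log R_i^{\mathrm{calc}} = C + \sum_{k=1}^{28} M_{ki}\, P_k

% Objective minimized over C and all P_k (S is a label introduced here)
S = \sum_{i} \left( \log R_i^{\mathrm{meas}} - \log R_i^{\mathrm{calc}} \right)^2
```

Under this form, a positive Pk indicates that mutation k increases the remaining activity after stress, a negative Pk indicates a destabilizing mutation, and C corresponds to the logarithm of the remaining activity of the parent protein.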
We chose variant NA03.8 as parent for a second-generation combinatorial library, NA04. Variant NA03.8 contains the two most stabilizing mutations, Q95E and A153S, in addition to one neutral mutation, I334L. Our initial statistical analysis, which was based on limited sequence information, had identified nine additional stabilizing mutations: V11I, V25I, R91K, N232R, S247T, I262V, V283L, V284I, T342K. We reused the nine mutagenic primers from round one to add these mutations to variant NA03.8. One primer was added to the mixture that encoded mutations V283L and V284I, which cannot be recombined using individual mutagenic primers. The mutagenesis was performed using the same protocol as for library NA02. We screened library NA04 for increased stability of BLA in the presence of thermolysin to avoid the risk of isolating variants that re-fold after heat stress. Thermolysin is a thermostable protease with a broad specificity for hydrophobic amino acids. It has been reported that thermolysin selectively cleaves unfolded proteins (Arnold and Ulbrich-Hofmann, 1997). We tested 352 random isolates from library NA04, 332 of which expressed sufficient activity to allow measurement of protease stability. The screening results are shown in Figure 6. It is clear that a very significant number of isolates is more stable than the parent of the library, NA03.8. We chose 22 of the most stable variants for a repeat analysis and found all of them to be more stable than variant NA03.8. The stabilized variants contained between two and eight of the nine intended mutations. Mutation I262V was observed in 21 of the 22 isolates and mutation R91K was detected in 17 isolates. These two mutations appear to have the largest stabilizing effect. The most stable isolate, NA04.17, contained eight consensus mutations.
We chose three BLA variants for purification and further characterization. Thermal denaturation was monitored at 220 nm (Figure 7). All variants showed irreversible denaturation. However, CCM variants showed an increase in apparent Tm between 4.8°C for NA03.8 and 9.1°C for NA04.17 (Table II). Studies on sensitivity of the variants towards three proteases, chymotrypsin, trypsin and thermolysin, which have very different primary specificities, showed that all variants were stabilized against inactivation by all three tested proteases (Figure 8). Furthermore, there was a very clear correlation between the stability towards proteolysis and the thermostability of these variants. These observations strongly suggest that the rate of proteolysis of BLA is mainly driven by the overall thermodynamic stability of the protein and not by the specificity of the attacking protease.
Discussion
To identify stabilizing consensus mutations, we assumed that the effects of all mutations on protein stability are additive. We are aware that mutations can have complex and non-additive interactions between each other. It appears, however, that the effect of small changes that result in subtle perturbations of protein structure can be described frequently by assuming simple additivity (Wells, 1990; Gregoret and Sauer, 1993; Sandberg and Terwilliger, 1993). The purpose of our analysis was not to calculate exact thermodynamic contributions of each mutation to the stability of the protein. Instead, we focused our work on the rapid and efficient generation of a data set that allows one to identify stabilizing mutations based on screening data. The validity of this approach is supported by the identification of stabilized variants containing multiple consensus mutations from our second-generation library.
Our data suggest that stabilization of BLA was achieved by adding multiple consensus mutations and that most of these mutations make small but positive contributions to stability. Visual inspection of the crystal structure of BLA (Lobkovsky et al., 1994) allowed us to divide the consensus mutations into three groups: mostly solvent exposed (15 mutations), partially buried (11 mutations) and completely buried (three mutations). Both stabilizing and destabilizing mutations were observed in all three groups and no correlation between surface exposure and effect on stability could be detected. Of the two mutations that appear to make the largest stabilizing contributions, one is located on the surface (Q95E) and the other buried within the protein (A153S).
The literature contains several reports demonstrating that the proteolytic susceptibility of a protein depends largely on the conformational stability of the protein and to a lesser degree on the actual amino acid sequence (Fontana, 1988; Zappacosta et al., 1996; Arnold and Ulbrich-Hofmann, 1997). Our results follow that trend and show that the resistance of BLA variants to proteolysis depends mainly on their thermodynamic stability and not on the specificity of the attacking protease. This observation can have significant implications for protein design. For example, it should be possible to identify mutants of proteins that resist proteolysis during in vivo therapeutic or analytical applications, without exact knowledge of the enzyme(s) responsible for proteolysis.
In this study, we limited mutagenesis to those amino acid changes that occurred in the consensus sequence of the protein, i.e. the most abundant amino acid for a given position. However, our method can be expanded easily to include other amino acid changes that occur less frequently in a particular position of the multiple sequence alignment. This would enable one to generate larger first-generation libraries and subsequent analysis of sequence and performance of isolates would allow one to identify rapidly the most valuable individual mutations. Such a statistical analysis is less informative for more traditional random libraries, which contain a large number of different mutations, but in which most individual mutations are too rare to allow multiple combinatorial observations.
There are currently multiple genome sequencing projects, which have led to a rapidly growing database of available sequences. Such databases will greatly facilitate the identification of protein homologs and thus the application of CCM to protein engineering.
References
Braxton,S. and Wells,J.A. (1992) Biochemistry, 31, 7796–7801.
Chen,B.L., Baase,W.A., Nicholson,H. and Schellman,J.A. (1992) Biochemistry, 31, 1464–1476.
Chen,K. and Arnold,F. (1993) Proc. Natl Acad. Sci. USA, 90, 5618–5622.
Cherry,J.R., Lamsa,M.H., Schneider,P., Vind,J., Svendsen,A., Jones,A. and Pedersen,A.H. (1999) Nat. Biotechnol., 17, 379–384.
Cunningham,B. and Wells,J. (1987) Protein Eng., 1, 319–325.
Fontana,A. (1988) Biophys. Chem., 29, 181–193.
Gallagher,T., Bryan,P. and Gilliland,G.L. (1993) Proteins, 16, 205–213.
Gregoret,L.M. and Sauer,R.T. (1993) Proc. Natl Acad. Sci. USA, 90, 4246–4250.
Hoseki,J., Yano,T., Koyama,Y., Kuramitsu,S. and Kagamiyama,H. (1999) J. Biochem. (Tokyo), 126, 951–956.
Jung,S., Honegger,A. and Pluckthun,A. (1999) J. Mol. Biol., 294, 163–180.
Kimura,M. (1991) Proc. Natl Acad. Sci. USA, 88, 5969–5973.
Kristensen,P. and Winter,G. (1998) Fold. Des., 3, 321–328.
Kuchner,O. and Arnold,F.H. (1997) TIBTECH, 15, 523–530.
Lehmann,M., Kostrewa,D., Wyss,M., Brugger,R., D'Arcy,A., Pasamontes,L. and van Loon,A.P. (2000) Protein Eng., 13, 49–57.
Lehmann,M., Loch,C., Middendorf,A., Studer,D., Lassen,S.F., Pasamontes,L., van Loon,A.P. and Wyss,M. (2002) Protein Eng., 15, 403–411.
Lobkovsky,E., Billings,E.M., Moews,P.C., Rahil,J., Pratt,R.F. and Knox,J.R. (1994) Biochemistry, 33, 6762–6772.
Matsumura,I., Wallingford,J.B., Surana,N.K., Vize,P.D. and Ellington,A.D. (1999) Nat. Biotechnol., 17, 696–701.
Matsumura,M. and Aiba,S. (1985) J. Biol. Chem., 260, 15298–15303.
Miyazaki,K., Wintrode,P.L., Grayling,R.A., Rubingh,D.N. and Arnold,F.H. (2000) J. Mol. Biol., 297, 1015–1026.
Nicholson,H., Becktel,W.J. and Matthews,B.W. (1988) Nature, 336, 651–656.
Ostermeier,M. (2003) Trends Biotechnol., 21, 244–247.
Sandberg,W.S. and Terwilliger,T.C. (1993) Proc. Natl Acad. Sci. USA, 90, 8367–8371.
Sandgren,M., Gualfetti,P.J., Shaw,A., Gross,L.S., Saldajeno,M., Day,A.G., Jones,T.A. and Mitchinson,C. (2003) Protein Sci., 12, 848–860.
Shi,Y.Y., Mark,A.E., Wang,C.X., Huang,F., Berendsen,H.J. and van Gunsteren,W.F. (1993) Protein Eng., 6, 289–295.
Shusta,E.V., Kieke,M.C., Parke,E., Kranz,D.M. and Wittrup,K.D. (1999) J. Mol. Biol., 292, 949–956.
Sieber,V., Pluckthun,A. and Schmid,F.X. (1998) Nat. Biotechnol., 16, 955–960.
Steipe,B., Schiller,B., Pluckthun,A. and Steinbacher,S. (1994) J. Mol. Biol., 240, 188–192.
Stemmer,W.P.C. (1995) Bio/Technology, 13, 549–553.
Strausberg,S., Alexander,P., Wang,L., Gallagher,T., Gilliland,G. and Bryan,P. (1993) Biochemistry, 32, 10371–10377.
Wells,J.A. (1990) Biochemistry, 29, 8509–8517.
Zappacosta,F., Pessi,A., Bianchi,E., Venturini,S., Sollazzo,M., Tramonato,A., Marino,G. and Pucci,P. (1996) Protein Sci., 5, 802–813.
Received August 30, 2004; revised November 16, 2004; accepted November 19, 2004.
Edited by Jacques Fastrez