Directed enzyme evolution guided by multidimensional analysis of substrate-activity space

Anna-Karin Larsson, Lars O. Emrén2, William G. Bardsley and Bengt Mannervik1

Department of Biochemistry, Uppsala University, Biomedical Center, Box 576, SE-751 23 Uppsala, Sweden

2 Previously Lars O. Hansson

1 To whom correspondence should be addressed. e-mail: Bengt.Mannervik@biokem.uu.se


    Abstract
 
The directed evolution of protein function frequently involves identification of mutants with improved properties from a population of variants obtained by mutagenesis. The selection of clones to parent the subsequent generation is crucial to the continued creation of superior progeny. In the present study, multivariate analysis guided the evolution of human glutathione transferase (GST) T1-1 to 65-fold enhanced alkyltransferase activity. Six alternative substrates monitored the substrate-activity space that characterized a mutant library of enzymes, obtained by recombination of DNA and heterologous expression in Escherichia coli. A subset of mutants was identified by their proximity in the targeted region of six-dimensional factor space. DNA from these mutants was recombined to create a new generation of GST variants from which an improved enzyme was isolated. The multidimensional cluster analysis is applicable to quantitative properties in any population of molecules undergoing evolution and can guide the tailoring of proteins, nucleic acids and other chemical structures to novel and improved functions.

Keywords: alkyltransferase/cluster analysis/directed evolution/glutathione transferase/molecular quasi-species


    Introduction
 
Directed evolution of protein function is a frontline area of research attracting increasing interest for a variety of reasons (Stemmer, 2002). First, probing the functional plasticity of a given protein fold is important for the understanding of the natural evolution of novel protein activities. Secondly, in biotechnology the design of enzymes and other biomolecules with enhanced or altered properties is of obvious significance for applications in medicine, pharmaceutical industries, agriculture, environmental sciences, etc. Two major approaches are commonly used for in vitro evolution of protein function from mutant libraries: selection based on a discriminatory quality and screening of the mutant library for the desired property. In both cases, mutants are identified to serve as parents for a following generation of mutants. This recursive sequence of mutagenesis and isolation of improved mutants is usually carried out for several cycles, but it is not obvious how the parents for the next generation should be optimally chosen. Opting only for the ‘best’ member of each generation may restrict the necessary genetic variability of the lineage, whereas choosing the members too indiscriminately could lead to a loss of the targeted property.

In nature the evolution of a protein requires the simultaneous optimization of several functional parameters. For example, improving the catalytic efficiency of an enzyme may concomitantly affect thermal stability, pH optimum and solubility. In the optimization of a catalyst there may also be a choice between increasing substrate selectivity or developing improved activity with several alternative substrates. The present investigation demonstrates how the multidimensional functional space of a library of mutant enzymes can be explored and how subsets of variants with similar activity profiles can be identified. Multivariate statistical methods (Krzanowski, 2000), including principal components, dendrograms and K-means cluster analyses, were used in the characterization of consecutive glutathione transferase (GST) libraries. One of the mutant enzymes was purified to homogeneity and compared with the wild-type human GST T1-1 (hGST T1-1). Its catalytic efficiency in the targeted alkyltransferase reaction was enhanced 65-fold over that of the wild-type enzyme.

Our paper presents a rational approach based on multidimensional factor analysis, which should be of value to the engineering not only of proteins but also of nucleic acids (Gold et al., 1995; Joyce, 1998; Wilson and Szostak, 1999) and other molecules (Erlanson et al., 2000; Houghten, 2000) for novel functional properties.


    Materials and methods
 
The T1/T2 library

cDNAs encoding the Theta class GSTs, hGST T1-1 and rat (r) GST T2-2, were randomly fragmented and recombined to create a mutant library. The construction of this T1/T2 library (the F1 generation) has been described in detail (Broo et al., 2002).

Preparation of lysates

Colonies of the T1/T2 mutant library were randomly picked and the bacteria were grown to saturation overnight in 2 ml of 2TY medium supplemented with ampicillin (100 µg/ml). After a 100-fold dilution, the cultures were grown for 2 h before the expression of GSTs was induced by addition of isopropyl ß-D-thiogalactopyranoside (IPTG) to a final concentration of 0.2 mM. The bacteria were harvested by centrifugation 18 h after the induction. Each pellet was resuspended in 0.1 M sodium phosphate pH 6.5 and the bacteria were lysed by four cycles of freezing at –80°C and thawing at 37°C. After the final thawing, the suspensions of lysed bacteria were centrifuged and the resulting supernatants were stored at –80°C until the activity measurements were performed. Lysates of wild-type hGST T1-1 and rGST T2-2 were used in parallel as controls.

Measurement of GST activity

The GST activities of the prepared lysates were measured with six alternative substrates: dichloromethane (DCM), 1,2-epoxy-3-(4-nitrophenoxy)propane (EPNP), 4-nitrobenzyl chloride (NBC), 4-nitrophenethyl bromide (NPB), 1-chloro-2,4-dinitrobenzene (CDNB) and 1-menaphthyl sulfate (MS). All activity measurements were performed in microplates on a SPECTRAmaxPLUS384 microplate spectrophotometer (Molecular Devices, Sunnyvale, CA). For all substrates except DCM, the formation of the GSH conjugate was monitored continuously. The activity with DCM was determined by an end-point assay measuring the amount of formaldehyde formed after 40 min. For details of the GST assays see Habig and Jakoby (1981) and Broo et al. (2002).

Amplification of DNA to create a new library

Five clones with high alkyltransferase activity in the F1 generation were selected and used as starting material for a new generation, i.e. the F2 library. DNA preparations from the five selected mutants were used separately as templates for PCR amplification. The PCR primers used were pKK for (5'-AAT TGT GAG CGG ATA ACA AT-3') and Eco RGNB (5'-AAG CTG AAA ATC TTC-3').

Cloning of mouse and rat GST T1 sequences

In order to increase the diversity of the library, mouse and rat GST T1 sequences were introduced in the DNA shuffling procedure. Mouse GST T1 was cloned from Mouse Liver QUICK-clone cDNA (Clontech, Palo Alto, CA) using the primers mT1 start (5'-ATA TGA ATT CAT GGT TCT GGA GCT GTA C-3') and mT1 stop (5'-ATA TAA GCT TTT ATT ACT GGA TCA TTG CCA G-3'). The PCR product was digested with EcoRI and HindIII and ligated into the EcoRI and HindIII cloning sites of pKK-D (Björnestedt et al., 1992).

For the rat GST T1 sequence, the primers rT1 Pst I for (5'-TGA CCA CTG GTA CCC CCA AGA CCT GCA-3') and rT1 stop (5'-CCC AGA GTG CTG ACC ATG ATC CAG TAA TAA AAG CTT ATA T-3') were used to amplify nucleotides 243–720 of the coding sequence from Rat Liver QUICK-Clone cDNA (Clontech).

Digestion of DNA

A mixture of the amplified DNA from the five mutants, consisting in total of ~1 µg, was digested with 0.2–0.8 U DNase I (Roche, Mannheim, Germany) in 20 mM Tris–HCl pH 8.0, 1 mM MgCl2 at room temperature. The digestion was performed in several rounds of 2–3 min each. After each round a small sample of the reaction mixture was run on a 2% (w/v) agarose gel. Freezing the sample on dry ice stopped the digestion during the analysis. When the size of the fragments was 100 bp or less, DNase I was inactivated by heating the reaction mixture to 70°C for 10 min. After phenol and chisam extractions the DNA was separated by electrophoresis on a 2% (w/v) agarose gel and all fragments smaller than 100 bp were recovered from the gel. DNase digestions of mGST T1 and rGST T1 cDNA were performed separately in the same way as for the mixed mutants.

Reassembly of GST cDNA sequences

All the recovered DNA from the digestion reactions, including fragments of T1/T2 mutants, mGST T1 and rGST T1, were mixed together and used in a reassembly PCR. In addition to the DNA, this reassembly reaction contained 0.2 mM dNTPs, 40 U/ml Pfu DNA polymerase and buffer as recommended by the manufacturer (Stratagene, La Jolla, CA). The PCR conditions were 3 min at 95°C, followed by 40 cycles of 1 min at 94°C, 2 min at 50°C and 2 min at 72°C, completed by 10 min at 72°C.

Amplification of full-length GST coding sequences and construction of the F2 library

A small amount of the product from the reassembly reaction was used as a template in a PCR to amplify full-length coding sequences. In addition, the PCR mixture contained 0.8 mM each of flanking primers pKK for and Eco RGNB, 0.2 mM dNTPs, 12 U/ml Pfu DNA polymerase and buffer as recommended by the manufacturer. The temperature cycle, 1 min at 94°C, 2 min at 55°C and 2 min at 72°C, was run 35 times, followed by 10 min at 72°C. The amplified product was digested with EcoRI and HindIII, purified on gel and ligated into the EcoRI and HindIII cloning sites of pKK-D. The product of the ligation reaction was used to transform electrocompetent Escherichia coli XL1-Blue cells (Stratagene).

Preparation of lysates and catalytic activity measurements

Clones from the F2 library were randomly picked from agar plates and grown in 350 µl of 2TY supplemented with 100 µg/ml ampicillin. After growth overnight at 37°C in 96-well microplates the cultures were diluted 100-fold into fresh 2TY supplemented as above. Two hours after the dilution, the expression of GST was induced by addition of IPTG to a final concentration of 0.2 mM. The bacteria were grown for an additional 16 h before being harvested by centrifugation. Each pellet was resuspended in 0.1 M sodium phosphate pH 6.5. Lysis of the bacteria was performed by four rounds of freezing–thawing. Following centrifugation, the supernatants were transferred to a new 96-well microplate and stored at –80°C for subsequent measurements of enzymatic activities. The GST activities of lysates from mutants of the F2 library were tested with NPB and EPNP as alternative substrates. Each microplate with lysates of F2 mutants contained lysates of hGST T1-1 and mGST T1-1 as controls.

Sequencing

From the F2 library, 13 clones with increased alkyltransferase activity were selected for DNA sequencing. The sequencing was performed with Big Dye v2.0 (Applied Biosystems, Foster City, CA) using an ABI Model 310 DNA sequencer (Applied Biosystems). DNA sequences were analyzed using Chromas version 1.45 (Technelysium Pty Ltd, Helensvale, Australia).

Linking a histidine tag to the F2:1215 mutant

An N-terminal histidine tag sequence (5'-CAT CAC CAT CAT CAT CAC-3') was attached to mutant F2:1215 from the F2 library by PCR. The PCR products were digested with EcoRI and HindIII and ligated to the EcoRI and HindIII cloning sites of pKK-D. The ligation mixture was used to transform electrocompetent E.coli XL1-Blue cells.

Purification of mutant F2:1215

Mutant F2:1215 was expressed as described for wild-type hGST T1-1 (Jemth and Mannervik, 1997). The harvested bacteria were resuspended in 20 mM sodium phosphate pH 7.4, 0.5 M NaCl, 1 mM 2-mercaptoethanol, 0.1 M imidazole and lysed by sonication and addition of lysozyme (0.2 mg/ml). After centrifugation the resulting supernatant was purified on Ni-IMAC (Amersham Biosciences, Uppsala, Sweden). The protein was eluted with 0.5 M imidazole and thereafter dialyzed against a buffer containing 10 mM Tris–HCl pH 7.8, 20% (v/v) glycerol, 1 mM 2-mercaptoethanol and 0.02% (w/v) sodium azide.

Kinetic analysis of mutant F2:1215

Steady-state kinetic properties of the purified mutant F2:1215 were determined with NPB and EPNP as alternative substrates. The substrate concentrations varied between 10 and 200 µM for NPB and between 25 and 500 µM for EPNP. The concentration of GSH was 10 mM. Steady-state kinetic parameters were determined by non-linear regression analysis; KM and kcat values expressed per enzyme subunit were determined by fitting the Michaelis–Menten equation to the experimental data.
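The regression step can be illustrated with a minimal Python sketch. It is not the software used in this study: the function name fit_michaelis_menten, the search bounds and the synthetic rate values are assumptions made for illustration only. Because the Michaelis–Menten equation is linear in Vmax for any fixed KM, the sketch solves Vmax in closed form and locates KM by a repeatedly narrowed grid search on a logarithmic scale.

```python
import math

def fit_michaelis_menten(s, v, rounds=30, grid=21):
    """Least-squares fit of v = Vmax*[S]/(KM + [S]); returns (Vmax, KM)."""
    def vmax_and_sse(km):
        # For a fixed KM the model is linear in Vmax, so Vmax has a closed form.
        x = [si / (km + si) for si in s]
        vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
        sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
        return vmax, sse

    # Assumed search window for KM: three decades either side of the data range.
    lo = math.log(min(s) * 1e-3)
    hi = math.log(max(s) * 1e3)
    for _ in range(rounds):
        logs = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
        sses = [vmax_and_sse(math.exp(x))[1] for x in logs]
        i = sses.index(min(sses))
        # Zoom in on the interval around the best grid point.
        lo, hi = logs[max(i - 1, 0)], logs[min(i + 1, grid - 1)]
    km = math.exp(0.5 * (lo + hi))
    return vmax_and_sse(km)[0], km

# Hypothetical, noise-free rates at substrate concentrations in the NPB range (µM).
conc = [10.0, 25.0, 50.0, 100.0, 200.0]
rate = [2.0 * c / (40.0 + c) for c in conc]
print(fit_michaelis_menten(conc, rate))
```

Dividing the fitted Vmax by the molar concentration of enzyme subunits gives kcat, and the ratio kcat/KM is the catalytic efficiency compared between enzymes.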


    Results
 
Directed evolution of hGST T1-1

hGST T1-1 was targeted for directed enzyme evolution. Potential improvements of hGST T1-1 in the substrate-activity space were probed by recursive DNA recombination and screening procedures. The mutant library T1/T2 was created as a first generation of variants (F1) by recombination of DNA from hGST T1-1 and rGST T2-2, and the mutant enzymes were functionally characterized by use of six alternative electrophilic substrates (Broo et al., 2002). The six-dimensional substrate-activity space was explored by means of multivariate analyses to define a subgroup of mutants suited to parent the next generation. The cycle was then repeated by recombining the DNA of the selected parents with cDNA of rGST T1-1 and mouse GST T1-1 (mGST T1-1), thereby creating the second generation of variants (F2). Finally, mutants with further improved alkyltransferase activity were identified.

The T1/T2 library harbors functionally diverse mutants

The members of the T1/T2 library (the F1 generation) had previously been shown to be structurally and functionally divergent. Lysates of bacterial clones expressing isolated GST T1/T2 mutants were assayed for activity with six different substrates representing alternative substitution and addition reactions (Figure 1). In order to make the diverse activities comparable, the activity values were normalized to the activities characterizing the parental enzymes. The DCM and EPNP values were related to hGST T1-1 and the remaining four substrates were related to rGST T2-2 (Broo et al., 2002). Normalized activities (per cent) of 94 GST T1/T2 mutants and of the parental GSTs are shown in Figure 2. The activity values in the lysates are dependent on the amount of expressed enzyme protein, which varies among the different clones. Nevertheless, it is clear that the relative activities with the different substrates change from clone to clone. The most marked differences are displayed with DCM and MS, which distinguish the parental enzymes hGST T1-1 and rGST T2-2, respectively (Jemth and Mannervik, 1997). Like the wild-type enzymes, none of the mutant enzymes displayed significant activity with both of these discriminating substrates. Of special interest for the present study was the alkyltransferase activity as monitored with DCM and NPB. The activities with DCM and NPB were highly correlated (Figure 3) and five clones displayed high alkyltransferase activity.



 
Fig. 1. Structures of the six alternative substrates used in the characterization of the GST T1/T2 library. The arrows indicate the positions of nucleophilic attack by the thiolate group of GSH. The atoms are color-coded as follows: C, gray; Cl, green; O, red; N, blue; H, white; S, yellow; Br, brown. DCM is a substrate only for hGST T1-1, whereas MS and CDNB are substrates only for rGST T2-2. The other three substrates show activity with both of the wild-type enzymes.

 


 
Fig. 2. Activities of 94 clones from the GST T1/T2 library with six alternative substrates. The normalized values represent the ratio between the activity in the lysate of a given clone and the activity in the lysate of hGST T1-1 (for DCM and EPNP) and of rGST T2-2 (for MS, CDNB, NBC and NPB). The activities of hGST T1-1 and rGST T2-2 are shown to the right at the rear of the panel.

 


 
Fig. 3. Members of the GST T1/T2 F1 library represented in the three-dimensional subspace spanned by the activities with DCM, NPB and EPNP (open circles). The three-dimensional graph is further projected onto the two-dimensional DCM/NPB plane (closed circles) showing the correlation between the two alkylhalide substrates.

 
The substrate-activity space of the T1/T2 library can be described by three orthogonal variables

The six-tuple of activities with the different substrates serves as a fingerprint for a given GST mutant, which can be regarded as a point in six-dimensional substrate-activity space. Figure 3 shows the points representing the mutants characterized in the three-dimensional subspace spanned by the substrates DCM, NPB and EPNP. From a statistical point of view, it can be argued that the variance of the activity data might be accounted for by fewer than six independent variables. This possibility was investigated by a principal component analysis, and a scree plot of the eigenvalues demonstrated that only three principal components are needed to account for 95% of the variance (Figure 4). Thus, the dimensionality of the functional space harboring the analyzed GST mutants is not significantly higher than 3 based on the six substrates used. In fact, the orthogonal principal components 1 and 2 account for 75% of the variance. However, it should be noted that the principal component analysis, like the other statistical procedures used in the present study, is dependent on the standardization of the data and possible weighting schemes (Bardsley, 2002). If one of the independent variables (substrate-activity values) dominates in the analysis, its contribution can be attenuated by proper weighting. In this manner, qualitatively more significant factors can be evaluated without being overwhelmed by quantitatively more influential ones.
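The quantity read from a scree plot, the cumulative fraction of variance carried by the leading principal components, can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of the statistics package used here: the function names are hypothetical, the covariance matrix is computed from unweighted, unstandardized values, the eigenvalues are obtained with a plain cyclic Jacobi iteration, and the activity values are invented.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of the columns (substrates) over the rows (mutants)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]

def jacobi_eigenvalues(sym, sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    n = len(a)
    scale = sum(x * x for row in a for x in row)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def cumulative_variance(rows):
    """Cumulative fraction of total variance explained by components 1..m."""
    eig = jacobi_eigenvalues(covariance_matrix(rows))
    total = sum(eig)
    running, out = 0.0, []
    for lam in eig:
        running += lam
        out.append(running / total)
    return out

# Invented normalized activities (per cent) of five mutants with three substrates.
activities = [[120, 15, 90], [30, 80, 25], [200, 5, 160], [45, 95, 40], [150, 20, 110]]
print(cumulative_variance(activities))
```

Standardizing or weighting the columns before computing the covariance matrix changes the eigenvalues, which is the dependence on data treatment noted above.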



 
Fig. 4. Principal component analysis and scree plot of eigenvalues of 94 members of the F1 generation of the GST T1/T2 library. (Top) Representation of the activity data by principal components 1 and 2, showing two sets of mutants with highly correlated activities forming a reclining ‘V’, as well as an outlier, mutant 88. (Bottom) Scree plot showing that three principal components are sufficient to account for the variance of the activity data.

 
The principal component analysis shows that most of the GST T1/T2 mutants projected onto the plane formed by principal components 1 and 2 fall along two lines forming a reclining ‘V’ (Figure 4). One clone (number 88) is an obvious outlier with properties distinct from the other mutants studied. Thus, the sample of GSTs from the library could be divided into two major groups plus the outlier that forms a group by itself. Further subdivision of the mutants could be made on the basis of cluster analysis.

Discrete clusters of clones can be identified in the T1/T2 library

Distances in n-dimensional space can be used in order to form subsets of points that are close to one another. Thus, proximity in the six-dimensional substrate-activity space can be used for subgrouping of the GST mutants. Figure 5 shows a dendrogram based on the Euclidean distances among the GST mutants in factor space. The diagram provides information for each individual clone about its functionally most closely related neighbors. Clone 88 is far removed from the other clones in accordance with its separate location in the principal component analysis (Figure 4). The other clones form branches based on their similarities in catalytic activities. A group consisting of clones 18, 20, 57, 61 and 63 is distinguished by displaying high alkyltransferase activities with DCM and NPB.
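Grouping by distance can be sketched in a few lines of Python. The sketch assumes single-linkage merging (the linkage rule used for the dendrogram is not stated here), a hypothetical function name, an arbitrary cut-off and invented six-substrate activity profiles; it merges the two closest clusters repeatedly until the smallest gap reaches the cut-off.

```python
import math

def single_linkage_clusters(points, cutoff):
    """Merge clusters while the closest pair of clusters is nearer than cutoff."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, ia, ib = min((gap(a, b), ia, ib)
                        for ia, a in enumerate(clusters)
                        for ib, b in enumerate(clusters) if ia < ib)
        if d >= cutoff:
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return [sorted(c) for c in clusters]

# Invented normalized activity profiles (six substrates per mutant).
profiles = [
    (100, 90, 20, 95, 5, 2),      # parent-like
    (110, 85, 25, 90, 4, 3),      # parent-like
    (400, 120, 30, 380, 6, 1),    # high alkyltransferase activity
    (420, 130, 28, 390, 5, 2),    # high alkyltransferase activity
    (5, 40, 900, 10, 1200, 800),  # outlier
]
print(single_linkage_clusters(profiles, cutoff=200))
```

Reading a dendrogram at a given height corresponds to choosing the cut-off: mutants joined below that height fall into the same group.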



 
Fig. 5. Dendrogram of Euclidean distances between mutants in six-dimensional substrate-activity space. Mutant designations are indicated on the x-axis and the parental GSTs are marked ‘T1’ (first left) and ‘T2’ (ninth from the right). Branches converging at distances approximately <200 represent mutants with similar activity profiles forming clusters in the F1 generation of the GST T1/T2 library. Mutant 88 is separated from all other variants analyzed by approximately 1500 distance units and is clearly an outlier (Figure 4). A cluster consisting of mutants 18, 20, 57, 61 and 63 (right) is also clearly distinct from the parental GSTs and the other variants, and this subset was chosen as parents for the F2 generation.

 
K-means cluster analysis is an alternative approach, in which the data points are linked to a stipulated number of clusters by minimizing the sums of the distances to the same number of centroids. The GST T1/T2 mutants characterized were partitioned into clusters in a series of analyses, and irrespective of the number of clusters tested (4–15) the five GST variants with the functional similarities revealed in the dendrogram (Figure 5) consistently formed a cluster without any other members. Figure 6 shows a plot of the data fitted into four clusters. One of them consisted of clone 88 alone, previously identified as an outlier. The cluster containing five mutants with high alkyltransferase activity was selected for further evolution by recombination of their GST-encoding DNA sequences.
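A minimal K-means sketch in Python is given below. It is an assumption-laden illustration rather than the routine used for Figure 6: it applies the common Lloyd iteration (nearest-centroid assignment followed by recomputing each centroid as the mean of its members, which minimizes squared Euclidean distances), it seeds the centroids deterministically with a farthest-first rule, and the function name and data points are invented.

```python
import math

def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) after Lloyd iterations with farthest-first seeding."""
    # Seed: start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Invented two-substrate activities (per cent): a low-activity group,
# a high-activity group and one outlier.
data = [(10, 12), (14, 9), (8, 15), (160, 180), (170, 175), (20, 900)]
labels, centers = kmeans(data, k=3)
print(labels)
```

Repeating the call for a range of k values and checking whether the same mutants keep sharing a label is one way to test the stability of a cluster, in the spirit of the 4–15 cluster series described above.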



 
Fig. 6. K-means cluster analysis of mutants in the F1 generation of the GST T1/T2 library. The activity values are clustered around four centroids based on proximity in six-dimensional substrate-activity space. Mutant 88 (blue dot) is separated from all other GST variants, and mutants 18, 20, 57, 61 and 63 (red dots) are clustered together as in the dendrogram of Figure 5. The five mutants 18, 20, 57, 61 and 63 are characterized by high activities with both DCM and EPNP, as shown in the presented two-dimensional projection. The five mutants segregated into the same cluster irrespective of the predefined number of clusters (4–15), whereas the mutants represented by green and yellow circles were further subdivided into additional subsets when the number of clusters increased.

 
The sequences of the selected mutants exhibit few structural substitutions

Multivariate analysis of substrate-activity profiles is not dependent on information about amino acid and nucleotide sequences or variations in protein structures. Nevertheless, for an understanding of the relationship between functional plasticity and structure it is valuable to examine sequence variations that accompany the functional evolution. The parental DNA sequences of hGST T1-1 and rGST T2-2 are 63% identical, and the mutants analyzed in the F1 generation are predominantly based on one sequence or the other. Mutant 88 was a T2 sequence with a functional N49D modification and four silent codon mutations. The five mutants selected for prominent alkyltransferase activity were all highly similar in the 5' end of the coding DNA sequence and contained at least 59 nucleotides identical with the T2 sequence. The major portions of the mutant sequences, through the 3' end, were essentially T1 sequences with very few mutations. The common denominator of the five selected clones is the presence of Ser in position 14; none of the other mutants sequenced was found to contain this residue. Mutants 18 and 20 contained 74 and 141 nucleotides, respectively, identical with the T2 5' end, resulting in one and six non-silent mutations, respectively, in addition to the C14S modification in mutants 57, 61 and 63.

Further evolution of GSTs with enhanced alkyltransferase activity

The five selected clones from the F1 generation were subjected to DNA shuffling (Stemmer, 1994) together with cDNA encoding mouse and rat GST T1-1 wild-type sequences. The addition of the latter two sequences was made in order to increase the genetic diversity and reduce the risk of inbreeding in the creation of the F2 generation from the GST T1/T2 library.

The F2 generation of GST mutants was expressed in E.coli and lysates of bacterial clones were assayed for activity with NPB as well as with the alternative substrate EPNP. Both these activities can be monitored spectrophotometrically in real time and give more accurate values than the end-point assay of DCM. Dendrogram and K-means cluster analyses of 1031 mutants from the F2 generation identified a subset characterized by elevated alkyltransferase activity with the two alternative substrates (Figure 7).



 
Fig. 7. K-means cluster analysis of 1031 mutants in the F2 generation of hGST T1-1. The activity values are clustered around five centroids in the two-dimensional plane spanned by the activities measured with EPNP and NPB in bacterial lysates. Activity values are normalized to the activity of lysates of bacteria expressing wild-type mGST T1-1. Mutant F2:1215 was chosen for purification, sequence analysis and functional characterization.

 
From the cluster with members displaying the highest alkyltransferase activity, mutant F2:1215 was isolated for purification and further analysis. The enzyme activity measured in the crude bacterial lysate was more than two orders of magnitude higher than that of the wild-type hGST T1-1.

F2:1215 is a GST variant with increased alkyltransferase activity

The purified F2:1215 protein demonstrated significantly enhanced catalytic activity with NPB and EPNP in comparison with the parental hGST T1-1 (Table I) and rGST T2-2 (Jemth et al., 1996). The targeted alkyltransferase activity showed a 65-fold elevation of catalytic efficiency, kcat/Km, as measured with NPB. The epoxide addition, assayed with EPNP, was enhanced 7-fold in catalytic efficiency. With both NPB and EPNP an increased kcat value was the main contributor to the improved activity.


 
Table I. Steady-state kinetic parameters of F2:1215 and hGST T1-1
 
DNA sequence analysis demonstrated that mutant F2:1215 was composed of three segments from three distinct parental primary structures. The 5' end contained approximately 50 nucleotides from the rGST T2-2 sequence, the middle region (approximately 620 nucleotides) derived from hGST T1-1, whereas approximately 50 nucleotides in the 3' end were most similar to the mGST T1-1 sequence. At the protein level, F2:1215 differed from wild-type hGST T1-1 only in three of a total of 240 residues: C14S, T226I and W234R. Molecular modeling of hGST T1-1 (Flanagan et al., 1998) indicates that amino acid residues 226 and 234 are near the substrate-binding site, whereas residue 14 is located in an α-helix one turn away from the active-site Ser 11. However, more incisive kinetic studies are needed to elucidate the mechanism underlying the change in steady-state parameters.

In the sequence analysis the additional silent mutations served as evidence for the three separate origins of the F2:1215 sequence. However, the synonymous codons also had functional consequences at the RNA level, since F2:1215 was expressed at a >10-fold higher level than the parental hGST T1-1.

From a general evolutionary point of view, it is noteworthy that the substitution of three amino acids leading to enhanced alkyltransferase activity of hGST T1-1 is not dependent on the recombination of DNA fragments from different mammalian species, but could have arisen by four separate point mutations in the human gene. It is therefore evident that DNA encoding the improved enzyme is only four steps removed in the nucleotide-sequence space from the DNA encoding wild-type hGST T1-1.


    Discussion
 
In natural systems the evolution of novel protein functions is based on clonal selection of cells that acquire an advantage in proliferation as a result of the altered functional and physico-chemical properties of the protein. However, optimization is a multidimensional problem with boundary conditions that may limit the number of acceptable mutations in a given structure. Thus, high catalytic efficiency of an enzyme may be necessary but is not a sufficient condition for a protein to be selected for a given function, since mutations may affect the expression, folding and stability of the protein.

The directed evolution of protein function to a large extent involves isolation of mutants with improved properties from a population of variants obtained by mutagenesis (Voigt et al., 2000; Glieder et al., 2002; Lin and Cornish, 2002; Santoro and Schultz, 2002; Tao and Cornish, 2002). If the targeted protein can be made a selectable marker, improved variants can be isolated from host cells grown under stringent conditions. In some cases an evolving binding affinity can be the basis for the selection of improved variants (Hansson et al., 1997; Keefe and Szostak, 2001). However, in general the isolation of mutants with valuable properties has to be based on screening of individuals in the mutant population. As in nature, the directed evolution of optimized functions requires recursive changes of the genetic material through several generations. Conventional wisdom may suggest that the individual showing the highest improvement of the desired properties would be the best progenitor for the following generation. However, the breeding of animals and plants has demonstrated that too narrow a genetic background will lead to inbreeding and consequent degeneration of the progeny. At the level of molecular evolution the degenerative alterations may involve improper folding of the polypeptide chain, loss of thermal stability, etc.

Eigen and co-workers have developed the concept of the ‘molecular quasi-species’ as a descriptor of the subpopulation from which the next improved generation derives (Eigen et al., 1988). The quasi-species can be regarded as a stochastic variable with a distribution in multidimensional factor space. Similarly, the directed evolution of molecules entails the crucial problem of selecting the group of variants that are suitable progenitors for a following new generation. In practice, this amounts to collecting a number of individuals with enhanced function from a population of mutants. By screening for several independent properties the desired genetic variability can be obtained in addition to the targeted property. On the other hand, excessive mutational diversity will lead to an ‘error threshold’, which must not be exceeded (Eigen et al., 1988). In the tailoring of proteins, nucleic acids (Gold et al., 1995; Joyce, 1998; Wilson and Szostak, 1999) and other molecules (Erlanson et al., 2000; Houghten, 2000) for novel functions, multivariate analysis and cluster analysis can be used to strategically guide the sampling of mutants.

The present study illustrates the use of multivariate analysis to monitor and guide directed enzyme evolution for enhanced activity. hGST T1-1 was evolved to 65-fold increased alkyltransferase activity by shuffling of DNA from mutant GST T1-1 clones selected on the basis of targeted properties. In order to broaden the genetic background, cDNA from rat and mouse GSTs was included in the creation of new recombinants of the selected mutants. A panel of six alternative substrates was used to explore the functional space, and the selection of clones was based on multidimensional analysis of activity profiles with the distinguishing substrates. The reactions monitored represent nucleophilic substitution reactions involving alkylhalides, i.e. DCM and NPB, aralkyl compounds, i.e. NBC and MS, and an aryl halide CDNB, as well as an epoxide addition reaction (EPNP). The principal component analysis of the kinetic data showed that the variance in the catalytic properties examined can essentially be accounted for by three orthogonal variables and it is clear that some of the activities are highly correlated. However, other choices of the numerous GST substrates (Mannervik and Danielson, 1988) may expand the substrate-activity space to dimensions higher than three.

In the present case, nucleophilic substitution in alkyl transfer reactions was the activity screened for. Alkyl halides are of toxicological interest in view of their occurrence as environmental and occupational pollutants (Wheeler et al., 2001) as well as their similarity to cytostatic drugs used in cancer chemotherapy. However, GSTs have activity with different electrophilic functional groups, and directed evolution may selectively improve the catalytic efficiency with any of a variety of alternative substrates. Novel evolutionary pathways could target, for example, addition reactions. For this purpose, multivariate cluster analysis is a powerful approach to optimizing the selection of a suitable subset of molecular species to serve as progenitors of catalysts with desired properties.
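As a rough illustration of choosing progenitors by their proximity to a targeted region of activity space, the sketch below ranks mutants by Euclidean distance from a target point and keeps the nearest few. It is a simplified stand-in for a full cluster analysis, not the procedure used in this study; the mutant names, the normalized activity values, the target point and the function `select_parents` are all hypothetical.

```python
import math


def select_parents(profiles, target, k):
    """Return the names of the k mutants whose profiles lie closest to target.

    profiles maps a mutant name to a tuple of normalized activities,
    one value per substrate; target is a point in the same space.
    """
    ranked = sorted(profiles, key=lambda name: math.dist(profiles[name], target))
    return ranked[:k]


# Hypothetical normalized activities with three substrates.
profiles = {
    "mutant_A": (0.9, 0.8, 0.1),
    "mutant_B": (0.2, 0.1, 0.9),
    "mutant_C": (0.8, 0.9, 0.2),
    "mutant_D": (0.5, 0.5, 0.5),
}
target = (1.0, 1.0, 0.0)  # high activity with substrates 1 and 2, low with 3

parents = select_parents(profiles, target, k=2)
print(parents)  # ['mutant_A', 'mutant_C']
```

Selecting several neighbours of the target, rather than the single closest mutant, is what preserves some genetic variability among the parents of the next generation.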


    Acknowledgements
 
We thank Drs Kerstin Broo and Per Jemth for earlier contributions to the study of the GST T1/T2 mutant library. This work was supported by the Swedish Research Council.


    References
 
Bardsley,W.G. (2002) http://www.simfit.man.ac.uk (Version 5.4, release 4.022).

Björnestedt,R., Widersten,M., Board,P.G. and Mannervik,B. (1992) Biochem. J., 282, 505–510.

Broo,K., Larsson,A.-K., Jemth,P. and Mannervik,B. (2002) J. Mol. Biol., 318, 59–70.

Eigen,M., McCaskill,J. and Schuster,P. (1988) J. Phys. Chem., 92, 6881–6891.

Erlanson,D.A., Braisted,A.C., Raphael,D.R., Randal,M., Stroud,R.M., Gordon,E.M. and Wells,J.A. (2000) Proc. Natl Acad. Sci. USA, 97, 9367–9372.

Flanagan,J.U., Rossjohn,J., Parker,M.W., Board,P.G. and Chelvanayagam,G. (1998) Proteins, 33, 444–454.

Glieder,A., Farinas,E.T. and Arnold,F.H. (2002) Nat. Biotechnol., 20, 1135–1139.

Gold,L., Polisky,B., Uhlenbeck,O. and Yarus,M. (1995) Annu. Rev. Biochem., 64, 763–797.

Habig,W.H. and Jakoby,W.B. (1981) Methods Enzymol., 77, 398–405.

Hansson,L.O., Widersten,M. and Mannervik,B. (1997) Biochemistry, 36, 11252–11260.

Houghten,R.A. (2000) Annu. Rev. Pharmacol. Toxicol., 40, 273–282.

Jemth,P. and Mannervik,B. (1997) Arch. Biochem. Biophys., 348, 247–254.

Jemth,P., Stenberg,G., Chaga,G. and Mannervik,B. (1996) Biochem. J., 316, 131–136.

Joyce,G.F. (1998) Proc. Natl Acad. Sci. USA, 95, 5845–5847.

Keefe,A.D. and Szostak,J.W. (2001) Nature, 410, 715–718.

Krzanowski,W.J. (2000) Principles of Multivariate Analysis. A User’s Perspective. Oxford University Press, New York.

Lin,H. and Cornish,V.W. (2002) Angew. Chem. Int. Ed. Engl., 41, 4402–4425.

Mannervik,B. and Danielson,U.H. (1988) CRC Crit. Rev. Biochem., 23, 283–337.

Santoro,S.W. and Schultz,P.G. (2002) Proc. Natl Acad. Sci. USA, 99, 4185–4190.

Stemmer,W.P.C. (1994) Proc. Natl Acad. Sci. USA, 91, 10747–10751.

Stemmer,W.P.C. (2002) J. Mol. Catal. B Enzymol., 19–20, 3–12.

Tao,H. and Cornish,V.W. (2002) Curr. Opin. Chem. Biol., 6, 858–864.

Wheeler,J.B., Stourman,N.V., Thier,R., Dommermuth,A., Vuilleumier,S., Rose,J.A., Armstrong,R.N. and Guengerich,F.P. (2001) Chem. Res. Toxicol., 14, 1118–1127.

Wilson,D.S. and Szostak,J.W. (1999) Annu. Rev. Biochem., 68, 611–647.

Voigt,C.A., Kauffman,S. and Wang,Z.G. (2000) Adv. Protein Chem., 55, 79–160.

Received October 13, 2003; accepted October 16, 2003. Edited by Alan Fersht




