Predicting the structure of protein cavities created by mutation

Claudia Machicado, Marta Bueno and Javier Sancho1

Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, 50009 Zaragoza, Spain


    Abstract
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
To assist in the efficient design of protein cavities, we have developed a minimization strategy that can predict with accuracy the fate of cavities created by mutation. We first modelled, under different conditions, the structures of six T4 lysozyme and cytochrome c peroxidase mutants of known crystal structure (where long, hydrophobic, buried side chains have been replaced by shorter ones) by minimizing the virtual structures derived from the corresponding wild-type co-ordinates. An unconstrained pathway together with an all-atom atom representation and a steepest descent minimization yielded modelled structures with lower root mean square deviations (r.m.s.d) from the crystal structures than other conditions. To test whether the method developed was generally applicable to other mutations of the kind, we have then modelled eighteen additional T4 lysozyme, barnase and cytochrome c peroxidase mutants of known crystal structure. The models of both cavity expanding and cavity collapsing mutants closely fit their crystal structures (average r.m.s.d. 0.33 ± 0.25 Å, with only one poorer prediction: L121A). The structure of protein cavities generated by mutation can thus be confidently simulated by energy minimization regardless of the tendency of the cavity to collapse or to expand. We think this is favoured by the fact that the typical response observed in these proteins to cavity-creating mutations is to experience only a limited rearrangement.

Keywords: barnase/cytochrome c/lysozyme/peroxidase/protein cavity/protein design/protein stability


    Introduction
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
The design of protein ligands usually concentrates on achieving a satisfactory chemical and shape complementarity between a small molecule and a region of the protein surface (Amzel, 1998Go; Wlodawer and Vondrasek,1998Go; Gane and Dean, 2000Go). Towards that end, protein flexibility, which is highest at the surface of proteins, poses a serious problem (Jones et al., 1997Go). Because the X-ray or NMR structure of a given protein target may be just one of the many possible with similar energy, ligand binding may lead to a significant decrease in the protein conformational entropy, with a concomitant loss of binding energy. Experimental resolution of the structure of complexes between designed ligands and their targets sometimes shows that the binding has occurred in a protein conformation significantly different from that used for the design (Schoichet et al., 1993Go). Coping with protein surface flexibility is in fact a major issue in docking strategies (Abagyan and Totrov, 2001Go).

Unlike protein surfaces, the interior of proteins, far more rigid and thus lacking the aforementioned problem, has so far received little attention as a suitable scenario for ligand binding. There are several reasons for this. First, proteins are very compact and few cavities, usually small, are found in them (Brunori and Gibson, 2001Go); second, hosting ligands in protein cavities may face an important kinetic barrier compared with binding at the surface; and, third, the hydrophobic protein interior seems to offer little potential for specificity. However, protein cavities can be easily made by truncation mutations; the kinetic barrier can be overcome, if required, by refolding the protein in the presence of excess ligand; and there is recent mounting evidence that certain interactions involving hydrophobic residues, such as the cation–{pi} interaction (Fernández-Recio et al., 1999Go; Gallivan and Dougherty, 1999Go) and hydrogen bonding to {pi} clouds (Steiner and Koellner, 2001Go), can be specific enough. In addition, conventional hydrogen bonding to internal polar groups (such as the peptide bond) can also be used to confer specificity to protein–ligand complexes. The pioneering work of Matthews and co-workers showed that small organic molecules could be hosted in protein cavities created by mutation (Eriksson et al., 1992aGo). Further work by this group indicates, however, that proteins do not always act passively upon cavity-creating mutations but that they often tend either to collapse or to expand around the newly created cavity (Eriksson et al., 1992bGo; Xu et al., 1998Go). This is certainly inconvenient if ligand binding sites are to be created by rational side chain deletions inside a protein since it introduces the need for a subsequent protein structure determination in order to know the real structure of the cavity.

We show here that the fate (collapse, expansion or simply no change) of protein cavities created by substituting apolar, buried side chains by apolar smaller ones can be accurately predicted by a simple method that involves energy minimization under certain conditions. Our method predicts the coordinates of the atoms that surround cavities created in T4 lysozyme, barnase and cytochrome c peroxidase and therefore constitutes a useful tool for designing ligand binding sites inside proteins.


    Materials and methods
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Protein X-ray structures

High-resolution crystal structures of T4 lysozyme, barnase and cytochrome c peroxidase mutants, where buried, hydrophobic side chains have been mutated to shorter ones, were used as modelling targets (Goodin and McRee, 1993Go; Buckle et al., 1996Go; Baldwin et al., 1998Go). Most mutants contain Leu, Val or Ile to Ala substitutions, but mutants with Phe or Met to Ala and also with Trp to Gly and Arg to Ala replacements were also considered. The Protein Data Bank codes of the proteins used are, for T4 lysozyme 2LZM [WT], 1l63 [pseudo WT with C54T and C97A mutations], 1l67 [L46A], 1l90 [L99A], 1l69 [L133A], 200l [L121A], 1l85 [F153A], 226L [L133G], 222L [M102A], 238L [V103A], 252L [M102A/M106A], 241L [I29A], 1L89 [L99A/F153A], 244L [I100A], 245L [M6A]), 239L [I17A], 236L [V87A], 235L [V111A], 237L [V149A], 243L [I58A], 246L [F67A]; for barnase 1A2P [WT], 1BRI [I76A], 1BRJ [I88A], 1BRK [I96A]; and for cytochrome c peroxidase 1CCA [WT], 1CMQ [W191G], 1DJ1 [R48A]. The lysozyme L133A and L133G mutations are introduced into the wild-type gene whereas all other lysozyme mutants contain, in addition, C54T and C97A mutations.

Energy minimization

All minimizations were carried out using the CHARMm force field, as implemented in InsightII (MSI) (Brooks et al., 1983Go). Explicit solvent molecules, as present in the crystal structure, and hydrogen atoms were considered. Non-bond terms were truncated at 11 Å (smoothing from 8 Å), with a switching function for van der Waals and electrostatic terms (Brooks et al., 1983Go). Since crystal water molecules were explicitly included, a constant dielectric of 1 was used throughout the minimizations. The non-bonded list, which defines the groups of atoms included in the calculation of non-bond energies (van der Waals and electrostatic), was updated every 10 steps.

Two thousand steps of conjugate gradients or steepest descents were applied to each structure in an unconstrained path (Fletcher and Reeves, 1964Go). Minimizations were started from the X-ray structures of the wild-type T4 lysozyme, barnase and cytochrome c peroxidase after having implemented the appropriate in silico mutations. Nineteen T4 lysozyme, three barnase and two cytochrome c peroxidase truncation mutants of available X-ray structure were modelled. The optimized protocol consisted of 2000 steps of steepest descents, distance-dependent dielectric constant (1 times r), a gradient tolerance of 0.1 kcal/mol.Å, no constraints in the system, cut-offs of 8 and 11 Å with a switching function to evaluate non-bond interactions and updating every 10 steps.

Measurement of cavity volume and cavity collapse or expansion

Cavity volume was calculated using a probe radius of 1.4 Å using the method implemented in Swisspdb Viewer (Guex and Peitsch, 1996Go). Percentages of collapse of the modelled (after minimization) and real cavities (as seen in the X-ray structures) with respect to the theoretical cavities (in silico mutations before minimization) were calculated as 100(VtVm)/Vt and 100(Vt Vc)/Vt, respectively, where Vt is the volume of the cavity created by replacing in silico a given side chain by Ala (or Gly) before any minimization is performed, Vm is the volume of that cavity after minimization and Vc is the cavity volume in the mutant X-ray structure. Negative percentages of collapse indicate cavity expansion.

Comparison of modelled and X-ray structures and calculation of solvent accessibility and of a flexibility index

To compare model and crystal structures, root mean square deviations (r.m.s.d) were calculated. First, the model and X-ray structures were superimposed in all-atom mode using Swisspdb Viewer. Then, r.m.s.d. values for the atoms of surface cavity residues, or of their side chains, were calculated. Percentages of solvent exposure for the mutated side chains in the wild-type structure were calculated using the Connolly algorithm, with a 1.4 Å probe radius (Connolly, 1983Go). A mean flexibility index of the protein structure at the mutation region was calculated as the average of the B-values of the atoms of the surface cavity residues.


    Results and discussion
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Optimization of the energy minimization protocol with a subset of mutants

The minimization method was established by probing a range of different conditions (see Table IGo). The energy minimization algorithm was the first variable analysed. To do that we modelled three cavity-forming T4 lysozyme mutants (L99A, M102A, L133A) using either conjugate gradients or steepest descents. The steepest descents algorithm consistently gave better results, that is, lower r.m.s.d. values than conjugate gradients for the three mutants analysed (Table IGo). Next, we determined the number of iteration steps required to reproduce best the crystal structures, using three T4 lysozyme cavity-forming mutants (L99A, L133A and M102A) and three mutants where no cavity appeared after mutation (I17A, I76A and R48A). The best overall results for the steepest descents algorithm were obtained with 2000 iteration steps, although using 1000 steps yielded comparable results. The influence of the number of iteration steps on the performance of the conjugate gradients algorithm was similar. We then optimized some other variables of the energy minimization protocol such as the gradient tolerance value, update frequency and cut-off values for non-bonded interactions. The most accurate results were obtained using a gradient of 0.1 kcal/mol.Å as the convergence criterion, updating every 10 steps and cut-offs of 8 and 11 Å, with a switching function, for non-bonded interactions. Finally, a radius-dependent dielectric method was selected to minimize the effects of long-range force truncation (Brooks et al., 1985Go).


View this table:
[in this window]
[in a new window]
 
Table I. Optimization of the energy minimization method for calculating the structure of lysozyme truncation mutants
 
Prediction of final cavity volume and of the cavity tendency to reduce or to expand

The volumes of the cavities present in the X-ray structures of the truncation mutants, calculated using a probe radius of 1.4 Å, are compared in Table IIGo with those of the theoretical cavities that would have arisen if no protein rearrangements had occurred as a consequence of the mutations. The volumes of these virtual or theoretical cavities were calculated from the coordinates of the wild-type proteins modified so that the side chains mutated to alanine appeared truncated to their ß-carbons (to the {alpha}-carbon in the L133G and W191G mutants). As Table IIGo shows, many of these virtual cavities tend to collapse to some extent, some markedly, as a consequence of the protein rearrangements that lead to the most stable conformation available in the mutants. In some cases, however, the most stable conformation is attained by enlarging the virtual cavity, as reflected by a larger cavity volume in the crystal structure. To quantify cavity expansion or collapse, we calculate percentages of volume reduction from the volumes of the virtual and X-ray cavities (Eriksson et al., 1992bGo). Our values differ slightly from those quoted by Eriksson et al. (1992b) and by Buckle et al. (1996) because they used for the probe a radius of 1.2 Å and for the ß-carbon 2.02 Å, whereas in our calculations we used a probe radius of 1.4 Å and for the ß-carbon 2.10 Å (Xu et al., 1998Go).


View this table:
[in this window]
[in a new window]
 
Table II. Cavity mutants modelling
 
To test the performance of the minimization procedure in predicting the volumes of protein cavities originating in truncation mutations, we modelled the structure of 24 truncation mutants by both steepest descents and conjugate gradients and then calculated the volumes of the modelled cavities (Table IIGo). These modelled volumes are compared with the experimental values in Figure 1Go. As shown, both conjugate gradients and steepest descents algorithms yield cavity volumes in excellent agreement with the experimental values.



View larger version (31K):
[in this window]
[in a new window]
 
Fig. 1. Comparison of crystal cavity volumes and modelled cavity volumes. The volumes were calculated as described in Materials and methods. Top, conjugate gradients; bottom, steepest descents. Circles, lysozyme mutants; triangles, barnase; squares, cytochrome c peroxidase.

 
A more stringent test of the minimization procedure can be made by assessing its ability to predict the individual fate of each cavity. To assess the ability of our minimization procedure to capture the intrinsic tendency of engineered protein cavities to collapse or to expand, we calculated the percentages of cavity reduction predicted from the models, which are compared in Figure 2Go with the corresponding experimental values. It can be seen that both cavity expansions and cavity collapses are well predicted by the two algorithms. The performance of the minimization procedure is remarkable when the steepest descents algorithm is used: in only two cases is a cavity that expands predicted to collapse (I29A and V149A) and in no case is a collapsing cavity predicted to expand (see Table IIGo).



View larger version (29K):
[in this window]
[in a new window]
 
Fig. 2. Percentage of volume reduction in the modelled and crystal structures relative to the theoretical cavity volume (see Materials and methods). Top, conjugate gradients; bottom, steepest descents. Circles, lysozyme mutants; triangles, barnase; squares, cytochrome c peroxidase. Two mutants with either crystal or model cavity volumes of zero, leading to artefactually large percentages of volume reduction (V149A, V103A), and one mutant with no change of cavity volume (F67A) are not considered in the fit (open symbols).

 
Accurate prediction of cavity structure

The main goal of the present study, however, was to predict in detail the structural response of proteins to cavity-creating mutations; in other words, to be able to calculate accurately the structure of the protein around the cavity without having to determine it from X-ray or NMR experiments. In this respect, our energy minimization procedure is able to predict not only the fate of cavities (i.e. to reduce or to expand), but also their actual shape. To assess the similarity between the modelled structures and their corresponding crystal structures, they were superimposed and r.m.s.d of the atoms in cavity surface residues were calculated (Table IIGo). Only the atoms in the side chains were computed as they are expected to show a greater mobility and therefore to depart more from the theoretical structure than main chain atoms. The conjugate gradients method yields modelled structures with r.m.s.d from 0.25 to 1.88 Å for cavity surface side chain atoms, with typical values around 0.7 Å. The best performance is again offered by the steepest descents; 96% of the structures calculated (23 out of 24) show, at cavity surface side chains, r.m.s.d from 0.16 to 0.48 Å. The only model that is predicted with lower resolution is that of L121A, whose cavity experiences by far the largest volume reduction of all the cavities studied, perhaps because the leucine is slightly exposed (Table IIGo) and in a sequence segment rich in residues with high B-factors (not shown).

Thus, the minimization protocol presented here (using steepest descents) allows the calculation of the coordinates of side chains facing internal protein cavities created by truncation mutations with high accuracy for most of the cases tested [incidentally, for all the cases where the mutated side chain is completely buried (13 structures; see Table IIGo)]. Although the minimization protocol presented here follows an unconstrained path, no significant distortion of the protein structures occurs outside the cavity regions (not shown). Therefore, minimizing a protein mutant bearing an internal side chain deletion by this procedure essentially yields the protein X-ray structure in a very short time. Two examples (one of cavity expansion and one of cavity collapse) where the excellent prediction of cavity surface atoms coordinates can be seen are shown in Figure 3Go.



View larger version (65K):
[in this window]
[in a new window]
 
Fig. 3. Representative modelling of cavity expansion and reduction in T4 lysozyme mutants using steepest descents energy minimization. (A) Cavity expansion in the L99A T4 lysozyme mutant; (B) cavity reduction in the V111A T4 lysozyme mutant.

 
Suitable mutations for cavity prediction

To assess the extent to which the performance of the minimization is compromised in particular types of mutations, we analysed whether the quality of the models (as judged from the model/crystal cavity side chains r.m.s.d in Table IIIGo) is related to a number of structural characteristics of the cavities created by the mutations. Neither the average B-factor of cavity surface side chain atoms (taken as a measure of local flexibility), nor the solvent accessibility of the mutated residue (within our 0–15% window), nor its secondary structure location, nor the theoretical volume of the cavity are related to the accuracy of the models (Table IIGo). This indicates that the minimization can be applied in principle to any buried (>85%) apolar residue in a given protein. We note, anyway, that the mutant with the highest r.m.s.d. (L121A) is slightly exposed and close to crystal water molecules. An obvious limitation of the method is that it does not attempt to predict whether a given mutation will yield a thermodynamically stable protein or whether water will bind to the cavity; these issues have to be determined by experiment.


View this table:
[in this window]
[in a new window]
 
Table III. R.m.s.d of theoretical and of steepest descents-modelled cavities
 
The reaction of proteins to cavity-creating mutations

Our analysis of the rearrangements experienced by 24 cavity-creating mutants from three different proteins allows an interesting conclusion to be drawn: for mutations involving apolar, buried side chains, the rearrangements experienced by these proteins are almost invariably small. In many cases (13 in 24), the protein collapses slightly by an average of 29 ± 31 Å3 (leaving L121A aside: 21 ± 13 Å3); in four cases there is no significant volume change; and in seven cases there is an average volume increase of 11 ± 10 Å3. This is a particularly favourable scenario for ligand binding design because it suggests that, in many instances, the structure of the cavity mutant could be taken, as a first approximation, as that of the theoretical structure resulting form the in silico implementation of the mutation. To test this, we compare in Table IIIGo the similarity between the modelled and crystal structures with that between the theoretical and crystal structures. As Table IIIGo shows, despite the fact that the energy minimization consistently approximates cavity volumes to those appearing in the crystal structures and predicts the tendency of the cavity to expand or to collapse (Table IIGo and Figures 1 and 2GoGo), the model/crystal r.m.s.d are only slightly lower on average that the theoretical/crystal r.m.s.d. However, this direct comparison is complicated by the fact that, unlike the modelled structures, wild-type crystal structures (from where all theoretical structures are calculated) and mutant structures have been presumably subjected to the same minimization procedure in the laboratories where the structures were solved, which can increase the model/crystal r.m.s.d relative to the theoretical/crystal values. Thus, a first approximation to the structure of cavity-creating mutants can be obtained by simply performing the in silico mutation, while a higher refined structure is obtained when this theoretical structure is subjected to the optimized minimization procedure reported here, that captures the tendency of the cavity to expand or to collapse. The minimization is especially important in some instances where significant displacements of surface-located side chains occur upon mutation because the minimized structures do reveal those movements (Figure 4Go). We offer in Table IVGo, for experimental testing, a list of predicted cavity sizes of not yet reported mutants of the three proteins.



View larger version (49K):
[in this window]
[in a new window]
 
Fig. 4. Rearrangement of cavity surface side chains in the L99A lysozyme mutant accurately modelled by steepest descents minimization. The figure shows the movement of F114, F153 and Y88 at the cavity surface from the theoretical structure (green) to the crystal (blue) and modelled (red) structures.

 

View this table:
[in this window]
[in a new window]
 
Table IV. Predicted cavities
 
Conclusion

X-ray analysis of protein response to cavity-creating mutations had shown that replacement of buried, bulky, hydrophobic side chains by alanine leads to slight side chain adjustments rather than to substantial repacking of protein cores. Perhaps for this reason the simple minimization procedure implemented here can predict with high accuracy the structure of the mutated proteins so that their coordinates can be obtained from those of the corresponding wild-type proteins without having to perform X-ray or NMR studies. We hope that this will stimulate the consideration of protein cores as suitable scenarios for binding site design and the use of proteins as small molecule carriers and deliverers.


    Notes
 
1 To whom correspondence should be addressed. E-mail: jsancho{at}posta.unizar.es Back


    References
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Abagyan,R. and Totrov,M. (2001) Curr. Opin. Chem. Biol., 5, 375–382.[CrossRef][ISI][Medline]

Amzel,L.M. (1998) Curr. Opin. Biotechnol., 9, 366–369.[CrossRef][ISI][Medline]

Baldwin,E., Baase,W., Zhang,X., Feher,V. and Matthews,B. (1998) J. Mol. Biol., 277, 467–485.[CrossRef][ISI][Medline]

Brooks,B., Bruccoleri,R., Olafson,B., States,D., Swaminathan,S. and Karplus,M. (1983) J. Comput. Chem., 4, 187–217.[ISI]

Brooks,C., Brunger,A. and Karplus,M. (1985) Biopolymers, 24, 843–865.[ISI][Medline]

Brunori,M. and Gibson,Q. (2001) MBO Rep., 8, 674–679.

Buckle,A., Cramer,P. and Fersht,A. (1996) Biochemistry, 35, 4298–4305.[CrossRef][ISI][Medline]

Connolly,M. (1983) Science, 221, 709–713.[ISI][Medline]

Eriksson,A., Baase,W., Wozniak,J. and Matthews,B. (1992a) Nature, 355, 371–373.[CrossRef][ISI][Medline]

Eriksson, A., Baase,W., Zhang,X., Heinz,D., Blaber,M., Baldwin,E. and Matthews,B. (1992b) Science, 255, 178–183.[ISI][Medline]

Fernández-Recio,J., Romero,A. and Sancho,J. (1999) J. Mol. Biol., 290, 319–330.[CrossRef][ISI][Medline]

Fletcher,R. and Reeves,C. (1964) Comput. J., 7, 148–154.

Gallivan,J. and Dougherty,D. (1999) Proc. Natl Acad. Sci. USA, 96, 9459–9464.[Abstract/Free Full Text]

Gane,P. and Dean,P. (2000) Curr. Opin. Struct. Biol., 10, 401–404.[CrossRef][ISI][Medline]

Goodin,D. and McRee,D. (1993) Biochemistry, 32, 3313–3324.[ISI][Medline]

Guex,N. and Peitsch,M. (1996) Protein Data Bank Q. Newsl., 77, 7.

Jones,G., Willett,P., Glen,R., Leach,A. and Taylor,R. (1997) J. Mol. Biol., 267, 727–748.[CrossRef][ISI][Medline]

Schoichet,B., Stroud,R., Santi,D., Kuntz,I. and Perry,K. (1993) Science, 259, 1445–1450.[ISI][Medline]

Steiner,T. and Koellner, G. (2001) J. Mol. Biol., 305, 535–557.[CrossRef][ISI][Medline]

Wlodawer,A. and Vondrasek,J. (1998) Annu. Rev. Biophys. Biomol. Struct., 27, 249–284.[CrossRef][ISI][Medline]

Xu,J., Baase,W., Baldwin,E. and Matthews,B. (1998) Protein Sci., 7, 158–177.[Abstract/Free Full Text]

Received December 3, 2001; revised April 19, 2002; accepted May 21, 2002.