Refinement of protein cores and protein–peptide interfaces using a potential scaling approach

Ralph Nico Riemann1,2 and Martin Zacharias1,3

1International University Bremen, School of Engineering and Science, D-28759 Bremen, Germany and 2Institut für Molekulare Biotechnologie, Beutenbergstrasse 11, D-07745 Jena, Germany

3 To whom correspondence should be addressed. E-mail: m.zacharias{at}iu-bremen.de


    Abstract
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
Refinement of side chain conformations in protein model structures and at the interface of predicted protein–protein or protein–peptide complexes is an important step during protein structural modelling and docking. A common approach for side chain prediction is to assume a rigid protein main chain for both docking partners and search for an optimal set of side chain rotamers to optimize the steric fit. However, depending on the target–template similarity in the case of comparative protein modelling and on the accuracy of an initially docked complex, the main chain template structure is only an approximation of a realistic target main chain. An inaccurate rigid main chain conformation can in turn interfere with the prediction of side chain conformations. In the present study, a potential scaling approach (PS-MD) during a molecular dynamics (MD) simulation that also allows the inclusion of explicit solvent has been used to predict side chain conformations on semi-flexible protein main chains. The PS-MD method converges much faster to realistic protein–peptide interface structures or protein core structures than standard MD simulations. Depending on the accuracy of the protein main chain, it also gives significantly better results compared with the standard rotamer search method.

Keywords: molecular dynamics simulation/peptide–protein docking/potential smoothing/protein design/side chain prediction


    Introduction
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
The prediction of amino acid side chain conformations is an important step in comparative protein modelling, protein design and for generating atomic coordinates from low-resolution protein folding simulations. Most of the current side chain prediction methods are based on elegant and rapid searches of side chain rotamer libraries to identify an optimal set of side chain conformations for a given rigid protein main chain structure (Summers and Karplus, 1989; Holm and Sander, 1991Go; Lee and Subbiah, 1991Go; Dunbrack and Karplus, 1993; Koehl and Delarue, 1994Go; Dahiyat and Mayo, 1997Go; Tuffery et al., 1997; Samudrala and Moult, 1998Go; Swain and Kemp, 2001Go; Xiang and Honig, 2001Go; reviewed by Dunbrack, 2002Go; Canutescu et al., 2003Go; Källblad and Dean, 2003Go). However, in the case of comparative protein modelling, depending on the target–template similarity, the main chain structure of the template is only an approximation of a realistic target main chain. In the case of protein design one is interested in predicting the structure of a protein mutation before the synthesis. X-ray crystallography studies indicate that the protein main chain conformation can change in response to point mutations (Arcus et al., 1995Go; Wright and Lim, 2001Go). An inaccurate rigid main chain conformation can in turn interfere with the prediction of side chain conformations (Tufféry et al., 1997Go). Another limitation of current side chain prediction methods is due to the application of relatively simple energy functions that in many cases consider only the steric packing of generated side chain conformations and completely neglect the influence of surrounding solvent molecules (Canutescu et al., 2003Go).

The situation is similar in the case of predicted protein–protein or protein–peptide complexes. A typical protein–protein docking simulation to systematically search for putative interaction geometries results in a set of possible complexes that can still deviate from a realistic structure owing to approximations used during the initial docking approach (e.g. rigid protein partners). The refinement, in particular of the interface between the binding partners, is an important step in order to achieve an accurate complex structure (Najmanovich et al., 2000Go; Smith and Sternberg, 2002Go). As for the prediction of protein core structures, a straightforward and rapid approach for interface refinement is to assume that the protein main chain structure at the interface is approximately correct and to predict side chain conformations based on the rotamer search and optimal steric fit (Gray et al., 2003Go). The success of this approach may significantly depend on the accuracy of the main chain structure at the interface close to the native structure, after the initial docking attempt. In addition, water molecules, not present during standard side chain rotamer searches, might be of importance for the protein–protein interface structure.

In principle, molecular dynamics (MD) simulations allow for simultaneous adjustment of both side chain and backbone atoms and can be performed in the presence of water molecules. However, repacking of side chain conformations during an MD simulation starting, for example, from a random initial side chain placement involves crossing of many high-energy barriers. The simulation can easily be trapped in unfavourable states that are separated from low-energy states by large energy barriers. A standard method to enhance the convergence of MD simulations is to use the simulated annealing (SA) approach, i.e. to initially use a high simulation temperature to allow for barrier crossing followed by cooling of the simulation to identify favourable low-energy states (Brunger et al., 1997Go). The high initial temperatures used in SA approaches may, however, interfere with the presence of explicit water molecules during the simulation and can also distort the overall protein fold. In the present study, we have developed and tested a potential scaling method that allows the specific lowering of energy barriers for residues located in the protein core or at the interface of protein–peptide complexes during an MD simulation with a minimum perturbation of other residues and surrounding water molecules. The smooth rescaling of the potential during MD simulations allows a rapid convergence to an optimal interface structure with simultaneous adjustment of both side chain and backbone protein core or interface structure. The approach was tested on three test proteins and two protein–peptide complexes with known structures, including a systematic evaluation and comparison with standard MD simulations and a rotamer search based method. In general, the potential scaling (PS-MD) approach was found to be much more efficient than standard MD simulations to obtain realistic side chain conformations starting from arbitrary initial placements. Depending on the accuracy of the protein main chain, the approach also performed better than search strategies based on rotamer libraries.


    Materials and methods
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
The performance of the potential scaling approach for the prediction of protein core side chains was systematically evaluated on three test structures that represent different protein folds. One structure consisted of four ß-strands enclosed between two {alpha}-helices in a sandwich conformation (SH2 domain; PDB entry: 1CWE; Mikol et al., 1995Go). The second test case was an all ß-sheet protein (SH3 domain; PDB entry: 1GFC; Kohda et al., 1994Go). This structure corresponds to an energy-minimized average structure obtained by NMR spectroscopy (the other two test cases have been determined by X-ray crystallography). The third test protein was an all {alpha}-helical protein (DNA-binding fragment of the cro repressor protein of Escherichia coli phage lambda; PDB entry: 1R69; Mondragon et al., 1989Go). The three cases (test set) were used to systematically optimize the generation of start structures and to test simulation parameters. In addition, the potential scaling method was further evaluated on a set of 12 proteins with between 56 and 124 amino acid residues (evaluation set), for which high-resolution X-ray structures were available for the comparison of predicted and experimentally observed side chain conformations (the PDB code of the protein structures is given in Table I). To test the potential scaling method for protein–protein and protein–peptide interface refinement two protein–peptide complexes were used as model test systems. One complex consisted of the protein MDM2 from Xenopus laevis in complex with an {alpha}-helical trans-activation domain (residues 17–27) of the protein p53 (PDB entry: 1YCQ; Kussie et al., 1996). The second complex corresponds to the protein binding domain (residues 218–277 were taken) of the chaperone protein GroEL and a peptide consisting of 11 residues (residues 602–612; mPDB entry: 1DKD; Chen and Sigler, 1999Go).


View this table:
[in this window]
[in a new window]
 
Table I. Evaluation of the PS-MD approach on a set of proteins with known (X-ray) structure

 
Potential scaling of side chain interactions

All energy minimizations (EMs) and MD simulations were performed using a modified version of the sander module of the Amber6 package (Case et al., 1999Go) with the Cornell et al. (1995)Go force field parameter file (parm94). During this potential scaling MD run, all pairwise Lennard–Jones and electrostatic interactions of selected side chain atoms beyond Cß were scaled. A soft-core scaling scheme of the following form was employed (Zacharias et al., 1994Go) and implemented in the sander code:

Here, {lambda} corresponds to a scaling parameter that can vary between 0 (original interaction energy) and 1 (no interaction), whilst {delta} is the shift parameter and {varepsilon}ij is the minimum Lennard–Jones energy for the interaction between atoms i and j, respectively. This functional form allows a smooth transition between zero and full potential and preserves the position of the energy minimum of the Lennard–Jones term (see Figure 1). For {delta} several values from 10 to 30 Å2 were tested and, owing to its performance, {delta} = 20 Å2 was finally chosen for the simulations using the ddd model, whereas {delta} = 15 Å2 was used for simulations in the presence of explicit water molecules. During the potential scaling run the scaling parameter {lambda} was varied linearly in 21 steps.



View larger version (26K):
[in this window]
[in a new window]
 
Fig. 1. Example of a soft-core separation-shifted Lennard–Jones potential scaled by various scaling factors {lambda} (see Materials and methods).

 
Since the PS-MD approach is expected to especially improve the sampling of buried and sterically constrained side chains, it was applied only to buried side chains and side chains located at the interface of protein–peptide complexes (85% buried solvent-accessible surface area as judged by the solvent-accessible surface of the side chain in the experimental reference structure). The respective residue numbers for the set of test structures are as follows. (i) In the case of 1R69 (10 residues): 6, 20, 24, 31, 34, 35, 44, 48, 52 and 59; (ii) in the case of 1CWE (18 residues): 19, 29, 30, 31, 32, 42, 43, 44, 45, 56, 61, 71, 77, 80, 83, 85, 87 and 98; and (iii) in the case of 1GFC (10 residues): 5, 11, 18, 19, 21, 27, 29, 38, 49 and 54. The same selection criteria for the definition of buried residues (85% buried solvent-accessible area) were used in the case of the evaluation set of protein structures (see Table I). For the two complexes all ligand residues of the p53 peptide and the GroEL peptide, respectively, were included in the potential scaling procedure.

Restraint MD simulation and generation of start structures

In order to allow for conformational relaxation of the protein main chain during potential scaling and standard MD simulations, positional window restraints were applied that allow for free motion within a preset range with respect to the start structure and application of a soft quadratic penalty (force constant of 5.0 kcal mol–1 Å–2) for deviations beyond the preset range. In general, all side chains involved in the potential scaling were completely free to move throughout all stages of the simulations.

Start structures (24) for each test case were generated using high-temperature MD simulations (900 K; 100 ps), with {lambda} values close to 1 for the selected fully mobile side chains (means with reduced interaction of the selected side chains) and positional window restraints (±1 Å) with respect to the experimental structure for the rest of the protein. The structures were subsequently energy minimized (2000 steps) with the full force field followed by a short MD simulation (300 K, 3 ps) and another EM (2000 steps; both with the same positional window restraints as above). Generated structures with high steric strain and structures with an average root mean square deviation (RMSD) of <2.0 Å for buried side chains with respect to the experiment were not considered. This protocol leads to start structures with randomly misplaced buried side chains that are sterically possible and average main chain deviations from the experiment of 0.4–0.8 Å (RMSD of the C{alpha} atoms). The same conditions were used to generate start structures for the evaluation set of proteins (12 protein structures, see Table I) except that in this case only eight start structures per protein were used.

In the case of the peptide–protein complexes the peptide side chains were first randomized using the MD simulations, with {lambda} values close to 1 for the peptide side chains (see above). In addition, ligand peptides were randomly misplaced by up to 1.3 Å (RMSD of the C{alpha} atoms) from their placement in the experimental structure to mimic an imperfect docking result. The randomized initial structures were energy minimized using the full potential and by applying window restraints (allowing free motion within ±2 Å from the start structure). This step also leads to a relaxed protein main chain in part compatible with the initial side chain placements. This step was necessary to compare the PS-MD method with standard and ‘temperature SA’ MD simulations using the same start structures and the full potential during the entire simulation.

Simulations with a distance-dependent dielectric constant

Simulations in the absence of solvent were performed using a distance-dependent dielectric constant of {varepsilon} = 4r (ddd model), a 12.0 Å cut-off and a time step of 1 fs. Bonds involving hydrogen were restrained by applying the RATTLE (Anderson, 1983Go) algorithm. In addition to regular MD simulations (at 300 K with a time constant of 2.0 ps) and PS-MD simulations, the performance of SA (Kirkpatrick et al., 1983Go; Brunger et al., 1997Go) simulations was also tested starting at a simulation temperature of 1000 K cooled down to 300 K (within 11 steps) for the same simulation time as used for the PS-MD and regular MD runs. The start structures also served as restraining reference structures during the PS-MD, SA-MD and standard MD simulations. The total simulation length was 252 ps. All final structures were energy minimized using the full energy function.

Explicit solvent simulations

In order to systematically test the potential scaling scheme for protein–peptide interface refinement in the presence of explicit solvent, the solvated part of the protein–peptide complexes was restricted to a water cap around the binding site. In the case of the MDM2–p53 complex, a water cap with radius 18.5 Å and 421 TIP3 water molecules (Jorgensen et al., 1983Go) centred at the C{alpha} atom of Leu22 was added. Beyond the radius of 18.5 Å a harmonic potential with a force constant of 1.5 kcal mol–1 Å–2 was applied to prevent solvent molecules from escaping out of the water cap. The cap completely embedded the peptide and a large part of the receptor protein. A similar set-up was used in the case of the GroEL fragment and the bound peptide, with a solvent cap radius of 19.5 Å (531 TIP3 water molecules) centred at the C{alpha} atom of Thr605 of the ligand peptide. All receptor and peptide atoms were mobile during the simulations; however, the motion of all non-scaled atoms was restricted to ±2 Å of the corresponding atom position in the start structure (window restraints). All simulations with explicit solvent were performed using a dielectric constant of {varepsilon} = 1.0. An 11.0 Å cut-off distance for non-bonded interactions was also used and the system was coupled to a heat bath of 300 K. The time step was 1 fs for all MD runs and bonds involving hydrogen have been restrained by applying the RATTLE algorithm (Andersen, 1983Go).

Comparison with a side chain rotamer search method and experiment

For comparison the systematic side chain rotamer search method SCWRL3.0 (Canutescu et al., 2003Go) based on the dead-end elimination theorem, a rigid backbone and a simple steric packing energy function was applied on the same set of start structures. The SCWRL3.0 approach is a deterministic side chain prediction method. Therefore, differences in the final results are solely due to the differences in protein main chains of the various start structures. To allow direct comparison with the PS-MD results the same set of buried or interface side chains as used during PS-MD was rebuilt [nearly experimental conformations for the other side chains employing the –s (side chain fixing) option of SCWRL3.0].

Comparison with experimental structures

The PS-MD method for predicting the conformation of buried or interface side chain allows for conformational changes of the protein backbone within preset boundaries (typically free backbone movements of up to ±1 Å were allowed). However, this complicates the comparison of predicted side chain placements with experimental reference structures in terms of a RMSD of buried side chains from the experiment, since it is not only influenced by the side chain but also by the backbone conformation. In order to allow a direct comparison of the side chain placements with experimental reference structures, the final structures of the PS-MD and MD simulations as well as the SCWRL3.0 results were energy minimized using the experimental backbone structures as restraining reference structures. This step basically eliminates the contribution of the protein main chain. At the same time the side chains stay nearly in the same orientation/conformation. Independently, final structures were also compared with the experimental structures in terms of internal coordinates (side chain dihedral torsion angles).

Smoothing of RMSD distribution curves

To allow better comparison, RMSD distribution curves were smoothed using a step-wise error function Fi(RMSD) representing a single RMSD result and the graph G(RMSD) is then the normalized sum over all N final structures, with

and

where Ri is the RMSD value of the i-th structure, N the normalizing pre-factor and {Delta} is the step width (=0.04 Å). The function {theta} (rr0) equals zero if the distance r is smaller than r0 and 1 if it is greater.


    Results
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
Core side chain prediction with a semi-flexible main chain

In a first set of simulations the PS-MD approach was used to predict the core side chain conformations of three test proteins. These test structures represent different protein folds. One structure (PDB entry: 1CWE; Mikol et al., 1995Go), termed 1CWE in the following, consists of four ß-strands enclosed between two {alpha}-helices in a sandwich conformation ({alpha}/ß-protein with a SH2 domain fold). The second test case, termed 1GFC, was an all ß-sheet protein (NMR structure of an SH3 domain; PDB entry: 1GFC; Kohda et al., 1994Go) and the third test structure (PDB entry: 1R69; Mondragon et al., 1989Go), termed 1R69, corresponds to an all {alpha}-helical protein (DNA-binding fragment of the cro repressor protein of E.coli phage lambda, see Figure 2). The approach was compared with standard MD simulations, (temperature) simulated annealing (T-SA-MD) runs and a systematic rotamer search method based on the dead-end elimination theorem and using a fixed protein main chain (SCWRL3.0). In order to test the sensitivity of the simulation results on the side chain start placement and small deviations of the protein main chain in each case, 24 start structures with random but sterically allowed initial placement of buried side chains were generated (see Materials and methods). An overview of the simulation steps of the PS-MD method is given in Figure 3. The protein main chain of these start structures deviated from the corresponding experimental main chain structure on average by 0.3–0.8 Å but also included larger local deviations at positions of initially misplaced side chains (of 1–1.5 Å). To allow conformational relaxation of the complete protein structures during the side chain building process, the protein main chain and solvent-exposed residues were allowed to move from the respective start positions by up to ±1.0 Å (positional window constraints). Larger displacements were penalized by a soft harmonic potential (force constant 5.0 kcal mol–1 Å–2). Initial tests indicated that the PS-MD approach is not useful for the structure prediction of solvent-exposed side chains. For example, in the case of the 1R69 test protein a set of 24 random start structures with all side chains included in the PS-MD approach was generated and showed an average side chain RMSD of ~2.1 Å from the experiment. Application of the PS-MD method resulted in final structures with only slightly improved solvent-exposed side chain conformations and an average RMSD of ~1.7 Å from the experiment (average over all side chains with >50% solvent accessibility). In contrast, the buried side chains (<15% solvent accessible) were on average predicted to better than 0.8 Å (Figure 4, 1r69). Also, in the absence of explicit solvent and employing a distance-dependent dielectric constant charged solvent-exposed residues are promoted to move towards the protein interior that interfere with the placement of buried non-polar residues. Therefore, the above positional window restraints were applied to both atom positions of the main chain as well as solvent-exposed residues. The purpose was to keep these residues in an extended solvent-exposed conformation. All buried side chains subject to the potential scaling approach were, however, completely mobile.



View larger version (63K):
[in this window]
[in a new window]
 
Fig. 2. Superposition (in stereo) of one phage 434 (PDB: 1R69), the start structure (A, in grey) and the final structure obtained after potential scaling molecular dynamics (B, in grey) on the experimental structure (black). Seven (20, 31, 34, 44, 48, 52 and 59) out of 10 residues that were initially misplaced and fully flexible during the simulation are shown in stick representation (protein main chain in tube representation).

 


View larger version (33K):
[in this window]
[in a new window]
 
Fig. 3. Schematic overview of the PS-MD simulation approach.

 


View larger version (42K):
[in this window]
[in a new window]
 
Fig. 4. RMSD distributions of predicted buried side chain core residues (see Materials and methods) with respect to the corresponding experimental structure. (A) The buried side chain RMSD of start structures (dashed curve), final structures after standard MD (thin line) and after applying PS-MD (solid line) are shown. The panels show the results for start structures with a main chain deviation between 0.4 and 0.8 Å from the experiment. All RMSD values for the buried side chains have been obtained after performing energy minimizations, including harmonic restraints with respect to the corresponding experimental main chain structure (see Materials and methods). In (B) the dashed line indicates the results of rotamer-based SCWRL3.0 predictions (main chain of the corresponding start structure). Results of ‘temperature simulated annealing’-MD simulations are indicated using a thin line, whereas the distribution obtained with the PS-MD method are indicated using solid lines.

 
A selected start structure and a final structure superimposed on the corresponding experimental structures are illustrated in Figure 2 (1R69 case). Except for the second test case the PS-MD method resulted in buried side chain placements closer to the experiment compared with standard MD simulations (Figure 4A) and ‘temperature’ SA-MD simulations (Figure 4B) of the same length (252 ps). The SCWRL3.0 rotamer search method applied on the same set of start structures performed the same as the PS-MD method for two test cases (1GFC and 1CWE; Figure 4B), but worse in the case of the third {alpha}-helical test case (1R69).

Each of the various start structures resulted in final structures of different force field energy. In order to check if the final energy (obtained after energy minimizing the final structure, including the positional window restraints to the start structure; see Materials and methods) could be used to select realistic predictions, the 12 lowest energy structures (out of 24) from the PS-MD and the SCWRL3.0 predictions were compared (Figure 5A). This subset has an on average lower side chain RMSD with respect to the experiment and further improves the prediction results (Figure 5A). However, it does so for both PS-MD and SCWRL3.0 approaches and results in the same trends, as seen in the comparison of all final structures (Figure 5A). To find out if the PS-MD method can improve the SCWRL3.0 predictions, the latter predicted structures were used as start structures for the PS-MD approach. A slight improvement was observed for test cases 1 (1GFC) and 2 (1CWE) and a significant improvement in the case of test structure 3 (1R69). The RMSD distributions appear very similar to those obtained starting from the original start structures (compare Figures 4 and 5B). Hence, starting from the SCWRL3.0 predictions does not further improve the results and indicates that protein main chain deviations of each start structure may limit the quality of the prediction. It is important to note that the start structures used for the above systematic test showed overall close agreement to the protein main chain of the experimental structure. However, the generation of start structures with initially misplaced side chains can lead to protein main chain deviations that are larger than the average main chain deviation. It is likely that the significant local deviation of the main chain is one reason for the relatively poor performance of both the SCWRL3.0 and the PS-MD methods for some start structures.



View larger version (36K):
[in this window]
[in a new window]
 
Fig. 5. (A) Buried side chain RMSD distribution of the 12 energetically most favourable final structures (out of the 24 with a main chain deviation of 0.4–0.8 Å from the experiment, see the legend of Figure 3) obtained with the SCWRL3.0 approach (dashed line) and the PS-MD approach (solid line). (B) Application of the PS-MD method (solid line) using the 24 final SCWRL3.0 structures (dashed line) as start conformations (0.4–0.8 Å main chain deviation from the experiment).

 
Core side chain prediction on an evaluation set of proteins

For a more reliable validation of the present PS-MD method, 12 additional high-resolution X-ray protein structures (>2.0 Å resolution) were chosen from the PDB consisting of 56–124 amino acids (evaluation set, see Table I). For comparison the SCWRL3.0 side chain prediction method was applied to each target using the experimental main chain structure. As indicated in the second column of Table I, in this case SCWRL3.0 gives excellent predictions, with RMSDs of buried side chains of <1 Å (in most cases). However, this represents an ideal case and in practical applications of side chain prediction one often relies on main chain template structures that deviate from the real main chain structure (e.g. homology modelling or prediction of the structure of mutations based on the wild-type template). In order to generate template structures with small but significant main chain deviations from the experimental reference in each case, eight start structures were generated with random initial placement of buried side chains and protein main chain deviations of 0.5–0.8 Å from the experiment (see start structure generation in Materials and methods and Table I). The deterministic side chain prediction using SCWRL3.0 and these slightly ‘incorrect’ main chains as (rigid) templates performed quite well, but significantly worse than when using the experimental main chain structure as the template (compare columns 4 and 2 in Table I). The PS-MD approach starting from the same start structures and using the same conditions as described in the previous benchmark (and the same procedure to calculate the deviation from the experiment, see above) resulted in overall more realistic side chain placement in 10 out of 12 cases compared with SCWRL3.0 (compare columns 5 and 4 in Table I).

Comparative protein modelling using PS-MD

Comparison of homologous protein structures in the range of 30–50% sequence identity in most cases translates to a structural deviation of the main chain in the range of 0.5–2 Å (Martí-Renom et al., 2000Go). In the case of a relatively even distribution of sequence variation (Figure 6), it is expected that the main chain deviation between homologous protein structures may also be evenly distributed along the protein chain (except for long sequence insertions or deletions). The PS-MD approach was applied to predict the side chain conformations of buried residues in 1R69 based on the protein main chain of the two homologous proteins 2CRO and 1ADR (both bacterial DNA-binding domains). The sequence identity of the two proteins with respect to 1R69 is 57% (2CRO) and 30% (1ADR), respectively. The overall main chain RMSD of these two protein domains with respect to the known structure of 1R69 was 0.8 Å (2CRO) and 1.5 Å (1ADR), respectively. These structures have also been used previously to benchmark protein side chain predictions (Chung and Subbiah, 1995Go; Desjarlais and Handel, 1999Go; Schueler-Furman and Baker, 2003Go). In the case of the 1GFC structure, a protein main chain of the homologous protein 1QWE (viral C-Src SH3 domain) was used for side chain prediction. The sequence identity of the two latter proteins is 34% (main chain RMSD ~1.4 Å).



View larger version (20K):
[in this window]
[in a new window]
 
Fig. 6. (A) Sequence alignment for modelling the SH3 domain sequence of PDB entry 1GFC onto the backbone of the related 1QWE main chain. (B) Sequence alignment used to model the phage 434 repressor protein (1R69) sequence onto the protein main chain of the cro repressor protein (1CRO) and the phage P22 repressor protein (1ADR).

 
The sequence of each template was initially replaced by the corresponding target sequence and the alignment given in Figure 6. Loop segments were built using the SPDBV program (Guex and Peitsch, 1997Go). The PS-MD approach was started in each case from 24 start structures (12 in the case of 1GFC) with random initial placement of the buried side chains (see Materials and methods and the previous paragraph). In contrast to the previous benchmark, in the case of comparative modelling the known template structure (2CRO or 1ADR in the case of modelling 1R69 or 1QWE in the case of modelling 1GFC) was used as the restraining reference structure (with window restraints, see Materials and methods). For comparison, the SCWRL3.0 method was applied (using the template structure main chains, respectively, as templates). Note that since the SCWRL3.0 approach is a deterministic side chain prediction method it results in only one final structure. All final structures were compared with the experimental 1R69 and 1GFC structures, respectively, by evaluating the deviation of the side chain torsion angles {chi}1 and {chi}2 (from those in the known target structure; see Tables II and III). In addition, to separate main chain and side chain RMSD the final structures were shortly energy minimized using the respective main chain (1R69 with window restraints, 1GFC with harmonic restraints because of the high backbone deviation) as a restraining reference (this eliminates the main chain RMSD of the final structures with very little effect on the side chain placement). The same procedure was applied to the predictions using SCWRL3.0. For all cases the buried side chain RMSD of final structures obtained after applying the PS-MD approach was significantly lower than in the case of the SCWRL3.0 prediction (Figure 7). In addition, the predicted {chi}1 and {chi}2 side chain dihedral angles using PS-MD were overall in closer agreement with the known target structures compared with the SCWRL3.0 results (Tables II and III).


View this table:
[in this window]
[in a new window]
 
Table II. Side chain prediction for target sequence 1R69 modelled on the main chains of template 2CRO and 1ADR

 

View this table:
[in this window]
[in a new window]
 
Table III. Side chain prediction for the target sequence 1GFC modelled on the main chains of the template 1QWE

 


View larger version (24K):
[in this window]
[in a new window]
 
Fig. 7. (A) Buried side chain RMSD distribution with respect to the experimental 1R69 structure of 24 final structures after applying PS-MD based on the 2CRO main chain (solid curve) and with the 1ADR main chain (thin curve). The two corresponding SCWRL3.0 results are shown as vertical lines (solid/thin line corresponds to the buried side chain RMSD of the 1R69 sequence based on 2CRO/1ADR main chains, respectively). (B) Buried side chain RMSD distribution with respect to the experimental 1GFC structure based on the 1QWE main chain using the PS-MD approach (12 structures). The buried side chain RMSD using the 1GFC sequence on the 1QWE main chain is shown as a vertical line.

 
Refinement of predicted protein–peptide interfaces

The refinement of side chain placements at protein–protein and protein–peptide interfaces is an important step during docking simulations or refinement of low-resolution structures and complexes obtained by homology modelling. Such protein–protein interfaces are at least in part close to the solvent and it is expected that the inclusion of explicit solvent molecules and readjustment of the protein main chain are critical for a realistic interface side chain prediction. The performance of the side chain PS-MD method for interface refinement was tested on two protein–peptide complexes with known 3D structures (PDB-entries: 1YCQ and 1DKD; Figures 8 and 9). One case consisted of the protein MDM2 from X.laevis in complex with an {alpha}-helical trans-activation domain (residues 17–27) of the protein p53 (PDB entry: 1YCQ). The second complex corresponds to the protein-binding domain of the chaperone protein GroEL (residues 218–277) and a peptide consisting of 11 residues (PDB entry: 1DKD; residues 602–612).



View larger version (36K):
[in this window]
[in a new window]
 
Fig. 8. (A) Surface representation of the MDM2 protein in complex with an {alpha}-helical fragment of protein p53 (residues 17–27 in stick representation; PDB entry: 1YCQ). (B) and (C) indicate two docked structures with similar main chain placement of the p53 peptide fragment (start structures 3 and 5, see Table V) but alternative sterically possible interface side chain placements.

 


View larger version (59K):
[in this window]
[in a new window]
 
Fig. 9. (A) Superposition of one GroEL-bound peptide start structure (residues 602–612, light grey) onto the experimental peptide structure (black) in complex with the chaperone GroEL-binding domain (residues 218–277; PDB entry: 1DKD). The GroEL-binding domain is shown as surface representation and the peptides as stick model. (B) Same as in (A) but showing the final peptide structure (light grey) obtained using SCWRL3.0. (C) Same as in (A) showing the final peptide structure (light grey) obtained using standard MD simulations (see Materials and methods). (D) Same as in (A) with the final structure from the PS-MD approach (light grey). For comparison the experimental bound peptide is shown in black in each panel.

 
These relatively small test cases with 4–5 critical amino acids at the interface were chosen since they allow a systematic test with several start structures of various degrees of protein main chain and side chain deviations from the known experimental structure. In both test cases the larger protein partner (receptor) of the complex forms a concave surface at the interface with little possible variation of side chain placements. In contrast, the convex surface of the smaller peptide ligand allows many sterically possible side chain conformations. For the present test calculations, all side chains of the ligand peptides were completely mobile, whereas the peptide main chain and all residues of the receptor protein were allowed to move freely within a window of ±2 Å with respect to the start structure. This still allows for considerable adjustment of the complex geometry during the refinement simulations. All side chains of the peptide ligand were subject to potential scaling. To mimic an initial docking of the peptide–protein complexes the start structures contained random placements of the ligand side chains, and the complete ligand peptide was initially placed near the receptor binding region with overall main chain misplacements (ligand) of between 0.2 and 1.3 Å from the experimental placement (Tables III and IV). Such displacements are typical for successful initial protein–protein and protein–peptide docking placements (Halperin et al., 2002Go). The RMSD of the initially placed interface side chains was in the range of 1–4 Å from the experiment. Between 420 and 530 TIP3P water molecules were added in a cap completely encompassing the ligand and interface region (see Materials and methods). For both test cases the PS-MD approach performed significantly better than both standard MD simulations and application of the SCWRL3.0 approach. Except for the ligand placements with small deviation from the experiment, neither SCWRL3.0 nor standard MD simulations (of 315 ps) gave any significant improvement of the side chain placement compared with the random starting placement. Even an extension of the standard MD simulations to a total simulation length of ~1 ns did not improve the results significantly, indicating that the side chains are trapped in conformations separated by energy barriers from conformations close to the experiment. In contrast, the PS-MD approach resulted in significant improvement of the side chain placement in ~5 (1DKD) to 6 (1YCQ) out of 7–9 start structures, respectively, during relatively short simulation times (315 ps; Figure 10). Interestingly, similar results were obtained in the absence of solvent (ddd model), indicating that at least in the present two cases steric interactions have a more dominant influence on the side chain placement than explicit solvent molecules.


View this table:
[in this window]
[in a new window]
 
Table IV. C{alpha} RMSDs of peptide–receptor start structures (1–7) with respect to the experiment (1DKD complex)

 


View larger version (56K):
[in this window]
[in a new window]
 
Fig. 10. (A) Sorted RMSD of p53 peptide interface side chains of nine final PS-MD structures (315 ps; open bars) in complex 1YCQ in comparison with the starting structures (closed bars) and results from standard MD simulations (315 ps; light grey bars). All standard and PS-MD simulations were performed, including 421 TIP3P water molecules and with ‘window restraints’ (allowing full mobility of all peptide side chains and free motion within ±2 Å of all other solute atoms with respect to the corresponding start structure). (B) Same as in (A) except that the results are presented for seven structures for the GroEL-binding peptide in complex with the GroEL protein-binding domain (PDB entry: 1DKD). (C) Same as in (A) but for vacuum runs using the ddd model. The SCWRL3.0 predictions (dark grey bars) are also included. (D) Same as in (C) but for the 1DKD case including the SCWRL3.0 predictions (dark grey bars).

 

View this table:
[in this window]
[in a new window]
 
Table V. C{alpha} RMSDs of peptide–receptor start structures (1–9) with respect to the experiment (1YCQ complex)

 

    Discussion
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
Most current methods to predict amino acid side chains in protein cores or at protein–protein interfaces are based on systematic searches of discrete possible rotamers based on the fixed main chain (reviewed by Dunbrack, 2002Go). Deterministic and extremely rapid algorithms based on the dead-end-elimination theorem have been developed that allow side chain prediction of medium-sized proteins within a few seconds (Canutescu et al., 2003Go). However, such approaches are limited due to the neglect of conformational adjustment of the protein main chain (Harbury et al., 1995Go). In particular, homology modelling of proteins can involve side chain placements on template protein main chains which, depending on the template–target homology, may significantly deviate from the true main chain. The purpose of the present study was to test and evaluate an alternative side chain prediction scheme that allows protein main chain adjustment during side chain generation within preset limits and the inclusion of explicit solvent using MD simulations. To overcome the sampling limitations of conventional MD simulations we employed a potential scaling scheme of the selected amino acid side chains. The approach is related to SA-MD methods where one starts with a very high simulation temperature that allows energy barriers to be overcome, followed by a gradual decrease of the temperature and final EM (Kirkpatrick et al., 1983Go; Brunger et al., 1997Go). In contrast to SA, the PS-MD approach allows the barriers involved for side chain readjustments to be selectively lowered with a minimum perturbation of other parts of the simulation system. Increased temperature simulations always affect the complete simulation systems and can for example strongly disturb the solvent atmosphere in explicit solvent simulations.

Potential scaling approaches have been used previously to tackle the problem of sampling limitations of conventional MD simulations (Piela et al., 1989Go; van Schaik et al., 1992Go; Mierke and Kessler, 1993Go; Dahiyat and Mayo, 1997Go; Huber et al., 1997Go; Tappura et al., 2000Go; Tappura, 2001Go; Riemann and Zacharias, 2004Go). However, the application of potential scaling during MD simulations to predict and refine side chain conformations in protein cores or at protein–protein interfaces has so far not been evaluated.

In the case of protein side chain prediction of protein cores (in the absence of solvent) with slightly perturbed protein main chains, PS-MD showed similar prediction accuracy to SA-MD and only for one test system better performance than the dead-end elimination-theorem based rotamer search approach SQWRL3.0. The results depended on the protein type and the best results were obtained with an {alpha}-helical protein domain (1R69). The approach was further tested on 12 proteins for which high-resolution X-ray structures have been determined (Table I). Generation of start structures with random initial buried side chain placements resulted in structures with main chain deviations of 0.5–0.8 Å from the experiment. The SQWRL3.0 approach gave excellent prediction results for buried side chains when using the experimental main chain structure. Still reasonable performance was achieved when using the generated start structures (with ‘incorrect’ main chain) as templates. However, using the same start structures for the PS-MD approach and allowing main chain flexibility during PS-MD of up to ±1 Å resulted in overall better buried side chain predictions in 10 out of 12 cases compared with the deterministic approach using a rigid main chain structure.

In the case of comparative protein modelling of the 1R69 or 1GFC test cases with protein templates that deviated by 0.8–1.6 Å from the known target main chain, the PS-MD approach performed significantly better than the SQWRL3.0 approach (and conventional MD simulations). It is important to note that although the start structures used in the first test of the method had an overall main chain deviation of ~0.4–0.8 Å from the experiment, considerably larger deviations of the main chain near initially misplaced side chains were present. This can strongly affect the performance of the approach and might be an explanation as to why the PS-MD method performed quite well in the case of comparative modelling and less well for some of the start structures used in the first test.

The present PS-MD method performed significantly better than conventional MD or the deterministic side chain prediction in the case of protein–peptide interface refinement. In this case the protein main chain of the peptide was allowed to deviate from the experiment by up to 1.3 Å, including overall translational and rotational shifts of the ligand. An additional difficulty is due to the further stabilization of the misplaced peptide conformations by EM of the incorrect side chain placements before the side chain refinement. However, this corresponds to a typical docking scenario where one energy minimizes initially docked complexes to reduce steric overlap between the receptor and the ligand. Apparently, the performance of the deterministic side chain prediction approach, which does not allow overall conformational relaxation of the protein and the peptide during side chain prediction, depends sensitively on the main chain conformation of the start structure. However, conventional MD simulations that allow overall conformational adjustment of both the protein and the peptide fail because the initial side chain placement at the interface is separated from the realistic placement by high-energy barriers. The PS-MD approach allows easy crossing of the barriers during the potential scaling and final settlement of side chains close to the realistic structure for most start conformations. A disadvantage of the present approach is that it is much more time consuming than methods based on a fixed protein main chain and a rotamer search. However, even in the case of larger proteins or protein–protein complexes it is often possible to limit the region of interest to only the interface between two proteins in the case of a protein–protein complex or to part of the protein structure (e.g. region around a protein mutation). Especially in the case of introducing several mutations in a protein, it has been shown experimentally that conformational adjustments of the protein are important and side chain prediction based on a fixed side chain (of the wild-type structure) can fail to accurately predict the side chain conformations owing to the mutation (Baldwin et al., 1993Go). Besides homology modelling of proteins and refinement of peptide–protein and protein–protein interfaces, the present PS-MD approach may also be applied in rational protein design efforts to approximately account for protein main chain relaxation to predict the effect of a protein sequence modification on the structure of the protein.


    Acknowledgements
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
The authors thank A.Barthel and D.Roccatano for helpful discussions. This work was supported by a grant to the Jenaer Center of Bioinformatics (JCB) of the Bundesminsterium für Bildung und Forschung (BMBF), Germany.


    References
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Acknowledgements
 References
 
Andersen,H.C. (1983) J. Comput. Phys., 52, 24–34.[CrossRef][ISI]

Arcus,V.L., Vuilleumier,S., Freund,S.M.V., Bycroft,M. and Fersht,A.R. (1995) J. Mol. Biol., 254, 305–321.[CrossRef][ISI][Medline]

Baldwin,E.P., Hajiseyedjavadi,O., Baase,W.A. and Matthews,B.W. (1993) Science, 262, 1715–1718.[ISI][Medline]

Brunger,A.T., Adams,P.D. and Rice,L.M. (1997) Structure, 5, 325–336.[CrossRef][ISI][Medline]

Canutescu,A.A., Shelenkov,A.A. and Dunbrack,R.L.,Jr (2003) Protein Sci., 12, 2001–2014.[Abstract/Free Full Text]

Case,D.A. et al. (1999) AMBER 6, University of California, San Francisco.

Chen,L. and Sigler,P.B. (1999) Cell, 99, 757–768.[CrossRef][ISI][Medline]

Chung,S.Y. and Subbiah,S. (1995) Protein Sci., 4, 2300–2309.[Abstract/Free Full Text]

Cornell,W.D., Cieplak,P., Bayly,C.I., Gould,I.R., Merz,K.M.,Jr, Ferguson,D.M., Spellmeyer,D.C., Fox,T., Caldwell,J.W. and Kollman,P.A. (1995) J. Am. Chem. Soc., 117, 5179–5197.[CrossRef][ISI]

Dahiyat,B.I. and Mayo,S.L. (1997) Proc. Natl Acad. Sci. USA, 94, 10172–10177.[Abstract/Free Full Text]

Desjarlais,J.R. and Handel,T.M. (1999) J. Mol. Biol., 290, 305–318.[CrossRef][ISI][Medline]

Dunbrack,R.L. and Karplus,M. (1993) J. Mol. Biol., 230, 543–574.[CrossRef][ISI][Medline]

Dunbrack,R.L.,Jr (2002) Curr. Opin. Struct. Biol., 12, 431–440.[CrossRef][ISI][Medline]

Gray,J.J., Moughon,S.E., Kotemme,T., Schueler-Furman,O., Misura,K.M.S., Morozov,A.V. and Baker,D. (2003) Proteins, 52, 118–122.[CrossRef][ISI][Medline]

Guex,N. and Peitsch,M.C. (1997) Electrophoresis, 18, 2714–2723.[ISI][Medline]

Halperin,I., Ma,B., Wolfson,H. and Nussinov,R. (2002) Proteins, 47, 409–443.[CrossRef][ISI][Medline]

Harbury,P.B., Tidor,B. and Kim,P.S. (1995) Proc. Natl Acad. Sci. USA, 92, 8408–8412.[Abstract/Free Full Text]

Holm,L. and Sander,C. (1991) J. Mol. Biol., 218, 183–194.[CrossRef][ISI][Medline]

Huber,T., Torda,A.E. and van Gunsteren,W.F. (1997) J. Phys. Chem. A, 101, 5926–5930.[CrossRef][ISI]

Jorgensen,W.L., Chandrasekhar,J., Madura,J.D., Impey,R.W. and Klein,M.L. (1983) J. Chem. Phys., 79, 926–935.[CrossRef][ISI]

Källblad,P. and Dean,P.M. (2003) J. Mol. Biol., 326, 1651–1665.[CrossRef][ISI][Medline]

Kirkpatrick,S., Gelatti,C.D. and Vecchi,M.P. (1983) Science, 220, 671–680.[ISI]

Koehl,P. and Delarue,M. (1994) J. Mol. Biol., 239, 249–275.[CrossRef][ISI][Medline]

Kohda,D., Terasawa,H., Ichikawa,S., Ogura,K., Hatanaka,H., Mandiyan,V., Ullrich,A., Schlessinger,J. and Inagaki,F. (1994) Structure, 2, 1029–1040.[CrossRef][ISI][Medline]

Lee,C. and Subbiah,S. (1991) J. Mol. Biol., 217, 373–388.[CrossRef][ISI][Medline]

Martí-Renom,M.A., Ashley,C.S., Fiser,A., Sánchez,R., Melo,F. and Sali,A. (2000) Annu. Rev. Biophys. Biomol. Struct., 29, 291–325.[CrossRef][ISI][Medline]

Mierke,D.F. and Kessler,H. (1993) Biopolymers, 33, 1003–1017.[CrossRef][ISI][Medline]

Mikol,V., Baumann,G., Keller,T.H., Manning,U. and Zurini,M.G. (1995) J. Mol. Biol., 246, 344–355.[CrossRef][ISI][Medline]

Mondragon,A., Subbiah,S., Almo,S.C., Drottar,M. and Harrison,S.C. (1989) J. Mol. Biol., 205, 189–200.[ISI][Medline]

Najmanovich,R., Kuttner,J., Sobolev,V. and Edelman,M. (2000) Proteins, 39, 261–268.[CrossRef][ISI][Medline]

Piela,L., Kostrowicicli,J. and Scheraga,H.A. (1989) J. Phys. Chem., 93, 3339–3346.[CrossRef][ISI]

Riemann,N. and Zacharias,M. (2004) J. Pept. Res., 63, 354–364.[CrossRef][ISI][Medline]

Samudrala,R. and Moult,J. (1998) Protein Eng., 11, 991–997.[CrossRef][ISI][Medline]

Schueler-Furman,O. and Baker,D. (2003) Proteins, 52, 225–235.[CrossRef][ISI][Medline]

Smith,G.R. and Sternberg,M.J.E. (2002) Curr. Opin. Struct. Biol., 12, 28–35.[CrossRef][ISI][Medline]

Summers,N.L. and Karplus,M. (1989) J. Mol. Biol., 210, 785–811.[CrossRef][ISI][Medline]

Swain,M.T. and Kemp,G.J.L. (2001) In Walsh,T. (ed.), A CLP Approach to the Protein Side Chain Placement Problem. Springer-Verlag, Berlin, CP 2001, LNCS 2239, pp. 479–493.

Tappura,K. (2001) Proteins, 44, 167–179.[CrossRef][ISI][Medline]

Tappura,K., Lahtela-Kakkonen,M. and Teleman,O. (2000) J. Comput. Chem., 21, 388–397.[CrossRef][ISI]

Tufféry,P., Etchebest,C. and Hazout,S. (1997) Protein Eng., 10, 361–372.[CrossRef][ISI][Medline]

van Schaik,R.C., van Gunsteren,W.F. and Berendsen,H.J.C. (1992) J. Comput. Aided Mol. Des., 6, 97–112.[CrossRef][ISI][Medline]

Wright,J.D. and Lim,C. (2001) Protein Eng., 14, 479–486.[CrossRef][ISI][Medline]

Xiang,Z. and Honig B. (2001) J. Mol. Biol., 311, 421–430.[CrossRef][ISI][Medline]

Zacharias,M., Straatsma,T.P. and McCammon,J.A. (1994) J. Chem. Phys., 100, 9025–9031.[CrossRef][ISI]

Received January 17, 2005; revised April 21, 2005; accepted July 11, 2005.

Edited by Fred Cohen





This Article
Abstract
Full Text (PDF)
All Versions of this Article:
18/10/465    most recent
gzi052v1
Alert me when this article is cited
Alert me if a correction is posted
Services
Email this article to a friend
Similar articles in this journal
Similar articles in ISI Web of Science
Similar articles in PubMed
Alert me to new issues of the journal
Add to My Personal Archive
Download to citation manager
Request Permissions
Google Scholar
Articles by Riemann, R. N.
Articles by Zacharias, M.
PubMed
PubMed Citation
Articles by Riemann, R. N.
Articles by Zacharias, M.