European Molecular Biology Laboratory, Meyerhofstr. 1, Postfach 10 22 09, D-69012 Heidelberg, Germany
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Keywords: hp model/lattice simulation/local order/search optimization/search strategy
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
A computationally unintensive way is to approximate the impact of the solvent by mean field calculations (Hsiang-Ai and Karplus, 1988; Cramer and Truhlar, 1992
; Fraternali and van-Gunsteren, 1996
). The mean field calculations neglect the effect of locally ordered structures and can underestimate solventprotein interactions (Smith and van-Gunsteren, 1994
). Simulations taking each water molecule of the protein surrounding shell into account (Levitt and Sharon, 1988
; Braxenthaler et al., 1997
; Warwicker, 1997
; Scheraga and Hao, 1999
) are computationally intensive and usually are only done for simulation times of a few nanoseconds.
We present here a promising implementation of solvent entropy in the context of the hp model. As a search method we took an application of the common Monte Carlo (MC) method (Metropolis et al., 1953, Kirkpatrick et al., 1983
; Aarts and Korst, 1989
), implemented in a standard way for this hp model [see Unger and Moult for further details of this standard implementation (Unger and Moult, 1993
)]. The protein chain is modeled on a lattice and only two types of residues are considered, hydrophobic (h) and hydrophilic (p) (Lau and Dill, 1989
). We further take the local order of the solvent shell around the protein into account. Two- and three-dimensional models are compared. Long-range interactions in the rest of the solvent (e.g. long-range order brought about by solvating ions or exposed polar groups) are not considered in this simplified model. We show that efficient folding of lattice chains (identification of the native fold) can be achieved by an entropy-driven simulation completely on its own. Furthermore, we show in a detailed comparison in our model that entropy-guided searching can outperform an energy-driven search.
A combination of energy- and entropy-driven searching yields the most efficient searching and is compared in detail with the above results. This illustrates, in addition, how our simple solvent shell model may advantageously be implemented in more complex simulations.
![]() |
Materials and methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
All simulations used the hp model (Lau and Dill, 1989) and modeled the protein chain on a lattice. Only two types of residues are considered, hydrophobic (H; filled squares) and hydrophilic (P; open squares). The model meets the basic characteristics of real proteins, e.g. similar distributions of secondary structure (Lau and Dill, 1990
). The protein is represented as a chain on a two-dimensional square lattice (Figure 1
). At each point the chain can turn 90° left or right or continue ahead. The following chains with 12, 18, 24, 33 and 48 residues, respectively, were tested:
|
These chains yield well-folded compact structures and their global energy minima were known a priori as 4, 9, 8, 14 and 23, respectively.
Three-dimensional hp model
In the three-dimensional model at each point the chain can turn left, right, straight, up and down; otherwise the construction of the chain is similar to the two-dimensional model. Four chains with 12, 14, 22 and 28 residues and different topologies were tested (Figure 2a):
|
Their energy minima were also known a priori as 5, 6, 4 and 12, respectively.
Energy function
The energy function E was kept simplistic. It adds 1 for each direct (non-diagonal) lattice contact between two non-consecutive hydrophobic amino acids. `Clashes' (i.e. two residues in the same place in the lattice) are not allowed. Trials with clashes had to be newly created.
Implementation of entropy
To study entropy effects, lattice spaces adjacent to the protein chain were modeled to be filled by solvent in the following (small) water ensembles (Figure 2a). Long-range interactions in the rest of the solvent (e.g. long-range order brought about by solvating ions or exposed polar groups) were not considered in this simple model. Small water ensembles with two different properties surrounded the model protein: ordered and less ordered. The ordered water ensembles are adjacent to hydrophobic residues or the solvent (i.e. other water ensembles). The less ordered (with high entropy) exist if no hydrophobic residue is adjacent to them. An O-block (ordered) on the lattice is asigned to the ensemble of ordered water molecules, an L-block to a less ordered ensemble (Figure 2b
). The number of unordered water ensembles, Ni, was counted. It has been shown experimentally that hydrophobic molecules reduce the entropy of the surrounding (aqueous) solvent (Schulz and Schirmer, 1996
). The solvent entropy difference, S, between one protein chain conformation, N1 and a tested next one, N2, during the simulation was set to be proportional to the difference in the number of unordered adjacent lattice spaces counted (entropy is additive; we assigned 1 unordered block = 1 high-entropy unit = 1/f, with f < 1; whereas 1 ordered block = 1 low-entropy unit = 1). With the order parameter f (see below) the probability for the new configuration to be chosen is p ~ e(TS/T) = f(N1 N2). This implementation considers the entropy of the solvent according to Boltzmann statistics. The derivation of this can be done by simple algebra (see http://www.embl-heidelberg.de/~dandekar/entropy.html). In contrast, the simple energy function from above does not differentiate between ordered and less ordered ensembles (Figure 2b
). Instead, it is derived (1 for any hydrophobic contact), if the sum (`hydrogen bonds' in our model) of the connections of (water ensembleswater ensembles) and (water ensembleshydrophilic residues) is counted and compared between two conformations.
The order parameter f allowed the testing of different entropy weights during simulations, either alone or multiplying the entropy term, f(N1N2)), with the energy term, e(E/T), yielding p ~ eF/T = e(
E TS)/T to consider also the energy difference
E between two chain conformations. The model allowed the examination of three implementations: A, energy effects (
E; simplified in the context of the model; only hydrophobic interactions are considered or, alternatively, the effect of `hydrogen bonds'); B, entropy (
S, difference of ordered small water ensembles); and C, the combination of them (more realistic implementation, F = E TS). F denotes the Helmholtz free energy (corresponding also to the Gibbs free energy if changes in volume and pressure are neglected). The new chain conformation was accepted if a random number between zero and 1/constant was smaller than eF/T.
Simulation steps
A Monte Carlo algorithm was used with the following steps [implemented as in Unger and Moult (Unger and Moult, 1993)]:
Two implementations were tested and compared for each algorithm. Implementation 2 models and examines a more detailed partition function as a decision rule to change from conformation C1 to C2 (see Appendix for details), but this may prevent the algorithm from exploring new areas in search space when passing along flat surfaces. A decision constant (see Results) was introduced potentially to balance this effect and compare both implementations for a range of different values of the constant (including constant = 1, no constant introduced; the probability to accept C2 could maximally become 1, corresponding to accepting with certainty a change to conformation C2).
Implementation of algorithms A, B and C
Algorithm A (energy) Accept C2 if
rnd < constantxeE/T (standard, implementation 1)
rnd < constantx1/(1 + eE/T) (implementation 2)
where rnd is a random number between 0 and 1 and T is an artificial temperature that gradually decreased (cooled) during the run to achieve convergence. T started with T0 = 2.00 and was decreased after every 100 energy evaluations with the cooling rate c:
T = Ti = Ti 1c (c < 1)
c was optimized as a parameter (see Results).
Algorithm B (entropy) Accept C2 if
rnd < constantxf(N1N2) (standard, implementation 1)
rnd < constantx1/[1 + f(N2N1)] (implementation 2)
f < 1 was the order parameter and optimized (see Results). Note that the entropy of the protein conformations need not be considered as the algorithm itself samples over a representative ensemble of equally weighted protein microstates. Long-range interactions in the rest of the solvent (e.g. long-range order brought about by solvating ions or exposed polar groups) were not considered in this simple model. Furthermore, calculation of the term is easily effected. Comparison with the algorithm A above shows that the calculation is also not more computationally expensive.
Algorithm C (energy and entropy) Algorithm C calculated F = E TS (the free enthalpy F, equal to the energy E minus the product of temperature T and entropy S) before the decision:
rnd < constantxf(N1N2)eE/T (standard, implementation 1)
rnd < constantx1/[1 + f(N2 N1)eE/T] (implementation 2)
T and f were taken as in algorithms A and B, respectively. Note that in the comparisons, the optimal folded states (maximally packed, minimum energy but also highest solvent shell entropy) are always the same and known beforehand (cf. Figure 1). Only their energy values are given in the tables for comparison. However, in searching for these states in algorithms B and C their entropy is also considered, comparing alternative conformations. The objective function of the three different algorithms is different and favors different suboptimal folds during searching; the potential surface separating and leading to the optimal folded states is different for the three algorithms.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
|
Two different implementations (see Materials and methods) to evaluate the probability, including different values of the decision constant to accept a new conformation in the next step of the simulation, were compared in each strategy. Additionally, we compared different parameter values for the cooling rate c in A and C and for the entropy parameter f in B and C. We tested a broad range, for f between 0.9 and 0.001 (0.9, 0.7, 0.5, 0.3, 0.1, 0.05, 0.01, 0.001), for c between 0.999999 and 0.9 (0.999999, 0.99999, 0.9999, 0.999, 0.99, 0.9) and for the decision constant the values 1, 2, 4, 8, 16 and 32. f, c and decision constant were all optimized for the results given. The optimal range for f was 0.1 to 0.7; f = 0.3 produced the best results over all conditions. For c the optimal range was 0.990.99999 with c = 0.99 yielding the best results comparing all conditions. A decision constant >2 was advantageous and constant = 4 produced the best results.
The search strategy on solvent shell entropy alone is effective in identifying the global minimum. It is more effective in searching than the standard Monte Carlo energy minimization procedure with the simplified energy function in more than 60% of the parameter conditions investigated (33 of 54 cases; including the longer chains). Sixteen cases showed no significant difference, five cases were less efficient, including middle sized chain 3 in two dimensions, implementation 1, constant = 1 and chain 4 in three dimensions, implementation 2, constant = 2. The combination of both search strategies is easily achieved (algorithm C). This yields a further significant improvement for all tested chains including the longer ones in both lattices. It outperformed the energy-driven search (algorithm A) in 45 of 54 cases. In two of the remaining nine cases it showed a clear positive trend. It was also clearly and significantly superior (but here only in 40 of 54 cases) than entropy alone (algorithm B); in 10 cases there was no significant difference.
Implementation 2 (see Materials and methods; Appendix for details) was not superior, but slightly less efficient than the standard implementation.
Exactly 100 000 steps were compared for each strategy and each chain to allow a fair comparison of all search strategies, comparing batches of 100 runs for each condition; 100 000 steps were taken to collect a large amount of data for the algorithm comparison in a convenient time so that hundreds of runs could be compared and average statistics obtained. On average, this value was not sufficient for the 48-mer on the two-dimensional lattice to find the global minimum. For the 33-mer it was found about once (out of 100 trials) in the better strategies. Therefore, the energy of the best local minimum found was compared if the global minimum was not identified. It was not our aim to identify the global minimum under each condition but rather to compare the different algorithms under simple (global minimum always found) and demanding conditions (only the best strategies succeed sometimes). Note that a gain of one step lower in the energy of a local minimum is a significant search success as the number of local minima increases exponentially with higher energy (Lau and Dill, 1990). Solvent shell-guided searching was also here an improvement compared with the simple energy-guided search strategy and algorithm C also performed best in the comparison for longer chains. It found a local optimum with energy 22 for the 48-mer. It also found the global optimum for the 33-mer four times (Tables IIII
).
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Considering the microstates focuses on a different perspective of the folding problem. The preferred state of the protein chain is not one rare and single conformation but instead in the context of this model (and in the entropy aspect more realistic than in some other simple protein models) the global minimum of energy displays the highest number of microstates. Furthermore, breathing (rearrangement of microstate ensembles) can be seen and investigated in such models (König and Dandekar, 1999). This allows for the investigation of entropy effects as well as partial unfolding of the protein chain after the global minimum has been reached in these simulations.
The solvent shell entropy model presented here achieves efficient folding for a number of simplified protein chains. In the context of the model it was not necessary to include long-range forces to achieve this. This will have to be tested further, in particular further refinement of the structures obtained by inclusion of such forces.
The starting information required for any of the search strategies A, B and C was the same, the protein sequence was sufficient. An exhaustive enumeration for some of the smaller protein chains modeled is possible, but here we wanted to compare the speed and efficiency of the different search strategies. The hydrophobic effect was compared and considered both by its entropic and energetic consequences. This focus was chosen as hydrophobic interactions are principal forces in protein folding and to compare both strategies with their respective effects on search success. The combined strategy where energy and entropy are both considered is easily achieved and can accommodate also more complex energy functions than the simplistic one chosen for testing. It identifies the native fold best. It yielded a significantly better search effectiveness on both lattices. Additionally, we compared the algorithms with a standard (as in common MC methods) and an alternative implementation (partition function motivated) of the decision constant and showed implementation-independent, similar results for both implementations. Furthermore, both lattices tested (two- and three-dimensional) showed similar success as a control against effects from lattice choices. The use of a decision constant further improved search success.
The search strategies compared fulfill ergodicity over sufficient long observation times and to the extent that the search space is non-reducible and contiguous, any conformation can be reached with some probability from any other conformation and it is aperiodic. Detailed balance is fulfilled for the three search strategies at least so far as reversibility for any conformational change effected is possible during the simulation. Entropy maximization was implemented in the model according to the natural behavior of proteins (Schulz and Schirmer, 1996) and, in fact, entropy maximization occurs during time for any molecular interaction (more details on entropy maximization in proteinsolvent systems are given on our Web page).
Solvent shell optimization can certainly be considered to have an important impact on protein folding. In our simulations this aspect of folding was shown to be sufficient to localize the optimal protein structure of small model proteins. It should be noted that also other well known factors involved in protein folding such as hydrophobic forces, hydrogen bonds and cooperativity (Kolinski et al., 1999) are in part determined by solvent shell entropic forces. Nevertheless, exact experimental quantification of solvent shell effects remains a challenge. Our simplistic model suggests that at present their contribution to protein folding may perhaps be underestimated.
Shortle et al. showed that consideration of entropic forces is helpful in predicting and understanding the effects of mutations (Shortle et al., 1992). Also antibody and other proteinprotein interaction models are more precise if entropic effects are included. The inclusion of a simplified solvent shell entropy algorithm as given and compared in detail here improves the search success in protein folding simulations and will be taken to improve more complex folding simulations such as more extended, grid free models as a next step.
![]() |
Appendix |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() | (1) |
where Z is the normalization constant (partition function). With pi = 1,
![]() | (2) |
If C1 is the old and C2 the new conformation, p2 is the exact probability of accepting the new conformation (implementation 2 throughout the text). If the new conformation has the same or only slightly better energy value, the probability of accepting the new conformation is according to the partition function only around 0.5. Metropolis et al. (1953) made the following modification which more easily accepts a new conformation (Figure 3). If E becomes negative, p2 becomes close to 1. If E is positive, we have
![]() | (3) |
|
![]() | (4) |
![]() |
Notes |
---|
![]() |
Acknowledgments |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Braxenthaler,M., Unger,R., Auerbach,D., Given,J.A. and Moult,J. (1997) Proteins, 29, 417425.[ISI][Medline]
Cramer,C.J. and Truhlar,D.G. (1992) Science, 256, 213217.[ISI]
Fraternali,F. and van-Gunsteren,W.F. (1996) J. Mol. Biol., 256, 939948.[ISI][Medline]
Hsiang-Ai,Y. and Karplus,M. (1988) J. Chem. Phys., 89, 23662379.[ISI]
Kirkpatrick,S., Gelatt,C.D. and Vecchi,M.P. (1983) Science, 220, 671680.[ISI]
Kolinski,A, Ilkowsko,B. and Skolnick,J. (1999) Biophys. J., 77, 29422952.
König,R. and Dandekar,T. (1999) J. Mol. Model., 5, 317324.[ISI]
Lau,K.F. and Dill,K.A. (1989) Macromolecules, 22, 39863997.[ISI]
Lau,K.F. and Dill,K.A. (1990) Proc. Natl Acad. Sci. USA, 87, 638642.[Abstract]
Levitt,M. and Sharon,R. (1988) Proc. Natl Acad. Sci. USA, 85, 75577561.[Abstract]
Metropolis,N., Rosenbluth,A.W., Rosenbluth,M.N., Teller,A.H. and Teller,E. (1953) J. Chem. Phys., 21, 10871092.[ISI]
Scheraga,H.A. (1998) J. Biomol. Struct. Dyn., 16, 447460.[ISI][Medline]
Scheraga,H.A. and Hao,M.-H. (1999) Adv. Chem. Phys., 105, 243272.[ISI]
Schulz,G. and Schirmer,R.H. (1996) Principles of Protein Structure. Springer, Heidelberg.
Shortle,D., Chan,H.-S., Dill,K.A. (1992) Protein Sci., 1, 201215.
Smith,P.E. and van-Gunsteren,W.F. (1994) J. Mol. Biol., 236, 629636.[ISI][Medline]
Unger,R. and Moult,J. (1993) J. Mol. Biol., 231, 7581.[ISI][Medline]
Warwicker,J. (1997) Protein Eng., 10, 809814.[Abstract]
Received July 25, 2000; revised February 9, 2001; accepted February 19, 2001.