A fast empirical approach to binding free energy calculations based on protein interface information

Xiao Hui Ma, Cun Xin Wang1, Chun Hua Li and Wei Zu Chen

Center for Biomedical Engineering, Beijing Polytechnic University, Beijing 100022, China


    Abstract
 
Three useful variables from the interfaces of 20 protein–protein complexes were investigated. These variables are the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area (ΔASAapol). An empirical model based on the three variables was developed to describe the free energy of protein association. The results show that the side-chain accessible number characterizes the loss of side-chain conformational entropy on binding and that the empirical function presented here estimates the binding free energy well. The interface variables were found to capture most of the significant features of protein–protein association. We also applied the model as a rescoring function in docking simulations and found that it has the potential to distinguish the ‘true’ binding mode. The simple empirical scale developed here is therefore an attractive target function for calculating binding free energies in applications ranging from the study of biological processes to rational protein design.

Keywords: docking/entropy/intermolecular interactions/protein association


    Introduction
 
Protein–protein interactions play a central role in protein function. Because the free energy is the key criterion for protein–protein binding, studying it is important for a better understanding of protein interactions and for the subsequent application of this knowledge to protein engineering and drug design. Computer modeling makes it possible to perform direct simulations of protein–protein association. Accurate calculations of the free energy that drives association are based on molecular dynamics or Monte Carlo simulations (Karplus and Petsko, 1990), and the relative free energy is determined by perturbation or integration techniques (Mezei and Beveridge, 1986; Reynolds et al., 1992; Miyamoto and Kollman, 1993). However, these simulation methods require too much computational time to be used for free energy calculation in conformational search, docking and drug design (Goodsell and Olson, 1990; Sezerman et al., 1993; Stoddard and Koshland, 1993). For simplicity, in the past decade several groups have developed empirical functions to compute the binding free energy (Novotny et al., 1989; Smith and Honig, 1994; Vajda et al., 1994, 1995, 1997; Jackson and Sternberg, 1995; Nauchitel et al., 1995; King et al., 1996; Weng et al., 1997; Xu et al., 1997; Zhang et al., 1997; Takamatsu and Itai, 1998; Camacho et al., 1999). For instance, Vajda and co-workers (Vajda et al., 1994) developed a relatively complete empirical free energy function:

$$\Delta G = \Delta E_{el} + \Delta G_{d} - T\Delta S_{c} + \Delta G_{const} \qquad (1)$$
where ΔEel, ΔGd and ΔSc represent the electrostatic energy change, the desolvation free energy and the change in conformational entropy, respectively, and T is the absolute temperature. The last term, ΔGconst, includes all other free energy changes associated with translation, rotation, vibration and protonation/deprotonation effects. The results show that the average difference between calculated and measured free energies of proteases and their inhibitors was ~1.3 kcal/mol, representing an error of about 10% (Vajda et al., 1995; King et al., 1996).

Subsequently, Zhang et al. put forward a binding free energy function based on the atomic contact energy (Zhang et al., 1997). The binding free energy is estimated by

(2)
where ΔEc is the change in atomic contact energy and ΔEel is the direct electrostatic interaction between the protease and its inhibitor. The term ΔStrv denotes the entropy change associated with the six degrees of freedom of rotation/translation and vibration. The precision of ΔGcal compared with experimental data was between ±0.1 and ±2 kcal/mol.

In addition, Xu et al. (1997) devised a function of the number of hydrophilic pairs and the buried molecular surface:

(3)
where Spho and Sphi indicate the buried hydrophobic and hydrophilic molecular surface and Npair denotes hydrophilic pairs of protein complexes, which relate to the strong electrostatic interactions, such as salt bridges, hydrogen bonds and polar–polar interactions.

In general, the entropy loss is an indispensable component of the binding free energy. The entropy is, however, difficult to calculate because it depends on the complete phase space of a molecular system and is sensitive to the inclusion of correlations between motions along the many degrees of freedom (Karplus and Kushick, 1981; Di Nola et al., 1984). Pickett and Sternberg developed an empirical scale to estimate the side-chain conformational entropy loss (Pickett and Sternberg, 1993). In the entropy scale the maximum conformational entropy, Sc, of each side chain was calculated by the classical expression

$$S_{c} = -R \sum_{i} P_{ij} \ln P_{ij} \qquad (4)$$
where R is the gas constant and Pij is the probability of side chain j being in conformational state i, which can be calculated from the observed distributions of the exposed side chains in proteins with known X-ray structures.
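As an illustration of the classical expression, the following minimal Python sketch evaluates S = −R Σ p ln p for a single side chain. The function name, the rotamer populations and the temperature of 300 K are hypothetical and are not taken from the paper; the sketch assumes the standard Gibbs form of the entropy with R in kcal/(mol K).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def side_chain_entropy(populations):
    """Return S = -R * sum(p ln p) for one side chain, in kcal/(mol K).

    `populations` may be probabilities or raw counts per conformational
    state; they are normalised here so that they sum to 1.
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive value")
    s = 0.0
    for value in populations:
        p = value / total
        if p > 0:  # states with zero population contribute nothing
            s -= p * math.log(p)
    return R_KCAL * s


# Hypothetical rotamer counts for one exposed side chain (illustrative only).
rotamer_counts = [60, 30, 10]
t_s = 300.0 * side_chain_entropy(rotamer_counts)
print(f"T*S at 300 K: {t_s:.3f} kcal/mol")
```

A side chain restricted to a single state has zero entropy in this scheme, so the loss on burial is simply the value computed for the exposed distribution.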

In order to avoid the complicated calculation of the conformational entropy while still accounting for its effect on the binding free energy, we derived a simple and effective empirical scale for the conformational entropy and the binding free energy from an analysis of protein interfaces. In this study, we analyzed the binding interfaces of 20 protein complexes and extracted three variables from the interface information, i.e. the side-chain accessible number (Nb), the number of hydrophilic pairs (Npair) and the buried apolar solvent-accessible surface area of the complex interface (ΔASAapol). The empirical scale in terms of the three variables was then established by linear fitting to experimental free energy data. In addition, the scale was applied as a score function to the docking of 10 protein complexes. Finally, the feasibility and shortcomings of our empirical method are discussed.


    Systems and methods
 
The X-ray structures of all 20 protein complexes were taken from the Protein Data Bank (Bernstein et al., 1977). Atoms not observed in a structure were built with the InsightII package on an SGI workstation, using extended conformations to avoid steric overlaps. Subsequently these structures were refined by energy minimization using the Gromacs programs (Berendsen et al., 1995). An all-atom model was used. The solvent-accessible surface area (ASA) was calculated according to the method of Lee and Richards (Lee and Richards, 1971). The atomic radii were taken from the Gromacs force field parameters. The radius of the solvent probe was set to 1.4 Å. The change of the interface of the complex, ΔASA, was calculated from the difference in the buried surface area of each residue between the two monomers and the dimer. If the relative change in ΔASA was more than 20%, the residue was defined as an interface residue. For the apolar group, ΔASAapol was determined from the buried surface area of carbon atoms (the contribution of sulfur atoms was omitted).
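The interface-residue criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-residue ASA values are taken as already computed (for example by a Lee and Richards type program), the relative change is assumed to be the buried area divided by the residue's ASA in the free monomer, and the residue labels and numbers are hypothetical.

```python
def interface_residues(asa_free, asa_complex, threshold=0.20):
    """Return residue ids whose relative loss of ASA on binding exceeds `threshold`.

    asa_free    -- dict: residue id -> ASA in the isolated monomer (Å^2)
    asa_complex -- dict: residue id -> ASA in the complex (Å^2)
    The relative change is assumed to be (free - complex) / free.
    """
    selected = []
    for res_id, free in asa_free.items():
        if free <= 0.0:
            continue  # fully buried in the monomer; no relative change defined
        buried = free - asa_complex.get(res_id, free)
        if buried / free > threshold:
            selected.append(res_id)
    return selected


# Hypothetical per-residue ASA values (Å^2), for illustration only.
asa_free = {"A:TYR39": 120.0, "A:GLY40": 45.0, "B:LYS15": 150.0}
asa_complex = {"A:TYR39": 30.0, "A:GLY40": 44.0, "B:LYS15": 20.0}
print(interface_residues(asa_free, asa_complex))
```

Restricting the same bookkeeping to carbon atoms would give a per-residue contribution to the buried apolar area.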

The side-chain accessible number, Nb, was taken as the number of contacting residues in the interface, where a contacting residue was defined by the effective accessibility (ΔRA) of its side chain, calculated by

(5)
where ΔAt is the change in accessible surface area of the side chain and A*t is the standard side-chain surface area. If ΔRA of a residue at the interface of the complex was ≥1, the residue was taken as a side-chain accessible residue. The value of 60% of the standard side-chain surface area in Equation 5 was approximated as 80 Å² in this work.
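The counting rule for Nb can be sketched as follows. The sketch rests on an assumed reading of Equation 5, namely that the effective accessibility is the change in side-chain ASA divided by 60% of the standard side-chain area, with that reference value fixed at 80 Å² as stated in the text. The function names and the example values are hypothetical.

```python
REFERENCE_AREA = 80.0  # Å^2; value the text gives for 60% of the standard side-chain area


def effective_accessibility(delta_side_chain_asa, reference_area=REFERENCE_AREA):
    """Assumed form of the effective accessibility: change in side-chain ASA / reference area."""
    return delta_side_chain_asa / reference_area


def count_nb(delta_asa_by_residue):
    """Count residues whose effective accessibility is >= 1 (the side-chain accessible number)."""
    return sum(
        1
        for delta in delta_asa_by_residue.values()
        if effective_accessibility(delta) >= 1.0
    )


# Hypothetical changes in side-chain ASA on binding (Å^2), for illustration only.
example = {"A:ARG17": 95.0, "A:TYR39": 80.0, "B:SER190": 42.5}
print(count_nb(example))
```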

The number of hydrophilic pairs, Npair, was defined by the distance between the critical points of hydrophilic atoms, which lie approximately at the centers of their contact surfaces (Lin et al., 1994). If the distance between two hydrophilic atoms was <2.8 Å (the diameter of the solvent probe), the two atoms were treated as a hydrophilic pair.
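The distance criterion for hydrophilic pairs can be illustrated with a short Python sketch. It assumes that the critical-point coordinates of the hydrophilic atoms of each protein have already been computed, and that every cross-interface pair closer than the cutoff is counted once; the coordinates below are hypothetical.

```python
import math


def count_hydrophilic_pairs(points_a, points_b, cutoff=2.8):
    """Count cross-interface pairs of critical points closer than `cutoff` (Å).

    points_a, points_b -- sequences of (x, y, z) coordinates, one per
    hydrophilic atom of each protein.
    """
    return sum(
        1
        for p in points_a
        for q in points_b
        if math.dist(p, q) < cutoff
    )


# Hypothetical critical-point coordinates (Å), for illustration only.
protein_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein_b = [(2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (10.0, 1.0, 1.0)]
print(count_hydrophilic_pairs(protein_a, protein_b))
```

The double loop is quadratic in the number of atoms; for whole interfaces a spatial grid or tree would be the usual way to speed it up.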

To test the model described above, 10 complexes with experimentally determined structures were selected as a test set for molecular docking. The soft protein–protein docking algorithm (C.H.Li et al., in preparation) developed in our group was used for the test; it is based on the ‘simplified protein’ models of Janin’s rigid-body protein–protein docking algorithm (Cherfils et al., 1991, 1994; Cherfils and Janin, 1993). A partial binding space, comprising part of the receptor surface and the complete surface of the ligand, was searched, giving 3 × 10⁴ different modes of contact between the two proteins in each case. After filtering and clustering analysis, about 300 binding modes were retained. The binding free energy was then used to score the retained binding modes.


    Results and discussion
 
Correlation analysis of interface information

The conformational entropy affects the binding free energy of a protein and its ligand, and it also contributes to driving protein folding. A major unfavorable entropy effect arises from the reduction in the number of conformations accessible to the protein backbone and side chains. As an approximation, we assume that the backbone has the same conformational entropy in all folded conformations. Therefore, only the entropy loss from the side chains is taken into account, and only when the accessibility of the side chain is more than 60% of the standard side-chain surface area. When the values of the side-chain accessible number, Nb, are used to fit the side-chain conformational entropy loss according to Pickett and Sternberg’s empirical scale, the linear fitting function is given by

(6)

Figure 1 shows a linear fit of the side-chain conformational entropy (TΔS) versus Nb. It is found that Nb correlates very well with the TΔS values. Therefore, Nb can be used to represent the side-chain conformational entropy loss for the protein–protein binding process.
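A linear fit of this kind, together with its correlation coefficient R, can be reproduced with ordinary least squares. The Python sketch below is a generic illustration; the Nb and TΔS values in it are hypothetical and do not come from Table I.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson correlation R."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical (Nb, T*dS in kcal/mol) pairs, for illustration only.
nb_values = [4, 6, 8, 10, 12]
tds_values = [2.1, 3.0, 4.2, 4.9, 6.1]
slope, intercept, r = linear_fit(nb_values, tds_values)
print(f"T*dS = {slope:.3f} * Nb + {intercept:.3f}, R = {r:.3f}")
```

The same routine applies unchanged to the other single-variable fits discussed in this section.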



Fig. 1. Linear fitting of side-chain conformation entropy (TΔS) versus Nb. Results are calculated for the 20 complexes reported in Table I. The correlation coefficient, R, is equal to 0.97.

 
Table I also lists other results, such as the buried apolar solvent-accessible areas ΔASAapol, the hydrophobic interaction energy ΔGd, the number of hydrophilic pairs Npair and the experimental binding free energies. In addition, the electrostatic interaction energies ΔEel of 13 complexes are taken from Zhang et al. (Zhang et al., 1997). Using these values, we carried out correlation analyses between Npair and ΔEel and between ΔGd and ΔASAapol. In the same way as Figure 1, Figure 2 shows the linear fit of the electrostatic interaction energies versus Npair, and Figure 3 shows the linear fit of the hydrophobic energies ΔGd versus ΔASAapol. It is found that the quantities Nb, Npair and ΔASAapol capture most of the significant features of the interactions involved in those complexes.


 
Table I. Side-chain conformation entropy, binding free energies and fitting variables
 


 
Fig. 2. Linear fitting of electrostatic interaction energies versus Npair. Results are calculated for the 13 complexes. The correlation coefficient, R, is equal to 0.92.

 


 
Fig. 3. Linear fitting of hydrophobic energies versus ΔASAapol. The values of ΔASAapol are in Å². Results are calculated for the 20 complexes in Table I. The correlation coefficient, R, is equal to 0.94.

 
Fast empirical calculation of binding free energy

As mentioned above, Nb, Npair and ΔASAapol are related to the interface of protein complexes and correlate well with the conformational entropy change, the electrostatic interaction and the hydrophobic interaction, respectively. When the protein–protein binding free energy, ΔGcal, is written as a linear function of the three variables Nb, Npair and ΔASAapol, ΔGcal can be expressed as

(7)
where the parameters are the coefficients obtained by multiple linear regression; their values are listed in the second column of Table II. The multiple correlation coefficient R is 0.95. Thus Nb, Npair and ΔASAapol deduced from the interface describe the binding free energy of protein–protein association well.
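The regression step can be sketched in plain Python as an ordinary least-squares fit solved through the normal equations. This is a minimal illustration, not the authors' procedure: the presence of a constant term, the function names and all data values are assumptions, and the synthetic ΔG values are generated from made-up coefficients purely to show that the fit recovers them.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_binding_model(rows, dg_exp):
    """Fit dG = a*Nb + b*Npair + c*dASA_apol + const; returns [a, b, c, const]."""
    design = [[nb, npair, asa, 1.0] for nb, npair, asa in rows]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, dg_exp)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, nb, npair, asa):
    a, b, c, const = coeffs
    return a * nb + b * npair + c * asa + const


# Synthetic (Nb, Npair, dASA_apol in Å^2) rows and made-up coefficients.
rows = [(5, 3, 600), (8, 6, 900), (6, 2, 750), (10, 7, 1100), (7, 5, 650), (9, 4, 1000)]
true_coeffs = [0.5, -0.8, -0.01, -2.0]
dg = [predict(true_coeffs, *row) for row in rows]
fitted = fit_binding_model(rows, dg)
print([round(c, 4) for c in fitted])
```

With real data the fitted values would not reproduce the inputs exactly, and the residuals would give the scatter behind the reported correlation coefficient.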


 
Table II. Results of multiple linear regression of binding free energy
 
Table III compares the binding free energies calculated with the different empirical functions. Our binding free energy function correlates better with the experimental data than the other functions. This indicates that the three variables extracted from the interface information discussed here can quantitatively represent the free energy of protein–protein association.


 
Table III. The comparison of four different empirical methods for calculating binding free energy
 
Application of the score function in protein–protein docking

Rescoring of docked conformations has made some progress recently and has been used to rescore conformations with low root mean square deviation (r.m.s.d.) (Norel et al., 2001; Smith and Sternberg, 2002). The main terms used in rescoring are the statistics of residue–residue contacts across the interfaces of complexes and electrostatics. As discussed above, we presented an empirical method based on three variables extracted from the binding interface information. The calculation of the free energy of protein–protein association with this method was quick and accurate. In particular, the conformational entropy is taken into account, and the correlation analysis above supports the accuracy of this term. Therefore, we applied this approach as a scoring function to rank the putative docked structures in the protein–protein docking problem.

Table IV summarizes the docking results for the 10 protein–protein complexes, including the name of each complex, the ranking position of the first near-native structure using our scoring function and the corresponding r.m.s.d. from the X-ray crystallographic complex. For the first six cases, the complexes were reconstructed from the structures of the co-crystallized proteins. In these cases, the conformations of the two molecules are already ‘adapted’ to each other. For this set of docking simulations, XX was added after the PDB code in the ‘protein’ column. For the following two cases, the complexes were reconstructed from one structure taken from the complex and one taken from the free form. For this set of docking simulations, FX or XF was added after the PDB code, where F and X designate the free form and co-crystallized form, respectively. If the complexes were reconstructed from the free forms of both proteins, FF was added to the PDB code. A docked geometry is counted as native-like only if the r.m.s.d. of the backbone atoms from the X-ray structure is <4.0 Å. Native-like docked geometries are found for all 10 tested complexes, and for six of them they fall within the 10 top-ranking solutions. This indicates that our scoring function is able to distinguish the ‘true’ binding mode from the remaining ‘false’ ones.
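The ranking criterion used for Table IV can be sketched as follows. The sketch assumes that each retained binding mode carries a score (lower taken to be better, as for a free energy) and a backbone r.m.s.d. from the X-ray complex; the function name and the example values are hypothetical.

```python
def first_near_native_rank(modes, rmsd_cutoff=4.0):
    """Return (rank, rmsd) of the best-scored mode with rmsd below the cutoff.

    modes -- iterable of (score, rmsd) tuples; lower score is assumed better.
    Ranks are 1-based. Returns None if no mode is near-native.
    """
    ranked = sorted(modes, key=lambda mode: mode[0])
    for rank, (_score, rmsd) in enumerate(ranked, start=1):
        if rmsd < rmsd_cutoff:
            return rank, rmsd
    return None


# Hypothetical (score in kcal/mol, backbone r.m.s.d. in Å) pairs.
docked_modes = [(-8.2, 12.5), (-10.1, 9.3), (-9.4, 2.7), (-7.0, 1.9)]
print(first_near_native_rank(docked_modes))
```

Here the second-ranked mode is the first one within 4.0 Å, so the reported rank is 2 even though a lower-r.m.s.d. mode exists further down the list.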


 
Table IV. Results of molecular docking calculations
 
Figure 4 compares the experimentally determined structures of four protein complexes with the best-ranked near-native predictions reported in Table IV. Although the r.m.s.d. between the predicted and X-ray structures is around 3.00 Å (see Figure 4), the binding site is satisfactorily identified.



 
Fig. 4. Superposition of the experimentally determined structures of four protein complexes and the best-ranked near-native predictions reported in Table IV. Thick lines: Cα trace of experimental structure. Thin lines: Cα trace of predicted model. The four structures plotted are taken from Table IV: (A) 1ACBXX; (B) 1PPEFX; (C) 1BRBFF; (D) 2PTCFF.

 
The definition of a general form of rescoring functions is required to distinguish reliably the ‘true’ binding mode from the remaining ‘false’ ones. Also, speed is an important factor considered in the rescoring functions. As the results show, the rescoring function presented here is relatively fast and effective for scoring the putative conformations. It is expected that the rescoring function is applicable to protein–protein docking.

Conclusions

The interface information for protein–protein complexes is important for understanding protein–protein interactions and recognition. In this work, we investigated useful variables from the interfaces and developed a simple scale to calculate the binding free energy of protein–protein association. The variables are used as a scoring function in protein–protein docking calculations. As discussed above, the side-chain accessible number, Nb, is a reasonable descriptor of the loss of side-chain conformational entropy in the binding process. The interface information for complexes has great potential for describing protein–protein association, and the corresponding three variables can be used to calculate the binding free energy. The model is advantageous in terms of calculation time and ease of use. However, the binding free energy function presented here is based on an approximate treatment in which the molecule is treated as a ‘rigid body’. It remains necessary to develop both new docking methods for elucidating the details of specific interactions at the atomic level and computational tools for providing information on protein–protein association in various environments (Camacho and Vajda, 2002). The interface information for complexes may provide helpful hints on this subject and on specific associations. Work on improving the accuracy of the binding free energy and on treating molecular flexibility is currently under way.


    Notes
 
1 To whom correspondence should be addressed. E-mail: cxwang@bjpu.edu.cn


    Acknowledgments
 
We thank Professor J.Janin for providing the docking package. We also thank Dr Ben Zhuo Lu for helpful discussions. This work was supported in part by the Chinese Natural Science Foundation (Nos 29992590–2, 30170230 and 10174005).


    References
 
Berendsen,H.J.C., van der Spoel,D. and van Drunen,R. (1995) Comput. Phys. Commun., 91, 43–56.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Camacho,C.J. and Vajda,S. (2002) Curr. Opin. Struct. Biol., 12, 36–40.

Camacho,C.J., Weng,Z., Vajda,S. and DeLisi,C. (1999) Biophys. J., 76, 1166–1178.

Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265–269.

Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280.

Cherfils,J., Bizebard,T., Knossow,M. and Janin,J. (1994) Proteins: Struct. Funct. Genet., 18, 8–18.

Di Nola,A., Berendsen,H.J.C. and Edholm,O. (1984) Macromolecules, 17, 2044–2050.

Goodsell,D.S. and Olson,A.J. (1990) Proteins: Struct. Funct. Genet., 8, 195–202.

Jackson,R.M. and Sternberg,M.J. (1995) J. Mol. Biol., 250, 258–275.

Karplus,M. and Kushick,J.N. (1981) Macromolecules, 14, 325–332.

Karplus,M. and Petsko,G.A. (1990) Nature, 347, 631–639.

King,B.L., Vajda,S. and DeLisi,C. (1996) FEBS Lett., 384, 87–91.

Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.

Lin,S.L., Nussinov,R., Fischer,D. and Wolfson,H.J. (1994) Proteins: Struct. Funct. Genet., 18, 94–101.

Mezei,M. and Beveridge,D.L. (1986) Ann. N. Y. Acad. Sci., 482, 1–23.

Miyamoto,S. and Kollman,P.A. (1993) Proteins: Struct. Funct. Genet., 16, 226–245.

Nauchitel,V., Villaverde,M.C. and Sussman,F. (1995) Protein Sci., 4, 1356–1364.

Norel,R., Sheinerman,F., Petrey,D. and Honig,B. (2001) Protein Sci., 10, 2147–2161.

Novotny,J., Bruccoleri,R.E. and Saul,F.A. (1989) Biochemistry, 28, 4735–4749.

Pickett,S.D. and Sternberg,M.J.E. (1993) J. Mol. Biol., 231, 825–839.

Reynolds,C.A., King,P.M. and Richards,W.G. (1992) Mol. Phys., 76, 251–275.

Sezerman,U., Vajda,S., Cornette,J. and DeLisi,C. (1993) Protein Sci., 2, 1827–1843.

Smith,G.R. and Sternberg,M.J.E. (2002) Curr. Opin. Struct. Biol., 12, 28–35.

Smith,K.C. and Honig,B. (1994) Proteins: Struct. Funct. Genet., 18, 119–132.

Stoddard,B.L. and Koshland,D.E.,Jr. (1993) Proc. Natl Acad. Sci. USA, 90, 1146–1153.

Takamatsu,Y. and Itai,A. (1998) Proteins: Struct. Funct. Genet., 33, 62–73.

Vajda,S., Weng,Z.P., Rosenfeld,R. and DeLisi,C. (1994) Biochemistry, 33, 13977–13988.

Vajda,S., Weng,Z.P. and DeLisi,C. (1995) Protein Sci., 8, 1081–1092.

Vajda,S., Sippl,M. and Novotny,J. (1997) Curr. Opin. Struct. Biol., 2, 222–228.

Weng,Z.P., DeLisi,C. and Vajda,S. (1997) Protein Sci., 6, 1976–1984.

Xu,D., Lin,S.L. and Nussinov,R. (1997) J. Mol. Biol., 265, 68–84.

Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707–726.

Received January 30, 2002; revised April 26, 2002; accepted May 21, 2002.




