A protein–protein docking algorithm dependent on the type of complexes

Chun Hua Li, Xiao Hui Ma, Wei Zu Chen and Cun Xin Wang¹

College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100022, China

¹ To whom correspondence should be addressed. E-mail: cxwang{at}bjpu.edu.cn


    Abstract
An efficient ‘soft docking’ algorithm is described to assist the prediction of protein–protein association from the three-dimensional structures of the component molecules. The basic tools are the ‘simplified protein’ model and the docking algorithm of Wodak and Janin. The side chain flexibility of Arg, Lys, Asp, Glu and Met residues at the protein surface is taken into account. Candidate binding modes are selected by a filtering technique that depends on the complex type and is based on geometric matching, hydrophobicity and electrostatic complementarity. The retained modes are then ranked with a scoring function that includes electrostatic and desolvation energy terms. For all 44 complexes tested, which include enzyme–inhibitor, antibody–antigen and other complexes, native-like structures were found; in 30 of them, a native-like structure was ranked in the top 20. Thus, our soft docking algorithm has the potential to predict protein–protein recognition.

Keywords: binding free energy/molecular flexibility/molecular recognition/protein docking/protein–protein interactions


    Introduction
Recently, experimental and computational efforts have increasingly been devoted to investigating protein–protein interactions, which are essential for understanding biochemical processes such as signal transduction, cell regulation and the immune response. Given the difficulty of determining the structures of protein complexes experimentally, docking methods that predict potential binding modes computationally are currently of great interest. The principles of docking and the progress made during the last decade have been reviewed (Cherfils and Janin, 1993; Lengauer and Rarey, 1996; Sotriffer et al., 2000; Halperin et al., 2002). Many promising algorithms have been developed, such as fast Fourier transform (FFT)-based matching (Katchalski-Katzir et al., 1992; Gabb et al., 1997), geometric hashing (Fischer et al., 1995; Norel et al., 1999a) and BIGGER (Palma et al., 2000). However, because of the complexity of the problem, protein–protein docking is still largely at the theoretical stage, and there is considerable scope for methodological development.

In protein–protein docking, the large number of atoms and degrees of freedom involved makes it impracticable to treat molecular flexibility explicitly. Current solutions are mainly limited to techniques that tolerate a limited degree of molecular flexibility by using a ‘soft’ representation of the molecular surface (Jiang and Kim, 1991; Walls and Sternberg, 1992; Sandak et al., 1995; Vakser, 1995; Palma et al., 2000; Ritchie and Kemp, 2000). Jiang and Kim used a cube representation of the molecular surface and volume in their docking procedure (Jiang and Kim, 1991). Ritchie and Kemp introduced a ‘soft’ model of electrostatic complementarity into their algorithm (Ritchie and Kemp, 2000). Palma et al. proposed a surface-implicit method in which the surface is represented by the values 0 and 1 on two grids, the surface and core grids (Palma et al., 2000); this digitization introduces the first level of ‘softness’ in their algorithm. In this paper, the flexible residues Arg, Lys, Asp, Glu and Met at the protein surface are softened within the ‘simplified protein’ model (Levitt, 1976). This treatment improves the results of unbound docking to some degree.

A search procedure may produce millions of docked structures. Reducing these drastically, by filtering, to a number that the scoring function can handle is a challenging topic of current research. Docking methods are generally based on the idea of complementarity between the interacting molecules, which may be geometric, electrostatic or hydrophobic, or all three. Most docking algorithms developed so far have used the geometric complementarity of the protein surfaces as the filtering criterion to eliminate the large number of solutions with poor surface matching. It is generally recognized, however, that a single filtering criterion is not sufficient to discriminate native-like from incorrect docked structures except in very few cases (Shoichet and Kuntz, 1991). Recent investigations of the interfaces of known protein–protein complexes (Jones and Thornton, 1996; Betts and Sternberg, 1999; Lo Conte et al., 1999; Norel et al., 1999b; Decanniere et al., 2001) have revealed that enzyme–inhibitor, antibody–antigen and other complexes differ significantly in interface residue composition, hydrophobicity and electrostatics. Jackson compared protein–protein interactions in these different types of complexes and concluded that enzyme–inhibitor interfaces are more static, and hence more easily predicted, than antibody–antigen interfaces (Jackson, 1999). This suggests that different filtering criteria should be applied to different types of complexes. In this paper, we focus on a type-dependent filtering technique that takes hydrophobicity and electrostatic complementarity into account in addition to geometric matching.


    Materials and methods
The selected test set

A collection of 44 protein–protein complexes from the Protein Data Bank (PDB) was used as the test set (Table I). They were chosen from different types of complexes: 23 enzyme–inhibitor, 11 antibody–antigen and 10 other complexes. For 24 systems, docking was performed with the unbound experimental structures of both the receptor and the ligand. For the remaining 20 systems, an unbound experimental structure was available for only one molecule, so the bound structure was used for the other.


Table I. The selected 44 protein–protein complexes used to test the docking algorithm
 
Treating molecular flexibility

For docking, we used the ‘simplified protein’ model (Levitt, 1976), with one sphere per residue and the radii given by Levitt (1976), except for Arg, Lys, Asp, Glu and Met at the protein surface. Because conformational changes often affect the flexible side chains of these residues (Cherfils and Janin, 1993; Lo Conte et al., 1999; Zhao et al., 2001), they were represented by spheres centered on the Cβ atom with a small radius of 1.5 Å, which makes the molecular surface somewhat ‘softer’ at these positions than elsewhere in the protein.
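The residue representation described above can be sketched as follows, assuming each residue record already carries a surface flag, its Cβ coordinates and the sphere center normally used by the simplified model. Levitt's radii are passed in by the caller rather than reproduced here; the names `Residue` and `sphere_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]

# Residues whose surface side chains are "softened" (from the text).
FLEXIBLE = {"ARG", "LYS", "ASP", "GLU", "MET"}
SOFT_RADIUS = 1.5  # Å, sphere radius for softened residues (from the text)


@dataclass
class Residue:
    name: str          # three-letter code, e.g. "ARG"
    is_surface: bool   # assumed precomputed by the caller
    center: Vec3       # usual sphere center of the simplified model (assumed given)
    cb: Vec3           # C-beta coordinates


def sphere_for(res: Residue, levitt_radii: Dict[str, float]) -> Tuple[Vec3, float]:
    """Return (center, radius) of the single sphere representing one residue."""
    if res.is_surface and res.name in FLEXIBLE:
        # Soft treatment: a small sphere on C-beta lets the side chain "give".
        return res.cb, SOFT_RADIUS
    # All other residues keep the standard center and the caller-supplied radius.
    return res.center, levitt_radii[res.name]
```

Only the 1.5 Å radius and the list of softened residue types come from the text; how surface residues are identified is left open.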

Searching

We used Wodak and Janin’s docking algorithm (Wodak and Janin, 1978) as implemented in the program DOCK (Cherfils et al., 1991). The six parameters that defined the position and orientation of one molecule relative to the other were five Euler rotation angles (θ1, φ1, θ2, φ2 and χ) and an intermolecular distance ρ. Angles θ1 and φ1 located the center of the ligand relative to the receptor; θ2 and φ2 located the center of the receptor relative to the ligand; χ was a spin angle about the line joining the two centers. The five angles were systematically searched in steps of 7.5°. We explored the full range of θ2 (±90°) and of φ2 and χ (±180°), that is, the full surface of the ligand. For the receptor, we restricted the search ranges of θ1 and φ1 to ±30° around the active site. With a 7.5° step, about 4.86 × 10⁶ different docked structures were generated for each complex.
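As a check on the search size, the angular grid can be enumerated directly. This sketch assumes that both endpoints of every range are sampled; the helper `angle_grid` is hypothetical and does not describe DOCK's internals.

```python
import itertools


def angle_grid(lo_deg: float, hi_deg: float, step_deg: float) -> list:
    """Angles from lo to hi in fixed steps, keeping both endpoints (assumed)."""
    n = int(round((hi_deg - lo_deg) / step_deg))
    return [lo_deg + i * step_deg for i in range(n + 1)]


STEP = 7.5
# Receptor angles restricted to +/-30 degrees around the active site.
theta1 = angle_grid(-30.0, 30.0, STEP)
phi1 = angle_grid(-30.0, 30.0, STEP)
# Full ranges for the ligand angles and the spin angle.
theta2 = angle_grid(-90.0, 90.0, STEP)
phi2 = angle_grid(-180.0, 180.0, STEP)
chi = angle_grid(-180.0, 180.0, STEP)

grids = (theta1, phi1, theta2, phi2, chi)
n_orientations = 1
for g in grids:
    n_orientations *= len(g)

print([len(g) for g in grids], n_orientations)
# -> [9, 9, 25, 49, 49] 4862025

# Lazily iterate over all five-angle combinations. The distance rho is not
# sampled here; in this sketch it would be set separately for each orientation.
orientations = itertools.product(*grids)
first = next(orientations)
```

Under this assumption the count is 9 × 9 × 25 × 49 × 49 = 4,862,025, consistent with the figure of about 4.86 × 10⁶ quoted above. Because −180° and +180° describe the same orientation, a grid that drops one endpoint of φ2 and χ would give 9 × 9 × 25 × 48 × 48 = 4,665,600 orientations instead.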

Filtering

In this work, docked structures with an interface area of at least 500 Å² were retained and subjected to filtering. The filtering technique, based on geometric matching, hydrophobicity (Zhang et al., 1997) and electrostatic complementarity, was derived from the analysis presented under Combined filtering in Results and discussion (see below). We used interface area to score geometric matching, the desolvation free energy to score hydrophobicity, and the electrostatic energy to score electrostatic complementarity.

For the enzyme–inhibitor complexes, the first 500 docked structures were sorted by descending interface area. Each subsequent structure was compared with this sorted lookup list, which held the 500 best geometric matches found so far: if its surface matching was poorer than that of the worst solution in the list, it was discarded; otherwise, it was inserted into the list and the worst element was eliminated. In this way, 500 solutions were retained by geometric matching. Then, from the remaining solutions, the 500 with the lowest desolvation free energy were selected in the same way. Finally, the 1000 solutions retained by geometric matching and hydrophobicity were combined to give the final set of retained solutions.
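The sorted lookup list described above behaves like a bounded best-k filter over a stream of docked structures. A minimal sketch using Python's `bisect` module follows; lower keys are treated as better, so interface area must be negated, and `keep_best` is a hypothetical name.

```python
import bisect
from typing import Callable, Iterable, List


def keep_best(solutions: Iterable, key: Callable, capacity: int = 500) -> List:
    """Stream-filter solutions, keeping the `capacity` best by `key` (lower = better).

    The list stays sorted best-first, so its worst member is always last.
    """
    keys: List[float] = []  # sorted keys, best first
    kept: List = []         # solutions in the same order as `keys`
    for s in solutions:
        k = key(s)
        if len(kept) == capacity:
            if k > keys[-1]:
                continue    # poorer than the worst retained solution: discard
            keys.pop()      # otherwise evict the current worst element
            kept.pop()
        i = bisect.bisect_right(keys, k)
        keys.insert(i, k)
        kept.insert(i, s)
    return kept
```

For geometric matching one would call `keep_best(structures, key=lambda s: -area(s))`, where `area` is a hypothetical accessor. Memory stays fixed at `capacity` entries however many structures the search produces.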

For the antibody–antigen complexes, the 500 solutions first retained by geometric matching and the 500 solutions then retained by electrostatic complementarity were combined to give the final set. For the other complexes, the 1500 solutions retained by geometric matching, hydrophobicity and electrostatic complementarity (500 for each criterion) were combined to give the final set.
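The type-dependent combination can be sketched as a table that maps each complex type to its criteria, applied in turn to the solutions not yet retained. Assumptions: each solution is a dict with hypothetical keys `area`, `desolv` and `elec`; a lower electrostatic energy is taken to mean better complementarity; and the sequential exclusion described for enzyme–inhibitor complexes is assumed to apply to the other types as well. For brevity, the pool is sorted outright rather than streamed through a bounded list.

```python
from typing import Callable, Dict, List, Sequence

# Criteria applied, in order, for each complex type (from the text).
CRITERIA_BY_TYPE: Dict[str, Sequence[str]] = {
    "enzyme-inhibitor": ("geometry", "hydrophobicity"),
    "antibody-antigen": ("geometry", "electrostatics"),
    "other": ("geometry", "hydrophobicity", "electrostatics"),
}

# Lower value = better for every criterion.
SCORE_FNS: Dict[str, Callable[[dict], float]] = {
    "geometry": lambda s: -s["area"],         # larger interface area is better
    "hydrophobicity": lambda s: s["desolv"],  # lower desolvation free energy is better
    "electrostatics": lambda s: s["elec"],    # lower electrostatic energy is better (assumed)
}


def type_dependent_filter(solutions: List[dict], complex_type: str,
                          per_criterion: int = 500,
                          min_area: float = 500.0) -> List[dict]:
    """Retain `per_criterion` solutions per criterion, excluding earlier picks."""
    pool = [s for s in solutions if s["area"] >= min_area]  # 500 Å² threshold
    retained: List[dict] = []
    for name in CRITERIA_BY_TYPE[complex_type]:
        pool.sort(key=SCORE_FNS[name])
        picked, pool = pool[:per_criterion], pool[per_criterion:]
        retained.extend(picked)
    return retained
```

With the default of 500 per criterion, this yields the 1000, 1000 and 1500 retained solutions described for the three complex types.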

Finally, for every system, each group of binding modes with similar structures was replaced by an average conformation (Cherfils et al., 1991).

The scoring functions

After clustering, the following scoring function was evaluated:


$$\Delta G = \Delta E_{\mathrm{elec}} + \Delta G_{\mathrm{des}}(\mathrm{ACE}) \qquad (1)$$

where ΔEelec denoted the change in electrostatic energy. A soft-core Coulombic potential was used to calculate the electrostatic energy:


(2)

where k was a constant including the electrical permittivity of vacuum and rij was the distance between atoms i and j. The constant c was set to 1.2 Å. The charge parameters were taken from the CHARMM force field (Brooks et al., 1983). ΔGdes(ACE) was the desolvation free energy based on the atomic contact energy (ACE) (Zhang et al., 1997, 1999):


$$\Delta G_{\mathrm{des}}(\mathrm{ACE}) = \sum_{i}\sum_{j} e_{ij}\, n_{ij} \qquad (3)$$

where eij denoted the ACE between atoms i and j, and nij was a switch function (Zhang et al., 1999) applied to eij over the range 6–10 Å to avoid a sharp distance cutoff. The sum was taken over all atom pairs less than 10 Å apart.
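Because the soft-core form of Eq. (2) and the exact switch function are not given in this text, the following is only a sketch of how the two energy terms and their sum might be computed. The soft-core form k·qi·qj/√(rij² + c²), the cubic switch between 6 and 10 Å, the unweighted sum of the two terms, the atom-tuple layout and the ACE lookup table are all assumptions; only c = 1.2 Å and the 6–10 Å range come from the text.

```python
import math
from typing import Dict, Sequence, Tuple

# (x, y, z, partial charge, ACE atom type): a hypothetical atom record.
Atom = Tuple[float, float, float, float, str]

K_COULOMB = 332.0636      # kcal·Å/(mol·e²); a common unit choice, assumed here
C_SOFT = 1.2              # Å, the soft-core constant c given in the text
R_ON, R_OFF = 6.0, 10.0   # Å, switching range for the ACE term (from the text)


def soft_core_coulomb(q_i: float, q_j: float, r: float) -> float:
    """ASSUMED soft-core form k*qi*qj/sqrt(r^2 + c^2); stays finite at r = 0."""
    return K_COULOMB * q_i * q_j / math.sqrt(r * r + C_SOFT * C_SOFT)


def switch(r: float) -> float:
    """ASSUMED cubic switch: 1 up to 6 Å, 0 from 10 Å, smooth in between."""
    if r <= R_ON:
        return 1.0
    if r >= R_OFF:
        return 0.0
    t = (r - R_ON) / (R_OFF - R_ON)
    return 1.0 - t * t * (3.0 - 2.0 * t)


def score(receptor: Sequence[Atom], ligand: Sequence[Atom],
          ace: Dict[Tuple[str, str], float]) -> float:
    """Electrostatic term plus ACE desolvation term (lower is better).

    For rigid bodies with fixed charges the intramolecular electrostatic terms
    cancel on binding, so only receptor-ligand pairs are summed; the ACE sum is
    likewise taken over receptor-ligand pairs (assumed).
    """
    e_elec = 0.0
    g_des = 0.0
    for xi, yi, zi, qi, ti in receptor:
        for xj, yj, zj, qj, tj in ligand:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            e_elec += soft_core_coulomb(qi, qj, r)
            if r < R_OFF:  # ACE sum limited to pairs closer than 10 Å
                g_des += ace[(ti, tj)] * switch(r)
    return e_elec + g_des
```

The double loop is quadratic in the number of atoms; for whole proteins a neighbor list or grid would normally be used to find the pairs within 10 Å.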


    Results and discussion
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Treatment of conformational flexibility

To examine the effect of the molecular flexibility treatment, we compared a structure docked with our modified molecular model against the experimental structure. Figure 1 shows this comparison for the complex 1BRC. The docking started from the unbound structures of the enzyme trypsin (1bra) and its inhibitor APPI (1aap), superimposed on the complex 1BRC (the reference structure) and then moved far apart (200 Å). On association of the two molecules, the Arg15 side chain of APPI undergoes an obvious conformational change, as a comparison of the bound and unbound structures of APPI shows. As Figure 1 shows, docking with the modified model tolerates an appropriate overlap between the Arg15 side chain of APPI and Trp215 of trypsin, whereas a major clash would be expected if the docking were performed with the original model. Our modified model therefore reasonably accommodates the side chain flexibility of surface residues.



Fig. 1. Detail at the interface of the experimental structure of 1BRC and the structure docked starting from the enzyme trypsin (1bra) and inhibitor APPI (1aap) using the modified molecular model. Thin lines correspond to the experimental structure of 1BRC and thick lines show the conformations of the docked structure.

 
Combined filtering

Because different types of complexes differ significantly in interface hydrophobicity and electrostatics, we applied different filtering criteria to different types of complexes. To compare the filtering effect of geometric matching, hydrophobicity and electrostatic complementarity across complex types, Table II lists, in columns S, A and E (under Filtering), the number of native-like structures among the 1000 solutions with the best interface matching, the 1000 with the lowest desolvation free energy and the 1000 with the best electrostatic complementarity, respectively. The ratio of the number of native-like structures to the number of retained solutions is a key measure of filtering effectiveness. A docked structure is considered native-like if the root mean square deviation (r.m.s.d.) of its backbone atoms (N, Cα, C, O) from the reference structure is not greater than 4.0 Å.
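The native-like criterion and the filtering ratio can be expressed as small functions. This sketch assumes the docked and reference coordinates are already in the same frame and list the same backbone atoms in the same order, so no further superposition is performed; which atoms (ligand only or the whole complex) enter the comparison is left open, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]
NATIVE_CUTOFF = 4.0  # Å, from the text


def backbone_rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """r.m.s.d. over matched backbone atoms (N, CA, C, O), with no extra fitting."""
    if not model or len(model) != len(reference):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(sq / len(model))


def is_native_like(model: Sequence[Coord], reference: Sequence[Coord],
                   cutoff: float = NATIVE_CUTOFF) -> bool:
    return backbone_rmsd(model, reference) <= cutoff


def native_fraction(retained_rmsds: Sequence[float],
                    cutoff: float = NATIVE_CUTOFF) -> float:
    """Ratio of native-like structures to retained solutions."""
    return sum(r <= cutoff for r in retained_rmsds) / len(retained_rmsds)
```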


Table II. Results of molecular docking calculations for 44 protein–protein complexes^a
 
Table II shows that, for the enzyme–inhibitor complexes, electrostatic complementarity in most cases selects fewer native-like solutions than either geometric matching or hydrophobicity. Moreover, the approach works for all enzyme–inhibitor complexes only if the solutions selected by these two criteria are combined. For 1BRC, for example, geometric matching selects no native-like solutions, whereas hydrophobicity selects 107. For 1FQ1, conversely, hydrophobicity selects no native-like structures and geometric matching selects nine. Therefore, a combined filtering criterion based on geometric matching and hydrophobicity is used for enzyme–inhibitor complexes. For antibody–antigen complexes, hydrophobicity filters poorly, whereas geometric matching and electrostatic complementarity perform relatively well. For the other complexes, each of the three criteria is important in different cases.

The principles governing protein–protein recognition differ clearly between types of complexes, probably because of differences in biological function. From an evolutionary perspective, enzyme–inhibitor complexes have evolved over a long period to optimize the interfaces that perform their biological functions, which makes these interfaces more like the interior of proteins. Hydrophobic interaction is therefore prominent in their association. In contrast, antibody–antigen recognition is a ‘happenstance’ that is not subject to evolutionary optimization over more than a few days. The contribution of hydrophobic interaction to antibody–antigen association is relatively small, whereas electrostatic interaction seems to be very important; other biological principles may also govern antibody–antigen recognition. For the other complexes, whose biological functions are diverse, the analysis above reveals no evident principles of recognition. These complexes might be divided into homodimers and heterocomplexes on the basis of their structures, which should become feasible as more structures of such complexes become available.

Scoring putative complexes

Table II summarizes the docking results (under Scoring). For each of the 44 complexes, it lists the rank of the first native-like structure, followed by its r.m.s.d. from the reference structure. In 30 cases, the first native-like structure is ranked within the top 20. These include the complexes 1CHO, 1CGI, 1TGS, 1EFU*, 1MDA, 1FIN and 1IGC*, in which relatively large overall conformational changes of the receptor or ligand occur on complex formation (see the last column, Cα r.m.s.d., in Table I). It should be noted, however, that native-like structures do not always correspond to the best-scoring solutions, and incorrect docked structures are often ranked first. The ranking of native-like structures might improve if H-bond and van der Waals energies were included in the scoring function, in addition to the electrostatic and desolvation energies, after energy minimization. In addition, combining the method with experimental information on the complex should increase the probability of predicting the complex structure successfully.
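The quantity reported under Scoring, the rank of the first native-like structure, and the count of complexes ranked within the top 20 can be computed as below. Lower scores are assumed to be better, as for a free energy; the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_native_rank(scored: Sequence[Tuple[float, float]],
                      cutoff: float = 4.0) -> Optional[int]:
    """1-based rank of the best-scoring native-like solution, or None.

    `scored` holds (score, rmsd) pairs; lower score = better.
    """
    ranked = sorted(scored, key=lambda pair: pair[0])
    for rank, (_, rmsd) in enumerate(ranked, start=1):
        if rmsd <= cutoff:
            return rank
    return None


def count_top_n(ranks: Sequence[Optional[int]], n: int = 20) -> int:
    """Number of complexes whose first native-like solution ranks within the top n."""
    return sum(1 for r in ranks if r is not None and r <= n)
```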

Figure 2 compares the experimental structure of the complex 1BRC with the best-ranked native-like prediction reported in Table II. Although the unbound Arg15 side chain of APPI clashes strongly with Trp215 of trypsin in this pose (see Figure 1), the native-like structure is ranked first, and the binding site is satisfactorily identified.



Fig. 2. Superposition of the experimental structure of protein complex 1BRC and the best-ranked native-like prediction reported in Table II. Thick lines, Cα trace of the experimental structure; thin lines, Cα trace of the predicted mode.

 
Conclusions

It should be pointed out that the docking simulations in this paper assume that the binding region on one of the two proteins is known. In the spherical polar coordinates used in this work, this information enters as a simple constraint on just one or two of the angular degrees of freedom, which greatly reduces the computation time. Ritchie and Kemp used the same coordinates in their docking algorithm and successfully predicted the structures of several protein–protein complexes (Ritchie and Kemp, 2000). In their test, with the search ranges of two angular degrees of freedom limited to ±30° around the active site, the first native-like structures of 11 out of 18 complexes were ranked in the top 20 (Ritchie and Kemp, 2000). In this paper, the first native-like structures of 30 out of 44 tested complexes are ranked in the top 20. This indicates that our algorithm captures some important factors in protein–protein association and can be a useful aid in the study of molecular recognition.

Using the characteristic features of protein–protein interfaces to guide docking will be important. Many important features of antibody–antigen interfaces have now been reported. For example, tyrosine residues account for over a quarter of the total interaction energy contributed by the antibody (Jackson, 1999). Such information could therefore be added to the filtering or scoring used specifically for antibody–antigen docking.

In summary, our soft docking algorithm has several advantages: (1) the modified molecular model improves the simulation results for unbound protein–protein docking; (2) the type-dependent filtering technique retains many more native-like structures and increases the probability of successfully predicting complex structures; and (3) the scoring function based on the binding free energy can effectively distinguish correct from incorrect structures. The main shortcoming of the algorithm, however, is that only part of the binding space is searched, which is an obvious limitation for docking simulations in which nothing is known about the binding site. Work on improving our docking algorithm is in progress.


    Acknowledgments
 
We thank Professor Janin and Dr Cherfils for providing the rigid-body protein–protein docking program. We also thank Dr Ben Zhuo Lu for helpful discussions. This work was supported in part by the Chinese Natural Science Foundation (Nos 29992590-2, 30170230 and 10174005).


    References
Betts,M.J. and Sternberg,M.J.E. (1999) Protein Eng., 12, 271–283.

Brooks,B.R., Bruccoleri,R.E., Olafson,B.D. and States,D.J. (1983) J. Comput. Chem., 4, 187–217.

Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265–269.

Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271–280.

Decanniere,K., Transue,T.R., Desmyter,A., Maes,D., Muyldermans,S. and Wyns,L. (2001) J. Mol. Biol., 313, 473–478.

Fischer,D., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 248, 459–477.

Gabb,H.A., Jackson,R.M. and Sternberg,M.J. (1997) J. Mol. Biol., 272, 106–120.

Halperin,I., Ma,B., Wolfson,H. and Nussinov,R. (2002) Proteins: Struct. Funct. Genet., 47, 409–443.

Jackson,R.M. (1999) Protein Sci., 8, 603–613.

Jiang,F. and Kim,S.H. (1991) J. Mol. Biol., 219, 79–102.

Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.

Katchalski-Katzir,E., Shariv,I., Eisenstein,M., Friesem,A.A., Aflalo,C. and Vakser,I.A. (1992) Proc. Natl Acad. Sci. USA, 89, 2195–2199.

Lengauer,T. and Rarey,M. (1996) Curr. Opin. Struct. Biol., 6, 402–406.

Levitt,M. (1976) J. Mol. Biol., 104, 59–107.

Lo Conte,L., Chothia,C. and Janin,J. (1999) J. Mol. Biol., 285, 2177–2198.

Norel,R., Petrey,D., Wolfson,H.J. and Nussinov,R. (1999a) Proteins: Struct. Funct. Genet., 35, 403–419.

Norel,R., Petrey,D., Wolfson,H.J. and Nussinov,R. (1999b) Proteins: Struct. Funct. Genet., 36, 307–317.

Palma,P.N., Krippahl,L., Wampler,J.E. and Moura,J.J.G. (2000) Proteins: Struct. Funct. Genet., 39, 372–384.

Ritchie,D.W. and Kemp,G.J.L. (2000) Proteins: Struct. Funct. Genet., 39, 178–194.

Sandak,B., Nussinov,R. and Wolfson,H.J. (1995) Comput. Appl. Biosci., 11, 87–99.

Shoichet,B.K. and Kuntz,I.D. (1991) J. Mol. Biol., 221, 327–346.

Sotriffer,C.A., Flader,W., Winger,R.H., Rode,B.M., Liedl,K.R. and Varga,J.M. (2000) Methods, 20, 280–291.

Vakser,I.A. (1995) Protein Eng., 8, 371–377.

Walls,P.H. and Sternberg,M.J. (1992) J. Mol. Biol., 228, 277–297.

Wodak,S.J. and Janin,J. (1978) J. Mol. Biol., 124, 323–342.

Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707–726.

Zhang,C., Chen,J. and DeLisi,C. (1999) Proteins: Struct. Funct. Genet., 34, 255–267.

Zhao,S., Goodsell,D.S. and Olson,A.J. (2001) Proteins: Struct. Funct. Genet., 43, 271–279.

Received August 8, 2002; revised January 2, 2003; accepted February 11, 2003.




