College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100022, China
1 To whom correspondence should be addressed. E-mail: cxwang{at}bjpu.edu.cn
Abstract |
---|
Keywords: binding free energy/molecular flexibility/molecular recognition/protein docking/protein-protein interactions
Introduction |
---|
In protein-protein docking, because of the large number of atoms and degrees of freedom involved, it is impracticable to treat molecular flexibility explicitly. Currently, the solutions to this problem are mainly limited to techniques that tolerate a limited degree of molecular flexibility by using a soft representation of the molecular surface (Jiang and Kim, 1991; Walls and Sternberg, 1992; Sandak et al., 1995; Vakser, 1995; Palma et al., 2000; Ritchie and Kemp, 2000). Jiang and Kim used a cube representation of the molecular surface and volume in their docking procedure (Jiang and Kim, 1991). Ritchie and Kemp introduced a soft model of electrostatic complementarity in the algorithm (Ritchie and Kemp, 2000). Palma et al. proposed a surface-implicit method in which the surface is represented by values 0 and 1 on two grids, the surface and core grids (Palma et al., 2000). This digitization introduces the first level of softness in the algorithm. In this paper, the flexible amino acid residues Arg, Lys, Asp, Glu and Met at the protein surface are softened on the basis of the simplified protein model (Levitt, 1976). This softness treatment improves the results of unbound docking to some degree.
A search procedure may produce millions of docked structures. How to reduce these solutions drastically by filtering, to a range manageable by the scoring functions, is a serious and challenging topic of current research. Docking methods are generally based on the idea of complementarity between the interacting molecules. This complementarity may be geometric, electrostatic or hydrophobic, or all three. Most docking algorithms developed so far have used the extent of geometric complementarity of the protein surfaces as the filtering criterion to eliminate a large number of solutions with poor surface matching. It is generally recognized, however, that a single filtering criterion is not sufficient to discriminate between native-like and incorrect docked structures except in a very few cases (Shoichet and Kuntz, 1991). Recently, investigations of the interfaces of known protein-protein complexes (Jones and Thornton, 1996; Betts and Sternberg, 1999; Lo Conte et al., 1999; Norel et al., 1999b; Decanniere et al., 2001) have revealed that enzyme-inhibitor, antibody-antigen and other complexes present important differences in interface residue composition, hydrophobicity and electrostatics. Jackson compared protein-protein interactions in these different types of complexes and concluded that enzyme-inhibitor interfaces are more static and hence more easily predicted than antibody-antigen interfaces (Jackson, 1999). This suggests that different filtering criteria should be applied to different types of complexes. In this paper, we focus on a type-dependent filtering technique in which, in addition to geometric matching, we also take hydrophobicity and electrostatic complementarity into account.
Materials and methods |
---|
A collection of 44 protein-protein complexes from the Protein Data Bank (PDB) was used as the test set (Table I). They were chosen from different types of complexes, including 23 enzyme-inhibitor, 11 antibody-antigen and 10 other complexes. For 24 systems, docking was performed with the unbound experimental structures of both the receptor and the ligand. For the remaining 20 systems, the unbound experimental structure of only one molecule was available, and therefore the bound structure was taken for the other molecule.
For docking, we used the simplified protein model (Levitt, 1976) with one sphere per residue and the radii listed in that reference, except for Arg, Lys, Asp, Glu and Met at the protein surface. As conformational changes often affect their flexible side chains (Cherfils and Janin, 1993; Lo Conte et al., 1999; Zhao et al., 2001), these residues were represented with spheres centered on the Cβ atom with a small radius of 1.5 Å, making the molecular surface softer to some extent at these positions than elsewhere in the protein.
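The softening rule can be illustrated with a minimal sketch. The function name, argument names and the `Sphere` type are hypothetical, and the default center and radius passed in merely stand in for the Levitt (1976) values, which are not reproduced here.

```python
from dataclasses import dataclass

# Residues that the text softens when they lie at the protein surface.
SOFT_RESIDUES = {"ARG", "LYS", "ASP", "GLU", "MET"}
SOFT_RADIUS = 1.5  # Å, value quoted in the text


@dataclass
class Sphere:
    center: tuple  # (x, y, z) in Å
    radius: float  # Å


def residue_sphere(resname, at_surface, default_center, cb_coord, default_radius):
    """Return the one-sphere representation of a residue.

    default_center and default_radius are caller-supplied placeholders
    for the simplified-model values (an assumption of this sketch).
    """
    if at_surface and resname.upper() in SOFT_RESIDUES:
        # Softened residue: small sphere centered on the C-beta atom.
        return Sphere(center=cb_coord, radius=SOFT_RADIUS)
    return Sphere(center=default_center, radius=default_radius)
```

How a residue is classified as lying at the surface is left to the caller, since the criterion is not stated in this section.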
Searching
We used Wodak and Janin's docking algorithm (Wodak and Janin, 1978) implemented in the program DOCK (Cherfils et al., 1991). The six parameters that defined the position and orientation of one molecule relative to the other were five Euler rotation angles (θ1, φ1, θ2, φ2 and χ) and an intermolecular distance ρ. Angles θ1 and φ1 located the center of the ligand relative to the receptor; θ2 and φ2 located the center of the receptor relative to the ligand; χ was a spin angle about the center line. The five angles were systematically searched in steps of 7.5°. We explored the full range of θ2 (±90°), φ2 and χ (±180°), that is, the full surface of the ligand. For the receptor, we restricted the search range of θ1 and φ1 to ±30° around the active site. With a 7.5° step, about 4.86 × 10⁶ different docked structures were generated for each complex.
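The quoted number of docked structures can be checked with a short calculation. This sketch assumes that both endpoints of every angular range are sampled, which reproduces the figure of about 4.86 × 10⁶; the angle labels in the dictionary are assumed names, and only orientations are counted, not the intermolecular distance.

```python
def n_points(half_range_deg, step_deg=7.5):
    """Grid points on [-half_range, +half_range], endpoints included (assumption)."""
    return int(round(2 * half_range_deg / step_deg)) + 1


counts = {
    "receptor_angle_1": n_points(30),   # restricted to ±30° around the active site
    "receptor_angle_2": n_points(30),   # restricted to ±30° around the active site
    "ligand_polar": n_points(90),       # full range, ±90°
    "ligand_azimuth": n_points(180),    # full range, ±180°
    "spin": n_points(180),              # full range, ±180°
}

total = 1
for n in counts.values():
    total *= n

print(counts)
print(total)  # 9 * 9 * 25 * 49 * 49 = 4862025
```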
Filtering
In this work, the docked structures with an interface area of at least 500 Å² were retained and subjected to filtering. The filtering technique, based on geometric matching, hydrophobicity (Zhang et al., 1997) and electrostatic complementarity, was derived from the analysis in the Combined filtering section in Results and discussion (see below). We used the interface area to score geometric matching and the electrostatic energy to score electrostatic complementarity.
For the enzyme-inhibitor complexes, the first 500 docked structures were sorted in order of descending interface area. For each subsequent structure, its interface area was compared with the sorted lookup list containing the 500 best geometric matching solutions found so far. If its surface matching was poorer than that of the worst solution in the list, it was discarded. Otherwise, it was inserted in the list and the worst element was eliminated at the same time. In this way, 500 solutions were retained by geometric matching. Then, from the solutions left, the 500 solutions with the lowest desolvation free energy were retained in the same way. Finally, the 1000 solutions retained by geometric matching and hydrophobicity were combined as the final retained solutions.
For the antibody-antigen complexes, the 500 solutions first retained by geometric matching and the 500 solutions then retained by electrostatic complementarity were combined as the final retained solutions. For the other complexes, the total of 1500 solutions retained by geometric matching, hydrophobicity and electrostatic complementarity were combined as the final retained solutions.
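The type-dependent filtering described above can be sketched as follows. This is an illustration only, not the implementation in DOCK: the dictionary keys (`area`, `dg_des`, `e_elec`), the function names and the use of a heap instead of a sorted lookup list are assumptions, and a lower electrostatic energy is taken to mean better electrostatic complementarity.

```python
import heapq


def keep_best(structures, key, n=500):
    """Stream through structures and keep the n with the largest key."""
    heap = []  # min-heap of (key, index, structure); heap[0] is the worst kept
    for idx, s in enumerate(structures):
        item = (key(s), idx, s)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            # Better than the worst retained solution: replace it.
            heapq.heapreplace(heap, item)
    return [s for _, _, s in sorted(heap, reverse=True)]


# Criteria applied, in order, to each type of complex (as stated in the text).
CRITERIA = {
    "enzyme-inhibitor": ("area", "desolvation"),
    "antibody-antigen": ("area", "electrostatics"),
    "other": ("area", "desolvation", "electrostatics"),
}

KEYS = {
    "area": lambda s: s["area"],               # larger interface is better
    "desolvation": lambda s: -s["dg_des"],     # lower free energy is better
    "electrostatics": lambda s: -s["e_elec"],  # lower energy is better (assumed)
}


def type_dependent_filter(structures, complex_type, n=500, min_area=500.0):
    """Retain n solutions per criterion, each drawn from the solutions left."""
    remaining = [s for s in structures if s["area"] >= min_area]
    retained = []
    for name in CRITERIA[complex_type]:
        best = keep_best(remaining, KEYS[name], n)
        retained.extend(best)
        chosen = {id(s) for s in best}
        remaining = [s for s in remaining if id(s) not in chosen]
    return retained
```

Keeping only the current best n solutions in memory is what makes the filter practical when millions of docked structures are generated.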
Finally, for every system, several binding modes with similar structure were replaced with an average conformation (Cherfils et al., 1991).
The scoring functions
After clustering, the following scoring function was evaluated:
$$\Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{elec}} + \Delta G_{\mathrm{des}}(\mathrm{ACE}) \qquad (1)$$
where ΔEelec denoted the change in the electrostatic energy. A soft-core Coulombic potential was used to calculate the electrostatic energy:
![]() | (2) |
where k was a constant including the electrical permittivity of vacuum and rij was the distance between atoms i and j. The constant c was set to 1.2 Å. The charge parameters were from the CHARMM force field (Brooks et al., 1983).
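A soft-core Coulombic term of this kind might be evaluated as in the sketch below. The functional form k·qi·qj/(rij + c) is an assumption chosen only to illustrate how the constant c keeps the energy finite at short distances; it is not taken from Equation (2). The value of k (in kcal·Å/(mol·e²)) and the atom tuple layout are likewise assumptions.

```python
import math

K_COULOMB = 332.06  # kcal·Å/(mol·e²), approximate Coulomb constant (assumed units)
C_SOFT = 1.2        # Å, soft-core constant quoted in the text


def soft_coulomb(receptor, ligand, k=K_COULOMB, c=C_SOFT):
    """Intermolecular electrostatic energy with an ASSUMED soft-core form.

    Each atom is an (x, y, z, charge) tuple. The sum runs over all
    receptor-ligand atom pairs; no distance cutoff is applied here.
    """
    energy = 0.0
    for xi, yi, zi, qi in receptor:
        for xj, yj, zj, qj in ligand:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            energy += k * qi * qj / (r + c)  # finite even when r is 0
    return energy
```

With this form, two overlapping unit charges of opposite sign contribute -k/c rather than diverging, which is the behavior a soft potential needs when side chains are allowed to interpenetrate.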
ΔGdes(ACE) was the desolvation free energy based on the atomic contact energy (ACE) (Zhang et al., 1997, 1999):
$$\Delta G_{\mathrm{des}}(\mathrm{ACE}) = \sum_{i}\sum_{j} n_{ij}\, e_{ij} \qquad (3)$$
where eij denoted the ACE between atoms i and j, and nij was a switch function (Zhang et al., 1999) applied to eij in the range 6-10 Å in order to avoid a sharp distance cutoff. The sum was taken over all atom pairs less than 10 Å apart.
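The ACE-based desolvation sum can be sketched as below. The linear switch between 6 and 10 Å is an assumption standing in for the switch function of Zhang et al. (1999), whose exact form is not given here, and the `ace` table passed by the caller is a hypothetical lookup of contact energies by atom type.

```python
import math


def switch(r, r_on=6.0, r_off=10.0):
    """ASSUMED linear switch: 1 up to r_on, falling to 0 at r_off."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    return (r_off - r) / (r_off - r_on)


def desolvation_energy(receptor, ligand, ace, cutoff=10.0):
    """Sum n_ij * e_ij over receptor-ligand atom pairs closer than cutoff.

    Atoms are (x, y, z, atom_type) tuples; ace maps (type_i, type_j)
    to a contact energy e_ij and is treated as symmetric.
    """
    total = 0.0
    for xi, yi, zi, ti in receptor:
        for xj, yj, zj, tj in ligand:
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            if r >= cutoff:
                continue  # pairs 10 Å or more apart do not contribute
            e_ij = ace.get((ti, tj), ace.get((tj, ti), 0.0))
            total += switch(r) * e_ij
    return total
```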
Results and discussion |
---|
In order to examine the effect of the molecular flexibility treatment in our molecular model, we compared the structure docked with our modified molecular model with the experimental structure. Figure 1 shows the results of this comparison for the complex 1BRC. The docking was performed starting from the structures of the enzyme trypsin (1bra) and its inhibitor APPI (1aap) superimposed upon the complex 1BRC (the reference structure), but placed far apart (200 Å). In the association of the two molecules, an obvious conformational change occurs in the Arg15 side chain of the inhibitor APPI, as can be seen by comparing the bound and unbound structures of the inhibitor. As Figure 1 shows, docking with the modified molecular model tolerates the overlap between the Arg15 side chain of APPI and Trp215 of trypsin, whereas a major clash would be expected if the docking were performed with the original molecular model. This means that our modified molecular model can reasonably accommodate the side-chain flexibility of the surface residues.
Since different types of complexes differ markedly in interface hydrophobicity and electrostatics, we attempted to apply different filtering criteria to different types of complexes. In order to compare the filtering effect of geometric matching, hydrophobicity and electrostatic complementarity for the different types of complexes, the numbers of native-like structures in the three lists containing the 1000 best interface matching, the 1000 lowest desolvation free energy and the 1000 best electrostatic complementarity solutions are listed in columns S, A and E (under Filtering) in Table II. The ratio of the number of native-like structures to that of the retained solutions is a key factor in evaluating the filtering effect. A docked structure is considered native-like if the root mean square deviation (r.m.s.d.) of the backbone atoms (N, Cα, C, O) from the reference structure is not greater than 4.0 Å.
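The native-like criterion and the filtering ratio can be written down in a few lines. This sketch assumes that the docked and reference backbone coordinates are already expressed in the same frame and listed in the same atom order, so no superposition is performed; the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))


def is_native_like(docked_backbone, reference_backbone, cutoff=4.0):
    """True if the backbone r.m.s.d. from the reference is not greater than cutoff (Å)."""
    return rmsd(docked_backbone, reference_backbone) <= cutoff


def native_like_ratio(flags):
    """Fraction of retained solutions that are native-like."""
    return sum(flags) / len(flags)
```

For example, a list of 1000 retained solutions containing 25 native-like structures gives a ratio of 0.025.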
The principles governing protein-protein recognition clearly differ between types of complexes, and it is probably biological function that determines those differences. From an evolutionary perspective, enzyme-inhibitor complexes have evolved over a long period to optimize the interfaces performing their biological functions, which makes the interfaces more like the interior of proteins. Therefore, hydrophobic interaction is prominent in the association. In contrast, antibody-antigen recognition is a happenstance not subject to evolutionary optimization over more than a few days. The contribution of the hydrophobic interaction to antibody-antigen association is relatively small, whereas the electrostatic interaction seems to be very important. There could be some other biological principles governing antibody-antigen recognition. For other complexes, since the biological functions are diverse, no evident principles of recognition emerge from the analysis above. Perhaps the other complexes could be divided into homodimers and heterocomplexes based on their structures; this should become feasible as more structures of such complexes become available.
Scoring putative complexes
Table II summarizes the docking results (under Scoring). The ranking position of the first native-like structure is listed for each of the 44 complexes, followed by the corresponding r.m.s.d. relative to the reference structure. There are 30 cases in which the first native-like structure is ranked within the top 20. These cases include the complexes 1CHO, 1CGI, 1TGS, 1EFU*, 1MDA, 1FIN and 1IGC*, in which relatively large overall conformational changes of the receptors or ligands occur during complex formation (see the last column of Table I for the Cα r.m.s.d.). It should be noted, however, that the native-like structures do not always correspond to the best scoring solutions and, often, incorrect docked structures are ranked first. The discrimination of native-like structures might improve if H-bond and van der Waals energies were included in the scoring function, in addition to the electrostatic and desolvation energies, after energy minimization. Additionally, when properly combined with experimental information on the complex, the method should have a higher probability of successfully predicting complex structures.
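The ranking statistic reported in Table II, the position of the first native-like structure, can be computed as in this sketch. It assumes that each clustered solution carries a score for which lower is better (as for a binding free energy) together with its r.m.s.d. from the reference structure; the function name and input layout are illustrative.

```python
def first_native_rank(scored, cutoff=4.0):
    """Return the 1-based rank of the first native-like solution, or None.

    scored is an iterable of (score, rmsd) pairs; solutions are ranked
    by ascending score (lower is better, an assumption of this sketch).
    """
    ranked = sorted(scored, key=lambda pair: pair[0])
    for rank, (_, r) in enumerate(ranked, start=1):
        if r <= cutoff:
            return rank
    return None


def top_n_success(ranks, n=20):
    """Number of complexes whose first native-like structure is within the top n."""
    return sum(1 for r in ranks if r is not None and r <= n)
```

Applied to all 44 complexes, `top_n_success` would give the count quoted in the text for the top 20.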
Figure 2 shows a comparison between the experimental structure of the complex 1BRC and the best-ranked native-like prediction reported in Table II. Although there is a major clash between the Arg15 side chain of APPI and Trp215 of the enzyme trypsin (see Figure 1), the native-like structure is ranked first and it is clear that the binding site is satisfactorily identified.
It should be pointed out that the docking simulations in this paper are based on the assumption that the binding region on one of the two proteins is known. In the spherical polar coordinates used in this work, this information is given as a simple constraint on just one or two of the angular degrees of freedom, and the computation time is much reduced. Ritchie and Kemp also used the same coordinates in their docking algorithm and successfully predicted the structures of some protein-protein complexes (Ritchie and Kemp, 2000). In their test, when the search ranges of two angular degrees of freedom were limited to ±30° around the active site, the first native-like structures of 11 out of 18 complexes were ranked within the top 20 (Ritchie and Kemp, 2000). In this paper, the first native-like structures of 30 out of 44 tested complexes are ranked within the top 20. This indicates that our algorithm captures some important factors in protein-protein association and can provide useful help for the study of molecular recognition.
Guiding docking by the characteristics of protein-protein interfaces will be important. Many important features of antibody-antigen interfaces have already been reported. For example, tyrosine residues account for over a quarter of the total interaction energy contributed by the antibody (Jackson, 1999). This information might therefore be added to filtering or scoring specifically for antibody-antigen docking.
In summary, our soft docking algorithm has several advantages: (1) the modified molecular model can improve the simulation results for unbound protein-protein docking; (2) the type-dependent filtering technique can retain many more native-like structures and increase the probability of successfully predicting complex structures; and (3) the scoring function based on the binding free energy can effectively distinguish correct from incorrect structures. However, the main shortcoming of this algorithm is that only part of the binding space is searched. This is obviously a limitation for docking simulations in which no information about the binding site is known. Work on improving our docking algorithm is in progress.
Acknowledgments |
---|
References |
---|
Brooks,B.R., Bruccoleri,R.E., Olafson,B.D. and States,D.J. (1983) J. Comput. Chem., 4, 187-217.
Cherfils,J. and Janin,J. (1993) Curr. Opin. Struct. Biol., 3, 265-269.
Cherfils,J., Duquerroy,S. and Janin,J. (1991) Proteins: Struct. Funct. Genet., 11, 271-280.
Decanniere,K., Transue,T.R., Desmyter,A., Maes,D., Muyldermans,S. and Wyns,L. (2001) J. Mol. Biol., 313, 473-478.
Fischer,D., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 248, 459-477.
Gabb,H.A., Jackson,R.M. and Sternberg,M.J. (1997) J. Mol. Biol., 272, 106-120.
Halperin,I., Ma,B., Wolfson,H. and Nussinov,R. (2002) Proteins: Struct. Funct. Genet., 47, 409-443.
Jackson,R.M. (1999) Protein Sci., 8, 603-613.
Jiang,F. and Kim,S.H. (1991) J. Mol. Biol., 219, 79-102.
Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13-20.
Katchalski-Katzir,E., Shariv,I., Eisenstein,M., Friesem,A.A., Aflalo,C. and Vakser,I.A. (1992) Proc. Natl Acad. Sci. USA, 89, 2195-2199.
Lengauer,T. and Rarey,M. (1996) Curr. Opin. Struct. Biol., 6, 402-406.
Levitt,M. (1976) J. Mol. Biol., 104, 59-107.
Lo Conte,L., Chothia,C. and Janin,J. (1999) J. Mol. Biol., 285, 2177-2198.
Norel,R., Petrey,D., Wolfson,H.J. and Nussinov,R. (1999a) Proteins: Struct. Funct. Genet., 35, 403-419.
Norel,R., Petrey,D., Wolfson,H.J. and Nussinov,R. (1999b) Proteins: Struct. Funct. Genet., 36, 307-317.
Palma,P.N., Krippahl,L., Wampler,J.E. and Moura,J.J.G. (2000) Proteins: Struct. Funct. Genet., 39, 372-384.
Ritchie,D.W. and Kemp,G.J.L. (2000) Proteins: Struct. Funct. Genet., 39, 178-194.
Sandak,B., Nussinov,R. and Wolfson,H.J. (1995) Comput. Appl. Biosci., 11, 87-99.
Shoichet,B.K. and Kuntz,I.D. (1991) J. Mol. Biol., 221, 327-346.
Sotriffer,C.A., Flader,W., Winger,R.H., Rode,B.M., Liedl,K.R. and Varga,J.M. (2000) Methods, 20, 280-291.
Vakser,I.A. (1995) Protein Eng., 8, 371-377.
Walls,P.H. and Sternberg,M.J. (1992) J. Mol. Biol., 228, 277-297.
Wodak,S.J. and Janin,J. (1978) J. Mol. Biol., 124, 323-342.
Zhang,C., Vasmatzis,G., Cornette,J.L. and DeLisi,C. (1997) J. Mol. Biol., 267, 707-726.
Zhang,C., Chen,J. and DeLisi,C. (1999) Proteins: Struct. Funct. Genet., 34, 255-267.
Zhao,S., Goodsell,D.S. and Olson,A.J. (2001) Proteins: Struct. Funct. Genet., 43, 271-279.
Received August 8, 2002; revised January 2, 2003; accepted February 11, 2003.