1 Department of Biological Sciences and Biotechnology, Laboratory of Protein Sciences MOE, Tsinghua University, Beijing 100084 and 2 Institute of Physics, Chinese Academy of Sciences, Beijing 100080, China
Abstract |
---|
Keywords: interface packing/molecular complex/molecular recognition/SOFTDOCK/systematic docking
Introduction |
---|
The principles that we derived from these studies have been applied to molecular docking of proteins. Methods of molecular docking have also been developed rapidly in the last decade. It is generally true that representing a molecule by its atomic coordinates and docking it to another molecule using atomic force fields is computationally infeasible, especially when dealing with proteins. Therefore, various representations of molecular surface and volume have had to be designed.
The method of representation is often closely related to the search algorithm of the solution space. The search algorithm is a combination of the generation of solutions and the evaluation of the generated solutions. The solution space is defined by the relative rotation and translation between the docked molecules and also the other degrees of freedom included to describe the conformational flexibility of the ligand and receptor. Kuntz and co-workers (Kuntz et al., 1982) first generated a molecular surface (Connolly, 1981, 1983) of a molecule and then represented the surface with a set of spheres as positive or negative images of the surface. The positive image of the ligand was then docked to the negative image of the receptor. The negative image of the receptor can also be used to match with the atomic coordinates of small ligands from a known database. Efficient searching and matching algorithms have been developed for macromolecular docking with the incorporation of volume overlap checking (Shoichet and Kuntz, 1991). Shape and chemical descriptors of the surface have also been developed with this basic representation (Meng et al., 1992; Shoichet et al., 1992; Shoichet and Kuntz, 1993).
Jiang and Kim used the molecular surface of a molecule to represent the surface with the surface normals attached to the surface dots (Jiang and Kim, 1991). The surface and volume of a molecule are then digitized into grids called surface cubes and volume cubes. Searching and matching between two sets of molecular surface cubes and volume cubes are then achieved through exhaustive sampling of the rotation and translation space with a fast translation algorithm. In order to accommodate certain conformational flexibility, a soft dock method is used to evaluate the generated solutions. The softness is implemented through (1) varying the cube size and (2) the cone angle cutoff in calculating the local surface complementarity. The advantage of this cube representation is that it contains the most important characteristics of the surface (using surface dots with areas and normals) and the details of the representation are variable according to the cube size. Moreover, this representation does not contain higher level abstraction and hence is not limited by the set of shape descriptors used in a particular representation in describing a complicated surface. Its disadvantage, however, is that the searching requires more computation.
One type of abstraction is the sparse critical point representation (Lin et al., 1994; Norel et al., 1994a,b, 1995). The high knobs and deep holes of a surface are selected by the extrema of a shape function and the normal vector for each critical point is also calculated, which was found to be crucial in the efficient matching and correct scoring of the generated solutions. Another elegant representation of the molecular surface is to use quadratic shape descriptors (Goldman and Wipke, 2000). The authors suggested that a shape-explicit docking algorithm, that is, when the shape information is used explicitly in generating the solutions, could be more efficient than a shape-implicit algorithm. Their docking results with small ligand and enzyme receptor systems were compared with other methods and found to be better. There are many other examples of molecular surface representations and search algorithms which are combinations of shape-explicit and shape-implicit representations (Lee and Rose, 1985; Katchalski-Katzir et al., 1992; Lawrence and Davis, 1992; Masek et al., 1993; Bohm, 1994; Helmer-Citterich and Tramontano, 1994; Jones et al., 1995; Perkins et al., 1995; Rarey et al., 1996; Sobolev et al., 1996; Given and Gilson, 1998; Stahl and Bohm, 1998; Hou et al., 1999; Palma et al., 2000).
As is well known, most docking algorithms have their shortcomings and cannot work universally on every complexed molecule. Therefore, it is necessary to study when the algorithms are effective and when they are not. So far, only one study has been performed with such a variety and large number of known crystal complexes (Vakser et al., 1999) that are now available in the Protein Data Bank (PDB). In this work, we used the SOFTDOCK program and tested it using different sets of parameters. Through statistical analysis, we selected a set of values for the cube size, the cutoff cone angle and the volume overlap cutoff for docking protein complexes in general. We also suggested the best function for calculating the interface complementarity out of four choices. We then docked a series of 71 known protein–protein complexes found in the PDB (Bernstein et al., 1977). We found that most complexes could be found by the SOFTDOCK program with a broad range of parameters. However, some complexes with small interface areas (relative to the complex size) and discontinuous interface surfaces required more stringent parameters for calculating the complementarity. We introduced the signal-to-noise ratio as a measure of tightness of the binding of an interface, which seems to be correlated with the shape of the interface and uncorrelated with the size of the interface. Systematic docking of a large set of known protein–protein complexes offered an opportunity to evaluate the SOFTDOCK program and to define an optimum set of parameters in docking unknown complexes and studying molecular recognition.
Materials and methods |
---|
The algorithm for conducting a complete, exhaustive, six-dimensional search to match two sets of points with properties attached to each point was developed previously and described by Jiang and Kim (Jiang and Kim, 1991). Briefly, the rotational space is sampled uniformly with a given angle distance. The sampled rotations represented in the present study by polar angles are searched exhaustively. For each rotation applied, a fast translation search is performed, which is also exhaustive. This fast translation algorithm first calculates the difference vector between a pair of points, each from one of the two sets, in Cartesian coordinates and the matching score according to the properties is assigned to this pair of points. Then, the matching score is stored in a three-dimensional matrix, called the difference vector space or the translation vector space, according to the difference vector. Hence all pairs of points from the two sets with the same difference vector are accumulated in the same position in the difference vector space. A local maximum of the accumulated scores in the difference vector space corresponds to a possible good match and its position in the difference vector space gives the corresponding translation vector that should be applied for this good match.
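The accumulation step of the fast translation search can be illustrated with a minimal Python sketch. This is not the SOFTDOCK code: the function name `translation_search`, the `grid` spacing and the default pair score of 1.0 per pair are assumptions made for illustration, and the "local maximum" is simplified to the single highest-scoring bin.

```python
from collections import defaultdict

def translation_search(probe, target, grid=1.0, pair_score=lambda a, b: 1.0):
    """Accumulate pair scores in a difference-vector space.

    probe, target: lists of ((x, y, z), prop) tuples, where prop is any
    property attached to the point (hypothetical placeholder here).
    Returns (best_translation, best_score) for the highest-scoring bin.
    """
    acc = defaultdict(float)  # sparse stand-in for the 3D matrix
    for (px, py, pz), pprop in probe:
        for (tx, ty, tz), tprop in target:
            # Difference vector between the pair, binned onto the grid.
            key = (round((tx - px) / grid),
                   round((ty - py) / grid),
                   round((tz - pz) / grid))
            # All pairs sharing a difference vector add to the same bin.
            acc[key] += pair_score(pprop, tprop)
    best = max(acc, key=acc.get)
    return tuple(k * grid for k in best), acc[best]

# Toy example: the target is the probe shifted by (3, -2, 5).
probe = [((0, 0, 0), None), ((1, 0, 0), None), ((0, 2, 0), None), ((1, 1, 3), None)]
shift = (3, -2, 5)
target = [((x + shift[0], y + shift[1], z + shift[2]), p) for (x, y, z), p in probe]
best_translation, best_score = translation_search(probe, target)
```

In this toy case the four corresponding pairs fall into the same bin, so the recovered translation equals the applied shift. A dictionary is used in place of a dense three-dimensional matrix only to keep the sketch short.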
Four function forms for calculating the local complementarity, i.e. the matching score, are shown in Figure 1. The effect of these functions should be similar, as can be seen from their forms. The difference is that the step function used in the previous study cannot be parallelized, whereas the Gaussian function used in the present study can, when such optimization is available. The other functions are also amenable to parallelization. The score function is the sum of the local complementarity scores minus the weight for volume overlap times the volume overlap.
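The structure of the score function can be sketched as follows. The exact function forms are those of Figure 1 and are not reproduced here; the step and Gaussian forms below, expressed in terms of the angular deviation between a pair of surface normals, as well as the names `step_score`, `gaussian_score` and `total_score`, are assumptions for illustration only.

```python
import math

def step_score(angle_deg, cone_deg=40.0):
    """Assumed step form: full score inside the cone angle cutoff, none outside."""
    return 1.0 if angle_deg <= cone_deg else 0.0

def gaussian_score(angle_deg, cone_deg=40.0):
    """Assumed Gaussian form: smooth decay with the cone angle as the width."""
    return math.exp(-(angle_deg / cone_deg) ** 2)

def total_score(angles_deg, overlap, weight, local=gaussian_score, cone_deg=40.0):
    """Sum of local complementarity scores minus weight times volume overlap."""
    return sum(local(a, cone_deg) for a in angles_deg) - weight * overlap

# Two perfectly matched surface pairs with a volume overlap of 3 and weight 0.5.
example = total_score([0.0, 0.0], overlap=3, weight=0.5, local=step_score)
```

A smooth function such as the Gaussian avoids the branch in the step function, which is one plausible reason why it is easier to vectorize.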
We used the signal-to-noise ratio to evaluate the effect of the selected parameters. The signal-to-noise ratio of a crystal complex was calculated in the following way. All possible rotations within 30° were searched, that is, the polar angle phi ranged from 0 to 360°, psi from 0 to 180° and chi from 0 to 30° with 10° steps. In the translation search, the whole surfaces of the probe and the target molecules were included. After the search, the solutions close to the crystal complex were extracted and the maximum score was found and its signal-to-noise ratio was calculated, as described above. For reference, the number of raw solutions generated for each complex tested is about 20 000. The number of correct solutions found may vary depending on the docking parameters and the individual complexes.
We used the list of crystal complexes as given by Lo Conte et al. (Lo Conte et al., 1999) to retrieve the corresponding atomic coordinates when they were available in the PDB (www.rcsb.org) (Bernstein et al., 1977). Each binary complex was separated into a probe molecule and a target molecule for docking according to the biological interacting pairs.
The SOFTDOCK package consists of several programs. These programs can be run in batch mode with C shell and awk scripts in a Unix or Linux environment. For further understanding of docking rules, the atomic details of the complex interfaces were examined with small C and Perl programs. The numbers of buried main-chain atoms and total buried atoms of the interface were summed. %M (percentage of main-chain atoms among total atoms of interfaces) was calculated in the following way. The atoms of both component molecules were extracted if the distance between the probe atom and the target atom was <4 Å. Among these atoms, %M was defined as the percentage of main-chain atoms among total extracted atoms.
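The %M calculation described above can be sketched in a few lines of Python. This is an illustration rather than the original C or Perl code; the function name `percent_main_chain`, the input layout and the choice of N, CA, C and O as the main-chain atom names are assumptions.

```python
import math

# Assumed main-chain (backbone) atom names in PDB nomenclature.
MAIN_CHAIN = {"N", "CA", "C", "O"}

def percent_main_chain(probe_atoms, target_atoms, cutoff=4.0):
    """Return %M for two lists of (atom_name, (x, y, z)) tuples.

    An atom is extracted if it lies closer than `cutoff` (in Å) to any
    atom of the other molecule; %M is the percentage of main-chain
    atoms among all extracted atoms.
    """
    probe_hit, target_hit = set(), set()
    for i, (_, p) in enumerate(probe_atoms):
        for j, (_, t) in enumerate(target_atoms):
            if math.dist(p, t) < cutoff:
                probe_hit.add(i)
                target_hit.add(j)
    names = ([probe_atoms[i][0] for i in probe_hit]
             + [target_atoms[j][0] for j in target_hit])
    if not names:
        return 0.0  # no interface atoms within the cutoff
    return 100.0 * sum(n in MAIN_CHAIN for n in names) / len(names)

# Toy interface: two probe atoms contact two target atoms; one probe atom is far away.
probe = [("CA", (0.0, 0.0, 0.0)), ("CB", (1.0, 0.0, 0.0)), ("N", (50.0, 0.0, 0.0))]
target = [("O", (0.0, 3.0, 0.0)), ("CG", (1.0, 3.0, 0.0))]
pm = percent_main_chain(probe, target)
```

The all-pairs loop is quadratic in the number of atoms; for whole complexes a spatial grid or neighbor list would be the usual optimization.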
Results |
---|
We first used 10 crystal complexes to test the parameters systematically, namely 1AVW, 1BRS, 1IGC, 1NCA, 1TX4, 1VFB, 1YCS, 1YDR, 2TRC and 4HTC. They represent a variety of complexes with different biological functions. The results for testing the effect of the cube size, cone angle and scoring function type are shown in Table I. It can be seen that the signal-to-noise ratio (SNR), which is a measure of the interface complementarity, increases when the cube size and the cone angle decrease. When the cube size is large, the surface areas and normals are averaged within each cube and therefore the molecular surface is smoothed and the detailed features are removed. Hence it is not surprising that the calculated interface complementarity decreases.
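The smoothing effect of a larger cube size can be seen in a small sketch of the digitization step. The function name `digitize_surface`, the input layout and the use of an area-weighted mean normal are assumptions for illustration, not the SOFTDOCK implementation.

```python
import math
from collections import defaultdict

def digitize_surface(dots, cube_size):
    """Bin surface dots into cubes.

    dots: list of ((x, y, z), (nx, ny, nz), area) with area > 0.
    Returns {cube_index: (total_area, area_weighted_mean_normal)}.
    """
    cubes = defaultdict(lambda: [0.0, [0.0, 0.0, 0.0]])
    for pos, normal, area in dots:
        key = tuple(math.floor(c / cube_size) for c in pos)
        cell = cubes[key]
        cell[0] += area                      # accumulate surface area
        for k in range(3):
            cell[1][k] += area * normal[k]   # accumulate weighted normal
    return {key: (area, tuple(n / area for n in nsum))
            for key, (area, nsum) in cubes.items()}

# Two neighbouring dots with perpendicular normals.
dots = [((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 1.0),
        ((1.5, 0.5, 0.5), (0.0, 1.0, 0.0), 1.0)]
fine = digitize_surface(dots, cube_size=1.0)    # each dot keeps its own cube
coarse = digitize_surface(dots, cube_size=2.0)  # both dots share one cube
```

With the coarse cube the two normals are merged into a single averaged direction of length less than one, which is the loss of detail referred to above.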
Next, we tested the effect of the different functions for calculating the interface complementarity. Four representative functions are shown in Figure 1. Function No. 1, the step function, was used in the previous study (Jiang and Kim, 1991). One of the motivations for changing the function form was to be able to parallelize the algorithm so that the SOFTDOCK program could run faster when the vectorization optimization is available for multi-processor machines. For a cube size of 2.4 Å and a volume cutoff of 1500, on a Pentium III 450 MHz 128 MB Linux system, a typical run for one of the 71 complexes takes about 5 min. The corresponding results are shown in Table II. The SNRs obtained for the four functions are similar without significant variation. This is understandable because these four functions are similar to each other in their forms and the Gaussian function (No. 4) is most similar to the original step function (No. 1). The Gaussian function was used in almost all of our tests.
Seventy-one protein–protein complexes were selected from Lo Conte et al. (Lo Conte et al., 1999), namely 1A2K, 1ACB, 1AGR, 1AK4, 1AIP, 1AO7, 1ATN, 1AVW, 1BRS, 1BTH, 1CBW, 1CHO, 1CSE, 1DAN, 1DHK, 1DFJ, 1DVF, 1DKG, 1EBP, 1EFN, 1EFU, 1FBI, 1FC2, 1FIN, 1FLE, 1FSS, 1GG2, 1GLA, 1GOT, 1GUA, 1HIA, 1HWG, 1IAI, 1IGC, 1JHL, 1KB5, 1MCT, 1MEL, 1MKW, 1MLC, 1NCA, 1NFD, 1NMB, 1NSN, 1OSP, 1PPF, 1SEB, 1STF, 1TBQ, 1TCO, 1TGS, 1TOC, 1TX4, 1UDI, 1VFB, 1YCS, 1YDR, 2BTF, 2JEL, 2KAI, 2PCC, 2PTC, 2SIC, 2SNI, 2TRC, 3HFL, 3HFM, 3SGB, 3TPI, 4CPA and 4HTC. Among them, the probe and target molecules for 1AO7, 1TX4 and 2PTC are ABC:DE (molecule IDs), B:A and I:E, respectively. Docking was performed for each complex using a cube size of 2.4 Å, a cone angle of 40°, the Gaussian function for calculating the interface complementarity and a volume overlap cutoff of 1500. As summarized above, this parameter set tends to implement enough 'softness' and works on most complexes tested. The volume overlap cutoff of 1500 was chosen by trial, and is usually about the minimum size of the interface area of a generic complex. A lower value of volume overlap cutoff may lead to the exclusion of correct solutions.
Among the 71 complexes, 57 had correct solutions and their corresponding SNRs were calculated. The correlation coefficient between SNR and the ratio of the interface area to the total number of atoms of the complex is 0.36, which means that there is no significant correlation. The same is true for SNR and the interface area, the correlation coefficient for which is 0.34. The interface area represents the absolute size of the interface. The ratio of the interface area to the total number of atoms of the complex represents the relative size of the interface to the size of the complex. Here we used the total number of atoms to approximate the total surface area of a complex. The SNR can be thought of as a measure of the degree of interface complementarity. Hence the degree of the interface complementarity can vary from one complex to another independent of the size of the interface. In other words, the SNR reflects more the shape characteristics of the interface. It could be thought of as a measure of the tightness of the binding of a complex.
Of the 71 complexes, 14 did not give correct solutions with the chosen parameter set (2.4, 40, 1500, Gaussian function). These 14 complexes were 1MKW, 1OSP, 1NCA, 1NMB, 1IAI, 1NFD, 1DFJ, 1GLA, 2PCC, 1AIP, 1AK4, 2BTF, 1SEB and 1YCS. We found that these 14 complexes had lower ratios of the interface area to the total number of atoms than the 57 complexes whose correct solutions could be found. A histogram of these two sets of complexes as a function of the ratio mentioned above is shown in Figure 2. The histogram clearly shows two profiles with well-separated peaks for the two sets. Since we used the whole molecular surfaces of the probe and the target in our docking, the background noise will increase as the relative size of the interface to the whole complex decreases. To verify this explanation, we chose another set of parameters (1.6, 20, 1500, Gaussian function) with more stringent criteria for calculating the complementarity to reduce the background noise and enhance the true signal. We found the correct solutions for 10 of the 14 complexes, i.e. missing only four complexes, namely 1MKW, 1GLA, 1DKG and 2PCC (data not shown).
It is worth noting that complex 2PCC (cytochrome c peroxidase–cytochrome c) is an electron transport system and the molecular recognition between the two component molecules is dominated by electrostatic interactions. Other examples of molecular recognition by electrostatic interactions have been found by Botti et al. (Botti et al., 1998). For this type of molecular recognition, the electrostatic complementarity will play the major role and the shape complementarity an auxiliary role. This is consistent with the failure of the current implementation of SOFTDOCK, which considers shape complementarity alone, to dock this complex. In the future, the calculated electrostatic field or potential of a macromolecule could be represented in grid space and used in SOFTDOCK.
Discussion |
---|
Second, the large cube size smears the molecular surface and removes the detailed features. However, the docking results showed that many complexes could be docked and hence recognized at a cube size of 1.46 Å. Similar observations have been suggested previously and applied to low-resolution docking (Vakser, 1996).
Third, in our original design of the soft docking algorithm, the softness is implemented through varying the cube size, the cone angle and the volume overlap cutoff. Increasing these three parameters will increase the softness of docking. However, the increased softness will lead to increased background noise. For some complexes, the recognition can occur at the 'soft' level whereas for other complexes the recognition occurs at the 'hard' level. In the latter case, the complementarity at the detailed level is essential for recognition, so conformational rearrangement will be necessary to determine whether a molecular complex can be formed. The results presented here show that the majority of the molecular complexes known to date could be found with the strategy of soft docking. It will be interesting to compare our results with those of other docking algorithms now that more protein–protein complexes are available.
The results also delineate what types of molecular complexes will be hard to dock with the current implementation of SOFTDOCK. Electrostatics are important for molecular recognition. In some cases they dominate the complementarity in molecular recognition whereas the shape complementarity takes a lesser role. On the other hand, in many cases, the shape complementarity is sufficient to achieve molecular recognition; here 57 of the 71 crystal complexes tested were docked successfully by considering the shape complementarity alone.
It is true that one of the main conclusions of our present work, i.e. that electrostatics are important for molecular recognition, has already been accepted by the scientific community. However, it is worth noting that most supporting evidence comes from the positive control, that is, the successful docking of a known complex by considering the distribution of the electrostatic potential only (we quoted only a couple of examples). In our case, we support this conclusion by a negative control, that is, a known complex for which docking failed using only the shape complementarity was shown to be a complex in which electrostatics play the dominant role. Therefore, our results support the same conclusion from a different point of view.
From our current understanding of molecular recognition, our results suggest that there should be a diversity of mechanisms involved in molecular recognition. For example, a diversity of molecular interfaces and conformational changes during complex formation have been observed (Lo Conte et al., 1999). Molecular recognition depends on the principle of complementarity, be it shape or electrostatics. However, using different parameters, we find that molecular recognition of different complexes requires different degrees of complementarity. We call this degree of complementarity the 'softness' to reflect the special properties of our docking algorithm. It is biologically relevant to understand this type of variation in molecular recognition. For example, in enzyme–inhibitor, antibody–antigen and signaling complexes, the 'softness' is different for different biological functions. Admittedly, the 'softness' is somewhat related to our docking algorithm and may not have a generalized meaning in molecular recognition. For example, for a peptide fragment in an MHC binding site, the docking could be very 'soft' as it involves large conformational changes but the binding is very tight. The peptide–MHC complex may be a special case in molecular recognition. We think that it is still too early to say that the effect of the docking parameters is irrelevant to biological understanding, but instead it is helpful to know the 'softness' of molecular recognition for the biological complex that one is studying, at least for selecting the docking parameters.
The accuracy of the docking of two proteins is usually estimated by the root mean square deviation (r.m.s.d.) between the docked complex and the known crystal complex. The r.m.s.d. of the Cα atoms for the contact residues was not calculated for the best correct solution for each complex tested in the present work. However, it could be calculated as was done earlier (Jiang and Kim, 1991) and the resulting values should be comparable to those from other docking methods. We have not provided these values because we consider that we established the validity of our method in the previous study. Another reason for not calculating the r.m.s.d. in our present docking of 71 complexes is that we have not optimized the final docking complexes using energy minimization, so it is not a fair comparison with other studies such as the ab initio docking of a full-atom model of lysozyme to an antibody with 1.6 Å accuracy (Totrov and Abagyan, 1994).
Owing to the simplicity and versatility of our representation, several new features could be implemented in SOFTDOCK to encompass a variety of molecular recognition mechanisms and conformational changes. For example, we could easily include electrostatic potential (Knegtel et al., 1997; Lorber and Shoichet, 1998) and contact pair-wise potential (Miyazawa and Jernigan, 1985) in the representation and apply the ensemble docking method, which should bring about a significant improvement in the applicability of SOFTDOCK. Furthermore, we should extend the concept of the complementarity in the light of the current data on molecular recognition and complexes and explore other methods of calculating the complementarity and predicting the molecular recognition.
Finally, we have shown that for docking known crystal complexes, the correct solution has the highest score. When the binding sites on the probe and target (ligand and receptor) molecules are known, the correct solution is also the top solution. When docking structures of unbound molecules, our docking method also works but requires a filtering and clustering procedure. A paper describing the augmented procedure and the related results has been submitted elsewhere. Furthermore, it will be very interesting to test SOFTDOCK on structures from homology modeling. We think that it might work because, first, the binding sites are known, second, the correct solutions could be refined with some restraints on key interactions and third, many candidate solutions could be evaluated at the atomic level using molecular dynamics simulation. Our general goal is to develop our method into an automated procedure where a scientist only has to evaluate a few top solutions for their biological relevance. For very difficult cases, we try to limit the number of solutions for evaluation to around 50. We think that SOFTDOCK is close to this goal.
Notes |
---|
Acknowledgments |
---|
References |
---|
Bernstein,F.C., Koetzle,T.F., Williams,J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Bohm,H.-J. (1994) J. Comput.-Aided Mol. Des., 8, 623–632.
Botti,S.A., Felder,C.E., Sussman,J.L. and Silman,I. (1998) Protein Eng., 11, 415–420.
Chothia,C. (1997) In McCrae,M.A., Saunders,J.R., Smyth,C.J. and Stow,N.D. (eds), Molecular Aspects of Host–Pathogen Interaction. Cambridge University Press, Cambridge.
Chothia,C. and Janin,J. (1975) Nature, 256, 705–708.
Connolly,M.L. (1981) QCPE Bull., 1, 18 (MS, QCPE 429).
Connolly,M.L. (1983) Science, 221, 709–713.
Given,J.A. and Gilson,M.K. (1998) Proteins, 33, 475–495.
Goldman,B.B. and Wipke,W.T. (2000) Proteins, 38, 79–94.
Helmer-Citterich,M. and Tramontano,A. (1994) J. Mol. Biol., 235, 1021–1031.
Hou,T., Wang,J., Chen,L. and Xu,X. (1999) Protein Eng., 8, 639–647.
Janin,J. (1995) Biochimie, 77, 497–505.
Janin,J. (1996) Prog. Biophys. Mol. Biol., 64, 145–165.
Janin,J. and Chothia,C. (1990) J. Biol. Chem., 265, 16027–16030.
Jiang,F. and Kim,S.-H. (1991) J. Mol. Biol., 219, 79–102.
Jones,S. and Thornton,J.M. (1995) Prog. Biophys. Mol. Biol., 63, 131–165.
Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.
Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272, 121–132.
Jones,G., Willett,P. and Glen,R.C. (1995) J. Comput.-Aided Mol. Des., 9, 532–549.
Katchalski-Katzir,E., Shariv,I., Eisenstein,M., Friesem,A.A., Aflalo,C. and Vakser,I.A. (1992) Proc. Natl Acad. Sci. USA, 89, 2195–2199.
Knegtel,R.M.A., Kuntz,I.D. and Oshiro,C.M. (1997) J. Mol. Biol., 266, 424–440.
Kuntz,I.D., Blaney,J.M., Oatley,S.J., Langridge,R. and Ferrin,T.E. (1982) J. Mol. Biol., 161, 269–288.
Laskowski,R.A., Luscombe,N.M., Swindells,M.B. and Thornton,J.M. (1996) Protein Sci., 5, 2438–2452.
Lawrence,M.C. and Davis,P.C. (1992) Proteins, 12, 341.
Lee,R.H. and Rose,G.D. (1985) Biopolymers, 24, 1613–1627.
Lin,S.L., Nussinov,R., Fischer,D. and Wolfson,H.J. (1994) Proteins, 18, 94–101.
Lo Conte,L., Chothia,C. and Janin,J. (1999) J. Mol. Biol., 285, 2177–2198.
Lorber,D.M. and Shoichet,B.K. (1998) Protein Sci., 7, 938–950.
Masek,B.B., Merchant,A. and Matthews,J.B. (1993) Proteins, 17, 193–202.
Meng,E.C., Shoichet,B. and Kuntz,I.D. (1992) J. Comput. Chem., 13, 505–524.
Miyazawa,S. and Jernigan,R.L. (1985) Macromolecules, 18, 534–552.
Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1994a) Biopolymers, 34, 933–940.
Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1994b) Protein Eng., 7, 39–46.
Norel,R., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1995) J. Mol. Biol., 252, 263–273.
Palma,P.N., Krippahl,L., Wampler,J.E. and Moura,J.J.G. (2000) Proteins, 39, 372–384.
Perkins,T.D.H., Mills,J.E.J. and Dean,P.M. (1995) J. Comput.-Aided Mol. Des., 9, 479–490.
Rarey,M., Wefing,S. and Lengauer,T. (1996) J. Comput.-Aided Mol. Des., 10, 41–54.
Shoichet,B. and Kuntz,I.D. (1991) J. Mol. Biol., 221, 327–346.
Shoichet,B. and Kuntz,I.D. (1993) Protein Eng., 6, 723–732.
Shoichet,B., Bodian,D.L. and Kuntz,I.D. (1992) J. Comput. Chem., 13, 380–397.
Sobolev,V., Wade,R.C., Vriend,G. and Edelman,M. (1996) Proteins, 25, 120–129.
Stahl,M. and Bohm,H.-J. (1998) J. Mol. Graphics Modelling, 16, 121–132.
Totrov,M. and Abagyan,R. (1994) Nature Struct. Biol., 1, 259–263.
Tsai,J., Lin,S.L., Wolfson,H. and Nussinov,R. (1996) J. Mol. Biol., 260, 604–620.
Vakser,I.A. (1996) Protein Eng., 9, 37–41.
Vakser,I.A., Matar,O.G. and Lam,C.F. (1999) Proc. Natl Acad. Sci. USA, 96, 8477–8482.
Received March 21, 2001; revised December 14, 2001; accepted January 4, 2002.