An analysis of conformational changes on protein–protein association: implications for predictive docking

Matthew J. Betts and Michael J.E. Sternberg1

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, WC2A 3PX, UK


    Abstract
Conformational changes on complex formation have been measured for 39 pairs of structures of complexed proteins and their unbound equivalents, averaged over interface and non-interface regions and for individual residues. We evaluate their significance by comparison with the differences seen in 12 pairs of independently solved structures of identical proteins, and find that just over half of the pairs show some substantial overall movement. Movements involve main chains as well as side chains. Large changes in the interface are closely involved with complex formation, whereas those of exposed non-interface residues are caused by flexibility and disorder. Interface movements in enzymes are similar in extent to those of inhibitors. All eight of the complexes (six enzyme–inhibitor and two antibody–antigen) that have structures of both components available in an unbound form show some significant interface movement. Nevertheless, predictive docking is successful even when some of the largest changes occur, although the situation may be different in systems other than the enzyme–inhibitor complexes that dominate this study. Thus the general model is induced fit but, because many systems undergo only limited conformational change, recognition can be treated as lock and key to a first approximation.

Keywords: protein conformational change/protein–protein docking


    Introduction
The formation of a specific complex between different proteins plays a major role in many biological processes. X-ray crystallographic and nuclear magnetic resonance studies have revealed the structures of more than 100 heteroprotein complexes, and general principles of molecular recognition have been established (e.g. Janin and Chothia, 1990; Jones and Thornton, 1996). Here, rather than consider the static structural aspects of heteroprotein recognition, we present a general analysis of the conformational changes that accompany complex formation. The implications of our results for predictive docking of protein complexes are then discussed.

This paper considers only heteroprotein complexes formed from folded proteins that are stable in isolation. A recent analysis of such complexes (Jones and Thornton, 1996) considered 10 enzyme–inhibitor, six antibody–antigen and five other types of complex. Despite a variety of shapes, the static features of the complex interface tend to be uniform. The interface, which rarely has cavities, buries 980 ± 580 Å² of accessible surface area and has 1.13 ± 0.47 hydrogen bonds per 100 Å² of buried accessible surface area. Janin and Chothia (1990) characterized the interface as formed from around 55% non-polar, 25% polar and 20% charged residues.

When the structures of a complex and of its components in isolation have been determined, the authors usually report the conformational change on association (e.g. Hecht et al., 1991, 1992; Bhat et al., 1994; Chantalat et al., 1995). On the limited data sets available at the time, Huber (1979), Janin and Wodak (1983) and Bennett and Huber (1984) described general features of conformational changes in proteins. More recently, Stanfield and Wilson (1994) have reviewed conformational changes in antibody–antigen association, and a series of papers by Lesk and Chothia (1988), Gerstein and Chothia (1991) and Gerstein et al. (1994) has analysed the nature of domain movements in proteins. However, these studies are dominated by conformational changes induced by small molecules binding to proteins. Our aim is to quantify the extent of conformational change in a single type of recognition: the formation of heteroprotein complexes.

The extent of conformational change on protein–protein association has implications for the development of algorithms to dock proteins starting from the coordinates of the unbound components. In general, docking algorithms (for reviews see Janin, 1995; Shoichet and Kuntz, 1996; Sternberg et al., 1998) employ the rigid-body approximation and initially search for favourable associations of the unbound components. The conformational change on association is treated as a subsequent refinement step. The results of our analysis will provide a framework to guide the application and development of these algorithms.


    Materials and methods
All of the structures used were solved by X-ray crystallography, have been refined and have a resolution of 2.8 Å or better. When more than one structure that satisfied these criteria was available for a particular protein or complex, the one with the best resolution was chosen. If more than one structure had this resolution, the most recently solved was used.
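The selection rule above can be sketched in a few lines of Python. This is a minimal sketch, not the code used for this study: the `Entry` record and its fields are hypothetical, and the deposition date is assumed to stand in for "most recently solved".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical record for one PDB entry."""
    pdb_id: str
    method: str        # experimental method, e.g. "X-RAY"
    refined: bool
    resolution: float  # Å; smaller is better
    deposited: date    # assumed proxy for "most recently solved"


def choose_structure(entries: List[Entry], max_resolution: float = 2.8) -> Optional[Entry]:
    """Return the preferred entry for one protein or complex, or None if none qualify."""
    eligible = [
        e for e in entries
        if e.method == "X-RAY" and e.refined and e.resolution <= max_resolution
    ]
    if not eligible:
        return None
    # Best (smallest) resolution wins; among equal resolutions, the latest date wins.
    return min(eligible, key=lambda e: (e.resolution, -e.deposited.toordinal()))
```

Sorting on the tuple (resolution, negated date) applies both tie-breaking rules in a single pass.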

Residues identified in the relevant paper or PDB file as having poor electron density were excluded from calculations of conformational change, as were those residues containing one or more atoms with a B-factor greater than or equal to 50 Å². The conformation of these residues is expected to differ more than that of others because of uncertainty in their position, or high mobility.
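A minimal sketch of the residue filter follows. It assumes each residue is given as a dict with hypothetical keys for the density flag and the per-atom B-factors. The helper that requires a residue to pass the filter in both structures before it is compared reflects our assumption; the text above does not state this rule explicitly.

```python
def usable_residues(residues, b_cutoff=50.0):
    """Ids of residues kept for conformational-change calculations.

    Each residue is a dict with hypothetical keys:
      "id"            residue identifier
      "poor_density"  True if reported as poorly defined in the paper or PDB file
      "b_factors"     per-atom B-factors in Å^2
    """
    kept = []
    for res in residues:
        if res["poor_density"]:
            continue
        # A single atom with B >= 50 Å^2 excludes the whole residue.
        if any(b >= b_cutoff for b in res["b_factors"]):
            continue
        kept.append(res["id"])
    return kept


def comparable_residues(residues_a, residues_b, b_cutoff=50.0):
    """Assumption: a residue is compared only if it passes the filter in both structures."""
    ok_a = set(usable_residues(residues_a, b_cutoff))
    ok_b = set(usable_residues(residues_b, b_cutoff))
    return sorted(ok_a & ok_b)
```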

Residues were defined as exposed if their total relative side-chain surface area, or total relative main-chain surface area in the case of glycine, was greater than 15%. All others were defined as buried. Surface area was calculated by the algorithm of Lee and Richards (1971), implemented by Suhail Islam (personal communication), with a probe radius of 1.4 Å. 'Relative areas' are relative to that of the particular residue in its extended conformation (Miller et al., 1987).
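The exposed/buried rule can be written as follows. This is a sketch under stated assumptions: the accessible areas are taken as already computed, the reference areas for the extended conformation are supplied by the caller (the values of Miller et al. (1987) are not reproduced here), and the function name and arguments are hypothetical.

```python
def classify_exposure(res_name, side_chain_area, main_chain_area, reference, threshold=15.0):
    """Return "exposed" or "buried" for one residue.

    Areas are accessible surface areas in Å^2 (e.g. from a Lee and Richards calculation
    with a 1.4 Å probe). `reference` maps a residue name to its (side-chain, main-chain)
    areas in an extended conformation; these values must be supplied by the caller.
    """
    ref_side, ref_main = reference[res_name]
    if res_name == "GLY":
        # Glycine has no side chain, so its main-chain area is used instead.
        relative = 100.0 * main_chain_area / ref_main
    else:
        relative = 100.0 * side_chain_area / ref_side
    return "exposed" if relative > threshold else "buried"
```

Note that a residue at exactly 15% is classed as buried, matching "greater than 15%" above.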

Independently solved structures of identical proteins

To obtain a value for the amount of structural change that can be expected from experimental differences in the determination of crystal structures, pairs of independently solved crystal structures of identical proteins were investigated. A similar analysis has been performed by another group (Flores et al., 1993).

From the April 1996 release of the Structural Classification of Proteins (SCOP) database (Murzin et al., 1995), we searched for sets of non-complexed structures with 100% identical sequence, no non-water heteroatoms and no insertions or deletions. When more than one set was available for the same SCOP classification, sets of native structures were chosen in preference to sets of mutants. If any of these sets contained more than two structures, the two structures with the best resolution were used. If there were still more than two structures in any set, the two most recently solved structures were chosen.
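The choice of one control pair per SCOP classification might look like the following. It is an illustrative sketch: the `Structure` record is hypothetical, the input sets are assumed to be pre-filtered for identical sequence, absence of non-water heteroatoms and absence of insertions or deletions, and the deposition date is assumed to stand in for "most recently solved".

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Structure:
    """Hypothetical record; fields chosen for this sketch."""
    pdb_id: str
    resolution: float  # Å
    deposited: date
    is_mutant: bool


def pick_control_pair(candidate_sets):
    """Choose one pair of independently solved structures for a SCOP classification.

    `candidate_sets` is a list of lists of Structure, each already filtered as
    described in the text. Returns a tuple of two PDB ids, or None.
    """
    usable = [s for s in candidate_sets if len(s) >= 2]
    if not usable:
        return None
    # Prefer a set of native structures over a set of mutants.
    natives = [s for s in usable if not any(x.is_mutant for x in s)]
    chosen = (natives or usable)[0]
    # Best resolution first, then the most recently deposited to break ties.
    ranked = sorted(chosen, key=lambda x: (x.resolution, -x.deposited.toordinal()))
    return ranked[0].pdb_id, ranked[1].pdb_id
```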

Twelve pairs were found (Table I). Members of each pair were solved in the same space group, except turkey lysozyme (PDB codes 135l and 2lz2). Refinement procedures were not the same for members of each pair, meaning that any different systematic errors caused by the different procedures will show up in this analysis. These differences are justified in the context of the comparisons made with pairs of complexed and unbound structures, where the space groups and refinement methods often differ.


Table I. Pairs of independently solved structures of identical proteins
 
The members of each pair were superposed by least-squares fitting of all pairs of Cα atoms (see below for algorithm details).

Complexed and unbound structures

In the April 1996 release of the PDB, 92 different protein–protein complexes that satisfied the initial criteria were found. For each component of these complexes, classifications from the April 1996 release of SCOP were used to identify structures of unbound forms with identical classifications. This gave a set of 31 complexes with one or both of their components available in an unbound form (Table II).


Table II. Structures of proteins in complexed and unbound forms
 
Eighteen of the complexes are enzyme–inhibitors, seven are antibody–antigens, and the remaining six are of other types. One of these six is a methylamine dehydrogenase heterotetramer, H2L2, bound to two molecules of amicyanin. However, each amicyanin molecule is in contact with the H and L subunits of only one HL dimer (Chen et al., 1992), so it is justified to consider only the interactions between one of these dimers and one amicyanin. Eight of the complexes (six enzyme–inhibitors and two antibody–antigens) have the unbound structures of both components available. For the other 23, this was available for only one component.

Interface residues for each component of a complex were defined as those that have at least one atom within 4 Å of the other component. The unbound forms of the proteins were then superposed on the bound forms by least-squares fitting of the Cα atoms of non-interface residues.
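The 4 Å interface criterion can be sketched as a brute-force distance check. The data layout (residue id mapped to a list of atom coordinates) is a hypothetical choice for illustration; for whole proteins, a spatial grid or k-d tree would be much faster than comparing every pair of atoms.

```python
import math


def interface_residues(component, partner, cutoff=4.0):
    """Residues of `component` with at least one atom within `cutoff` Å of `partner`.

    Both arguments map residue id -> list of (x, y, z) atom coordinates.
    """
    partner_atoms = [xyz for atoms in partner.values() for xyz in atoms]
    hits = set()
    for res_id, atoms in component.items():
        # One close atom pair is enough to make the residue an interface residue.
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms):
            hits.add(res_id)
    return hits
```

Applying the function in both directions (component against partner, then partner against component) gives the interface residues of each side of the complex.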

Identical proteins in different complexes

We wished to investigate whether different bound forms of the same protein are more similar to each other than to the unbound form. If so, then where available they could be used as the starting structure in a docking simulation.

The set of bound and unbound proteins (Table II) was searched for cases where the same protein was present in different complexes as well as in an unbound form, using SCOP classifications to identify identical proteins. Five different proteins were found to have these data available (Table III), excluding lysozyme and neuraminidase. These two were excluded because their partners in the complexes are antibodies, which do not necessarily bind at the same site, so changes in the interface would not be expected to be common to all the complexes. Three of the five proteins are from the same family (eukaryotic proteases), and two of them are trypsins. It is therefore unreasonable to attempt to distinguish between the movements of the five proteins, and any conclusions drawn from the five as a whole must be treated cautiously, as they will be biased towards the eukaryotic protease family.


Table III. Structures of proteins available in several different complexes
 
Each structure of a particular protein was superposed on every other structure of that protein by least-squares fitting of Cα atoms.

Calculations of conformational change

Pairs of proteins were superposed on the atoms described above by the least-squares fitting algorithm of McLachlan (1979), implemented by Suhail Islam (personal communication), and by the 'roughfit' option of the Structural Alignment of Multiple Proteins (STAMP) program of Russell and Barton (1992). Conformational changes were then calculated from the resulting superpositions using programs written specifically for this work.
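The superpositions above used McLachlan's (1979) algorithm and STAMP. As a stand-in, the sketch below uses the SVD-based Kabsch method with NumPy, which solves the same least-squares rotation problem. It is not the implementation used here, and the function names are hypothetical.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares fit of `mobile` onto `target` (matched N x 3 coordinate arrays).

    Returns (rotation, translation) such that mobile @ rotation.T + translation
    is as close as possible to target in the least-squares sense.
    """
    mobile_centre = mobile.mean(axis=0)
    target_centre = target.mean(axis=0)
    # Cross-covariance of the centred coordinate sets.
    h = (mobile - mobile_centre).T @ (target - target_centre)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_centre - mobile_centre @ rotation.T
    return rotation, translation


def rmsd(a, b):
    """Root mean square deviation between matched N x 3 coordinate arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

For a complexed–unbound pair, the transform would be computed from the Cα atoms of non-interface residues only (for example `superpose(unbound_ca[fit], bound_ca[fit])`, where `fit` is a hypothetical boolean mask) and then applied to every atom of the unbound structure.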

Root mean square deviations were calculated over all atoms concerned. For side chains this is not the same as the average r.m.s. over all residues concerned, because different types of residues have different numbers of atoms in their side chains.
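The difference between pooling atoms and averaging residues shows up clearly in a toy calculation. In this sketch, each inner list holds per-atom deviations (Å) for one residue after superposition; the numbers in the example are invented for illustration.

```python
import math


def pooled_rmsd(residue_devs):
    """R.m.s. over all atoms: each atom counts once, so long side chains weigh more."""
    squares = [d * d for devs in residue_devs for d in devs]
    return math.sqrt(sum(squares) / len(squares))


def mean_residue_rmsd(residue_devs):
    """Mean of per-residue r.m.s. values: each residue counts once."""
    per_residue = [math.sqrt(sum(d * d for d in devs) / len(devs)) for devs in residue_devs]
    return sum(per_residue) / len(per_residue)


# Invented example: one side-chain atom that does not move (an Ala-like residue)
# and seven side-chain atoms that each move 2.0 Å (an Arg-like residue).
devs = [[0.0], [2.0] * 7]
print(round(pooled_rmsd(devs), 2), mean_residue_rmsd(devs))  # 1.87 1.0
```

The pooled value is higher because the longer side chain contributes seven of the eight atoms.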

Torsion angles that change minima are identified by looking for changes in their class. A torsion angle was considered to belong to a particular class if it was 60° or less from the minimum-energy position of that class (Janin et al., 1978). In this way a change of, for example, 10° is not counted if it does not cross an energy maximum (i.e. a conformation that involves steric clash), whereas one that does cross a maximum is counted. χ2 angles were only examined for change if the related χ1 angle did not change minima.
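The class assignment can be sketched as below. The sketch assumes, as is usual for staggered side-chain torsions, minima at +60°, 180° and -60°, so that the 60° windows tile the circle. Which class an angle lying exactly on a border (0° or ±120°) falls into is an arbitrary choice here, and the function names are hypothetical.

```python
def chi_class(angle):
    """Nominal minimum (60, 180 or -60 degrees) of the class containing `angle`.

    Assumes the three staggered minima +60, 180 and -60 degrees, each class spanning
    60 degrees either side of its minimum; the class borders at 0, +120 and -120
    degrees are the eclipsed energy maxima.
    """
    a = (angle + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if 0.0 <= a < 120.0:
        return 60
    if -120.0 <= a < 0.0:
        return -60
    return 180


def changes_minimum(angle_a, angle_b):
    """True if the torsion angle has moved into a different class."""
    return chi_class(angle_a) != chi_class(angle_b)


print(changes_minimum(55.0, 65.0))    # False: a 10° change within one class
print(changes_minimum(115.0, 125.0))  # True: a 10° change across the 120° maximum
```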

Certain residue types (Arg, Asp, Glu, Phe and Tyr) have symmetrical portions in their side chains, and others (Asn, Gln and His) can be considered symmetrical because some of their atom types are difficult to distinguish in the electron density. For example, a 180° rotation of the benzene ring of phenylalanine about χ2 gives two identical conformations. Differences of this type between all pairs of structures were corrected for, so that labelling differences in the PDB files do not show up as conformational changes in our calculations. Leucine is a special case: it has no such symmetry, but it has two different conformations, related by a rotation of 180° about χ2, that are difficult to distinguish in electron density maps (Janin et al., 1978). We therefore do not calculate χ2 torsion angles for leucines.
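One way to implement the labelling correction is sketched here: the side-chain r.m.s.d. is computed with the labels as given and again with symmetry-equivalent atoms swapped, and the smaller value is kept. The swap table uses standard PDB atom names, but the table itself represents our assumption about how such a correction can be done. Leucine is deliberately absent because its ambiguity is handled by skipping χ2.

```python
import math

# Pairs of atom names that may be swapped without changing the side chain (or whose
# assignment is ambiguous in the density). Standard PDB atom names; the table itself
# is an illustrative assumption.
SWAPS = {
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "ARG": [("NH1", "NH2")],
    "ASN": [("OD1", "ND2")],
    "GLN": [("OE1", "NE2")],
    "HIS": [("ND1", "CD2"), ("CE1", "NE2")],
}


def side_chain_rmsd(res_name, atoms_a, atoms_b):
    """Side-chain r.m.s.d. (Å) that ignores relabelling of symmetry-equivalent atoms.

    atoms_a and atoms_b map atom name -> (x, y, z) after superposition and must
    contain the same atom names.
    """
    def rmsd(rename):
        sq = [math.dist(xyz, atoms_b[rename.get(name, name)]) ** 2
              for name, xyz in atoms_a.items()]
        return math.sqrt(sum(sq) / len(sq))

    swapped = {}
    for x, y in SWAPS.get(res_name, []):
        swapped[x], swapped[y] = y, x
    return min(rmsd({}), rmsd(swapped))
```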


    Controls: differences between independently solved structures of identical proteins
Comparisons of independently solved structures of identical proteins give an indication of the differences in structure that can be expected from differences in their experimental determination. Twelve such pairs of crystal structures (Table I) were found in the Brookhaven Protein Data Bank (PDB).

In the rest of the paper, any conformational changes that have magnitudes equal to or smaller than the differences found here cannot be distinguished from differences in the experimental determination of structures. The word 'control' is used to refer to the appropriate value.

Overall measures

Several measures were used to analyse the overall conformational differences between the members of each pair: Cα root mean square deviation (r.m.s.d.), side-chain r.m.s.d., and the percentages of χ1 and χ2 angles that occupy different minima. These were calculated separately for exposed residues and for all residues (Table IV). The data have heavy-tailed, non-normal distributions, which make means and standard deviations inappropriate for comparison with the other data sets examined in this paper. Therefore a cut-off was chosen for each measure such that 95% of all the control pairs have values below it. With twelve pairs in total, the effect is to exclude one outlier (the largest value). These cut-offs are given in the last row of Table IV and summarized below (see table legends for details of implementation).
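The cut-off rule can be made concrete as follows. The exact rank used is not stated above, so this sketch assumes the k-th smallest value with k = floor(0.95 n), computed in integer arithmetic. For n = 12 this gives k = 11 and excludes only the largest value, as described.

```python
def control_cutoff(values, percent=95):
    """Cut-off used as a control limit for one measure.

    Assumption: the k-th smallest value with k = floor(percent * n / 100), using
    integer arithmetic to avoid floating-point rounding. For n = 12 this gives
    k = 11, so only the single largest value lies above the cut-off.
    """
    ordered = sorted(values)
    k = max(1, (percent * len(ordered)) // 100)
    return ordered[k - 1]
```

A rank-based cut-off of this kind is insensitive to how extreme the excluded outlier is, which is the point of avoiding means and standard deviations for heavy-tailed data.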


Table IV. Overall differences between independently solved structures of identical proteins
 
The values for all residues are useful for comparisons with studies by other groups. The values for exposed residues are particularly relevant to this work because the differences in the conformation of interface residues between bound and unbound structures can be compared with them.

The 95% cut-off for r.m.s. deviation of Cα atoms is 0.6 Å over exposed residues and 0.4 Å over all residues. A similar analysis (Flores et al., 1993) gives a higher Cα r.m.s. deviation over all residues, of 1.0 Å. This reflects both the differences between the two data sets and the fact that we ignore residues with poor electron density or B-factors of 50 Å² or more, whereas they do not; the conformation of such residues is expected to differ more because of uncertainty in their position or high mobility. The 95% cut-off for r.m.s. deviation of side-chain atoms is 1.7 Å over exposed residues and 1.6 Å over all residues.

Changes in side-chain torsion angles were also calculated for exposed residues and for all residues. For structure comparison, a particularly useful measure of torsion angle change is the percentage of side-chain torsion angles that occupy different minima (see Materials and methods). χ2 angles are only examined for change when their related χ1 angle does not change. The 95% cut-offs are 31% of χ1 angles and 23% of χ2 angles for exposed residues, and 24% and 21% for all residues.
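Given the class of each torsion angle in the two structures, the two percentages can be computed as in this sketch, in which χ2 is examined only when χ1 keeps its minimum. The tuple layout and the use of `None` for missing angles are assumptions made for illustration.

```python
def percent_changed(residues):
    """Percentages of chi1 and chi2 angles that change minima between two structures.

    Each residue is a tuple (chi1_a, chi1_b, chi2_a, chi2_b) of class labels, with
    None where the angle does not exist or is not calculated (e.g. chi2 of leucine).
    """
    def percent(n, total):
        return 100.0 * n / total if total else float("nan")

    chi1_total = chi1_changed = chi2_total = chi2_changed = 0
    for c1a, c1b, c2a, c2b in residues:
        if c1a is None or c1b is None:
            continue
        chi1_total += 1
        if c1a != c1b:
            chi1_changed += 1
            continue  # chi2 is only examined when chi1 keeps its minimum
        if c2a is None or c2b is None:
            continue
        chi2_total += 1
        if c2a != c2b:
            chi2_changed += 1
    return percent(chi1_changed, chi1_total), percent(chi2_changed, chi2_total)
```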

For all χ1 angles, 87.1% occupy the same minima, and for all χ2 angles (where χ1 does not change) this value is 90.1%. These compare well with the equivalent values calculated by Flores et al. (1993) (81.7% for χ1 angles and 86.7% for χ2 angles), though our results suggest that torsion angles are more conserved. For exposed residues, 83.1% of χ1 angles and 87.9% of χ2 angles (where χ1 does not change) occupy the same minima.

The two structures of transforming growth factor β (TGF-β) have already been compared in detail by Daopin and Davies (1994), and our results confirm theirs. They also present four different methods for estimating coordinate errors. Two of these use resolution and R-factor to estimate overall coordinate error (Luzzati, 1952; Srinivasan and Ramachandran, 1965), and the other two estimate local errors from temperature factors. One uses Cruickshank's equations (Cruickshank, 1949, 1954, 1967) and the other is an empirical method based on the observed relationship between the temperature factors and positional differences of a pair of trypsin structures (Chambers and Stroud, 1979). The values from these methods compare well with those from direct comparison of the two structures. However, as the authors point out, the first three methods cannot estimate systematic differences in the determination of structures; these can be found only by comparing independently solved structures. This caveat also applies to the work of Tickle et al. (1998), in which error estimates for two crystallin structures were calculated from full-matrix least-squares refinement. The empirical method is limited by being based on only one pair of structures.

Movements of individual residues

For each of the 20 commonly occurring amino acids, we have calculated the Cα displacements and side-chain r.m.s.d.'s of every exposed residue of that type. The results are again given as a '95% cut-off' (Table V), because the data have heavy-tailed non-normal distributions that make means and standard deviations inappropriate. These 95% cut-offs include most residues, but exclude large outliers caused by N- or C-terminal residues and those caused by residues adjacent to ones poorly defined in the electron density (the poorly defined ones themselves are excluded from the calculations; see Materials and methods).


Table V. Differences between independently solved structures of identical proteins, by residue type (exposed residues only)
 
As expected, Cα displacements are largely unaffected by residue type: the cut-off is 0.5 Å for all types except glycine, for which it is 1.0 Å. The larger value for glycine is reasonable because, lacking a side chain, its backbone is less sterically hindered and therefore more flexible. The values of side-chain r.m.s.d. are also sensible. They range from 0.5 Å for small residues such as alanine and inflexible residues such as phenylalanine, up to 4.5 Å for arginine, which has a long and potentially flexible side chain.


    Differences between complexed and unbound structures
Overall measures

The Cα and side-chain r.m.s.d.'s were analysed for all the pairs of complexed and unbound structures listed in Table II. These calculations were performed separately for interface residues and for exposed non-interface residues. The results (Figure 1) show that, in many complexes, conformational change is no higher than the differences arising from experimental error: for each of the measures, more than half of the pairs have values that are equal to or below the relevant control limit. Also, nearly half of all the pairs (19 out of 39) do not move more than the controls by any of these measures. Values higher than the control limits are mostly caused by large movements of a few residues, which are discussed later.



Fig. 1. R.m.s.d.'s between complexed proteins and unbound equivalents. The dotted lines show the values expected from experimental differences in the determination of the structures, calculated from exposed residues of independently solved structures of identical proteins (Table IV). Pairs of complexed and unbound proteins are identified by the PDB code of the complexed protein, followed by the chain identifier(s) of the relevant chain(s). The numbers above each bar are the number of residues that have a Cα displacement or side-chain r.m.s.d. (as appropriate) that is above the control for that amino acid type (Table V). (a) Cα r.m.s.d.'s for interface residues. (b) Side-chain r.m.s.d.'s for interface residues. (c) Cα r.m.s.d.'s for exposed non-interface residues. (d) Side-chain r.m.s.d.'s for exposed non-interface residues.

 
It would be useful to know whether side-chain movements are more substantial than those of main chains, as this would provide additional justification for docking procedures that simulate flexibility only in the side chains (Weng et al., 1996; Jackson et al., 1998). The controls show that this is the case (Figure 2): their side-chain r.m.s.d.'s exceed their Cα r.m.s.d.'s. This is reasonable because more atoms contribute to side-chain r.m.s.d., and side chains are less constrained by local interactions. Figure 2 also shows that the relationship between side-chain r.m.s.d. and Cα r.m.s.d. is similar in the interfaces of the complexed–unbound pairs.



Fig. 2. The relationship between side-chain and Cα r.m.s.d. x, exposed residues of control systems; o, interfaces of complexed–unbound pairs. The solid lines show the control values (Table IV). The dotted line is for y = x, displayed to clarify the differences between the measures.

 
Figure 3 shows the percentages of side-chain torsion angles of interface residues and of exposed non-interface residues that change minima. The results confirm the general extent of movement shown by the r.m.s.d. calculations: approximately half of all the pairs of structures (20 out of 39) have values for all these measures that are equal to or below the relevant control limits. In addition, Figure 3a and b show that changes in the interface occur more frequently at χ2 than at χ1.



Fig. 3. Percentages of residues whose side-chain torsion angles change minima. The dotted lines show the values expected from experimental differences in the determination of the structures, calculated from exposed residues of independently solved structures of identical proteins (Table IV). Percentages of χ2's are calculated using only those residues whose χ1's have not changed. Pairs of complexed and unbound proteins are identified by the PDB code of the complexed protein, followed by the chain identifier(s) of the relevant chain(s). (a) Percentages of χ1's of interface residues that change minima. (b) Percentages of χ2's of interface residues that change minima. (c) Percentages of χ1's of exposed non-interface residues that change minima. (d) Percentages of χ2's of exposed non-interface residues that change minima.

 
Large individual residue movements

The Cα displacements and side-chain r.m.s.d.'s of individual residues were compared against the control values for the relevant amino acid type (Table V), and those that had values greater than the controls are described in the next two sections. Figure 1 shows counts of these residues for each pair of complexed and unbound structures, alongside the overall Cα or side-chain r.m.s.d. These numbers vary widely for those pairs with an overall measure above the appropriate control limit. This reflects either substantial changes of several residues, or the fact that large changes of individual residues can dominate the overall measures of small regions.
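Flagging individual residues against the per-type controls might be sketched as below for Cα displacement, using the cut-offs quoted earlier for exposed residues (0.5 Å for all types, 1.0 Å for glycine). The input layout is hypothetical; side-chain r.m.s.d. would be handled the same way with the Table V value for each type.

```python
# Cα displacement controls for exposed residues, as quoted in the text:
# 0.5 Å for every residue type except glycine, for which it is 1.0 Å.
CA_CONTROL = {"GLY": 1.0}
DEFAULT_CA_CONTROL = 0.5


def residues_above_control(displacements):
    """Ids of residues whose Cα displacement (Å) exceeds the control for their type.

    `displacements` is a list of (residue_id, residue_name, displacement) tuples.
    """
    return [
        res_id
        for res_id, name, disp in displacements
        if disp > CA_CONTROL.get(name, DEFAULT_CA_CONTROL)
    ]
```

The length of the returned list corresponds to the counts printed above each bar in Figure 1.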

All large Cα displacements (above 3 Å) and large side-chain r.m.s. deviations (above 5.6 Å) of exposed non-interface residues can be explained by one of the following causes (note that these limits are greater than the control limits):

  (a) the residue is adjacent to an interface residue that moves, and is therefore part of a loop movement in the interface. In such cases the whole loop has not been classified as interface, because not all the residues that make up the loop have at least one atom within 4 Å of the other component of the complex.
  (b) the residue is at the end of a chain, or only one to three residues away.
  (c) the residue is at the end of a cleavage fragment, or only one to three residues away.
  (d) the residue is adjacent to a region missing from, or poorly defined in, the electron density map.

Hence all large movements of exposed residues that are not in the interface can be explained either by their close proximity to the interface (a), or by structural disorder (b, c and d), which is also the cause of movements greater than the controls in the systems used to define them. They are not due to hinge-bending or shear movements between domains as sometimes seen when small molecules bind (Gerstein et al., 1994). However, smaller movements that nevertheless exceed the controls are common. They may be due to differences in crystal packing, which are smaller in the controls because all but one control pair have identical space groups and most also have very similar unit cell dimensions. An exception to these generalities is human growth hormone complexed with its receptor: this four-helix bundle has two long crossover connections and a short loop that move substantially (as already noted by Chantalat et al., 1995).
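The screening of large movements of exposed non-interface residues can be expressed as a short rule set. This is a sketch, not the procedure used in the study: the index sets are hypothetical inputs, chain positions are 0-based, "adjacent" is taken to mean within one residue, and the order in which the causes are tested is an arbitrary choice.

```python
def explain_large_movement(i, ca_disp, sc_rmsd, chain_len, moving_interface,
                           fragment_ends, disordered, ca_limit=3.0, sc_limit=5.6):
    """Screen an exposed non-interface residue at 0-based chain position `i`.

    Returns None if neither movement (Å) exceeds its limit; otherwise the first
    cause found, or "unexplained". The three sets hold chain positions of moving
    interface residues, cleavage-fragment ends and residues missing from or poorly
    defined in the density (hypothetical inputs).
    """
    if ca_disp <= ca_limit and sc_rmsd <= sc_limit:
        return None

    def within(positions, k):
        return any(abs(i - p) <= k for p in positions)

    if within(moving_interface, 1):
        return "interface loop movement"
    if i <= 3 or i >= chain_len - 4:
        return "chain terminus"
    if within(fragment_ends, 3):
        return "cleavage-fragment end"
    if within(disordered, 1):
        return "adjacent to disorder"
    return "unexplained"
```

In the data analysed here, no residue would fall through to "unexplained" apart from the human growth hormone loops noted above.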

Changes in the interfaces occur for a variety of reasons: to form specific interactions required for the action of the protein; to avoid steric clash; or to improve shape complementarity and allow hydrogen bonding (Janin and Chothia, 1990). The largest changes of interface residues are discussed in more detail later.

Do interface regions move more than exposed non-interface regions?

To answer this question, it is only meaningful to look at those systems where measurements of movements of the interface and/or the exposed non-interface regions are greater than movement of exposed residues in the controls.

The results suggest that interfaces typically undergo greater conformational change than other exposed parts of the structures. This is probably because changes in the interface occur for specific reasons, rather than simply as a result of flexibility or disorder (see above). The effect is more noticeable in side-chain movement. Three of the measures, side-chain r.m.s.d. (Figure 4b) and the percentages of χ1 (Figure 4c) and χ2 (Figure 4d) angles that change minima, all indicate more movement in interface regions than in exposed non-interface regions. This is shown most strongly by the percentages of χ2 angles that change minima: all but one of the pairs have greater values for their interface regions than for their exposed non-interface regions. Surprisingly, Figure 4a shows that more pairs have greater main-chain movement (measured by Cα r.m.s.d.) in exposed non-interface regions than in interface regions. However, the numbers are equal if two pairs are ignored: human growth hormone complexed with its receptor (discussed above), and amicyanin complexed with methylamine dehydrogenase. In amicyanin, the first 15 N-terminal residues form an irregular outer β-strand connected to a loop of six residues that are poorly defined in the electron density (Durley et al., 1993). The loop itself is excluded from our calculations because of its poor definition, but it confers flexibility on the included N-terminal β-strand.



Fig. 4. Comparisons of the conformational changes of interface regions with those of exposed non-interface regions. The solid lines show the control values (Table IV). Differences are only noteworthy outside the bottom-left section marked out by the solid lines. The dotted line is for y = x, displayed to clarify the differences between the regions. (a) Cα r.m.s.d. (b) Side-chain r.m.s.d. (c) Percentages of χ1's that change minima. (d) Percentages of χ2's that change minima.
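The comparison summarized in Figure 4 can be sketched for a single measure: pairs in which both regions lie within the control limit are discarded, and the remaining pairs are counted according to which region moves more. The input layout is hypothetical.

```python
def compare_regions(pairs, control):
    """Compare interface and exposed non-interface movement for one measure.

    `pairs` holds (interface_value, non_interface_value) per complexed-unbound pair.
    Pairs with both values within `control` are skipped as indistinguishable from
    experimental error. Returns (pairs kept, interface larger, non-interface larger).
    """
    kept = [(i, n) for i, n in pairs if i > control or n > control]
    interface_larger = sum(1 for i, n in kept if i > n)
    non_interface_larger = sum(1 for i, n in kept if n > i)
    return len(kept), interface_larger, non_interface_larger
```

Pairs with equal values in both regions are kept but counted on neither side.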

 
Changes in the interfaces of particular complexes

Changes in interfaces occur for a variety of reasons: to form specific interactions required for the action of the protein, to avoid steric clash, or to improve shape complementarity and allow hydrogen bonding (Janin and Chothia, 1990). The changes of interface residues discussed below include all those that are equal to or larger than those of the exposed non-interface residues that could be explained by structural disorder or proximity to the interface (see above).

Changes that allow the formation of specifically required interactions are the largest and most extensive seen in the structures examined. When chymotrypsinogen binds to human pancreatic secretory trypsin inhibitor (PDB code 1cgi), the specificity pocket and oxyanion hole necessary for inhibitor binding are formed by large movements of loops Ser189–Ser195 and Val213–Cys220 towards the inhibitor (Figure 5a). This change is the same as occurs when the zymogen is activated by hydrolysis. Smaller Cα shifts of inhibitor loop Tyr10–Arg21, along with side-chain movements of some of these residues towards the enzyme, alter the pattern of hydrogen bonding and allow binding to chymotrypsinogen. The changes are largely the same as those noted by Hecht et al. (1991, 1992).



Fig. 5. Large changes in the interfaces of complexes, shown by comparisons of the complexed and unbound structures of the components. (a) Chymotrypsinogen (yellow, complexed; mauve, unbound)–human PTI (green, complexed; mauve, unbound). (b) Antibody D1.3 (yellow, complexed; mauve, unbound)–lysozyme (green, complexed; mauve, unbound). (c) Amicyanin (yellow, complexed; mauve, unbound)–methylamine dehydrogenase (molecular surface coloured by potential, complexed). (d) Chymotrypsin (yellow, complexed; mauve, unbound)–ovomucoid (cyan coloured molecular surface, complexed). (e) Human growth hormone (yellow, complexed; mauve, unbound)–human growth hormone receptor (cyan coloured molecular surface, complexed). (f) Bovine pancreatic trypsin inhibitor in three different complexes (mauve, unbound; yellow, in complex with rat trypsin; orange, in complex with kallikrein; green, in complex with bovine β-trypsin; cyan molecular surface, kallikrein).

 
Specifically required interactions in the interface between human growth hormone and its receptor (PDB code 3hhr) are also formed by large changes. This complex involves one hormone molecule binding to a dimer of receptors; it is thought that this dimerization is caused by hormone binding and that it is the mechanism of signal transduction (Chantalat et al., 1995). Large changes are required for different parts of the hormone to bind to structurally identical parts of each receptor molecule. The biggest occur mainly in the long crossover loop between helices one and two and the short loop between helices two and three (Figure 5e). Tyr103 on the short loop is involved in receptor binding (Chantalat et al., 1995), and moves towards the interface with a side-chain r.m.s.d. of 8.5 Å. This change is accommodated by large associated movements of Gly104–Asn109 away from the interface (Cα displacements up to 11.5 Å, and side-chain r.m.s.d.'s up to 14.7 Å). Other smaller but still extensive changes (Cα displacements up to 5.4 Å and side-chain r.m.s.d.'s up to 7.7 Å) occur in the long crossover loop. They improve surface complementarity by moving away from the interface and forming minihelices, rather than hydrogen bonding to helix four in a position that would clash with the receptor.

Interactions that appear less necessary for function, because they simply alleviate minor steric clashes or improve hydrogen bonding and van der Waals contacts, involve noticeably less extensive changes. However, they can still involve large movements of a few residues. Figure 5b shows changes of this kind that occur when the interface between hen egg white lysozyme and the variable domain of antibody D1.3 (PDB code 1vfb) is formed. Gly102 of lysozyme moves with a Cα displacement of 7.5 Å, which brings it to within 2.1 Å of Arg99 on the heavy chain of the antibody. Movement of Arg99 was noted in a comparison of complexed and unbound antibody (Bhat et al., 1994), together with a decrease in its mobility, as shown by a lower temperature factor. The residues on either side of lysozyme Gly102 (Asp101 and Asn103) are not classified as interface residues but also move significantly, as part of a loop movement. Another large but isolated change occurs in Arg125 of lysozyme (side-chain r.m.s.d. = 6.3 Å), possibly creating a hydrogen bond to Ser93 on the light chain of the antibody. In other complexes, discrete changes not directly related to function improve electrostatic complementarity, for example the movement of Lys73 of amicyanin on binding to methylamine dehydrogenase (PDB code 1mda; Figure 5c), or move residues to positions that would be highly exposed to solvent if adopted in the unbound structure, for example Phe39 of α-chymotrypsin (PDB code 1cho; Figure 5d).

Differences between different types of component

Table II contains eight complexes (six enzyme–inhibitor and two antibody–antigen) for which both components have also been solved in an unbound form. These data allow the extent of conformational change to be compared between the components (enzymes against inhibitors, and antibodies against antigens). The number of interface residues with a side-chain r.m.s.d. larger than the relevant control is similar for the different components. The same is true for Cα displacement, except for subtilisin complexed with chymotrypsin inhibitor and Fab D44.1 bound to lysozyme. This suggests that in many cases the extent of conformational change is the same in the different components. Side-chain r.m.s.d.'s calculated over all interface residues, however, give a different and misleading result, suggesting that the interfaces of inhibitors and antigens are more mobile than those of their enzyme and antibody partners (Figure 6). The result is misleading because inhibitors and antigens have smaller interfaces than their partners in the complexes, with between 30 and 84% as many residues, so a few large side-chain movements have a greater effect on the overall r.m.s.d.
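The size effect described above can be illustrated numerically. The sketch below uses hypothetical interface sizes and per-residue deviations (not values from Table II): the same three large side-chain movements are placed in a 15-residue and a 40-residue interface. The r.m.s.d. over the smaller interface comes out larger, whereas a count of residues above a (hypothetical) per-residue control is the same for both.

```python
import math

def rmsd(deviations):
    """Root-mean-square of per-residue deviations (Å)."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))

def interface_deviations(n_residues, n_large, large_dev=6.0, small_dev=0.5):
    """Hypothetical interface: n_large residues move by large_dev Å and the
    remaining residues by small_dev Å (illustrative numbers only)."""
    return [large_dev] * n_large + [small_dev] * (n_residues - n_large)

def n_above(deviations, control):
    """Number of residues whose deviation exceeds a per-residue control."""
    return sum(d > control for d in deviations)

# Same three large movements in a 15-residue and a 40-residue interface
# (15/40 = 37.5%, inside the 30-84% size ratio quoted above).
small_iface = interface_deviations(15, 3)   # e.g. an inhibitor
large_iface = interface_deviations(40, 3)   # e.g. its enzyme partner
CONTROL = 2.0                               # hypothetical control (Å)

print(f"r.m.s.d.: {rmsd(small_iface):.2f} vs {rmsd(large_iface):.2f} Å")
# prints: r.m.s.d.: 2.72 vs 1.71 Å
print(f"residues above control: {n_above(small_iface, CONTROL)} vs {n_above(large_iface, CONTROL)}")
# prints: residues above control: 3 vs 3
```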



Fig. 6. Comparison of the side-chain r.m.s.d.'s in the interfaces of inhibitors or antigens with those in their protein partners (enzymes or antibodies). Points are identified by the PDB code of the complex (Table II). The solid lines show the control values (Table IV). The dotted line is y = x, displayed to clarify the differences between the components.

 
Differences between different types of complex

A comparison of the amount of conformational change in equivalent components of different types of complex would also be useful. Enzymes are comparable with antibodies, and inhibitors with antigens, both in their relative sizes in the complexes, as mentioned above, and in their conformational behaviour, because the two types of complex behave similarly (Janin and Chothia, 1990). A comparison of the inhibitors and antigens in our data set (Table II) is justified, as there are six inhibitors and seven antigens with structures of both the complexed and unbound forms. The numbers of these with values above the controls suggest that side-chain movement is more common in the interfaces of inhibitors than in those of antigens, as measured both by side-chain r.m.s.d. (Figure 1b) and by the percentage of χ2's that change minima (Figure 3b). Once again, the differences are caused by large changes of a few residues. However, because inhibitor and antigen interfaces contain similar numbers of residues, this does not invalidate the comparison. The numbers for interface Cα r.m.s.d. (Figure 1a) are equal or very similar for both types of component, as are those for the percentages of χ1's of interface residues that change minima (Figure 3a). There are not enough antibody–antigen complexes with both components solved in an unbound form to justify comparing antibodies with enzymes. The other complexes, which are neither enzyme–inhibitor nor antibody–antigen, show mixed results and should be considered individually. β-Actin, in complex with profilin, has a similar number of interface residues to the inhibitors and antigens (though at the high end of their range), and a significant percentage of the χ2's of these residues change minima (Figure 3b). None of the other measures of interface movement for β-actin is above the controls.
Amicyanin complexed with methylamine dehydrogenase and human growth hormone complexed with its receptor both show large changes in their interfaces for all four measures: Cα r.m.s.d. (Figure 1a), side-chain r.m.s.d. (Figure 1b), and the percentages of χ1's and of χ2's that change minima (Figure 3a and b). Amicyanin has a small number of interface residues, so large changes in a few residues have a greater effect on these measures. Human growth hormone has twice as many interface residues as the enzymes and inhibitors (the receptor is a dimer, so the hormone effectively has two interfaces, one with each monomer). The large values seen for these measures are therefore clearly significant, although there are also large changes of the whole molecule (discussed previously). The interface of subtilisin complexed with subtilisin prosegment contains a similar number of residues to that of the growth hormone complex, but in this case only the percentage of χ2's that change minima is above the control (Figure 3b). The deoxyribonuclease I–actin and glycerol kinase–glucose-specific factor III (GSF III) complexes show little significant interface movement, except for the percentage of χ1's in the interface of GSF III that change minima (Figure 3a).
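Several of the measures above count side-chain torsion angles (χ1, χ2) that "change minima". The exact definition is given in the Methods, outside this section; the sketch below simply assumes that each χ angle is assigned to one of three staggered wells by 120° bins centred on +60°, 180° and −60°, and that a change is counted when the well differs between the unbound and complexed structures. The torsion is computed from four atom positions with the standard atan2 formula. The function names, the well labels (naming conventions differ between authors) and the example angles are hypothetical.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], defined by four atom
    positions, e.g. N, CA, CB, CG for chi1."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def chi_well(angle):
    """Assign a torsion angle (degrees) to one of three staggered wells
    (hypothetical 120-degree bins centred on +60, 180 and -60 degrees)."""
    a = angle % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def percent_changing_minima(chi_pairs):
    """chi_pairs: (unbound, complexed) angles in degrees for one chi type."""
    changed = sum(chi_well(u) != chi_well(c) for u, c in chi_pairs)
    return 100.0 * changed / len(chi_pairs)

# Hypothetical chi1 values (unbound, complexed) for four interface residues
pairs = [(-65.0, -70.0), (175.0, 60.0), (60.0, 55.0), (-170.0, 178.0)]
print(percent_changing_minima(pairs))  # 25.0: only the second residue changes well
```

Note that −170° and 178° fall in the same well even though their numerical difference is large; working with wells rather than raw angle differences handles this periodicity.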

Differences in the structures of identical proteins in different complexes

Table III gives information on five proteins that are present in more than one complex in the main data set (Table II). Lysozyme and neuraminidase are not considered because their partners in the complexes are antibodies, which do not necessarily bind in the same place; consequently, changes in the interface would not be expected to be common to all the complexes. The only difference between comparing an unbound structure with a complexed one and comparing two complexed structures is that the interface may be affected. It is therefore appropriate to concentrate on those residues that are common to the interface in all the complexes of a particular protein. We have examined the Cα displacements and side-chain r.m.s.d.'s of these residues.

From the work described in previous sections, only one of these proteins, bovine pancreatic trypsin inhibitor (PTI), has interface side-chain r.m.s.d.'s larger than the control between every complexed structure and the unbound form. These structures have only one common interface residue whose conformation changes by more than the control limits. This residue, Arg17, has a much more similar conformation in the complexes than in the unbound structure (Figure 5f); the change avoids the steric hindrance that would occur with the unbound conformation. PTI is the only protein in which the interfaces of the complexes appear more similar to each other than to the same region in the unbound structure. Arg17 in the unbound structure appears to have been built in its most common conformation, perhaps suggesting that it is mobile and was poorly defined in the electron density map. However, its temperature factor is slightly lower than in the complexed structures, suggesting that it was not more mobile than in those structures.

In the subtilisin complexes there are several common interface residues with differences greater than the controls. His64 in the unbound structure and in the complex with subtilisin prosegment has a large side-chain r.m.s.d. when compared with the other structures. However, in the unbound structure this residue has two alternative positions. The one used in this analysis has an occupancy of 0.8, but it corresponds to a structure with phenylmethylsulfonate (PMS) bound with an occupancy of 0.7. The 0.2-occupancy conformation of His64, with no bound PMS, is much closer to the structures of the complexes with inhibitors, but not to that with the prosegment. His64 in the complex with prosegment differs from the others because the bulk of the prosegment binds away from the active site, with only eight residues of the C-terminus extending into it. In the other complexes, steric hindrance by the inhibitor, which differs from that caused by PMS, favours the 0.2-occupancy conformation of His64. There are also small differences in the conformations of Ser101 and Tyr104, but the conformations in the complexes are not significantly more similar to each other than to the unbound conformation. All the other common interface residues have conformations that are the same to within the level of the controls.
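The His64 case shows that the choice among alternate conformations recorded in a coordinate file can decide whether a residue appears to move. The minimal sketch below assumes standard fixed-column PDB ATOM/HETATM records (alternate location indicator in column 17, occupancy in columns 55–60) and keeps one conformation per atom: a requested altLoc where one exists, otherwise the highest-occupancy record, which corresponds to the 0.8-occupancy choice described above. Both function names and the toy records (including their coordinates and chain ID) are hypothetical.

```python
def format_atom(serial, name, alt, res_name, chain, res_seq, xyz, occ, b=20.0):
    """Build a toy fixed-column PDB ATOM record (illustration only)."""
    x, y, z = xyz
    return ("ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, name, alt, res_name, chain, res_seq, "", x, y, z, occ, b))

def select_altlocs(pdb_lines, prefer=None):
    """Keep one ATOM/HETATM record per atom.

    Among alternate locations of the same atom, the record whose altLoc
    equals `prefer` wins if it exists; otherwise the highest occupancy wins.
    """
    chosen = {}  # atom key -> ((is_preferred, occupancy), line)
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # chain ID, residue number + insertion code, atom name
        key = (line[21], line[22:27], line[12:16].strip())
        alt = line[16]
        occ = float(line[54:60])
        rank = (prefer is not None and alt == prefer, occ)
        if key not in chosen or rank > chosen[key][0]:
            chosen[key] = (rank, line)
    return [line for _, line in chosen.values()]

# Toy records: a His side-chain atom modelled in two positions (fake coordinates)
his64 = [
    format_atom(1, " CA ", " ", "HIS", "E", 64, (0.0, 0.0, 0.0), 1.00),
    format_atom(2, " NE2", "A", "HIS", "E", 64, (1.0, 1.0, 1.0), 0.80),
    format_atom(3, " NE2", "B", "HIS", "E", 64, (4.0, 2.0, 1.0), 0.20),
]
for line in select_altlocs(his64):            # keeps the 0.80 (A) position
    print(line)
for line in select_altlocs(his64, prefer="B"):  # keeps the 0.20 (B) position
    print(line)
```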

In all comparisons between the three structures of bovine chymotrypsin (one unbound and two complexed), Phe39 differs by a large side-chain r.m.s.d. (around 5 Å). The difference between the two complexed structures is slightly smaller than in comparisons with the unbound form, reflecting that the conformational change occurs only beyond Cβ (i.e. it involves a χ1 rotation) rather than from Cα onwards. Tyr146 differs slightly in all comparisons, but lies at the end of a chain break. Ser218 differs most in comparisons with one of the complexes, and has a similar structure in the other complex and in the unbound protein. All other common interface residues have conformations that are the same to within the level of the controls.

In the bovine trypsin complexes, the conformation of only one of the common interface residues (Tyr39) differs by more than the controls, and in this case the conformations in the complexes are no more similar to each other than they are to that of the unbound form. The same residue of rat trypsin differs between the unbound form and the two bound forms, but not between the two bound forms; however, these differences are small.

The limitations of the data set make it difficult to draw firm conclusions. However, it appears that when the changes in the interface are small, the structures of the interfaces in the complexes are no more similar to each other than they are to the unbound structure. Larger changes are more likely to be common to all complexes, possibly indicating that they are more significant to binding.


    Implications for comparative modelling and predictive docking
The control values also have implications for all attempts at precise modelling of structures, such as comparative modelling and predictive docking, as it is unreasonable to expect the models to be accurate to a higher degree than crystal structures.

Martin et al. (1997) assessed the results of the comparative modelling section of the 1996 Critical Assessment of Structure Prediction (CASP2). Their control values were mainly Cα and all-atom r.m.s.d.'s, derived from a subset of three CASP2 targets, each of which had a pair of independently solved structures. This data set gave a value for Cα r.m.s.d. similar to ours (0.6 Å compared with 0.4 Å). They found that targets with high sequence identity (~85%) to a protein of known structure were modelled to within these limits.
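The control values in both studies are r.m.s.d.'s computed after the two structures have been superposed. The minimal sketch below computes a Cα r.m.s.d. after optimal rigid-body superposition using the SVD-based Kabsch solution. It assumes the two N × 3 arrays already list equivalent Cα atoms in the same order, and it is not necessarily the superposition procedure used to derive the values quoted here; the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def superposed_rmsd(P, Q):
    """Cα r.m.s.d. (Å) between equivalent N x 3 coordinate arrays P and Q
    after optimal rotation + translation of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)                   # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                             # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Toy data: "structure B" is A rotated, translated and perturbed
rng = np.random.default_rng(1)
ca_a = rng.normal(scale=10.0, size=(50, 3))
angle = np.radians(30.0)
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot.T + [5.0, -2.0, 1.0] + rng.normal(scale=0.3, size=(50, 3))
print(round(superposed_rmsd(ca_a, ca_b), 2))
```

Because the rotation and translation are fitted away, only the added perturbation contributes to the result; this is the sense in which two independently solved structures of the same protein define a floor on achievable model accuracy.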

To analyse how the changes found affect the ability of docking algorithms to predict correctly the structure of a protein–protein complex from the unbound structures of its components, we have examined the results of the program FTDOCK (Gabb et al., 1997). This algorithm was developed and tested on a data set containing five of the complexes analysed here (Table II), using exactly the same structural data for the bound and unbound forms. It performs a global rigid-body search of rotational and translational space and scores each potential structure on shape and electrostatic complementarity. The best 4000 structures from this search are filtered using distance constraints from biochemical data and then undergo local refinement scored by shape complementarity.
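FTDOCK's global search builds on the Fourier-correlation approach to shape complementarity: both molecules are discretised on a grid, and the scores of all translations for one orientation are obtained at once from an FFT-based cross-correlation. The toy sketch below shows only that translational step for a single orientation. The grid weights (surface layer +1, interior −15, mobile molecule 1) and all function names are illustrative assumptions; the electrostatic term, the loop over rotations, the biochemical distance filter and the local refinement described above are omitted.

```python
import numpy as np

def receptor_grid(occupied, surface_value=1.0, core_value=-15.0):
    """Weight a boolean occupancy grid for the static molecule: occupied cells
    whose six face neighbours are all occupied form the core (penalised), the
    other occupied cells form the surface layer (rewarded); empty cells are 0."""
    occ = np.asarray(occupied, dtype=bool)
    core = occ.copy()
    for axis in range(3):
        for step in (1, -1):
            core &= np.roll(occ, step, axis=axis)
    grid = np.zeros(occ.shape)
    grid[occ] = surface_value
    grid[core] = core_value
    return grid

def ligand_grid(occupied):
    """Mobile molecule: 1 inside, 0 outside."""
    return np.asarray(occupied, dtype=float)

def translational_scores(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[x - t] for every cyclic shift t,
    computed as a cross-correlation with three-dimensional FFTs."""
    fr = np.fft.fftn(receptor)
    fl = np.fft.fftn(ligand)
    return np.real(np.fft.ifftn(fr * np.conj(fl)))

def top_translations(scores, k):
    """The k best (score, shift) pairs, best first."""
    flat = scores.ravel()
    k = min(k, flat.size)
    idx = np.argpartition(flat, -k)[-k:]       # unordered top k
    idx = idx[np.argsort(flat[idx])[::-1]]     # sort best first
    return [(float(flat[i]), tuple(int(v) for v in np.unravel_index(i, scores.shape)))
            for i in idx]

# Toy example on a 16^3 grid, one ligand orientation only
N = 16
rec = np.zeros((N, N, N), dtype=bool)
rec[2:14, 2:14, 2:8] = True       # slab-shaped "receptor"
rec[6:10, 6:10, 5:8] = False      # with a 4 x 4 x 3 pocket cut into its top face
lig = np.zeros((N, N, N), dtype=bool)
lig[0:4, 0:4, 0:3] = True         # small block-shaped "ligand"; shift t moves it to start at t
scores = translational_scores(receptor_grid(rec), ligand_grid(lig))
for score, shift in top_translations(scores, k=3):
    print(round(score, 1), shift)
```

Because the correlation is circular, the grid must be large enough that shifted copies of the ligand do not wrap around onto the receptor. In a full search, the top-scoring shifts from every sampled orientation would be pooled before the filtering and refinement stages.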

The algorithm performed best on the α-chymotrypsinogen–PTI complex, for which a correct structure (i.e. one with an interface Cα r.m.s.d. of 2.5 Å or less from the crystal structure of the complex) was ranked first of the 133 predictions that remained after local refinement. This is somewhat surprising in light of our analysis, as the interface regions of both components show some of the largest Cα and side-chain r.m.s.d.'s observed (Figure 1), and percentages of side-chain angles changing minima that are mostly above the control levels (Figure 3). These large values are caused by sizeable movements of several individual interface residues, as discussed previously. Notably, however, none of these residues would have caused severe steric clash had they stayed in their unbound conformations.
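The success criterion used here can be written as a small helper. The sketch assumes the predictions are already ranked (best first) and that each has an interface Cα r.m.s.d. from the crystal structure of the complex; the function names and example values are hypothetical.

```python
from typing import Optional, Sequence

def first_correct_rank(interface_ca_rmsds: Sequence[float],
                       cutoff: float = 2.5) -> Optional[int]:
    """1-based rank of the first prediction whose interface Cα r.m.s.d. (Å)
    from the crystal structure is <= cutoff, or None if there is none."""
    for rank, value in enumerate(interface_ca_rmsds, start=1):
        if value <= cutoff:
            return rank
    return None

def n_correct(interface_ca_rmsds: Sequence[float], top: int,
              cutoff: float = 2.5) -> int:
    """Number of correct predictions among the `top` best-ranked ones."""
    return sum(v <= cutoff for v in interface_ca_rmsds[:top])

ranked = [1.9, 8.4, 2.5, 14.0, 3.1]     # hypothetical ranked predictions
print(first_correct_rank(ranked))        # 1
print(n_correct(ranked, top=4))          # 2 (the 1.9 and 2.5 Å predictions)
print(first_correct_rank([6.0, 9.7]))    # None
```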

Three of the other four complexes (kallikrein–PTI, subtilisin–chymotrypsin inhibitor and Fab D44.1–lysozyme) were predicted with varying degrees of success. All have some large movements of interface residues that avoid potential steric clash. The final complex, subtilisin–subtilisin inhibitor, had no correct solution in the top 4000 predictions. This is puzzling at first glance: although both components have some interface residues that move by more than the control, and would cause steric clash if the movements did not occur, these movements are no more severe than those seen in the previous three complexes. However, the unbound structure of subtilisin inhibitor has a region (Ala62–Met70) where only the approximate path of the main chain could be traced, with associated uncertainty in the placement of the side chains (see the PDB file for code 2ssi). These residues were therefore excluded from our analysis, but unfortunately some of them are interface residues and would cause substantial steric clash if they remained in their unbound conformations.
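Whether a residue would cause steric clash in its unbound conformation can be checked by placing the unbound coordinates in the frame of the complex and counting close intermolecular contacts. The minimal sketch below assumes the coordinates are already superposed. The 3.0 Å cutoff is an illustrative assumption, not a threshold taken from this paper; a real check would usually use atom-type-dependent van der Waals radii. The function name and coordinates are hypothetical.

```python
import numpy as np

def count_clashes(atoms_a, atoms_b, cutoff=3.0):
    """Number of atom pairs (one from each molecule) closer than `cutoff` Å."""
    a = np.asarray(atoms_a, dtype=float).reshape(-1, 3)
    b = np.asarray(atoms_b, dtype=float).reshape(-1, 3)
    # All pairwise distances via broadcasting: shape (len(a), len(b))
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return int((dist < cutoff).sum())

# Hypothetical side chain in its unbound and complexed positions, tested
# against two partner atoms in the frame of the complex.
partner = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
unbound_side_chain = [[2.0, 1.0, 0.0], [3.0, 2.0, 0.0]]
complexed_side_chain = [[4.5, 1.0, 0.0], [5.5, 2.0, 0.0]]
print(count_clashes(unbound_side_chain, partner))    # 3
print(count_clashes(complexed_side_chain, partner))  # 0
```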

Conformational change that does not occur to avoid steric clash is therefore handled quite well, even when it is as extensive as in the α-chymotrypsinogen–PTI complex: there is sufficient shape complementarity to identify the correct complex despite the large conformational change.


    Conclusions
Conformational changes on complex formation have been evaluated by overall r.m.s.d.'s of Cα atoms and of side-chain atoms, and by the percentages of side-chain torsion angles that change minima. In addition, Cα shifts and side-chain r.m.s.d.'s for individual residues were used. The main conclusions from this study are:

  1. A comparison of structural differences between independently solved structures of identical proteins provides benchmarks for evaluating conformational change. These benchmarks are r.m.s.d.'s of 0.6 Å for Cα atoms and 1.7 Å for side-chain atoms of exposed residues; only conformational changes greater than these values were taken as substantial. Shifts for individual residue types were also established. Residues that become part of the interface go from being exposed in the unbound structure to being packed, and therefore less mobile, in the complex. Thus, using as benchmarks the changes of residues that are exposed in both of two independently solved structures of an identical protein will overestimate the level above which the conformational change of interface residues should be considered substantial. For this reason, protein–protein docking algorithms that cannot allow for changes up to the level of the benchmarks could well still predict the structure of a complex correctly. Movement may also be substantial in more cases than we have suggested. Our analysis is therefore conservative.
  2. Just over half of the proteins have a substantial shift on complex formation as judged by any of the overall measures. Many of these changes are only just above the benchmark. Thus many heteroprotein complexes are formed without substantial conformational change.
  3. Main-chain as well as side-chain atoms can have significant shifts on complex formation.
  4. The largest conformational changes in exposed non-interface residues are the consequence of flexibility and disorder rather than of a change in conformation caused by, for example, shear or hinge bending between domains on association, as occurs on the binding of small ligands (Gerstein et al., 1994). In contrast, conformational changes in the interface are intimately involved in complex formation.
  5. When account is taken of the different sizes of enzymes and inhibitors, then the extent of conformational change is similar for these two types of components.
    Coordinates are available for bound and unbound forms of both components in eight complexes (six enzyme–inhibitor and two antibody–antigen). All show conformational change in at least one component by at least one of the global measures. In three of the eight complexes (1brb, 2kai, 2ptc), the only significant global change is in the side chains, and no Cα atom moves more than 1.0 Å. In the others there are both main-chain and side-chain shifts.
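Conclusion 1 amounts to a simple decision rule. The sketch below computes overall Cα and side-chain r.m.s.d.'s for a region from coordinates that are assumed to be already superposed, and flags the change as substantial if either exceeds the 0.6 Å or 1.7 Å benchmark. The dictionary layout, the function names, and treating every non-backbone atom (Cβ onwards) as side chain are assumptions for illustration; the paper's exact atom selection is defined in its Methods.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}
CA_BENCHMARK = 0.6          # Å, Cα r.m.s.d. benchmark (conclusion 1)
SIDE_CHAIN_BENCHMARK = 1.7  # Å, side-chain r.m.s.d. benchmark (conclusion 1)

def region_measures(unbound, complexed):
    """unbound, complexed: {(residue_id, atom_name): (x, y, z)}, assumed to be
    already superposed. Returns (Cα r.m.s.d., side-chain r.m.s.d.) in Å over
    atoms present in both; nan if a set is empty."""
    common = unbound.keys() & complexed.keys()
    ca = [k for k in common if k[1] == "CA"]
    side = [k for k in common if k[1] not in BACKBONE]

    def rmsd(keys):
        if not keys:
            return float("nan")
        return math.sqrt(sum(math.dist(unbound[k], complexed[k]) ** 2
                             for k in keys) / len(keys))

    return rmsd(ca), rmsd(side)

def is_substantial(ca_rmsd, side_rmsd):
    """True if either overall measure exceeds its benchmark."""
    return ca_rmsd > CA_BENCHMARK or side_rmsd > SIDE_CHAIN_BENCHMARK

# Hypothetical one-residue region: Cα barely moves, CG swings by 4 Å
unbound = {("A1", "CA"): (0.0, 0.0, 0.0), ("A1", "CB"): (1.0, 0.0, 0.0),
           ("A1", "CG"): (2.0, 0.0, 0.0)}
complexed = {("A1", "CA"): (0.3, 0.0, 0.0), ("A1", "CB"): (1.0, 0.0, 0.0),
             ("A1", "CG"): (2.0, 4.0, 0.0)}
ca_r, sc_r = region_measures(unbound, complexed)
print(round(ca_r, 2), round(sc_r, 2), is_substantial(ca_r, sc_r))  # 0.3 2.83 True
```

As conclusion 1 notes, applying exposed-residue benchmarks to interface residues errs on the side of calling fewer changes substantial.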

This analysis confirms the induced-fit model for protein–protein recognition. Often the largest movements are not in the functionally important residues, such as those forming the active sites, but in interface regions peripheral to these residues. The conformational change can alleviate steric clashes, improve van der Waals packing, or lead to the formation of hydrogen bonds or salt bridges. However, in several of the systems examined the conformational change is no more extensive than in the systems whose complexes were successfully predicted by FTDOCK (Gabb et al., 1997). For these systems, recognition in shape and charge can, as a first approximation, be treated as lock and key.

There is still a limited number of systems for which information about conformational change is available. As more structures of complexes and their unbound components are solved, the conclusions from this analysis may need to be revised. In particular, the extent of conformational change may vary between biological systems. The enzyme–inhibitor complexes that dominate this study may generally exhibit less conformational change than complexes involved in other processes, such as signalling. The high binding affinity seen in enzyme–inhibitor and antibody–antigen association may rule out large conformational changes, whereas conformational change may be fundamental to the mechanisms of other proteins. For systems with limited conformational change, predictive docking should prove a valuable method for obtaining structural models from unbound components, and thereby for providing insight into biological recognition. Applications such as computational ligand design (Caflisch and Karplus, 1995) are less able to tolerate conformational changes as large as those presented here. Nevertheless, structures of complexes generated by predictive docking, although less accurate, can still provide information about the functional region of a protein, and could therefore suggest the types of molecules that should be screened for inhibition.


    Acknowledgments
 
Thanks to Suhail Islam for helpful discussions of conformation, the use of several programs and production of protein pictures. Thanks also to Richard Jackson for useful criticism and suggestions, and to Rob Russell for the use of code, several programs and a parsed SCOP database.


    Notes
 
1 To whom correspondence should be addressed


    References
Bennett,W.S. and Huber,R. (1984) CRC Crit. Rev. Biochem., 15, 291–384.

Bhat,T.N., Bentley,G.A., Boulot,G., Green,M.I., Tello,D., Dall'Acqua,W., Souchon,H., Schwarz,F.P., Mariuzza,R.A. and Poljak,R.J. (1994) Proc. Natl Acad. Sci. USA, 91, 1089–1093.

Caflisch,A. and Karplus,M. (1995) Perspectives Drug Discovery Des., 3, 51–84.

Chambers,J.L. and Stroud,R.M. (1979) Acta Crystallogr., B35, 1861–1874.

Chantalat,L., Jones,N.D., Korber,F., Navaza,J. and Pavlovsky,A.G. (1995) Protein Peptide Lett., 2, 333–340.

Chen,L., Durley,R., Poliks,B.J., Hamada,K., Chen,Z., Mathews,F.S., Davidson,V.L., Satow,Y., Huizinga,E., Vellieux,F.M.D. and Hol,W.G.J. (1992) Biochemistry, 31, 4959–4964.

Cruickshank,D.W.J. (1949) Acta Crystallogr., 2, 65–82.

Cruickshank,D.W.J. (1954) Acta Crystallogr., 7, 519.

Cruickshank,D.W.J. (1967) In Kasper,J. and Lonsdale,K. (eds), International Tables for X-ray Crystallography, Vol. 2. Kynoch Press, Birmingham (present distributor: Kluwer Academic Publishers, Dordrecht), pp. 319–340.

Daopin,S. and Davies,D.R. (1994) Acta Crystallogr., D50, 85–92.

Durley,R., Chen,L., Lim,L.W., Mathews,F.S. and Davidson,V.L. (1993) Protein Sci., 2, 739–752.

Flores,T.P., Orengo,C.A., Moss,D.S. and Thornton,J.M. (1993) Protein Sci., 2, 1811–1826.

Gabb,H.A., Jackson,R.M. and Sternberg,M.J.E. (1997) J. Mol. Biol., 272, 106–120.

Gerstein,M. and Chothia,C. (1991) J. Mol. Biol., 220, 133–149.

Gerstein,M., Lesk,A.M. and Chothia,C. (1994) Biochemistry, 33, 6739–6749.

Hecht,H.J., Szardenings,M., Collins,J. and Schomburg,D. (1991) J. Mol. Biol., 220, 711–722.

Hecht,H.J., Szardenings,M., Collins,J. and Schomburg,D. (1992) J. Mol. Biol., 225, 1095–1103.

Huber,R. (1979) Trends Biochem. Sci., 4, 271–276.

Jackson,R.M., Gabb,H.A. and Sternberg,M.J.E. (1998) J. Mol. Biol., 276, 265–285.

Janin,J. (1995) Prog. Biophys. Mol. Biol., 64, 145–166.

Janin,J. and Chothia,C. (1990) J. Biol. Chem., 265, 16027–16030.

Janin,J., Wodak,S., Levitt,M. and Maigret,B. (1978) J. Mol. Biol., 125, 357–386.

Janin,J. and Wodak,S.J. (1983) Biophys. Mol. Biol., 42, 21–78.

Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.

Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.

Lesk,A.M. and Chothia,C. (1988) Nature, 335, 188–190.

Luzzati,V. (1952) Acta Crystallogr., 5, 802–810.

Martin,A.C.R., MacArthur,M.W. and Thornton,J.M. (1997) Proteins, S1, 14–28.

McLachlan,A.D. (1979) J. Mol. Biol., 128, 49–79.

Miller,S., Lesk,A.M., Janin,J. and Chothia,C. (1987) Nature, 328, 834–836.

Murzin,A., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Russell,R.B. and Barton,G.J. (1992) Proteins, 14, 309–323.

Shoichet,B.K. and Kuntz,I.D. (1996) Chem. Biol., 3, 151–156.

Srinivasan,R. and Ramachandran,G.N. (1965) Acta Crystallogr., 19, 1008–1014.

Stanfield,R.L. and Wilson,I.A. (1994) Trends Biotech., 12, 275–279.

Sternberg,M.J.E., Gabb,H.A. and Jackson,R.M. (1998) Curr. Opin. Struct. Biol., 8, 250–256.

Tickle,I.J., Laskowski,R.A. and Moss,D.S. (1998) Acta Crystallogr., D54, 243–252.

Weng,Z., Vajda,S. and Delisi,C. (1996) Protein Sci., 5, 614–626.

Received June 26, 1998; revised January 7, 1999; accepted January 21, 1999.