Department of Pure and Applied Chemistry, University of Strathclyde, 295 Cathedral Street, Glasgow G1 1XL, UK
Abstract
Keywords: conformation/domain/mechanism/prediction/protein
Introduction
Intramolecular interfaces in protein structures vary considerably in extent and complexity. A surface-accessible site may have an interface between two, three or more sections of chain beneath it. Each interface might be expected to differ in its sensitivity to perturbing influences. The more complex the interface that underlies a site, the more of the molecule that is required to maintain the disposition of the site, and the more likely it is that reorientation of the recognition moieties could be induced by structural perturbations remote from the site.
Given this background, a systematic search procedure for the detection of vulnerable sub-structural interfaces in protein structures, which also includes a means of ranking such interfaces according to their likely contribution to molecular stability, becomes a desirable tool for the protein scientist. In particular, such a methodology could assist prediction of those sites on a protein surface where a recognition event could have the most potential for causing a significant change in conformation, and also where in the molecular structure natural or artificial sequence modification is most likely to have a conformational impact and where it may not.
The starting point we chose in developing our methodology was to consider the topological arrangement of the folded polypeptide chain. In natural proteins, these topologies are evidently critical to the constitution and execution of their chemical properties, as tertiary structure is more conserved than primary sequence generally. As is now clear, not only does nature use, and reuse, only a small selection of all possible fold topologies (Richardson, 1981; Zhang and Delisi, 1998), but also in some cases topological similarity exists beyond any obvious amino acid sequence similarity. In the latter instance, either evolutionary preservation of the topology has been a stronger selective pressure compared to sequence preservation, or similar topologies have evolved convergently. As demonstrated by both natural and artificial sequence evolution, the natural topologies also have a great capacity to absorb the effects of multiple side chain mutations without catastrophic effect on fitness (describable as soft failure) (Pritchard and Dufton, 2000). It is this very high level of topological preservation despite primary sequence variation that allows for accurate prediction of similar structures amongst homologous protein sequences.
Most importantly from our point of view, however, the topology provides the basic definition of the number, character, and contribution to stability of the sub-structural interfaces (or intramolecular fault lines) that can underlie the recognition surfaces. Although some proteins, such as calmodulin and the serpins, undergo gross conformational changes associated with function when the local secondary structural environment changes, or the polypeptide chain is cleaved, the fold and sequence of a protein define much of the potential for surface adjustment via mutation or ligand binding. We investigated how we could systematically search a fold topology in order to highlight its interfaces and grade them according to their likely contribution to molecular stability. Having identified the regions that are theoretically vulnerable, would there be corroborating experimental evidence that the interfaces were associated with sites of intermolecular recognition? If there were, could a prediction be made that the interactive site is modifiable by, or able to generate, some kind of intramolecular conformational change? In this paper, we analyse four well-known molecular folds to establish the validity of the exercise and its predictive potential. The selection covers examples between 58 and 307 amino acids in length, both with and without catalytic activity.
Materials and methods
A simple visual inspection of a protein 3D structure (viewed as an α-carbon skeleton or ribbon diagram, for example) can suggest which regions of the global fold are likely to be autonomously folding, structurally cohesive and stable entities despite perturbing influences. These entities are deserving of the description 'domain' (Siddiqui et al., 2001). Visual inspection can also suggest which parts of the fold arrangement would experience large-scale changes if the residues stabilizing them were disturbed (e.g. residues that provide interfacial adhesion between domains, or subunits of quaternary structure). However, this procedure is invariably subjective and is difficult to explore and report in a rigorous manner without recourse to computer analysis and quantitative assessment. The approach we have taken is to consider the 3D locality of every residue position in a protein fold in turn and to score, then rank, that locality according to whether it includes residues from nearby (proximal) or from far away (distal) positions in the fully extended linear polypeptide chain. In this way, an assessment can be made as to whether each position exerts its influence within an element of locally determined structure, at an interface between such elements, or at some intermediate point. By merging and expanding overlapping residue localities on the basis of these rankings, it is possible to detect the extent and topology of potentially significant sub-structural interfaces.
Step 1: Definition of a volume around each residue
The spatial volume considered around each residue should be a suitable (in the context of the information one expects to gain) approximation of the range and direction of the physico-chemical influence exerted and experienced by that residue. It is nominally possible to construct a volume of any arbitrary geometry and size about any component of the considered residue. For our primary investigation, however, we considered only the simple, computationally inexpensive, approach of constructing a sphere about the α-carbon of each residue. This spherical volume, as well as being easy to assess and automate, is relatively insensitive to structural changes between homologues other than a gross change in the fold. Thus, while it is a suitable basis for assessing properties of the overall fold that do not invoke, or are relatively unaffected by, details of the individual side chain character and 3D orientation of each residue, it is unsuitable for investigating the interactions between individual side chains. Such analyses would be better served by, for example, including considerations of side chain orientation in the considered volume.
The choice of sphere radius is important as amino acid side chains are of varying effective length and direction, and thus influence, depending on their orientation and environment. A constant sphere radius generalizes such factors and concentrates on relations due to the fold topology. For this analysis, the sphere radius was set at a constant value of 7 Å. This radius has previously been found to be useful in similar clustering methods based on the Cα backbone (Cardle and Dufton, 1994, 1997), and represents the approximate direct influence of any substitution at the site under consideration (Bajaj and Blundell, 1984).
Step 2: Clustering
Each residue position in the folded protein chain is considered in turn, proceeding in sequential order from the N-terminus to the C-terminus. Other residue positions are added to form a cluster about the considered residue if they are found within the defined (here, spherical) volume about the α-carbon. For this investigation, residues were added to the cluster only if their Cα atoms were contained within the defined volume. Other rules for determining whether a residue should be included in the cluster (e.g. hydrophobic residues only, atoms in the volume other than Cα, etc.) are possible, however. Again, our decision reflects the desire to investigate gross properties of the fold, and other metrics may be appropriate for other studies.
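Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the software used for the analysis: the function name `build_clusters`, the dictionary input keyed by residue number, the inclusion of the central residue in its own cluster and the treatment of a Cα lying exactly on the 7 Å boundary as inside the sphere are all assumptions made for the example.

```python
import math

def build_clusters(ca_coords, radius=7.0):
    """Map each residue number to the sorted residue numbers whose C-alpha
    lies within `radius` angstroms of that residue's C-alpha.

    ca_coords: dict {residue_number: (x, y, z)} of C-alpha coordinates.
    Assumption: the central residue is counted as a member of its own cluster.
    """
    clusters = {}
    for centre_id, centre_xyz in ca_coords.items():
        members = [other_id for other_id, other_xyz in ca_coords.items()
                   if math.dist(centre_xyz, other_xyz) <= radius]
        clusters[centre_id] = sorted(members)
    return clusters

# Toy chain (hypothetical coordinates): residues 1-5 lie on a straight line,
# 3.8 A apart, and residue 6 folds back to sit 5 A from residue 1.
toy = {n: (3.8 * (n - 1), 0.0, 0.0) for n in range(1, 6)}
toy[6] = (0.0, 5.0, 0.0)

clusters = build_clusters(toy)
print(clusters[1])  # residue 1 sees its chain neighbour and the folded-back residue
print(clusters[3])  # residue 3 sees only chain-local neighbours
```

Swapping the membership test (for example, restricting to hydrophobic residues) would only require changing the condition inside the list comprehension.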
Step 3: Scoring the cluster
Defining the volume around a residue's α-carbon atom and determining clusters are general procedures that can be applied to several different modes of analysis (e.g. Cardle and Dufton, 1994, 1997). It is the nature of the scoring mechanism that individualises this method and from which we have derived its name (SID: Simple Intrasequence Difference). At its simplest, SID scoring calculates a value related to the maximum chain separation of residues found within the cluster obtained in step 2 above. Three such methods of scoring are described here, though more are possible.
(a) Simple difference (highest − lowest, HL)
By this method, the SID score for a cluster is the chain separation of the two most chain-extreme residues in that cluster, i.e. the lowest residue number found in the cluster subtracted from the highest.
This score indicates how distant in the primary sequence are the most extreme parts of the structure that have packed into the same volume of space. Hence a cluster that contains parts of two or more chain segments that are far apart on the polypeptide chain will have a high score, while a cluster that contains only a single contiguous section of chain will have a low score (Figure 1, Table I). A cluster overlapping an interface will have a high score, whereas a cluster found within an element will have a low score.
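As a small worked illustration of the HL score, the following sketch scores two hypothetical clusters, one drawn from a single stretch of chain and one spanning two chain-distant segments. The function name `hl_score` and the example residue numbers are assumptions made for the example.

```python
def hl_score(cluster):
    """HL score: highest residue number in the cluster minus the lowest."""
    return max(cluster) - min(cluster)

# Hypothetical clusters, given as lists of residue numbers.
within_element = [10, 11, 12, 13]          # one contiguous section of chain
across_interface = [3, 4, 5, 40, 41, 42]   # two chain-distant segments packed together

print(hl_score(within_element))    # 3
print(hl_score(across_interface))  # 39
```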
(b) Simple difference (greatest gap, GG)
This scoring method requires the ordering of all the positions found within a cluster into numerical order. The difference between consecutive residue numbers is then calculated. The largest such value is taken to be the SID score for that cluster.
Where only two non-overlapping chain segments are found within a cluster, a high score is obtained. Where only a single segment of chain is found, a low score results. However, the important distinction from the HL score is that when more than two segments of chain are found in the cluster, the GG score gives a score that is lower than the corresponding HL score.
Where the GG score is equal to one (1), the cluster encompasses only chain-local residues within the structure. The core positions of these clusters may represent the nuclei of structurally stable and cohesive elements of local structure. Substitutions of these side chains might be expected to result primarily in local structural changes.
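The GG calculation can be sketched as follows. The function name `gg_score` is hypothetical, and returning 0 for a cluster containing a single residue is an assumption, since that case is not discussed in the text.

```python
def gg_score(cluster):
    """GG score: largest difference between consecutive residue numbers
    after the cluster members are placed in numerical order."""
    ordered = sorted(set(cluster))
    if len(ordered) < 2:
        return 0  # assumption: a lone residue has no gap to measure
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))

# Hypothetical clusters, given as lists of residue numbers.
single_segment = [10, 11, 12, 13]
two_segments = [3, 4, 5, 40, 41, 42]
three_segments = [3, 4, 20, 21, 40, 41]

print(gg_score(single_segment))   # 1: chain-local residues only
print(gg_score(two_segments))     # 35: one large gap between the two segments
print(gg_score(three_segments))   # 19: the total span of 38 is split into smaller gaps
```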
(c) Differential score (DIFF)
The GG score is most useful when subtracted from the HL score to produce the differential score:
$$\mathrm{DIFF} = \mathrm{HL} - \mathrm{GG}$$
Where the GG and HL scores are similar (as in single local sections of structure and junctions between two chain-distant structures) DIFF is small. Where HL > GG, i.e. at the junction of three or more sections of structure, DIFF is large (Figure 1, Table I). A low differential score may thus indicate a simple interface between two sections of chain or an element. A large differential score, however, would likely indicate a multi-way interface between three or more sections of structure. Such junctions between several sections of structure are potentially significant, as they are difficult to stabilize by conventional secondary structural arrangements, and so represent a strong potential for relative motion.
The type of interface enclosed by a particular cluster can be identified absolutely by consideration of the HL, GG and DIFF scores for that cluster, and the number of residues that contribute to the cluster (Figure 1).
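The four quantities named above (HL, GG, DIFF and cluster size) can be tabulated per residue position as in the sketch below. The function name `sid_table`, the dictionary layout and the example clusters are assumptions made for illustration; the sketch does not reproduce the classification rules of Figure 1.

```python
def sid_table(clusters):
    """For each residue position, return its HL, GG and DIFF scores and cluster size.

    clusters: dict {residue_number: iterable of residue numbers in its cluster}.
    """
    table = {}
    for residue, members in clusters.items():
        ordered = sorted(set(members))
        hl = ordered[-1] - ordered[0]
        # Largest step between consecutive members; 0 if the cluster has one member.
        gg = max((b - a for a, b in zip(ordered, ordered[1:])), default=0)
        table[residue] = {"HL": hl, "GG": gg, "DIFF": hl - gg, "size": len(ordered)}
    return table

# Hypothetical clusters keyed by their central residue.
example = {
    11: [10, 11, 12, 13],         # single local section of chain
    4: [3, 4, 5, 40, 41, 42],     # junction of two chain-distant sections
    20: [3, 4, 20, 21, 40, 41],   # junction of three sections
}

for residue, row in sid_table(example).items():
    print(residue, row)
```

In this toy case DIFF is 2 and 4 for the one- and two-section clusters but 19 for the three-section cluster, which is the pattern described in the text.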
(d) Comparing SID scores across members of a protein family
In addition to calculating SID scores for single exemplars of a protein family, it is also desirable to find out how SID scores for the family fold may be changed as a result of primary sequence evolution, activation mechanisms, ligand binding, chemical modification and so forth. When comparing SID data for homologous folds, or different states of the same fold, we obtained the simple standard deviation in score for every residue position in the chain because the crystal structures available represent only a small sample of the population of available conformations. This measure conveys the inherent variation in the score for each cluster and can be taken as a guide to regions of potential motion or adjustment in the molecule.
Where the standard deviation is close to zero, there is little or no variation in SID score across the structures, which implies that the 3D environment around the position in question is narrowly defined, and not prone to conformational changes. Where the deviation is large, however, the particular members of the cluster vary depending on the specific structure, which may imply some structural or conformational rearrangement between different forms that merits further investigation.
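A per-position standard deviation across several structures might be computed as sketched below. The function name `per_position_sd` is hypothetical, the example scores are invented, and two assumptions are made: that homologous positions have already been mapped to a common numbering, and that the sample (n − 1) form of the standard deviation is intended.

```python
import statistics

def per_position_sd(score_sets):
    """Sample standard deviation of the SID score at each residue position.

    score_sets: list of dicts {residue_number: SID score}, one dict per structure.
    Only positions present in every structure are compared.
    """
    shared = set.intersection(*(set(scores) for scores in score_sets))
    return {pos: statistics.stdev([scores[pos] for scores in score_sets])
            for pos in sorted(shared)}

# Invented HL scores for three related structures.
structures = [
    {1: 57, 2: 10, 3: 4},
    {1: 57, 2: 12, 3: 4},
    {1: 57, 2: 14, 3: 5},
]

sd = per_position_sd(structures)
print(sd)  # position 1 is invariant; position 2 varies most
```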
Step 4: Selection of significant SID scores
The outcome of the SID calculation for a 3D structure is a value for each residue position in the chain that reflects its topological situation. In view of the fact that fold topologies are so highly conserved in protein families, the results can be extrapolated to other related variants with a confidence in proportion to the extent of the sequence homology. All the SID values obtained for a chain fold are potentially diagnostic because they express site-specific topological viewpoints on behalf of the residues concerned. By extension, the magnitude of a SID value can therefore be treated as a reflection of the impact that a substitution, modification or perturbation of the residue side chain might have on the molecular conformation as a whole. There is no threshold definable above or below which a SID score for a residue position in a fold can be regarded as truly significant or non-significant. Similarly, although it is valid to compare scores between different variations or modifications of the same family fold, it is not appropriate to try and compare scores between different fold types (i.e. different protein families). The principal difficulty here is that the range of SID scores for a fold will depend on its overall chain length. The power of SID analysis lies in its ability to assess the characteristics of specific protein topologies internally from the standpoints of each of the residues that comprise its backbone, and in the rendering of that complex 3D information into a simple numerical output for easy comprehension.
To extract significance from the SID scores, there are different ways of proceeding depending on whether only one fold is being considered, or a series of similar folds are being compared. For an individual fold, the residue positions can be ordered from the highest scoring to the lowest, and working from the top score downwards, their clustering in the 3D fold can be examined (see below for our approach). The effect of this procedure is to highlight the most prominent interfaces in the molecular fold topology (in terms of linear chain separation) first and then to either extend these interfaces, or nucleate less prominent new ones, by systematically taking into account lower and lower scores. Alternatively, the lowest SID scores can be considered first, such that residue positions with relatively minor roles are clustered first to define apparently autonomously folded structural elements, sub-domains and domains. Thereafter, consideration of the higher scores brings into focus the interfaces between these elements and finally the most marked of the internal interfaces. Whichever route is chosen, the complete list of residue positions needs to be traversed to convey to an observer the relative internal contributions of residue positions and their 3D ensembles towards defining and maintaining the fold topology and its capacity for adjustment.
If desired, one can focus on exceptional interfaces in a fold rather than developing a complete view of the topological implications, by determining outliers in the SID data. To do this, the median and the upper (UQ) and lower (LQ) quartiles of the SID score distribution (HL, GG or differential) should be determined. The inter-quartile distance (IQD) is then defined as (UQ − LQ). Clusters that lie outside the range defined by (UQ + 1.5 × IQD) > SID score > (LQ − 1.5 × IQD) are classed as outliers. We used the boxplot function of Kaleidagraph v3.07 (Synergy Software).
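The outlier rule can be sketched with the Python standard library. This is an illustration only: the function name `sid_outliers` is hypothetical, the scores are invented, and the quartile convention (`method="inclusive"`) is an assumption that may differ from the one used by the Kaleidagraph boxplot function.

```python
import statistics

def sid_outliers(scores, whisker=1.5):
    """Return the positions whose SID score lies outside
    (LQ - whisker * IQD, UQ + whisker * IQD).

    scores: dict {residue_number: SID score}.
    """
    values = sorted(scores.values())
    lq, _median, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqd = uq - lq
    low, high = lq - whisker * iqd, uq + whisker * iqd
    return {res: s for res, s in scores.items() if s < low or s > high}

# Invented scores: eight unremarkable positions and one extreme one.
scores = {1: 2, 2: 3, 3: 3, 4: 4, 5: 4, 6: 5, 7: 5, 8: 6, 9: 40}
outliers = sid_outliers(scores)
print(outliers)  # {9: 40}
```

Here LQ = 3 and UQ = 5, so IQD = 2 and the accepted range runs from 0 to 8; only position 9 falls outside it.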
For each of the protein folds analysed in this paper, we chose to work down from the highest SID scores so that the most significant sub-structural interfaces were highlighted first. From there, we incorporated successively lower scores to detect lesser and tributary interfaces such that the molecule became divided successively into quasi-autonomous domains and sub-domains. To simplify data presentation, limit the length of this paper and demonstrate the potential of SID analysis in its fundamental form, our reporting is based on the HL calculation option and the illustrations are of the HL data. The GG and subsequent DIFF calculation options (results not shown) were employed in the comparison of uncomplexed and complexed conformations to define more closely the nature of the differences between them.
When comparing the SID data for two or more related folds (i.e. either folds of related variants, or different states of one fold), the interest lies in discovering which residue positions have sensed the largest changes in their topological situation. For example, do they sense a lowering in SID value (such as might be caused by a local interface widening or opening) or an increase in value (interface narrowing or closing)? Again, what constitutes a quantitatively significant or insignificant change is internally defined and difference values obtained for one fold type have no inherent comparability with such values obtained for a contrasting fold type. Having been made aware of the localities in the fold that have given rise to the most pronounced differences in SID score, one must return to a comparison of the original 3D structures to establish the actual scale of the changes.
In our examination of the data, we were particularly interested in prominent intramolecular interfaces that appeared to be vulnerable to intermolecular recognition events. Such interfaces, if disturbed, would have the potential to instigate an adjustment of the chain topology, and therefore a conformational change in the protein, in response to the binding. If such interfaces could be associated with known binding properties, then involvement of conformational changes in the translation of the binding into an action (e.g. catalysis or activation) needed to be considered seriously. Thus, we needed an expression of the surface accessibility of the interfaces, and this was achieved by correlating SID score for a residue position versus the number of other positions found within its 7 Å vicinity (example of plot shown in Figure 2). Our attention was directed to the combinations of high SID score (i.e. important interfacial location) and few neighbouring residues (surface exposure). These residue positions were then examined back in the context of the 3D fold.
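The selection of high-scoring, sparsely surrounded positions can be sketched as a simple filter. The text describes reading these positions off a plot, so the explicit cut-offs `min_score` and `max_neighbours` below are assumptions introduced for illustration, as are the function name and the example clusters.

```python
def exposed_interface_candidates(clusters, scores, min_score, max_neighbours):
    """List (residue, SID score, neighbour count) for positions that combine
    a high SID score with few neighbouring residues.

    clusters: dict {residue_number: residue numbers within its 7 A sphere}.
    scores:   dict {residue_number: SID score}.
    The two cut-offs are user-chosen; they are not defined by the method itself.
    """
    picks = []
    for residue, members in clusters.items():
        neighbours = len(set(members) - {residue})  # other positions in the sphere
        if scores[residue] >= min_score and neighbours <= max_neighbours:
            picks.append((residue, scores[residue], neighbours))
    # Highest score first; ties broken by fewest neighbours.
    return sorted(picks, key=lambda p: (-p[1], p[2]))

# Hypothetical clusters for a 58-residue chain.
clusters = {
    1: [1, 2, 57, 58],                                           # exposed terminus
    23: [5, 6, 20, 21, 22, 23, 24, 25, 29, 30, 43, 44, 55],      # buried junction
    30: [29, 30, 31],                                            # exposed but chain-local
}
scores = {res: max(m) - min(m) for res, m in clusters.items()}   # HL scores

candidates = exposed_interface_candidates(clusters, scores, min_score=40, max_neighbours=5)
print(candidates)  # [(1, 57, 3)]
```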
Cluster merging is the process by which the clusters obtained in the scoring stage of SID analysis are combined to visualise the structural elements and interfaces. It operates identically to the procedure of cluster merging used by regiovariation analysis (Cardle and Dufton, 1994, 1997) as an iterative process.
First, clusters are ranked according to their SID score (either highest to lowest or vice versa). For the purposes of explanation we ranked the clusters from highest to lowest SID score. Beginning the first round at the clusters with the highest SID score, we examine the members of all the clusters. If two or more clusters share a common residue position, then the members of those clusters are combined to form a single larger cluster. Once this procedure has been completed for all clusters with that SID score, the process moves on to the next highest SID score. In this round, all the clusters at that score are compared both with each other and with the existing merged and unmerged clusters from the last round. Again, if any clusters share a common member, they are merged. This procedure is repeated for progressively lower SID scores until all clusters have been examined, or all positions have been allocated to a single cluster.
When merging takes place from lowest to highest SID score clusters, the stable structural elements are identified. When merging clusters goes in the opposite direction, from highest to lowest SID score, the interfaces between those elements are seen.
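The merging procedure can be sketched as below. This is one reading of the description rather than the original implementation: the function name `merge_clusters`, the choice to record the merged groups after each score level and the example data are assumptions.

```python
def merge_clusters(clusters, scores, highest_first=True):
    """Merge clusters round by round, one SID score level per round.

    clusters: dict {residue_number: residue numbers in its cluster}.
    scores:   dict {residue_number: SID score of that cluster}.
    Returns a list of (score_level, merged_groups) snapshots, one per round.
    """
    merged = []    # pairwise-disjoint sets of residue numbers
    history = []
    for level in sorted(set(scores.values()), reverse=highest_first):
        for residue in sorted(r for r, s in scores.items() if s == level):
            group = set(clusters[residue])
            untouched = []
            for existing in merged:
                if existing & group:       # shares a residue position: combine
                    group |= existing
                else:
                    untouched.append(existing)
            merged = untouched + [group]
        history.append((level, sorted(sorted(g) for g in merged)))
    return history

# Hypothetical clusters and their HL scores.
clusters = {1: [1, 2, 9, 10], 10: [1, 9, 10], 3: [2, 3, 4], 5: [4, 5, 6], 6: [5, 6, 7]}
scores = {res: max(m) - min(m) for res, m in clusters.items()}

for level, groups in merge_clusters(clusters, scores):
    print(level, groups)
```

Passing `highest_first=False` reverses the ranking, so the first snapshot then shows the chain-local elements rather than the interface between the chain ends.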
Choice of proteins
We selected four exemplars of the best known and experimentally well-studied protein folds to examine initially, namely those of bovine pancreatic trypsin inhibitor (BPTI) (PDB code 4pti), secretory prophospholipase A2 (2bp2), chymotrypsin (2gmt) and carboxypeptidase A (2ctb), from the PDB database (http://pdb-browsers.ebi.ac.uk). This selection of folds covers sequence lengths from 58 to 307 residues and the folds contain various mixtures of classical secondary structure elements, and in the two larger examples, readily identifiable domains. Brief descriptions of the four families are given below.
The pancreatic trypsin inhibitor fold is composed of around 58 amino acid residues, which are formed into a twin-stranded β-sheet with two helical sections packed against it. The fold, which is stabilised by three disulfide bridges, is representative of a large family found throughout living organisms. Most examples (BPTI included) act as serine proteinase inhibitors (e.g. of the chymotrypsin considered herein) by binding strongly within the enzyme's substrate binding site. Some examples found in snake venom can block potassium (dendrotoxin) or calcium (calcicludine) channels and are thus neurotoxic. These actions (protease inhibition and neurotoxicity) are usually mutually exclusive.
The phospholipase A2 fold is comprised of about 123 residues and contains five sections of helix, a twin-stranded β-sheet and several disulfide bridges whose location varies between enzyme subclasses. The fold is representative of a large family of enzymes variously capable of hydrolysing 3-sn-phosphoglycerides at the C2 position in either micellar or bilayer form. This example acts primarily on micellar substrates after activation and usually requires a calcium atom for activity. Some examples from snake venom are additionally myotoxic and/or neurotoxic.
The chymotrypsin fold comprises 245 residues formed into two predominantly β-sheeted domains. Chymotrypsin is the founding member of the large serine proteinase family of enzymes (e.g. trypsin, elastase, kallikrein), all of which are capable of cleaving protein sequences internally, but with different specificities.
The carboxypeptidase A fold contains some 307 residues formed into a central multistrand β-sheet with helices packed against it. Its main enzymic activity is to cleave C-terminal residues from protein substrates according to specificity, and the enzyme requires a zinc atom for catalytic activity.
A supporting reason for the choice of these folds was that 3D data also exist for precursor folds, for folds of sequence variants and for the folds complexed with other ligands. This allows for the controlled comparison of structurally, functionally and sequentially induced changes to the fold. The CATH structural (http://www.biochem.ucl.ac.uk/bsm/cath) and 3Dee domain definition (http://circinus.ebi.ac.uk:8080/3Dee/help/help_intro.html) databases were also consulted to obtain the current view of the sub-domain structure of the folds.
Results
Interfaces highlighted by SID. The highest SID score is 57 and is shared by residue positions 1, 57 and 58. These positions and their localities emphasize the interfacing of the N- and C-termini (which mark the start and end of helices respectively). Consideration of successively lower SID scores highlights more of this interface until residue position 23 and its neighbours 29 and 43 become included. Positions 23/29 lie opposite each other in the central, twisted, β-sheet, and form the centre of a packing interface for the N- and C-terminal helices against this β-sheet. Position 43 is located within one of two notable chain segments in the molecule that do not adopt classical secondary structures. This segment crosses over the edge of the β-sheet in the vicinity of position 23. Hence, the highest SID scores (i.e. 57–50) relate to a multi-way interface between the terminal helices, the β-sheet and one of the unstructured segments. Within this locale is found one disulfide bridge, that links the two helices together (5–55), and a second bridge that links the C-terminal helix to the β-sheet (30–51). Continuation down the scale of SID scores serves to expand the interface already defined to include more positions within the N-terminal helix and positions associated with the interface of the anti-parallel strands that run between the helices and the 14–38 disulfide bridge.
Interfacial adjustments related to evolution and enzyme binding. Several 3D structures for different BPTI variants have been obtained, both as free forms and as complexed to serine proteinases (Bode and Huber, 1992). This enables alterations to conformation (expressed as variations in SID scores for homologous positions) caused by sequence evolution and enzyme binding to be discerned.
The uncomplexed versions show significant variations around positions 28, 29, 47, 54 and 56–58. Lesser alterations are found around 9, 10, 13, 30, 33, 44 and 48. These suggest that evolutionary changes have mainly altered the interfacing of the C-terminal helix with the C-terminal side of the β-sheet, while the smaller changes have mostly related to the interfacing of the two anti-parallel non-classically structured strands with each other and the opposite side of the β-sheet. The C-terminal helix is connected directly to one of the unstructured strands by the 44–47 segment that stretches round the edge of the sheet.
Upon complexation with serine proteinase enzymes, interfacial adjustment is seen around positions 6, 25, 28, 47, 54 and 56–58. Smaller events occur around 1–2, 5, 9–10, 18–19, 21, 29, 30, 44, 46, 48 and 52. These changes are similar to those observed above, but are more extensive, this time also involving the N-terminal helix interfacing with the β-sheet.
Apparently, whereas natural sequence variation has consequences primarily for the interfacing of the C-terminal helix to the remainder of the molecule (and thereby one of the unstructured strands to which it is connected), binding to an enzyme also changes the interfacing of the N-terminal helix such that both unstructured strands may be affected.
Phospholipase A2 fold (Figures 3 and 5)
Interfaces highlighted by SID. The highest SID score is 99 and is achieved by the vicinities of residue positions 8, 12 and 103. In the fold, the localities of these three positions overlap such that three separate secondary structure elements are encompassed. For the most part, the locale highlights the interfacing between the end segment of the N-terminal helix (helix A) with one of the two major helices that form the core of the molecule (helix E). Also included is a smaller interface between helix A and the hairpin turn of the β-sheet (i.e. positions 12, 80, 81). In other words, the region is a sandwiching of helix A between helix E and the β-sheet. Part of this locale constitutes the hydrophobic channel through which the phospholipid substrate is thought to be drawn (Scott et al., 1991; Arni and Ward, 1996). Taking into account lower SID scores down to 91 expands this region and introduces two satellite areas that are initially unconnected to the original. The second site is formed by an interface of the non-classically structured segments 24–36 and 115–123. The third is another three-element interface between the N-terminal residue and two transitional segments, namely where helix D joins to the β-sheet, and where the β-sheet joins helix E (residues 1, 66–73, 92, 95).
The remaining high scoring site involving the N-terminal residue is lent immediate stability by the 61–91 disulfide bridge and to a lesser extent by the 84–96 and 11–77 bridges (as noted above, the 11–77 bridge is absent in Type II phospholipases). Part of this locale (helix D) is missing in the majority of the known variants because five residues (62–66) have been deleted in them relative to the pancreatic enzyme.
Surface-accessible interfaces. When the SID scores are judged against local residue packing density, the two most externally accessible and high scoring locales are the calcium binding vicinity and the area that includes the N-terminus. Clearly, the calcium binding area has to permit entry of the metal ion and the formation around it of the correct ligand geometry (the metal can be removed or added without major denaturation). The vicinity of positions 69 and 72 that encompasses position 1 has a lower accessibility in the proenzyme because of the presence of an N-terminal extension that is removed in the activation process. In other members of the family, accessibility is further modified by the absence of helix D.
Interfacial adjustments related to evolution. Eight experimentally determined 3D structures for variants of this enzyme were submitted to SID analysis and significant variations noted. The most prominent adjustments have been in the vicinities of positions 21, 50 and 121–123, with lesser changes around 11, 22, 33, 47, 68, 73, 77, 80–82, 94, 100, 110, 117 and 132. When the clustering of these positions in the fold is considered, areas similar to those highlighted above are found to be involved, namely around the N-terminal, the C-terminal end of helix A, and the calcium binding domain. There is also some variation in the proximity of the core helices C and E.
Chymotrypsin fold (Figures 3 and 6)
Interfaces highlighted by SID. The highest SID scores (i.e. above 200) relate to two areas in the chain fold with approximately equal prominence. One is the vicinity of positions 189/190 where the N-terminal strand intrudes into domain 2, while the other is around positions 120/121, the central point of the sequence about which the molecular fold is approximately palindromic. Positions 120/121 are also the mid-point of the main-chain crossover from domain 1 to domain 2.
The 120/121 environment includes the domain crossover strand, part of the domain 1/domain 2 interface and the original N-terminus. It can be regarded truly as the central point of the fold and lies below the adjacent catalytic residues His57 (contributed from domain 1) and Ser195 (contributed from domain 2). The 189/190 locale is close by within domain 2.
As progressively lower SID scores are considered, more and more of the domain interface is highlighted so that a stage is reached when all the interface becomes merged with the S1 pocket/second N-terminal vicinity.
Surface-accessible interfaces. Most outstanding is the original N-terminal strand in the vicinity of residue position 3 where positions 1–5 encounter positions 120–122 of the domain crossover and positions 204–208 of domain 2. The implied vulnerability to external modification is substantially reduced by disulfide bridge 1–122. The bridge links the N-terminal strand to the domain crossover link. However, the potential contribution to molecular stability of this N-terminal chain segment is changed drastically by the activation cleavage between positions 15 and 16, the consequence being that the segment can no longer be regarded as an integral part of the fold (the equivalent N-terminal strand is lost from trypsin because this bridge has no equivalent in trypsin).
The locality of 190 is again prominent and the area of interest broadly comprises strands 16–20, 143–146, 184–194 and 220–226. The need for external accessibility in this region is clear: it has to permit entry of the P1 side chain of a substrate (e.g. a tyrosyl side chain) for recognition purposes. Some stability is given to the area by the 191–220 disulfide bridge, which is wholly contained within it, and the nearby 168–182 bridge. In the transition between delta and alpha/gamma chymotrypsin, positions 147–148 are cleaved out, which would loosen up the area.
A third area with high SID score and relatively high accessibility is where position 193 on domain 2 interfaces with strand segments 33–34 and 39–42 on domain 1. Although the domain interface produces high SID scores along its length, it is for the most part densely packed and inaccessible except for this particular region.
Interfacial adjustments related to evolution, activation and inhibitor binding. The folds of seven uncomplexed trypsins were analysed, the results averaged and the positions with the most significant standard deviations in SID score noted. The most prominent conformational variations concerned the environments of positions 30, 103, 139, 141, 203, 221 and 235. More minor variations are seen at 66, 70, 79, 91, 94, 96, 101, 113, 125, 177 and 219. These positions highlight the outer surface of the molecule, particularly the loop extremities. There are also notable clusters at either end of the interface between the two major domains.
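The averaging step described in the preceding paragraph can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that per-residue SID scores have already been computed for each structure and aligned to a common residue numbering, and it uses the population standard deviation for simplicity. The function name `rank_variable_positions` and the toy scores are hypothetical.

```python
from statistics import mean, pstdev

def rank_variable_positions(scores_by_structure, positions, top_n=3):
    """Return the top_n positions with the largest spread in score.

    scores_by_structure: one list of scores per structure, all aligned to
    the residue numbering in `positions` (an assumption of this sketch).
    Each returned row is (position, mean score, standard deviation).
    """
    n_pos = len(positions)
    if any(len(s) != n_pos for s in scores_by_structure):
        raise ValueError("every structure must supply one score per position")
    summary = []
    for i, pos in enumerate(positions):
        column = [s[i] for s in scores_by_structure]
        summary.append((pos, mean(column), pstdev(column)))
    # Largest standard deviation first
    summary.sort(key=lambda row: row[2], reverse=True)
    return summary[:top_n]

# Invented toy scores for three structures and four positions
positions = [30, 31, 32, 33]
toy_scores = [
    [100, 50, 80, 20],
    [140, 52, 80, 25],
    [60, 48, 80, 15],
]
top = rank_variable_positions(toy_scores, positions, top_n=2)
for pos, avg, sd in top:
    print(pos, round(avg, 1), round(sd, 1))
```

In this toy set the first position varies most between structures, so it heads the ranking; a real analysis would also need a criterion for what counts as a significant deviation, which is not specified here.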
To ascertain the fold adjustments involved in the activation of trypsinogen to trypsin, 11 trypsinogen folds were calculated and averaged for comparison with the free enzyme. It is clear from the SID results for these folds that there is substantial conformational variation amongst the trypsinogens, but on activation to trypsin, the fold becomes defined within much more narrow limits. In order of decreasing magnitude, the following positions experience significant alterations in their SID scores as a result of activation and thereafter show much less score variation: 191, 146, 16, 142, 190, 143, 41, 145, 17, 189, 144, 150, 159, 140, 188, 114, 23, 152, 193. These positions are all associated with the setting up of the fully functional S1 specificity pocket architecture (as caused by the intrusion into the second domain of the new chain N-terminal created for the enzyme by the activation process). While 114 lies some distance away in the first domain, it is directly affected by the reorientation of the nearby N-terminal chain segment. In contrast, although positions 79, 103, 125 and 139 also experience change as a result of the activation process, the variation in SID score suggests they move to a less restricted situation. Positions 79 and 139 lie adjacent to the N-terminal strand while 103 and 125 sandwich the start of the C-terminal helix, so it can be deduced that the activation process affects the relationship of both chain termini with respect to the major domains. Inspection of the relevant SID scores shows a narrowing of the major domain interface is involved. Only one position experiences a significant alteration in SID score, yet is conformationally fixed in both trypsinogen and trypsin. This is position 194, a critical part of the S1 specificity pocket which effectively switches from one fixed position to another.
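One possible way to operationalize the comparison of two ensembles described above is sketched below. It is a hedged illustration only: the thresholds `shift_cut` and `spread_cut`, the function name `compare_states`, the label wording and all the score values are assumptions introduced for the example, and the paper's own significance criteria may differ.

```python
from statistics import mean, pstdev

def compare_states(before, after, shift_cut=20.0, spread_cut=5.0):
    """Classify positions by mean shift and score spread in two ensembles.

    before/after: dict mapping position -> list of scores, one per structure.
    shift_cut and spread_cut are arbitrary illustrative thresholds.
    """
    result = {}
    for pos in sorted(set(before) & set(after)):
        shift = mean(after[pos]) - mean(before[pos])
        sd_before = pstdev(before[pos])
        sd_after = pstdev(after[pos])
        if abs(shift) < shift_cut:
            label = "unchanged"
        elif sd_before <= spread_cut and sd_after <= spread_cut:
            label = "switched between two fixed states"
        elif sd_after < sd_before:
            label = "shifted and more tightly defined"
        else:
            label = "shifted and less restricted"
        result[pos] = (shift, label)
    return result

# Invented toy scores (three structures per state); values are not real data
zymogen = {191: [40, 90, 60], 194: [50, 52, 51], 79: [100, 101, 99], 200: [70, 72, 71]}
enzyme = {191: [150, 152, 151], 194: [120, 121, 119], 79: [60, 90, 30], 200: [75, 73, 74]}

classified = compare_states(zymogen, enzyme)
for pos, (shift, label) in classified.items():
    print(pos, round(shift, 1), label)
```

The four labels mirror the four behaviours discussed in the text: positions that settle down on activation, positions that become less restricted, a position that is fixed in both states but at different scores, and positions that do not change.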
The final comparison that can be made is between the free trypsins and trypsins complexed with inhibitors. Over 60 examples of the complexed trypsin fold were averaged and compared. In order of decreasing magnitude, complex formation changes specifically the SID-measured environments of positions 116, 114, 58, 146, 24, 71 and 187. The changes are generally much smaller than those seen in the activation process, but they also involve a loss of juxtapositional consistency when the different complexes are compared. The positions relate to the surroundings of position 24 (i.e. the locale of interaction between the N-terminus and the S1 specificity pocket), so it is implied that S1 pocket occupancy has either loosened up the area, or that different inhibitors change the locale in different ways. A notable change in isolation is seen at position 58, possibly related to the proximity of the catalytically vital histidine 57.
Carboxypeptidase fold (Figures 3 and 7)
Interfaces highlighted by SID. The highest SID scores relate primarily to the 3D confluence of three chain segments within the structure. These are 8–13, 67–85 and 277–294. At lower scores, segments 196–198 and 116–127 become included in the confluence. A satellite site (comprising segment 95–102 and position 305, the C-terminus) arises nearby. Inspection of the fold topology according to the hierarchy of the SID results divides the molecule initially into two domains (1–189 and 190–307) wherein the latter domain is itself an assemblage of sub-domains 200–266 and 272–307. This enzyme is not widely recognized as having a domain structure because of the β-sheet that forms the core of the molecule, but the domains can be readily observed by undoing the hydrogen bonding between the parallel β-strands 60–67 and 189–196.
Surface-accessible interfaces. The most accessible part of the high scoring area is the conjunction of chain segments 9–13, 71–85 and 277–294, while a secondary focus involves the C-terminal (segments 99–103, 301–307). The third focus is the confluence of segments 56–66, 187–197 and 263–272. The latter area contains E270 and is linked to the most accessible focus via the zinc binding site (i.e. residues 72 and 197).
There is only one disulfide bond in the molecule, and it does not act to stabilize any of the prominent structural interfaces detected. Instead, it lies wholly within a non-classically structured part of the 1–189 domain that is probably too small to have sufficient inherent stability from non-covalent forces.
Interfacial adjustments related to evolution and inhibitor binding. The available 3D data for this enzyme not only includes several variants, but also the proenzyme and the enzyme complexed to small inhibitors. For the uncomplexed activated forms, the most pronounced SID-detected variations occur around positions 58, 73, 182 and 302, with lesser changes around 14, 47, 59, 116, 120, 144, 231, 275 and 297. These correspond to adjustments in the vicinity of the zinc ion and the C-terminus.
The activation process involves substantial alterations, initiated by the newly formed N-terminus, with positions 12, 40, 73 and 144 experiencing the most environmental changes. The positions are in line with one another on one face of the β-sheet such that the N-terminus is able to exert influence upon the zinc locality and Asn/Arg 144/145 which bind the substrate carboxyl group. A second site of adjustment again involves the locality of the C-terminus of the enzyme.
The complexation process also causes adjustment near the N-terminus, centred about position 77, but this time extending onwards over the edge of the β-sheet to its other side (182, 237) via 153–155 and the Zn site. The perturbations around 77 are caused by inhibitor binding in the S1 site while those on the opposite side of the β-sheet are caused by binding in the S1' site. Yet again there is some adjustment near the C-terminus.
Discussion |
---|
For the three larger folds, we observed that crowded residue positions tend to generate high SID scores (i.e. the more residues that are clustered within 7 Å of a given residue, the greater is the probability that residues from far away in the sequence will be encompassed). This reflects a tendency for chain interfaces to be buried in the interior of the molecular superstructure (presumably to minimise vulnerability to externally induced fold disruption). However, the highest SID scores seen for each protein fold show a different relationship to local residue packing density. There is a strikingly even distribution across the different packing densities for these prominent scores, with densities of 8–11 residues per 7 Å sphere being the most common. In other words, the most long-range interfaces (in terms of distance apart in the chain sequence of the contributing residues) tend not to be completely buried; they are for the most part only moderately packed localities and therefore potentially of restricted accessibility from the external environment (see Figure 2 for the carboxypeptidase case). In some instances, the individual folds possess more than one prominent substructure interface. These major interfaces appear to represent significant mid-ground between surface and core where only certain ligands can gain access to some of the more important determinants of conformational stability.
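The local packing density referred to above (the number of residues clustered within 7 Å of a given residue) can be illustrated with a short sketch. Several details are assumptions made for the example rather than statements about the method used in this work: each residue is represented by a single point (for instance its Cα atom), the residue itself is excluded from its own count, and the function name `packing_density` is hypothetical.

```python
import math

def packing_density(coords, radius=7.0):
    """Count, for each residue, how many other residues lie within `radius`.

    coords: list of (x, y, z) tuples in angstroms, one representative point
    per residue (using C-alpha atoms is an assumption of this sketch).
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(n)
    return counts

# Toy example: five points on a straight line, 3.8 angstroms apart
toy_coords = [(i * 3.8, 0.0, 0.0) for i in range(5)]
densities = packing_density(toy_coords)
print(densities)
```

For an extended chain like this toy example, only the immediate sequence neighbours fall inside the sphere; in a folded protein the count rises wherever segments that are distant in sequence pack together, which is the situation the paragraph above relates to high SID scores.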
Vulnerable interfaces (i.e. high scoring and with apparent external accessibility) are frequently compensated wholly or partially by the presence of disulfide bridges. These provide strong covalent linkages between the distant sequence segments that form the interface (e.g. Kunitz inhibitors, phospholipases, serine proteinases). This is not to imply that high scoring SID areas without disulfide bridges are topologically unstable, merely that a potential for a significant sub-structural realignment might remain if the typical non-covalent stabilizing forces (i.e. the hydrophobic/hydrophobic, charge/charge and charge/aromatic interactions identified by Brocchieri and Karlin as being chiefly responsible for the internal stabilization of protein fold arrangements) were not maximized, or were reduced by an external agency (Brocchieri and Karlin, 1995). Likewise, the presence of disulfide bridges across an interface does not remove the possibility of all adjustment; it implies only that any adjustments should be small in scale.
Prominent and externally vulnerable interfaces may also accommodate metal atoms (e.g. phospholipases, carboxypeptidases), suggesting that these guests are required for the conformational impact of their liganding as well as their particular chemical properties.
SID scores facilitate the division of the molecules into sub-domains
The identification of the most prominent sub-structural interfaces by the SID calculation also indicates which parts of the superstructure may deserve to be identified as semi-autonomous domains. There have been many attempts to systematically detect sub-structural domains in folds based on a range of intuitive and subjective criteria. According to the widely cited CATH and 3Dee protein database assignments (see Materials and methods), the pancreatic trypsin inhibitor, phospholipase and carboxypeptidase folds are all classifiable as single domains. The serine proteinase fold, however, is classified as having two domains, one comprising the N-terminal segment with residues 121–233 and the other the C-terminal segment with residues 24–121. The SID results present a quite different picture, especially in respect of the phospholipase and carboxypeptidase folds.
Functional sites are associated with major interfaces
Our evidence suggests that for the four folds examined, the functional and guest binding sites have developed in molecular loci where the various tributary interfaces between the domains and sub-domains combine and converge in 3D to create particularly prominent and multi-component interfaces (witness the Pn residues in the inhibitors, the N-terminal locale/calcium site/hydrophobic channel in the phospholipases, the S1 pocket/catalytic site in serine proteinases, and the zinc binding/protein cleaving site of carboxypeptidases). We expect this has provided the opportunity for residue substitutions remote from the functional sites, and substitutions scattered over a wide area of the molecule, to have had a bearing on the evolutionary development and tuning of the functional sites. These substitutions could have caused adjustments of tributary interfaces in their vicinity, with the consequence that effects were relayed downstream via the interfacial connectivity to the functional site itself. Presumably, most of the effects would have been received at the functional site as slight changes of juxtaposition amongst the functional site residues. Depending on the character and connectivity of the interfacial paths along which they might travel, mutation-induced adjustments might be amplified, damped or combined to create more subtle effects than would be achieved by any side chain substitution in the site itself. In the lifetime of the protein, a modification or binding remote from the functional site could also bring about an adjustment of the functional site via the interfacial connectivity. Thus, the significance of a specific topology is that it will dictate the directionality of the adjustments possible, and also determine from which parts of the molecule mutation or binding events (e.g. allosteric factors, metal ions) will be able to exert an influence on particular components of the functional site.
Interfacial adjustment may also play a mechanistic role
Given the network of tributary interfaces that can be identified as converging upon a functional site, it should also be considered that a binding event in the functional site could radiate influence out to a large proportion of the molecule via the same interfacial network. However, whereas the evolutionary process adjusts the site in small steps from different locations, a ligand binding event (especially in the cases of the serine proteinases and carboxypeptidases) could produce a multipoint impact on a large proportion of the highest order interfaces simultaneously. The combination of the incoming ligand imposing some of its geometry upon the host protein, plus the possibility of some interface-stabilizing residues switching from intra- to intermolecular binding, could precipitate large-scale interfacial adjustments that might impact across the whole of the host protein. There is the clear opportunity, therefore, for global molecular adjustment to be harnessed to any ensuing translation of the binding into a response (e.g. channel opening or catalysis).
With this possibility in mind, we re-examined the four folds above to see if the execution of their prime function had the potential to involve significant internal domain reorientation.
Pancreatic trypsin inhibitor fold
In such a simple molecule, wherein the fold is defined by a β-sheet with two helices packed against it, the adjustment potential resides in altering the juxtapositions of the three elements of secondary structure (Pritchard and Dufton, 1999). The scope for relative motion is reduced substantially by the 5–55 and 30–51 disulfide bridges, but what remains has consequences primarily for the two strands that link the helices to the β-sheet. This is because the two strands do not adopt classical secondary conformations or have any special anchorage to the sheet. In effect, the two strands are strung out from one end of the sheet to the other with alterations in tension having the most likely impact upon them. Given the comprehensive experimental data, there is little need for speculation other than to suggest that while adjustment of both the helix/sheet interfaces appears to be an integral part of the enzyme binding process, adjustment of only the C-terminal helix/sheet interface (and thereby the tension of the 36–45 strand) seems to be a profitable evolutionary process. Interestingly, SID defines the C-terminal helix as a more autonomous part of the structure than the N-terminal helix.
Phospholipase A2 fold
The most vulnerable aspects of this molecular fold are for the most part compensated for by the strategic locations of disulfide bridges and the liganding of the calcium ion. However, the A helix, which is involved in high scoring regions at each end and is linked to the calcium binding region via helix B, has no covalent anchorage save the 11–77 disulfide bridge, and even this is absent in the Type II variants. Moreover, the topological environment of the start of helix A is variable between family members because of deletions/insertions in and around helix D. SID analysis, coupled with the experimentally observed 3D variation seen between homologues (above), suggests that the helix AB segment (i.e. chain segment 1–30) is that part of the fold most likely to be prone to externally/internally induced adjustment with respect to the remaining superstructure, perhaps effecting an opening/closing hinge movement about its midpoint (the helix AB/helix E crossover). In view of structural variation, it may be disposed differently in different family representatives, and have greater freedom in Type II phospholipases.
In relation to what has been shown experimentally about the enzyme activation and substrate binding, the areas highlighted by SID are associated closely with some of the most critical features. The highest scoring site is located close to where the hydrocarbon chains of phospholipid substrates are expected to bind (i.e. in inhibitor complexes, the hydrocarbon chains lie in the elbow created by helices A and B) (Scott et al., 1991; Arni and Ward, 1996). The calcium binding area (the second region highlighted by SID) is expected to bind the phosphate moiety of the substrate, while the remaining high SID region is around the N-terminal (Scott and Sigler, 1994). The latter area plays a central role in activating the enzyme to attack aggregated substrate (e.g. in micellar form). In the proenzyme, an N-terminal extension causes the locality to be disordered and only substrate monomers can be hydrolysed. Removal of the extension causes the newly formed α-amino function to link to D99 via a water molecule and hydrogen bond to residues 4 and 71, actions which set up both the catalytic mechanism and the Interface Recognition Site (IRS) for optimal activity against substrate aggregates (Dijkstra et al., 1983; Scott et al., 1991; Dua et al., 1995). It has been postulated that aggregated substrate is able to stimulate the enzymic action via a conformational change in the protein, mediated by the encounter with the IRS (aggregated substrate can also enhance inhibitor and calcium binding) (Slotboom et al., 1982). More recent findings confirm that binding to lipid/water interfaces causes the N-terminal helix and β-sheet to change position (VandenBerg et al., 1995). Moreover, in the examples of this enzyme encountered frequently as dimers, the IRS and catalytic site would be inaccessible to substrate unless a conformational change occurred to expose them (Brunie et al., 1985; daSilvaGiotto et al., 1998). Interestingly, in the dimers, the interfacing involves the two most exposed areas of high SID: the calcium binding area of one molecule interacts with the N-terminal area of the second in a symmetrical fashion.
There appear, therefore, to be strong grounds for anticipating some element of conformational adjustment in the mechanism of these enzymes, and the SID results give focus to the idea. Secondary binding of the substrate monomers (i.e. via the hydrocarbon chains) occurs close to the site of highest SID between helices A and B, and the N-terminal locality and the calcium binding area are both secondary areas of high SID that are associated with the mechanistic events. These regions are potentially in communication via change in the conformational disposition of the helix A/helix B segment. The impetus for adjustment could arise from any of the three sites individually or from an encounter between the IRS and an aggregated substrate (in which case the substrate could influence all three high SID areas simultaneously). An important experimental finding of relevance is that catalytic efficiency of the enzyme increases as the hydrocarbon chain length is increased (Slotboom et al., 1982); this would befit a mechanism in which secondary binding at the site of highest SID was translated into an allosteric optimization of catalytic efficiency. As to the possible extent of the adjustment in the molecule, it will be recalled that SID identifies two sub-domains with wrap around arms, namely 1–71 and 72–109/101–123. According to the evidence detailed above, the intramolecular docking of one of these arms (the N-terminal one) is influenced directly by binding to substrate hydrocarbon chains and lipid/water interfaces, while the other (the C-terminal one) could be disturbed by any change to the calcium ligand field (e.g. by interaction with the substrate phosphate). Finally, the N-terminal resides near the hinge of the two domains (as retained by the 61–91 disulfide bridge and adjusted by deletions/insertions and the β-sheet disposition).
All these circumstances are consistent with activation and substrate binding mechanisms that have, either over evolutionary time or in real time, consequences for the juxtaposition of helices C and E. Since the catalytic residues are found at this interface (Janssen et al., 1999), contributed from either side, the ultimate issue would seem to be the adjustability of their precise 3D orientation.
Chymotrypsin fold
With two major domains in the molecule of similar size and fold, and no cross interface disulfide bridges, the largest scale conformational adjustments that can be entertained involve some kind of movement of these domains with respect to each other. The highlighting by SID of the area of the original N-terminus, and of the new N-terminus/S1 specificity pocket would be consistent with this notion because it is clear that the N-terminal strand is the main obstacle to domain adjustment. The N-terminal strand is essentially a wrap around arm that emerges from domain 1 to embrace domain 2. In chymotrypsinogen, this arm is a substantial double strand loop with the N-terminal anchored at the domain crossover. This would appear to prevent the possibility of any significant movement of the two domains relative to each other. However, in the transition from proenzyme to active enzyme, the N-terminal loop is cut at its point of reversal, removing the restraint that the 1–122 disulfide and interaction of the 1–15 segment with domain 2 could have against domain movement. Thereafter, the new N-terminal becomes associated with the disposition of the S1 binding pocket, and in doing so is placed in a situation where it may be affected by substrate binding in the S1 pocket. Thus, the SID results suggest that the greatest potential for conformational adjustment is centred round the S1 pocket and the nearby domain interface, which includes the catalytic residues 57 and 195 responsible for the actual peptide bond cleavage. Moreover, the restraint against domain movement appears to be reduced by activation, while what restraint remains comes under the influence of substrate binding events. It has been inferred from inhibitor binding studies that substrates bind across the domain interface with the scissile peptide bond actually bridging the interface.
The majority of the Sn secondary interactions take place with the 120–240 domain while the majority of the Sn' secondary interactions take place with the 1–120 domain. This situation is ideal for a mechanism in which a movement of the two large domains is used to strain/distort the scissile peptide bond preparatory to hydrolysis (Dufton, 1990).
Carboxypeptidase A fold
Based on the SID results above, there would appear to be some considerable scope for conformational change in response to substrate binding and the subsequent hydrolytic events. The circumstantial evidence includes:
The barrier to any major conformational response to binding (i.e. the domains moving with respect to each other) is constituted primarily by the parallel β-sheet formation between strands 60–67 and 189–196, and the zinc liganded between residues 72 and 196. In order to propose a fold adjustment in the mechanism, it would be necessary to suppose that the changes in the ligand field of the zinc atom during catalysis (it is hypothesized that the zinc co-ordinates the tetrahedral intermediates) (Christianson and Lipscomb, 1989) are in some way connected with a temporary destabilisation/adjustment of the domain interface. Whether the behaviour of the zinc could initiate a response by precipitating the loss of the parallel inter-strand hydrogen bonding, or whether secondary binding by the substrate (i.e. substrate residues P1, P2, P3, P4, etc.) near these strands could impose allosterically a catalytically efficient geometry on the zinc by adjusting its ligands via the interface, remains open to question. The benefit of causing a domain adjustment is that the scissile bond would become the focus of opposing conformational adjustments in the enzyme by virtue of its location, and the fact that the substrate structure either side of it is anchored in different domains and on opposite sides of the β-sheet. This could distort/strain the peptide bond in such a way as to ease the catalytic task (e.g. by reducing the resonance stabilization and making the peptide bond more ester-like). It is interesting to note that a similar contrivance (i.e. substrate scissile bond stretched over a β-sheet edge) is observable for subtilisin. In the case of carboxypeptidase, some role for the C-terminal helix seems likely since not only does it lie along the main interface and have a degree of structural independence, but also there is an unexplained centre of residue conservation around the C-terminus in the enzyme family as a whole (Cardle and Dufton, 1994).
Conclusion |
---|
Our application of the SID technique suggests that the proteins we have examined have been underestimated in their ability to function as true mechanical devices (i.e. to deploy structural movement and induction of strain as a prime means of translating binding events into inhibitory or catalytic consequences) (Williams, 1993). This is not to imply that the stereo-electronic catalytic mechanisms that have been proposed for the enzymes are fundamentally incorrect, only that a significant contribution to the process may have remained undetected (e.g. that the proteinases could make use of primary and secondary binding to create a physical distortion of their target substrate bond and link it to a catalytic group reorientation adjacent to that bond). In the case of proteinases, this reappraisal goes some way to explaining why, for example, different fold topologies and catalytic groupings are required for aminopeptidase, endopeptidase and carboxypeptidase actions, and for proteinases and oligopeptidases (i.e. that different fold-dependent mechanical solutions are required according to the location of the target peptide bond within the substrate chain). Applied to further protein families, the SID technique may help confirm that natural proteins have been based on a limited subset of the possible chain topologies because only a few chain fold patterns have proved to be generally robust and incrementally adjustable when subject to random mutational events and functional selection. Many of these enduring topologies may have the additional virtue of presenting unique, but reliable, conformational responses when other molecules become bound to their exposed sub-structural interfaces.
Notes |
---|
2 Present address: Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, UK
3 To whom correspondence should be addressed. E-mail: mark.dufton{at}strath.ac.uk
References |
---|
Bajaj,M. and Blundell,T. (1984) Annu. Rev. Biophys. Bioeng., 13, 453–492.
Bode,W. and Huber,R. (1992) Eur. J. Biochem., 204, 433–451.
Brocchieri,L. and Karlin,S. (1995) Proc. Natl Acad. Sci. USA, 92, 12136–12140.
Brunie,S., Bolin,J., Gewirth,G. and Sigler,P.B. (1985) J. Biol. Chem., 260, 9742–9749.
Cardle,L. and Dufton,M. (1994) Protein Eng., 7, 1423–1431.
Cardle,L. and Dufton,M. (1997) Protein Eng., 10, 131–136.
Christianson,D.W. and Lipscomb,W.N. (1989) Acc. Chem. Res., 22, 62–69.
Czapinska,H. and Otlewski,J. (1999) Eur. J. Biochem., 260, 571–595.
daSilvaGiotto,M.T., Garratt,R.C., Oliva,G., Mascarehas,Y.P., Giglio,J.R., Cintra,A.C.O., deAzevedo,W.F., Arni,R.K. and Ward,R.J. (1998) Proteins: Struct. Funct. Genet., 30, 442–454.
Dijkstra,B.W., Renetseder,R., Kalk,K.H., Hol,W.G.J. and Drenth,J. (1983) J. Mol. Biol., 168, 163–179.
Dua,R., Wu,S.K. and Cho,W.H. (1995) J. Biol. Chem., 270, 263–268.
Dufton,M. (1990) FEBS Lett., 271, 9–13.
Gerstein,M., Lesk,A.M. and Chothia,C. (1994) Biochemistry, 33, 6739–6749.
Harris,J.B. (1991) In Harvey,A.L. (ed), Snake Toxins. Pergamon Press, New York, pp. 91–129.
Janssen,M.J.W., van der Wiel,W.A.E.C., Beiboer,S.H.W., van Kampem,M.D., Verheij,H.M., Slotboom,A.J. and Egmond,M.R. (1999) Protein Eng., 12, 497–503.
Matthews,B.W., Sigler,P., Henderson,B. and Blow,D.M. (1967) Nature, 214, 652–656.
Oue,S., Okamoto,A., Yano,T. and Kagaiyama,H. (1999) J. Biol. Chem., 274, 2344–2349.
Pritchard,L. and Dufton,M.J. (1999) J. Mol. Biol., 285, 1589–1607.
Pritchard,L. and Dufton,M.J. (2000) J. Theor. Biol., 202, 77–86.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.
Sayle,R. (1996) RASWIN (Molecular Graphics) Version 2.6.4. Glaxo-Wellcome Ltd., Greenford.
Scott,D.L. and Sigler,P.B. (1994) Adv. Protein Chem., 45, 53–88.
Scott,D.L., White,S.P., Browning,J.L., Rosa,J.J., Gelb,M.H. and Sigler,P.B. (1991) Science, 254, 1007–1010.
Slotboom,A.J., Verheij,H.M. and de Haas,G.H. (1982) In Hawthorne,J.N. and Ansell,G.B. (eds) Phospholipids. Elsevier Biomedical Press, Amsterdam, Chap. 10.
VandenBerg,B., Tessari,M., Boeleus,R., Dijkman,R., deHaas,G.H., Kaptein,R. and Verheij,H.M. (1995) Nature Struct. Biol., 2, 402–406.
Williams,R.J.P. (1993) Trends Biochem. Sci., 18, 115–117.
Yang,C.C. (1994) J. Toxicol. Toxin Rev., 13, 125–177.
Zhang,C.O. and DeLisi,C. (1998) J. Mol. Biol., 284, 1301–1305.
Received September 27, 2001; revised September 1, 2002; accepted December 3, 2002.