1 Department of Biochemistry, 80, Tennis Court Road, Old Addenbrooke's Site, Cambridge CB2 1GA, UK and 2 Centre for Cellular and Molecular Biology, Uppal Road, Hyderabad 500 007, India
Keywords: ß-strand packing/comparative modelling/helix packing/protein data analysis/structure prediction
Introduction
In the case of homologous proteins of <40% sequence similarity, amino acid differences lead to changes in the residue volumes in the core. To accommodate these changes, the equivalent secondary structural elements (SSEs) in the common core undergo significant relative shifts and rotations (Lesk and Chothia, 1980, 1982, 1986; Chothia and Lesk, 1982, 1986). As a consequence, the root mean square (r.m.s.) differences between the structures increase, leading to a decrease in the number of topologically equivalent residues. This limits the predictive capability of comparative modelling procedures. In order to develop a method to predict relative shifts in the SSEs, it is essential to understand the precise quantitative relationships between the distances separating SSEs and the properties of the residues involved in their packing.
Many studies covering various aspects relating to conformation, geometry and packing of ß-sheets have been reported. For example, the conformation of ß-sheets with respect to ß-strand connectivity has been analysed by Sternberg and Thornton (1977a,b). The ß-sheet geometry in proteins, emphasizing the allowed conformational flexibility among the parallel, anti-parallel and mixed ß-sheets, has been discussed by Salemme (1983). The relative rigid body shifts and rotations in equivalent ß-sheets in the immunoglobulin and the plastocyanin–azurin families have been reported (Chothia and Lesk, 1982; Lesk and Chothia, 1982). Chothia and Janin (1981, 1982) have reported the details of orthogonal and aligned packing of ß-sheets. Principles governing ß-sandwich structures have been analysed (Cohen et al., 1981). Efimov (1997a,b) has described construction of structural trees for protein superfamilies of ß-proteins having root structures characterized by aligned packing and orthogonal packing of ß-sheets. There are reports on coiling (Chothia, 1983), energetics (Chou et al., 1986) and folding pattern (Chothia and Finkelstein, 1990) in ß-sheet proteins. The propeller assembly of ß-sheets, their preferred assembly with sevenfold symmetry and principles determining ß-sheet barrels have been analysed (Murzin, 1992; Murzin et al., 1994a,b). Recently, Chothia et al. (1997) described all known folds of ß-proteins. However, little attention has been paid to quantitative relationships that might allow the prediction of inter-sheet distances.
Blundell and co-workers (Reddy and Blundell, 1993; Reddy et al., 1999) have reported quantitative packing relationships in three cases of SSE packing, viz. helix to helix, helix to ß-strand and helix to ß-sheet, and showed their potential use in comparative modelling of α and α/ß classes of proteins. In this paper, we analyse packing between two ß-strands belonging to two different ß-sheets in a large number of protein structures and show that the inter-axial distance between the two interactively packed ß-strands is significantly correlated with the weighted sum of the volumes of the interacting residues at their packing interface. We investigate the most common factors that influence packing. We also demonstrate the usefulness of the distance–volume relationship in the prediction of inter-axial distances in homologous proteins.
Materials and methods
The three-dimensional coordinates of protein structures deposited in the Protein Data Bank (PDB) (Bernstein et al., 1977) were used to identify the ß-strands with a length of at least five residues using the definition of Kabsch and Sander (1983) as implemented by Smith (1989) in his SSTRUC program. Two non-redundant sets of ß-strand pairs were considered. The first set (A) included pairs from proteins with sequence identities <25% and defined at 2 Å (Hobohm et al., 1992). However, this set omits pairs from homologous proteins where differences in packing are significant and can include those that are identical. The second set (B) was derived from a set of protein structures solved at 2.5 Å or better resolution, corresponding to 6531 protein chains. Where more than one identical pair was identified from homologous proteins, only the pair from the structure defined at the best resolution was considered for analysis. The results presented in the following sections have been obtained using set B unless stated otherwise.
Calculation of amino acid residue-dependent parameters of the packing interface
To identify the interactively packed pairs in a protein structure, the solvent contact surface area (SCSA) of every SSE was calculated in the presence and absence of every other SSE. The fractional loss in SCSA of a residue j (denoted by ndaj) due to packing was calculated as ndaj = (aij − apj)/astdj, where aij and apj are the SCSA values of the residue j of an SSE in the absence and presence of the other SSE, respectively, and astdj is the standard state value of SCSA for the residue type j (see Table I). The residues with ndaj values ≥0.1 are considered as the interacting residues, and the packing interface between the two SSEs comprises the side chains of all such interacting residues. The total fractional loss of solvent contact areas at the packing interface is calculated as nda = Σ ndaj (j = 1, ..., nint), where nint is the total number of interacting residues. The SCSAs of the SSEs were determined with a spherical probe of radius 1.4 Å using an algorithm of Richmond and Richards (1978) as implemented by Sali (1991) in his program PSA.
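The fractional-loss calculation described above can be sketched as follows. This is only a minimal illustration, assuming that the per-residue SCSA values have already been computed by some external program; the function names, the tuple layout and the example numbers are hypothetical and are not taken from PSA or from Table I.

```python
from typing import Dict, List, Tuple

# (label, SCSA with partner SSE absent, SCSA with partner SSE present,
#  standard-state SCSA of the residue type); all areas in square angstroms.
Residue = Tuple[str, float, float, float]


def fractional_loss(area_absent: float, area_present: float, area_std: float) -> float:
    """Fractional loss in SCSA of one residue: (absent - present) / standard."""
    return (area_absent - area_present) / area_std


def packing_interface(residues: List[Residue],
                      threshold: float = 0.1) -> Tuple[Dict[str, float], float]:
    """Return the interacting residues and the total fractional loss (nda).

    A residue is treated as interacting when its fractional loss is at
    least `threshold` (one tenth of its standard-state SCSA by default).
    """
    interacting: Dict[str, float] = {}
    for label, area_absent, area_present, area_std in residues:
        nda_j = fractional_loss(area_absent, area_present, area_std)
        if nda_j >= threshold:
            interacting[label] = nda_j
    return interacting, sum(interacting.values())


# Toy input (made-up areas, for illustration only).
example = [("A63", 40.0, 10.0, 100.0), ("V7", 50.0, 48.0, 100.0)]
interface, nda_total = packing_interface(example)
```

With these toy values only the first residue passes the 0.1 threshold, so the interface contains one residue and the total equals its fractional loss.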
The packing geometry is characterized by the distance between the interactively packed SSEs and their mutual angle of orientation, each defined with respect to the linear axes of the SSEs. In the case of ß-strands we quantify the distance in two ways. The first, referred to as dip, is the distance between the midpoints of the projected interaction regions on the axes of the secondary structural elements (distance between the midpoints of b11–b12 and b21–b22 in Figure 1) and the second, referred to as dcl, is the shortest of the distances between the projected Cα positions on the two axes of the secondary structural elements (distance between the projections of the two residues A63 and V7 in Figure 1). Both dip and dcl have been used to explore the packing relationship. The angle of mutual orientation is calculated as the angle between the N- to C-terminal directional axis vectors (the vectors AB and A'B' in Figure 1) projected on to the plane perpendicular to the line joining their midpoints. The axis of a ß-strand is calculated using a method suggested by Blundell et al. (1983) for helices. A probe ß-strand of length of four residues is superposed on to the real ß-strand, the superposed Cα atoms of the real ß-strand are projected on to the probe axis (the projected Cα positions are referred to as `real-axis points') and a straight line is fitted using the projected Cα positions.
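The angle of mutual orientation defined above can be illustrated with a short sketch. It assumes that the end points of the two linear axes are already known and returns an unsigned angle between 0 and 180 degrees; whether the original analysis used a signed angle is not stated, so that choice, together with the function and helper names, is an assumption made here.

```python
import math


def _sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def _scale(u, s):
    return tuple(x * s for x in u)


def _mid(u, v):
    return tuple((x + y) / 2.0 for x, y in zip(u, v))


def orientation_angle(a, b, a2, b2):
    """Angle (degrees) between axis vectors AB and A'B' after projecting both
    on to the plane perpendicular to the line joining their midpoints."""
    v1, v2 = _sub(b, a), _sub(b2, a2)
    # Unit vector along the line joining the two axis midpoints.
    m = _sub(_mid(a2, b2), _mid(a, b))
    n = _scale(m, 1.0 / math.sqrt(_dot(m, m)))
    # Remove the component along n to project each vector on to the plane.
    p1 = _sub(v1, _scale(n, _dot(v1, n)))
    p2 = _sub(v2, _scale(n, _dot(v2, n)))
    cos_angle = _dot(p1, p2) / math.sqrt(_dot(p1, p1) * _dot(p2, p2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


parallel = orientation_angle((0, 0, 0), (10, 0, 0), (0, 0, 10), (10, 0, 10))
antiparallel = orientation_angle((0, 0, 0), (10, 0, 0), (10, 0, 10), (0, 0, 10))
crossed = orientation_angle((0, 0, 0), (10, 0, 0), (5, -5, 10), (5, 5, 10))
```

The three toy cases correspond to two axes stacked 10 Å apart that run in the same direction, in opposite directions and at right angles.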
Results and discussion
ß-Strand–ß-strand packing
In the present analysis we focus only on the interactive packing of ß-strands that occurs due to the aligned packing of two ß-sheets (Chothia and Janin, 1981). However, interactive packing between two ß-strands can also occur due to the folding of a ß-sheet on to itself, giving rise to orthogonal packing of the ß-sheet (Chothia and Janin, 1982). This is not discussed in this paper.
Of the 3500 interactively packed ß-strand pairs, about 40% were interacting only through the residues at one of the termini. Figure 2 illustrates different modes of interactions in such pairs. As such interactions are very unlikely to determine the packing geometry between the ß-strands, such pairs have been removed from the data set, leaving 1986 pairs (occurring in 428 protein chains) which were used for further analysis.
The minimum, maximum and average values of the two inter-axial distances dip and dcl are given in Table III. Although the average values are about 10 Å, the distances vary from 4 to 15 Å. Distances in most of the pairs, derived from ß-sandwich structures, are distributed in the range 9–11 Å, which is similar to that reported earlier (Cohen et al., 1981).
In a majority (80%) of pairs, the interface is occupied by three to six interacting residues, although some pairs have as few as two residues and some others have as many as 10 residues (see Table III). The number of interacting residues is, roughly, inversely proportional to the inter-axial distance, i.e. the interfaces with shorter inter-axial distances contain a greater number of residues than those with longer inter-axial distances. The total volume V varies from 227 to 1612 Å³ (Table III).
Packing relationship: inter-axial distances and interacting residue volumes
An inspection of the plot of V versus the distances dcl and dip for interacting pairs of ß-strands did not reveal a clear correlation (Table IV). We further examined the correlation between distances and the functions of the form (V/nda) discussed in Reddy and Blundell (1993), which gave a very good correlation for helix–helix packing. The corresponding values of the correlation coefficient (r) are given in Table IV and these are better than those obtained for V versus inter-axial distances, implying that the actual contribution of the total residue volume to the packing depends on the extent to which the SCSA of the interacting residues is buried. The best correlation is obtained for F6 versus dcl in most of the families (see Table IV). In further discussion here we consider only the relationship F6 versus dcl and refer to this as the packing relationship. We also tested the relationship with set A and obtained a slightly improved correlation (Table V).
The correlation coefficients were found to be slightly less when the full residue contributions were not considered (Table V). This means that inclusion of main-chain atomic contributions to the packing improves the overall packing relationship.
The interacting pairs were also examined for side chain–side chain contacts between them. The side chain–side chain contacts were defined as those with inter-atomic distances less than or equal to the sum of the van der Waals radii of the corresponding atoms plus 0.6 Å (Lesk and Chothia, 1980). The van der Waals radii for the atoms (C = 1.7, O = 1.5, N = 1.6 and S = 1.8 Å) were taken from Bondi (1964).
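The contact criterion just described lends itself to a compact sketch. The radii and the 0.6 Å tolerance are those quoted in the text; the function names and the atom representation (element symbol plus Cartesian coordinates in Å) are hypothetical choices for illustration.

```python
import math

# Van der Waals radii in angstroms, as quoted in the text (Bondi, 1964).
VDW_RADII = {"C": 1.7, "O": 1.5, "N": 1.6, "S": 1.8}
TOLERANCE = 0.6  # extra margin in angstroms added to the sum of radii


def in_contact(elem1, xyz1, elem2, xyz2):
    """True if two side-chain atoms are within the sum of radii plus 0.6 A."""
    cutoff = VDW_RADII[elem1] + VDW_RADII[elem2] + TOLERANCE
    return math.dist(xyz1, xyz2) <= cutoff


def count_contacts(atoms1, atoms2):
    """Count contacting atom pairs between two lists of (element, xyz) atoms."""
    return sum(
        in_contact(e1, p1, e2, p2)
        for e1, p1 in atoms1
        for e2, p2 in atoms2
    )


# Toy side-chain atoms (made-up coordinates): the C...C cutoff is about 4.0 A.
strand1 = [("C", (0.0, 0.0, 0.0))]
strand2 = [("C", (3.9, 0.0, 0.0)), ("O", (4.1, 0.0, 0.0))]
n_contacts = count_contacts(strand1, strand2)
```

In this toy case the carbon at 3.9 Å is counted, whereas the oxygen at 4.1 Å lies beyond the C...O cutoff of about 3.8 Å.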
Of 1986 pairs, only 60% had one or more contacts between them. The correlation for F6 versus dcl is relatively poor (see Table V) in the majority of the families, suggesting residues not in contact also contribute to the packing in those families. In the following discussion we consider only the packing relationship obtained from full-residue contributions.
The slopes and intercepts of the regression lines best fitting the F6–dcl relationship in various families of proteins are given in Table VI. In order to obtain an unbiased, general regression line that best represents the packing relationship for a pair of ß-strands, the average values of slopes and intercepts were calculated (see Table VI). Figure 4 shows the scatter plot for the packing relationship superposed with the general regression line.
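The averaging of per-family regression lines can be sketched as below. The sketch assumes ordinary least squares with the volume-dependent function on the x-axis and dcl on the y-axis, which the text does not state explicitly; the family names and data points are invented purely to exercise the code and are not values from Table VI.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def general_regression_line(families):
    """Average the per-family slopes and intercepts so that every family
    contributes equally, regardless of how many pairs it contains."""
    fits = [fit_line(xs, ys) for xs, ys in families.values()]
    slope = sum(s for s, _ in fits) / len(fits)
    intercept = sum(c for _, c in fits) / len(fits)
    return slope, intercept


def predict_distance(function_value, slope, intercept):
    """Inter-axial distance read off a regression line."""
    return slope * function_value + intercept


# Invented data: (function values, distances) per hypothetical family.
toy_families = {
    "family_1": ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    "family_2": ([0.0, 1.0, 2.0], [3.0, 7.0, 11.0]),
}
slope, intercept = general_regression_line(toy_families)
predicted = predict_distance(2.0, slope, intercept)
```

Averaging the fitted parameters, rather than pooling all points into one fit, keeps a large family from dominating the general line, which is one plausible reading of "unbiased" here.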
In general, r values for parallel and anti-parallel pairs are ~0.8 and ~0.7, respectively, indicating a better correlation for parallel pairs than anti-parallel pairs (Table VII). Among the anti-parallel pairs, those comprising ß-strands contiguous in the sequence and connected by linkers of three to nine residues [ß-arcs (Efimov, 1987)] showed a poor correlation, indicating that such ß-strand pairs are probably constrained by the short linkers.
Although most points are evenly spread about the regression line (±1.1 Å) (Figure 4), about 40 points lie well below it, having dcl < 6 Å. In these pairs one or both ß-strands are severely distorted (bent, coiled or twisted) (see Figure 5a for an example) and therefore the inter-axial distances are uncorrelated with the volume-dependent function.
Residues at the packing interface of outliers
Visual inspection indicated that many pairs in the BRL group (Figure 5b) involve small amino acids such as Ala and Gly, while those in the ARL group (Figure 5c) involve large hydrophobics, such as Phe and Leu. The propensity of each amino acid residue i to occur in each group was calculated as
pi = fi/Fi
where fi is the % occurrence of a residue i in that outlier group and Fi is the % occurrence of a residue i in all pairs, regardless of the group to which it belongs; pi values were calculated using set A. The following residues appeared as preferred and non-preferred residues for the two groups of outliers:
| Outlier group | Preferred | Non-preferred |
|---|---|---|
| BRL outliers | A, G, M, N, S, T | F, I, L, V, W, Y |
| ARL outliers | F, I, L, M, V, W, Y | A |
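The propensity calculation behind this table can be sketched as follows, assuming the propensity is simply the ratio of the percentage occurrence of a residue type in an outlier group to its percentage occurrence in all pairs. The function name and the toy residue lists are hypothetical and do not reproduce the values derived from set A.

```python
from collections import Counter


def propensities(group_residues, all_residues):
    """Propensity of each residue type: % occurrence in the group divided
    by % occurrence in the whole data set (values > 1 suggest preference)."""
    group_counts = Counter(group_residues)
    all_counts = Counter(all_residues)
    n_group, n_all = len(group_residues), len(all_residues)
    result = {}
    for res, count_all in all_counts.items():
        f_i = 100.0 * group_counts[res] / n_group  # % in outlier group
        big_f_i = 100.0 * count_all / n_all        # % in all pairs
        result[res] = f_i / big_f_i
    return result


# Toy one-letter residue lists (illustrative only).
outlier_group = ["A", "A", "G", "F"]
all_pairs = ["A", "A", "G", "G", "F", "F", "L", "L"]
p = propensities(outlier_group, all_pairs)
```

In this toy example Ala is over-represented in the group (propensity 2.0), Gly and Phe occur at the background rate (1.0) and Leu is absent (0.0).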
Side chains of the polar residues in the BRL group point away from the packing interface, resulting in a lesser residue volume contribution than expected. For ARL pairs the side chains of apolar residues align at the packing interface. A similar residue-based effect on the packing of helices in the globin family has been reported (Efimov, 1979). In this family the presence of small and polar residues at the interfaces gives rise to small inter-helix distances, whereas the presence of apolar residues gives rise to large inter-helix distances.
Distortion of ß-strands
Ideally a regular ß-strand is an extended structure with a small right-handed twist about its axis. However, in proteins ß-strands are often bent or coiled (Chothia, 1973, 1984). Such a structural distortion can affect the packing relationship in an interacting pair. The structural distortion was quantified by the r.m.s. deviation (denoted by ax) of the real-axis points from the linear axis. The value of ax varies from 0.3 to 4.0 Å, and the number of pairs showing an absolute deviation in dcl of >0.5 Å from the regression line for increasing values of ax is given in Table VIII. More than half of them are outliers and their percentage increases with the value of ax. Thus, a key factor responsible for the deviation from the packing relationship is the structural distortion associated with one or both of the ß-strands in the pairs.
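The distortion measure can be illustrated with a small sketch that computes the r.m.s. perpendicular distance of a set of points from a straight line. It assumes the fitted linear axis is already available as a point on the line and a direction vector; the probe-superposition and line-fitting steps of the actual procedure are not reproduced, and the function name and toy coordinates are hypothetical.

```python
import math


def rms_deviation_from_axis(points, origin, direction):
    """R.m.s. perpendicular distance (same units as the input) of `points`
    from the line through `origin` along `direction`."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    total = 0.0
    for p in points:
        rel = [pc - oc for pc, oc in zip(p, origin)]
        along = sum(r * u for r, u in zip(rel, unit))
        # Pythagoras: squared perpendicular distance = |rel|^2 - along^2.
        perp_sq = sum(r * r for r in rel) - along * along
        total += max(perp_sq, 0.0)  # guard against tiny negative rounding
    return math.sqrt(total / len(points))


# Toy zig-zag points 1 A either side of an axis along x.
zigzag = [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0), (2.0, 1.0, 0.0), (3.0, -1.0, 0.0)]
rms_zigzag = rms_deviation_from_axis(zigzag, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rms_straight = rms_deviation_from_axis(straight, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

A perfectly straight set of points gives zero, and larger values indicate a more strongly bent or coiled strand.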
We also investigated interactive units consisting of two hydrogen-bonded (parallel or anti-parallel) ß-strands (ß-sheet units) and belonging to two different ß-sheets (see Figure 6). The inter-unit distances in the pairs where each ß-strand has at least one residue that undergoes a loss of one tenth of its SCSA upon packing were calculated as the distance between the ortho-centres of each of the units (in Figure 6, O and O' are the ortho-centres for the ß-sheet units 1 and 2, respectively). The positions of the ortho-centres were computed using the four Cα projections of the first and the last interacting residues from each ß-strand on to their respective axes (A, B, C, D, A', B', C' and D' in Figure 6). In 457 pairs of unique sequence there were significant correlations (r values 0.69 and 0.61) between the inter-sheet distance (dOO') and the volume-related functions F5 and F6. However, the correlations were not as good as those for pairs of ß-strands. Of 427 pairs, 93% belong to the anti-parallel–anti-parallel class, 5.3% belong to the anti-parallel–parallel class and only 1.3% belong to the parallel–parallel class. The value of the correlation coefficient in the anti-parallel–anti-parallel class, 0.70, is better than that for all the pairs of ß-sheet units.
Reddy and Blundell (1993) investigated helix–helix packing in about 1000 helix pairs from X-ray structures solved at 3.0 Å or better resolution. The present data set of 6531 protein chains contains 6081 non-identical interacting pairs [excluding those interacting only through the terminal residues (three residues) at either the N- or C-termini]. The corresponding r values are given in Table IX. Functions F3 and F5 give rise to a very good correlation with dip (r = 0.79).
Prediction of the inter-axial distance between two ß-strands belonging to two ß-sheets in a protein (`target') using the packing relationship requires information about the possible interacting residues in those ß-strands. This can be obtained by sequence alignment of the target with its homologue of known 3-D structure (`template'). Using the residues of the target equivalent to the interacting residues in the template, the value of the function F6 can be calculated and the inter-axial distance can be predicted using an appropriate regression line depending on the angle of packing (parallel or anti-parallel) in the template pair (see Table VII). The total residue volume V can be calculated using the values given by Chothia (1975) (Table I) for the 20 amino acid residues. The nda value can either be derived from the representative values of residues nda_avj (Table I), calculated as the average values from the interacting residues in all the interacting pairs analysed in this study, or taken from the equivalent interacting residues in the template.
The usefulness of the packing relationship to predict the inter-axial distance between two ß-strands was tested using seven families of homologous proteins with average sequence identities varying from 15 to 51% (Table X). The information regarding possible interacting residues was obtained using the structure-based sequence alignments, obtained using COMPARER (Sali and Blundell, 1990) and deposited in the in-house database HOMSTRAD (http://cryst-bioc.cam.ac.uk/~homstrad) (Mizuguchi et al., 1998). In every family each protein was considered as a target and a distance prediction was made using every other protein as template in that family (938 target–template ß-strand pairs). For each of the target ß-strand pairs two values of F6 were computed using the two values of nda and the inter-axial distances dp1 and dp2 were predicted.
Hence this investigation shows that the predicted inter-axial distances are more useful than those taken from templates. For comparative modelling one can obtain as many predicted distances as the number of homologues used as templates. Therefore, a weighted average of the predicted distances is used in modelling of the target. The weights are taken as equal to the inverse of the square of the sequence identity between the template and the target (Srinivasan and Blundell, 1993).
Conclusions
We have investigated the interactive packing in ß-strands and have shown that the distance between two ß-strands is significantly correlated with the weighted sum of the volumes of all the interacting residues at the packing interface. The same is also shown in the packing of ß-sheet units and helices. Two factors seem to influence packing of ß-strands: structural distortions in the interacting pairs and the presence of certain types of amino acid residues at the packing interface. We have also shown the usefulness of the packing relationship in the prediction of inter-axial distances between two equivalent ß-strands in homologous proteins. The predicted distances are often found closer to the observed distances than the template distances, indicating an advantage in using predicted distances over the template distances in comparative modelling of proteins.
References
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Blundell,T.L., Barlow,D., Borkakoti,N. and Thornton,J. (1983) Nature, 306, 281–283.
Blundell,T.L., Sibanda,B.L., Sternberg,M.J.E. and Thornton,J.M. (1987) Nature, 323, 347–352.
Blundell,T.L. et al. (1988) Eur. J. Biochem., 172, 513–520.
Bondi,A. (1964) J. Phys. Chem., 68, 441–451.
Browne,W.J., North,A.C.T., Phillips,D.C., Brew,K., Vanaman,T.C. and Hill,R.L. (1969) J. Mol. Biol., 42, 65–86.
Chothia,C. (1973) J. Mol. Biol., 75, 295–302.
Chothia,C. (1974) Nature, 248, 338–339.
Chothia,C. (1975) Nature, 254, 705–708.
Chothia,C. (1983) J. Mol. Biol., 163, 107–117.
Chothia,C. (1984) Annu. Rev. Biochem., 53, 537–572.
Chothia,C. and Finkelstein,A.V. (1990) Annu. Rev. Biochem., 59, 1007–1039.
Chothia,C. and Janin,J. (1981) Proc. Natl Acad. Sci. USA, 78, 4146–4150.
Chothia,C. and Janin,J. (1982) Biochemistry, 21, 3955–3965.
Chothia,C. and Lesk,A.M. (1982) J. Mol. Biol., 160, 325–342.
Chothia,C. and Lesk,A.M. (1986) EMBO J., 5, 823–826.
Chothia,C. and Lesk,A.M. (1987) Cold Spring Harbor Symp. Quant. Biol., 52, 399–405.
Chothia,C., Lesk,A.M., Levitt,M., Amit,A.G., Mariuzza,R.A., Phillips,S.E.V. and Poljak,R.J. (1986) Science, 233, 755–758.
Chothia,C., Hubbard,T., Brenner,S., Barns,H. and Murzin,A. (1997) Annu. Rev. Biophys. Biomol. Struct., 26, 597–627.
Chou,K.C., Nemethy,G., Rumsey,S., Tuttle,R.W. and Scheraga,H.A. (1986) J. Mol. Biol., 188, 641–649.
Cohen,F.E., Sternberg,M.J.E. and Taylor,W.R. (1981) J. Mol. Biol., 148, 253–272.
Efimov,A.V. (1979) J. Mol. Biol., 134, 23–40.
Efimov,A.V. (1987) FEBS Lett., 224, 372–376.
Efimov,A.V. (1997a) Proteins, 28, 241–260.
Efimov,A.V. (1997b) FEBS Lett., 407, 37–41.
Evans,S.V. (1993) J. Mol. Graphics, 11, 134–138.
Greer,J. (1981) J. Mol. Biol., 153, 1027–1042.
Havel,T.F. and Snow,M.E. (1991) J. Mol. Biol., 217, 1–7.
Hobohm,U., Scharf,M., Schneider,R. and Sander,C. (1992) Protein Sci., 1, 409–417.
Johnson,M.S., Srinivasan,N., Sowdhamini,R. and Blundell,T.L. (1994) CRC Crit. Rev. Biochem. Mol. Biol., 29, 1–68.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Lesk,A.M. and Chothia,C. (1980) J. Mol. Biol., 136, 225–270.
Lesk,A.M. and Chothia,C. (1982) J. Mol. Biol., 160, 325–342.
Lesk,A.M. and Chothia,C. (1986) Phil. Trans. R. Soc. Lond., 317, 345–356.
Levitt,M. and Chothia,C. (1976) Nature, 261, 552–558.
Mizuguchi,K., Deane,C.M., Blundell,T.L. and Overington,J.P. (1998) Protein Sci., 7, 2469–2471.
Murzin,A.G. (1992) Proteins, 14, 191–201.
Murzin,A.G., Lesk,A.M. and Chothia,C. (1994a) J. Mol. Biol., 236, 1369–1381.
Murzin,A.G., Lesk,A.M. and Chothia,C. (1994b) J. Mol. Biol., 236, 1382–1400.
Reddy,B.V.B. and Blundell,T.L. (1993) J. Mol. Biol., 233, 464–479.
Reddy,B.V.B., Nagarajaram,H.A. and Blundell,T.L. (1999) Protein Sci., 8, 573–586.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.
Richmond,T.J. and Richards,F.M. (1978) J. Mol. Biol., 119, 537–555.
Salemme,F.R. (1983) Prog. Biophys. Mol. Biol., 42, 95–133.
Sali,A. (1991) PhD Thesis, University of London.
Sali,A. (1995) Curr. Opin. Biotechnol., 6, 437–451.
Sali,A. and Blundell,T.L. (1990) J. Mol. Biol., 212, 403–428.
Sali,A. and Blundell,T.L. (1993) J. Mol. Biol., 234, 779–815.
Sali,A., Potterton,L., Yuan,F., Van Vlijmen,H. and Karplus,M. (1995) Proteins, 23, 318–326.
Sanchez,R. and Sali,A. (1997) Curr. Opin. Struct. Biol., 7, 206–214.
Smith,D. (1989) SSTRUC: A Program to Calculate Secondary Structural Summary. Department of Crystallography, Birkbeck College, University of London.
Srinivasan,N. and Blundell,T.L. (1993) Protein Engng, 6, 501–512.
Srinivasan,S., March,C.J. and Sudarshanam,S. (1993) Protein Sci., 2, 277–289.
Sternberg,M.J.E. and Thornton,J.M. (1977a) J. Mol. Biol., 110, 269–283.
Sternberg,M.J.E. and Thornton,J.M. (1977b) J. Mol. Biol., 110, 285–296.
Sutcliffe,M.J., Haneef,I., Carney,D. and Blundell,T.L. (1987a) Protein Engng, 1, 377–384.
Sutcliffe,M.J., Hayes,F.R.F. and Blundell,T.L. (1987b) Protein Engng, 1, 385–392.
Received April 9, 1999; revised August 13, 1999; accepted August 21, 1999.