National Laboratory of Biomacromolecules, Institute of Biophysics, Academia Sinica, Beijing 100101, China
Abstract |
---|
Keywords: active site residues/B-factor/protein dynamics/protein secondary structure/solvent accessibility
Introduction |
---|
X-ray analysis has shown that all atoms in a globular protein fluctuate. A quantitative measure of the atomic motions in proteins can be obtained from the mean square fluctuations of the atoms relative to their average positions. These can be related to the atomic temperature factor, or B-factor, determined in an X-ray diffraction study of a protein crystal (Ringe and Petsko, 1986). The B-factors represent smearing of atomic electron densities around their equilibrium positions due to thermal motion and positional disorder. Analysis of B-factors is therefore likely to provide new insights into protein dynamics, the flexibility of amino acids and protein stability (Parthasarathy and Murthy, 2000). Although the existence of the fluctuations is now well established, our understanding of their biological role in specific areas is rather limited, and the conclusions reported in the literature appear contradictory. A study of lysozyme crystals revealed that the residues of highest apparent motion are in the active site region, where a conformational change is observed on substrate binding (Artymiuk et al., 1979). On the other hand, Zhang et al. found that the amino acid residues at the active sites of some enzymes have lower B-factors than the mean values for the whole molecules (Zhang et al., 1999). It is therefore necessary to examine further the B-factor distribution of proteins with a more reliable statistical method and a larger database of non-homologous proteins.
Statistical analysis has formed a large component of the research efforts aimed at understanding protein structure and function, owing to their enormous diversity and complexity. A variety of statistical analyses have been performed on conformational states of main chains and side chains (Dunbrack and Karplus, 1993; Swindells et al., 1995), limited proteolytic sites (Hubbard et al., 1991), hydrogen bonding (Ippolito et al., 1990), water structure (Thanki et al., 1991) and topological features of secondary elements (Levitt and Chothia, 1976; Taylor and Thornton, 1983). Most of these analyses are concerned with protein structure and conformation. In contrast, much less attention has been paid to the atomic displacement parameters (Trueblood et al., 1996), or B-factors, obtained from X-ray crystal structure analysis of proteins. Recently, a detailed analysis of the frequency distribution of B-factors of 110 proteins was made by Parthasarathy and Murthy (Parthasarathy and Murthy, 1997). They found that although the temperature factors obtained from X-ray refinement of proteins at high resolution show large variations from one structure to another, the B-factors expressed in units of standard deviation about their mean value (normalized B-factor or B'-factor) at the Cα atoms show a remarkable characteristic frequency distribution. In this paper, we describe the B'-factor frequency distributions for the active site and non-active site residues in 69 apo-enzyme structures. The analysis was performed over the entire sequences and for different structural subsets defined by the three-dimensional structure of proteins, such as α-helices, β-structures and coil conformation, and buried and non-buried residues. These distributions provide information on the relationship between amino acid residues and the rigidity imposed by them on the polypeptide chain. The results show that in all cases the active site residues have, on average, lower B'-factors than the non-active site residues, suggesting that the active site residues are, in general, less flexible than the non-active site residues.
Materials and methods |
---|
Entries for high-resolution structures containing B-values were taken from an unbiased selection of the Protein Data Bank (PDB). Because there are many redundant data in the PDB, our data set was selected initially from more than 18 000 proteins of the PDB with pairwise identity <25% (Hobohm and Sander, 1994). Protein structures determined using NMR spectroscopy, and those with incomplete backbones or side chains (except for the first or the last residue of the polypeptide chain), were excluded. Structures determined at a resolution poorer than 2.5 Å and with R-factor >0.2 were also removed. Thus, only 69 apo-enzymes of the original entries were used in this analysis (Table I). This selected database contains a total of 16 919 amino acid residues. Information about the active site residues was obtained from the SITE lines of PDB files. If an apo-enzyme did not contain the SITE annotation but its corresponding holo-enzyme (enzyme-ligand complex) contained the SITE records, the holo-enzyme was used to assign the active site residues. The positions of these residues were transferred from the holo-enzyme sequences to their identical apo-enzyme sequences. Of the 16 919 amino acid residues in these apo-enzymes, 346 are at active sites. For most of the enzymes in our database, the residues involved in binding and catalysis are dispersed along the amino acid sequence, although they are close in the tertiary structure to form the enzyme active sites.
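As an illustration of how active site residues might be collected from SITE lines, the following is a minimal sketch and not the procedure actually used in this work. The function name `parse_site_records` is hypothetical, and the column positions are an assumption based on the fixed-column PDB file layout, in which each SITE line lists up to four residues.

```python
def parse_site_records(pdb_lines):
    """Collect (chain_id, residue_number, residue_name) triples from SITE lines.

    Assumed layout (1-based columns): residue name at 19-21, chain ID at 23,
    residue number at 24-27, repeated every 11 columns for up to four residues.
    """
    residues = set()
    for line in pdb_lines:
        if not line.startswith("SITE  "):
            continue
        padded = line.rstrip("\n").ljust(80)
        for start in (18, 29, 40, 51):  # 0-based start of each residue slot
            res_name = padded[start:start + 3].strip()
            chain_id = padded[start + 4]
            seq_text = padded[start + 5:start + 9].strip()
            if res_name and seq_text:
                residues.add((chain_id, int(seq_text), res_name))
    return residues
```

The result is a set, so a residue listed in more than one SITE group is counted once.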
As a measure of main chain flexibility we chose the temperature factors, i.e. B-factors, of the Cα atoms. The temperature factors taken from the PDB cannot be used directly, since they may be on different scales owing to the application of different refinement procedures (Tronrud, 1996). We therefore normalized them for each protein structure to have a distribution of zero mean and unit variance. The selection of 69 unbiased enzyme structures was used to derive normalized B-values. Temperature factors of the Cα backbone atoms were taken from the PDB. For each selected protein, the B-value of each Cα atom was normalized by the following equation:

$$B' = \frac{B - \langle B \rangle}{\sigma(B)} \qquad (1)$$

where $\langle B \rangle$ is the average B-value of all Cα atoms and $\sigma(B)$ is the standard deviation of the B-values for the chosen protein. The same equation was used previously by other workers (Parthasarathy and Murthy, 1997; Carugo and Argos, 1998). Because chain termini are usually very flexible and could have caused bias, three N- and C-terminal residues were omitted from each structure. Normalized B-factors derived from the unbiased structures were used as a measure of the flexibility of the residues in the protein.
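The per-protein normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the function name `normalize_b_factors` is hypothetical, the population standard deviation is assumed, and the terminal residues are assumed to be dropped before the mean and standard deviation are computed (the text does not specify either choice).

```python
import statistics


def normalize_b_factors(ca_b_values, n_trim=3):
    """Return B' = (B - <B>) / sigma(B) for the C-alpha B-values of one chain.

    n_trim residues are dropped from each terminus first (assumed order).
    """
    trimmed = ca_b_values[n_trim:len(ca_b_values) - n_trim]
    if len(trimmed) < 2:
        raise ValueError("too few residues left after trimming the termini")
    mean_b = statistics.fmean(trimmed)
    sigma_b = statistics.pstdev(trimmed)  # population form: an assumption
    if sigma_b == 0:
        raise ValueError("all B-values are identical; cannot normalize")
    return [(b - mean_b) / sigma_b for b in trimmed]
```

By construction the returned values have zero mean and unit variance, so chains refined on different B-factor scales become comparable.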
Residues were divided into bins of 0.5 units in B'-factor and the number of residues in each bin was counted. If $n_{ik}$ is the number of active site residues occurring in conformational state $i$ in the $k$th bin of B', then $\sum_k n_{ik}$ is the number of active site residues in state $i$, $\sum_i n_{ik}$ is the number of active site residues in the $k$th bin and $\sum_i \sum_k n_{ik}$ is the total number of active site residues over all the enzymes. The frequency distribution, expressed as a percentage, in the $k$th bin of B' for the active site residues in conformational state $i$ over the 69 enzyme structures is given by

$$f_{ik} = \frac{n_{ik}}{\sum_k n_{ik}} \times 100 \qquad (2)$$
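The binning step can be illustrated with a short sketch that counts the B'-values of one group of residues (for example, the active site residues in one conformational state) in 0.5 unit bins and converts the counts to percentages of that group. The function name `percent_frequency` is hypothetical, and aligning the bin edges on multiples of the bin width is an assumption, since the text does not state where the bins start.

```python
import math
from collections import Counter


def percent_frequency(b_primes, bin_width=0.5):
    """Map each bin's lower edge to the percentage of values in [edge, edge + width)."""
    if not b_primes:
        return {}
    counts = Counter(math.floor(b / bin_width) for b in b_primes)
    total = len(b_primes)
    return {k * bin_width: 100.0 * n / total for k, n in sorted(counts.items())}
```

Calling the function separately for each group gives distributions that each sum to 100%, which allows groups of very different size to be compared on one plot.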
Secondary structure assignment and surface accessibility calculations
The secondary structure and exposure state of each residue were derived from the DSSP database (Kabsch and Sander, 1983). All eight types of secondary structure were grouped into three kinds: α-helix, β-strand and coil. Three secondary structure types, H, G and I, were classified as α-helix, residues denoted E were considered as β-strand and the remaining residues were regarded as coil. All helices shorter than five residues and all strands shorter than three residues were reassigned to coil (Eisenhaber et al., 1996).
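The reduction from eight DSSP states to three can be sketched as below. This is an illustrative sketch only: the function name `reduce_to_three_states` is hypothetical, the input is assumed to be the DSSP code string of a single unbroken chain, and segment lengths are assumed to be measured after H, G and I have been merged into one helix class.

```python
from itertools import groupby


def reduce_to_three_states(dssp_codes, min_helix=5, min_strand=3):
    """Map DSSP codes to H (helix), E (strand) or C (coil) and drop short segments."""
    mapping = {"H": "H", "G": "H", "I": "H", "E": "E"}
    merged = [mapping.get(code, "C") for code in dssp_codes]
    result = []
    for state, run in groupby(merged):
        length = len(list(run))
        too_short = (state == "H" and length < min_helix) or (
            state == "E" and length < min_strand
        )
        # Short helices and strands are reassigned to coil.
        result.append(("C" if too_short else state) * length)
    return "".join(result)
```

The output string has the same length as the input, so it can be indexed with the same residue positions.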
The concept of solvent accessibility was introduced by Lee and Richards (Lee and Richards, 1971). Studies of solvent accessibility in proteins have led to many new insights into protein structure. Often a very crude approximation of residue accessibility has been used: a projection onto two states, i.e. buried/exposed (Janin, 1979; Hubbard and Blundell, 1987; Miller et al., 1987). However, it is not clear a priori how to define the thresholds to distinguish between the two states. In the present study, the solvent accessibility of a residue was calculated from coordinates using the program DSSP (Kabsch and Sander, 1983). For comparison between amino acids of different sizes, the relative solvent accessibility was calculated as described by Rost and Sander (Rost and Sander, 1994). Two cutoff values (5 and 16%) were used to classify each residue as buried or exposed (Rost and Sander, 1994). We considered a residue to be buried if its relative accessibility was less than the cutoff value.
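A two-state classification of this kind can be sketched as follows. The function names are hypothetical, and the maximum accessibility values in `MAX_ACCESSIBILITY` are illustrative assumptions for three residue types only; they are not quoted in this paper and should be replaced by the full table of Rost and Sander (1994).

```python
# Illustrative maximum accessibilities in square angstroms (assumed values).
MAX_ACCESSIBILITY = {"ALA": 106.0, "GLY": 84.0, "LYS": 205.0}


def relative_accessibility(residue_name, accessibility, max_table=MAX_ACCESSIBILITY):
    """Accessibility as a percentage of the maximum for that residue type."""
    return 100.0 * accessibility / max_table[residue_name]


def burial_state(residue_name, accessibility, cutoff_percent, max_table=MAX_ACCESSIBILITY):
    """Return 'buried' if the relative accessibility is below the cutoff, else 'exposed'."""
    relative = relative_accessibility(residue_name, accessibility, max_table)
    return "buried" if relative < cutoff_percent else "exposed"
```

The same residue can change class between the two cutoffs: with the assumed table, an alanine with 10 square angstroms of accessible surface is exposed at the 5% cutoff but buried at the 16% cutoff.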
Results |
---|
The individual atomic B-values determined for a number of crystalline proteins vary widely depending on the position, in the tertiary structure of the protein, of the residue to which they belong. The correlation of their relative magnitudes with intramolecular structural features will now be considered. Figure 1 shows the frequency distribution of B'-factors for amino acid residues in α-helix, β-strand and random coil. The plot corresponds to a total of 16 919 residues in the 69 selected apo-enzymes. It can be seen from this figure that none of the three distributions is symmetric about its maximum; each has one tail longer than the other. The overall shapes of the α-helix and β-sheet distributions are similar, whereas the distribution of B'-factors for random coil residues is relatively flat. The coil distribution has a maximum value at 0.1, which is about 0.5 higher than that for the β-sheet distribution. The average B'-factors for the helical, sheet and coil residues are −0.14, −0.37 and 0.27, respectively, indicating that the coil residues, in general, have larger B-factors and therefore are more flexible than helical and sheet residues. This is not surprising because the hydrogen-bonded secondary structural elements (α-helices, β-sheets) tend to have smaller fluctuations than the random coil parts of the protein.
From the 69 apo-enzymes selected in this study, 346 amino acid residues were identified as catalytic or substrate binding residues. The different parts of the active site belong to widely different regions of the peptide chain (sometimes to different chains), these parts being brought into suitable proximity by the specific folding of the chains. Figure 3A shows the frequency distribution of B'-factors for the active site and non-active site residues in the 69 apo-enzymes. It can be seen from this figure that the active site residues predominantly occur in regions of low B'-factor and non-active site residues tend to have higher B'-factors. The average B'-factors for the active site and non-active site residues are −0.405 and 0.008, respectively, indicating that the active site residues are, on average, less flexible than the non-active site residues. Since the frequencies of occurrence of the 346 active site residues in different regions of the enzymes are significantly different, it is necessary to compare the B'-factor distribution of the active site residues with that of the non-active site residues in each particular conformational state. Table II shows the distribution of the 346 active site residues in different regions of the enzymes. For comparison, Table II also includes the frequency of all 16 919 amino acid residues occurring in these structural subsets. It can be seen from this table that the active site residues have an above-average tendency to be located in random coil regions and a below-average tendency to occur in helical regions. For the cutoff values of 5 and 16%, the frequencies of occurrence for the exposed active site residues are 66.8 and 33.5%, respectively. This result indicates that when the higher cutoff value is used, the majority of the active site residues occur in the interior of proteins. A more detailed comparison of the B'-factor distributions for the active site and non-active site residues in different conformational states is shown in Figures 3B-D and 4. These figures clearly indicate that in all cases the active site residues occur predominantly in regions of low B'-factor and the non-active site residues have a tendency to exist in regions of high B'-factor. The active site helical, sheet and coil residues have average B'-factors of −0.591, −0.616 and −0.248, respectively, which are significantly lower than those of the corresponding non-active site residues. The average B'-factors for different conformational states are summarized in Table III.
The preceding analysis is restricted to the active site residues given by the SITE records. The PDB SITE records are defined by the depositing authors using highly variable criteria and normally include only the actual catalytic residues. In fact, the entire active site of an enzyme usually covers a wider region, e.g. the substrate binding pocket or cavity. There may be fewer than a dozen amino acid residues surrounding the binding pocket of the active site and, of these, only two or three may actually participate in substrate binding and/or catalysis. It is therefore of interest to know whether these residues, which might be expected to close around a substrate, are actually significantly more mobile than average in a variety of enzyme structures. To answer this question, the SITE records were used as a preliminary guide to identifying the entire active site region of each enzyme and the statistical analysis was then repeated for these residues. In order to distinguish between these two types of active site residues, the new term 'non-catalytic active site residues' is used here. The non-catalytic active site residues are defined as the three nearest neighbor residues on each side of each SITE residue. According to this definition, the present dataset contains a total of 346 active site residues, 1359 non-catalytic active site residues and 15 214 non-active site residues.
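The three-class labelling defined above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `classify_residues` is hypothetical, residues are addressed by 0-based position along a single chain, and 'nearest neighbor' is taken to mean neighbor in sequence, as the phrase 'on each side' suggests.

```python
def classify_residues(chain_length, site_positions, flank=3):
    """Label each position as 'active', 'non-catalytic' or 'non-active'."""
    site = set(site_positions)
    if any(pos < 0 or pos >= chain_length for pos in site):
        raise ValueError("SITE position outside the chain")
    labels = ["non-active"] * chain_length
    for pos in site:
        first = max(0, pos - flank)
        last = min(chain_length - 1, pos + flank)
        for neighbor in range(first, last + 1):
            labels[neighbor] = "non-catalytic"
    # SITE residues keep the 'active' label even when they flank another SITE residue.
    for pos in site:
        labels[pos] = "active"
    return labels
```

Because flanking windows of nearby SITE residues overlap and are truncated at chain ends, the number of non-catalytic residues is generally smaller than six per SITE residue, which is consistent with 1359 such residues for 346 SITE residues.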
Figure 5 illustrates the B'-factor frequency distribution for the active site, non-catalytic active site and non-active site residues in the 69 apo-enzymes. It can be seen that the non-catalytic active site residues also tend to occur in regions of low B'-factor compared with the non-active site residues, and that the distribution curves for the active site and non-catalytic active site residues are very close to each other. The average B'-factors for the active site, non-catalytic active site and non-active site residues are −0.405, −0.393 and 0.045, respectively. The average B'-factors for the active site, non-catalytic active site and non-active site residues in different conformational states are summarized in Table IV. All of them exhibit properties similar to those observed for the overall distribution. The average B'-factors for the active site and non-catalytic active site residues are always lower than those of the non-active site residues, suggesting that the amino acid residues in the active site regions are, on average, less flexible than those occurring in other regions.
Discussion |
---|
The specificity of enzyme action was recognized over a century ago and on this basis Fischer proposed the lock-and-key theory as a mechanism for enzyme-substrate binding (Laidler and Bunting, 1973). In this theory, the active sites of enzymes were envisaged to have structures sufficiently rigid to accommodate exactly the complementary structure of the substrates. This implied that the active site is sufficiently structured to perform the catalytic function of the enzyme and is probably more stable than other parts of the enzyme molecule of less functional importance. It was not until the middle of the last century that Koshland advanced the induced fit model of enzyme action, in which conformational changes induced by substrate binding at the active site could produce the precise orientation of catalytic groups, relative to the substrate, needed to cause reaction (Koshland, 1958). Since then, the number of examples where ligand binding and solvation alter protein three-dimensional structures has seemed to increase in proportion to the information available from structural biology. Conformational changes were found in catalytic sites (Bennett and Steitz, 1978; Artymiuk et al., 1979; Farnum et al., 1991; Varley and Pain, 1991), binding sites (Huston et al., 1988), antigenic regions (Novotny et al., 1986), sites susceptible to proteolytic cleavage (Tanaka et al., 1992), allosteric hinge sites (Barford and Johnson, 1989), etc. Proteins with similar functions have a similar excess of flexibility under their optimum reaction conditions (Vihinen, 1987; Varley and Pain, 1991). In addition, X-ray studies have shown that the active sites of enzymes are usually situated at the hinge regions covering two structural domains; it would seem likely, therefore, that the active sites of most enzymes are situated in a region that is relatively more flexible or mobile than the molecule as a whole and hence more sensitive to some denaturants (Tsou, 1995). Based on these observations, several methods have been developed to predict antigenic regions and ligand binding sites using the crystallographic temperature factors (Karplus and Schulz, 1985; Vihinen et al., 1994; Carugo and Argos, 1998).
It should be noted that when a ligand binds to a protein, the protein may adjust to accommodate the presence of the new molecule. This new structure has a lifetime as long as the ligand remains bound or as long as the structure of the ligand remains unchanged. X-ray diffraction can observe the initial and final states of such movement but gives no information about the path or time of the motion involved. Since the energy for triggered conformational change comes from the energy of specific interactions, such as breaking and remaking of hydrogen bonds and/or salt bridges, and not from thermal energy, it does not follow that a segment involved in such a movement has to be initially flexible in a dynamic sense. For example, a crystallographic study on α-lytic protease showed that the active sites and their relevant regions did not show high temperature factors, although they are capable of significant conformational changes in both the main chains and side chains when combined with a boronic acid inhibitor (Bone et al., 1987). In the present study, a detailed comparison of the B-factor distributions between the active site and non-active site residues has been made. Our analysis demonstrates that the active site residues occur predominantly in the low B-factor region. These results suggest that the mobile residues are not simply flexible (moving in a broad energy minimum with no barriers), but instead are plastic (populating a number of closely spaced, discrete conformational energy wells). Flexibility then represents vibrational motion within an energy well, whereas plasticity allows for transformation between discrete energy wells. Transition from one conformational energy well to another generally occurs by overcoming a potential energy barrier (Bone et al., 1989; Rader and Agard, 1997). From the results described above, it seems likely that the catalytic effectiveness of an enzyme depends very critically on the relative positions of active site residues, whereas the vibrational and fast collective motions of the Cα atoms of proteins appear not to have clear biological significance, although they could well influence thermodynamic state functions. The importance of local dynamics to enzyme function therefore needs to be carefully re-investigated. It would be of great interest to compare the results of NMR relaxation experiments with crystallographic data and see whether these very different techniques reveal a similar view of the dynamics of the active sites.
Notes |
---|
Acknowledgments |
---|
References |
---|
Barford,D. and Johnson,L.N. (1989) Nature, 340, 609-616.
Bennett,W.S. and Steitz,T.A. (1978) Proc. Natl Acad. Sci. USA, 75, 4848-4852.
Bernstein,F.C. et al. (1977) J. Mol. Biol., 112, 535-542.
Bone,R., Shenvi,A.B., Kettner,C.A. and Agard,D.A. (1987) Biochemistry, 26, 191-195.
Bone,R., Silen,J.L. and Agard,D.A. (1989) Nature, 339, 191-195.
Carugo,O. and Argos,P. (1998) Proteins: Struct. Funct. Genet., 31, 201-213.
Chothia,C., Lesk,A., Dodson,G.G. and Hodgkin,D.C. (1983) Nature, 302, 500-505.
Dobson,C.M. and Karplus,M. (1986) Methods Enzymol., 131, 362-389.
Dunbrack,R.L.,Jr and Karplus,M. (1993) J. Mol. Biol., 230, 543-574.
Eisenhaber,F., Frömmel,C. and Argos,P. (1996) Proteins: Struct. Funct. Genet., 25, 157-168.
Farnum,M.F., Magde,D., Howell,E.E., Hirai,J.T., Warren,M.S., Grimsley,J.K. and Kraut,J. (1991) Biochemistry, 30, 11567-11579.
Fersht,A. (1985) Enzyme Structure and Mechanism. Freeman, San Francisco, pp. 32-41.
Gerstein,M., Lesk,A. and Chothia,C. (1991) Biochemistry, 22, 6739-6749.
Hobohm,U. and Sander,C. (1994) Protein Sci., 3, 522-524.
Hubbard,T.J.P. and Blundell,T.L. (1987) Protein Eng., 1, 159-171.
Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) J. Mol. Biol., 220, 507-530.
Huston,E.E., Grammer,J.C. and Yount,R.G. (1988) Biochemistry, 27, 8945-8952.
Ippolito,J.A., Alexander,R.S. and Christianson,D.W. (1990) J. Mol. Biol., 215, 457-471.
Janin,J. (1979) Nature, 277, 491-492.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.
Koshland,D.E.,Jr (1958) Proc. Natl Acad. Sci. USA, 44, 98-104.
Karplus,M. (1986) Methods Enzymol., 131, 283-307.
Karplus,P.A. and Schulz,G.E. (1985) Naturwissenschaften, 72, 212-213.
Laidler,K.J. and Bunting,P.S. (1973) The Chemical Kinetics of Enzyme Action. Oxford University Press, Oxford, pp. 254-291.
Lee,B.K. and Richards,F.M. (1971) J. Mol. Biol., 55, 379-400.
Lesk,A. and Chothia,C. (1984) J. Mol. Biol., 174, 175-191.
Levitt,M. and Chothia,C. (1976) Nature, 261, 552-558.
Miller,S., Janin,J., Lesk,A.M. and Chothia,C. (1987) J. Mol. Biol., 196, 641-656.
Novotny,J., Handschumacher,H., Haber,E., Bruccoleri,G.D., Carlson,W.B., Fanning,D.W., Smith,J.A. and Rase,G.D. (1986) Proc. Natl Acad. Sci. USA, 83, 226-230.
Opella,S.J. (1986) Methods Enzymol., 131, 327-361.
Parak,F. and Reinisch,L. (1986) Methods Enzymol., 131, 568-607.
Parthasarathy,S. and Murthy,M.R.N. (1997) Protein Sci., 6, 2561-2567.
Parthasarathy,S. and Murthy,M.R.N. (2000) Protein Eng., 13, 9-13.
Petsko,G.A. and Ringe,D. (1984) Annu. Rev. Biophys. Bioeng., 13, 331-371.
Rader,A.D. and Agard,D.A. (1997) Protein Sci., 6, 1375-1386.
Ringe,D. and Petsko,G.A. (1986) Methods Enzymol., 131, 389-433.
Rost,B. and Sander,C. (1994) Proteins: Struct. Funct. Genet., 20, 216-226.
Swindells,M.B., MacArthur,M.W. and Thornton,J.M. (1995) Nature Struct. Biol., 2, 596-603.
Tanaka,T., Kato,H., Nishioka,T. and Oda,J. (1992) Biochemistry, 32, 2259-2265.
Taylor,W.R. and Thornton,J.M. (1983) Nature, 301, 540-542.
Thanki,N., Umrania,Y., Thornton,J.M. and Goodfellow,J.M. (1991) J. Mol. Biol., 221, 669-691.
Tronrud,D.E. (1996) J. Appl. Crystallogr., 29, 100-104.
Trueblood,K.N., Burgi,H.B., Burzlaff,H., Dunitz,J.D., Gramaccioli,C.M., Schulz,H.H., Shmueli,U. and Abrahams,S.C. (1996) Acta Crystallogr., A52, 770-781.
Tsou,C.L. (1995) Biochim. Biophys. Acta, 1253, 151-162.
Varley,P.G. and Pain,R.H. (1991) J. Mol. Biol., 220, 531-538.
Vihinen,M. (1987) Protein Eng., 1, 477-480.
Vihinen,M., Torkkila,E. and Riikonen,P. (1994) Proteins: Struct. Funct. Genet., 19, 141-149.
Wagner,G. and Wüthrich,K. (1986) Methods Enzymol., 131, 307-326.
Zhang,H.L., Song,S.Y. and Lin,Z.J. (1999) Sci. China, 42, 225-232.
Received July 24, 2001; revised October 21, 2002; accepted December 10, 2002.