Protein thermal stability: insights from atomic displacement parameters (B values)

S. Parthasarathy and M.R.N. Murthy1

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India


    Abstract
 
The factors contributing to the thermal stability of proteins from thermophilic origins are matters of intense debate and investigation. Thermophilic proteins are thought to possess better packed interiors than their mesophilic counterparts, leading to lower overall flexibility and a corresponding reduction in surface-to-volume ratio. These observations prompted an analysis of B values reported in high-resolution X-ray crystal structures of mesophilic and thermophilic proteins. In this analysis, the following aspects were addressed: (1) frequency distribution of normalized B values (B' factors) over all the proteins and for individual amino acids; (2) amino acid compositions in high B value regions of polypeptide chains; (3) variation in the B values from the core to the surface of proteins in terms of their radius of gyration; and (4) degree of dispersion of normalized B values in spheres around the Cα atoms. The analysis revealed that (1) Ser and Thr have lower flexibility in thermophiles than in mesophiles, (2) the proportion of Glu and Lys in high B value regions of thermophiles is higher and that of Ser and Thr is lower and (3) the dispersion of B values within spheres at Cα atoms is similar in mesophiles and thermophiles. These observations reflect plausible differences in the dynamics of thermophilic and mesophilic proteins and suggest amino acid substitutions that are likely to change thermal stability.

Keywords: protein dynamics/protein stability/temperature factors/thermophiles/X-ray structures


    Introduction
 
Increasing the thermal stability of proteins is one of the goals of protein engineering studies. This goal is of commercial importance for industries where biocatalysts are used in extreme conditions to achieve higher solubility of substrates. It is, therefore, of primary importance to understand the factors that contribute to thermal stability. Knowledge regarding these factors has been accumulated from both experimental [mutational studies, especially on T4 lysozyme (Matthews, 1995)] and computational methods. Proteins from thermophilic origins function optimally at temperatures where most of their mesophilic counterparts will undergo denaturation. In spite of this dramatic increase in temperature optimum, sequential, structural, functional and chemical characteristics of thermophilic proteins are comparable to those of their respective mesophilic counterparts.

Various factors have been shown to contribute to the stability of proteins from thermophiles (Russell and Taylor, 1995; Querol et al., 1996; Jaenicke and Bohm, 1998; Ladenstein and Antranikian, 1998). Analysis of the amino acid composition of helices in thermophilic proteins appears to indicate that Tyr, Gly and Gln are enhanced whereas Val is suppressed compared with those of mesophilic proteins (Warren and Petsko, 1995). It has also been suggested that Lys→Arg and Ser→Ala are the most frequent mutations in mesophilic to thermophilic substitutions (Arias and Argos, 1989). Electrostatic interactions (Goldman, 1995; Hennig et al., 1995; Yip et al., 1995; Xiao and Honig, 1999), increased compactness, shortening of loops, increased hydrophobicity and decreased flexibility of α-helical segments and subunit interfaces (Kelly et al., 1993; Russell et al., 1997) have been proposed as important factors conferring thermal stability. In the case of CheY protein from Thermotoga maritima, thermal stability appears to be achieved by factors leading to the lowering of the entropy of unfolding (Usher et al., 1998). Analysis of complete genome sequences has suggested loop deletion as a mechanism for thermal stability (Thompson and Eisenberg, 1999). Surface and volume analysis has indicated that proteins from mesophilic and thermophilic origins cannot be distinguished in terms of packing criteria (Karshikoff and Ladenstein, 1998). All these studies suggest that in thermophilic proteins stability is achieved through cooperative optimization of several subtle factors rather than any one predominant interaction.

The atomic displacement parameters (B values) determined by high-resolution X-ray crystallographic studies represent smearing of atomic electron densities around their equilibrium positions due to thermal motion and positional disorder. Analysis of B values, therefore, is likely to provide new insights into protein dynamics, flexibility of amino acids and protein stability. Molecular dynamics studies have suggested that protein unfolding might be initiated at sites that are prone to large thermal fluctuations (Daggett and Levitt, 1992; Lazaridis et al., 1997). Therefore, the pattern of B values determined by high-resolution X-ray crystallographic studies might contain information regarding protein stability. The correlation between experimentally observed B values and stability, unlike the contributions of various other interactions, has not been examined in any great detail.

The distribution of B values in high-resolution crystal structures has been shown to fit accurately the sum of two Gaussian functions (Parthasarathy and Murthy, 1997, 1998). Flexibility indices of individual amino acids derived from the fitted curve reflect the dynamics of the respective amino acids. Examination of the correlation between average main-chain and side-chain B values reveals the effect of restraints used by the crystallographers for the refinement of B values and has brought out the need to have better restraints on B values (Parthasarathy and Murthy, 1999). It has also been demonstrated that the distribution of B values reflects the special dynamic properties associated with some proteins and could possibly be used as a validation tool. In this paper, we report the analysis of B values obtained from the crystal structures of thermophilic proteins. The degrees of dispersion in the B' factors (normalized B values) associated with atoms in spheres placed at each Cα atomic position in mesophilic and thermophilic proteins are comparable. Similarly, the variation of B values from the centroid towards the surface, which is likely to depend on packing density, does not appear to be significantly different in the two sets of proteins. Although the overall frequency distribution is similar, the distributions for some amino acids, especially for Ser and Thr, are different, reflecting the role played by these residues in imparting stability to thermophilic proteins. Examination of regions of high temperature factors shows that the compositions of some residues are significantly different between thermophiles and mesophiles. These observations might be related to the role played by some key residues in imparting thermal stability.


    Methods
 
Selection of protein structures

Coordinates of ninety-three mesophilic proteins used in this analysis were chosen from the representative list (Hobohm and Sander, 1994) of PDB (Bernstein et al., 1977) entries released in November 1996. These structures have resolutions better than 2.0 Å and R factors <0.2 (Table I). Twenty-one thermophilic structures with resolution better than 2.5 Å were used for the analysis (Table I). Between any two of the mesophilic structures the maximum sequence similarity was 25%, while no sequence similarity criterion was applied for thermophilic structures. However, except for two structures, the thermophilic data are also non-redundant (Karshikoff and Ladenstein, 1998).


Table I. PDB codes for the high-resolution mesophilic and thermophilic structures selected for the analysis
 
Frequency distribution of B factors

The B values at Cα atoms of each selected protein were replaced by normalized B' factors defined as $B' = (B - \langle B \rangle)/\sigma(B)$, where $\langle B \rangle$ is the mean B value at Cα atoms and $\sigma(B)$ is their standard deviation. Frequencies of residues in 0.5 unit ranges of B' factors were counted. Frequency distributions for individual amino acids, for individual proteins and for the overall data set were fitted analytically by least-squares minimization to the sum of two Gaussian functions (Parthasarathy and Murthy, 1997, 1998), $f = k_1 \exp[-k_2 (B' - B_1)^2] + k_3 \exp[-k_4 (B' - B_2)^2]$, where $k_1$, $k_2$, $k_3$, $k_4$, $B_1$ and $B_2$ are parameters defining the two Gaussians. The constants for the second Gaussian were determined after fitting the first Gaussian to stabilize the minimization. The areas under the two Gaussians, $A_1$ and $A_2$, are given by $A_1 = k_1 (\pi/k_2)^{1/2}$ and $A_2 = k_3 (\pi/k_4)^{1/2}$. The fractional areas under the second Gaussian, $p = A_2/(A_1 + A_2)$, for individual amino acids, representing the propensity to occur with high B value, were calculated.
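The normalization, binning and fractional-area steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the population form of the standard deviation is assumed (the text does not say which form was used), and the least-squares fit itself is not shown; the fitted parameters k1..k4 are taken as given.

```python
import math
from collections import Counter

def normalize_b(b_values):
    """Return B' = (B - <B>) / sigma(B) for the C-alpha B values of one protein."""
    n = len(b_values)
    mean = sum(b_values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    sigma = math.sqrt(sum((b - mean) ** 2 for b in b_values) / n)
    return [(b - mean) / sigma for b in b_values]

def bin_counts(b_prime, width=0.5):
    """Count residues per bin of the given width; keys are lower bin edges."""
    return Counter(math.floor(x / width) * width for x in b_prime)

def double_gaussian(x, k1, k2, k3, k4, b1, b2):
    """f = k1*exp[-k2*(x - b1)^2] + k3*exp[-k4*(x - b2)^2]."""
    return k1 * math.exp(-k2 * (x - b1) ** 2) + k3 * math.exp(-k4 * (x - b2) ** 2)

def fractional_area(k1, k2, k3, k4):
    """p = A2 / (A1 + A2), with A1 = k1*sqrt(pi/k2) and A2 = k3*sqrt(pi/k4)."""
    a1 = k1 * math.sqrt(math.pi / k2)
    a2 = k3 * math.sqrt(math.pi / k4)
    return a2 / (a1 + a2)
```

Because the normalization is done per protein, B' factors from structures refined at different resolutions can be pooled into one histogram before fitting.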

Distribution of high B values

Amino acids with B values greater than <B> + 0.5σ(B) were considered as high B value residues. The amino acid compositions of residues with high B values in mesophilic and thermophilic proteins were determined and compared. The numbers of stretches of consecutive high B value residues of length 1–5 were counted. A similar analysis was also performed with a high B value threshold of <B> + 0.75σ(B).
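As a rough illustration of this step, the sketch below flags high B value residues from per-residue B' factors, counts stretches of consecutive flagged residues and tabulates their composition. The function names are hypothetical, chain breaks are ignored, and stretches longer than five residues are simply skipped; the text does not say how such cases were handled.

```python
from collections import Counter
from itertools import groupby

def high_b_flags(b_prime, threshold=0.5):
    """True where B' > threshold, which is equivalent to B > <B> + threshold*sigma(B)."""
    return [x > threshold for x in b_prime]

def stretch_lengths(flags, max_len=5):
    """Count runs of consecutive high B value residues of length 1..max_len."""
    counts = Counter()
    for is_high, run in groupby(flags):
        length = sum(1 for _ in run)
        # Assumption: runs longer than max_len are not counted at all.
        if is_high and length <= max_len:
            counts[length] += 1
    return counts

def composition(residues, flags):
    """Percentage of each residue type among the flagged residues."""
    picked = [res for res, flag in zip(residues, flags) if flag]
    if not picked:
        return {}
    return {aa: 100.0 * n / len(picked) for aa, n in Counter(picked).items()}
```

Passing `threshold=0.75` gives the second cut-off mentioned above.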

Dispersion of B values in spheres placed at Cα positions

Spheres of radius 5.0 and 7.5 Å were placed at each Cα atom. The relatedness of the B values of atoms in these spheres was analysed by calculating the r.m.s. deviation of their B' factors. The frequency distributions of these r.m.s. values were determined. Also, a plausible correlation that might exist between the mean B' factors in these spheres and the corresponding atomic packing was examined.
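A minimal sketch of the sphere calculation follows, assuming that the "r.m.s. deviation" is taken about the mean B' of the atoms inside each sphere and that an atom lying exactly on the boundary counts as inside. Both points are assumptions, as are the function and argument names; coordinates are (x, y, z) tuples in Å.

```python
import math

def sphere_rms(centres, atom_coords, atom_b_prime, radius=5.0):
    """For each sphere centre, r.m.s. deviation of B' over atoms within `radius` (Å).

    Returns None for a centre whose sphere contains no atoms.
    """
    result = []
    for centre in centres:
        inside = [b for xyz, b in zip(atom_coords, atom_b_prime)
                  if math.dist(centre, xyz) <= radius]
        if not inside:
            result.append(None)
            continue
        mean = sum(inside) / len(inside)
        # Deviation is measured about the mean B' of this sphere (assumption).
        result.append(math.sqrt(sum((b - mean) ** 2 for b in inside) / len(inside)))
    return result
```

This brute-force loop is quadratic in the number of atoms, which is acceptable for single proteins; a spatial grid or k-d tree would be the usual choice for larger inputs.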

Variation of B values with distance from centroid

For each protein, the radius of gyration was calculated as $R_g = (\sum_i |r_i - r_c|^2 / n)^{1/2}$, where $r_i$ and $r_c$ represent the positions of the Cα atoms and the centroid of the molecule, respectively, and $n$ is the number of Cα atoms. Mean B' factors in spherical shells of radius expressed in terms of $R_g$ were computed for thermophilic and mesophilic proteins.
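The radius of gyration and the shell averages could be computed along the following lines. The shell width of 0.25 Rg is a placeholder, since the text does not state the bin width used; the function names are hypothetical, and the input is assumed to contain at least two distinct Cα positions so that Rg is non-zero.

```python
import math

def radius_of_gyration(coords):
    """Return (centroid, Rg) for a list of (x, y, z) C-alpha positions."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    rg = math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)
    return centroid, rg

def shell_means(coords, b_prime, shell_width=0.25):
    """Mean B' in concentric shells of thickness shell_width * Rg about the centroid.

    Keys are shell indices: shell k spans [k, k + 1) * shell_width * Rg.
    """
    centroid, rg = radius_of_gyration(coords)
    sums, counts = {}, {}
    for xyz, b in zip(coords, b_prime):
        shell = int(math.dist(xyz, centroid) / (shell_width * rg))
        sums[shell] = sums.get(shell, 0.0) + b
        counts[shell] = counts.get(shell, 0) + 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}
```

Expressing the shell radius as a fraction of Rg is what allows proteins of different sizes to be averaged on a common axis.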


    Results
 
B' factor frequency distribution

Figure 1 shows the overall frequency distribution of B' factors for mesophilic and thermophilic proteins. The plots represent curves fitted as the sum of two Gaussian functions and correspond to 30 960 amino acids for mesophiles and 10 469 amino acids for thermophiles. The six parameters characterizing the double Gaussian function are very similar for the two curves. The fractional areas under the second Gaussian, the p-values, are 0.357 and 0.361 for mesophilic and thermophilic proteins, respectively. Table II gives the p-values for individual amino acids. It can be seen that Glu, Leu, Tyr and Gln have higher p-values in thermophiles than in mesophiles whereas Cys, Asn, Pro, Arg and Ser have lower p-values.



Fig. 1. Frequency distribution of B' factors for mesophilic and thermophilic protein structures used in this analysis

 

Table II. p-Values of amino acids for a bin size of 0.5
 
Length distribution of high B value stretches

The frequencies of occurrence of stretches of 1–5 consecutive high B value residues [B > <B> + 0.5σ(B)] were determined for mesophilic and thermophilic proteins. Table III lists the relevant statistics. It can be seen that there is no substantial difference in terms of the length distribution or frequency of amino acids found in these stretches between thermophiles and mesophiles. This observation suggests that the thermophilic and mesophilic proteins do not differ in the occurrence of segments of high or low mobilities.


Table III. Frequency of occurrence of consecutive high B value stretches of length 1–5
 
Amino acid composition of residues with high B values

Figure 2a shows the scatter plot of overall amino acid composition in mesophiles and thermophiles. Table IV gives the corresponding statistics. Figure 2b and c show scatter plots of high B value residues. As can be seen in Figure 2a and Table IV, the overall amino acid compositions are very similar in mesophiles and thermophiles (correlation coefficient 0.89). The average correlation coefficient between compositions of high B value residues in mesophiles and thermophiles, however, is 0.77. The residues Glu, Lys, Ser and Thr are outliers in these plots (Figure 2b and c and Table IV). Notably, the percentage of Glu residues in high B value regions is nearly twice and that of Lys is nearly 1.5 times higher in thermophiles than in mesophiles. In contrast, the percentages of Ser and Thr in high B value regions of thermophiles are decreased by half (Table IV). These differences are also related to larger p-values (Table II) for Glu and Lys and smaller p-values for Ser and Thr in thermophiles. Similar observations were made when the B values of whole residues instead of those associated with Cα alone were examined (data not shown).
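The comparison of compositions can be illustrated with a short sketch. It assumes that the correlation coefficient quoted above is the ordinary Pearson coefficient computed over residue types, which the text does not state explicitly; the function names are hypothetical and the numbers in the usage example are invented for illustration, not taken from Table IV.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def composition_correlation(meso, thermo):
    """Correlate two {residue: percentage} dicts over their shared residue types."""
    shared = sorted(set(meso) & set(thermo))
    return pearson([meso[a] for a in shared], [thermo[a] for a in shared])

# Invented example values (not from the paper): only the pattern matters.
meso = {"GLU": 6.0, "LYS": 7.0, "SER": 9.0, "THR": 8.0}
thermo = {"GLU": 12.0, "LYS": 10.0, "SER": 4.5, "THR": 4.0}
r = composition_correlation(meso, thermo)  # negative for this made-up pattern
```

Residues that lie far from the diagonal in such a scatter plot, as Glu, Lys, Ser and Thr do in the text, are the ones that pull the coefficient down.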



Fig. 2. Scatter plots showing amino acid compositions in mesophiles and in thermophiles. (a) Overall composition; (b) composition of residues with B > <B> + 0.50σ(B); (c) composition of residues with B > <B> + 0.75σ(B). Glu, Lys, Ser and Thr are marked in high B value plots.

 

Table IV. Amino acid composition of residues with high B values
 
Dispersion of B values within spheres at Cα atoms

It is expected that in the well packed interior of proteins, atoms in close proximity will have correlated displacements, which will be reflected in the B values. If thermophiles have better packed interiors than mesophiles, atoms in spherical volumes around a large number of Cα positions will have lower r.m.s. B values in thermophiles than in mesophiles. Figure 3a and b show the frequency distribution of r.m.s. B' factors in these spherical regions for mesophiles and thermophiles, for radii of 5.0 and 7.5 Å, respectively. There is no significant difference in peak position or frequency in each bin between mesophilic and thermophilic proteins. This agrees with an earlier report that mesophilic and thermophilic proteins do not differ significantly with respect to packing interactions (Karshikoff and Ladenstein, 1998).



Fig. 3. Frequency distribution of r.m.s. of B' factors within spheres placed at Cα atomic positions for mesophilic and thermophilic proteins (a) for sphere radius 5.0 Å; (b) for sphere radius 7.5 Å.

 
B' factor distribution in spherical shells around the molecular centre

It is known that B values tend to increase continuously from the core of the protein to its surface (Bhaskaran and Ponnuswamy, 1988). It is of interest, therefore, to compare the increment in B values from the core of the protein to the exterior between mesophiles and thermophiles. Hence the increment in B' factors in spherical shells of radius corresponding to a specified fraction of the radius of gyration of the molecule was examined (Figure 4). If the interiors of thermophiles were better packed than those of mesophiles, the slope of this incremental curve would be smaller for thermophiles. It is clear from the plot that the two curves are almost identical except for fluctuations observed in shells near the outer surface of mesophiles. This fluctuation at outer shells shown by the mesophiles may be due to loops extending out from the rest of the protein.



Fig. 4. Plot of mean B' factor in bins of fractions of radius of gyration for mesophilic and thermophilic proteins.

 

    Discussion
 
Increasing the thermal stability of proteins has been one of the primary goals of protein engineering. Sequence comparisons between proteins from mesophilic and thermophilic origins have been performed in order to gain information about the possible mutations leading to increased thermal stability. Based on such studies, 'traffic rules' have been discussed by several researchers. Lys→Arg and Ser→Ala appear to be the top two amino acid substitutions (in helical segments) from mesophilic to thermophilic protein sequences (Arias and Argos, 1989). Mutations of Ser to Ala and Thr to Ala in mesophilic lactate dehydrogenase have been reported to enhance the stability of the mutant by 20°C or more compared with the wild-type enzyme (Kotik and Zuber, 1993). Bohm and Jaenicke (1994), however, have concluded that the proposed 'traffic rules' of molecular adaptation cannot be used to predict extremophilic behaviour. An overall increase in conformational rigidity is also a factor contributing to the adaptation of proteins to extreme environments. It is likely that the dynamic features of thermophilic and mesophilic proteins have significant and recognizable differences. The degree to which such differences, if they exist, could be inferred from critical examination of crystallographic refinement of high-resolution structures needs to be evaluated. The preliminary analysis presented here does indeed suggest that the crystallographically determined atomic displacement parameters might actually convey information regarding thermal stability and highlight the differences between the dynamics of thermophilic and mesophilic proteins.

Better hydrogen bonding networks, electrostatic interactions and better internal packing have been suggested by various investigators as factors contributing to the stability of thermophilic proteins (Russell and Taylor, 1995; Querol et al., 1996; Jaenicke and Bohm, 1998; Ladenstein and Antranikian, 1998). Karshikoff and Ladenstein (1998) have analysed the packing density in mesophilic and thermophilic proteins and concluded that packing density is not a dominant factor contributing to the thermal stability. However, contradictory results were obtained when glyceraldehyde-3-phosphate dehydrogenase and glutamate dehydrogenase from mesophilic and thermophilic organisms were compared for their accessible surface area (Korndorfer et al., 1997; Knapp et al., 1997). Analyses of variations in B' factors around spheres at Cα atoms and increment of B' factors from the core to the surface of proteins, presented here, show that thermophilic proteins are not very different from mesophilic proteins in these aspects. Packing differences, if any, are thus not reflected in the atomic displacement parameters.

The most significant observation in the present analysis is that Glu and Lys are enhanced whereas Ser and Thr are suppressed in high B value regions of thermophiles in comparison with mesophiles. The juxtaposition of these four residues is perhaps important in imparting thermal stability. These residues may be suitable candidates for site-specific mutations leading to enhanced stability. This also suggests that mutation of high B value Ser and Thr could lead to an improvement of thermal stability. The mutational experiments on lactate dehydrogenase by Kotik and Zuber (1993) and on the streptococcal protein G β1 domain (Malakauskas and Mayo, 1998) involving Ser and Thr residues have led to considerable increases in thermal stability, suggesting that the conclusions drawn from the present analysis are likely to be significant. These results also suggest that the 'traffic rules' of amino acid replacements need to be revised with reference to amino acid flexibility.


    Acknowledgments
 
This work was supported by a grant from the Council of Scientific and Industrial Research, India, to M.R.N.M. S.P. thanks the same agency for a research fellowship.


    Notes
 
1 To whom correspondence should be addressed. Email: mrn@mbu.iisc.ernet.in


    References
 
Arias,L.M. and Argos,P. (1989) J. Mol. Biol., 206, 397–406.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Bhaskaran,R. and Ponnuswamy,P.K. (1988) Int. J. Pept. Protein Res., 32, 241–255.

Bohm,G. and Jaenicke,R. (1994) Int. J. Pept. Protein Res., 43, 97–106.

Daggett,V. and Levitt,M. (1992) Proc. Natl Acad. Sci. USA, 89, 5142–5146.

Goldman,A. (1995) Structure, 3, 1277–1279.

Hennig,M., Darimont,B., Sterner,R., Kirschner,K. and Jansonius,J.N. (1995) Structure, 3, 1295–1306.

Hobohm,U. and Sander,C. (1994) Protein Sci., 3, 522–524.

Jaenicke,R. and Bohm,G. (1998) Curr. Opin. Struct. Biol., 8, 738–748.

Karshikoff,A. and Ladenstein,R. (1998) Protein Engng, 11, 867–872.

Kelly,C.A., Nishiyama,M., Ohnishi,Y., Beppu,T. and Birktoft,J.J. (1993) Biochemistry, 32, 3913–3922.

Knapp,S., de Vos,W.M., Rice,D. and Ladenstein,R. (1997) J. Mol. Biol., 267, 916–932.

Korndorfer,I., Steipe,B., Huber,R., Tomschy,A. and Jaenicke,R. (1997) J. Mol. Biol., 246, 511–521.

Kotik,M. and Zuber,H. (1993) Eur. J. Biochem., 211, 267–280.

Ladenstein,R. and Antranikian,G. (1998) Adv. Biochem. Engng Biotechnol., 61, 37–85.

Lazaridis,T., Lee,I. and Karplus,M. (1997) Protein Sci., 6, 2589–2605.

Malakauskas,S.M. and Mayo,S.L. (1998) Nature Struct. Biol., 5, 470–475.

Matthews,B.W. (1995) Adv. Protein Chem., 46, 249–278.

Parthasarathy,S. and Murthy,M.R.N. (1997) Protein Sci., 6, 2561–2567.

Parthasarathy,S. and Murthy,M.R.N. (1998) Protein Sci., 7, 525.

Parthasarathy,S. and Murthy,M.R.N. (1999) Acta Crystallogr., D55, 173–180.

Querol,E., Perez-Pons,J.A. and Villarias,A.M. (1996) Protein Engng, 9, 265–271.

Russell,R.J.M. and Taylor,G.L. (1995) Curr. Opin. Biotechnol., 6, 370–374.

Russell,R.J.M., Ferguson,J.M.C., Hough,D.W., Danson,M.J. and Taylor,G.L. (1997) Biochemistry, 36, 9983–9994.

Thompson,M.J. and Eisenberg,D. (1999) J. Mol. Biol., 290, 595–604.

Usher,K., De la Cruz,A., Dahlquist,F., Swanson,R., Simon,M. and Remington,S. (1998) Protein Sci., 7, 403–412.

Warren,G.L. and Petsko,G.A. (1995) Protein Engng, 9, 905–913.

Xiao,L. and Honig,B. (1999) J. Mol. Biol., 289, 1435–1444.

Yip,K.S.P., Stillman,T.J., Britton,K.L., Artymiuk,P.J., Baker,P.J., Sedelnikova,S.E., Engel,P.C., Pasquo,A., Chiaraluce,R., Consalvi,V., Scandurra,R. and Rice,D.W. (1995) Structure, 3, 1147–1158.

Received April 23, 1999; revised September 12, 1999; accepted October 1, 1999.