1 Intramural Research Support Program, SAIC Frederick, 2 Laboratory of Experimental and Computational Biology, National Cancer Institute, Frederick Cancer Research and Development Center, Bldg 469, Rm 151, Frederick, MD 21702, USA and 3 Sackler Institute of Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Abstract
Keywords: melting temperature/sequence/structure/thermophiles/thermostability
Introduction
Thermostable proteins maintain their activities and are stable at high temperatures. Identifying and understanding the factors contributing to the stability of proteins from organisms living under extreme conditions has been a long-standing problem. The first high resolution crystal structure of thermolysin was reported in 1974 (Matthews et al., 1974). Perutz and Raidt (1975) commented on the stereochemical basis of thermostability of ferredoxins and hemoglobin A2. Since these pioneering efforts, several investigators have focused on the problem of the molecular basis of protein thermostability. The greater stability of the thermophilic proteins has been attributed to several factors (Querol et al., 1996; Jaenicke and Bohm, 1998; Ladenstein and Antranikian, 1998). Among the most prominent ones are greater hydrophobicity (Haney et al., 1997), better packing, deletion or shortening of loops (Russell et al., 1997), smaller and less numerous cavities, increased surface area buried upon oligomerization (Salminen et al., 1996), amino acid substitutions within and outside the secondary structures (Zuber, 1988; Haney et al., 1997; Russell et al., 1998), increased occurrence of proline residues (Haney et al., 1997; Watanabe et al., 1997; Bogin et al., 1998), decreased occurrence of thermolabile residues (Russell et al., 1997), increased helical content, increased polar surface area (Haney et al., 1997; Vogt and Argos, 1997; Vogt et al., 1997), increased hydrogen bonding (Vogt and Argos, 1997; Vogt et al., 1997) and salt bridges (Yip et al., 1995, 1998; Haney et al., 1997; Russell et al., 1997, 1998; Elcock, 1998; Xiao and Honig, 1999; Kumar et al., 2000).
Here we present a statistical analysis of parameters thought to contribute toward protein thermostability. We have carried out structural comparisons to cluster the thermophile–mesophile protein families, creating a non-redundant dataset of 18 families from the Protein Data Bank (PDB) (Bernstein et al., 1977). These families span an entire spectrum, containing proteins from moderately thermophilic to hyperthermophilic organisms and their mesophilic homologs. Not all the differences observed between the thermophilic and mesophilic proteins are due to thermostability. Here we select one pair from each family. We choose the structurally most similar thermophile–mesophile pair having the best resolution, so that the observed differences can be expected to be mostly due to thermostability. In our dataset, no two thermophilic proteins from different families have similar three-dimensional structures, ensuring a bias-free sample. For each thermophile–mesophile pair, we have compared several structural properties such as oligomeric state, insertion/deletion of residues, compactness, hydrophobicity, helical content, hydrogen bonds and salt bridges. We find that most of these do not show consistent trends across the families, indicating versatile protein stabilization strategies adopted by the individual families. However, there are a few global trends across a large number of families. Salt bridges and side-chain hydrogen bonds increase in most of the thermophilic proteins. Interestingly, the overall amino acid distributions in the thermophilic and the mesophilic proteins are significantly different, in spite of the high sequence homologies between the protein structural pairs. The proportions of the thermolabile residue Cys and of Ser decrease significantly, while those of Arg and Tyr increase significantly in the thermophilic proteins as compared with their mesophilic homologs. Pro is observed to occur less frequently in α-helices of the thermophilic proteins. On the whole, a higher proportion of amino acids in the thermophilic proteins adopt α-helical conformation. Our results indicate a two-pronged strategy adopted by the thermophiles. Thermophilic proteins appear to disfavor potentially destabilizing factors along with favoring the potentially stabilizing ones. Furthermore, here we compare our results with those obtained from an analysis of a database of 165 non-homologous proteins.
Our intention was to carry out the analysis with respect to the melting temperatures of the corresponding proteins, from both the thermophiles and the mesophiles. Melting temperatures (Tm) are the best descriptors of thermal stability. To be able to draw reliable conclusions, we wished to focus on cases where (i) high resolution crystal structures are available for both the thermophilic protein and its mesophilic homolog; and (ii) melting temperatures for the thermophilic and mesophilic proteins have been measured and reported. The most meaningful cases are those in which the difference between the melting temperatures of the thermophilic–mesophilic protein pair is not too small and the protein is large enough. Too small a difference in the melting temperatures corresponds to a small difference in energy between the pair of proteins, whereas if the protein is small, the differences in structural parameters might be difficult to gauge accurately. Unfortunately, only a few cases are currently available in the literature. In these cases, the difference in the number of salt bridges between the thermophilic protein and its mesophilic homolog appears to correlate with the Tm of the thermophilic protein. While other structural factors, such as compactness and hydrophobicity, contribute to thermostability, no consistent correlation with the Tm is observed. However, we are unable to obtain statistically reliable results due to the sparse data. On the other hand, we point out that none of the structural factors correlates with the living temperatures of the thermophilic organisms.
Materials and methods
An index file, called source.idx, in the Protein Data Bank (PDB) (Bernstein et al., 1977) contains the names of the organisms for all protein crystal structures available in the PDB. The January 7, 1998 update of this file was searched for the keywords THERM and PYRO. This search yielded 167 (out of 6751) PDB entries containing different proteins from thermophilic organisms. Entries in which the protein structures had been determined by nuclear magnetic resonance (NMR) and/or theoretical modeling (R = 1.0 Å in the cmpd_res file) were discarded, leaving us with 145 PDB entries. From this set of entries containing proteins whose structures were determined by X-ray crystallography, 113 entries containing high resolution (R ≤ 2.5 Å) structures for 55 different thermophilic proteins were selected for further study. For each of the thermophilic proteins in the list, the PDB entry with the best resolution was picked. Three-dimensional structures of the thermophilic proteins were compared all against all using a sequence order independent structural comparison technique (Tsai et al., 1996). This computer vision-based technique superimposes spatially equivalent regions in two proteins without regard to their sequential connectivity, or to the number of residues in the protein. Since the mesophilic and thermophilic proteins have different sizes and may have different oligomeric states, this technique allows us to superimpose the conserved regions of the proteins independently of these factors. Two proteins are considered to be dissimilar if (i) the backbone Cα atom superposition for the two structures yields an r.m.s.d. ≥ 2.00 Å; and (ii) the sequence identity (ID) for the two proteins is ≤ 20%. Finally, thermophilic proteins were retained in the database if they have dissimilar structures and if there is at least one high resolution crystal structure for their corresponding mesophilic homologs. This step ensures non-redundancy in the database. Eighteen different thermophilic proteins were obtained. The structure of each of the 18 proteins was compared with their corresponding homologous PDB entries. Two structures were considered to be similar if they did not satisfy both of the above conditions. At this stage, many families contain several mesophilic proteins. Application of a 2.5 Å resolution cut-off substantially decreases their number. Finally, the PDB entry which has the best resolution and contains the structure that is most similar to the thermophilic protein is selected. As far as possible, we have tried to select wild-type thermophile–mesophile pairs. Attention was also paid to the presence (absence) of substrates in the thermophilic and mesophilic proteins.
Choosing one thermophile–mesophile pair per family, in such a way that the pair contains the best resolved structures along with the largest sequence and structure homology among the various available alternatives, has several advantages. First, since the two proteins are most similar, the observed differences can be correlated with thermostability with a greater degree of confidence. Second, the variability, or the consistency of the results, can be judged from the behavior of all 18 families; and third, in particular, the behavior of the parameters is a function of two factors: the extent of structural similarity between the two molecules and the sequence similarity. The non-polar buried surface area, compactness, etc. obtained in comparisons of members of the same family would need to be calibrated against the sequence differences, and it is unclear how best to do this in practice. In an extensive recent analysis, Vogt et al. (1997) have used multiple mesophilic homologs for comparison with the thermophilic proteins. They have calibrated specific protein structural properties per 10°C rise in living temperature of the organisms in a given family. The statistical trends obtained by Vogt et al. (1997) and by us are similar, indicating the equivalence of the two approaches.
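The redundancy filter described above can be sketched as follows. This is only an illustration, taking the two thresholds as an r.m.s.d. of at least 2.00 Å and a sequence identity of at most 20%. The greedy order of the filter, the function names and the `compare` callback (which stands in for the structural comparison program) are assumptions made for this sketch, not details given in the text.

```python
from typing import Callable, Dict, List, Tuple

RMSD_CUTOFF = 2.00      # Angstrom, C-alpha superposition r.m.s.d.
IDENTITY_CUTOFF = 20.0  # percent sequence identity


def is_dissimilar(rmsd: float, identity: float) -> bool:
    """Both conditions must hold for two proteins to count as dissimilar."""
    return rmsd >= RMSD_CUTOFF and identity <= IDENTITY_CUTOFF


def non_redundant(names: List[str],
                  compare: Callable[[str, str], Tuple[float, float]]) -> List[str]:
    """Greedy filter (an assumption): keep a protein only if it is
    dissimilar to every protein already kept."""
    kept: List[str] = []
    for name in names:
        if all(is_dissimilar(*compare(name, other)) for other in kept):
            kept.append(name)
    return kept


# Hypothetical pairwise results: (r.m.s.d. in Angstrom, % identity).
PAIRS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("A", "B"): (0.9, 65.0),   # similar pair
    ("A", "C"): (3.1, 12.0),   # dissimilar pair
    ("B", "C"): (2.8, 15.0),   # dissimilar pair
}


def lookup(x: str, y: str) -> Tuple[float, float]:
    """Symmetric lookup in the hypothetical table above."""
    return PAIRS[(x, y)] if (x, y) in PAIRS else PAIRS[(y, x)]


selected = non_redundant(["A", "B", "C"], lookup)
```

With the made-up values above, protein B is dropped because it is similar to A, while C is retained.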
The properties of these 18 pairs of thermophilic and mesophilic proteins are summarized in Table I. The best matching protein chains in each family are indicated in the footnotes of Table I. One PDB entry for the mesophilic protein elongation factor EF-TU-EF-TS complex (PDB entry 1EFU) from Escherichia coli is an A2B2 type tetramer with chains of type A and B being highly dissimilar. This particular protein complex has two different homologs in the thermophilic proteins, namely, EF-TU (PDB entry 1EFT) and EF-TS (PDB entry 1TFE). Furthermore, 1TFE, a dimer, matches with a single chain, 1EFU-B. The asymmetric unit of lactate dehydrogenase crystals from Bacillus stearothermophilus (PDB entry 1LDN) contains two copies of the molecule. The first copy has been used in this analysis. In all the families, the spatially overlapping regions in the superposition of the thermophilic and mesophilic proteins are very extensive. For example, in the citrate synthase family, where the similarity between the thermophilic and mesophilic proteins is relatively poor as compared with most other families, 332 residues in each chain overlap spatially. A chain of thermophilic citrate synthase (1AJ8-B) has 370 residues while a chain of mesophilic citrate synthase (1CSH) contains 435 residues. A few of the PDB entries used in this analysis have missing atoms, residues or small fragments due to poor diffraction data. Additionally, the crystal structures in several cases may be determined at low temperatures to obtain better diffraction data. However, these factors do not substantially affect the overall three-dimensional structures of the proteins. No systematic errors are expected on this count.
Distributions (numbers, N) and frequencies (percent, %) of all 20 amino acids were computed for the thermophilic and mesophilic proteins. In addition, we have computed their distributions in the α-helices. The amino acid distributions were compared using the χ² test. Hamming distance was computed between percent (%) amino acid compositions. The change in proportion test was used to identify the amino acids whose proportions change significantly. These calculations follow Kumar and Bansal (1998a).
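As a rough illustration of the first two comparisons, the sketch below computes a chi-square statistic for two sets of residue counts and a distance between two percent compositions. The exact formulas of Kumar and Bansal (1998a) are not reproduced in this text, so two assumptions are made: the chi-square is the standard homogeneity statistic on a 2 × k table of counts, and the Hamming distance is the sum of absolute differences between percent compositions. The function names are hypothetical.

```python
from typing import Sequence


def chi_square(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Chi-square statistic for a 2 x k table of residue counts.

    With k residue types that all occur, there are k - 1 degrees of freedom.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # residue type absent from both sets
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat


def composition_distance(pct_a: Sequence[float], pct_b: Sequence[float]) -> float:
    """Sum of absolute differences between two percent compositions
    (assumed here to be what the text calls the Hamming distance)."""
    return sum(abs(a - b) for a, b in zip(pct_a, pct_b))
```

For 20 amino acids the statistic would be compared with the critical value for 19 degrees of freedom.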
Structural properties
Oligomeric state
For a given protein, the PDB files contain coordinates for the structure observed in a crystallographic asymmetric unit. This may not reflect the true biochemically relevant oligomeric state for the protein. In our data set these oligomeric states of the thermophilic and mesophilic proteins are tabulated by studying the biochemical data contained in the relevant literature on these proteins, indicators within the PDB files and the pointers in the PDB3DB browser.
Hydrophobicity
The hydrophobicity of a protein was calculated as the fraction of the buried non-polar area out of the total non-polar area, computed by using the methods described earlier (Tsai and Nussinov, 1997a,b; Tsai et al., 1997).
Compactness
The compactness (Zehfus and Rose, 1986) of a protein was defined as the ratio of solvent accessible area (Lee and Richards, 1971; Tsai et al., 1997) of the protein and the surface area of a sphere with equal volume to the protein (Tsai and Nussinov, 1997a,b).
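The definition above reduces to a short calculation once the accessible surface area and the molecular volume are known. The sketch below assumes both values are supplied by an external program; the function names are hypothetical.

```python
import math


def sphere_area_from_volume(volume: float) -> float:
    """Surface area of a sphere that has the given volume.

    From V = (4/3) * pi * r**3 and A = 4 * pi * r**2 it follows that
    A = (36 * pi * V**2) ** (1/3).
    """
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0)


def compactness(accessible_area: float, volume: float) -> float:
    """Ratio of accessible surface area to the area of the equal-volume sphere."""
    return accessible_area / sphere_area_from_volume(volume)


# Sanity check on an ideal sphere of radius 10: the ratio should be 1.
radius = 10.0
sphere_volume = 4.0 / 3.0 * math.pi * radius ** 3
sphere_area = 4.0 * math.pi * radius ** 2
ratio_for_sphere = compactness(sphere_area, sphere_volume)
```

A perfect sphere gives a ratio of 1, and any other shape of the same volume gives a larger value, which is why a smaller ratio indicates a more compact protein.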
Hydrogen bonds and salt bridges
Whenever two heavy (non-hydrogen) atoms with opposite partial charges [donor (D)–acceptor (A) pairs] were found to be within a distance of 3.5 Å, a hydrogen bond has been inferred. The geometrical goodness of the hydrogen bond was assessed by computing the values of the following angles.
A hydrogen bond was taken to have good geometry if both these angles lie in the range 90–150°. Only those hydrogen bonds which have a good geometry were included in our studies.
The presence of salt bridges was inferred when Asp or Glu side-chain carbonyl oxygen atoms were found to be within 4.0 Å distance from the nitrogen atoms in Arg, Lys and His side chains.
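The distance criterion for salt bridges can be sketched as below. The atom names are the standard PDB names for the side-chain oxygens of Asp and Glu and the side-chain nitrogens of Arg, Lys and His. The `Atom` record, the omission of chain identifiers and the choice to report one entry per residue pair are simplifying assumptions of this sketch.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    res_name: str
    res_id: int
    atom_name: str
    xyz: Tuple[float, float, float]


ACID_OXYGENS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASE_NITROGENS = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}, "HIS": {"ND1", "NE2"}}
CUTOFF = 4.0  # Angstrom


def salt_bridges(atoms: List[Atom]) -> List[Tuple[int, int]]:
    """Return (acidic residue, basic residue) pairs with an O...N contact within CUTOFF."""
    oxygens = [a for a in atoms if a.atom_name in ACID_OXYGENS.get(a.res_name, ())]
    nitrogens = [a for a in atoms if a.atom_name in BASE_NITROGENS.get(a.res_name, ())]
    pairs = set()
    for o in oxygens:
        for n in nitrogens:
            if math.dist(o.xyz, n.xyz) <= CUTOFF:
                pairs.add((o.res_id, n.res_id))
    return sorted(pairs)


# Toy coordinates, made up for illustration only.
toy = [
    Atom("ASP", 10, "OD1", (0.0, 0.0, 0.0)),
    Atom("LYS", 25, "NZ", (3.0, 0.0, 0.0)),
    Atom("ARG", 40, "NH1", (9.0, 0.0, 0.0)),
    Atom("GLU", 50, "OE2", (9.0, 3.5, 0.0)),
    Atom("SER", 60, "OG", (0.0, 1.0, 0.0)),
]
found = salt_bridges(toy)
```

In the toy input, Asp 10 pairs with Lys 25 and Glu 50 pairs with Arg 40; the serine oxygen is ignored because Ser is not an acidic residue.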
Helical content
The helical content of a protein refers to the percentage (%) of residues that have α-helical conformation in the protein. The corresponding Dictionary of Protein Secondary Structure (DSSP) (Kabsch and Sander, 1983) file was used to identify the residues in α-helical conformation in each protein. Overall geometries of α-helices in the thermophilic and mesophilic protein chains were characterized using HELANAL (Kumar and Bansal, 1996, 1998b). This program is available at http://www-lecb.ncifcrf.gov/~kumarsan/
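Given the per-residue DSSP assignments, the helical content is a simple percentage. The sketch below assumes the assignments have already been extracted from the DSSP file into a string with one code per residue, where `H` is the DSSP code for α-helix; parsing the file itself is not shown, and the function name is hypothetical.

```python
def helical_content(dssp_codes: str) -> float:
    """Percentage of residues assigned the DSSP code 'H' (alpha-helix)."""
    if not dssp_codes:
        raise ValueError("no residues supplied")
    return 100.0 * dssp_codes.count("H") / len(dssp_codes)


# Toy assignment string: 4 helical residues out of 8.
example_content = helical_content("HHHH--EE")
```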
Buried and exposed surface areas
Buried and accessible surface areas (Lee and Richards, 1971; Tsai and Nussinov, 1997a,b) have been computed for thermophilic and mesophilic protein chains as well as for 165 dissimilar monomers. Four different fractions have been computed from these areas, in each case:
Measurement of percent change in various properties
For the purpose of a comparison between a thermophilic–mesophilic pair, the numbers of hydrogen bonds and salt bridges in the two proteins were normalized by their respective number of residues. Percent changes were computed as the difference between the normalized values of hydrogen bonds and salt bridges in the two proteins in each family, divided by the corresponding normalized values for the mesophilic proteins.
Changes in protein size can occur due to insertion/deletion and/or oligomerization. Percent change in protein size in each family was computed by dividing the difference in the number of residues between the thermophilic and mesophilic proteins by the number of residues in the mesophilic protein.
Percent change in hydrophobicity in each family was computed by dividing the difference in hydrophobicity for the thermophilic and mesophilic proteins by the hydrophobicity for the mesophilic protein. Percent change in compactness was also computed in the same way.
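The percent changes defined in this subsection all share one form, with the mesophilic value as the reference. A minimal sketch, using hypothetical function names and made-up counts:

```python
def percent_change(thermophilic: float, mesophilic: float) -> float:
    """Percent change relative to the mesophilic value."""
    return 100.0 * (thermophilic - mesophilic) / mesophilic


def per_residue(count: int, n_residues: int) -> float:
    """Normalize a count (e.g. salt bridges) by the number of residues."""
    return count / n_residues


# Made-up example: 30 salt bridges in 300 residues versus 20 in 250 residues.
salt_bridge_change = percent_change(per_residue(30, 300), per_residue(20, 250))

# Size, hydrophobicity and compactness use the raw values directly.
size_change = percent_change(300, 250)
```

In this made-up example the normalized salt bridge content rises by about 25% and the chain length by 20%.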
Database of 165 dissimilar monomers
A database of 165 proteins, which (i) have been solved to high resolution (R ≤ 2.5 Å) by X-ray crystallography and contain at least 50 amino acids, (ii) have dissimilar 3D structures, as determined by the sequence order independent structure comparison technique (Tsai et al., 1996), and (iii) exist as monomers in solution as indicated in their PDB files, relevant biochemical literature and pointers in PDB3DB browser to other databases such as SWISS-PROT, was generated from the PDB. This database was used as a control for studying structural features, such as compactness, hydrophobicity, polar and non-polar contribution to buried and exposed surfaces in thermophilic and mesophilic protein chains.
Cases of high resolution structural pairs where the melting temperatures are currently available
For PGK the melting temperatures of the thermophilic and mesophilic proteins are close (ΔTm = 67 − 53 = 14°C). The energy difference between thermophilic and mesophilic enzymes is only 5 kcal/mol (ΔΔG = ~5 kcal/mol). Moreover, the oligomeric states of the two PGKs are also different. The thermophilic rubredoxin has a very high Tm. However, it is a very small protein, consisting of only about 50 amino acids. More than one estimate of Tm for rubredoxin further complicates the matter.
Results
Packing
Reasons for higher stability of thermophilic proteins include better packing (Russell et al., 1997, 1998) and hence, smaller and less numerous cavities. To study packing in a protein one can compute its compactness (Zehfus and Rose, 1986). Compactness has been defined to be the ratio of accessible surface area (ASA) (Lee and Richards, 1971) of a given protein to the surface area of a sphere with the same volume as the protein. Assuming that most proteins are more or less globular in shape, a better packed protein will have a smaller ratio value. We have already used this formulation to study hydrophobic folding units (Tsai and Nussinov, 1997a,b). Figure 1 plots the compactness versus the number of residues in thermophilic and mesophilic protein chains (one chain per protein), along with the values calculated for the 165 structurally dissimilar monomeric protein chains selected from the PDB. The compactness values for the thermophilic protein chains are very similar to those calculated for the mesophilic protein chains. They are also within the range of the compactness values obtained for the 165 dissimilar monomers. However, the overall packing of an oligomeric protein may involve two components: (i) packing of atoms within individual subunits, and (ii) the association, or packing, of the subunits with respect to each other. Consequently, we have computed the compactness for the thermophilic and mesophilic proteins in their biochemically relevant oligomeric states. The results are presented in Table II. Again, the compactness values for thermophilic and mesophilic proteins are highly similar. Hence, there is no consistent pattern in the contribution of packing to the differences in stabilities between thermophilic and mesophilic protein pairs. Recently, Karshikoff and Ladenstein (1998) have also reached similar conclusions upon computing cavity volumes for a large number of thermophilic and mesophilic proteins.
With the rapid increase in the structural information available for proteins, it is becoming increasingly clear that the hydrophobic effect is the dominant driving force in protein folding (Dill, 1990). Hence, it has been suggested that thermophilic proteins are substantially more hydrophobic (Haney et al., 1997) and have more surface area buried upon oligomerization (Salminen et al., 1996) as compared with their mesophilic counterparts. As with packing, the hydrophobic effect can manifest itself at two levels: (i) hydrophobicities of the individual protein chains, and (ii) hydrophobicity due to the association of the chains. We have computed the hydrophobicity as the fraction of buried non-polar surface area out of the total non-polar surface area (Tsai and Nussinov, 1997a,b), for the thermophilic and mesophilic protein chains as well as their biochemically relevant oligomeric forms. Figure 2 presents a plot of the hydrophobicity versus the number of residues in thermophilic and mesophilic protein chains, along with those for the 165 dissimilar monomeric chains. The figure illustrates that thermophilic and mesophilic protein chains have very similar hydrophobicities. The values lie within the same range as those for the hydrophobicities of 165 dissimilar monomers. The hydrophobicities computed for the thermophilic and mesophilic proteins in their biochemically relevant oligomeric states are presented in Table II. Again, the hydrophobicities of the thermophilic and mesophilic protein oligomers are very similar.
It has been suggested that increased polar surface area contributes to the greater stability of the thermophilic proteins (Haney et al., 1997; Vogt and Argos, 1997; Vogt et al., 1997). Here, we have divided protein surfaces into buried and exposed parts and evaluated the contribution of polar and non-polar atoms. These calculations have been performed for all thermophilic and mesophilic protein chains (one polypeptide chain per protein) and compared with those for 165 dissimilar monomers. The calculations have been done in two different ways. In the first set all atoms including the backbone were considered. In the second set, the backbone atoms were excluded. Table III presents the results. The distributions of buried and exposed, polar and non-polar surface areas are quite uniform for the 165 dissimilar monomers as well as for the thermophilic and mesophilic protein chains.
Salt bridges and hydrogen bonds
Along with oligomerization, chain length, hydrophobicity and compactness, hydrogen bonds and salt bridges have also been compared between the thermophilic and the mesophilic proteins. The hydrogen bonds were divided into three classes: main chain–main chain (MM H-bonds), main chain–side chain (MS H-bonds) and side chain–side chain hydrogen bonds (SS H-bonds). Figure 3 shows plots of SS H-bonds and salt bridge content changes in the families of thermophilic and mesophilic proteins in their biochemically relevant oligomeric states, and at their interfaces. As the figure shows, side chain–side chain H-bonds and salt bridge content increase in the monomers of most thermophilic proteins and at their interfaces.
Insertions, deletions and oligomerization
It has been suggested that deletion or shortening of loops may increase protein thermal stability (Russell et al., 1997, 1998). Oligomerization can be another contributing factor. These factors reflect a change in protein size, and its effect on thermal stability. Figure 4 shows changes in hydrogen bonds, salt bridges, compactness and hydrophobicity plotted against the change in the number of residues between thermophilic and mesophilic proteins in each family. Mostly there is no correlation with a change in protein size, either due to insertions/deletions or due to oligomerization. This is further corroborated by the observation that in 14 out of 18 families in our database, thermophilic and mesophilic proteins have the same oligomeric states. In two families the oligomeric states of thermophilic proteins are found to be higher than those of their mesophilic homologs. However, the oligomeric states of mesophilic proteins are higher than their thermophilic homologs in the other two families.
In the literature, the stability of thermophilic proteins has been described in a number of ways, such as in terms of the temperature at which a protein is active (activity temperature), stable (stability temperature) or by half life for a certain duration of time. Much less frequently a protein is described in terms of melting, or mid-point transition temperature (Tm). Perhaps due to this heterogeneity in the available data, a recent database analysis study (Vogt and Argos, 1997; Vogt et al., 1997) used the living temperatures of the organisms from which the proteins were isolated as a parameter for studying thermostability. Figure 5 plots changes in the oligomeric state, chain length, hydrophobicity, compactness, main chain–main chain, main chain–side chain and side chain–side chain hydrogen bonds and salt bridges as a function of living and of melting temperatures. Figure 5a shows that structural factors involved in protein thermostability do not correlate with living temperatures of the thermophilic organisms. The trends observed in Figure 5b are clearer. However, there are only five data points, and two of these (first and last) are unreliable for the reasons summarized in the Materials and methods section. If we ignore these points, we observe that among the various factors, only the salt bridges tend to correlate with the melting temperature. Unfortunately, this observation is unreliable, as it is based only on three proteins. However, it is consistent with studies by Yip et al. (1998), who have observed a correlation between ion pairs and thermostability for glutamate dehydrogenases from different organisms. Clearly, this phenomenon needs to be investigated further before any conclusions are drawn.
The overall distributions of amino acids in the 18 non-redundant families of thermophilic and mesophilic protein chains are presented in Table IV. Figure 6 presents a comparison between the residue composition of the thermophilic and mesophilic proteins. Despite the high sequence homology, a χ² test (Kumar and Bansal, 1998a) indicates that the differences between the two distributions are highly significant (χ² = 86.2). For a 19 parameter system such as amino acid distribution, a χ² value at 95% level of confidence (probability of accepting the null hypothesis that two distributions are similar, P ≤ 0.05) should be greater than 30.14 to reject the null hypothesis. This evidence is further corroborated by the observation that the value of Hamming distance in 20 dimensional amino acid composition (%) space (Kumar and Bansal, 1998a) between thermophilic and mesophilic chains is large (8.1 distance units).
It has been suggested that Pro has an increased occurrence in thermophilic proteins, especially in loops (Haney et al., 1997; Watanabe et al., 1997; Bogin et al., 1998). A total of 75 Pro substitutions are observed in loop regions of thermophilic and mesophilic chains. In 39 cases, the thermophilic chains contain a Pro residue instead of other residues found in their mesophilic homologs at equivalent loop positions. However, in 36 cases, another residue is present in the thermophilic chains instead of Pro in the mesophilic homologs. Thus, there is no consistent pattern for Pro substitutions in loops. In our database, the frequency of occurrence of Pro is unchanged (4.2%) (Figure 6) in thermophilic and mesophilic proteins.
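One way to see why a 39 to 36 split gives no consistent pattern is an exact sign test against an even split. This test is not part of the analysis reported here; it is added only as an illustrative check, and the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for a fair-coin (p = 0.5) binomial test."""
    tail = min(successes, trials - successes)
    # Probability of a result at least as extreme in the smaller tail.
    p_tail = sum(comb(trials, i) for i in range(tail + 1)) / 2 ** trials
    return min(1.0, 2.0 * p_tail)


# 39 of the 75 loop substitutions introduce Pro in the thermophilic chain.
p_value = two_sided_binomial_p(39, 75)
```

The resulting p-value is far above 0.05, so the two directions of substitution cannot be distinguished from an even split.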
Preferred and avoided residues in thermophilic proteins
A change in proportion test (Kumar and Bansal, 1998a) is used to identify amino acids whose proportions change significantly, that is, by >2 standard deviations, between thermophilic and mesophilic chains. Changes in the proportions of Cys (0.6% in thermophilic and 1.0% in mesophilic chains), Arg (4.6% in thermophilic and 3.6% in mesophilic chains), Ser (4.0% in thermophilic and 5.5% in mesophilic chains) and Tyr (4.5% in thermophilic and 3.7% in mesophilic chains) are found to be significant (Figure 6).
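The ">2 standard deviations" criterion can be illustrated with a pooled two-proportion z statistic. Whether this is the exact form used by Kumar and Bansal (1998a) is an assumption. The total residue counts below are also hypothetical, chosen only to reproduce the reported Ser percentages (4.0% and 5.5%).

```python
import math


def proportion_z(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Difference between two proportions in units of its pooled standard error."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    return (p_a - p_b) / std_err


# Hypothetical totals of 5000 residues per set: 200 Ser (4.0%) vs 275 Ser (5.5%).
z_ser = proportion_z(200, 5000, 275, 5000)
significant = abs(z_ser) > 2.0
```

With these made-up totals the Ser proportion drops by more than two standard deviations; with much smaller samples the same percentages would not reach that threshold.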
Of the 20 amino acids, Asn, Gln, Met and Cys can be classified as thermolabile due to their tendency to undergo deamidation or oxidation at high temperatures (Russell et al., 1997). Table IV and Figure 6 indicate that the frequencies of occurrence for Gln (2.8% in thermophiles and 2.9% in mesophiles) and Met (2.3% in thermophiles and 2.4% in mesophiles) are similar. Cys (0.6% in thermophilic chains and 1.0% in mesophilic) and Asn (4.4% in thermophilic and 5.1% in mesophilic) change by appreciable amounts. However, only the change in the frequency of Cys is significant.
The above observations raise questions about the possible roles of Arg, Tyr and Ser whose proportions change significantly. It has been suggested that thermophilic proteins have increased hydrogen bonding and salt bridge formation (Yip et al., 1995; Querol et al., 1996; Vogt and Argos, 1997; Vogt et al., 1997; Russell et al., 1997, 1998). Due to their large side chains, Arg and Tyr may be useful both in short range local interactions and in long range interactions. The guanidinium group in Arg can form salt bridges. On the other hand, due to its short side chain Ser forms mostly local interactions (Jeffrey and Saenger, 1991). Interestingly, it has recently been observed that hot spots for binding in protein interfaces are also rich in Arg, Tyr and Trp (Clackson and Wells, 1995; Bogan and Thorn, 1998). Hence, it appears that in both binding and folding at high temperatures, Arg and Tyr play a similar role, contributing toward protein stability. On the other hand, Trp occurs with a similar proportion in both thermophilic and mesophilic chains (Table IV and Figure 6). In contrast to Arg and Tyr, Trp is a hydrophobic residue with a bulky double ring side chain, usually occurring with low frequencies in proteins. Alternatively, it is possible that the absence of a noticeable trend for Trp, a rare residue, is due to its low counts in our sample.
Thermophilic and mesophilic α-helices
It has been suggested that thermophilic proteins have a higher helical content (Querol et al., 1996). In our database, we find that in nine out of the 18 families, thermophilic and mesophilic chains have similar values for the fraction of residues in helical conformation (fH), as identified using DSSP (Kabsch and Sander, 1983). However, on the whole, thermophilic proteins have a higher occurrence of residues in helical conformation. fH for thermophilic chains is 32.0% as compared with 25.4% in the mesophilic chains. α-Helices in the thermophilic and mesophilic proteins adopt similar overall geometries as characterized using HELANAL (Kumar and Bansal, 1996, 1998b).
Table V presents the amino acid distributions in α-helices of thermophilic and mesophilic chains. A χ²-test shows that the amino acid distribution in α-helices of thermophilic proteins is significantly different from that in α-helices of mesophilic proteins. The Hamming distance (Kumar and Bansal, 1998a) between the two distributions is 15.1 distance units in the 20-dimensional amino acid composition space. The proportions of Cys (0.1% in thermophilic and 0.8% in mesophilic helices), His (2.0% in thermophilic and 3.3% in mesophilic helices) and Arg (5.5% in thermophilic and 3.9% in mesophilic helices) change significantly. Thermophilic helices favor Arg and avoid His and Cys as compared with mesophilic helices. A recent database analysis of α-helices shows Arg to be a helix-favoring residue, with a propensity of 1.33 to occur in the middle region of α-helices, while Cys (propensity = 0.87 in the middle of α-helices) and His (propensity = 0.76 in the middle of α-helices) are helix-disfavoring residues (Kumar and Bansal, 1998a). Thermostability has also been attributed to enhanced secondary structure propensity (Querol et al., 1996). This might rationalize the increase in the proportion of Arg, a helix-favoring residue, in thermophilic protein helices, while the helix-disfavoring residues Cys and His decrease. A previous analysis of the composition of α-helices in thermophilic proteins (Warren and Petsko, 1995) also noted a significant decrease in Cys and His. The proportion of Arg increases and that of Cys decreases significantly in the thermophilic proteins as a whole as well. Furthermore, proline occurs with a frequency of 0.7% in α-helices of thermophilic proteins as compared with 1.3% in α-helices of mesophilic proteins. Proline is the most avoided residue in the middle of α-helices (Kumar and Bansal, 1998a), since it may cause kinks (Woolfson and Williams, 1990; Kumar and Bansal, 1996, 1998a,b).
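The distance between two composition vectors can be illustrated with a minimal Python sketch. It assumes that the distance is the sum of absolute differences in percentage composition over residue types; this section does not spell out the formula, so that definition, the function name `composition_distance` and the dictionary layout are assumptions made for illustration. Only the four helix frequencies quoted in the text (Cys, His, Arg, Pro) are used, so the result is a partial sum and not the full 20-residue value.

```python
# Percent occurrence in alpha-helices as (thermophilic, mesophilic),
# taken from the values quoted in the text. Only four of the 20 residue
# types are quoted in this section; the full table is not reproduced here.
QUOTED_HELIX_COMPOSITION = {
    "Cys": (0.1, 0.8),
    "His": (2.0, 3.3),
    "Arg": (5.5, 3.9),
    "Pro": (0.7, 1.3),
}


def composition_distance(composition):
    """Sum of absolute differences between paired percentages.

    Assumed definition of the distance between two composition vectors;
    the source section does not state the formula explicitly.
    """
    return sum(abs(thermo - meso) for thermo, meso in composition.values())


partial_distance = composition_distance(QUOTED_HELIX_COMPOSITION)
print(f"Partial distance over {len(QUOTED_HELIX_COMPOSITION)} residues: "
      f"{partial_distance:.1f}")
```

Under the assumed definition, these four residue types would contribute about 4.2 of the reported 15.1 distance units.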
Discussion and conclusions |
---|
The most consistent trend is shown by salt bridges and side chain-side chain hydrogen bonds. These increase in the majority of the thermophilic proteins. In recent years, the role of salt bridges in protein stability has been controversial (Hendsch and Tidor, 1994; Kumar and Nussinov, 1999). However, in the case of the thermophilic proteins, salt bridges have been shown to be stabilizing (Elcock, 1998; Xiao and Honig, 1999; Kumar et al., 2000). Recently, we calculated the electrostatic strengths of salt bridges in the glutamate dehydrogenase family (Kumar et al., 2000). Network formation stabilizes individual salt bridges in Pyrococcus furiosus glutamate dehydrogenase (Kumar et al., 2000). Salt bridges are major contributors to the thermostability of Pyrococcus furiosus glutamate dehydrogenase as compared with the mesophilic Clostridium symbiosum glutamate dehydrogenase (Yip et al., 1995). In a large database analysis, we observed that salt bridges with 'good geometries', such as those in the present study, mostly, but not always, make stabilizing electrostatic contributions to protein stability (Kumar and Nussinov, 1999). Thermophilic proteins are not only stable, but are also optimally active at high temperatures. An increase in the number of salt bridges and hydrogen bonds may rigidify a thermophilic protein and expose it to the danger of becoming inactive. Still, while a thermophilic protein may be rigid at room temperature, it is likely to be flexible at high temperatures (Jaenicke and Bohm, 1998). Recently, we have also observed that Pyrococcus furiosus glutamate dehydrogenase contains a greater number of salt bridges and salt bridge networks around the active site than the mesophilic Clostridium symbiosum glutamate dehydrogenase. The salt bridges around the active site may help to hold the active site region together by opposing the disorder caused by greater atomic mobility at high temperatures (Kumar et al., 2000).
Examination of the sequences shows that, despite high sequence homology, the differences in amino acid distributions between the thermophilic and mesophilic proteins are highly significant. While some of these differences are likely to be the outcome of phylogenetic differences between thermophiles and mesophiles, others correlate with protein thermostability. For example, the proportions of the thermolabile amino acid Cys, and of Ser, which usually forms local interactions, decrease significantly, while those of Arg and Tyr, which are capable of both short range and long range interactions, increase significantly in the thermophilic proteins. The stability of the constituent α-helices also appears to contribute to protein thermal stability. Thermophilic proteins have a higher proportion of residues in helical conformation. The helix-favoring residue Arg occurs more frequently in α-helices of thermophilic proteins, whereas the helix-disfavoring residues Cys, His and Pro have lower frequencies of occurrence in thermophilic helices. Avoiding some residues and opting for others in the sequences of thermophilic proteins suggests a dual strategy employed by these proteins to enhance their stability. On the one hand, thermophilic proteins prefer residues with larger side chains that can form salt bridges and long range or local electrostatic and hydrophobic interactions, and that stabilize secondary structure elements. On the other hand, thermophilic proteins avoid thermolabile residues and residues that can destabilize secondary structure elements.
Our analysis shows that the organisms' living temperatures are not good descriptors of protein thermostability. Melting temperatures may be more appropriate to measure protein thermostability. When explored with respect to the melting temperatures, salt bridges appear to show a correlation with the Tm's. We note, however, that while high quality crystal structures are available, unfortunately, the Tm's have been determined only for a few of these proteins. Hence, currently we are unable to examine a correlation of salt bridges and the respective melting temperatures of the thermophiles in a statistically meaningful way. However, we observe that structural factors involved in the stability of the thermophilic proteins do not correlate with the living temperatures of their source organisms.
From the point of view of designing a thermophilic protein, this study suggests inclusion of a larger proportion of salt bridges. It also suggests increasing the proportion of residues in α-helical conformation and the frequency of Arg, both to form salt bridges and to stabilize α-helices. It would be preferable to avoid Pro, Cys and His in α-helices, and to avoid thermolabile residues, particularly Cys.
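As a rough illustration of how these design suggestions might be checked on a candidate sequence, the following Python sketch tallies Arg and the residues to be avoided (Pro, Cys, His) inside helical segments. The function name `helix_residue_tally`, the toy sequence and the helix boundaries are all hypothetical; in practice the helix ranges would come from a secondary structure assignment, and the tally is only a bookkeeping aid, not a predictor of stability.

```python
# One-letter codes for the residues discussed in the text.
HELIX_FAVORED = {"R"}            # Arg
HELIX_AVOIDED = {"P", "C", "H"}  # Pro, Cys, His


def helix_residue_tally(sequence, helix_ranges):
    """Count favored and avoided residues inside helical segments.

    sequence     -- protein sequence in one-letter code
    helix_ranges -- list of (start, end) pairs, 0-based, end exclusive
    """
    total = favored = avoided = 0
    for start, end in helix_ranges:
        segment = sequence[start:end].upper()
        total += len(segment)
        favored += sum(res in HELIX_FAVORED for res in segment)
        avoided += sum(res in HELIX_AVOIDED for res in segment)
    return {"helix_residues": total, "arg": favored, "pro_cys_his": avoided}


# Hypothetical toy sequence and helix boundaries, for illustration only.
toy_sequence = "MKRAELLRKHPGSCAAREELLKR"
toy_helices = [(1, 10), (14, 23)]
tally = helix_residue_tally(toy_sequence, toy_helices)
print(tally)
```

A design variant with a higher `arg` count and a lower `pro_cys_his` count in its helices would be more in line with the trends reported here, all else being equal.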
Acknowledgments |
---|
Notes |
---|
References |
---|
Auerbach,G., Jacob,U., Grottinger,M., Schurig,M. and Jaenicke,R. (1997) Biol. Chem., 378, 327-329.
Bernstein,F., Koetzle,T., Williams,G., Meyer,E.J., Brice,M., Rodgers,J., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535-542.
Bogan,A.A. and Thorn,K.S. (1998) J. Mol. Biol., 280, 1-9.
Bogin,O., Peretz,M., Hacham,Y., Korkhin,Y., Frolow,F., Kalb(Gilboa),A.J. and Burstein,Y. (1998) Protein Sci., 7, 1156-1163.
Clackson,T. and Wells,J.A. (1995) Science, 267, 383-386.
Daniel,R.M., Cowan,D.A., Morgan,H.W. and Curran,M.P. (1982) Biochem. J., 207, 641-644.
Davies,G.J., Gamblin,S.J., Littlechild,J.A. and Watson,H.C. (1993) Proteins, 15, 283-289.
Day,M.W., Hsu,B.T., Joshua-Tor,L., Park,J.B., Zhou,Z.H., Adams,M.W.W. and Rees,D.C. (1992) Protein Sci., 1, 1494-1507.
Dill,K.A. (1990) Biochemistry, 31, 7134-7155.
Elcock,A.H. (1998) J. Mol. Biol., 284, 489-502.
Fukuyama,K., Nagahara,Y., Tsukihara,T., Katsube,Y., Hase,T. and Matsubara,H. (1988) J. Mol. Biol., 199, 183-193.
Glaser,P., Presecan,E., Delepierre,M., Surewicz,W.K., Mantsch,H.H., Barzu,O. and Giles,A.M. (1992) Biochemistry, 31, 3038-3043.
Gomes,J., Gomes,I., Kreiner,W., Esterbauer,H., Sinner,M. and Steiner,W. (1993) J. Biotech., 30, 283-297.
Haney,P., Konisky,J., Koretke,K.K., Luthey-Schulten,Z. and Wolynes,P.G. (1997) Proteins, 28, 117-130.
Hendsch,Z.S. and Tidor,B. (1994) Protein Sci., 3, 211-226.
Hiller,R., Zhou,Z.H., Adams,M.W.W. and Englander,S.W. (1997) Proc. Natl Acad. Sci. USA, 94, 11329-11332.
Holland,D.R., Hausrath,A.C., Juers,D. and Matthews,B.W. (1995) Protein Sci., 4, 1955-1965.
Jaenicke,R. and Bohm,G. (1998) Curr. Opin. Struct. Biol., 8, 738-748.
Jeffrey,G.A. and Saenger,W. (1991) Hydrogen Bonding in Biological Structures. Springer-Verlag, Berlin.
Jiang,Y., Nock,S., Nesper,M., Sprinzl,M. and Sigler,P.B. (1996) Biochemistry, 35, 10269-10278.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.
Karshikoff,A. and Ladenstein,R. (1998) Protein Engng, 1, 867-872.
Kelly,C.A., Nishiyama,M., Ohnishi,Y., Beppu,T. and Birktoft,J.J. (1993) Biochemistry, 32, 3913-3922.
Kjeldgaard,M., Nissen,P., Thirup,S. and Nyborg,J. (1993) Structure, 1, 35-50.
Klump,H.H., DiRuggiero,J., Kessel,M., Park,J.B., Adams,M.W.W. and Robb,F.T. (1992) J. Biol. Chem., 267, 22681-22685.
Knegtel,R.M.A., Wind,R.D., Rozeboom,H.J., Kalk,K.H., Buitelaar,R.M., Dijkhuizen,L. and Dijkstra,B.W. (1996) J. Mol. Biol., 256, 611-622.
Kumar,S. and Bansal,M. (1996) Biophys. J., 71, 1574-1586.
Kumar,S. and Bansal,M. (1998a) Proteins, 31, 460-476.
Kumar,S. and Bansal,M. (1998b) Biophys. J., 75, 1935-1944.
Kumar,S. and Nussinov,R. (1999) J. Mol. Biol., 293, 1241-1255.
Kumar,S., Ma,B., Tsai,C.J. and Nussinov,R. (2000) Proteins, 38, 368-383.
Ladenstein,R. and Antranikian,G. (1998) Adv. Biochem. Engng Biotechnol., 61, 37-85.
Lee,B.K. and Richards,F.M. (1971) J. Mol. Biol., 55, 379-400.
Matthews,B.W., Weaver,L.H. and Kester,W.H. (1974) J. Biol. Chem., 249, 8030-8044.
Obmolova,G., Kuranova,I. and Teplyakov,A. (1993) J. Mol. Biol., 232, 312-313.
Perutz,M. and Raidt,H. (1975) Nature, 255, 256-259.
Querol,E., Perez-Pons,J.A. and Mozo-Villarias,A. (1996) Protein Engng, 9, 256-271.
Russell,R.J.M., Ferguson,J.M.C., Hough,D.W., Danson,M.J. and Taylor,G.L. (1997) Biochemistry, 36, 9983-9994.
Russell,R.J.M., Gerike,U., Danson,M.J., Hough,D.W. and Taylor,G.L. (1998) Structure, 6, 351-361.
Rypniewski,W.R. and Evans,P.R. (1989) J. Mol. Biol., 207, 805-821.
Salminen,T., Teplyakov,A., Kankare,J., Cooperman,B.S., Lahti,R. and Goldman,A. (1996) Protein Sci., 5, 1014-1025.
Singleton,P. and Sainsbury,D. (1978) Dictionary of Microbiology and Molecular Biology, 2nd Edn. John Wiley, New York.
Tsai,C.J., Lin,S.L., Wolfson,H. and Nussinov,R. (1996) J. Mol. Biol., 260, 604-620.
Tsai,C.J. and Nussinov,R. (1997a) Protein Sci., 6, 24-42.
Tsai,C.J. and Nussinov,R. (1997b) Protein Sci., 6, 1426-1437.
Tsai,C.J., Xu,D. and Nussinov,R. (1997) Protein Sci., 6, 1-13.
Tsunasawa,S., Izu,Y., Miyagi,M. and Kato,I. (1997) J. Biochem., 122, 843-850.
Usher,K.C., De la Cruz,A.F.A., Dahlquist,F.A., Swanson,R.V., Simon,M.I. and Remington,S.J. (1998) Protein Sci., 7, 403-412.
Vogt,G. and Argos,P. (1997) Fold. Des., 2, S40-S46.
Vogt,G., Woell,S. and Argos,P. (1997) J. Mol. Biol., 269, 631-643.
Warren,G.L. and Petsko,G.A. (1995) Protein Engng, 8, 905-913.
Watanabe,K., Hata,Y., Kizaki,H., Katsube,Y. and Suzuki,Y. (1997) J. Mol. Biol., 269, 142-153.
Wigley,D.B., Gamblin,S.J., Turkenburg,J.P., Dodson,E.J., Piontek,K., Muirhead,H. and Holbrook,J.J. (1992) J. Mol. Biol., 223, 317-335.
Woolfson,D.N. and Williams,D.H. (1990) FEBS Lett., 277, 185-188.
Xiao,L. and Honig,B. (1999) J. Mol. Biol., 289, 1435-1444.
Yip,K.S.P. et al. (1995) Structure, 3, 1147-1158.
Yip,K.S.P., Britton,K.L., Stillman,T.J., Lebbink,J., De Vos,W.M., Robb,F.T., Vetriani,C., Maeder,D. and Rice,D.W. (1998) Eur. J. Biochem., 255, 336-346.
Zehfus,M.H. and Rose,G.D. (1986) Biochemistry, 25, 5759-5765.
Zuber,H. (1988) Biophys. Chem., 29, 171-179.
Received June 30, 1999; revised October 26, 1999; accepted November 29, 1999.