Aromatic clusters: a determinant of thermal stability of thermophilic proteins

N. Kannan and S. Vishveshwara,1

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India


    Abstract
 
A number of factors have been identified as contributing to the thermal stability of thermophilic proteins. However, the contribution of aromatic interactions to thermal stability has not been systematically studied. In the present investigation we used a graph spectral method to identify aromatic clusters in a dataset of 24 protein families for which the crystal structures of both the thermophilic protein and its mesophilic homologue are known. Our analysis shows the presence of additional aromatic clusters or enlarged aromatic networks in 17 thermophilic protein families that are absent in the corresponding mesophilic homologues. The additional aromatic clusters identified in the thermophiles are mostly small and are largely found on the protein surface. The aromatic clusters are relatively rigid regions of the surface, and an additional aromatic cluster is often located close to the active site of the thermophilic enzyme. The residues in the additional aromatic clusters are preferentially replaced by Leu, Ser or Ile in the mesophilic homologue. An analysis of the packing geometry of pairwise aromatic interactions in the additional aromatic clusters shows a preference for a T-shaped orthogonal packing geometry. The present study also provides new insights for protein engineers seeking to design thermostable and thermophilic proteins.

Keywords: aromatic clusters/aromatic packing/protein engineering/protein folding/thermal stability


    Introduction
 
Understanding the determinants of protein thermal stability remains an unsolved problem in protein biochemistry. Many investigations have been carried out over the past two decades to identify the factors that contribute to the thermal stability of thermophilic proteins, yet we still lack a complete understanding of the stabilization strategies adopted by thermostable proteins.

So far, two major approaches have been taken to investigate this question. The first compares the structures and sequences of homologous proteins from thermophiles and mesophiles. This approach has revealed various features associated with thermal stability, such as an increased number of salt bridges, better hydrogen bonding, tighter internal packing and strengthened inter-subunit association, all of which have been compiled in two recent reviews (Jaenicke and Bohm, 1998; Ladenstein and Antranikian, 1998). More recently, Szilagyi and Zavodszky (2000) analysed 13 structural parameters to understand the structural features underlying thermal stability.

The second major approach uses protein engineering methods (Fersht and Serrano, 1993), which have provided new insights into the factors that contribute to the thermal stability of proteins, such as (1) stabilization of the α-helix dipoles (Nicholson et al., 1988), (2) reduction of the entropy difference between the folded and unfolded states (Matthews et al., 1987), (3) an increase in the number of hydrophobic interactions in the hydrophobic core (Yutani et al., 1987) and (4) a reduction in the area of water-accessible hydrophobic surface (Wigley et al., 1987). There have also been attempts to combine both approaches (Serrano et al., 1993). The overall conclusion from these studies is that no single factor determines protein thermal stability; instead, it results from a number of subtle interactions characteristic of each protein species (Ladenstein and Antranikian, 1998; Szilagyi and Zavodszky, 2000).

We recently developed a graph spectral method to identify side-chain clusters in protein structures (Kannan and Vishveshwara, 1999). In the present study, we applied this method to identify aromatic clusters in a set of homologous thermophilic and mesophilic proteins from different protein families. Our analysis shows the presence of additional aromatic clusters and aromatic networks in the thermophilic proteins compared with their mesophilic homologues. Although many other structural features have been reported to contribute to thermal stability (Querol et al., 1996; Vogt and Argos, 1997; Szilagyi and Zavodszky, 2000), to our knowledge no study has systematically highlighted the presence of additional aromatic clusters in thermophilic proteins.

Aromatic interactions are known to be important in the structural stabilization of proteins (Burley and Petsko, 1985; Anderson et al., 1993). A single pairwise aromatic interaction contributes between −0.6 and −1.3 kcal/mol to protein stability (Serrano et al., 1991). Protein engineering studies have shown that introducing aromatic pairs and aromatic clusters into a protein increases its thermal stability (Burley and Petsko, 1985; Serrano et al., 1991), and more recently Georis et al. (2000) demonstrated that introducing an additional aromatic interaction improves the thermophilicity and thermostability of a family 11 xylanase. The structure of a thermophilic subtilisin from Bacillus sp. Ak.1 has also been reported recently; it shows additional aromatic clusters on the protein surface compared with its mesophilic counterpart (Smith et al., 1999).

The present study, on a dataset of 24 thermophilic proteins and their mesophilic homologues from different protein families, shows the presence of additional aromatic clusters in 17 of the 24 thermophilic proteins compared with their mesophilic homologues. The additional aromatic clusters found in the thermophiles are mostly located on the protein surface and are usually small. The topologically equivalent residues in the mesophiles are generally leucine, isoleucine or serine. The aromatic residues of the additional clusters in the thermophilic proteins generally emanate from different secondary structural elements and have low B values. In most of the thermophilic proteins, at least one additional aromatic cluster is located close to the binding/active site. An analysis of the packing geometry of pairwise aromatic interactions in the additional aromatic clusters shows that a T-shaped perpendicular packing geometry is preferred. The present study highlights additional aromatic clusters as one of the major contributing factors to the thermal stability of thermophilic proteins and provides new insights for protein engineers seeking to design thermostable proteins.


    Materials and methods
 
Dataset

A dataset of 48 proteins belonging to 24 protein families was considered for cluster analysis (Table I). The dataset was constructed based on the earlier report of Facchiano et al. (1998) and the recent exhaustive search of Szilagyi and Zavodszky (2000). Pairs of thermophilic and mesophilic proteins with very high sequence identity and structural similarity were selected, to ensure that the observed amino acid substitutions and structural differences between each thermophile and its mesophilic homologue reflect the difference in stability rather than evolutionary divergence (Facchiano et al., 1998). For protein families in which more than one homologous mesophilic structure was known, the one with the highest structural similarity and sequence identity to the thermophilic protein was chosen.
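The selection criterion above relies on sequence identity, but no numerical cutoff or identity definition is stated. As an illustration only, the hypothetical helper below computes percent identity from a gapped pairwise alignment using one common convention (identical residues divided by the number of gap-free columns); other conventions divide by the length of the shorter sequence instead. The toy sequences are not taken from the dataset.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences.

    Convention (an assumption): identical residues divided by the number of
    columns in which neither sequence has a gap, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)
```

For example, `percent_identity("ACD-EF", "ACDGEY")` compares five gap-free columns, four of which are identical, and returns 80.0.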


 
Table I. List of protein families taken for study and the number of aromatic clusters identified
 
Clustering procedure

Aromatic clusters for all proteins in the dataset were obtained by the graph spectral method described earlier (Kannan and Vishveshwara, 1999). The method treats the aromatic residues of the protein as the nodes of a graph, each represented by its Cβ atom; two interacting aromatic side chains are connected by an edge whose weight corresponds to the distance between their Cβ atoms. This connectivity information is represented as a Laplacian matrix. The Laplacian matrix is diagonalized, and the clustering information is obtained from the eigenvectors corresponding to the second-lowest eigenvalue. Details of the algorithm are given in our earlier paper (Kannan and Vishveshwara, 1999).

In our earlier report on side-chain clusters, we constructed the protein graph with the constraint that at least one of two interacting side chains must have a degree >1. This constraint was imposed to avoid two-residue clusters. In the present analysis we removed this constraint so as to detect all two-residue aromatic clusters in addition to larger clusters.
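A minimal sketch of a Laplacian-based clustering of aromatic residues is given below, assuming NumPy. It is not the authors' implementation: the contact criterion (a hypothetical Cβ–Cβ distance cutoff of 6.5 Å), the edge weight (1/d) and the reading of clusters from the zero-eigenvalue eigenvectors of the Laplacian (whose components are constant within each connected cluster) are simplifying assumptions. The `allow_pairs` flag mimics the degree constraint described above: when it is False, edges whose two residues both have degree 1 (isolated pairs) are dropped.

```python
import numpy as np


def aromatic_clusters(cb_coords, cutoff=6.5, allow_pairs=True, tol=1e-8):
    """Cluster aromatic residues with a graph Laplacian (illustrative sketch).

    cb_coords:   (n, 3) C-beta coordinates of the aromatic residues, in Å.
    cutoff:      hypothetical C-beta distance below which two residues interact.
    allow_pairs: if False, drop isolated two-residue clusters (degree constraint).
    Returns sorted lists of node indices for clusters with at least 2 members.
    """
    x = np.asarray(cb_coords, dtype=float)
    n = len(x)
    if n == 0:
        return []
    # Pairwise C-beta distance matrix, shape (n, n).
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(n, dtype=bool)
    adj = np.zeros((n, n))
    adj[contact] = 1.0 / dist[contact]  # edge weight from distance (assumed 1/d)

    if not allow_pairs:
        degree = contact.sum(axis=1)
        # An edge whose two end nodes both have degree 1 is an isolated pair.
        isolated_pair = contact & (degree[:, None] == 1) & (degree[None, :] == 1)
        adj[isolated_pair] = 0.0

    laplacian = np.diag(adj.sum(axis=1)) - adj   # L = D - A
    evals, evecs = np.linalg.eigh(laplacian)     # eigenvalues in ascending order
    null_space = evecs[:, evals < tol]           # eigenvectors of zero eigenvalues

    # Rows of the null-space basis are identical for nodes in the same
    # connected cluster and differ between clusters.
    clusters = []  # list of (representative row, member indices)
    for i in range(n):
        for rep, members in clusters:
            if np.allclose(null_space[i], rep, atol=1e-6):
                members.append(i)
                break
        else:
            clusters.append((null_space[i], [i]))
    return sorted(members for _, members in clusters if len(members) >= 2)
```

For toy coordinates forming a three-residue chain (5 Å spacing), an isolated pair and a singleton, `aromatic_clusters(coords)` returns `[[0, 1, 2], [3, 4]]`, whereas `aromatic_clusters(coords, allow_pairs=False)` keeps only `[[0, 1, 2]]`.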

Geometry of aromatic packing

The inter-planar orientation of two interacting aromatic side chains was determined from the angle between the normals to the two aromatic ring planes. The inter-planar angle was evaluated for a pair of aromatic rings only if the distance between the two ring centres was <6.5 Å.
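The inter-planar angle can be computed as sketched below in plain Python. The ring normal is taken from the cross product of two in-plane vectors spanned by ring atoms 0, 2 and 4 (an assumption that suits a six-membered ring listed in ring order, such as the CG, CD1, CE1, CZ, CE2 and CD2 atoms of Phe or Tyr). Because a normal has no preferred sign, the absolute cosine is used so that the angle falls between 0° and 90°. Pairs whose ring centroids are 6.5 Å or more apart are skipped, following the text.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ring_geometry(ring):
    """Return (centroid, normal) of a planar ring from its atom coordinates.

    The normal uses atoms 0, 2 and 4 (assumes a six-membered ring in ring order).
    """
    n = len(ring)
    centroid = tuple(sum(atom[k] for atom in ring) / n for k in range(3))
    normal = _cross(_sub(ring[2], ring[0]), _sub(ring[4], ring[0]))
    return centroid, normal


def interplanar_angle(ring_a, ring_b, max_centroid_dist=6.5):
    """Angle in degrees (0-90) between ring normals, or None if the ring
    centroids are not closer than max_centroid_dist Å."""
    ca, na = ring_geometry(ring_a)
    cb, nb = ring_geometry(ring_b)
    if math.dist(ca, cb) >= max_centroid_dist:
        return None
    cos_t = abs(_dot(na, nb)) / (math.hypot(*na) * math.hypot(*nb))
    return math.degrees(math.acos(min(1.0, cos_t)))  # clamp rounding error
```

A stacked parallel pair gives an angle near 0°, and an edge-to-face (T-shaped) pair gives an angle near 90°.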

Structural alignment

Each thermophile and its mesophilic homologue were superimposed using the STAMP program (Russell and Barton, 1992), which also yields the structure-based sequence alignment used to identify topologically equivalent residues.
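Reading topologically equivalent residues off a structure-based alignment amounts to walking both aligned sequences column by column while counting residue numbers. The hypothetical helper below assumes that each aligned string covers a contiguous, consecutively numbered segment with no insertion codes; STAMP's own output format is not parsed here. The toy alignment is constructed only so that a thermophile Tyr93 pairs with a mesophile Ile94, as in the neutral protease example of the Results.

```python
def equivalent_residues(aln_thermo, aln_meso, start_thermo=1, start_meso=1, gap="-"):
    """Map thermophile residue numbers to topologically equivalent mesophile residues.

    Assumes consecutive numbering from start_thermo / start_meso.
    Returns {thermo_resnum: (thermo_aa, meso_resnum, meso_aa)} for gap-free columns.
    """
    mapping = {}
    i, j = start_thermo, start_meso
    for a, b in zip(aln_thermo, aln_meso):
        if a != gap and b != gap:
            mapping[i] = (a, j, b)
        # Advance a residue counter only when that sequence has a residue here.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return mapping
```

With the toy alignment `equivalent_residues("AY-W", "GIST", 92, 93)`, thermophile residue 93 (Y) maps to mesophile residue 94 (I), and the gap shifts thermophile residue 94 (W) onto mesophile residue 96 (T).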

Identification of cluster location

The location of the aromatic residues in the protein structure was identified by three procedures: (a) visual inspection using the VMD package (Humphrey et al., 1996), (b) calculation of solvent accessibility using Connolly's program (Connolly, 1993) and (c) the graph spectral method of assigning residues to the hydrophobic core based on a contact criterion for hydrophobic residues (Kannan and Vishveshwara, 1999). With this 'hydrophobic contact criterion' we could identify residue clusters both on the surface and in the buried core of the protein: hydrophobic residues in a cluster with many internal contacts among themselves were assigned to the protein core. When the thermophile or mesophile exists as an oligomer, the accessible surface area was calculated on the monomeric subunit only.

Identification of aromatic clusters close to the active site

An aromatic cluster was considered to be close to the active/binding site if at least one atom of its residues lay within 10 Å of any ligand atom. Where ligand coordinates were not available, the distance was measured from the active-site residues of the protein.
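The 10 Å proximity criterion reduces to a brute-force minimum-distance test over all atom pairs, as in the sketch below (plain Python; coordinates in Å, example values hypothetical). The same function applies when active-site residue atoms are used in place of ligand atoms.

```python
import math


def near_active_site(cluster_atoms, site_atoms, cutoff=10.0):
    """True if any atom of the cluster lies within `cutoff` Å of any
    ligand (or active-site residue) atom; O(n*m) pairwise check."""
    return any(math.dist(p, q) < cutoff
               for p in cluster_atoms
               for q in site_atoms)
```

For a few hundred atoms the quadratic scan is fast enough; a spatial index would only matter for much larger sets.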


    Results
 
Aromatic clusters

The aromatic clusters in the 24 pairs of proteins were determined as described in Materials and methods. The numbers of aromatic clusters found in the thermophiles and their mesophilic homologues are listed in Table I. Based on these numbers, the protein pairs fall into three groups: (a) in 10 of the 24 families (1, 2, 3, 6, 7, 8, 9, 11, 18 and 24) the thermophile has clearly more aromatic clusters than the mesophile; (b) in 10 families (4, 5, 10, 12, 13, 14, 16, 17, 22 and 23) the thermophile and mesophile have the same number of aromatic clusters; and (c) in families 15, 19, 20 and 21 the mesophile has more aromatic clusters (Table I).

A graphical examination of the superimposed structures and cluster residues of each thermophile and its mesophilic homologue showed that in the 10 protein families of group a, one or more aromatic clusters of the thermophilic protein are absent from the topologically equivalent position in the mesophilic homologue. A cluster in the thermophile for which no topologically equivalent cluster is found in the mesophile is referred to as an additional aromatic cluster. Structure-based sequence alignment showed that the residues forming an additional aromatic cluster are replaced by non-aromatic residues in the mesophile. The superimposed structures, the additional clusters in the thermophile and the equivalent substitutions in the mesophile are shown for the neutral protease (family 1) in Figure 1. A three-residue aromatic cluster of Tyr93, Tyr151 and Trp115 (dark) is found in the thermophile, and the topologically equivalent residues in the mesophile are Ile94, Asn152 and Trp116. Because Tyr93 and Tyr151 are replaced by the non-aromatic Ile94 and Asn152, this aromatic cluster is absent in the mesophile. Likewise, a two-residue cluster of Tyr24 and Tyr28 is found in the thermophile but is absent in the mesophile, owing to the non-aromatic substitution of Tyr24 by Leu24. The same feature is observed in the nine other protein families of group a.

In the 10 protein families of group b, the thermophile and mesophile have the same number of aromatic clusters. However, in seven of them (4, 5, 10, 12, 13, 16 and 22) the clusters of the thermophile contain more aromatic residues than those of its mesophilic counterpart. For example, in ribonuclease H two clusters were detected both in the thermophile (1RIL) (Figure 2a) and in the mesophile (2RN2) (Figure 2b). Cluster 1 occupies a topologically similar location in both proteins, but the cluster in the thermophile (Tyr73, Trp104, Phe78, Phe118, Phe120, Trp81, Trp85 and Trp90) is larger than that in the mesophile (Tyr73, Trp104, Trp81, Trp85 and Trp90). The three additional aromatic residues of the first thermophile cluster, Phe78, Phe118 and Phe120, correspond to Ile78, Trp118 and Trp120 in the mesophile. Although Trp118 and Trp120 are aromatic substitutions, they are not part of the mesophile cluster, because Phe78, which interacts with Phe118 and Phe120 in the thermophile, is replaced by Ile78. With this interaction lost, the cluster shrinks to five residues in the mesophile. The second clusters of the two proteins do not occupy topologically similar positions: the thermophile cluster (cluster 2, Figure 2a) contains only Phe8 and Tyr68, whereas the mesophile cluster (cluster 2', Figure 2b) contains three residues, Tyr22, Phe35 and Tyr39. Thus, although the two proteins have the same number of clusters, the cluster sizes differ, and the thermophile has additional aromatic interactions because its clusters contain more aromatic residues in total (10 versus 8). The same trend is observed in families 5, 10, 12, 13, 16 and 22. In families 14, 17 and 23, however, both the number of clusters and the number of residues in the clusters are the same in the thermophile and the mesophile.

The reverse trend, with more aromatic clusters in the mesophile than in the thermophile, is observed in families 15, 19, 20 and 21 (group c). In these families, other factors, such as more hydrogen bonds, more salt bridges across the inter-subunit interface and a lower surface-to-volume ratio of the enzyme, may contribute to thermal stability (Tanner et al., 1996). Because additional aromatic interactions in the thermophile are observed in 17 of the 24 families (10 in group a and 7 in group b), the additional aromatic clusters of these 17 families were analysed in further detail.
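The grouping by cluster counts, together with the residue-count comparison used for group b, can be summarized by the small sketch below. It is a simplification: it works only from cluster sizes and ignores the topological-equivalence check made on the superimposed structures. The ribonuclease H example (thermophile clusters of 8 and 2 residues, mesophile clusters of 5 and 3) is taken from the text; the other inputs are hypothetical.

```python
def compare_pair(thermo_sizes, meso_sizes):
    """Classify a thermophile/mesophile pair from its aromatic cluster sizes.

    thermo_sizes, meso_sizes: one entry (number of aromatic residues) per cluster.
    Returns (group, has_extra_aromatic_interactions).
    """
    if len(thermo_sizes) > len(meso_sizes):
        group = "a"  # more clusters in the thermophile
    elif len(thermo_sizes) == len(meso_sizes):
        group = "b"  # same number of clusters
    else:
        group = "c"  # more clusters in the mesophile
    # Group b counts as 'extra' only if the thermophile clusters hold more residues.
    extra = group == "a" or (group == "b" and sum(thermo_sizes) > sum(meso_sizes))
    return group, extra
```

For ribonuclease H, `compare_pair([8, 2], [5, 3])` returns `("b", True)`, matching the 10 versus 8 aromatic residues counted above.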



 
Fig. 1. Superimposed structures of the thermophilic neutral protease (1THL) and the mesophilic neutral protease (1NPC) of family 1. The cluster residues in the thermophile (dark) and the topologically equivalent residues in the mesophile are shown in ball-and-stick representation (Kraulis, 1991).

 


 
Fig. 2. Aromatic clusters identified in the ribonuclease H family (family 4). (a) Aromatic clusters in the thermophilic protein (1RIL); (b) aromatic clusters in the mesophile (2RN2). The protein is shown in TUBE representation and the cluster residues in BONDS representation (Humphrey et al., 1996).

 
The additional clusters in the thermophiles and the topologically equivalent residues in the mesophiles, obtained from the structure-based sequence alignment, are listed in Table II, which gives, for each aromatic residue of an additional cluster in the thermophile, the equivalent residue in the mesophile. The subsequent analyses of accessibility, preferred substitutions and secondary structural location (described in later sections) are based only on the additional aromatic clusters listed in Table II. Several observations can be made from this mutation table. Most of the additional clusters are two-residue aromatic clusters: of the 34 additional clusters in the thermophiles of the dataset, 21 contain two residues and six contain three residues. Large clusters of 11–13 residues occur in protein families 12, 16 and 22. Consistent with the importance of aromatic pairs, it was recently shown experimentally that introducing an aromatic pair by point mutation increases both the thermophilicity and the thermal stability of a mesophilic xylanase (Georis et al., 2000).


 
Table II. Mutation table for the protein families in which additional aromatic clusters were found in the thermophile
 
Among the aromatic residues, Tyr and Phe are substituted more often than Trp (Figure 3). As the mutation histogram shows (Figure 3), Phe is most often replaced by Leu, Ile or Val in the mesophiles, and Tyr preferably by the non-aromatic residues Leu, Ser or Ile. In this context, it is interesting that Leu and Tyr show slightly higher amino acid compositions in mesophilic and thermophilic genome sequences, respectively (Deckert et al., 1998). Tyr is replaced by Ser four times, in four different protein families (Figure 3). Replacing Phe by Leu retains the hydrophobic character of the residue, which is important for the structure of the mesophile, but the extra stability provided by the aromatic interaction is lost. Similarly, in some cases Tyr is selectively replaced by Ser, which retains the hydrogen-bonding capacity of the hydroxyl group but loses the aromatic stabilization in the mesophile. Where retaining the hydroxyl group is not important, Tyr is preferably replaced by Leu or Ile (Figure 3).
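The mutation histogram is a tally of mesophile residues at positions equivalent to thermophile cluster residues. A minimal sketch using `collections.Counter` is shown below; the input pairs are a few substitutions quoted in the text (family 1 and ribonuclease H), not the full contents of Table II, and skipping conserved positions is an assumption.

```python
from collections import Counter, defaultdict


def mutation_histogram(pairs):
    """Tally mesophile substitutions for each thermophile aromatic residue.

    pairs: iterable of (thermophile_residue, mesophile_residue) three-letter
    codes at topologically equivalent positions. Conserved positions are
    skipped, so only substitutions are counted.
    """
    hist = defaultdict(Counter)
    for thermo_res, meso_res in pairs:
        if thermo_res != meso_res:
            hist[thermo_res][meso_res] += 1
    return hist


# Substitutions quoted in the text (family 1 and ribonuclease H).
quoted = [("Tyr", "Ile"), ("Tyr", "Asn"), ("Trp", "Trp"),
          ("Tyr", "Leu"), ("Phe", "Ile"), ("Tyr", "Ser")]
hist = mutation_histogram(quoted)
```

Here `hist["Tyr"]` counts one each of Ile, Asn, Leu and Ser, and the conserved Trp115/Trp116 position does not appear.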



 
Fig. 3. Mutation histogram: distribution of the mesophile residues at positions topologically equivalent to the aromatic residues of the additional aromatic clusters (Table II).

 
Location of the aromatic clusters

The location of the aromatic clusters was identified by the three methods described in Materials and methods. In addition to visual inspection, their location in the thermophilic proteins was analysed quantitatively by calculating the accessible surface area of the cluster residues. In most protein families the additional cluster is located on the protein surface and is partially accessible (Figure 4). In three cases the clusters have an accessibility <5%, and in most cases the accessibility lies between 15% and 40% (Figure 4), indicating that the additional aromatic clusters are partially exposed. The graph spectral analysis showed that most of the additional clusters in the thermophiles are not part of the protein core but occur as separate entities on the protein surface. Previously, Heringa et al. (1995) identified strong side-chain clusters on the protein surface in the subtilisin family and predicted a few point mutations that increased the thermal stability of the protein. Mutations on the protein surface generally have little impact on the native structure and the folding intermediate. Hence, it appears that nature has engineered thermal stability through mutations on the protein surface.



 
Fig. 4. Average accessible surface area of the additional aromatic clusters in the thermophilic proteins.

 
Active site: additional aromatic clusters

Thermophilic enzymes are stable and fully active at elevated temperatures but are not functional at room temperature (Hecht et al., 1989). This ability of an enzyme to exhibit activity at high temperature is defined as thermophilicity (Georis et al., 2000). The inactivity of such enzymes at room temperature has been attributed to the restriction of conformational fluctuations necessary for catalytic function (Zavodszky et al., 1998). We investigated this aspect by searching for additional aromatic clusters close to the active sites of the thermophilic proteins. In most of the thermophilic proteins, at least one additional aromatic cluster is located close to the active site of the enzyme (Table II). Additional aromatic clusters near the active site may help to retain the conformation of the active-site residues required to bind the substrate at high temperatures, and thus contribute to the high thermophilicity of thermostable proteins.

Figure 5a shows a two-residue aromatic cluster of Phe8 and Tyr68 close to the active site (formed by Asp10, Asp70 and Glu248) of the thermophilic ribonuclease H (1RIL). This cluster is located on the two parallel β-strands that carry the active-site residues Asp10 and Asp70 (Figure 5a). The cluster, whose rings interact in a parallel offset geometry, is probably important for the conformational rigidity of the thermophile at room temperature. In the mesophile, Tyr68 is replaced by Ser (Table II), allowing conformational flexibility of the active-site residues. Similarly, in lactate dehydrogenase (Figure 5b), two additional aromatic clusters are found close to the active site. The cluster residues emanate from three different helices, marked 1, 2 and 3 (Figure 5b). The N-terminal residues of helix 1 interact with the ligand, and the flexibility of helix 1 is probably important for ligand binding. The two additional aromatic clusters, involving the N-terminal (Phe30) and C-terminal (Phe37) residues of helix 1, impart rigidity to this helix. In the mesophile, however, the cluster residues Phe30 and Tyr248 are replaced by Ala30 and Ser246, respectively (Table II), and the other cluster residues Phe65 and Phe37 by Leu65 and Ile37, thus allowing conformational flexibility of helix 1 in the mesophilic protein.




 
Fig. 5. Aromatic clusters close to the active site. (a) The additional aromatic cluster (Phe8 and Tyr68) in the thermophilic protein of the ribonuclease H family (1RIL). The active-site residues are shown in ball-and-stick representation (grey) (Kraulis, 1991). (b) Additional aromatic clusters (Phe30 and Tyr248; Phe65 and Phe37) in the lactate dehydrogenase family (1LDN). The cluster residues are shown in ball-and-stick representation and the ligand in CPK representation (Kraulis, 1991).

 
Thermal factors of the additional aromatic clusters

Vihinen (1987) showed an inverse correlation between thermal stability and protein flexibility by calculating flexibility indices from the side-chain thermal factors of known 3D structures, and more recently Parthasarathy and Murthy (2000) showed that serine and threonine residues, which show a high composition in thermophiles, have lower B factors than in mesophiles. In the present study, we assessed the flexibility of the additional aromatic clusters from the thermal factors (B factors) of their aromatic residues. Since most of the additional aromatic clusters are located on the protein surface, the average B factor of the cluster residues was compared with the average B factor of the partially buried residues in each thermophilic protein. Interestingly, in most cases the average B factor of the cluster residues is less than or equal to that of the partially buried residues. Menendez and Argos (1989) showed that regions of reduced flexibility correspond to the helical regions of proteins. We therefore analysed the secondary structural location of the additional aromatic residues in the present dataset. The aromatic residues forming an additional cluster emanate from different secondary structural elements of the protein, thereby stabilizing the tertiary fold. About 38% of the aromatic residues in the additional aromatic clusters come from helices, 32% from strands, 21% from coils and 9% from loops. In contrast, the overall secondary structure composition of the thermophilic proteins in the dataset is about 54% coil or loop, 26% strand and only 20% helix. Comparing these figures shows that the aromatic residues of the additional clusters are enriched in regular secondary structures (helices and strands), implying that the clusters lie in the more rigid regions of the protein.
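The comparison between the secondary structural location of the cluster residues and the overall composition can be made explicit with an observed/expected ratio, where values above 1 indicate enrichment. The sketch below uses only the percentages quoted above, merging coil and loop because the overall composition lumps them together; it is an illustrative calculation, not an analysis reported in the study.

```python
# Percentages quoted in the text.
cluster_fraction = {"helix": 38, "strand": 32, "coil/loop": 21 + 9}
background_fraction = {"helix": 20, "strand": 26, "coil/loop": 54}


def enrichment(observed, expected):
    """Observed/expected ratio per secondary-structure class (>1 = enriched)."""
    return {k: round(observed[k] / expected[k], 2) for k in observed}


ratios = enrichment(cluster_fraction, background_fraction)
print(ratios)  # {'helix': 1.9, 'strand': 1.23, 'coil/loop': 0.56}
```

Helices are almost twofold enriched among the cluster residues, strands mildly enriched, and coil/loop regions depleted to roughly half their background frequency.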

Geometry of aromatic–aromatic interaction

The packing geometries of the aromatic residues in the additional aromatic clusters were investigated by evaluating the inter-planar angles (see Materials and methods) for all pairwise aromatic interactions involving residues that are replaced by non-aromatic ones in the mesophile. The inter-planar angles were grouped into three ranges: 0–30° (near-parallel, face-to-face), 30–60° (tilted) and 60–90° (orthogonal or T-shaped). In 51 out of a total of 100 pairs, the aromatic residues interact in a T-shaped orthogonal geometry, and in 29 in a tilted geometry; in only 18 cases do the aromatic residues of the additional clusters interact in a near-parallel geometry. A simple pairwise energy calculation for aromatic residues also showed a preference for T-shaped packing (Burley and Petsko, 1985). Ab initio calculations of pairwise aromatic–aromatic packing have shown that aromatic pairs can adopt either of two energetically favorable geometries: an off-centred, parallel-displaced geometry or a T-shaped perpendicular geometry (Chipot et al., 1996; Hobza et al., 1996; Jaffe and Smith, 1996).

Singh and Thornton (1991), using more detailed energy calculations, showed that a T-shaped packing geometry of aromatic residues is preferred. More recently, however, McGaughey et al. (1998) showed that aromatic pairs favor an off-centred parallel orientation. They attributed this discrepancy to the fact that earlier workers had not separated out isolated pairwise aromatic interactions, and that considering pairwise interactions within aromatic clusters of more than two residues dilutes the effect of π stacking. We therefore analysed all two-residue aromatic clusters in the present dataset of 24 thermophiles and found that their pairwise interaction geometries show a preference for T-shaped or tilted geometries over a purely parallel geometry.
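Binning inter-planar angles into the three ranges used in this section is sketched below. How boundary values of exactly 30° and 60° are assigned is not stated, so placing them in the higher range is an assumption; the example angles are arbitrary.

```python
from collections import Counter


def packing_class(angle_deg):
    """Assign an inter-planar angle (0-90 degrees) to one of three ranges.

    Boundary handling is an assumption: 30 and 60 fall into the higher range.
    """
    if not 0.0 <= angle_deg <= 90.0:
        raise ValueError("inter-planar angle must lie in [0, 90]")
    if angle_deg < 30.0:
        return "parallel"   # near-parallel, face to face
    if angle_deg < 60.0:
        return "tilted"
    return "T-shaped"       # orthogonal, edge to face


def packing_distribution(angles):
    """Count aromatic pairs in each packing class."""
    return Counter(packing_class(a) for a in angles)
```

For the angles 5, 45, 75, 89, 30 and 60°, the distribution is three T-shaped, two tilted and one parallel pair.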


    Discussion
 
The free energy of stabilization of thermophilic proteins is ~5–7 kcal/mol (Nojima et al., 1977). This small free-energy margin appears to be a product of natural selection: as pointed out by Querol et al. (1996), there seems to be a selective pressure for marginal stability, probably to facilitate processes such as polypeptide folding and to provide the flexibility required by the native conformation. Such marginal stability can be achieved with only a few changes in protein sequence and structure. Protein engineering studies have shown that introducing an aromatic pair on the protein surface increases protein stability (Serrano et al., 1991). Since the thermophilic and mesophilic proteins analysed here have very high sequence and structural similarity, we believe that the additional clusters in the thermophiles are present specifically to stabilize the thermophilic proteins at higher temperatures.

Moreover, the additional clusters in the thermophiles mostly involve pairwise interactions and occur on the protein surface. Additional stabilization of the protein surface by aromatic interactions could be essential to protect the native structure from thermal denaturation. Since thermal denaturation has been reported to start with unfolding of the outer surface, which exposes the hydrophobic core (Caflisch and Karplus, 1994), denaturation could be delayed by stabilizing the protein surface with aromatic interactions.

Further, in a few protein families (4, 5, 10, 12 and 15), additional aromatic clusters were found in the mesophile with no topologically equivalent cluster in the thermophile. These clusters are listed in Table III. Interestingly, all the equivalent residues in the thermophiles are also in solvent-accessible positions. These residues could therefore be targets for mutation by protein engineers seeking to increase the stability of the thermophilic proteins further, i.e. to convert a thermophile into a hyperthermophile. For example, Leu35 and Glu39 in ribonuclease H (family 4) could be mutated to Phe and Tyr, respectively, to enhance the stability of the protein further.


 
Table III. Additional aromatic clusters in mesophiles
 
Conclusions

This study of 24 pairs of structurally similar thermophilic and mesophilic proteins has shown that thermophilic proteins have more pairwise aromatic interactions than their mesophilic homologues. Certain families also show additional aromatic clusters of larger size. The additional clusters are located on the protein surface, in relatively rigid regions. The topologically equivalent residues in the mesophiles are usually the non-aromatic residues Leu or Ile where Phe is replaced, and Ser or Leu where Tyr is replaced. The presence of at least one additional aromatic cluster close to the active site of the thermophile provides a plausible explanation for the high thermophilicity exhibited by most thermostable enzymes. During evolution, organisms have probably achieved viability at high temperatures by selectively mutating surface residues, thereby introducing an aromatic pair or an additional aromatic cluster into the protein.

Although the dataset considered in this study is limited, the consistent observation of additional aromatic clusters in 17 of the 24 protein families (about 70%) is notable. Introducing surface aromatic interactions would be a simple and elegant way of increasing thermal stability. Nature may therefore well have adopted this approach to adapt to higher temperatures. This study also provides new insights for protein engineers designing thermophilic and thermostable proteins.


    Notes
 
1 To whom correspondence should be addressed. E-mail: sv@mbu.iisc.ernet.in


    Acknowledgments
 
The computational facilities of the Supercomputer Education Research Centre, the Bioinformatics Centre and the interactive graphics facility of the Indian Institute of Science, Bangalore, are acknowledged.


    References
Anderson,D.E., Hurley,J.H., Nicholson,H., Haase,W.A. and Matthews,B.W. (1993) Protein Sci., 2, 1285–1290.

Burley,S.K. and Petsko,G.A. (1985) Science, 229, 23–28.

Caflisch,A. and Karplus,M. (1994) Proc. Natl Acad. Sci. USA, 91, 1746–1750.

Chipot,C., Jaffe,R., Maigret,B., Pearlman,D.A. and Kollman,P.A. (1996) J. Am. Chem. Soc., 118, 11217–11224.

Connolly,M. (1993) J. Mol. Graph., 11, 139–141.

Deckert,G., Warren,P.V., Gaasterland,T., Young,W.G., Lenox,A.L., Graham,D.E., Overbeek,R., Snead,M.A., Keller,M. and Aujay,M. (1998) Nature, 392, 353–358.

Facchiano,A.M., Colonna,G. and Ragone,R. (1998) Protein Eng., 11, 753–760.

Fersht,A. and Serrano,L. (1993) Curr. Opin. Struct. Biol., 3, 75–83.

Georis,J., de Lemos Esteves,F., Brasseur,J., Bougnet,V., Devreese,B., Giannotta,F., Garnier,B. and Frère,J. (2000) Protein Sci., 9, 466–475.

Hecht,K., Wrba,A. and Jaenicke,R. (1989) Eur. J. Biochem., 183, 69–74.

Heringa,J., Argos,P., Egmond,M.R. and de Vlieg,J. (1995) Protein Eng., 8, 21–30.

Hobza,P., Selzle,H.L. and Schlag,E.W. (1996) J. Phys. Chem., 100, 18790–18794.

Humphrey,W., Dalke,A. and Schulten,K. (1996) J. Mol. Graph., 14, 33–38.

Jaenicke,R. and Bohm,G. (1998) Curr. Opin. Struct. Biol., 8, 738–748.

Jaffe,R.L. and Smith,G.D. (1996) J. Chem. Phys., 105, 2780–2788.

Kannan,N. and Vishveshwara,S. (1999) J. Mol. Biol., 292, 441–464.

Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.

Ladenstein,R. and Antranikian,G. (1998) Adv. Biochem. Eng. Biotechnol., 61, 37–82.

Matthews,B.W., Nicholson,H. and Becktel,W.J. (1987) Proc. Natl Acad. Sci. USA, 84, 6663–6667.

McGaughey,G.B., Gagne,M. and Rappe,A.K. (1998) J. Biol. Chem., 273, 15458–15463.

Menendez-Arias,L. and Argos,P. (1989) J. Mol. Biol., 206, 397–406.

Nicholson,H., Becktel,W.J. and Matthews,B.W. (1988) Nature, 336, 651–656.

Nojima,H., Ikai,A., Oshima,T. and Noda,H. (1977) J. Mol. Biol., 116, 429–442.

Parthasarathy,S. and Murthy,M.R. (2000) Protein Eng., 13, 9–13.

Querol,E., Perez-Pons,J.A. and Mozo-Villarias,A. (1996) Protein Eng., 9, 265–271.

Russell,R.B. and Barton,G.J. (1992) Proteins, 14, 309–323.

Serrano,L., Bycroft,M. and Fersht,A.R. (1991) J. Mol. Biol., 218, 465–475.

Serrano,L., Day,A.G. and Fersht,A.R. (1993) J. Mol. Biol., 233, 305–312.

Singh,J. and Thornton,J.M. (1991) J. Mol. Biol., 218, 837–846.

Smith,C.A., Toogood,H.S., Baker,H.M., Daniel,R.M. and Baker,E.N. (1999) J. Mol. Biol., 294, 1027–1040.

Szilagyi,A. and Zavodszky,P. (2000) Structure, 8, 493–504.

Tanner,J.J., Hecht,R.M. and Krause,K.L. (1996) Biochemistry, 35, 2597–2609.

Vihinen,M. (1987) Protein Eng., 1, 477–480.

Vogt,G. and Argos,P. (1997) Fold. Des., 2, 40–46.

Wigley,D.B., Clarke,A.R., Dunn,C.R., Barstow,D., Atkinson,T., Chia,W., Muirhead,H. and Holbrook,J. (1987) Biochim. Biophys. Acta, 916, 145–148.

Yutani,K., Ogasahara,K., Tsujita,T. and Sugino,Y. (1987) Proc. Natl Acad. Sci. USA, 84, 4441–4444.

Zavodszky,P., Kardos,J., Svingor,A. and Petsko,G.A. (1998) Proc. Natl Acad. Sci. USA, 95, 7406–7411.

Received May 16, 2000; revised September 11, 2000; accepted September 28, 2000.