Analysis of homodimeric protein interfaces by graph-spectral methods

K.V. Brinda1, N. Kannan2 and S. Vishveshwara1,3

1 Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India and 2 Cold Spring Harbor Laboratory, 1 Bungtown Road, PO Box 100, NY 11724, USA


    Abstract
 
Quaternary structure imparts structural and functional integrity to many proteins. In a multi-subunit protein, it is important to understand the factors that drive the association or dissociation of the subunits. It is well known that both hydrophobic and charged interactions contribute to the stability of protein interfaces. Interface residues are also known to be highly conserved. Although they are buried in the oligomer, these residues are exposed or partially exposed in the monomer. We believe that a systematic and objective method for identifying and analyzing interface clusters can contribute significantly to the identification of the residue, or collection of residues, important for oligomerization. Recently, we have applied graph-spectral methods to a variety of problems related to protein structure and folding. A major advantage of this methodology is that the problem is viewed from the standpoint of global protein topology rather than of localized regions of the protein structure. In the present investigation, we have applied graph-spectral analysis to identify side chain clusters at the interface, and the centers of these clusters, in a set of homodimeric proteins. These clusters are analyzed in terms of amino acid composition, solvent accessibility and residue conservation. Consistent with earlier investigations, these analyses show that charged and aromatic residues such as arginine, glutamic acid, histidine, phenylalanine and tyrosine participate in interface clusters. An important additional finding is that the residues involved are part of one or more clusters, and that they are sequentially distant residues that have come close to each other in the three-dimensional structure of the protein. Such residues are readily detected by our graph-spectral algorithm.
This method has also been used to identify residues important for dimerization (`hot spots') and to detect dimerization sites on the monomer. The residues predicted by the present algorithm correlate well with experimental data, indicating that the method is effective in predicting residues involved in dimer stability.

Keywords: dimerization sites/eigen vectors/expanded clusters/interface clusters/interface hot spots


    Introduction
 
Protein–protein interactions are extremely common in nature. Many proteins are functional only as dimers, and many others interact with other proteins to carry out their cellular functions. Classic examples include protein–receptor complexes, antigen–antibody complexes and the innumerable proteins involved in signal transduction. Hence, it is important to understand the factors that affect the stability of dimer interfaces. Different methods have been used to study protein interfaces. These include simple approaches such as measuring the change in accessible surface area (ASA) when a monomer dimerizes (Chothia and Janin, 1975; Janin et al., 1988) and examining the conservation of amino acid residues at protein interfaces (Hu et al., 2000; Valdar and Thornton, 2001). Other methods use geometric properties, surface complementarity between interacting monomers, changes in conformational energy and other energetic considerations to predict interacting surfaces and to dock one monomer onto the other. The reviews by Sternberg et al. (Sternberg et al., 1998) and Lengauer and Rarey (Lengauer and Rarey, 1996) discuss these methods in detail. Other aspects, such as amino acid and charge complementarity between interacting surfaces, the electrostatic and hydrogen bonding properties of interfaces and hydrophobic patches at interfaces, have also been studied (Jones and Thornton, 1996, 1997; Xu et al., 1997; Palma et al., 2000). Correlated mutations are believed to contain information about interacting residues in proteins (Pazos et al., 1997). The preferences of amino acid residues at interfaces have also been analyzed (Jones and Thornton, 1996; Bogan and Thorn, 1998; Larsen et al., 1998; Glaser et al., 2001). Hydrophobic and charged interactions are known to play a major role in stabilizing the dimer (Larsen et al., 1998).
It has been proposed that tryptophan, arginine and tyrosine are the preferred amino acid residues at the interface (Bogan and Thorn, 1998). Hydrophobic residues are also known to dominate large interfaces, whereas charged residues dominate small interfaces (Glaser et al., 2001). All these methods consider one-to-one interactions between residues, surface geometries or changes in conformational energy. In contrast, the present analysis takes the overall topology of the protein into account and uses this information to detect side chain clusters in protein structures. Consequently, our method can detect possible dimerization sites on the monomer as well as recognize important residues involved in dimerization.

The present study analyzes protein interfaces in a set of 20 homodimers using a graph-spectral method (Kannan and Vishveshwara, 1999; Patra and Vishveshwara, 2000). Graph theory has frequently been used in the analysis of protein structures. For instance, graph-theoretic techniques have been used to compare secondary structural motifs (Mitchell et al., 1990), to analyze sheet topologies (Koch et al., 1992) and to identify specific side chain patterns in three-dimensional protein structures (Artymiuk et al., 1994). Thermal fluctuations in proteins have also been evaluated using a Kirchhoff matrix based on the proximity of residues in three-dimensional space (Bahar et al., 1997). The present algorithm uses a graph-theoretic method to determine side chain clusters in proteins. Side chain clusters are well known to aid protein folding and to stabilize the three-dimensional structure of proteins (Heringa and Argos, 1991). Previous investigations of protein structure and stability carried out in our laboratory showed that aromatic side chain clusters in thermophilic proteins are involved in imparting thermal stability (Kannan and Vishveshwara, 2000). Residue clusters identified in α/β barrel proteins were found to be topologically conserved and to be essentially involved in imparting structural stability (Kannan et al., 2001b). These studies gave us insight into the possible role of side chain clusters in maintaining protein structure and stability, and hence motivated us to take a closer look at dimer interfaces in terms of side chain clusters.

Graph-spectral parameters such as the eigen values and eigen vector components provide significant information about the side chain clusters. These eigen values and vector components, together with other properties such as the difference in accessible surface area upon dimerization (ΔASA) (Chothia and Janin, 1975) and the conservation of residues in homologous proteins, have been used to predict a few residues at the interface of these proteins that may play a significant role in stabilizing the dimer interface. We have also analyzed the clusters in the 20 monomers to identify exposed and conserved clusters that could be dimerization sites on the monomer.

Our studies confirm that interface cluster residues comprise both charged and hydrophobic residues, which implies that both charged and hydrophobic interactions are required to stabilize the dimer interface. Most charged residues in the interface clusters are neutralized by oppositely charged residues, which can belong either to the same chain or to the other chain. This leads us to believe that dimer interfaces are essentially neutral, with charges nullified by complementary residues. We find a high correlation between residues with a high eigen vector component, a large ΔASA and high conservation in homologs. We therefore propose that residues satisfying all three criteria are likely to play a very significant role in the stability of the dimer interface. We emphasize that detecting interface residue clusters and determining their eigen values and eigen vector components gives insight into the structural characteristics of protein interfaces. Based on these observations, we have attempted to predict mutations at the interface of these proteins that may disrupt the dimer interface. Identification of dimerization sites on the monomer, based on the detection of exposed clusters that are conserved, has also yielded good results.


    Materials and methods
 
Data set and detection of clusters

Construction of protein graphs. The crystallographic coordinates of the 20 homodimers (Table I), all with resolution better than 2.5 Å, were obtained from the RCSB Protein Data Bank (Berman et al., 2000). Side chain clusters were determined for all 20 dimers and their corresponding monomers using a graph-theoretic algorithm, which is briefly described here (Kannan and Vishveshwara, 1999). Each residue of the protein is represented as a node in the graph, so a protein with n residues is represented by a graph of n nodes. Non-glycine amino acids are represented by their Cβ atoms, whereas glycines are represented by their Cα atoms. Two sequentially non-adjacent nodes (residues i – 2 to i + 2 are excluded) are connected by an edge if the side chain interaction criterion (contact criterion) between them is satisfied. The contact criterion is a user-defined parameter that defines the condition two residues must satisfy in order to be connected. It specifies the number of side chain atoms, expressed as a percentage overlap, from a pair of sequentially non-adjacent residues that must come within 4.5 Å of each other for the two residues to be considered as interacting. For example, if a contact criterion of 6% is specified, every residue in an output cluster has at least 6% overlap with one or more other residues in that cluster.
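The contact criterion and edge construction can be sketched in Python as below. The text gives the criterion only as a percentage overlap of side chain atoms within 4.5 Å, so the normalization used here (the share of the two residues' side chain atoms that have a partner atom of the other residue within 4.5 Å) is an assumption, as are the function names `overlap_percent` and `build_edges`. For a dimer, residues would additionally need chain labels so that the i ± 2 exclusion applies only within a chain.

```python
import math
from itertools import combinations

CUTOFF = 4.5  # Å, the atom-atom distance quoted in the text


def overlap_percent(atoms_i, atoms_j, cutoff=CUTOFF):
    """Hypothetical overlap measure: percentage of the side chain atoms of two
    residues that lie within `cutoff` of at least one atom of the other residue."""
    close_i = sum(1 for a in atoms_i if any(math.dist(a, b) <= cutoff for b in atoms_j))
    close_j = sum(1 for b in atoms_j if any(math.dist(a, b) <= cutoff for a in atoms_i))
    total = len(atoms_i) + len(atoms_j)
    return 100.0 * (close_i + close_j) / total if total else 0.0


def build_edges(side_chains, criterion):
    """Connect residues i < j with j - i > 2 whose overlap meets the contact
    criterion (in percent). `side_chains` maps residue index -> atom coordinates."""
    edges = set()
    for i, j in combinations(sorted(side_chains), 2):
        if j - i <= 2:  # residues i-2 .. i+2 are never connected
            continue
        if overlap_percent(side_chains[i], side_chains[j]) >= criterion:
            edges.add((i, j))
    return edges


# Toy coordinates (Å), not taken from any PDB entry
example = {
    1: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    2: [(2.0, 0.0, 0.0)],
    5: [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    9: [(50.0, 50.0, 50.0)],
}
print(sorted(build_edges(example, 70)))  # [(1, 5)]
print(sorted(build_edges(example, 60)))  # [(1, 5), (2, 5)]
```

Lowering the criterion from 70% to 60% admits the weaker 2–5 contact, which is the same trade-off the text describes when the criterion is tuned.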


 
Table I. Mutations predicted to influence dimerization
 
Matrix construction. The connected protein graph can be represented in terms of an adjacency matrix, A, where:

$a_{ij} = 1/d_{ij}$, if i and j are connected, and

$a_{ij} = 1/100$, otherwise,

where $d_{ij}$ is the distance between i and j.

The degree matrix, D, is a diagonal matrix obtained by summing the elements of each column of A, $D_{jj} = \sum_{i} a_{ij}$. The Laplacian matrix is defined as $L = D - A$. It is of dimension n × n, where n is the number of residues in the protein.
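A minimal sketch of the matrix construction as described in the text: `1/d_ij` for connected pairs, 1/100 otherwise, D from the column sums of A and `L = D - A`. Setting the diagonal of A to zero, passing node coordinates (Cβ, or Cα for glycine) as a list, and the function name `graph_matrices` are our assumptions.

```python
import math


def graph_matrices(coords, edges):
    """Adjacency A, degree D (diagonal stored as a list) and Laplacian L = D - A.
    coords: node positions (Cβ, or Cα for glycine); edges: connected (i, j) pairs."""
    n = len(coords)
    A = [[0.0] * n for _ in range(n)]  # diagonal left at zero (assumption)
    for i in range(n):
        for j in range(i + 1, n):
            connected = (i, j) in edges or (j, i) in edges
            A[i][j] = A[j][i] = 1.0 / math.dist(coords[i], coords[j]) if connected else 1.0 / 100
    D = [sum(A[i][j] for i in range(n)) for j in range(n)]  # column sums
    L = [[(D[i] if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    return A, D, L


coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]  # toy positions (Å)
edges = {(0, 1)}  # only residues 0 and 1 satisfy the contact criterion
A, D, L = graph_matrices(coords, edges)
print(D)  # approximately [0.21, 0.21, 0.02]
```

Every row of L sums to zero, which is what makes the constant vector an eigen vector with eigen value 0 in the spectral step that follows.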

Graph spectra. The Laplacian matrix, L, is diagonalized to yield its eigen values and eigen vector components. The vector components corresponding to the second lowest eigen value give the clustering information (Hall, 1970), and the centers of these clusters can be identified from the eigen vector components of the top eigen values (Kannan and Vishveshwara, 1999; Patra and Vishveshwara, 2000). The cluster centers correspond to the nodes with the highest connectivity (degree) in the cluster, which, in most protein clusters that we have examined, also correspond to the geometric center of the cluster (unpublished results). Only clusters with three or more residues are considered in this analysis. Table II shows the complete set of clusters obtained in the monomers and the dimer of yeast triose phosphate isomerase with a contact criterion of 12%. Residues with the same vector component in the eigen vector of the second lowest eigen value form a cluster, and the residue with the highest magnitude of vector component in the corresponding top eigen value is the center of that cluster. Thus, at the chosen contact criterion of 12%, monomer A has two clusters, monomer B has three clusters and the dimer has eight clusters. The residues forming the centers of the clusters are marked in bold.
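The spectral step can be illustrated with a small pure-Python sketch. It uses a cyclic Jacobi eigensolver as a stand-in for a library routine such as `numpy.linalg.eigh`. The gap tolerance used to decide that two vector components are the same, and the rule for matching a cluster to its corresponding top eigen value (scan eigen vectors from the highest eigen value down and take the first whose largest-magnitude component falls in the cluster), are our assumptions rather than details given in the text. The toy graph is hypothetical.

```python
import math


def jacobi_eigh(M, tol=1e-20, max_sweeps=100):
    """Cyclic Jacobi eigensolver for a small symmetric matrix (a stand-in for a
    library routine). Returns eigenvalues in ascending order and the matching
    eigenvectors, vectors[k] being the eigenvector of values[k]."""
    n = len(M)
    a = [list(map(float, row)) for row in M]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k])
    return [a[k][k] for k in order], [[v[i][k] for i in range(n)] for k in order]


def fiedler_clusters(L, gap=1e-6, min_size=3):
    """Group nodes whose components in the eigenvector of the second lowest
    eigenvalue agree to within `gap`; keep groups of at least `min_size` nodes."""
    values, vectors = jacobi_eigh(L)
    fiedler = vectors[1]
    order = sorted(range(len(L)), key=lambda i: fiedler[i])
    groups, current = [], [order[0]]
    for prev, i in zip(order, order[1:]):
        if fiedler[i] - fiedler[prev] > gap:
            groups.append(current)
            current = []
        current.append(i)
    groups.append(current)
    return [sorted(g) for g in groups if len(g) >= min_size], values, vectors


def cluster_center(cluster, vectors):
    """Scan eigenvectors from the highest eigenvalue down; return the dominant
    node of the first eigenvector whose largest |component| lies in `cluster`."""
    members = set(cluster)
    for vec in reversed(vectors):
        top = max(range(len(vec)), key=lambda i: abs(vec[i]))
        if top in members:
            return top
    return None


# Toy graph: a star (0 bonded to 1, 2, 3) and a triangle (4, 5, 6).
# Bonded pairs get 1/d with d = 5 Å; all other pairs get 1/100, as in the text.
n = 7
bonded = {(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (5, 6)}
A = [[0.0 if i == j else (0.2 if (min(i, j), max(i, j)) in bonded else 0.01)
      for j in range(n)] for i in range(n)]
L = [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
clusters, values, vectors = fiedler_clusters(L)
print(sorted(clusters))                       # [[0, 1, 2, 3], [4, 5, 6]]
print(cluster_center([0, 1, 2, 3], vectors))  # 0, the hub of the star
```

Because every unbonded pair carries the same small weight (1/100), the second eigen vector is exactly constant within each group here, which is why equal components can be read as cluster membership.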


 
Table II. Cluster residues and their vector components in yeast triose phosphate isomerase (1ypi) at a 12% contact criterion
 
Identification of interface clusters

After the side chain clusters in the dimer are determined, those at the interface are distinguished from the others by the fact that interface clusters have contributions from both chains of the dimer. The contact criterion used to select the interface clusters in a protein is optimized as follows. We begin with a high contact criterion (for example, 14%) and then reduce it in steps of 1% until at least one or two clusters comprising residues from both monomers are obtained. We have used contact criteria ranging from 6 to 14% and have detected between 5 and 14 clusters. The number of residues per cluster ranges from 3 to 15, depending on the contact criterion used and the size of the protein. The same contact criterion is used for a given monomer–dimer pair so that the clusters obtained in the two can be compared. These side chain clusters were visualized using VMD (Humphrey et al., 1996). The side chain clusters of the monomer and of the dimer of each protein, together with their eigen values and eigen vector components, were then analyzed critically to identify the cluster centers and the changes in the side chain clusters that are expected to occur on dimerization.
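The optimization of the contact criterion for interface clusters can be sketched as a simple downward scan. Here `find_clusters` stands for whatever routine returns the side chain clusters at a given criterion (for instance, the graph-spectral procedure described in this section); it, the other function names and the toy data are hypothetical.

```python
def is_interface(cluster):
    """True if the cluster has residues from both chains of the dimer.
    Residues are (chain_id, residue_number) tuples."""
    return len({chain for chain, _ in cluster}) >= 2


def scan_for_interface(find_clusters, start=14, stop=6, min_interface=1):
    """Lower the contact criterion in 1% steps from `start` to `stop` until at
    least `min_interface` interface clusters appear. `find_clusters` is a
    hypothetical callback: criterion (in %) -> list of clusters."""
    for criterion in range(start, stop - 1, -1):
        interface = [c for c in find_clusters(criterion) if is_interface(c)]
        if len(interface) >= min_interface:
            return criterion, interface
    return None, []


def toy_find_clusters(criterion):
    # Toy behaviour: an inter-chain cluster only appears at or below 11%.
    clusters = [[("A", 10), ("A", 45), ("A", 80)]]
    if criterion <= 11:
        clusters.append([("A", 20), ("A", 24), ("B", 31)])
    return clusters


print(scan_for_interface(toy_find_clusters))  # (11, [[('A', 20), ('A', 24), ('B', 31)]])
```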

Analysis of other properties

The ASAs of all the monomers and dimers were determined using Connolly's ASA program with a probe radius of 1.4 Å (Connolly, 1993). The percentage difference in the ASA of residue i when the monomer dimerizes has been calculated as:

The total ASA for each residue type has been obtained from the literature (Miller et al., 1987).
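Per-residue ASA quantities can be computed as below. As a sketch, we assume that the percentage change is the ASA lost on dimerization divided by the total ASA of the residue type (the literature values of Miller et al., 1987) and multiplied by 100; the same total ASA also gives the relative ASA used for the exposure classes. The numbers in the example are placeholders, not values from the paper or from Miller et al.

```python
def relative_asa(asa, asa_total):
    """Residue ASA as a percentage of the total ASA of its residue type."""
    return 100.0 * asa / asa_total


def percent_delta_asa(asa_monomer, asa_dimer, asa_total):
    """Assumed form: ASA lost on dimerization as a percentage of the total ASA."""
    return 100.0 * (asa_monomer - asa_dimer) / asa_total


# Hypothetical numbers for one residue (Å²)
print(percent_delta_asa(120.0, 30.0, 240.0))  # 37.5
print(relative_asa(120.0, 240.0))             # 50.0
```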

The conservation of the interface cluster residues across species was examined using ClustalW (Thompson et al., 1994). The sequences of these proteins from various species were obtained from the Swiss-Prot database (Bairoch and Apweiler, 2000).

Analysis of monomer clusters and identification of possible dimerization sites

The three-dimensional structure of a monomer is believed to encode the information required for dimerization. Features previously considered important for analyzing protein interfaces are (i) the nature and composition of the amino acids (Jones and Thornton, 1996; Bogan and Thorn, 1998; Larsen et al., 1998; Glaser et al., 2001), (ii) solvent accessibility (exposed or buried) (Chothia and Janin, 1975) and (iii) the conservation of interface residues (Hu et al., 2000; Valdar and Thornton, 2001). In the present study, we have incorporated these features into the amino acid clusters detected by our method and have attempted to identify the clusters involved in dimerization. Our analysis shows that a strong interface is formed only when there is a seeding cluster (of three or more residues) in at least one of the monomers, which is strengthened on dimerization. These are termed `expanded' clusters. We also find that some `new' clusters are formed during dimerization. Such clusters are invariably small and involve fewer interactions between the two monomers than expanded clusters do; they are discussed in a later section. Our procedure can identify the seeding clusters in the monomer that are expanded on dimerization, as described below.

Our analysis shows that seeding clusters are formed even at a high contact criterion (10–12%), indicating that these seeding (expanded) clusters consist of very strongly interacting residues. Thus, the first step is to identify side chain clusters in a given monomer using a high contact criterion. The contact criterion is gradually reduced from 12 to 8% until the first surface clusters emerge in the monomer, so different contact criteria are used for different monomers. The clusters thus obtained are then characterized by their location, the extent of conservation of their component residues and the number of preferred residues they contain. The `surface clusters' are identified from ASA calculations using Connolly's algorithm (Connolly, 1993). Residues with >20% ASA are considered exposed (E), those with 5–20% ASA partially exposed (P) and the rest buried (B). A cluster is considered exposed if at least two of its residues are exposed or partially exposed. Next, we look for the presence of at least one of the preferred amino acids (arginine, histidine, phenylalanine, tyrosine and glutamic acid, which our analysis identifies as preferred in seeding clusters) in these clusters. The clusters are then examined for conserved residues. Based on conservation, residues are classified as totally conserved (T), partially conserved (P) or as having undergone conserved mutations (M). A cluster with at least two conserved residues (T, P or M) is considered a `conserved cluster'. If no clusters satisfying these criteria are obtained even after considerable reduction of the contact criterion, one can look for such clusters in the other monomer, which may contain clusters satisfying them.
(Although, in principle, the two monomers of a homodimer are identical, their coordinates can differ if the crystallographic asymmetric unit is a dimer.) Thus, `exposed, conserved clusters with the preferred residues' are identified as possible dimerization sites. If more than one cluster is identified as a dimerization site, the clusters can be ranked by the extent of conservation and the number of preferred residues.
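The screening of monomer clusters for possible dimerization sites can be summarized in a short sketch. The thresholds (>20% exposed, 5–20% partially exposed, at least two exposed and two conserved residues, at least one preferred residue) are taken from the text; the treatment of the 5% and 20% boundaries, the lexicographic ranking (conservation first, then preferred residues) and the dictionary layout of each residue are our assumptions, and the clusters are toy data.

```python
PREFERRED = {"R", "H", "F", "Y", "E"}  # preferred residues named in the text
CONSERVED = {"T", "P", "M"}


def exposure(rel_asa):
    """E (> 20%), P (5-20%) or B (< 5%) from the percentage relative ASA."""
    if rel_asa > 20:
        return "E"
    if rel_asa >= 5:
        return "P"
    return "B"


def candidate_sites(clusters):
    """Return clusters that are exposed (>= 2 E/P residues), conserved
    (>= 2 T/P/M residues) and contain >= 1 preferred residue, ranked by the
    number of conserved residues and then of preferred residues."""
    hits = []
    for cluster in clusters:
        exposed = sum(exposure(r["rel_asa"]) in ("E", "P") for r in cluster)
        conserved = sum(r["cons"] in CONSERVED for r in cluster)
        preferred = sum(r["aa"] in PREFERRED for r in cluster)
        if exposed >= 2 and conserved >= 2 and preferred >= 1:
            hits.append((conserved, preferred, cluster))
    hits.sort(key=lambda h: (h[0], h[1]), reverse=True)
    return [cluster for _, _, cluster in hits]


def residue(aa, rel_asa, cons):
    return {"aa": aa, "rel_asa": rel_asa, "cons": cons}


c1 = [residue("R", 35, "T"), residue("E", 12, "M"), residue("L", 2, None)]
c2 = [residue("A", 40, "T"), residue("G", 30, "T"), residue("V", 1, "T")]  # no preferred residue
c3 = [residue("Y", 25, "T"), residue("F", 22, "T"), residue("H", 8, "P")]
print(candidate_sites([c1, c2, c3]) == [c3, c1])  # True
```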


    Results and discussion
 
Side chain cluster analysis

Identification of side chain clusters. Side chain clusters have been determined for all 20 monomers and dimers using a graph-theoretic algorithm. The number and size of the clusters obtained depend on the contact criterion used. If a low contact criterion is specified, the whole protein may end up as a single cluster. Conversely, if a high contact criterion is used, essential information about side chain interactions may be lost, because a high cut-off yields very few clusters or none at all. Hence, the contact criterion has been optimized for each protein so that the clusters obtained are discriminated from the bulk of the protein; consequently, different contact criteria have been used for different proteins. However, the same value is used for a given monomer–dimer pair. In this analysis, we have used contact criteria ranging from 6 to 14%, obtaining 5 to 15 side chain clusters per protein and 3 to 15 residues per cluster.

Differences between clusters in monomers and dimers. After determining side chain clusters in the dimer, those at the interface are identified and differentiated from the others based on the fact that interface clusters would have component residues that are contributed from both chains of the dimer. The differences between the side chain clusters in the monomer and the dimer were then determined. The interface clusters in the dimer can be categorized (with reference to the clusters in the monomer) as: (i) a new cluster formed on dimerization; and (ii) an existing cluster in the monomer which expands or gets strengthened on dimerization, in which case the monomer consists of a set of seeding clusters to which more residues from the other monomer are added on dimerization.

These two cases can be understood from Tables II and III. Table II shows the clusters obtained in the monomer and dimer of triose phosphate isomerase using a contact criterion of 12%. Clusters 2, 6 and 7 are non-interface clusters, whereas the others are interface clusters. Clusters 3, 4 and 5 are new clusters that emerged in the dimer, whereas clusters 1 and 8 already had a seeding cluster in the monomer that was strengthened on dimerization. Table III lists the clusters in the monomer and dimer of malate dehydrogenase at an 11% cut-off. Malate dehydrogenase has three interface clusters. Clusters 2 and 15 are new interface clusters, whereas cluster 1 is an expanded interface cluster that had a seeding cluster in the monomer. All others are non-interface clusters, which are also present in the monomers. Figures 1 and 2 show the clusters in the monomers and dimers of triose phosphate isomerase and malate dehydrogenase, respectively. The rectangular regions enclose the seeding clusters in the monomers and the interface clusters in the dimers.
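The distinction between new and expanded interface clusters can be expressed as a set comparison. In this sketch, a dimer interface cluster counts as expanded if it contains every residue of some monomer cluster of at least three residues; requiring full containment is our assumption (the text only says that residues from the other monomer are added to a seeding cluster), and the cluster data are toy values.

```python
def classify_interface_clusters(dimer_clusters, monomer_clusters, min_seed=3):
    """Label each dimer interface cluster 'expanded' if it contains every residue
    of some monomer cluster of >= min_seed residues (a seeding cluster), else
    'new'. Residues are (chain_id, residue_number) tuples."""
    labels = []
    for cluster in dimer_clusters:
        if len({chain for chain, _ in cluster}) < 2:
            continue  # single-chain cluster: not at the interface
        members = set(cluster)
        seeded = any(len(seed) >= min_seed and set(seed) <= members
                     for seed in monomer_clusters)
        labels.append((cluster, "expanded" if seeded else "new"))
    return labels


# Toy clusters
monomer = [[("A", 1), ("A", 5), ("A", 9)]]
dimer = [
    [("A", 1), ("A", 5), ("A", 9), ("B", 40)],  # seed plus a residue from chain B
    [("A", 20), ("B", 22), ("B", 60)],          # no seed in the monomer
    [("A", 70), ("A", 75), ("A", 90)],          # not an interface cluster
]
print([label for _, label in classify_interface_clusters(dimer, monomer)])  # ['expanded', 'new']
```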


 
Table III. Clusters in malate dehydrogenase (4mdh) at 11% contact criterion
 


 
Fig. 1. Side chain clusters in triose phosphate isomerase (a) dimer and (b) monomer. The cluster residues are shown in bold. The rectangular box in (a) corresponds to the interface clusters and the one in (b) shows the seeding clusters that get expanded on dimerization.

 


 
Fig. 2. Side chain clusters in malate dehydrogenase (a) dimer and (b) monomer. The cluster residues are shown in bold. The rectangular box corresponds to the interface clusters in the dimer (a) and the seeding cluster in the monomer (b).

 
Most of the proteins analyzed have both types of clusters at the interface. All 20 have some new clusters that form on dimerization, and more than 70% have at least one additional interface cluster that had a seeding cluster in the monomer, which was strengthened on dimerization. Expanded clusters impart more stability to the dimer interface than new clusters because they create a larger network of interactions between the two monomers. These expanded (seeding) clusters have been analyzed further to predict possible dimerization sites on the monomer; the details are discussed in a separate section.

Preference of amino acid residues in interface clusters. In both of the above cases, namely formation of new clusters and expansion of existing clusters, new charged and hydrophobic interactions essential for dimer formation are introduced. We observe that if an interface cluster contains a charged (positive or negative) residue, more often than not it also contains an oppositely charged residue that neutralizes this charge and thus stabilizes the cluster. The oppositely charged residue can come from the same chain as the first charged residue or from the other chain. Thus, although dimer interfaces do contain charged residues, they are essentially neutral because the charges are nullified. Apart from these charged residues, most interface clusters also contain hydrophobic residues. For example, in triose phosphate isomerase, when cluster 1, comprising residues F102, E104, N65, R98 and F108, is strengthened to a cluster comprising F108, E104, N65, R98, F102, E325 and Y315 (Table II), new charged (E325) and hydrophobic (Y315) residues are added to the cluster, introducing interactions in the dimer that were absent in the monomer. Similarly, in malate dehydrogenase, clusters 2 and 15, with residues M388, R229, D392, R161, L157 and M54, R563, D58, respectively, are new clusters formed on dimerization (Table III). These clusters also introduce new charged and hydrophobic interactions that were absent in the monomer. Hence, both charged and hydrophobic residues are equally involved in forming new interface clusters and in strengthening existing ones, both of which occur when a protein dimerizes.
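The observation that charged residues in interface clusters usually have an oppositely charged partner can be checked mechanically. The sketch below lists charged residues without such a partner; assigning +1 to Arg and Lys, −1 to Asp and Glu, and treating His as neutral are simplifying assumptions, and the function name is hypothetical. The first two clusters are those quoted in the paragraph above; the third is a toy example.

```python
CHARGE = {"R": +1, "K": +1, "D": -1, "E": -1}  # histidine treated as neutral (assumption)


def unpaired_charges(cluster):
    """Return charged residues in `cluster` with no oppositely charged partner in
    the same cluster. Residues are labels such as 'R98' (one-letter code + number)."""
    signs = {CHARGE.get(res[0], 0) for res in cluster}
    return [res for res in cluster
            if CHARGE.get(res[0], 0) and -CHARGE[res[0]] not in signs]


# Expanded cluster 1 of triose phosphate isomerase and new cluster 15 of
# malate dehydrogenase, as listed in the text
print(unpaired_charges(["F108", "E104", "N65", "R98", "F102", "E325", "Y315"]))  # []
print(unpaired_charges(["M54", "R563", "D58"]))  # []
print(unpaired_charges(["E12", "D40", "L7"]))    # ['E12', 'D40'] (toy cluster)
```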

The amino acid composition at the interfaces of the proteins in this data set is given in Table IV. Values are reported both for all residues that lose even a small amount of ASA upon dimerization and for residues in the interface clusters. The normalized values (percentage compositions) for these two cases are shown as histograms in Figure 3a and b, respectively. The composition patterns clearly differ, and Figure 3b is more discriminatory than Figure 3a. This shows that by using a high contact criterion, we are able to identify the residues that contribute significantly to the stability of the dimer, which has been the aim of the present analysis. In principle, our method could pick up all residues that lose ASA on dimerization if a low cut-off value were used, in which case Figure 3b would resemble Figure 3a.
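The normalized (percentage) compositions plotted in Figure 3 amount to counting one-letter codes in a pooled list of residues, as in this sketch; whether the list holds all residues that lose ASA or only interface-cluster residues determines which panel it corresponds to. The function name and the residue list are hypothetical.

```python
from collections import Counter


def percent_composition(residues):
    """Percentage of each amino acid (one-letter code) in a pooled residue list."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: 100.0 * count / total for aa, count in sorted(counts.items())}


# Toy pooled list of interface-cluster residues
print(percent_composition(["R", "E", "F", "R", "Y"]))
# {'E': 20.0, 'F': 20.0, 'R': 40.0, 'Y': 20.0}
```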


 
Table IV. Amino acid preferences at the interface
 


 
Fig. 3. Histogram representing the occurrence of the 20 amino acids at protein interfaces. Amino acids are represented using their single letter codes. (a) The percentage of amino acids occurring at the interface based on loss of accessible surface area. (b) The percentage of amino acids occurring in the interface clusters.

 
Arginine, histidine, phenylalanine, tyrosine and glutamic acid are the most preferred residues in the interface clusters (Figure 3b). Tryptophan and methionine also contribute significantly to the interface clusters compared with the other amino acids. These preferences are consistent with those obtained in earlier studies based on residue-wise interactions (Jones and Thornton, 1996; Bogan and Thorn, 1998; Larsen et al., 1998; Glaser et al., 2001). Clearly, charged and aromatic side chains are preferred in the interface clusters. Glycine, alanine, valine and cysteine are among the residues rarely found in interface side chain clusters. A comparison of Figure 3a and b shows that although these small side chain residues occur at the interface (Figure 3a), they do not contribute to interface stability by participating in significant interactions across the interface (Figure 3b). Thus, the present analysis focuses on detecting strongly interacting residues at the interface.

Predicting residues involved in dimer stability

Previous investigators have used the loss of accessible area upon dimerization (Chothia and Janin, 1975) and other features, such as conserved amino acid residues at the interface (Hu et al., 2000; Valdar and Thornton, 2001) and correlated mutations in protein sequences (Pazos et al., 1997), to predict the residues involved in stabilizing dimers. These methods have undoubtedly helped identify residues important for oligomerization. In addition, Thornton's group has developed a rigorous method for analyzing protein interfaces using surface patch analysis (Jones and Thornton, 1997). Several other methods that use surface properties and the geometric complementarity of monomers have also been used to identify protein interfaces; these are discussed in detail in the reviews by Sternberg et al. (Sternberg et al., 1998) and Lengauer and Rarey (Lengauer and Rarey, 1996). However, most of these techniques examine interface interactions at a one-to-one residue level or at a surface geometry level. Our method of identifying side chain clusters using graph theory considers connectivity globally and treats the identified clusters as a network of connections between the monomers. Thus, the interface clusters detected by the graph-spectral method provide information about extended side chain networks. Further, the technique also identifies the center of such a networked cluster, which is probably the most important residue (or residues) for dimerization. In the present study, we have combined a graph-spectral algorithm with traditional analyses of features such as the loss of accessible area upon dimerization and the conservation of interface residues in order to predict residues important for dimer stability. The details of these investigations, as applied to the present data set, are given below.
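The combination of the three criteria (high eigen vector component, loss of ASA and conservation) can be sketched as a filter. The component threshold of 0.3 is a hypothetical placeholder, since the text gives no numerical cut-off; any positive ASA loss counts, following the treatment of ASA described for hot spot identification. The function name, residue names and values are toy data.

```python
CONSERVED = {"T", "P", "M"}


def predict_hot_spots(residues, min_component=0.3):
    """Return names of residues meeting all three criteria, ranked by |component|.
    min_component is a hypothetical threshold; any ASA loss > 0 counts."""
    hits = [r for r in residues
            if abs(r["component"]) >= min_component
            and r["delta_asa"] > 0
            and r["cons"] in CONSERVED]
    hits.sort(key=lambda r: abs(r["component"]), reverse=True)
    return [r["name"] for r in hits]


toy = [
    {"name": "R10", "component": 0.62, "delta_asa": 35.0, "cons": "T"},
    {"name": "N14", "component": 0.10, "delta_asa": 12.0, "cons": "T"},   # low component
    {"name": "E20", "component": 0.45, "delta_asa": 0.0, "cons": "P"},    # no ASA loss
    {"name": "Y31", "component": 0.50, "delta_asa": 20.0, "cons": None},  # not conserved
    {"name": "F25", "component": -0.70, "delta_asa": 5.0, "cons": "M"},
]
print(predict_hot_spots(toy))  # ['F25', 'R10']
```

The sign of an eigen vector component is arbitrary, so the filter compares magnitudes.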

Eigen vector component. The eigen vector component of an interface cluster residue can be used as an important criterion for identifying residues that may stabilize the dimer interface. As mentioned earlier, the graph-theoretic algorithm used in this analysis gives the eigen values and the corresponding eigen vector components for each residue involved in cluster formation, and the interface clusters and their centers are identified as described above. It has been observed earlier (Kannan and Vishveshwara, 1999; Patra and Vishveshwara, 2000) that residues at the center of a cluster have a high vector component in the highest eigen value of that cluster, whereas residues with a low vector component in this eigen value lie away from the core of the cluster. The higher the magnitude of a residue's vector component within a cluster, the more important its role in forming and stabilizing the cluster, because it represents the core of the cluster. We emphasize that these high vector component residues make more numerous and stronger contacts with their spatial neighbors than other residues in the cluster. Mutating such residues could result in the loss of spatial contacts and hence of interactions across the dimer interface. Thus, interface clusters and their centers give insight into the residues involved in dimer stability, and this is one of the important factors used to predict residues that might stabilize the dimer interface.
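Why the top eigen vector singles out the hub of a cluster can be seen on a toy star-shaped cluster. This sketch swaps in power iteration, a simpler technique than full diagonalization, to obtain the eigen vector of the largest eigen value of the cluster's Laplacian; the start vector, iteration count and function name are arbitrary choices of ours.

```python
import math


def top_eigenvector(L, iters=500):
    """Power iteration for the eigenvector of the largest eigenvalue of a
    symmetric positive semidefinite matrix such as a graph Laplacian."""
    n = len(L)
    x = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    mean = sum(x) / n
    x = [xi - mean for xi in x]  # remove the constant (zero-eigenvalue) direction
    for _ in range(iters):
        y = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


# Toy cluster: residue 0 contacts residues 1, 2 and 3, which do not touch each other
star = [[3, -1, -1, -1],
        [-1, 1, 0, 0],
        [-1, 0, 1, 0],
        [-1, 0, 0, 1]]
vec = top_eigenvector(star)
ranking = sorted(range(len(vec)), key=lambda i: -abs(vec[i]))
print(ranking[0])  # 0: the hub residue has the largest |component|
```

The hub, which has the most contacts (highest degree), carries the largest component, in line with the statement that high vector component residues make the most contacts.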

Accessible surface area. The change in the accessible area of a residue when a monomer dimerizes can give us some information as to which are the interface residues. This property has been used as one of the criteria to predict dimer destabilizing mutants.

The loss in ASA when two monomers associate to form a dimer gives an estimate of the hydrophobic free energy involved in dimer formation (Chothia and Janin, 1975). A good estimate can be made if the structures of both the monomer and the dimer are known. In the absence of an isolated monomer structure, however, it is reasonable to assume that the structure of the monomer is very similar to that in the dimer. Chothia and Janin (Chothia and Janin, 1975) gave an empirical correlation between the ASA lost upon dimerization and the hydrophobic free energy contribution to dimer formation, according to which a loss of 1 Å² corresponds to a contribution of 0.025 kcal/mol of hydrophobic free energy. Hence, the greater the loss in ASA of a residue on dimerization, the more that residue contributes to the hydrophobic free energy of dimerization. This property has been used as a contributing factor in determining the residues that might play a major role in stabilizing the dimer. In the present analysis, any residue that is part of an identified interface cluster and loses even a small amount of ASA upon dimerization has been taken into consideration for the identification of `hot spots' at the dimer interface.
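The Chothia–Janin conversion (0.025 kcal/mol per Å² of buried area) reduces to a multiplication; this sketch also sums per-residue ASA losses. Function names and the per-residue areas are hypothetical.

```python
KCAL_PER_A2 = 0.025  # kcal/mol per Å² of buried area (Chothia and Janin, 1975)


def buried_area(asa_monomer, asa_dimer):
    """Total ASA (Å²) lost on dimerization, summed over residues; both dicts map
    the same residue labels to their ASA in the isolated monomer and in the dimer."""
    return sum(asa_monomer[r] - asa_dimer[r] for r in asa_monomer)


def hydrophobic_free_energy(area):
    """Hydrophobic free energy contribution (kcal/mol) for `area` Å² buried."""
    return KCAL_PER_A2 * area


toy_monomer = {"res1": 150.0, "res2": 90.0, "res3": 60.0}  # hypothetical ASAs
toy_dimer = {"res1": 50.0, "res2": 90.0, "res3": 20.0}
area = buried_area(toy_monomer, toy_dimer)
print(area, round(hydrophobic_free_energy(area), 3))  # 140.0 3.5
```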

Conservation of interface cluster residues. It has been pointed out (Hu et al., 2000; Valdar and Thornton, 2001) that interface residues are highly conserved, so conserved residues are more likely to be important in protein dimerization. It is therefore important to know whether the interface cluster residues are conserved, and conservation has been used as one of the factors for identifying residues that might be important for dimer stability. The set of homologous sequences for each of the 20 homodimers was obtained from the Swiss-Prot database and aligned using ClustalW (Thompson et al., 1994), and the conserved residues were identified from these alignments. The alignment for cardiotoxin (1cdt) is shown in Figure 4 as an example.



Fig. 4. Multiple sequence alignment of cardiotoxin using ClustalW. The first sequence in the alignment is that of 1cdt, the protein in our data set. *, completely conserved residues; :, conservative substitutions; ., partially conserved residues. The residues K50 and N45 (shown in bold), which are part of a new interface cluster, are completely conserved and partially conserved, respectively.
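The consensus symbols in an alignment such as that of Fig. 4 (*, : and .) can be reproduced with a short sketch. The residue groups below are those listed in the ClustalW documentation for its consensus line ('strong' groups for ':', 'weaker' groups for '.'); the treatment of gapped columns (no symbol), the function names and the three toy sequences are assumptions made for illustration, not the cardiotoxin alignment itself.

```python
# Residue groups used by ClustalW's consensus line, as listed in its documentation.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Consensus symbol for one alignment column given as a string."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # simplification: gapped columns get no symbol
    if len(residues) == 1:
        return "*"  # a single, fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one 'strong' group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one 'weaker' group
    return " "


def conservation_line(aligned):
    """Consensus line for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


# Hypothetical toy alignment (not taken from Fig. 4).
print(repr(conservation_line(["KNSW", "KESA", "KQGD"])))  # '*:. '
```

In the toy example, the first column is invariant (K), the second contains N, E and Q (all in the strong group NEQK), the third contains S and G (only in the weaker group SAG), and the fourth is unconserved.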

 
We find that most of the interface cluster residues are conserved, indicating that they play an important role in stabilizing the dimer. A cluster whose residues are mostly conserved in homologs is particularly significant, because its residues are sequentially distant yet make spatial contact in the three-dimensional structure of the protein; such conserved clusters are likely to be structurally or functionally important. The high conservation of most interface clusters therefore indicates that these residues, distant in sequence but close in space, are important from a structural perspective. Moreover, although most interface cluster residues are conserved, the cluster centers in particular are highly conserved, which strengthens our argument that the centers of interface clusters are strongly involved in dimer stabilization.

Residues that satisfy all three of the above conditions (conservation, a high vector component and a high ΔASA) have been predicted as `hot spots' on the dimer interface; mutating such residues would be expected to compromise the structural integrity of the dimer. The hot spots identified in all 20 proteins of the data set are listed in Table I, together with the position of their vector components, their percentage ΔASA and their extent of conservation. In most cases, there is a strong correlation between high vector component, high ΔASA and high conservation. In a few cases, the ΔASA is small but the vector component and the extent of conservation are very high, and these residues have also been found experimentally to affect dimerization (discussed in the next section). Hence, in the present study, conservation and a high vector component have been given greater weight.
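The three criteria, together with ranking by vector component, can be combined in a sketch such as the one below. The `InterfaceResidue` record, the function name, the residue labels and all numbers are hypothetical, and the thresholds (`min_component`, `min_percent_delta_asa`) are illustrative defaults rather than values used in this study.

```python
from dataclasses import dataclass


@dataclass
class InterfaceResidue:
    name: str                 # hypothetical residue label
    vector_component: float   # |component| in the cluster's top eigenvector
    percent_delta_asa: float  # ASA lost on dimerization, % of monomer ASA
    conserved: bool           # e.g. judged from a ClustalW alignment


def predict_hot_spots(residues, min_component=0.5, min_percent_delta_asa=0.0):
    """Keep residues meeting all three criteria, highest vector component first."""
    hits = [r for r in residues
            if r.conserved
            and r.vector_component >= min_component
            and r.percent_delta_asa > min_percent_delta_asa]
    return sorted(hits, key=lambda r: r.vector_component, reverse=True)


residues = [
    InterfaceResidue("ResA", 0.62, 40.0, True),
    InterfaceResidue("ResB", 0.81, 5.0, True),    # small ASA loss, still kept
    InterfaceResidue("ResC", 0.75, 60.0, False),  # not conserved
    InterfaceResidue("ResD", 0.20, 70.0, True),   # low vector component
]
print([r.name for r in predict_hot_spots(residues)])  # ['ResB', 'ResA']
```

Because ΔASA only has to be positive here, a conserved residue with a high vector component (ResB) is kept even when it buries little surface, in line with the greater weight given to conservation and vector component. Lowering `min_component` admits more residues, which is the kind of relaxation the text discusses below for large clusters.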

Experimental evidence for the predicted mutations. Single and multiple mutations carried out on these proteins were analyzed to compare our predictions with experimentally verified mutations. Although some of these mutations were designed for other purposes, we report only those that provide information on dimer interface stability. An extensive literature survey shows that experimental data on mutations affecting dimer formation and stabilization are available for only five of the 20 proteins in our data set (Table I). In these cases, the predicted mutations correlate well with the experiments. In phospholipase A2, we predicted two mutations (Phe5 and Ile9) that have been made experimentally and found to disrupt the dimer (Liu et al., 1995). In triose phosphate isomerase, Arg98, one of the predicted residues, has been mutated and found to destabilize the interface (Mainfroid et al., 1996). Similarly, in interleukin 8, tyrosyl-tRNA synthetase and isocitrate dehydrogenase, some of the predicted mutations have already been verified experimentally and found to disrupt dimer formation. For the other proteins in the data set, no relevant experimental information is available, because most of the predicted mutations have yet to be carried out.

Our method did not predict Glu104 of triose phosphate isomerase, which is therefore a false negative: this mutation is known to affect dimerization (Daar et al., 1986). We missed this residue because, although it belongs to an interface cluster and has a reasonably high vector component (Table II), it is not the cluster center. It would have been among the predicted residues had we used a less stringent criterion on the magnitude of the vector component, which is reasonable when the cluster is large; when the cluster is small, considering the first few vector components is sufficient to predict the residues crucial for dimer stability. All our predicted mutations are listed in Table I. In terms of the Laplacian matrix, the predicted `hot spots' have a very high value in the corresponding diagonal element, that is, they make a large number of contacts.

Other experimentally tested mutations that we did not predict are listed in Table V. Interestingly, most of these mutations have no effect on dimerization, and these negative results can be rationalized by the present analysis. The residues concerned were not predicted either because they are not part of any interface cluster (in which case their diagonal element in the Laplacian matrix may or may not be large), or because they do not have a high vector component, owing to a very small diagonal element in the Laplacian matrix. Looking at interface clusters in terms of their vector components can therefore give valuable information about the residues involved in dimerization. Of the 55 predicted mutations, eight have been carried out experimentally and all eight have given positive results (Table I). Of the 16 non-predicted mutations that have been carried out, only one has yielded a positive result (Table V). The algorithm has thus given correct results in >90% of the tested cases. These experiments support our argument that identifying interface clusters gives information on the residues involved in dimer formation and stabilization, and that we have a rational method for identifying residues that stabilize the dimer interface. We therefore infer that the interface cluster residues, and especially the cluster centers, are structurally and functionally important for the protein: when they are mutated, the monomers fail to associate, preventing formation of the functionally significant dimer.
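The ">90%" figure follows from the counts given above. The sketch below reads a "positive result" as "the mutation affected dimerization", so the eight tested predicted mutations are true positives, the one positive non-predicted mutation is a false negative, and the remaining 15 non-predicted mutations are true negatives; the function name and the extra summary statistics (sensitivity, precision) are additions for illustration.

```python
def summarise(tp, fp, tn, fn):
    """Standard confusion-matrix summaries for the tested mutations."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of tested cases called correctly
        "sensitivity": tp / (tp + fn),   # fraction of disruptive mutations predicted
        "precision": tp / (tp + fp),     # fraction of tested predictions confirmed
    }


# Counts from the text: 8 predicted mutations tested, all positive;
# 16 non-predicted mutations tested, 1 positive.
stats = summarise(tp=8, fp=0, tn=15, fn=1)
print(stats)
```

This gives an accuracy of 23/24 (about 96%), a sensitivity of 8/9 (about 89%) and a precision of 1.0 on the 24 tested mutations. The set is small, so these values are indicative only.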


Table V. Experimentally tested mutations which have not been predicted by the present method
 
Further support for our prediction method comes from its experimental verification on the dimerization of the α-subunit of RNA polymerase (Kannan et al., 2001a). There, a few residues predicted by a similar analysis to destabilize the dimer were mutated, and the mutations were indeed found to disrupt the dimer interface. All previous mutation experiments on the proteins in the data set were designed either randomly or solely on the basis of the change in a residue's accessible surface area on dimerization and other pairwise residue interactions. The present method instead takes into account the spatial interactions among residues and the clustering of these interactions at interfaces, and it has proved more effective than predictions based on these traditional approaches. In addition, residues that satisfy the criteria of conservation and high ΔASA can be scored by their vector components to obtain a rank-ordered list of `hot spots'.

Analysis of monomer clusters that get expanded on dimerization

Apart from predicting hot spots from the interface clusters, we have also analyzed the monomer clusters that expand on dimerization. Often, only the crystal structure of the monomer is available, while biochemical data show that the protein functions as a dimer or interacts with other monomers for its activity. In such cases, it is useful to predict the possible dimerization sites on the monomer, and we have analyzed the side chain clusters in the monomers to gain insight into these sites.

Identification of exposed and conserved clusters in the monomer. As mentioned earlier, some clusters present in the monomer expand or are strengthened after dimerization. This suggests that a seed already present in the monomer is strengthened upon dimerization. We have analyzed these strengthened clusters in all 20 homodimers for a specific pattern in their size and amino acid composition. Most of the strengthened clusters are those formed at a contact criterion as high as 10–12%, and they contain three to six residues each. The amino acids preferred in such seeding clusters are arginine, phenylalanine, histidine, tyrosine and glutamic acid.
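How a high contact criterion yields small seeding clusters can be sketched as follows. The sketch takes pairwise side-chain interaction percentages as given input (their definition belongs to the Materials and methods and is not reproduced here) and, as a simplification, treats clusters as the connected components of the graph whose edges have an interaction percentage at or above the criterion; the number of such components equals the number of zero eigenvalues of the graph Laplacian, but the paper's own clusters come from its spectral analysis. The function name and interaction values are hypothetical.

```python
from collections import defaultdict


def clusters_at_criterion(interaction, criterion, min_size=3, max_size=6):
    """Clusters at a contact criterion (percent), kept if min_size..max_size residues.

    interaction: dict mapping a residue pair (i, j) to its interaction percentage.
    Clusters are connected components of the thresholded contact graph.
    """
    neighbours = defaultdict(set)
    for (i, j), value in interaction.items():
        if value >= criterion:
            neighbours[i].add(j)
            neighbours[j].add(i)
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first search over the contact graph
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if min_size <= len(component) <= max_size:
            clusters.append(sorted(component))
    return clusters


# Hypothetical interaction percentages between residues 1..9.
interaction = {(1, 2): 14.0, (2, 3): 11.0, (3, 4): 9.0, (5, 6): 13.0,
               (6, 7): 12.5, (7, 8): 10.5, (8, 9): 10.2}
print(clusters_at_criterion(interaction, 12.0))  # [[5, 6, 7]]
print(clusters_at_criterion(interaction, 10.0))  # [[1, 2, 3], [5, 6, 7, 8, 9]]
```

Raising the criterion removes weaker contacts, so only tightly interacting groups survive; the size window of three to six residues mirrors the range reported above for seeding clusters.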

Once the monomer clusters have been determined at a high contact criterion, they are characterized further by the ASA and extent of conservation of their residues and by the number of preferred residues they contain. `Surface clusters' are identified from the ASA of the cluster residues and are then examined for residues that occur frequently at the interface, namely arginine, histidine, phenylalanine, tyrosine and glutamic acid. In most cases, only two to five clusters are exposed at a high contact criterion, and still fewer contain the preferred amino acids. The `conserved clusters' among these are the likely dimerization sites on the monomer. If more than one cluster is exposed, conserved and contains the preferred residues, the clusters can still be ranked by their tendency to dimerize, based on the extent of conservation of their residues and the number of preferred residues present. The dimerization sites on the monomer can thus be localized to these clusters. If no cluster satisfies these criteria, the contact criterion can be reduced to 8% to obtain a new set of clusters, which is then subjected to the same rules. Table VI shows the monomer clusters in the 14 proteins in which at least one cluster expands on dimerization; the possible dimerization clusters identified by the above criteria are underlined. The selection criteria identify all the interface clusters (bold, underlined). The few clusters identified as possible dimerization sites that are not actually involved in dimerization (underlined, not bold) do not rank at the top when the conservation criterion is made more stringent.
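The filtering and ranking of monomer clusters described above can be sketched as follows. The representation of a cluster (a list of residue type, relative ASA and conservation flag), the function name and all three thresholds (`min_rel_asa`, `min_exposed_fraction`, `min_conserved_fraction`) are hypothetical choices for illustration; the text does not specify numerical cut-offs. Clusters are ranked by their conserved fraction and then by their number of preferred residues.

```python
PREFERRED = {"ARG", "HIS", "PHE", "TYR", "GLU"}


def rank_candidate_sites(clusters, min_rel_asa=0.2, min_exposed_fraction=0.5,
                         min_conserved_fraction=0.5):
    """Rank monomer clusters as possible dimerization sites, best first.

    clusters: list of clusters, each a list of
    (residue_type, relative_asa, is_conserved) tuples.
    Returns (cluster_index, conserved_fraction, n_preferred) tuples.
    """
    ranked = []
    for index, cluster in enumerate(clusters):
        if not cluster:
            continue
        n = len(cluster)
        exposed = sum(rel_asa >= min_rel_asa for _, rel_asa, _ in cluster) / n
        conserved = sum(is_cons for _, _, is_cons in cluster) / n
        n_preferred = sum(res.upper() in PREFERRED for res, _, _ in cluster)
        if (exposed >= min_exposed_fraction
                and conserved >= min_conserved_fraction
                and n_preferred > 0):
            ranked.append((index, conserved, n_preferred))
    # More conserved clusters first; ties broken by number of preferred residues.
    ranked.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return ranked


clusters = [
    [("ARG", 0.45, True), ("PHE", 0.30, True), ("LEU", 0.05, False)],
    [("TYR", 0.60, True), ("GLU", 0.50, True), ("HIS", 0.40, True)],
    [("ALA", 0.10, True), ("VAL", 0.02, True), ("ILE", 0.01, True)],  # buried
]
print([t[0] for t in rank_candidate_sites(clusters)])  # [1, 0]
```

An empty result corresponds to the case where no cluster satisfies the criteria; following the text, the clusters would then be recomputed at a lower contact criterion (8%) and passed through the same function.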


Table VI. Identification of dimerization sites on the monomers

 
Of the 20 proteins in the data set, 14 show such seeding clusters, identified using the above criteria (Table VI). Four other proteins (1cdt, 1il8, 1utg and 2gn5) are very small, with approximately 60–80 residues per monomer, and show no seeding clusters until the contact criterion is reduced to 4%. This could be because these monomers lack a well-defined buried core such as that of larger proteins. The ratio of buried to exposed residues in such proteins is therefore much smaller, which makes it difficult to single out the exposed residues and clusters that are important for dimerization. The remaining two proteins, 1msb and 3gap, of 115 and 209 residues respectively, do show exposed, conserved clusters at a high contact criterion, but these clusters do not participate in the homodimer interface even when the contact criterion is reduced to 8%. Possibly, these clusters are involved in interactions with other proteins; there is experimental evidence that one of these proteins (1msb) interacts with various other proteins such as CD14 and serine proteases (Wallis and Dodd, 2000; Chiba et al., 2001).

Thus, searching the monomer for exposed and conserved clusters containing the preferred amino acids, at a high contact criterion, could help identify dimerization sites on the monomer. This approach narrows the search space: instead of analyzing every exposed surface or residue of the monomer, the analysis can be restricted to such exposed and conserved clusters. The method has the potential to evolve into an effective tool for predicting possible dimerization sites on the monomer. Its limitation is that it does not perform well on small proteins with fewer than approximately 100 residues.


    Conclusions
The graph-theoretic algorithm, which considers the global topology of protein structures, has been successfully used to obtain side chain clusters at the dimer interfaces of proteins. Analysis of these side chain clusters indicates that both charged and hydrophobic residues are involved in stabilizing the dimer interface, although the interface is neutral overall because oppositely charged residues are both present. Arginine, histidine, phenylalanine, tyrosine and glutamic acid appear to be the most preferred residues at the dimer interface. Residues important for dimer formation and stabilization have been predicted using the present graph-theoretic algorithm, and the predicted mutations correlate well with experimental results. Hence, we have a robust method for predicting residues that play a significant role in stabilizing dimers. We have also attempted to predict dimerization sites on the monomer, with considerable success on this limited data set, so the algorithm could well evolve into a good method for predicting dimerization sites on the monomer. We emphasize that the major advantage of this method is that interfaces are analyzed in terms of side chain clusters detected by a graph-theoretic algorithm, in which the clustering residues are sequentially non-adjacent but spatially close to each other. Analyzing clusters of such spatially connected residues yields better results than analyzing pairwise residue interactions alone.


    Notes
 
3 To whom correspondence should be addressed. E-mail: sv@mbu.iisc.ernet.in


    Acknowledgments
 
The authors thank the Super Computer Education and Research Centre and the Distributed Informatics Centre of the Indian Institute of Science, Bangalore, India, for computational facilities. K.V.B. thanks the Council of Scientific and Industrial Research, India, for a fellowship.


    References
Artymiuk,P.J., Poirrette,A.R., Grindley,H.M., Rice,D.W. and Willett,P. (1994) J. Mol. Biol., 243, 327–344.

Bahar,I., Atilgan,A.R. and Erman,B. (1997) Fold Des., 2, 173–181.

Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48.

Barnes,H.J., Nordlund-Moller,L., Nord,M., Gustafsson,J., Lund,J. and Gillner,M. (1996) J. Mol. Biol., 256, 392–404.

Bedouelle,H. and Winter,G. (1986) Nature, 320, 371–373.

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Bogan,A.A. and Thorn,K. (1998) J. Mol. Biol., 280, 1–9.

Borchert,T.V., Kishan,K.V., Zeelen,J.P., Schliebs,W., Thanki,N., Abagyan,R., Jaenicke,R. and Wierenga,R.K. (1995) Structure, 3, 669–679.

Chiba,H., Sano,H., Iwaki,D., Murakami,S., Mitsuzawa,H., Takahashi,T., Konishi,M., Takahashi,H. and Kuroki,Y. (2001) Infect. Immun., 69, 1587–1592.

Chothia,C. and Janin,J. (1975) Nature, 256, 705–708.

Connolly,M.L. (1993) J. Mol. Graph., 11, 139–141.

Daar,I.O., Artymiuk,P.J., Phillips,D.C. and Maquat,L.E. (1986) Proc. Natl Acad. Sci. USA, 83, 7903–7907.

Dunkel,R., Vriend,G., Beato,M. and Suske,G. (1995) Protein Eng., 8, 71–79.

Dupureur,C.M., Yu,B.Z., Mamone,J.A., Jain,M.K. and Tsai,M.D. (1992a) Biochemistry, 31, 10576–10583.

Dupureur,C.M., Yu,B.Z., Jain,M.K., Noel,J.P., Deng,T., Li,Y., Byeon,I.J. and Tsai,M.D. (1992b) Biochemistry, 31, 6402–6413.

Glaser,F., Steinberg,D.M., Vakser,I.A. and Ben-Tal,N. (2001) Proteins Struct. Funct. Genet., 43, 89–102.

Hall,K.M. (1970) Manage. Sci., 17, 219–229.

Hammond,M.E., Shyamala,V., Siani,M.A., Gallegos,C.A., Feucht,P.H., Abbott,J., Lapointe,G.R., Moghadam,M., Khoja,H., Zakel,J. and Tekamp-Olson,P. (1996) J. Biol. Chem., 271, 8228–8235.

Heringa,J. and Argos,P. (1991) J. Mol. Biol., 220, 151–171.

Horcher,M., Rot,A., Aschauer,H. and Besemer,J. (1998) Cytokine, 10, 1–12.

Hu,Z., Ma,B., Wolfson,H. and Nussinov,R. (2000) Proteins Struct. Funct. Genet., 39, 331–342.

Humphrey,W., Dalke,A. and Schulten,K. (1996) J. Mol. Graph., 14, 33–38.

Hurley,J.H., Chen,R. and Dean,A.M. (1996) Biochemistry, 35, 5670–5678.

Iobst,S.T., Wormald,M.R., Weis,W.I., Dwek,R.A. and Drickamer,K. (1994) J. Biol. Chem., 269, 15505–15511.

Janin,J., Miller,S. and Chothia,C. (1988) J. Mol. Biol., 204, 155–164.

Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.

Jones,S. and Thornton,J.M. (1997) J. Mol. Biol., 272, 121–132.

Kannan,N. and Vishveshwara,S. (1999) J. Mol. Biol., 292, 441–464.

Kannan,N. and Vishveshwara,S. (2000) Protein Eng., 13, 753–761.

Kannan,N., Preethi,C., Pallavi,G., Vishveshwara,S. and Dipankar,C. (2001a) Protein Sci., 10, 46–54.

Kannan,N., Selvaraj,S., Gromiha,M.M. and Vishveshwara,S. (2001b) Proteins Struct. Funct. Genet., 43, 103–112.

Koch,I., Kaden,F. and Selbig,J. (1992) Proteins, 12, 314–323.

Larsen,T.A., Olson,A.J. and Goodsell,D.S. (1998) Structure, 6, 421–427.

Lee,B.I., Yoon,E.T. and Cho,W. (1996) Biochemistry, 35, 4231–4240.

Lengauer,T. and Rarey,M. (1996) Curr. Opin. Struct. Biol., 6, 402–406.

Liu,X., Zhu,H., Huang,B., Rogers,J., Yu,B.Z., Kumar,A., Jain,M.K., Sundaralingam,M. and Tsai,M.D. (1995) Biochemistry, 34, 7322–7334.

Lo,C.C., Hsu,J.H., Sheu,Y.C., Chiang,C.M., Wu,W., Fann,W. and Tsao,P.H. (1998) Biophys. J., 75, 2382–2388.

Mainfroid,V., Mande,S.C., Hol,W.G., Martial,J.A. and Goraj,K. (1996) Biochemistry, 35, 4110–4117.

Maliwal,B.P., Yu,B.Z., Szmacinski,H., Squier,T., Binsbergen,J., Slotboom,A.J. and Jain,M.K. (1994) Biochemistry, 33, 4509–4516.

Miller,S., Janin,J., Lesk,A.M. and Chothia,C. (1987) Nature, 328, 834–836.

Mitchell,E.M., Artymiuk,P.J., Rice,D.W. and Willett,P. (1990) J. Mol. Biol., 212, 151–166.

Palma,P.N., Krippahl,L., Wampler,J.E. and Moura,J.J. (2000) Proteins Struct. Funct. Genet., 39, 372–384.

Patra,S.M. and Vishveshwara,S. (2000) Biophys. Chem., 84, 13–25.

Pazos,F., Helmer-Citterich,M., Ausiello,G. and Valencia,A. (1997) J. Mol. Biol., 271, 511–523.

Sternberg,M.J., Gabb,H.A. and Jackson,R.M. (1998) Curr. Opin. Struct. Biol., 8, 250–256.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.

Tzeng,Y.L., Zhou,X.Z. and Hoch,J.A. (1998) J. Biol. Chem., 273, 23849–23855.

Valdar,W.S. and Thornton,J.M. (2001) Proteins Struct. Funct. Genet., 42, 108–124.

Wallis,R. and Dodd,R.B. (2000) J. Biol. Chem., 275, 30962–30969.

Williams,J.C., Zeelen,J.P., Neubauer,G., Vriend,G., Backmann,J., Michels,P.A., Lambeir,A.M. and Wierenga,R.K. (1999) Protein Eng., 12, 243–250.

Xu,D., Tsai,C.J. and Nussinov,R. (1997) Protein Eng., 10, 999–1012.

Received July 27, 2001; revised December 21, 2001; accepted January 28, 2002.




