Favorable scaffolds: proteins with different sequence, structure and function may associate in similar ways

Ozlem Keskin¹ and Ruth Nussinov²,³

¹Koc University, Center of Computational Biology and Bioinformatics, and College of Engineering, Rumelifeneri Yolu, 34450 Sariyer Istanbul, Turkey, ²Basic Research Program, SAIC-Frederick, Inc., Laboratory of Experimental and Computational Biology, NCI-Frederick, Frederick, MD 21702, USA and ³Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel

Correspondence should be addressed to R.Nussinov at SAIC-Frederick, Inc. or to O.Keskin. E-mail: ruthn@ncifcrf.gov or okeskin@ku.edu.tr


Abstract

Proteins with similar structures may have different functions. Here, using a non-redundant two-chain protein–protein interface dataset containing 103 clusters, we show that this paradigm extends to interfaces. Whereas usually similar interfaces are obtained from globally similar chains, this is not always the case. Remarkably, in some interface clusters, although the interfaces are similar, the overall structures and functions of the chains are different. Hence, our work suggests that different folds may combinatorially assemble to yield similar local interface motifs. The preference of different folds to associate in similar ways illustrates that the paradigm is universal, whether for single chains in folding or for protein–protein association in binding. We analyze and compare the two types of clusters. Type I, with similar interfaces, similar global structures and similar functions, is better packed, less planar, has larger total and non-polar buried surface areas, better complementarity and more backbone–backbone hydrogen bonds than Type II (similar interfaces, different global structures and different functions). The dataset clusters may provide rich data for protein–protein recognition, cellular networks and drug design. In particular, they should be useful in addressing the difficult question of what the favorable ways for proteins to interact are.

Keywords: interface motifs/protein architecture/protein–protein binding/protein–protein interaction/protein–protein interfaces


Introduction

Nature frequently uses convergent solutions to address biological problems. Globular proteins almost always fold into a unique three-dimensional structure described by a set of secondary structures (Finkelstein and Ptitsyn, 1987; Finkelstein et al., 1991). Although particular folds are used for particular functions, one fold may have more than one function. There have been speculations as to whether proteins with similar folds seen today result from convergent evolution to a stable fold or from divergent evolution from a common ancestor. Since biological functions are derived from the formation of protein complexes, protein–protein interactions are crucial for the structural and functional organization of the cell. Understanding these interactions is expected to shed light on molecular mechanisms of biological processes in vivo, such as cellular regulation, biosynthetic and degradation pathways, signal transduction, initiation of DNA replication, transcription and translation, multimolecular associations, packaging, the immune response and oligomer formation (Kleanthous, 2000). They relate to allosteric mechanisms, to turning genes on and off and to drug design. Yet, despite the broad recognition of the importance of deciphering the complex nature of protein interactions (Jones and Thornton, 1996; Bogan and Thorn, 1998; Keskin et al., 1998; Tsai et al., 1998; LoConte et al., 1999; Ma et al., 2001; Valdar and Thornton, 2001; Chakrabarti and Janin, 2002; Boniecki et al., 2003; Zhang et al., 2003), they are still not entirely understood. Studies of protein–protein interactions aim to provide deeper insights into the nature and mechanism of protein recognition. The identification of protein-binding sites is a challenging aspect of functional analysis.

Protein–protein interfaces have been characterized in terms of their structural and physical properties (size, shape, complementarity and packing) and their chemical nature (amino acid composition, chemical group distributions, hydrophobicity/hydrophilicity, electrostatic interactions, hydrogen bonding and interactions with water) (Katchalski-Katzir et al., 1992; Jones and Thornton, 1996; Wallis et al., 1998; Todd et al., 2002; Arkin et al., 2003; Nooren and Thornton, 2003). Most of the physical and biochemical data are derived from either enzymes (mostly proteases interacting with protein inhibitors) or antibodies interacting with their cognate antigens. Evolutionary perspectives of interfaces have also been studied. Subunit interfaces in proteins are generally hydrophobic, and aromatic residues are frequently conserved. Conserved functional residues, both across the interface between the two chains and on one side of the interface, together with experimental hot spot residues, were observed to be organized in cooperative ‘hot regions’ (Keskin et al., 2005).

Recently, we have extracted all interfaces between two protein chains obtained from higher complexes of proteins (Keskin et al., 2004). Interfaces sharing similar architectures were clustered. At the end of the iterative procedure, we obtained 3799 interface clusters. These structurally similar interface clusters were further filtered to eliminate redundancy. A remaining cluster should contain at least five members to be a ‘valid’ cluster. These criteria have decreased the number of clusters from 3799 to 103. The final set of clusters contains member proteins as diverse as enzymes, antibodies, viral capsids, etc. We divide the 103 clusters into three categories as summarized in Table I. Type I: two-chain interface clusters with unique functions. Members of these clusters have similar chains and similar interfaces. Thus, the entire complexes are well aligned. Within a cluster, the single chains from which the interfaces were derived have similar functions. Type II: two-chain interfaces with multi-functions. The interfaces of members of these clusters have both of their chains aligned. However, the functions of the proteins whose interfaces belong to the cluster differ. Type III: interfaces with multi-functions. However, unlike Type II, members of these clusters have only one side of their interface aligned. Within a cluster, all proteins whose interfaces belong to the cluster have dissimilar functions. Schematic representations of Type I, II and III interfaces are given in Figure 1. Table II provides the listing of the Type II interfaces. Clusters containing similar interfaces with dissimilar global protein folds are good candidates for detailed structural/functional studies. Since the overall structures of the proteins are different, these proteins mostly have different functions.


Table I. Definition of interface types

 


Fig. 1. Schematic representation of three interface types: Type I, II and III. Type I includes clusters with similar interfaces and similar functions. Type II consists of clusters with similar interfaces but dissimilar functions. Type III clusters include interfaces which share only one similar side in their aligned interfaces.

 

Table II. Clusters: similar interfaces with dissimilar folds

 
The fact that different proteins bind in similar ways to yield similar interface architectures suggests that these interface motifs are favorable structural scaffolds. They lend stability to the protein–protein interactions (Cunningham and Wells, 1991; Wells and deVos, 1996; DeLano et al., 2000) and allow functional flexibility. This situation is reminiscent of protein structures (Finkelstein and Ptitsyn, 1987; Finkelstein et al., 1991). The recurrence of folds in single chains has led to the proposition of the paradigm of the limited number of folding motifs, regardless of the diversity of protein functions (Chothia, 1992; Ma et al., 2002). It has further been shown that similar motifs have similar patterns of flexibility (Keskin et al., 2000). Various estimates of the number of folds in nature have appeared in the literature. Clearly, evolution repeatedly utilizes favorable, stable folds, adapting them to a broad range of regulatory, enzymatic and packaging/structural roles. Here, we focus on the Type II interfaces. We show that different protein folds combinatorially assemble to yield similar local interface motifs. The preference of different folds to associate in similar ways illustrates that this paradigm (Chothia, 1992) is universal, whether for single chains in folding or for protein–protein association in binding. Below, we enumerate examples and analyze Type II interfaces that are found in the same structural cluster yet have different global protein structures and different functions.


Results and discussion

The interface

A protein–protein interface consists of two polypeptide chains forming the two sides of the interface. Residues on both sides interact with each other. Several criteria may define an interface. Here, two residues are defined to be interacting if the distance between any two atoms of the two residues from the different chains is less than the sum of their corresponding van der Waals radii plus 0.5 Å (Tsai et al., 1996). A residue is defined to be a ‘nearby’ residue if the distance between its Cα atom and a Cα atom of any interacting residue is <6 Å. Nearby residues are important in clustering, since they provide the interface scaffold.
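The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the van der Waals radii are generic textbook values rather than the ones used in the study, the data layout (dictionaries of residues and atoms) is hypothetical, and ‘nearby’ residues are assumed to be searched on the same chain and to exclude the interacting residues themselves.

```python
import math

# Illustrative van der Waals radii in Å (assumed values, not taken from the paper).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def atoms_in_contact(atom_a, atom_b, tolerance=0.5):
    """True if two atoms are closer than the sum of their vdW radii plus the tolerance (Å)."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    cutoff = VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance
    return math.dist(xyz_a, xyz_b) < cutoff


def interacting_residues(chain_a, chain_b):
    """Residue ids on each chain with at least one atom pair in contact across the chains.

    Each chain is a dict: residue id -> list of (element, (x, y, z)) atoms.
    """
    side_a, side_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(atoms_in_contact(a, b) for a in atoms_a for b in atoms_b):
                side_a.add(res_a)
                side_b.add(res_b)
    return side_a, side_b


def nearby_residues(ca_coords, interacting, cutoff=6.0):
    """Non-interacting residues whose C-alpha lies within `cutoff` Å of an interacting residue's C-alpha.

    ca_coords is a dict: residue id -> C-alpha (x, y, z) for one chain.
    """
    nearby = set()
    for res, xyz in ca_coords.items():
        if res in interacting:
            continue
        if any(math.dist(xyz, ca_coords[i]) < cutoff for i in interacting):
            nearby.add(res)
    return nearby
```

The all-against-all loop is quadratic in the number of atoms; a spatial grid or k-d tree would be the natural replacement for full PDB entries.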

Dataset

We applied these definitions to all multi-chain PDB entries (Berman et al., 2000) in the database. On 18 July 2002, there were 18 687 entries in the PDB, which included 35 112 single chains. PDB entries that contain more than two chains were used to obtain all two-chain combinations. Thus, interfaces between any two chains were extracted (Keskin et al., 2004). These included all two-chain interfaces from dimers, trimers and higher complexes. As a result, 21 686 two-chain interfaces were obtained. The interfaces were renamed as follows: if the PDB code of a protein is 1fq3 and it has two chains A and B, the interface is named 1fq3AB, indicating that there is an interface between chains A and B of protein 1fq3. All the interfaces were structurally compared by the Geometric Hashing sequence order-independent structural algorithm (Nussinov and Wolfson, 1991; Tsai et al., 1996). We used a heuristic iterative clustering procedure that assigned an interface to a cluster if its similarity to the cluster representative met predefined thresholds; otherwise, it was assigned as a new cluster representative. Six clustering cycles were performed, gradually relaxing the thresholds. Overall, 3799 clusters were obtained. Sequences within each cluster were compared using CLUSTALW (Higgins et al., 1994) and the BLOSUM90 substitution matrix (Henikoff and Henikoff, 1992). If two entries in the same cluster shared more than 50% sequence similarity, one of them was eliminated. Clusters with fewer than five members were removed, leading to 103 clusters. Crystal interfaces are included in the dataset. Since crystal interfaces may be structurally similar to biological interfaces, it is often the case that a given cluster contains both biological and crystal interfaces. Three of the case studies described below are examples of such clusters. Although we have not segregated the dataset into transient and permanent complexes, Type II (and Type III) interfaces may be expected to fall largely into the transient category.
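The naming convention and the representative-based clustering described above can be sketched as follows. This is a simplified, single-pass illustration: the study ran six cycles with gradually relaxed thresholds and compared interfaces with Geometric Hashing, whereas here the comparison is a caller-supplied placeholder (`is_similar`) and all function names are hypothetical. The naming helper only enumerates candidate chain pairs; whether a pair actually forms an interface still has to be decided by the contact criterion.

```python
from itertools import combinations


def two_chain_names(pdb_code, chain_ids):
    """Name every unordered chain pair, e.g. '1fq3' with chains A and B -> '1fq3AB'."""
    return [f"{pdb_code}{a}{b}" for a, b in combinations(sorted(chain_ids), 2)]


def greedy_cluster(items, is_similar):
    """Assign each item to the first cluster whose representative it matches.

    An item that matches no existing representative becomes the
    representative of a new cluster. Returns a list of
    (representative, members) tuples.
    """
    clusters = []
    for item in items:
        for representative, members in clusters:
            if is_similar(item, representative):
                members.append(item)
                break
        else:
            clusters.append((item, [item]))
    return clusters


def keep_valid(clusters, min_members=5):
    """Drop clusters with fewer than `min_members` members."""
    return [c for c in clusters if len(c[1]) >= min_members]
```

Because assignment depends on the order in which interfaces are visited, a procedure of this kind is heuristic rather than exhaustive, which is consistent with the way the text describes it.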

Structural alignment algorithm: MultiProt

MultiProt is fully automated software that aligns multiple protein structures simultaneously. It finds the common geometric cores among the input structures. It does not require that all members participate in the alignments and detects high scoring partial multiple alignments of the input structures. The algorithm is independent of residue-sequence order and directionality, which makes it applicable to protein–protein interfaces (Shatsky et al., 2004). Using MultiProt, we structurally aligned cluster members. The parameters used in the alignments are as follows: maximal r.m.s.d. for matching = 3.5 Å; minimal size of rigidly matched fragments = 3; maximal shift in indices of two matched fragments = 20; overlap ratio = 0.8; OnlyRefMol = 0; and FullSet = 1. MultiProt chooses one of the structures as the representative of the multiple alignment. This representative is the one most similar to all members, hence not necessarily the same one as in the previous, hierarchical clustering. We analyzed the residue identities at specific spatial positions. If a residue type is found at a specific position in more than half of the interfaces, it is labeled as a ‘computational hot spot’ or a structurally conserved residue.
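The ‘more than half’ rule for computational hot spots can be illustrated with a short Python sketch. It assumes the multiple structural alignment has already been reduced to columns of one-letter residue codes, one entry per interface, with `None` where an interface has no residue at that spatial position; this representation and the function name are assumptions made for the example, not MultiProt output.

```python
from collections import Counter


def computational_hot_spots(aligned_columns, n_interfaces):
    """Return {position: residue type} for positions with a majority residue.

    A position qualifies when one residue type occurs in more than half
    of the `n_interfaces` aligned interfaces.
    """
    hot_spots = {}
    for position, column in enumerate(aligned_columns):
        counts = Counter(res for res in column if res is not None)
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count > n_interfaces / 2:
            hot_spots[position] = residue
    return hot_spots
```

For example, with four interfaces a column `["L", "L", "L", "A"]` yields a leucine hot spot, whereas `["G", "G", None, None]` does not, because two out of four is not more than half.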

Interface family types: similar interfaces, similar global folds (Type I); similar interfaces, dissimilar folds (Type II)

In most cases, if the interfaces are similar, the overall protein folds are also similar. Such similar interface, similar fold clusters (Type I) contain a single family. However, some clusters (listed in Table II) belong to a different, particularly interesting category. In these cases the interfaces are structurally similar; however, the global protein folds are different (Type II). These similar interfaces, dissimilar protein folds fall into different families [see the SCOP classification (Murzin et al., 1995), also provided in Table II, first column]. However, since they have similar interfaces they are nevertheless members of the same interface clusters. The parent proteins of these interfaces belong to families that have different functions. Hence interface similarity does not ensure global structural similarity. Furthermore, it has been shown previously that globally similar structures may have different functions in proteins, although there is usually a correspondence between fold and function (Orengo et al., 1999; Moult and Melamud, 2000; Thornton et al., 2000; Nagano et al., 2002). Cases such as those listed here illustrate that this paradigm can be taken further: similar interfaces do not imply similar functions of the parent proteins from which the interfaces were derived.

These similar interfaces, different functions clusters may aid in illuminating aspects of protein binding and function. Below, we discuss some cases in detail. Note that there are three clusters for Type II in Table II where the representative of the cluster does not appear in the list of family members. These cases are cellulose-binding domain family III, MHC antigen-recognition domain and nucleotide and nucleoside kinases. In these cases, although the representative aligned well with each cluster member, it did not align well with all members simultaneously, suggesting some slight deviations.

Figure 2 illustrates some examples from Type II interfaces in Table II. Each left panel presents the ribbon diagrams of two proteins which belong to two different SCOP families (Murzin et al., 1995) in the same interface cluster, clearly showing that the global structures are different. The interfaces are enlarged in the right panel. The ribbon diagrams display the functionally or structurally important residues of the individual proteins in blue. These residues were extracted either from a literature search or from sequence alignments of the protein within its functional family. Regardless of the functional families, if we carry out an alignment within each cluster, we observe that some residues are conserved (with a ratio of at least 50% using the BLOSUM90 substitution matrix). The residue types of the conserved residues (hot spots) (Hu et al., 2000; Ma et al., 2001) are in red and the residue numbers are given in the left panel. These red residues might be important for the stability of the interfaces but not necessarily for the specific function, as the individual members may have different functions.




Fig. 2. Some examples of similar interfaces, dissimilar monomer structures and dissimilar functions (called Type II in this work). These six examples belong to multi-functional clusters. The left panel displays the ribbon diagrams of two proteins with different overall folds and different functions. The common interfaces among the proteins in the same cluster are colored yellow. The remainders of the chains of the two proteins are in pink and green. The blue residues are the structurally or functionally important residues, either conserved residues from sequence alignments or residues cited in the literature. The red residues are the conserved residues among the member interfaces in the respective cluster. The residue names are for the conserved residues. The right panel gives the detailed architectures of the backbones of the interfaces (yellow regions in the left panel). The gray segments represent the hydrophobic residues; cyan and blue are the positively and negatively charged residues, respectively. The red segments are the polar ones. Possible ionic interactions are shown with yellow dashed lines.

 
The right part of the panel displays the aligned consensus motif for the interfaces. In these figures, polar residues (Y, T, S, N and Q) are shown in red, positively charged residues (K, H and R) in cyan and negatively charged residues (D and E) in blue. The distances between oppositely charged groups are indicated with dashed yellow lines (if the distance between Cα atoms of the two residues is <8 Å). There are a large number of hydrophobic residues (in gray) between charged pairs, as expected in interfaces. There are a number of conserved, usually charged/polar, residues in each cluster. Protein members with different functions have active sites at different locations. Below, we discuss several examples in some detail.
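The residue classes and the 8 Å criterion for drawing a dashed line can be written down as a small Python sketch. The residue groupings follow the text; treating every remaining residue type as ‘hydrophobic’ (gray) is an assumption made here for simplicity, and the input layout (label, one-letter code, Cα coordinates) is hypothetical.

```python
import math

POLAR = set("YTSNQ")     # shown in red
POSITIVE = set("KHR")    # shown in cyan
NEGATIVE = set("DE")     # shown in blue


def residue_class(aa):
    """Map a one-letter residue code to the class used for coloring."""
    if aa in POLAR:
        return "polar"
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    # Assumption: all other residue types are drawn as hydrophobic (gray).
    return "hydrophobic"


def opposite_charge_pairs(residues, cutoff=8.0):
    """Pairs of oppositely charged residues whose C-alpha atoms are closer than `cutoff` Å.

    `residues` is a list of (label, one-letter code, (x, y, z)) tuples.
    """
    pairs = []
    for i, (label_a, aa_a, xyz_a) in enumerate(residues):
        for label_b, aa_b, xyz_b in residues[i + 1:]:
            opposite = (
                (aa_a in POSITIVE and aa_b in NEGATIVE)
                or (aa_a in NEGATIVE and aa_b in POSITIVE)
            )
            if opposite and math.dist(xyz_a, xyz_b) < cutoff:
                pairs.append((label_a, label_b))
    return pairs
```

A Cα–Cα cutoff is a coarse proxy for an ionic interaction, since side-chain charged groups can sit closer together or farther apart than the backbone atoms suggest; this is presumably why the figure legend speaks of possible ionic interactions.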

The 1afrBD cluster

This cluster includes members of chromo domain-like (chromatin) proteins, aldolases and tryptophan synthase β-subunit-like PLP-dependent enzymes. The overall structures of the members are displayed in Figure 3A. Here, we provide a comparison of the interfaces of 1dz1AB and 1f05AB.



Fig. 3. (A) The overall ribbon diagrams of the four members in the cluster represented by the 1afrBD interface. These are the structures of heterochromatin proteins, chromo shadow domain (1dz1, 1e0b), transaldolase (1f05) and 1-aminocyclopropane-1-carboxylate deaminase (1f2d). (B) The interfaces belonging to the two functionally different proteins. The common interface is in yellow.

 
The first member of this cluster belongs to the heterochromatin protein 1 (HP1) family (1dz1), which is involved in gene silencing via the formation of heterochromatic structures (Brasher et al., 2000). HP1 proteins are composed of two related domains: an N-terminal chromo domain and a C-terminal chromo shadow domain. Although the HP1 proteins are located in the chromatin, they do not appear to bind to DNA directly (Singh et al., 1991). Rather, they have been found to interact with a number of different proteins, suggesting that they may function as protein interaction motifs, bringing together different proteins. The crystal structure has a dimeric state. Most residues involved in the dimer interface are conserved between the shadow domains of HP1 proteins: A125, L132, N153, P157, I161, Y164, L168 and W170 (Brasher et al., 2000). Another member of this cluster is human transaldolase (1f05). It is one of the enzymes in the non-oxidative branch of the pentose phosphate pathway and catalyzes the reversible transfer of a dihydroxyacetone moiety from fructose 6-phosphate to erythrose 4-phosphate (Thorell et al., 2000). Sequence alignments show that D27, T43, N45, P46, G101, S104, E106, A122, K142, N165, T167, L168, F170, S187, F189, R192, T231, G311, F315 and L322 are among the invariant residues in the transaldolase family (Wu et al., 2002). These residues are either in the active site or structurally important. The invariant hydrophobic cluster involves residues L168, F170, F189, G311 and F315 (Thorell et al., 2000) and contributes to the stability of the protein. This region is at the dimer interface, formed mostly by two helices and a loop which connects them. The active site includes residues D27, N45, E106, K142, T167, S187 and R192 (Thorell et al., 2000).

There are 28 residues in these aligned interfaces (six on one side, 22 on the other). Two helices are on one side, a part of a long helix on the other. The consensus interface is displayed in yellow (Figure 3B) and the conserved residues in blue (Figure 2B). In 1dz1 all conserved residues coincide with ‘nearby’ residues whereas in the 1f05 interface, the conserved residues are far from the interface. This may simply reflect the sizes of the proteins, since 1dz1 is a much smaller protein than 1f05. The red residues in Figure 2B are the conserved residues (hot spots) in this cluster. There are a large number of hot spots in this cluster, mostly charged and polar residues. The inset in Figure 3B displays the aligned interface. These two proteins are believed to interact with a number of different proteins, so their common interface may be used as a target that binds nonspecifically to many proteins. These two proteins represent a similar interface between an enzyme and a non-enzymatic protein. Still, the 1f05AB interface is a crystal interface, not a biological interface.

The 1aohAB cluster

Figure 4A displays four members of the cluster with functionally and structurally different protein families. These are either structural proteins (1aohAB and 1g1kAB) or fluorescent proteins (1b9cAB and 1g7kAB). 1aoh and 1g1k are in dimeric form and 1b9c and 1g7k are tetramers in the PDB entries.



Fig. 4. (A) Ribbon diagrams of four members in the cluster represented by the 1aohAB interface. These are the structures of a single cohesin domain from the scaffolding protein CipA of the Clostridium thermocellum cellulosome (1aoh), green fluorescent protein mutant F99S, M153T and V163A (1b9c), cohesin module from the cellulosome of Clostridium cellulolyticum (1g1k) and the crystal structure of DsRed, a red fluorescent protein from Discosoma sp. red (1g7k). The letters indicate the monomers in these complexes. (B) Ribbon diagrams of two interfaces derived from two functionally different proteins. The yellow region points to the common interface.

 
The cohesin domain (1aoh) acts as a scaffolding protein in the cellulosome. The quaternary structure of the cellulosome is an aggregate containing a multi-enzymatic extracellular complex with a catalytic domain and a conserved dockerin domain which serves to anchor the enzyme to the non-catalytic scaffolding protein (Tavares et al., 1997). The scaffolding protein contains a cellulose binding domain and nine consecutive cohesin domains which are separated from each other by Pro-Thr rich linker segments. Each cohesin domain specifically binds to the dockerin domains. It promotes binding of cellulose to the catalytic domains of the cellulolytic enzymes (Tavares et al., 1997). There are a number of conserved residues: G18, P24, D45, G57, P65, F69, I80, L83, G91, I95, D98, G99, F101, I104, A128, E134, G142 and V144 (Wu et al., 2002). These should be conserved for either structural or functional reasons. M79, V81 and L83 in 1aoh form a hydrophobic patch on the interface, which promotes extended intermolecular contacts in cellulosome integrating proteins (Tavares et al., 1997). The energy-transfer acceptor (1b9c) transduces the blue chemiluminescence of the protein aequorin into green fluorescent light by energy transfer (Battistutta et al., 2000). It fluoresces in vivo upon receiving energy from the Ca²⁺-activated photoprotein aequorin. It contains a covalently attached chromophore, which is composed of modified amino acids. The chromophore is formed upon cyclization of the residues Ser-dehydroTyr-Gly. In 1b9c, S205 interacts with the carboxylate of E222; the chromophore is connected to T203 and H148 (Battistutta et al., 2000). These residues are conserved in green fluorescent proteins. The proteins in this family are highly conserved. The mutations F99S, M153T and V163A apparently increase the fluorescence of the protein (Battistutta et al., 2000). The dimer interfaces in the tetramer show two distinct characteristics. One interface contains a hydrophobic patch of well-packed residues surrounded by polar residues; the other interface is mostly polar. The second type of interface is believed to be crucial for the fluorescent proteins to oligomerize and produce a diversity of colors and fluorescence emission. In our alignment, the first interface (mostly hydrophobic) is aligned with the hydrophobic interface of the scaffold proteins. This might suggest that evolution has designed particular interfaces to perform specific functions, whereas hydrophobic interfaces are used as a common scaffold for packing and dimerization.

The aligned interfaces are colored yellow (Figure 4B) and the conserved or functionally important residues are blue (Figure 2C). There are 48 residues in the two aligned sides of the interface (25 residues on one side and 23 on the other). The interface is made up of β-sheets on both sides (each made up of three β-strands). Some of the critical residues are in the interface. The inset in Figure 4B displays the details of the aligned interfaces. In these examples, both interfaces are crystal interfaces as opposed to the other members in the same cluster.

The 1e7kAB cluster

This interface cluster includes four members of protein complexes with varied functions: snake venom toxins, cysteine proteases and P-loop containing nucleotide triphosphate hydrolases. The ribbon diagrams are shown in Figure 5A. Below we detail the functions of 1kba and 1ef7.



Fig. 5. (A) Four members in the cluster represented by the 1e7kAB interface. These are the structures of cardiotoxin VII4 from Naja mossambica mossambica (1cdt), crystal structure of human cathepsin X (1ef7), yeast initiation factor 4A complexed with mRNA helicase (1fuu) and crystal structure of κ-bungarotoxin (1kba). (B) Ribbon diagrams of the two interfaces of the functionally different proteins. Yellow highlights the common interface.

 
Cathepsin X (1ef7) belongs to the group of lysosomal cysteine proteases. It is involved in intracellular protein catabolism and also in a number of important cellular processes such as antigen presentation, bone remodeling and proteolytic processing of various proteins (Barrett et al., 1998). These proteases are expressed in a variety of cells. The crystal structure is a homodimer. The active site cleft contains C31 and H180 (Guncar et al., 2000). Other conserved residues include W32, C65, C71, C112, W187, Y195, W196, N200 and W202 (Brasher et al., 2000). The alignment has been performed over the most diverse members of the Papain family cysteine proteases, in the Conserved Domain Database, NCBI (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi) (Marchler-Bauer). There are 11 cysteine residues which are involved in disulfide bonding: C28–C71, C65–C103, C93–C109, C112–C118 and C153–C235 (Guncar et al., 2000). The neurotoxin κ-bungarotoxin (1kba) is a component of the venom of snakes of the hydrophid and elapid families. It selectively blocks nicotinic transmission by interacting with neuronal nicotinic acetylcholine receptors in vertebrate skeletal muscle and the electric organ of electric fish (Dewan et al., 1994). The toxin is active in a dimeric state. The two chains are identical. There are five highly conserved disulfide bridges in each chain: C3–C21, C14–C42, C27–C31, C46–C58 and C59–C64. All of these residues are observed to be conserved, in addition to P7, T13, K24, P47 and N65. The dimer interface is stabilized by direct hydrophobic interactions involving L57 from both chains. There are nine hydrogen bonds in the interface stabilizing the complex (Dewan et al., 1994).

Figure 5B displays the two structures with the common interface motif colored yellow. Forty-seven residues are structurally aligned in these two interfaces. These form two β-sheets (with two β-strands) on each side of the interface. There are 27 residues on one sheet and 20 on the other. The conserved residues are displayed in blue (Figure 2D, left panel). 1ef7AB is a crystal interface whereas 1kbaAB is a biological interface.

The 1qbzBC cluster

This cluster includes six members from the virus ectodomain and tropomyosin families. The overall structures are displayed in Figure 6A.



Fig. 6. (A) Four members in the cluster represented by the 1qbzBC interface. These are the structures of contractile protein, tropomyosin molecule (1ic2, 1if3), SIV gp41 ectodomain (1qbz) and the Ebola virus membrane-fusion subunit, GP2, from the envelope glycoprotein ectodomain (1ebo). (B) Ribbons depict the two interfaces of the functionally different proteins. Yellow highlights the common interface.

 
Ebola virus membrane fusion glycoprotein (1ebo) is active as trimers inside the cell. The trimer is next cleaved into two chains, GP1 and GP2. 1ebo is the structure of GP2, which has a trimeric, highly α-helical, rod-shaped conformation. There is a disulfide-bonded loop (C100–C107) which is conserved, as well as a conserved motif that includes residues 82–109 (Weissenhorn et al., 1998). A chloride ion is trapped between residues N85 and S82. T64 has an interesting conformation, pointing toward the 3-fold symmetry axis rather than adopting the classical knobs-into-holes packing. A sequence alignment of this family shows that N44, L50, E59, N67, R77 and G86 are conserved (Weissenhorn et al., 1998). The structure of a contractile protein (1ic2) has a parallel two-stranded α-helical coiled-coil structure. 1ic2 is an 81-residue N-terminal fragment of muscle α-tropomyosin. Tropomyosin molecules polymerize in a head-to-tail fashion to form a long cable which winds around the actin helix. Its contractile activity is controlled by the Ca²⁺-sensitive complex of troponin and tropomyosin structure (Brown et al., 2001). In this protein there are alanine clusters contributing to the local symmetry. The members in this cluster have very distinct structures of double stranded helices. Studies have shown that certain apolar residues such as leucines and isoleucines can determine the stability and multimeric state of the structure (Harbury et al., 1993; Brown et al., 2001). Sequence alignment illustrates that M1, D2, I4, K6, M8, K12, N17, A18, A22, E26, E39, D58 and L64 are conserved (Wu et al., 2002).

In this interface cluster, 98 residues are aligned as a result of the multiple structural alignment. The interface consists of two long helices, one with 57 residues and the other with 41. The common interface is colored yellow (Figure 6B). Interestingly, unlike all interfaces in the examples above, the charged/polar residues do not dominate. Possibly, the stability of the helical motif is maintained by hydrophobic forces. Additionally, oppositely charged residues are within 10 Å of each other. In these examples, all interfaces are biological interfaces.

Propensities of residues in Type I and in Type II interface clusters

The relative frequencies of different types of amino acids in the interfaces of protein–protein complexes can be used to derive the residue propensities. The overall propensities of the 20 amino acids are calculated in the interfaces from the dataset containing all interface clusters. We compare the frequency patterns at the binding sites versus those in the overall structures. The propensity (Pi) of a residue (i = Ala, Val, Gly, ...) to occur at the interface is calculated as the fraction of the count of residue i in the interface as compared with its fraction in the whole chain:

$$P_i = \frac{n_i / N_i}{n / N} \qquad (1)$$
where ni is the number of residues of type i at the interface, Ni is the number of residues of type i in the chains, n is the total number of residues in the interface and N is the total number of residues in the whole chains. Figure 7 displays the logarithmic propensities of the 20 residues for Type I and II interfaces as bars. In both cases the interfaces are clearly dominated by hydrophobic residues, followed by the aromatic residues.
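The propensity calculation can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: it takes Equation (1) to be P_i = (n_i/N_i)/(n/N), as the definitions above read, and it assumes natural logarithms for the logarithmic propensities, since the base is not stated. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def interface_propensities(chain_residues, interface_residues):
    """Return {residue type: P_i} with P_i = (n_i / N_i) / (n / N).

    chain_residues     -- residue types of all residues in the chains
    interface_residues -- residue types of the interface residues
                          (a subset of the chain residues)
    """
    chain_counts = Counter(chain_residues)          # N_i
    interface_counts = Counter(interface_residues)  # n_i
    n = len(interface_residues)
    N = len(chain_residues)
    if n == 0 or N == 0:
        raise ValueError("both residue lists must be non-empty")
    overall_fraction = n / N
    return {
        res: (interface_counts[res] / chain_counts[res]) / overall_fraction
        for res in chain_counts
    }


def log_propensities(propensities):
    """Natural-log propensities; residues absent from the interface are skipped."""
    return {res: math.log(p) for res, p in propensities.items() if p > 0}


# Toy example (hypothetical data): a 10-residue chain with 4 interface residues.
chain = list("LLLLAAKKKK")
interface = list("LLLA")
props = interface_propensities(chain, interface)
logs = log_propensities(props)
```

With this convention a positive logarithmic propensity marks a residue type that is enriched at the interface, in line with the caption of Fig. 7.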



Fig. 7. Logarithmic propensities of the contacting residues in the two interface types. A positive value indicates a favorable propensity in the interfaces as compared with the rest of the protein, whereas a negative propensity indicates that it is less likely to find the particular residue in the interfaces compared to the rest of the protein.

 
Accessible surface areas in Type I and in Type II interface clusters

The change in accessible surface area (ASA) upon complex formation is used as a measure of the interface contact area. Figure 8 shows the ASA distribution of the interfaces: Figure 8A displays the ASA for Type I interfaces and Figure 8B that for Type II. The ASAs range between 300 and 6000 Å². Type I has a mean of 1967 ± 1079 Å² and Type II a mean of 1450 ± 1211 Å² (Table III). Type I interfaces therefore have larger surface areas (358 interfaces were used in the calculations); Type II clusters have 94 interface members. Type I interfaces peak around 1500 Å² whereas Type II interfaces peak around 1000 Å². The numbers of residues in the parental chains to which the interfaces belong are also compared. The Type I and II interfaces have a mean of 370 ± 235 and 356 ± 226 residues, respectively (Table III). Type II chains thus have a similar number of residues but smaller interface ASAs, suggesting that, unlike Type I interfaces, these surfaces have not been optimized through evolution, probably because of their different functions.
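As a minimal sketch of the quantity involved, the ASA change on complex formation is commonly taken as the sum of the ASAs of the isolated chains minus the ASA of the complex. The text does not say whether the reported values are this total or a per-chain value, so the convention below is an assumption, and the function name and numbers are hypothetical.

```python
def buried_surface_area(asa_chain_a, asa_chain_b, asa_complex):
    """Total ASA (in Å²) buried when chains A and B form a complex.

    Assumed convention: ASA(A alone) + ASA(B alone) - ASA(complex).
    """
    buried = asa_chain_a + asa_chain_b - asa_complex
    if buried < 0:
        raise ValueError("complex ASA exceeds the sum of the isolated-chain ASAs")
    return buried


# Hypothetical ASA values in Å², for illustration only.
example_buried = buried_surface_area(9000.0, 8000.0, 15000.0)
```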



Fig. 8. Distribution of the accessible surface areas of interfaces for (A) Type I and (B) Type II. The ASA values are in Å².

 

Table III. Characteristics of interfaces

 
Hence we do not observe a linear correlation between the number of interface residues and the ASA of the interfaces (data not shown). The complementarities of the interfaces are important for the binding strengths. Imperfect complementarity suggests a higher probability for the presence of gaps. We calculated the gap-to-volume indices for the interfaces. The gap volume gives a measure of the complementarity of the interacting surfaces. The volume of the gaps between the two interacting subunits was calculated using the web site ‘Protein–Protein Interaction Server’ (PPI Server) (http://www.biochem.ucl.ac.uk/bsm/PP/server/aprogram) (Jones and Thornton, 1995, 1996), which uses the program SURFNET (Laskowski, 1995). The gap-to-volume index is defined as the ratio of the gap volume to the interface ASA. The average gap-to-volume indices are found to be 1.98 and 2.73 for Type I and II interfaces, respectively (Table III). This result suggests that Type I interfaces are more tightly packed, whereas the Type II interfaces are abundant in gaps and are not optimized in packing.
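The index defined above is a simple ratio, sketched here for concreteness. The gap volume itself would come from a program such as SURFNET and is not computed here; the function name and the input values are hypothetical.

```python
def gap_volume_index(gap_volume, interface_asa):
    """Gap volume (Å³) divided by interface ASA (Å²); the result is in Å."""
    if interface_asa <= 0:
        raise ValueError("interface ASA must be positive")
    return gap_volume / interface_asa


# Hypothetical inputs: a tightly packed and a loosely packed interface
# that bury the same surface area.
tight = gap_volume_index(3000.0, 1500.0)
loose = gap_volume_index(4500.0, 1500.0)
```

A lower index means less empty space per unit of buried surface, which is the sense in which the Type I average (1.98) indicates better packing than the Type II average (2.73).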

The planarities of the interfaces are also analyzed. These were obtained from the PPI Server. In the planarity calculations, the best-fit plane through the three-dimensional coordinates of the atoms in the interface was obtained by using principal component analysis. The root mean square deviation (r.m.s.d.) of the atoms from the plane was next calculated and used as the measure of planarity. The larger the r.m.s.d. value, the less planar is the interface and, conversely, the smaller the r.m.s.d. value, the more planar is the interface. If we look at the two types separately (Table III), we observe that Type I interfaces have a planarity of 3.16 and Type II interfaces have a much smaller value of 1.86. Hence Type II interfaces are more planar than Type I interfaces, again suggesting a less favorable packing.
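The planarity measure described above can be illustrated with a short, standard-library-only sketch; it is not the PPI Server's implementation. It relies on the fact that the best-fit plane from principal component analysis passes through the centroid with its normal along the direction of least variance, so the r.m.s.d. from the plane equals the square root of the smallest eigenvalue of the coordinate covariance matrix. The eigenvalue is obtained with the closed-form trigonometric solution for symmetric 3×3 matrices. Function names and test coordinates are hypothetical.

```python
import math


def smallest_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (trigonometric method)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        # Matrix is already diagonal.
        return min(a[0][0], a[1][1], a[2][2])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = (a[0][0] - q) ** 2 + (a[1][1] - q) ** 2 + (a[2][2] - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)


def planarity_rmsd(coords):
    """R.m.s.d. (same units as coords) of points from their best-fit plane."""
    n = len(coords)
    if n < 3:
        raise ValueError("at least three points are needed to define a plane")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        d = [p[k] - centroid[k] for k in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n
    # Variance along the plane normal is the smallest eigenvalue.
    return math.sqrt(max(smallest_eigenvalue_sym3(cov), 0.0))


# Hypothetical coordinates: corners of a flat box (10 x 10 x 2) and
# a set of points lying exactly on the tilted plane z = x.
box = [(x, y, z) for x in (-5.0, 5.0) for y in (-5.0, 5.0) for z in (-1.0, 1.0)]
tilted = [(0, 0, 0), (1, 0, 1), (0, 1, 0), (1, 1, 1), (2, 3, 2)]
box_rmsd = planarity_rmsd(box)
tilted_rmsd = planarity_rmsd(tilted)
```

For the box, every corner lies 1 unit from the mid-plane, so the r.m.s.d. is 1; for the coplanar set it is zero up to rounding error.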

Hydrogen bonding across the interfaces

We analyzed the potential involvement of residues in hydrogen bonding across the interfaces. Two atoms, one from each side of the interface, are said to form a hydrogen bond if the distance between the donor and acceptor atoms (McDonald and Thornton, 1994) is <3.5 Å. Table IV summarizes the hydrogen bonds that originate from the interface residues as well as the type of H-bonding in the two types of interfaces. The first column gives the type of interface and of H-bond: I_SS indicates Type I interfaces and H-bonds between side chain–side chain (SS) groups, I_BB is for hydrogen bonds between backbone–backbone (BB) groups and I_SB is for hydrogen bonds formed between a side chain and a backbone (SB) group. The second column is the total number of hydrogen bonds formed across the interface. The third column is the distribution of H-bonds among the different types of H-bond interaction, namely SS, BB and SB. The hydrogen bonds formed in Type I interfaces are equally distributed between the side chains and the backbones. On the other hand, Type II interfaces mainly utilize the side chains for H-bond formation. Hence Type II interfaces are not optimized by their structural complementarities.
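The distance criterion and the SS/BB/SB classification can be sketched as follows. This is a simplified illustration, not the procedure actually used: only the 3.5 Å donor–acceptor distance stated in the text is applied, any angular criteria are ignored, and the donor/acceptor role of each atom is assumed to be supplied by the caller. The `PolarAtom` class, the set of backbone atom names and the example coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption: backbone polar atoms are identified by their PDB atom names.
BACKBONE_POLAR = {"N", "O", "OXT"}
DONOR_ROLES = {"donor", "both"}
ACCEPTOR_ROLES = {"acceptor", "both"}


@dataclass(frozen=True)
class PolarAtom:
    name: str    # PDB atom name, e.g. "N", "O", "OG"
    role: str    # "donor", "acceptor" or "both"
    xyz: tuple   # (x, y, z) coordinates in Å


def is_donor_acceptor_pair(a, b):
    """True if one atom can donate and the other can accept."""
    return ((a.role in DONOR_ROLES and b.role in ACCEPTOR_ROLES)
            or (a.role in ACCEPTOR_ROLES and b.role in DONOR_ROLES))


def classify_pair(a, b):
    """Return 'BB', 'SS' or 'SB' depending on backbone/side-chain origin."""
    a_backbone = a.name in BACKBONE_POLAR
    b_backbone = b.name in BACKBONE_POLAR
    if a_backbone and b_backbone:
        return "BB"
    if not a_backbone and not b_backbone:
        return "SS"
    return "SB"


def count_interface_hbonds(chain_a, chain_b, cutoff=3.5):
    """Count putative H-bonds across the interface by type (distance only)."""
    counts = {"SS": 0, "BB": 0, "SB": 0}
    for a in chain_a:
        for b in chain_b:
            if is_donor_acceptor_pair(a, b) and math.dist(a.xyz, b.xyz) < cutoff:
                counts[classify_pair(a, b)] += 1
    return counts


# Hypothetical polar atoms on the two sides of an interface.
side_a = [PolarAtom("N", "donor", (0.0, 0.0, 0.0)),
          PolarAtom("OG", "both", (10.0, 0.0, 0.0))]
side_b = [PolarAtom("O", "acceptor", (2.9, 0.0, 0.0)),
          PolarAtom("OD1", "acceptor", (12.8, 0.0, 0.0)),
          PolarAtom("NZ", "donor", (0.0, 3.0, 0.0)),
          PolarAtom("O", "acceptor", (10.0, 3.2, 0.0))]
hbond_counts = count_interface_hbonds(side_a, side_b)
```

In the toy data the N–NZ pair is within 3.5 Å but is not counted, because both atoms are donors.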


Table IV. Hot spot and hydrogen bond distributions in Type I and II interfaces

 
Similarities between protein interfaces and protein cores

The similarity between protein binding and protein folding has been discussed (Janin et al., 1988; Janin and Chothia, 1990; Jones and Thornton, 1996; Tsai and Nussinov, 1997; Tsai et al., 1998). Since the attractive and repulsive forces in folding the polypeptide chain are also responsible for protein–protein associations, it is not surprising that protein cores and protein interfaces share similar secondary structure motifs (Tsai et al., 1997).

Numerous studies of single-chain proteins have shown that evolution has selected a limited repertoire of favorable folds. Here we extend these observations to protein–protein interfaces. We illustrate that there are recurring architectural motifs also at the interfaces, similarly fulfilling a range of functions. These recurring motifs can be used as structural templates in protein–protein recognition (A.S.Aytuna, A.Gursoy and O.Keskin, unpublished results). In monomers, recurring structural motifs have frequently been referred to as either building blocks or biological functional units (Tsai and Nussinov, 1997). For example, the repeated strand–helix–strand motif forms a TIM barrel fold and the helix–loop–helix (EF-hand) motif has been recognized as a calcium binding site. Similarly, the Rossmann fold has been shown to be associated with nucleotide binding. Whether specific interface motifs also relate to specific functions is still unclear. Hopefully, the interface clusters in Table II will prove useful in further addressing this question. To that end, we are currently annotating all domains involved in the protein–protein interface dataset.

Conclusions

Recently, we have assembled a unique dataset of two-chain interfaces derived from the PDB. The interfaces are clustered based on their spatial structural similarities, regardless of the connectivity of their residues on the protein chains. The dataset includes 3799 clusters. From this dataset we have obtained 103 clusters which have at least five non-homologous members. These serve as a rich source of data for analysis of protein interfaces. Here, we have carried out a detailed examination of all members of the 103 clusters. We find that the interface clusters can be divided into three types: whereas most clusters consist of similar interfaces whose parent chains are also similar (Type I), this is not always the case. In some of our clusters the interfaces are similar; however, the overall structures of the parent proteins from which the interfaces derive are different. These are labeled Type II interfaces. Type III consists of interface clusters where only one side of the interface is similar. Proteins belonging to this type have different functions. In all of the Type II and Type III cases that we have, the proteins belong to different SCOP families (Murzin et al., 1995), with different functions. One of the paradigms in protein science states that similarity between protein structures does not imply similarity in function. Our observations suggest a striking extension of this paradigm: similarity in interface architectures does not imply similarity in function. As in protein monomers, ‘good’ favorable interface structural scaffolds have been re-used and adapted by evolution for diverse functions. The functions extend from enzymes/inhibitors to toxins and immunoglobulins. We did not observe homodimers among these proteins of similar interfaces, different global structures and functions. Homodimers are always classified as Type I. This is probably due to the smaller sizes of the monomers and the extensive interfaces in the two-state homodimers, which cover large portions of the chains. As expected, we find that multi-functional interface clusters consisting of helices largely derive from proteins whose functions relate to muscle and to membranes.

We analyzed the Type I versus Type II clusters. As expected, we find that Type I is better packed, buries larger total and non-polar ASAs, is less planar and has better interface complementarity and more backbone–backbone hydrogen bonds.

The observation that globally different protein structures may associate in similar ways to yield similar motifs is extremely interesting. Clearly, there is a very large number of ways in which monomers can combinatorially assemble. Remarkably, among these there are preferred interface architectures and these are similar to those observed in monomers. This observation both underscores the view that the number of favorable motifs is limited in nature and highlights the analogy between binding and folding. It is further reminiscent of the combinatorial assembly of protein building blocks in folding.

Here, we observe that there are many cases where evolutionarily related proteins have diverged from each other in function, yet maintained the interfaces they use to interact with other proteins. The question arises as to whether it is possible to infer from cases such as those in our dataset the time scales of evolutionary divergence. One possible way toward such a goal is through sequence analysis of classified structural alignment of the interfaces. Further, here we have looked at the functions of Type II proteins using qualitative descriptions. Current work assesses the functions of these proteins by using the Gene Ontology database (http://www.geneontology.org). Such an analysis should provide more quantitative means of comparing the functions of proteins in different clusters.


    Acknowledgments
 
We thank S.Mintz for her help in crystal interfaces. We thank Drs B.Ma, C.-J.Tsai, Y.Pan, K.Gunasekaran, D.Zanuy, H.-H(G).Tsai and members of the Nussinov–Wolfson group, in particular Maxim Shatsky for help with MultiProt. We thank Dr Jacob V.Maizel for encouragement. We thank Dr A.Gursoy and S.Aytuna for helpful discussions. The research of R.Nussinov in Israel has been supported in part by the Center of Excellence in Geometric Computing and its Applications funded by the Israel Science Foundation (administered by the Israel Academy of Sciences). This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number NO1-CO-12400.


    References
Arkin,M.R. et al. (2003) Proc. Natl Acad. Sci. USA, 100, 1603–1608.

Barrett,A.J., Rawlings,N.D. and Woessner,J.F.,Jr (1998) In Barrett,A.J., Rawlings,N.D. and Woessner,J.F.,Jr (eds) Handbook of Proteolytic Enzymes. Academic Press, London.

Battistutta,R., Negro,A. and Zanotti,G. (2000) Proteins, 41, 429–437.

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Bogan,A.A. and Thorn,K.S. (1998) J. Mol. Biol., 280, 1–9.

Boniecki,M., Rotkiewicz,P., Skolnick,J. and Kolinski,A. (2003) J. Comput. Aided Mol. Des., 17, 725–738.

Brasher,S.V., Smith,B.O., Fogh,R.H., Nietlispach,D., Thiru,A., Nielsen,P.R., Broadhurst,R.W., Ball,L.J., Murzina,N.V. and Laue,E.D. (2000) EMBO J., 19, 1597–2000.

Brown,J.H., Kim,K.-H., Jun,G., Greenfield,N.J., Dominguez,R., Volkmann,N., Hitchcock-DeGregori,S.E. and Cohen,C. (2001) Proc. Natl Acad. Sci. USA, 98, 8496–8501.

Chakrabarti,P. and Janin,J. (2002) Proteins, 47, 334–343.

Chothia,C. (1992) Nature, 357, 543–544.

Cunningham,B.C. and Wells,J.A. (1991) Proc. Natl Acad. Sci. USA, 88, 3407–3411.

DeLano,W.L., Ultsch,M.H., de Vos,A.M. and Wells,J.A. (2000) Science, 287, 1279–1283.

Dewan,J.C., Grant,G.A. and Sacchettini,J.C. (1994) Biochemistry, 33, 13147–13154.

Finkelstein,A. and Ptitsyn,O.B. (1987) Prog. Biophys. Mol. Biol., 50, 171–190.

Finkelstein,A.V., Badretdinov,A.Y. and Ptitsyn,O.B. (1991) Proteins, 10, 287–299.

Guncar,G., Klemencic,I., Turk,B., Turk,V., Karaoglanovic-Carmona,A., Juliano,L. and Turk,D. (2000) Structure, 8, 305–313.

Harbury,P.B., Zhang,T., Kim,P.S. and Alber,T. (1993) Science, 262, 1401–1407.

Henikoff,S. and Henikoff,J. (1992) Proc. Natl Acad. Sci. USA, 89, 10915–10919.

Higgins,D., Thompson,J., Gibson,T, Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.

Hu,Z., Ma,B., Wolfson,H. and Nussinov,R. (2000) Proteins, 39, 331–342.

Janin,J. and Chothia,C.J. (1990) J. Biol. Chem., 265, 16027–16030.

Janin,J., Miller,S. and Chothia,C. (1988) J. Mol. Biol., 204, 155–164.

Jones,S. and Thornton,J.M. (1995) Prog. Biophys. Mol. Biol., 63, 31–65.

Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.

Katchalski-Katzir,E., Shariv,I., Eisenstein,M., Friesem,A.A., Aflalo,C. and Vakser,I.A. (1992) Proc. Natl Acad. Sci. USA, 89, 2195–2199.

Keskin,O., Bahar,I., Badretdinov,A.Y., Ptitsyn,O.B. and Jernigan,R.L. (1998) Protein Sci., 7, 2578–2586.

Keskin,O., Jernigan,R.L. and Bahar,I. (2000) Biophys. J., 78, 2093–2106.

Keskin,O., Tsai,C.J., Wolfson,H. and Nussinov,R. (2004) Protein Sci., 13, 1043–1055.

Keskin,O., Ma,B. and Nussinov,R. (2005) J. Mol. Biol., 345, 1281–1294.

Kleanthous,C. (ed.) (2000) Protein–Protein Recognition, Frontiers in Molecular Biology. Oxford University Press, Oxford.

Laskowski,R.A. (1995) J. Mol. Graph., 13, 323–330.

LoConte,L., Chothia,C. and Janin,J. (1999) J. Mol. Biol., 285, 2177–2198.

Ma,B., Wolfson,H.J. and Nussinov,R. (2001) Curr. Opin. Struct. Biol., 11, 364–369.

Ma,B., Shatsky,M., Wolfson,H.J. and Nussinov,R. (2002) Protein Sci., 11, 184–197.

McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.

Moult,J. and Melamud,E. (2000) Curr. Opin. Struct. Biol., 10, 384–389.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Nagano,N., Orengo,C.A. and Thornton,J.M. (2002) J. Mol. Biol., 321, 741–765.

Nooren,I.M.A. and Thornton,J.M. (2003) J. Mol. Biol., 325, 991–1018.

Nussinov,R. and Wolfson,H.J. (1991) Proc. Natl Acad. Sci. USA, 88, 10495–10499.

Orengo,C.A., Todd,A.E. and Thornton,J.M. (1999) Curr. Opin. Struct. Biol., 9, 374–382.

Shatsky,M., Nussinov,R. and Wolfson,H.J. (2004) Proteins, 56, 143–156.

Singh,P.B., Miller,J.R., Pearce,J., Kothary,R., Burton,R.D., Paro,R., James,T.C. and Gaunt,S.J. (1991) Nucleic Acids Res., 19, 789–794.

Tavares,G.A., Béguin,P. and Alzari,P.M. (1997) J. Mol. Biol., 273, 701–713.

Thorell,S., Gergely,P.,Jr, Banki,K., Perl,A. and Schneider,G. (2000) FEBS Lett., 475, 205–208.

Thornton,J.M., Todd,A.E., Milburn,D., Borkakoti,N. and Orengo,C.A. (2000) Nat. Struct. Biol., 7, 991–994.

Todd,A.E., Orengo,C.A. and Thornton,J.M. (2002) Structure, 10, 1435–1451.

Tsai,C.J. and Nussinov,R. (1997) Protein Sci., 6, 1426–1437.

Tsai,C.J., Lin,S.L., Wolfson,H.J. and Nussinov,R. (1996) J. Mol. Biol., 260, 604–620.

Tsai,C.J., Xu,D. and Nussinov,R. (1998) Fold. Des., 3, R71–R80.

Valdar,W.S.J. and Thornton,J.M. (2001) Proteins, 42, 108–124.

Wallis,R., Leung,K.Y., Osborne,M.J., James,R., Moore,G.R. and Kleanthous,C. (1998) Biochemistry, 37, 476–485.

Weissenhorn,W., Carfi,A., Lee,K.-H., Skehel,J.J. and Wiley,D.C. (1998) Mol. Cell, 2, 605–616.

Wells,J.A. and de Vos,A.M. (1996) Annu. Rev. Biochem., 65, 609–634.

Wu,C.H. et al. (2002) Nucleic Acids Res., 30, 35–37.

Zhang,Y., Kolinski,A. and Skolnick,J. (2003) Biophys. J., 85, 1145–1164.

Received August 10, 2004; revised November 12, 2004; accepted December 3, 2004.

Edited by Dek Woolfson