Protein domain interfaces: characterization and comparison with oligomeric protein interfaces

Susan Jones1,2, Antoine Marin1 and Janet M.Thornton1,3

1 Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT and 3 Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, UK


    Abstract
The physical and chemical properties of domain–domain interactions have been analysed in two-domain proteins selected from the protein classification, CATH. The two-domain structures were divided into those derived from (i) monomeric proteins, or (ii) oligomeric or complexed proteins. The size, polarity, hydrogen bonding and packing of the intra-chain domain interface were calculated for both sets of two-domain structures. The results were compared with inter-chain interface parameters from permanent and non-obligate protein–protein complexes. In general, the intra-chain domain and inter-chain interfaces were remarkably similar. Many of the intra-chain interface properties are intermediate between those calculated for permanent and non-obligate inter-chain complexes. Residue interface propensities were also found to be very similar, with hydrophobic residues playing a major role, together with positively charged arginine residues. In addition, the residue composition of the domain interfaces was found to be more comparable with domain surfaces than domain cores. The implications of these results for domain swapping and protein folding are discussed.

Keywords: interface/oligomeric protein/protein–protein interaction/structural domain


    Introduction
Protein structural domains have been described as compact, local semi-independent units (Richardson, 1981), and to date 14 518 structural domains have been defined in the CATH database (version 1.5; Orengo et al., 1997). The domains in this hierarchical classification of proteins in the Brookhaven Protein Databank (Bernstein et al., 1977) have been assigned from the coordinates using a consensus approach (Jones et al., 1998). This method uses a number of previously published algorithms (DOMAK, Siddiqui and Barton, 1995; PUU, Holm and Sander, 1994; DETECTIVE, Swindells, 1995), and takes advantage of the elevated accuracy obtained when assignments from individual algorithms are in agreement. Previous work on the characterization of inter-chain protein–protein interactions in a number of different categories of complex (Jones and Thornton, 1995; Jones and Thornton, in press) prompted the use of the same software tools to analyse the physical and chemical properties of domain–domain (intra-chain) interactions. The current work expands the analysis and conclusions drawn from a dataset of multi-domain proteins by Argos (1988).

The analysis of domain interfaces in two-domain protein chains from the CATH database is presented. The two-domain structures were divided into two groups depending on whether they were derived from (i) monomeric proteins, or (ii) oligomeric or complexed proteins. Physical and chemical properties including size, hydrogen bonding, packing and residue propensities have been calculated for the intra-chain domain interfaces (interactions within monomers) for both datasets, and compared with those observed in inter-chain interfaces of two categories of protein–protein complexes (permanent and non-obligate). Permanent complexes include those proteins that only function in the complexed state, and are thus obligatory, e.g. oligomeric proteins. Non-obligate complexes are built from units that exist both as part of the complex and separately in the cell, e.g. enzymes and their inhibitors (Jones and Thornton, 1996).

Analysing intra-chain interfaces within monomers, and comparing them with the protein surface and interior may also provide an insight into the role played by domains in protein folding. In the process of folding of multi-domain proteins there are two possible pathways: (i) domains fold independently prior to forming the inter-domain interactions present in the complete protein, or (ii) domain folding and the formation of inter-domain interactions occur simultaneously. An analysis of the amino acid composition of domain interfaces compared with domain cores is conducted in the current work, to give some indication as to which pathway is the most likely.


    Materials and methods
Dataset

This analysis is restricted to two-domain protein chains, of which there are 2382 classified in version 1.5 of the CATH database (Orengo et al., 1997). Using the CATH numbers assigned to different homologous families within the database, the two-domain proteins were divided into 151 non-homologous families. Two domains from the same homologous family could be present more than once in the dataset if their domain partners were different in each case. When there was more than one member of a family, the protein with the best resolution was selected as the representative.
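The selection of one representative per family can be sketched as follows. This is only an illustration under stated assumptions: the `Chain` record, the placeholder PDB codes and the family labels (`H1`, `H2`, `H3`) are hypothetical, and treating a pair of families as unordered is our assumption rather than something stated above.

```python
from collections import namedtuple

# Hypothetical record: one two-domain chain with the homologous family of
# each domain and the crystallographic resolution in Å (lower is better).
Chain = namedtuple("Chain", "pdb_code family_1 family_2 resolution")


def pick_representatives(chains):
    """Keep the best-resolution chain for each pair of homologous families."""
    best = {}
    for chain in chains:
        # Assumption: the family pair is unordered, so (H1, H2) == (H2, H1).
        key = tuple(sorted((chain.family_1, chain.family_2)))
        if key not in best or chain.resolution < best[key].resolution:
            best[key] = chain
    return sorted(best.values(), key=lambda c: c.pdb_code)


# Made-up entries for illustration only.
chains = [
    Chain("aaaa", "H1", "H2", 2.5),
    Chain("bbbb", "H1", "H2", 1.8),
    Chain("cccc", "H1", "H3", 2.0),  # same family H1, different partner
    Chain("dddd", "H2", "H1", 3.0),  # same pair as H1/H2, order swapped
]
representatives = pick_representatives(chains)
```

Here the family `H1` appears in two representatives because its domain partner differs, which mirrors the rule described in the text.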

From preliminary observations of the proteins in this dataset it became apparent that many of the domains were also involved in other interactions, including contacts with other subunits of the same protein or nucleic acids (Figure 1). Hence, we categorized the proteins in the initial dataset into (i) monomers or (ii) oligomers or complexes, using information from the Macromolecular Structure Database (EBI-MSD) (PQS Server) at the EBI (http://msd.ebi.ac.uk). The two-domain structures in our initial dataset were only classed as complexed if they were bound to another protein chain or nucleic acid. Those bound to small ligands such as ATP were not classed as complexed. On this basis, the initial dataset of two-domain proteins was divided into (i) 46 two-domain monomers, and (ii) 105 two-domain chains derived from oligomers or protein complexes. The PDB codes for these two datasets, and the datasets of protein–protein complexes with which they are compared (Table I), are listed at http://www.biochem.ucl.ac.uk/bsm/domains.



Fig. 1. MOLSCRIPT diagrams of examples of two-domain proteins that are oligomeric or complexed. (a) Histidyl-tRNA synthetase homo-dimer (PDB code 1htt; Arnez et al., 1995). (b) T-cell antigen receptor (PDB code 1bec; Bentley et al., 1995). (c) C-MYB DNA binding domain (PDB code 1mse; Ogata et al., 1994). (d) Abrin-A hetero-dimer (PDB code 1abr; Tahirov et al., 1995). In each diagram one monomer has one domain coloured red and one green. The remaining monomers or bound ligands are shown in grey. The DNA in (c) is shown in ball-and-stick representation.

 

Table I. Interface parameters for (a) intra-chain and (b) inter-chain interfaces
 
Interface parameters were calculated for two-domain chains from both monomeric and oligomeric/complexed proteins. The residue propensities and residue frequencies were calculated using only the dataset of monomeric proteins.

Interface definitions

Domain definitions were taken from CATH 1.5 (Orengo et al., 1997) (http://www.biochem.ucl.ac.uk/bsm/cath). Residues in one domain are defined as interface residues if they lose >1 Å2 of accessible surface area (ASA) on complexation with the second domain in the complete structure. In a small number of proteins the only contact between domains was via a domain linker (most commonly a loop structure). In these cases the domain interface residues were still defined as described above, even if they represented only the contacts made in the domain linker.

Surface residues are defined as those residues that have a relative ASA of >5% (Miller et al., 1987) and that are not also defined as interface residues. Interior residues are defined as those residues that have a relative ASA of <5% (Miller et al., 1987).
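The three residue classes can be expressed as a simple decision rule. The sketch below is a minimal illustration, not the software used in this work: the function name, its arguments and the reference value `max_asa` used to obtain a relative ASA are hypothetical. Two choices are assumptions on our part: the interface test is applied first, and a residue at exactly 5% relative ASA, which the definitions above leave unassigned, is placed in the interior.

```python
def classify_residue(asa_isolated, asa_in_chain, max_asa,
                     interface_loss=1.0, rel_cutoff=5.0):
    """Return 'interface', 'surface' or 'interior' for one residue.

    asa_isolated -- ASA (Å^2) of the residue in its domain taken alone
    asa_in_chain -- ASA (Å^2) of the residue in the complete two-domain chain
    max_asa      -- hypothetical reference ASA (Å^2) for this residue type,
                    used to express ASA as a percentage (relative ASA)
    """
    # Interface: the residue loses more than 1 Å^2 when the partner domain
    # is present. Assumption: this test takes precedence over the others.
    if asa_isolated - asa_in_chain > interface_loss:
        return "interface"

    # Assumption: relative ASA is taken from the isolated domain.
    relative_asa = 100.0 * asa_isolated / max_asa

    # Surface: relative ASA > 5%. A value of exactly 5% is not covered by
    # the definitions; assigning it to the interior is an arbitrary choice.
    if relative_asa > rel_cutoff:
        return "surface"
    return "interior"


# Made-up values for illustration only.
labels = [
    classify_residue(40.0, 10.0, 200.0),  # loses 30 Å^2 on complexation
    classify_residue(40.0, 39.5, 200.0),  # loses 0.5 Å^2, 20% relative ASA
    classify_residue(4.0, 4.0, 200.0),    # 2% relative ASA
]
```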

Interface parameters

The process of protein–protein recognition involves many physical and chemical factors including hydrophobic and electrostatic interactions, and shape complementarity. A series of interface parameters (ASA, planarity, segmentation, polarity, hydrogen bonds and gap volume index) were calculated in an attempt to quantify some of these factors. All parameters were calculated using a software tool previously used to analyse protein–protein interactions in homodimers (Jones and Thornton, 1995). The six interface parameters were calculated for each domain interface from the monomeric and the oligomeric/complexed datasets. The parameter definitions are as follows:

The mean and standard deviation for each parameter are shown in Table Ia, and the distributions of each are shown in Figure 2. Means and standard deviations are also shown for a dataset of 36 permanent and 23 non-obligate protein–protein complexes (Jones and Thornton, in press) (Table Ib).



Fig. 2. Frequency histograms for the six domain interface parameters listed in Table I. (a) Interface ASA (Å2); (b) planarity, measured as the r.m.s.d. from a best fit plane through all interface atoms (Å); (c) number of sequence segments; (d) percentage of polar atoms in the interface; (e) number of inter-domain hydrogen bonds per 100 Å2 of interface ASA; (f) gap volume index. For a full explanation of these parameters refer to the Materials and methods section. In each graph the frequencies for two-domain proteins from monomeric proteins are shown in black and those from oligomeric or complexed proteins are shown in white.

 
Interface amino acid composition

Interface propensities were calculated to find the relative importance of different amino acid residues in the domain interface, compared with the domain surface as a whole. The propensities were calculated as in eqn 1.

\[
\text{Interface residue propensity } AA_j =
\frac{\sum_{i=1}^{N_i} n_{AA_j}(i) \Big/ \sum_{i=1}^{N_i} n_{AA}(i)}
     {\sum_{s=1}^{N_s} n_{AA_j}(s) \Big/ \sum_{s=1}^{N_s} n_{AA}(s)}
\qquad (1)
\]

where the terms are defined as follows: $\sum n_{AA_j}(i)$, the sum of the number of amino acid residues of type j in the interface; $\sum n_{AA}(i)$, the sum of the number of amino acid residues of all types in the interface; $\sum n_{AA_j}(s)$, the sum of the number of amino acid residues of type j on the domain surface; $\sum n_{AA}(s)$, the sum of the number of amino acid residues of all types on the domain surface; $N_i$, number of residues in the domain interface; $N_s$, number of residues in the domain surface.

An interface residue propensity >1.0 indicates that a residue type is more prevalent in the domain interface than on the rest of the domain surface.
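As a worked illustration of this ratio, the fraction of interface residues of a given type is divided by the fraction of surface residues of the same type. The sketch below is not the original software; the function name and the residue lists are hypothetical. We assume that the surface set excludes interface residues, as in the interface definitions, and that residue types absent from the surface are skipped because their propensity is undefined.

```python
from collections import Counter


def interface_propensities(interface_residues, surface_residues):
    """Propensity of each residue type for the interface versus the surface.

    Both arguments are sequences of residue-type labels pooled over the
    dataset. Types that never occur on the surface are omitted, since the
    ratio would require division by zero.
    """
    n_interface = Counter(interface_residues)
    n_surface = Counter(surface_residues)
    total_interface = sum(n_interface.values())
    total_surface = sum(n_surface.values())
    if total_interface == 0 or total_surface == 0:
        raise ValueError("both residue sets must be non-empty")

    propensities = {}
    for aa, count_surface in n_surface.items():
        fraction_interface = n_interface.get(aa, 0) / total_interface
        fraction_surface = count_surface / total_surface
        propensities[aa] = fraction_interface / fraction_surface
    return propensities


# Made-up residue lists for illustration only.
interface = ["LEU", "LEU", "ARG", "ASP"]
surface = ["LEU", "ARG", "ASP", "ASP", "LYS", "LYS", "GLU", "GLU"]
propensities = interface_propensities(interface, surface)
```

In this toy example leucine makes up half of the interface but one eighth of the surface, giving a propensity of 4.0, whereas lysine is absent from the interface and has a propensity of 0.0.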

Amino acid frequencies were calculated for those residues in each of three locations within protein domains: (i) interface, (ii) interior and (iii) surface (Figure 3). The number of each amino acid type was calculated and divided by the number of residues in each location over the dataset.



Fig. 3. Mean percentage frequencies of amino acid residues in domain interiors, interfaces and surfaces. In the main graph the values are shown as bars and the amino acid residues have been arranged from charged on the left through to hydrophobic on the right. In the inset, the values are shown as line graphs and the residues have been arranged such that the values for the domain interiors are in ascending order. This emphasises the similarity between the domain interfaces and the domain surfaces.

 

    Results
Domain interface parameters

Six interface parameters (interface size in terms of ASA, planarity, sequence segmentation, polarity, hydrogen bonding and packing) were calculated for two-domain proteins from 46 monomers and 105 oligomers or complexes (Table Ia). The distributions for the two datasets differ significantly (P < 0.005) for two parameters: ASA and gap volume index.

The intra-chain interfaces in the monomeric two-domain proteins ranged in size from 260 Å2 in factor H to 3580 Å2 in haemocyanin. The intra-chain interfaces derived from oligomeric or complexed proteins ranged from 95 Å2 in histidyl-tRNA synthetase (Figure 1a) to 2813 Å2 in peroxisomal thiolase. The domain pairs from the oligomeric and complexed two-domain proteins had a greater number of small interfaces compared with the monomeric dataset (Figure 2a), with approximately 10% having interfaces of less than 250 Å2. Not one monomeric protein fell into this interface size category. This probably reflects the bias in the data for proteins that will crystallize. It is clear that there are many proteins in which domains are joined by flexible linkers and have little contact. Such proteins will be hard to crystallize unless complexed in some way to give a more stable structure.

The domain interfaces comprised, on average, between four and five segments, and had 0.9 inter-domain hydrogen bonds per 100 Å2 of the interface. The percentage composition of polar atoms in the interfaces varied widely from 14 to 65%. The intra-chain interfaces derived from oligomeric or complexed structures are less well packed than those in the monomeric structures (with a mean gap volume index of 3.1 compared with 1.8). However, the calculation of the gap volume involves the generation of spheres to fill the gaps between the domains (see Materials and methods) and the volume of these spheres is influenced by edge effects at the periphery of the contact area. In structures with a very small interface (as observed in some of the intra-chain interfaces derived from oligomeric or complexed two-domain proteins) the edge effect is large and can lead to disproportionately large gap volume indices.

Interface parameters have previously been calculated for a dataset of 36 permanent and 23 non-obligate protein–protein complexes (Jones and Thornton, in press) (Table Ib). In many respects the intra-chain interfaces are very similar to the inter-chain interfaces in these complexes, as all the parameter distributions overlap for all types of interaction. In terms of size and planarity the domain interfaces are more similar to the non-obligate complexes than the permanent complexes. In terms of the polarity and inter-molecular hydrogen bonds, the domain interfaces are intermediate between the permanent and non-obligate subunit interfaces. The monomeric intra-chain interfaces are more closely packed than the inter-chain interfaces (both permanent and non-obligate), whilst those intra-chain interfaces from oligomeric or complexed proteins are less well packed than both types of protein–protein interface.

Interface amino acid composition

The amino acid frequencies for the domain interfaces were compared with those on the protein surface and in the protein interior. Figure 3 reveals that the domain interfaces closely resemble the domain surface. Using a χ2 test, the percentage amino acid distributions for the domain interfaces are significantly different from the protein interior (P = 4.7 × 10^-9), but not significantly different from the protein surface (P = 0.43). Specifically, domain interfaces contain charged and polar residues at frequencies more commonly associated with domain surfaces.
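The kind of comparison involved can be illustrated with a goodness-of-fit statistic. The exact procedure used for the test above is not described, so the sketch below is only one plausible formulation under our own assumptions: observed interface counts are compared with the counts expected if the interface followed a reference (surface or interior) percentage distribution. The function name and the numbers are hypothetical, and the P value, which would come from a χ2 distribution with one fewer degrees of freedom than the number of residue types, is not computed here.

```python
def chi_squared_statistic(observed_counts, reference_percent):
    """Pearson chi-squared statistic for counts against a reference profile.

    observed_counts   -- residue type -> count in the interface
    reference_percent -- residue type -> percentage frequency in the
                         reference location (values should sum to 100)
    """
    total = sum(observed_counts.values())
    statistic = 0.0
    for aa, percent in reference_percent.items():
        # Count expected if the interface had the reference composition.
        expected = total * percent / 100.0
        observed = observed_counts.get(aa, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up three-letter alphabet for illustration only.
interface_counts = {"LEU": 30, "ARG": 20, "ASP": 50}
surface_percent = {"LEU": 20.0, "ARG": 30.0, "ASP": 50.0}
statistic = chi_squared_statistic(interface_counts, surface_percent)
```

A small statistic relative to the degrees of freedom corresponds to a large P value, as reported for the comparison with the surface; a large statistic corresponds to a small P value, as reported for the interior.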

The residue propensities for the domain interfaces are remarkably similar to those derived for permanent dimer interfaces (Jones and Thornton, 1996) [the correlation coefficient (r) is 0.73]. The majority of hydrophobic residues have interface propensities of greater than one, indicating their prevalence in both intra- and inter-chain interfaces. Arginine also plays an important role in both types of interface, being involved in the many inter-molecular hydrogen bonds that are present.
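A correlation coefficient of this kind can be computed by pairing the two propensity values for each residue type. The sketch below assumes a Pearson correlation, which is not stated explicitly above, and uses made-up propensity values; the function and variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Made-up propensities for illustration only (not values from this study).
domain_interface = {"LEU": 1.4, "ARG": 1.2, "ASP": 0.6, "LYS": 0.7}
dimer_interface = {"LEU": 1.5, "ARG": 1.1, "ASP": 0.5, "LYS": 0.9}

# Pair the values by residue type so both lists share the same order.
residue_types = sorted(domain_interface)
r = pearson_r([domain_interface[aa] for aa in residue_types],
              [dimer_interface[aa] for aa in residue_types])
```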


    Discussion
The current work has characterized domain interfaces in terms of a number of chemical and physical properties. In general, the domain interfaces are very similar to those observed in inter-chain interfaces from protein–protein complexes.

The size of domain interfaces is an important factor in the definition of structural domains. In the evaluation of one automatic domain assignment algorithm (Islam et al., 1995), the larger the domain interface the harder it was for the algorithm to make a correct domain assignment. Many domain assignment algorithms (including those by Holm and Sander, 1994; Islam et al., 1995; Siddiqui and Barton, 1995; Swindells, 1995) use different methods to find the assignment that has more intra-domain contacts than inter-domain contacts. Hence multidomain proteins with domains that have large interfaces make automatic assignments very difficult. The accuracy of domain assignment algorithms could possibly be increased by incorporating the interface properties analysed in the current work. The analysis of inter-domain hydrogen bonds, packing and residue content for domain assignments could be completed as part of a post-processing of domain definitions to give an indication of the reliability of the assignment (e.g. Siddiqui and Barton, 1995).

The physical and chemical characteristics of domain–domain interfaces are intermediate between those calculated for permanent and non-obligate interfaces between chains (Jones and Thornton, 1996, in press). Thus they are not as hydrophobic or as large as the permanent interfaces, but they are less polar with fewer hydrogen bonds than the non-obligate interfaces. It is possible that the covalent linkage and proximity of these same-chain domains means that weaker, less specific non-covalent interactions are still sufficient to form a stable unit.

With this general similarity of interactions within monomers (intra-chain) and between monomers (inter-chain) it is not surprising that three-dimensional domain swapping (Bennett et al., 1995; Schlunegger et al., 1997) has been observed in the formation of some proteins. Three-dimensional domain swapping is a mechanism for forming oligomeric proteins from monomers, in which one domain of a monomer is replaced by the same domain from an identical protein chain. The end result is a dimer or higher oligomer with one domain of each subunit replaced by an identical domain from another subunit (Bennett et al., 1995). There are many examples of this phenomenon, including diphtheria toxin (Bennett et al., 1994) and the crystallins (Slingsby et al., 1991, 1997). In such a mechanism, inter-domain interaction sites in the monomer are replaced by inter-subunit sites in the higher oligomer. For such a mechanism to work, these sites must have similar characteristics. Our present analysis has shown that this is true.

A knowledge of the characteristics of domain interfaces is also important for the understanding of protein folding and the design of novel proteins. The results presented here show that domain interfaces within proteins have amino acid compositions more comparable with domain surfaces than domain cores (Figure 3). The presence of amino acid residues in surface-like proportions seems to support a folding pathway in which individual domains fold first, prior to collapse into a stable multi-domain structure. However, the complexities of protein folding for multi-domain structures have still to be revealed experimentally. What is clear is that, whichever pathway folding takes, domains represent discrete folding units, and the interactions between them make an important contribution to the overall stability of the protein structure.


    Acknowledgments
 
This work was carried out with funding from the MRC (ROPA grant), and the Department of Energy, USA (grant number DEFG02096ER62166.A000). We also acknowledge the support of the BBSRC Structural Biology Centre (University College and Birkbeck College, London).


    Notes
 
2 To whom correspondence should be addressed. Email: sue@biochem.ucl.ac.uk


    References
Argos,P. (1988) Protein Engng, 2, 101–113.

Arnez,J.G., Harris,D.C., Mitschler,A., Rees,B., Francklyn,C.S. and Moras,D. (1995) EMBO J., 14, 4143–4155.

Bennett,M.J., Choe,S. and Eisenberg,D. (1994) Protein Sci., 3, 1444–1463.

Bennett,M.J., Schlunegger,M.P. and Eisenberg,D. (1995) Protein Sci., 4, 2455–2468.

Bentley,G.A., Boulot,G., Karjalainen,K. and Mariuzza,R.A. (1995) Science, 267, 1984–1987.

Bernstein,F., Koetzle,T., Williams,G., Meyer,G., Meyer,E., Brice,M., Rodgers,K., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Holm,L. and Sander,C. (1994) Proteins Struct. Funct. Genet., 19, 256–268.

Holm,L. and Sander,C. (1998) Proteins, 33, 88–96.

Hubbard,S.J. (1990) ACCESS: A Computer Algorithm to Calculate Surface Accessibility. University College, London.

Islam,S.A., Luo,J. and Sternberg,M.J.E. (1995) Protein Engng, 8, 513–525.

Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.

Jones,S. and Thornton,J.M. (1995) Prog. Biophys. Mol. Biol., 63, 31–65.

Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.

Jones,S. and Thornton,J.M. In Kleanthous,C. (ed.), Protein–Protein Recognition. Oxford University Press, Oxford (in press).

Laskowski,R.A. (1995) J. Mol. Graphics, 13, 323–330.

Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.

McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.

Miller,S., Janin,J., Lesk,A.M. and Chothia,C. (1987) J. Mol. Biol., 196, 641–656.

Ogata,K., Morikawa,S., Nakamura,H., Seikawa,A., Inoue,T., Kanai,H., Sarai,A., Ishii,S. and Nishimura,Y. (1994) Cell, 79, 639–648.

Orengo,C.A., Michie,A., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.

Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.

Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 20, 374–376.

Schlunegger,M.P., Bennett,M.J. and Eisenberg,D. (1997) Adv. Protein Chem., 50, 61–122.

Siddiqui,A.S. and Barton,G.J. (1995) Protein Sci., 4, 872–884.

Slingsby,C., Simpson,A., Ferszt,A., Bateman,O.A. and Nalini,V. (1991) Biochem. Soc. Trans., 19, 853–858.

Slingsby,C., Norledge,B., Simpson,A., Bateman,O.A. and Wright,G. (1997) Prog. Retinal Eye Res., 16, 3–29.

Swindells,M.B. (1995) Protein Sci., 4, 103–112.

Tahirov,T.H., Lu,T.H., Liaw,Y.C., Chen,Y.L. and Lin,J.Y. (1995) J. Mol. Biol., 250, 354–367.

Received August 23, 1999; revised November 9, 1999; accepted November 16, 1999.