1 Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College, Gower Street, London WC1E 6BT and 3 Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, UK
Abstract |
---|
Keywords: interface/oligomeric protein/protein–protein interaction/structural domain
Introduction |
---|
The analysis of domain interfaces in two-domain protein chains from the CATH database is presented. The two-domain structures were divided into two groups depending on whether they were derived from (i) monomeric proteins, or (ii) oligomeric or complexed proteins. Physical and chemical properties including size, hydrogen bonding, packing and residue propensities have been calculated for the intra-chain domain interfaces (interactions within monomers) for both datasets, and compared with those observed in inter-chain interfaces of two categories of protein–protein complexes (permanent and non-obligate). Permanent complexes include those proteins that only function in the complexed state, and are thus obligatory, e.g. oligomeric proteins. Non-obligate complexes are built from units that exist both as part of the complex and separately in the cell, e.g. enzymes and their inhibitors (Jones and Thornton, 1996).
Analysing intra-chain interfaces within monomers, and comparing them with the protein surface and interior may also provide an insight into the role played by domains in protein folding. In the process of folding of multi-domain proteins there are two possible pathways: (i) domains fold independently prior to forming the inter-domain interactions present in the complete protein, or (ii) domain folding and the formation of inter-domain interactions occur simultaneously. An analysis of the amino acid composition of domain interfaces compared with domain cores is conducted in the current work, to give some indication as to which pathway is the most likely.
Materials and methods |
---|
This analysis is restricted to two-domain protein chains of which there are 2382 classified in version 1.5 of the CATH database (Orengo et al., 1997). Using the CATH numbers assigned to different homologous families within the database, the two-domain proteins were divided into 151 nonhomologous families. Two domains from the same homologous family could be present more than once in the dataset if their domain partners were different in each case. When there was more than one member of a family, the protein with the best resolution was selected as the representative.
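The selection of one representative per family pairing can be sketched as below. This is only an illustration under stated assumptions: the `Chain` record, its field names and the function `pick_representatives` are hypothetical, and treating the pair of homologous-family numbers as an unordered key is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one two-domain chain (field names are illustrative)."""
    pdb_code: str
    family_a: str      # homologous-family number of the first domain
    family_b: str      # homologous-family number of the second domain
    resolution: float  # in angstroms; a lower value means better resolution


def pick_representatives(chains):
    """Keep the best-resolution chain for each pairing of domain families.

    Assumption: the order of the two domains along the chain is ignored, so
    (A, B) and (B, A) are treated as the same pairing.
    """
    best = {}
    for chain in chains:
        key = tuple(sorted((chain.family_a, chain.family_b)))
        current = best.get(key)
        if current is None or chain.resolution < current.resolution:
            best[key] = chain
    return list(best.values())
```

Because the key is the pair of families rather than a single family, a domain family can appear more than once in the result when its partner differs, which matches the rule stated above.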
From preliminary observations of the proteins in this dataset it became apparent that many of the domains were also involved in other interactions, including contacts with other subunits of the same protein or with nucleic acids (Figure 1). Hence, we categorized the proteins in the initial dataset into (i) monomers or (ii) oligomers or complexes, using information from the PQS server of the Macromolecular Structure Database (EBI-MSD) at the EBI (http://msd.ebi.ac.uk). The two-domain structures in our initial dataset were only classed as complexed if they were bound to another protein chain or nucleic acid. Those bound to small ligands such as ATP were not classed as complexed. On this basis, the initial dataset of two-domain proteins was divided into (i) 46 two-domain monomers, and (ii) 105 two-domain chains derived from oligomers or protein complexes. The PDB codes for these two datasets, and for the datasets of protein–protein complexes with which they are compared (Table I), are listed at http://www.biochem.ucl.ac.uk/bsm/domains.
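The classification rule just described is simple enough to write down directly. The sketch below assumes that the binding partners of each chain have already been labelled with one of three hypothetical strings; the labels and the function name `classify_chain` are illustrative and do not correspond to any field in the EBI-MSD or PQS data.

```python
# Partner types that make a chain count as "complexed" under the rule in the text.
COMPLEX_FORMING = {"protein", "nucleic_acid"}


def classify_chain(partner_types):
    """Return 'oligomer_or_complex' or 'monomer' for one two-domain chain.

    partner_types: iterable of hypothetical labels, each one of
    'protein', 'nucleic_acid' or 'small_ligand'.
    Small ligands (e.g. ATP) never make a chain count as complexed.
    """
    if any(p in COMPLEX_FORMING for p in partner_types):
        return "oligomer_or_complex"
    return "monomer"
```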
Interface definitions
Domain definitions were taken from CATH 1.5 (Orengo et al., 1997) (http://www.biochem.ucl.ac.uk/bsm/cath). Residues in one domain are defined as interface residues if they lose >1 Å² of accessible surface area (ASA) on complexation with the second domain in the complete structure. In a small number of proteins the only contact between domains was via a domain linker (most commonly a loop structure). In these cases the domain interface residues were still defined as described above, even if they represented only the contacts made in the domain linker.
Surface residues are defined as those residues that have a relative ASA of >5% (Miller et al., 1987) and that are not also defined as interface residues. Interior residues are defined as those residues that have a relative ASA of <5% (Miller et al., 1987).
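The three residue classes can be illustrated with a short sketch. It assumes that per-residue ASA values have already been computed by some accessibility program for the isolated domain and for the complete chain; the function name and arguments are hypothetical. Two points are assumptions of the sketch rather than statements from the text: the interface test is applied first, and a relative ASA of exactly 5% is grouped with the interior.

```python
def classify_residue(asa_isolated, asa_in_chain, relative_asa_percent,
                     loss_cutoff=1.0, surface_cutoff=5.0):
    """Label one residue as 'interface', 'surface' or 'interior'.

    asa_isolated:         ASA (square angstroms) with the partner domain removed
    asa_in_chain:         ASA (square angstroms) in the complete two-domain chain
    relative_asa_percent: relative ASA of the residue, as a percentage
    """
    # Interface: the residue loses more than the cutoff when the partner domain is present.
    if asa_isolated - asa_in_chain > loss_cutoff:
        return "interface"
    # Surface: exposed above the relative-ASA cutoff and not already an interface residue.
    if relative_asa_percent > surface_cutoff:
        return "surface"
    # Everything else is treated as interior (assumption: this includes exactly 5%).
    return "interior"
```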
Interface parameters
The process of protein–protein recognition involves many physical and chemical factors including hydrophobic and electrostatic interactions, and shape complementarity. A series of interface parameters (ASA, planarity, segmentation, polarity, hydrogen bonds and gap volume index) were calculated in an attempt to quantify some of these factors. All parameters were calculated using a software tool previously used to analyse protein–protein interactions in homodimers (Jones and Thornton, 1995). The six interface parameters were calculated for each domain interface from the monomeric and the oligomeric/complexed datasets. The parameter definitions are as follows:
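As a rough illustration of how three of these parameters might be computed, the sketch below makes explicit assumptions that are not confirmed by the text here: that residues more than five positions apart in sequence fall in different segments, that hydrogen bonds are reported per 100 Å² of interface ASA, and that the gap volume index is the gap volume divided by the interface ASA. All function names are hypothetical, and the inputs (hydrogen-bond counts, gap volumes, ASA values) are assumed to come from external programs.

```python
def count_segments(interface_positions, max_gap=5):
    """Count sequence segments among interface residues.

    Assumption: two interface residues more than `max_gap` positions apart
    in the sequence belong to different segments.
    """
    positions = sorted(set(interface_positions))
    if not positions:
        return 0
    segments = 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev > max_gap:
            segments += 1
    return segments


def hbonds_per_100_sq_angstrom(n_hbonds, interface_asa):
    """Inter-domain hydrogen bonds normalised to 100 square angstroms of interface ASA."""
    return 100.0 * n_hbonds / interface_asa


def gap_volume_index(gap_volume, interface_asa):
    """Assumed definition: gap volume between the domains divided by interface ASA."""
    return gap_volume / interface_asa
```

Dividing by the interface ASA is what makes very small interfaces sensitive to any fixed contribution to the gap volume, since a small denominator inflates the index.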
The mean and standard deviation for each parameter are shown in Table Ia, and the distributions of each are shown in Figure 2. Means and standard deviations are also shown for a dataset of 36 permanent and 23 non-obligate protein–protein complexes (Jones and Thornton, in press) (Table Ib).
Interface propensities were calculated to find the relative importance of different amino acid residues in the domain interface, compared with the domain surface as a whole. The propensities were calculated as in eqn 1.
An interface residue propensity >1.0 indicates that a residue type is more prevalent in the domain interface than on the rest of the domain surface.
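A propensity of this kind is a ratio of two fractions: the share of a residue type in the interface divided by its share on the surface. The sketch below is a minimal illustration, not a reproduction of eqn 1. Whether the shares are weighted by residue counts or by summed ASA is not recoverable from the text, so the function takes generic per-residue-type weights and leaves that choice to the caller; the function name is hypothetical.

```python
def interface_propensities(interface_weights, surface_weights):
    """Propensity of each residue type for the interface relative to the surface.

    interface_weights / surface_weights: dicts mapping residue type to a weight
    (for example a residue count or a summed ASA; which one is an assumption
    left to the caller).
    A value above 1.0 means the type is enriched in the interface.
    """
    total_interface = sum(interface_weights.values())
    total_surface = sum(surface_weights.values())
    propensities = {}
    for residue, surface_weight in surface_weights.items():
        surface_fraction = surface_weight / total_surface
        if surface_fraction == 0:
            continue  # propensity is undefined when the type never occurs on the surface
        interface_fraction = interface_weights.get(residue, 0.0) / total_interface
        propensities[residue] = interface_fraction / surface_fraction
    return propensities
```

For example, a residue type that makes up 75% of the interface weight but only 25% of the surface weight has a propensity of 3.0.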
Amino acid frequencies were calculated for those residues in each of three locations within protein domains: (i) interface, (ii) interior and (iii) surface (Figure 3). The number of each amino acid type was calculated and divided by the number of residues in each location over the dataset.
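The frequency calculation can be sketched as follows, assuming each residue in the dataset has already been labelled with its location. The input format (a list of residue-type and location pairs) and the function name are illustrative choices, not part of the original method description.

```python
from collections import Counter


def location_frequencies(residues):
    """Fraction of each amino acid type within each location.

    residues: iterable of (amino_acid, location) pairs pooled over the dataset,
    where location is 'interface', 'interior' or 'surface'.
    Returns {location: {amino_acid: fraction}}.
    """
    counts = {}
    for amino_acid, location in residues:
        counts.setdefault(location, Counter())[amino_acid] += 1
    frequencies = {}
    for location, counter in counts.items():
        total = sum(counter.values())
        frequencies[location] = {aa: n / total for aa, n in counter.items()}
    return frequencies
```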
Results |
---|
Six interface parameters [interface size (in terms of ASA), planarity, sequence segmentation, polarity, hydrogen bonding and packing] were calculated for two-domain proteins from 46 monomers and 105 oligomers or complexes (Table Ia). The distributions for the two datasets differ significantly (P < 0.005) for two parameters: ASA and gap volume index.
The intra-chain interfaces in the monomeric two-domain proteins ranged in size from 260 Å² in factor H to 3580 Å² in haemocyanin. The intra-chain interfaces derived from oligomeric or complexed proteins ranged from 95 Å² in histidyl-tRNA synthetase (Figure 1a) to 2813 Å² in peroxisomal thiolase. The domain pairs from the oligomeric and complexed two-domain proteins had a greater number of small interfaces compared with the monomeric dataset (Figure 2a), with approximately 10% having interfaces of less than 250 Å². Not one monomeric protein fell into this interface size category. This probably reflects the bias in the data towards proteins that will crystallize. It is clear that there are many proteins in which domains are joined by flexible linkers and have little contact. Such proteins will be hard to crystallize unless complexed in some way to give a more stable structure.
The domain interfaces comprised, on average, between four and five segments, and had 0.9 inter-domain hydrogen bonds per 100 Å² of the interface. The percentage composition of polar atoms in the interfaces varied widely from 14 to 65%. The intra-chain interfaces derived from oligomeric or complexed structures are less well packed than those in the monomeric structures (with a mean gap volume index of 3.1 compared with 1.8). However, the calculation of the gap volume involves the generation of spheres to fill the gaps between the domains (see Materials and methods) and the volume of the spheres is influenced by edge effects at the periphery of the contact area. In structures with a very small interface (as observed in some of the intra-chain interfaces derived from oligomeric or complexed two-domain proteins) the edge effect is large and can lead to disproportionately large gap volume indices.
Interface parameters have previously been calculated for a dataset of 36 permanent and 23 non-obligate protein–protein complexes (Jones and Thornton, in press) (Table Ib). In many respects the intra-chain interfaces are very similar to the inter-chain interfaces in these complexes, as all the parameter distributions overlap for all types of interaction. In terms of size and planarity the domain interfaces are more similar to the non-obligate complexes than the permanent complexes. In terms of the polarity and inter-molecular hydrogen bonds, the domain interfaces are intermediate between the permanent and non-obligate subunit interfaces. The monomeric intra-chain interfaces are more closely packed than the inter-chain interfaces (both permanent and non-obligate), whilst those intra-chain interfaces from oligomeric or complexed proteins are less well packed than both types of protein–protein interface.
Interface amino acid composition
The amino acid frequencies for the domain interfaces were compared with those on the protein surface and in the protein interior. Figure 3 reveals that the domain interfaces closely resemble the domain surface. The percentage amino acid distributions for the domain interfaces are significantly different from the protein interior (P = 4.7 × 10⁻⁹), but not significantly different from the protein surface (P = 0.43) using a χ² test. Specifically, domain interfaces contain charged and polar residues at frequencies more commonly associated with domain surfaces.
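A comparison of this kind can be illustrated with the goodness-of-fit form of the χ² statistic. The text does not say whether the test was run on counts or percentages, nor how the degrees of freedom were set, so the sketch below is an assumption throughout: it compares observed interface counts with counts expected from a reference distribution (surface or interior) and returns the statistic together with the number of categories minus one. The function name is hypothetical, and converting the statistic to a P value is left to a statistics library.

```python
def chi_squared_statistic(observed_counts, expected_fractions):
    """Chi-squared goodness-of-fit statistic and degrees of freedom.

    observed_counts:    {amino_acid: count} in the interface
    expected_fractions: {amino_acid: fraction} in the reference location,
                        assumed to sum to 1
    """
    total = sum(observed_counts.values())
    statistic = 0.0
    categories = 0
    for amino_acid, fraction in expected_fractions.items():
        expected = fraction * total
        if expected <= 0:
            continue  # skip types absent from the reference distribution
        observed = observed_counts.get(amino_acid, 0)
        statistic += (observed - expected) ** 2 / expected
        categories += 1
    return statistic, categories - 1
```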
The residue propensities for the domain interfaces are remarkably similar to those derived for permanent dimer interfaces (Jones and Thornton, 1996) [the correlation coefficient (r) is 0.73]. The majority of hydrophobic residues have interface propensities of greater than one, indicating their prevalence in both intra- and inter-chain interfaces. Arginine also plays an important role in both types of interface, being involved in the many inter-molecular hydrogen bonds that are present.
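The correlation quoted above can be reproduced in principle by pairing the two propensity values for each amino acid type. The sketch below assumes that r is the Pearson product-moment coefficient, which the text does not state explicitly; the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, `xs` would hold the domain-interface propensities and `ys` the permanent-dimer propensities, listed in the same amino acid order.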
Discussion |
---|
The size of domain interfaces is an important factor in the definition of structural domains. In the evaluation of one automatic domain assignment algorithm (Islam et al., 1995), the larger the domain interface the harder it was for the algorithm to make a correct domain assignment. Many domain assignment algorithms (including those by Holm and Sander, 1994; Islam et al., 1995; Siddiqui and Barton, 1995; Swindells, 1995) use different methods to find the assignment that has more intra-domain contacts than inter-domain contacts. Hence multidomain proteins with domains that have large interfaces make automatic assignments very difficult. The accuracy of domain assignment algorithms could possibly be increased by incorporating the interface properties analysed in the current work. The analysis of inter-domain hydrogen bonds, packing and residue content for domain assignments could be completed as part of a post-processing of domain definitions to give an indication of the reliability of the assignment (e.g. Siddiqui and Barton, 1995).
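The contact criterion shared by these algorithms can be illustrated by counting intra-domain and inter-domain contacts for a candidate assignment. The sketch below is not the method of any of the cited algorithms; the contact list, the residue-to-domain mapping and the function name are hypothetical inputs chosen for illustration.

```python
def contact_counts(contacts, domain_of):
    """Count intra-domain and inter-domain contacts for one candidate assignment.

    contacts:  iterable of (residue_i, residue_j) pairs that are in contact
    domain_of: dict mapping each residue index to a domain label
    Returns (intra, inter).
    """
    intra = 0
    inter = 0
    for i, j in contacts:
        if domain_of[i] == domain_of[j]:
            intra += 1
        else:
            inter += 1
    return intra, inter


# Usage: compare two candidate two-domain splits of a six-residue toy chain.
toy_contacts = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
split_a = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
split_b = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
counts_a = contact_counts(toy_contacts, split_a)
counts_b = contact_counts(toy_contacts, split_b)
```

A large interface raises the inter-domain count for every candidate split, which narrows the margin between the correct assignment and its competitors.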
The physical and chemical characteristics of domain–domain interfaces are intermediate between those calculated for permanent and non-obligate interfaces between chains (Jones and Thornton, 1996, in press). Thus they are not as hydrophobic or as large as the permanent interfaces, but they are less polar with fewer hydrogen bonds than the non-obligate interfaces. It is possible that the covalent linkage and proximity of these same-chain domains means that weaker, less specific non-covalent interactions are still sufficient to form a stable unit.
With this general similarity of interactions within monomers (intra-chain) and between monomers (inter-chain) it is not surprising that three-dimensional domain swapping (Bennett et al., 1995; Schlunegger et al., 1997) has been observed in the formation of some proteins. Three-dimensional domain swapping is a mechanism for forming oligomeric proteins from monomers, in which one domain of a monomer is replaced by the same domain from an identical protein chain. The end result is a dimer or higher oligomer with one domain of each subunit replaced by an identical domain from another subunit (Bennett et al., 1995). There are many examples of this phenomenon, including diphtheria toxin (Bennett et al., 1994) and the crystallins (Slingsby et al., 1991, 1997). In such a mechanism, inter-domain interaction sites in the monomer are replaced by inter-subunit sites in the higher oligomer. For such a mechanism to work, these sites must have similar characteristics. Our present analysis has shown that this is true.
A knowledge of the characteristics of domain interfaces is also important for the understanding of protein folding and the design of novel proteins. The results presented here show that domain interfaces within proteins have amino acid compositions more comparable with domain surfaces than domain cores (Figure 3). The presence of amino acid residues in surface-like proportions seems to support a folding pathway in which individual domains fold first, prior to collapse into a stable multi-domain structure. However, the complexities of protein folding for multi-domain structures have still to be revealed experimentally. What is clear is that, whichever pathway folding takes, domains represent discrete folding units, and the interactions between them make an important contribution to the overall stability of the protein structure.
References |
---|
Arnez,J.G., Harris,D.C., Mitschler,A., Rees,B., Francklyn,C.S. and Moras,D. (1995) EMBO J., 14, 4143–4155.
Bennett,M.J., Choe,S. and Eisenberg,D. (1994) Protein Sci., 3, 1444–1463.
Bennett,M.J., Schlunegger,M.P. and Eisenberg,D. (1995) Protein Sci., 4, 2455–2468.
Bentley,G.A., Boulot,G., Karjalainen,K. and Mariuzza,R.A. (1995) Science, 267, 1984–1987.
Bernstein,F., Koetzle,T., Williams,G., Meyer,G., Meyer,E., Brice,M., Rodgers,K., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Holm,L. and Sander,C. (1994) Proteins Struct. Funct. Genet., 19, 256–268.
Holm,L. and Sander,C. (1998) Proteins, 33, 88–96.
Hubbard,S.J. (1990) ACCESS: A Computer Algorithm to Calculate Surface Accessibility. University College, London.
Islam,S.A., Luo,J. and Sternberg,M.J.E. (1995) Protein Engng, 8, 513–525.
Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.
Jones,S. and Thornton,J.M. (1995) Prog. Biophys. Mol. Biol., 63, 31–65.
Jones,S. and Thornton,J.M. (1996) Proc. Natl Acad. Sci. USA, 93, 13–20.
Jones,S. and Thornton,J.M. In Kleanthous,C. (ed.), Protein–Protein Recognition. Oxford University Press, Oxford (in press).
Laskowski,R.A. (1995) J. Mol. Graphics, 13, 323–330.
Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.
McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.
Miller,S., Janin,J., Lesk,A.M. and Chothia,C. (1987) J. Mol. Biol., 196, 641–656.
Ogata,K., Morikawa,S., Nakamura,H., Seikawa,A., Inoue,T., Kanai,H., Sarai,A., Ishii,S. and Nishimura,Y. (1994) Cell, 79, 639–648.
Orengo,C.A., Michie,A., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.
Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 20, 374–376.
Schlunegger,M.P., Bennett,M.J. and Eisenberg,D. (1997) Adv. Protein Chem., 50, 61–122.
Siddiqui,A.S. and Barton,G.J. (1995) Protein Sci., 4, 872–884.
Slingsby,C., Simpson,A., Ferszt,A., Bateman,O.A. and Nalini,V. (1991) Biochem. Soc. Trans., 19, 853–858.
Slingsby,C., Norledge,B., Simpson,A., Bateman,O.A. and Wright,G. (1997) Prog. Retinal Eye Res., 16, 3–29.
Swindells,M.B. (1995) Protein Sci., 4, 103–112.
Tahirov,T.H., Lu,T.H., Liaw,Y.C., Chen,Y.L. and Lin,J.Y. (1995) J. Mol. Biol., 250, 354–367.
Received August 23, 1999; revised November 9, 1999; accepted November 16, 1999.