1 Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT and 2 Department of Biochemistry and Molecular Biology, Birkbeck College, Malet Street, London WC1E 7HX, UK
Abstract |
---|
Keywords: protein domains/protein–ligand contacts/protein structure/schematic diagrams/sequence motifs
Introduction |
---|
Domains can be identified by visual inspection of the protein structure. However, with the rapid growth of the protein structure database, automatic assignment of domains has become increasingly important to allow the efficient maintenance of structural classifications such as CATH (Orengo et al., 1997). Several algorithms for the automatic assignment of domains from the co-ordinates have been devised (Holm and Sander, 1994; Islam et al., 1995; Siddiqui and Barton, 1995; Swindells, 1995). They differ in the criteria used to define a structural domain.
Tools to visualize the modular arrangement of protein sequences, in which domains are identified by sequence comparison, are now available (Gouzy et al., 1997; Schultz et al., 1998). It is important to note that their definition of a 'domain', as a sequence motif, is different from that used here, although a sequence motif may encompass an entire structural domain.
Here we describe a program, DOMPLOT, which produces schematic descriptions of protein chains in terms of their constituent structural domains. Unlike the sequence-based tools above, these diagrams are based on three-dimensional structural information and incorporate annotations derived from the co-ordinates. The diagrams take the form of a series of linked boxes, each box corresponding to all or part of a domain. The program is completely general in that it can work for any protein chain, provided that the co-ordinates are in Protein Data Bank (PDB) format (Bernstein et al., 1977) and that the domains have previously been assigned. The output is in PostScript format (Adobe Systems Inc., 1985).
The diagrams illustrate the pattern of interactions between each domain and any metal ions, ligands and/or nucleic acids to which it binds, allowing a fast analysis of the location of specific intermolecular interactions with respect to the domain sequence. PROSITE (Bairoch et al., 1997) information is also included.
The program has found a number of applications. The output gives a simple representation of the results of more complex atomic comparisons at the domain level, facilitating the comparison of many structures. It has been adapted to read multiple structural alignment files generated by CORA (Orengo, 1998). The output provides a concise summary of regions of structural equivalence between two or more protein chains. Conservation of residue interactions is immediately apparent, making it a useful tool to aid detection of protein homology. It can also be used for the comparison and verification of domain assignments.
Materials and methods |
---|
The domain assignments used here are those derived with a consensus approach (Jones et al., 1998), which was developed for the structural database CATH, although in practice any domain definitions can be used. The consensus approach applies three independent algorithms, PUU (Holm and Sander, 1994), DETECTIVE (Swindells, 1995) and DOMAK (Siddiqui and Barton, 1995). When the three algorithms are in agreement, the domain boundaries are assigned automatically; otherwise the protein structure is inspected by eye.
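The decision rule of the consensus approach can be illustrated with a minimal Python sketch. The text does not state how agreement between the algorithms is measured, so the comparison used here (the same number of boundaries, each within a `tolerance` of residues) is an assumption made for illustration only, as are the function names and the input layout.

```python
from typing import Dict, List, Optional


def boundaries_agree(a: List[int], b: List[int], tolerance: int = 0) -> bool:
    """Return True if two boundary lists match position by position.

    Assumption: 'agreement' means the same number of boundaries, each
    within `tolerance` residues. The paper does not define the criterion.
    """
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tolerance for x, y in zip(sorted(a), sorted(b)))


def consensus_boundaries(
    assignments: Dict[str, List[int]], tolerance: int = 0
) -> Optional[List[int]]:
    """Return the boundaries of the first algorithm if all algorithms agree.

    `assignments` maps an algorithm name to its list of domain boundary
    residue numbers. None is returned when the algorithms disagree, which
    signals that the structure should be inspected by eye.
    """
    results = list(assignments.values())
    reference = results[0]
    if all(boundaries_agree(reference, other, tolerance) for other in results[1:]):
        return sorted(reference)
    return None
```

A return value of `None` stands in for the manual inspection step; the hypothetical `tolerance` parameter simply makes the strictness of the comparison explicit.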
Domain sizes
Almost one third of structural domains are discontinuous in that they are constructed from two or more non-sequential segments of the polypeptide chain (Jones et al., 1998). DOMPLOT must be able to deal with such domains, and it is necessary to determine the number of residues within each sequence segment. The size of each segment within each domain is defined as the total number of residues within the sub-sequence, rather than the number of residues of known structure within the domain segment.
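As a minimal sketch of this size definition, the Python below treats each segment as an inclusive pair of residue numbers and counts every position in the range, whether or not co-ordinates exist for it. The representation and function names are hypothetical, and the sketch assumes plain sequential integer numbering without insertion codes.

```python
from typing import List, Tuple

# A segment is an inclusive (first_residue, last_residue) pair.
Segment = Tuple[int, int]


def segment_size(segment: Segment) -> int:
    """Total number of residues spanned by the segment, gaps included."""
    first, last = segment
    if last < first:
        raise ValueError("segment end precedes segment start")
    return last - first + 1


def domain_size(segments: List[Segment]) -> int:
    """Sum of the sizes of all segments that make up one domain."""
    return sum(segment_size(s) for s in segments)


def is_discontinuous(segments: List[Segment]) -> bool:
    """A domain built from two or more segments is discontinuous."""
    return len(segments) > 1
```

For example, a domain built from residues 1 to 40 and 120 to 189 has segment sizes of 40 and 70, even if some of those residues have no structural information.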
Interaction information and PROSITE motifs
Annotation by ligand contacts requires a list of intermolecular interactions, generated by the program GROW (Milburn et al., 1998). GROW processes the output of HBPLUS (McDonald and Thornton, 1994), an algorithm which identifies hydrogen bonds and other non-bonding interactions for a given PDB file. All possible positions for hydrogen atoms (H) are computed for donor atoms (D) which satisfy specified geometrical criteria with acceptor atoms (A) in the vicinity. The criteria used here are that the H–A distance is <2.7 Å, the D–A distance is <3.3 Å, the D–H–A angle is >90° and the H–A–AA angle is >90°, where the AA atom is the one attached to the acceptor. Non-bonded contacts are defined as those between atoms less than 3.9 Å apart.
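The geometrical criteria quoted above can be written as a short Python check. This is an illustrative sketch, not the HBPLUS implementation: it assumes the hydrogen position has already been computed, takes atoms as plain (x, y, z) tuples in Å, and uses hypothetical function names. The thresholds are the ones stated in the text.

```python
import math
from typing import Sequence

Point = Sequence[float]  # (x, y, z) co-ordinates in Angstroms


def angle_deg(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at vertex b, formed by the atoms a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hydrogen_bond(d: Point, h: Point, a: Point, aa: Point) -> bool:
    """Apply the four criteria: donor D, hydrogen H, acceptor A,
    and AA, the atom attached to the acceptor."""
    return (
        math.dist(h, a) < 2.7            # hydrogen to acceptor distance
        and math.dist(d, a) < 3.3        # donor to acceptor distance
        and angle_deg(d, h, a) > 90.0    # angle at the hydrogen
        and angle_deg(h, a, aa) > 90.0   # angle at the acceptor
    )


def is_nonbonded_contact(p: Point, q: Point, cutoff: float = 3.9) -> bool:
    """Two atoms are in non-bonded contact if closer than the cutoff."""
    return math.dist(p, q) < cutoff
```

`math.dist` requires Python 3.8 or later.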
GROW classifies all interactions identified by HBPLUS according to the pairs PP, PL, PN, PM, NL, NM and LM (where P represents protein; L, ligand; N, nucleic acid; and M, metal). Only the first four types, those involving the protein, are relevant in DOMPLOT. If active-site information is available in the PDB file, it is also stored.
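The pair classification can be sketched in Python as follows. The GROW file format is not described here, so the sketch works on single-letter molecule types only; the function names and the ordering table are assumptions chosen to reproduce the pair codes listed in the text.

```python
# Rank used to order the two letters of a pair code so that the result
# matches the codes in the text (PP, PL, PN, PM, NL, NM, LM).
_RANK = {"P": 0, "N": 1, "L": 2, "M": 3}

# The four interaction types that involve the protein.
RELEVANT_PAIRS = {"PP", "PL", "PN", "PM"}


def pair_code(kind_a: str, kind_b: str) -> str:
    """Return the two-letter class for a pair of molecule types.

    Each argument is one of 'P' (protein), 'N' (nucleic acid),
    'L' (ligand) or 'M' (metal); the order of the arguments is ignored.
    """
    first, second = sorted((kind_a, kind_b), key=_RANK.__getitem__)
    return first + second


def is_relevant(code: str) -> bool:
    """True for the protein-containing classes used in the diagrams."""
    return code in RELEVANT_PAIRS
```

A contact between a metal ion and a protein atom is classed as `PM` and kept, whereas a ligand to nucleic acid contact is classed as `NL` and ignored.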
PROSITE information is obtained from a file listing all PROSITE motifs for each protein structure in the PDB (Kasuya and Thornton, 1999). Common motifs such as those corresponding to glycosylation sites are usually not included.
Drawing the plot
Figure 1A shows a MOLSCRIPT (Kraulis, 1991) diagram of the ATPase fragment of a heat-shock cognate 70 kDa protein (PDB code 1atr) (O'Brien and McKay, 1993). A simplified DOMPLOT picture, highlighting the domain organization using the same colour-coding, is shown in Figure 1B, and the standard DOMPLOT diagram of the protein chain is shown in Figure 1C.
Gaps of residues for which there is no structural information are represented by dotted lines, the lengths of which correspond to the size of the gap. (If the gap is at the N- or C-terminus, which can be very common, only a short dotted line is drawn regardless of the gap's length.) Gaps situated between domain segments are represented by dotted fragment links, whereas those lying within a domain segment are represented by two dotted lines so as to maintain the continuity of the domain segment box (not shown in this example).
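The gap rules above can be expressed as a small Python sketch. It is illustrative only: the function names, the `scale` (drawing units per residue) and the fixed `terminal_length` are hypothetical values, not those used by DOMPLOT, and residues are assumed to be numbered with plain sequential integers.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (first_residue, last_residue)


def classify_gap(gap: Span, segments: List[Span],
                 chain_start: int, chain_end: int) -> str:
    """Classify a gap as 'terminal', 'within_segment' or 'between_segments'."""
    first, last = gap
    if first == chain_start or last == chain_end:
        return "terminal"
    for seg_first, seg_last in segments:
        if seg_first <= first and last <= seg_last:
            return "within_segment"
    return "between_segments"


def dotted_line_length(gap: Span, kind: str,
                       scale: float = 1.0,
                       terminal_length: float = 5.0) -> float:
    """Length of the dotted line drawn for a gap.

    Terminal gaps get a fixed short line regardless of their size;
    all other gaps are drawn in proportion to the number of residues.
    """
    if kind == "terminal":
        return terminal_length
    first, last = gap
    return (last - first + 1) * scale
```

A gap classed as `within_segment` would be drawn as two dotted lines so that the segment box stays continuous, while a `between_segments` gap becomes a dotted fragment link.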
Residues involved in intermolecular interactions as given in the GROW output file are represented by a vertical line at the appropriate position along the segment box. The line is coloured to indicate the identity of the ligand with which it interacts. Black vertical lines denote active-site residues. Lines with two or more colours indicate residues bound to two or more ligands. Labels which give the residue number (and insertion code) of the interacting residues are placed as close as possible to the corresponding interaction lines so as to avoid overlap. A horizontal line placed immediately below all or part of a segment box and/or fragment line denotes a PROSITE motif. These lines are coloured to distinguish between two or more motifs with different PROSITE accession codes.
The boundaries of the segments within each domain are given in plain text below the graphics, as well as those ions, ligands and/or nucleic acid chains with which the domain interacts. The CATH number of the homologous superfamily to which each domain is assigned may also be given, as well as the name, code and sequence of each PROSITE motif. The bound molecules and PROSITE names are coloured according to the colouring scheme used in the box plot.
Applications |
---|
The TIM barrel owes its name to chicken triosephosphate isomerase, in which the α/β barrel was first observed. The TIM barrel structural domain comprises eight parallel β-strands surrounded by seven or eight α-helices, and is found in a wide variety of enzymes. Whilst the evolutionary path of many fold families is well defined, that of the α/β barrel is less clear. The low sequence similarity and diverse functional activity amongst the barrel enzymes support a mechanism by which they have converged to a stable protein fold. However, the active site is invariably found at the C-terminus of the barrel domain, suggesting also that α/β barrel proteins may have diverged from a common ancestor.
Figure 2A illustrates the structural domain organization of four protein chains which contain a flavin-binding TIM barrel domain. The PDB codes 1oya (Fox and Karplus, 1994), 2tmd (Lim et al., 1986; Mathews et al., 1993, unpublished data), 1gox (Lindqvist, 1989) and 1fcb (Xia and Mathews, 1990) correspond to old yellow enzyme (OYE), trimethylamine dehydrogenase (TMDH), glycolate oxidase (GO) and flavocytochrome b2 (FCB2), respectively. Figure 2B gives a DOMPLOT picture for the output of a CORA multiple structural alignment (Orengo, 1998) of the TIM barrel domains in each protein. This program compares the structural environments of the residues and employs a double dynamic programming algorithm. From this concise DOMPLOT summary of the detailed structural comparison (Figure 2B) it is apparent that there are two structurally similar pairs of domains: 1oya and the first domain of chain A of 2tmd, and 1gox and the second domain of chain B of 1fcb. The high conformational similarities of the TIM barrel domains of GO and FCB2 (Lindqvist et al., 1991), and of OYE and TMDH (Fox and Karplus, 1994), have previously been established. Pairwise sequence identities, SSAP scores (Taylor and Orengo, 1989) and r.m.s. deviations of these four domains are given in Table I.
The question of whether conservation of the position of the active site in the TIM barrels is evidence for their divergence from a common ancestor has arisen repeatedly (Branden, 1991; Farber and Petsko, 1990). Farber and Petsko clustered GO, FCB2 and TMDH into the same homologous sub-family on the basis of structural and chemical properties, and given the high structural similarity of OYE and TMDH, old yellow enzyme would also be classified in this group. In contrast, Scrutton (1994) grouped the pairs OYE and TMDH, and GO and FCB2, into two separate families, given the sequence identities and modular assembly of these and other flavin oxidase/dehydrogenase enzymes. He reached no conclusion on the evolutionary origin of these two sub-classes. Wilmanns et al. (1991) detected sequence similarity in the phosphate binding site regions of nine TIM barrels, including GO, FCB2 and TMDH, and proposed that this observation is indicative of divergent evolution.
The example illustrates how DOMPLOT can be used to provide a clear and concise summary of detailed structural comparisons; with the annotation of ligand contacts, structural conservation of binding sites is immediately evident. The high conformational similarity of the FMN-binding site in all four barrels, together with the low sequence identity between the two homologous pairs, suggests that the pairs may have diverged from an early ancestor. The program is a valuable tool when used in conjunction with others to aid the detection of remote protein homology.
Availability |
---|
Acknowledgments |
---|
Notes |
---|
References |
---|
Bairoch,A., Bucher,P. and Hofmann,K. (1997) Nucleic Acids Res., 25, 217–221.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Branden,C.-I. (1991) Curr. Opin. Struct. Biol., 1, 978–983.
Farber,G.K. and Petsko,G.A. (1990) Trends Biochem. Sci., 15, 228–234.
Fox,K. and Karplus,P.A. (1994) Structure, 2, 1089–1105.
Gouzy,J., Eugene,P., Greene,E.A., Kahn,D. and Corpet,F. (1997) Comput. Appl. Biosci., 13, 601–608.
Holm,L. and Sander,C. (1994) Proteins, 19, 256–268.
Islam,S.A., Luo,J. and Sternberg,M.J.E. (1995) Protein Engng, 8, 513–525.
Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Karplus,P.A., Fox,K.M. and Massey,V. (1995) FASEB J., 9, 1518–1526.
Kasuya,A. and Thornton,J.M. (1999) J. Mol. Biol., 286, 1673–1691.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Lim,L.W., Shamala,N., Mathews,F.S., Steenkamp,D.J., Hamlin,R. and Xuong,N. (1986) J. Biol. Chem., 261, 15140–15146.
Lindqvist,Y. (1989) J. Mol. Biol., 209, 151–166.
Lindqvist,Y., Branden,C.-I., Mathews,F.S. and Lederer,F. (1991) J. Biol. Chem., 266, 3198–3207.
McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.
Milburn,D., Laskowski,R. and Thornton,J.M. (1998) Protein Engng, 11, 855–859.
O'Brien,M.C. and McKay,D.B. (1993) J. Biol. Chem., 268, 19656–19658.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Orengo,C.A. (1998) Protein Sci., in press.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 246–253.
Schultz,J., Milpetz,F., Bork,P. and Ponting,C.P. (1998) Proc. Natl Acad. Sci. USA, 95, 5857–5864.
Scrutton,N.S. (1994) BioEssays, 16, 115–122.
Siddiqui,A.S. and Barton,G.J. (1995) Protein Sci., 4, 872–884.
Swindells,M.B. (1995) Protein Sci., 4, 103–112.
Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208, 1–22.
Wilmanns,M., Hyde,C.C., Davies,D.R., Kirschner,K. and Jansonius,J.N. (1991) Biochemistry, 30, 9161–9169.
Xia,Z.-X. and Mathews,F.S. (1990) J. Mol. Biol., 212, 837–863.
Received September 22, 1998; revised January 1, 1999; accepted January 22, 1999.