1 Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK, 3 Department of Crystallography, Birkbeck College, Malet Street, London WC1E 7HX, UK
Abstract |
---|
Keywords: CATH database/functional annotation/homologous superfamily/protein domains/structural alignments
Introduction |
---|
Protein structure classification databases, CATH (Orengo et al., 1997), SCOP (Murzin et al., 1995), Dali Domain Dictionary (Holm and Sander, 1999), 3Dee (Siddiqui and Barton, 1997), DDBASE (Sowdhamini et al., 1996) and ENTREZ/MMDB (Hogue et al., 1996; Marchler-Bauer et al., 1999) are now well established and provide frameworks for ordering the known protein universe. These databases cluster proteins into fold groups or evolutionary families using manual methods (Murzin et al., 1995) or automatic structure comparison methods, i.e. SSAP (Taylor and Orengo, 1989), DALI (Holm and Sander, 1993), STAMP (Russell and Barton, 1992), DIAL (Sowdhamini and Blundell, 1995) and VAST (Gibrat et al., 1997). There is no consensus definition of fold similarity, and groups apply different criteria for assigning proteins to fold groups (reviewed in Orengo, 1994, and Brown et al., 1996). More recently, databases have emerged that present structural alignments for selected protein families. HOMSTRAD (Mizuguchi et al., 1998a) contains 372 alignments for homologous families originally developed by Overington et al. (1990). CAMPASS (Sowdhamini et al., 1998) is a database of 52 structurally aligned protein superfamilies derived from DDBASE. In both cases the alignments are annotated with structural features using JOY (Mizuguchi et al., 1998b).
CATH and SCOP are currently the largest manually validated hierarchical classifications of protein domain structures (Hubbard et al., 1999; Orengo et al., 1999). Both of these databases group proteins into homologous families and superfamilies. These levels are interesting to many biologists as they cluster proteins that have descended from a common ancestral gene and whose core structural and functional features have often been conserved by evolution (Chothia, 1984; Overington et al., 1990). Identifying the homologous superfamily of a protein of interest can be an important step in determining the biological role of the protein. Proteins are called analogues when they share a common fold but there is no definitive evidence that they are related by evolution. This has been observed to occur in some highly populated fold groups described as superfolds (Orengo et al., 1994) or frequently occurring domains (FODs, Brenner et al., 1997). They are particularly common in nature, possibly owing to the inherent thermodynamic stability of the fold and prevalence of common recurring structural motifs (Salem et al., 1999).
Homologous relationships can be identified relatively easily if the divergent proteins retain high sequence identities of 30% or more (Chothia and Lesk, 1986; Sander and Schneider, 1991; Rost, 1999). The presence of a functional sequence motif (PROSITE, Hofmann et al., 1999) or set of motifs (PRINTS, Attwood et al., 1999) can be used to detect more distant sequence relationships. More recently, sequence searching methods that use profile-based approaches or intermediate sequences, e.g. PSI-BLAST (Altschul et al., 1997), hidden Markov models (SAM-T98, Hughey and Krogh, 1996), ISS (Park et al., 1997) and MISS (Salamov et al., 1999), have also been shown to detect more distant homologues than pairwise sequence techniques (Park et al., 1998; Salamov et al., 1999). Sequence databases of homologous protein families and their associated multiple alignments have been established using these methods, e.g. PROSITE, PRINTS, PFAM (Bateman et al., 1999), SMART (Ponting et al., 1999) and PRODOM (Corpet et al., 1999).
More distant evolutionary relationships (<20% sequence identity) are difficult to elucidate without a combination of structural and functional evidence to prove homology (see Murzin, 1996, 1998 for reviews). SCOP uses the literature to manually identify unusual structural features (e.g. beta-bulges) and key conserved residues involved in structure stabilization, substrate or co-factor binding or catalysis (Murzin et al., 1995; Brenner et al., 1996). In the CATH database, high structural similarity indicated by the SSAP structure comparison algorithm is used to infer homologous proteins that are then validated by checking the literature and available functional information. While CATH and SCOP contain similar classifications for close homologues, the difficulty in identifying distant relationships means that they differ for the more distant homologues (Hadley and Jones, 1999). Other approaches for identifying homologous proteins are based on deriving core structures for protein families (Schmidt et al., 1997; Matsuo and Bryant, 1999; Orengo, 1999) or comparing functional descriptions, e.g. SWISS-PROT keywords (Holm and Sander, 1997).
Manual validation of very distant evolutionary homologues using functional data can be a very time-consuming step. However, because there is currently no consistent format for most functional data, a purely automated approach can sometimes fail to detect significant relationships and can lead to incorrect homologue assignment. The possibility of functional inheritance errors in sequence databases (Bork and Koonin, 1998) highlights the need for a similarly cautious approach to homologue classification for structural databases.
For this reason we have developed the CATH Dictionary of Homologous Superfamilies (DHS), a Web-based resource that provides multiple structural alignments for each superfamily containing more than one non-identical representative (362 families) and facilitates the recognition of distant homologues. Alignments and structural profiles for each superfamily have been generated using the CORA suite of programs (Orengo, 1999). The alignments permit the identification of consensus superfamily-specific properties (e.g. conserved protein–ligand interactions). These consensus features can be used to verify the results of sequence (e.g. PSI-BLAST) or structure (e.g. SSAP, CORA structural profiles) comparison methods that have inferred a distant homologous relationship to a protein in a CATH superfamily.
We illustrate the value of the DHS as a diagnostic tool using the pyridoxal-5'-phosphate (PLP) binding aspartate aminotransferase superfamily which contains diverse sequences and a broad range of functions. The CORA multiple alignments can easily be downloaded from the DHS Web pages and will be updated with each CATH database release.
Materials and methods |
---|
The CATH database (release 1.5, October 1998) provides a hierarchical classification of 14 382 protein domain structures in the PDB (Orengo et al., 1997). The major classification levels are protein class (C), architecture (A), topology (T), homologous superfamily (H) and sequence family (S) (see Table I). To reduce the level of redundancy in the PDB, the proteins in CATH with >95% sequence identity are clustered into 2807 near-identical protein families (N-level). The CATH-95 dataset includes one representative structure from each N-level family and is used for all structural comparisons in the construction of the DHS. Sequence families contain close evolutionary relatives that are identified using sequence comparison methods (Needleman and Wunsch, 1970) and conservative sequence identity cut-offs (≥35%).
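The grouping of domains into near-identical (N-level) families can be pictured with a small sketch. The text does not say which clustering scheme was used, so single-linkage clustering over precomputed pairwise identities is an assumption here, as are the function names and the choice of the first member as the representative.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_near_identical(
    domains: Sequence[str],
    identities: Dict[Tuple[str, str], float],
    cutoff: float = 95.0,
) -> List[List[str]]:
    """Single-linkage clustering (an assumption): two domains share a
    family if a chain of pairs with identity > cutoff connects them."""
    parent = {d: d for d in domains}

    def find(x: str) -> str:
        # Walk to the root of x's set, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in identities.items():
        if identity > cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families: Dict[str, List[str]] = {}
    for d in domains:
        families.setdefault(find(d), []).append(d)
    return list(families.values())


def pick_representatives(families: List[List[str]]) -> List[str]:
    """Hypothetical rule: take the first listed member of each family."""
    return [family[0] for family in families]
```

With domains `d1` to `d4` and identities of 98% (`d1`, `d2`), 96% (`d2`, `d3`) and 40% (`d3`, `d4`), this yields two families, one with three members and one singleton.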
Generation of structure comparison data using SSAP.
In order to provide data on sequence and structure relationships within each fold group in CATH, SSAP structural comparisons were performed for each pair of CATH-95 domains within each fold (117 239 pairwise comparisons). Protein pairs within the same fold and same superfamily are defined as homologues whereas those within the same fold but in different superfamilies are analogues. Fold level comparisons provide a complete dataset for analysing analogues and homologues and checking for any incorrect classifications. SSAP returns a normalized score between 0 and 100 for each pairwise comparison. Scores above 70 accompanied by significant residue overlap (60%) indicate proteins with the same fold or topology (T). Higher scores (>80) suggest a homologous relationship between two proteins. Sequence identity in this study is defined as the number of identical residues (after structural alignment by SSAP) divided by the number of residues in the smallest protein domain. SSAP score and sequence identity matrices are available for each superfamily in the DHS Web pages.
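The sequence identity definition and the SSAP score thresholds above can be written down as a short sketch. It assumes the structural alignment is supplied as two equal-length strings with `-` marking gaps, that the 60% overlap figure acts as a minimum, and that the overlap requirement also applies to the >80 band; the function names and labels are illustrative and are not part of SSAP.

```python
def sequence_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity: identical aligned residues divided by the
    number of residues in the smaller of the two domains."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    length_a = sum(1 for x in aligned_a if x != "-")
    length_b = sum(1 for y in aligned_b if y != "-")
    smallest = min(length_a, length_b)
    if smallest == 0:
        raise ValueError("each sequence must contain at least one residue")
    return 100.0 * identical / smallest


def interpret_ssap(score: float, overlap: float) -> str:
    """Map a normalized SSAP score (0-100) and residue overlap (%)
    onto the bands described in the text."""
    if overlap >= 60.0 and score > 80.0:
        return "possible homologue"
    if overlap >= 60.0 and score > 70.0:
        return "same fold"
    return "no clear structural similarity"
```

For example, the aligned pair `ACD-EF` and `ACDKE-` has four identical positions and two five-residue domains, giving 80% identity.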
Automatic validation of structural relatives (DHS-VALID).
The DHS-VALID program is used to check automatically all the pairwise sequence and structure comparison data generated for each fold group and homologous superfamily in CATH. Outlying proteins that have low structural similarity scores and low structural overlap percentages against the majority of the relatives are identified. These can then be checked against the known functional information and ligand interaction data and if necessary placed into a newly created homologous family. Similarly, high SSAP scores can identify potentially homologous proteins that are currently classified in different superfamilies.
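The outlier check can be illustrated with a minimal sketch. The actual DHS-VALID criteria are not given in the text, so the thresholds (borrowed from the fold-level SSAP and overlap values), the "more than half of the relatives" rule and the function name below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_outliers(
    pairs: Dict[Tuple[str, str], Tuple[float, float]],
    min_score: float = 70.0,
    min_overlap: float = 60.0,
) -> List[str]:
    """Flag domains whose comparisons against most relatives show both
    a low SSAP score and a low residue overlap.

    pairs maps (domain_a, domain_b) -> (ssap_score, overlap_percent).
    """
    weak = defaultdict(int)   # number of weak comparisons per domain
    total = defaultdict(int)  # number of comparisons per domain
    for (a, b), (score, overlap) in pairs.items():
        is_weak = score < min_score and overlap < min_overlap
        for domain in (a, b):
            total[domain] += 1
            if is_weak:
                weak[domain] += 1
    # "Majority" is interpreted here as strictly more than half.
    return sorted(d for d in total if weak[d] > total[d] / 2)
```

Domains returned by such a check would still need manual inspection against functional and ligand interaction data, as described above.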
Generation of multiple structural alignments using CORA.
Multiple structural alignments were generated using the CORA program (Orengo, 1999) for each of the 362 homologous superfamilies with more than one CATH-95 structure. CORA (Conserved Residue Attributes) is a suite of programs for automatically multiply aligning and analysing protein structural families. CORA uses the pairwise structural comparison data from SSAP to determine the initial set of proteins to be aligned and then identifies conserved characteristics and expresses them as a 3D structural profile for each family. As the profiles encapsulate the critical `core' of the fold and functional sites, which in the case of homologous proteins have been conserved throughout evolution, they are more sensitive at identifying distant structural homologues than comparing against a single structure.
Annotation of structural alignments.
The residues in the multiple structural alignments are annotated by colour in several different ways using the program DHS-PLOT. At the simplest level, plots are colour coded according to secondary structure regions (as defined by DSSP; Kabsch and Sander, 1983), sequence identity and amino acid type (Taylor, 1997). A shaded score bar beneath each alignment indicates the CORA structural conservation score which measures the conservation of the structural environment (dark grey/black are highly conserved regions).
PROSITE sequence patterns (Hofmann et al., 1999; URL: http://www.expasy.ch/prosite) are also included in the multiple alignment data for each homologous superfamily. Only structurally significant PROSITE patterns (from release 1.4) are used, as identified for all known PDB structures (Kasuya and Thornton, 1999). Ligand interaction data is derived from the GROW algorithm (Milburn et al., 1998). Importantly, the structural alignment plots help to identify the consensus PROSITE patterns and ligand interaction positions that exist in the structurally conserved regions for a given protein superfamily.
PROSITE patterns, ligand interactions and domain boundary representations are also brought together in DOMPLOT diagrams (Todd et al., 1999a) which are available for each homologous superfamily. The 3D structural superpositions can be viewed interactively in a RASMOL viewer (Sayle and Milner-White, 1995) using a customized RASMOL script (ROMLAS; R.A.Laskowski, unpublished computer program) that colours the 3D structures to complement the shading of the 2D sequence plots.
Methods for the automatic extraction of functional information.
Protein information as given in the PDB header including the protein name, species, crystallographic or NMR information was extracted directly from PDBsum web pages (Laskowski et al., 1997; URL: http://www.biochem.ucl.ac.uk/bsm/pdbsum). The files from the ENZYME database (Bairoch, 1999, release 23.0, July 1998) and the SWISS-PROT database (Bairoch and Apweiler, 1999, release 35.0) were downloaded from the ExPasy website (URL: http://www.expasy.ch). SWISS-PROT entries give links to structures in the PDB but do not specify individual PDB chains. A cross-reference table of PDB chain to SWISS-PROT entry was derived from searching PDB chain sequences against SWISS-PROT and using the highest identity matches from SWISS-PROT (A.C.R.Martin, personal communication). The EC numbers for PDB chains were automatically extracted from the SWISS-PROT files using this cross-reference table (Martin et al., 1998). The summary information from the ENZYME file (description and reaction entries) and the SWISS-PROT file (comments and keyword entries) was extracted using Perl scripts.
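The chain-to-entry cross-reference step can be sketched as follows. This is an illustration only: the data layout, function names and the identifiers in the usage note are invented, and the original work used Perl scripts rather than Python.

```python
from typing import Dict, Iterable, List, Tuple


def best_swissprot_match(
    hits: Iterable[Tuple[str, str, float]]
) -> Dict[str, str]:
    """Keep, for each PDB chain, the SWISS-PROT entry with the highest
    sequence identity.

    hits yields (pdb_chain, swissprot_id, percent_identity) tuples.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for chain, entry, identity in hits:
        if chain not in best or identity > best[chain][1]:
            best[chain] = (entry, identity)
    return {chain: entry for chain, (entry, _) in best.items()}


def ec_numbers_for_chains(
    xref: Dict[str, str], ec_by_entry: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Look up EC numbers for each chain through the cross-reference
    table; chains whose entry has no EC annotation get an empty list."""
    return {chain: ec_by_entry.get(entry, []) for chain, entry in xref.items()}
```

Given hits `("chainA", "ENTRY_1", 99.0)` and `("chainA", "ENTRY_2", 72.0)`, the chain is mapped to `ENTRY_1`, and any EC numbers recorded for that entry are then attached to the chain.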
Compiling and updating the DHS.
DHS-WEB is a suite of programs and scripts that are used to combine the multiple alignments with the functional data, consensus sequence patterns and information about sequence/structure relationships. The output is a DHS Web page of functional summary tables in HTML for each homologous superfamily. The programs can be run automatically to regenerate the DHS web pages and links to other web sites for each new release of the CATH database as further relatives are added to the homologous superfamilies. The CATH classification data is stored in a Postgres relational database and the DHS data is in flat file format. CATH and the DHS are currently being transferred into an Oracle relational database that will be updated initially on a 6-monthly cycle and subsequently on a quarterly cycle. Once the Oracle database is set up, CATH and DHS mirror sites will be established in three locations (USA, Canada and Cambridge, UK). DHS alignment files will be made available from the CATH ftp site (URL: ftp://ftp.biochem.ucl.ac.uk/pub/cathdata).
Results |
---|
The CATH DHS is a resource providing multiple structural alignments for the 362 well-populated CATH superfamilies and facilities for viewing them (URL: http://www.biochem.ucl.ac.uk/bsm/dhs). These alignments establish which residues are structurally equivalent across a whole superfamily and, since they are annotated with consensus sequence patterns and functional information, the DHS provides a valuable diagnostic tool for detecting remote homologous relationships based on the identification of consensus superfamily-specific features.
Conserved active site residues, ligand interactions or sequence motifs often provide the functional signature necessary to prove an evolutionary link. The DHS resource summarizes these properties for each superfamily on a single Web page. It is linked to the CATH structure classification server (URL: http://www.biochem.ucl.ac.uk/bsm/cath/server) allowing the biologist to scan the CATH database with a newly determined structure and then access the DHS pages for each match to assess the potential evolutionary relationship.
Using the DHS as a diagnostic tool for identifying remote homologues in the PLP-dependent aspartate aminotransferase superfamily (large domain)
Pyridoxal-5'-phosphate (PLP, a vitamin B6 derivative) is a versatile cofactor, able to catalyse many different reactions involved in nitrogen metabolism in all organisms (Jansonius, 1998). The PLP-dependent aspartate aminotransferase superfamily is presented here to demonstrate the use of the DHS as a resource for homologue classification within the CATH database. Recent studies of PLP-dependent enzymes have shown that there are five distinct PLP-binding domain folds (Denessiouk et al., 1999). All the enzymes have PLP bound to an active site lysine, forming an internal aldimine. Once the amino acid substrate reacts with the cofactor, any one of three remaining bonds around the Cα may be cleaved, enabling a broad range of reactions including transamination, racemization and decarboxylation. Reaction specificity is due to interactions with the groups surrounding the Cα of the substrate that favour a particular bond cleavage (Martell, 1982).
In the PLP-dependent aspartate aminotransferase superfamily, all the enzymes have two distinct domains that have different topologies (Figure 2A) and so are classified in different superfamilies in the CATH domain database. The PLP binds covalently to a conserved lysine residue (in the large domain) at the bottom of the interdomain active site cleft (Figure 2B). The large domain has three-layer sandwich architecture with a seven-stranded mixed β-sheet forming the domain core that is surrounded by helices on both sides (CATH code 3.40.640). The central β-sheet is mostly parallel with one anti-parallel edge strand. This is reminiscent of the Rossmann fold which has the same three-layer sandwich architecture but has a six-stranded parallel β-sheet at the domain core. Figure 2C shows the TOPS diagram (Westhead et al., 1998) to represent schematically the topology of the large domain. The small domain has an α–β plait topology and can be found in DHS Web pages for CATH code 3.30.70.160.
Using the DHS data to analyse sequence and structure relationships in CATH
DHS data on pairwise sequence and structural similarity within each fold can be used to analyse structural relationships within different CATH fold groups and superfamilies. For example, Figure 6 shows a sequence–structure plot for the PLP aspartate aminotransferase family, illustrating that the current CATH family contains diverse relatives (<15% sequence identity) as well as close homologues containing high structural similarity (SSAP score ≥80) and significant sequence identity (≥25%). The DHS contains a sequence–structure plot for each homologous superfamily.
By contrast, in the homologue distributions for both TIM barrel folds (Figure 7B) and Rossmann folds (Figure 7C) there are large peaks at 6% sequence identity, much lower than for the globin fold. These homologous distributions have considerable overlap with their respective analogous distributions. The shift to lower sequence identities relative to the globin distributions may be due to the considerable functional diversity of TIM barrel and Rossmann fold proteins. If just the single domain proteins are considered, there are 46 enzyme functions in the TIM barrel fold and 37 different functions in the Rossmann fold (A.E.Todd, unpublished data). This functional diversity may be the consequence of a stable structural framework that is tolerant to extensive changes in the sequence so that there remains very little similarity in the sequences and no common sequence motifs between proteins having different functions. In these folds, analogous pairs may be very distantly related homologues. However, this is very difficult to prove without the presence of intermediate sequences or further evolutionary evidence such as proteins being part of a common metabolic pathway.
The homologous distribution of the immunoglobulin superfold unusually exhibits two large peaks below 30% sequence identity (Figure 7D). There are 30 homologous superfamilies in the immunoglobulin fold with a total of 320 CATH-95 representative domains. The fold population is strongly biased towards the largest superfamily (CATH code 2.60.40.10) that contains 220 representatives and includes the antibody protein domains. One possible explanation for the peaks is that the homologous domains of the antibody proteins have arisen through gene duplication but have diverged to perform different functions within the antibody molecule. The variable domains from the light and heavy chains (VL and VH) are required for binding to the antigen while the constant domains (CL and CH) have a structural role. The peak around 23% arises from identities between two functionally similar domains in the same antibody molecule (i.e. VL–VH or CL–CH). The lower peak around 10% sequence identity is caused by the identities from dissimilar domains (i.e. VL–CL, VH–CH, VL–CH and VH–CL). The peaks above 30% are due to comparing the same domains (e.g. VL–VL or VH–VH) from different antibody molecules in the PDB.
These observations suggest that once functional constraints have been removed, sequences can diverge much further while retaining the same fold. As the function diverges within a homologous family (e.g. in paralogous proteins) the degree of sequence and structural similarity resembles that of analogous proteins and it becomes difficult to distinguish between analogous proteins and very distant homologues.
Considering also structural similarity, we observe that in general analogous protein pairs often have lower SSAP scores than homologous pairs and <25% sequence identity (see, for example, Figure 7D for the immunoglobulin fold). A full analysis of all the 117 239 pairwise relationships in the CATH-95 dataset showed that no analogues exhibit the combined criteria of SSAP ≥80, sequence identity ≥25% and residue overlap ≥70%. As manual validation of homologues can be slow, even with the benefit of the DHS, these empirical cut-offs can be used to assign homologous proteins automatically to CATH superfamilies. In a recent dataset of 2646 new domain structures (1879 chains), 64% could be classified as homologues using cautious sequence identity cut-offs (≥35%); a further 9.8% could be assigned using stringent E-value cut-offs in PSI-BLAST (see Materials and methods). The combined sequence–structure criteria identified a further 2.9% of homologous pairs. Of the remaining domains, 4.6% were found to be distant homologues using the DHS Web pages and assigned to existing CATH superfamilies, 6.9% were assigned to new superfamilies within an existing fold group as they lacked sufficient evidence to prove homology and 11.8% were identified as novel folds (see Figure 8). These automatic homologue assignment criteria in combination with the DHS should enable the CATH classification to keep pace with the rapid increase in structure determination expected from structural genomic initiatives.
Discussion |
---|
One of the main purposes of studying the 3D structure of a protein is to gain clearer insights into its specific function and biological role. However, for many protein families there are not yet any structural data, e.g. only ~1000 structural families are known compared with ~20 000 sequence families. For these cases information on biological role can sometimes be gleaned by considering the functions of related protein sequences. This technique of functional inheritance is now routinely applied when analysing and annotating novel genome sequences. Structural genomic initiatives have now been established (Pennisi, 1998) to determine structural representatives for each of the 20 000 sequence families, making the prospect of functional inheritance through structural similarity increasingly feasible. This highlights the growing importance of incorporating functional annotations for structures and related genomic sequences into the DHS.
The CATH Protein Family Database (CATH-PFDB) of genomic sequences has recently been established to complement the CATH structural database (Pearl et al., 2000). The CATH-PFDB contains over 100 000 clear homologues from the sequence databases (e.g. translated GenBank) that have been identified using PSI-BLAST to be closely related to structural domains in CATH. These sequences and their associated functional annotations (e.g. from SWISS-PROT) will be incorporated into the DHS to increase significantly the functional information available for each structural superfamily. A PSI-BLAST server (URL: http://www.biochem.ucl.ac.uk/bsm/PSI-CATH) has been developed to allow the user to scan the CATH database with a new protein sequence. Hits to CATH structures will be linked to the DHS to allow consideration of functional variation within the potential superfamily. This should help to improve the accuracy of functional inheritance for sequences assigned to CATH superfamilies, particularly for cases where there is greater functional diversity so that any properties should be inherited more cautiously.
Acknowledgments |
---|
Notes |
---|
References |
---|
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
Attwood,T.K., Flower,D.R., Lewis,A.P., Mabey,J.E., Morgan,S.R., Scordis,P., Selley,J.N. and Wright,W. (1999) Nucleic Acids Res., 27, 220–225.
Bairoch,A. (1999) Nucleic Acids Res., 27, 310–311.
Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49–54.
Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Finn,R.D. and Sonnhammer,E.L.L. (1999) Nucleic Acids Res., 27, 260–262.
Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) Nucleic Acids Res., 27, 12–17.
Bork,P. and Koonin,E.V. (1998) Nature Genet., 18, 313–318.
Brenner,S.E., Chothia,C., Hubbard,T.J.P. and Murzin,A.G. (1996) Methods Enzymol., 266, 635–643.
Brenner,S.E., Chothia,C. and Hubbard,T.J.P. (1997) Curr. Opin. Struct. Biol., 7, 369–376.
Brenner,S.E., Chothia,C. and Hubbard,T.J.P. (1998) Proc. Natl Acad. Sci. USA, 95, 6073–6078.
Brown,N.P., Orengo,C.A. and Taylor,W.R. (1996) Comput. Chem., 20, 359–380.
Chothia,C. (1984) Annu. Rev. Biochem., 53, 537–572.
Chothia,C. and Lesk,A.M. (1986) EMBO J., 5, 823–826.
Corpet,F., Gouzy,J. and Kahn,D. (1999) Nucleic Acids Res., 27, 263–267.
Denessiouk,K.A., Denesyuk,A.I., Lehtonen,J.V., Korpela,T. and Johnson,M.S. (1999) Proteins: Struct. Funct. Genet., 35, 250–261.
Gibrat,J.F., Madej,T., Spouge,J.L. and Bryant,S.H. (1997) Biophys. J., 72, 298.
Hadley,C. and Jones,D.T. (1999) Structure, 7, 1099–1112.
Hofmann,K., Bucher,P., Falquet,L. and Bairoch,A. (1999) Nucleic Acids Res., 27, 215–219.
Holm,L. and Sander,C. (1993) J. Mol. Biol., 233, 123–138.
Holm,L. and Sander,C. (1997) In Gaasterland,T., Karp,P., Karplus,K., Ouzounis,C., Sander,C. and Valencia,A. (eds), Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA., pp. 140–146.
Holm,L. and Sander,C. (1999) Nucleic Acids Res., 27, 244–247.
Hogue,C.W.V., Ohkawa,H. and Bryant,S.H. (1996) Trends. Biochem. Sci., 21, 226–229.
Hubbard,T.J.P. and Blundell,T.L. (1987) Protein Engng, 1, 159–171.
Hubbard,T.J.P., Ailey,B., Brenner,S.E., Murzin,A.G. and Chothia,C. (1999) Nucleic Acids Res., 27, 254–256.
Hughey,R. and Krogh,A. (1996) Comput. Appl. Biosci., 12, 95–107.
Jansonius,J.N. (1998) Curr. Opin. Struct. Biol., 8, 759–769.
Jones,D.T. (1999) J. Mol. Biol., 287, 797–815.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Kasuya,A. and Thornton,J.M. (1999) J. Mol. Biol., 286, 1673–1691.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) Trends Biochem. Sci., 22, 488–490.
Marchler-Bauer,A., Addess,K.J., Chappey,C., Geer,L., Madej,T., Matsuo,Y., Wang,Y. and Bryant,S. (1999) Nucleic Acids Res., 27, 240–243.
Martell,A.E. (1982) Adv. Enzymol. Relat. Areas Mol. Biol., 53, 163–199.
Martin,A.C., Orengo,C.A., Hutchinson,E.G., Jones,S., Karmirantzou,M., Laskowski,R.A., Mitchell,J.B., Taroni,C. and Thornton,J.M. (1998) Structure, 6, 875–884.
Matsuo,Y. and Bryant,S.H. (1999) Proteins: Struct. Funct. Genet., 35, 70–79.
Merritt,E.A. and Bacon,D.J. (1997) Methods Enzymol., 277, 505–524.
Milburn,D., Laskowski,R.A. and Thornton,J.M. (1998) Protein Engng, 11, 855–859.
Mizuguchi,K., Deane,C.M., Blundell,T.L. and Overington,J.P. (1998a) Protein Sci., 7, 2469–2471.
Mizuguchi,K., Deane,C.M., Blundell,T.L., Johnson,M.S. and Overington,J.P. (1998b) Bioinformatics, 14, 617–623.
Murzin,A.G. (1996) Curr. Opin. Struct. Biol., 6, 386–394.
Murzin,A.G. (1998) Curr. Opin. Struct. Biol., 8, 380–387.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48, 443–453.
Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) (1992) Enzyme Nomenclature. Academic Press, New York.
Orengo,C.A. (1994) Curr. Opin. Struct. Biol., 4, 429–440.
Orengo,C.A. (1999) Protein Sci., 8, 699–715.
Orengo,C.A., Brown,N.P. and Taylor,W.R. (1992) Proteins, 14, 139–167.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631–634.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Orengo,C.A., Pearl,F.M.G., Bray,J.E., Todd,A.E., Martin,A.C., Lo Conte,L. and Thornton,J.M. (1999) Nucleic Acids Res., 27, 275–279.
Overington,J.P., Johnson,M.S., Sali,A. and Blundell,T.L. (1990) Proc. R. Soc. London, Ser B, 241, 132–145.
Park,J., Teichmann,S.A., Hubbard,T. and Chothia,C. (1997) J. Mol. Biol., 273, 349–354.
Park,J., Karplus,K., Barrett,C., Hughey,R., Haussler,D., Hubbard,T. and Chothia,C. (1998) J. Mol. Biol., 284, 1201–1210.
Pearl,F.M.G., Lee,D., Bray,J.E., Sillitoe,I., Todd,A.E., Harrison,A.P., Thornton,J.M. and Orengo,C.A. (2000) Nucleic Acids Res., 28, 277–282.
Pennisi,E. (1998) Science, 279, 978–979.
Ponting,C.P., Schultz,J., Milpetz,F. and Bork,P. (1999) Nucleic Acids Res., 27, 229–232.
Rost,B. (1999) Protein Engng, 12, 85–94.
Russell,R.B. and Barton,G.J. (1992) Proteins: Struct. Funct. Genet., 20, 309–323.
Russell,R.B., Saqi,M.A.S., Sayle,R.A., Bates,P.A. and Sternberg,M.J.E. (1997) J. Mol. Biol., 269, 423–439.
Salamov,A.A., Suwa,M., Orengo,C.A. and Swindells,M.B. (1998) Protein Sci., 8, 771–777.
Salamov,A.A., Suwa,M., Orengo,C.A. and Swindells,M.B. (1999) Protein Engng, 12, 95–100.
Salem,G.M., Hutchinson,E.G., Orengo,C.A. and Thornton,J.M. (1999) J. Mol. Biol., 287, 969–981.
Sander,C. and Schneider,R. (1991) Proteins: Struct. Funct. Genet., 9, 56–68.
Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 20, 374–376.
Schmidt,R., Gerstein,M. and Altman,R.B. (1997) Protein Sci., 6, 246–248.
Siddiqui,A.S. and Barton,G.J. (1997) http://circinus.ebi.ac.uk:8080/3Dee/help/help_intro.html.
Sowdhamini,R. and Blundell,T.L. (1995) Protein Sci., 4, 506–520.
Sowdhamini,R., Rufino,S.D. and Blundell,T.L. (1996) Folding Des., 1, 209–220.
Sowdhamini,R., Burke,D.F., Huang,F., Mizuguchi,K., Nagarajaram,H.A., Srinivasan,N., Steward,R.E. and Blundell,T.L. (1998) Structure, 6, 1087–1094.
Taylor,W.T. (1997) Protein Engng, 10, 743–746.
Taylor,W.T. and Orengo,C.A. (1989) J. Mol. Biol., 208, 1–22.
Todd,A.E., Orengo,C.A. and Thornton,J.M. (1999a) Protein Engng, 12, 375–379.
Todd,A.E., Orengo,C.A. and Thornton,J.M. (1999b) Curr. Opin. Chem. Biol., 3, 548–556.
Wallace,A.C., Laskowski,R.A. and Thornton,J.M. (1995) Protein Engng, 8, 127–134.
Westhead,D.R., Hatton,D.C. and Thornton,J.M. (1998) Trends Biochem. Sci., 23, 35–36.
Received August 8, 1999; revised December 14, 1999; accepted January 6, 2000.