1 Computational Protein Structure Group, Computational Biosciences Section, Life Sciences Division and 3 Center for Engineering Science Advanced Research, Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830-6480, USA
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Keywords: combinatorial optimization/fold recognition/protein threading/protein structure prediction
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
It is generally believed that both singleton fitness and pairwise contact preference between spatially nearby residues, along with alignment gap penalties, need to be considered in scoring the quality of a sequencestructure alignment for the purpose of protein fold recognition. But one problem caused by such a scoring scheme is the lack of effective computational methods which can both guarantee to find an optimal sequencestructure alignment, and do so in an efficient manner (within a practically acceptable amount of time). (Throughout this paper, an optimal alignment means a globally-optimal alignment.) To avoid the computational difficulty, most existing threading programs employ heuristic or, in some cases, stochastic methods for finding an optimal sequencestructure alignment. With this type of threading method, it is difficult to determine if the scoring function is inadequate or the optimizer used did not find the function's global optimum, when the prediction performance is not good. To address this problem, we have recently developed an algorithm which guarantees to find an optimal sequencestructure alignment and does so efficiently when pairwise contacts are considered only between nearby residues, say up to 15 Å between their ß-carbon atoms (Xu et al., 1998b; Xu,Y. and Xu,D., manuscript submitted). We have implemented this algorithm as a computer program, called PROSPECT (PROtein Structure Prediction and Evaluation Computer Toolkit) (Xu,Y. and Xu,D., manuscript submitted). CASP3 provided an excellent opportunity for testing and evaluating the PROSPECT program.
During the CASP3 experiment (CASP, 1998), we noticed that a number of targets had a limited amount of structural information, e.g. disulfide bonds or active sites, available in the literature before their structures were fully solved. For example, target t0072 was known to have three disulfide bonds between certain pairs of cysteines. (Unfortunately, the experimentally-determined structure of t0072 was not made available for the assessment of predictions in the CASP3 experiment. Hence no analysis will be done on t0072 in this paper.) This type of information should help improve the fold recognition and the sequencestructure alignment results when applied to the protein threading process. Because of observations of this kind, we have generalized the PROSPECT threading algorithm to deal with a more general class of threading problems, which we term constrained optimal threading problems, where the goal is to find an optimal sequencestructure alignment under various (mainly geometrical) constraints on certain residues of the unknown structure. Our experience has shown that even a small number of constraints could improve both the prediction accuracy and the computational efficiency significantly.
Materials and methods
The prediction procedure in our CASP3 exercise consists of three main steps for each target: (i) information collection; (ii) information-based (or information-constrained) protein threading using PROSPECT and (iii) post-processing of fold recognition results. Using this procedure, we made structure predictions on all the targets in CASP3. The following section summarizes the prediction targets in CASP3.
Prediction targets in CASP3
There were 43 targets (t0043t0085) for prediction in CASP3. Among them, 35 have their experimental structures available so far. (Target t0054 was used in the CASP3 assessment, but it is not available to the predictors. Hence we do not include it in our discussion.) A detailed description of the targets can be found at the CASP3 official web page (http://PredictionCenter.llnl.gov/casp3/ ). The 35 targets can be roughly grouped into six categories.
The analysis of our performance in CASP3 prediction experiment described in this paper focuses on the 23 targets belonging to the family, superfamily and fold categories. These are the categories of targets where threading is most useful for fold recognition. We include the family category in our analysis mainly to show how PROSPECT improves the accuracy of a sequencestructure alignment over a typical sequence comparison program. Further, we give a brief summary of how PROSPECT performed in the ab initio category.
Pre-processing: information collection
The goal of the information collection step was to extract as much information about each target from the literature or from computational results using publicly available tools as we could within the time limit (roughly 1 day per target). This step included (a) a literature search for structural or functional information about each target; (b) a database search against PDB (Bernstein et al., 1977) using BLAST (Altschul et al., 1997
) for possible homologs; (c) secondary structure prediction using PHD (Rost and Sander, 1993
) and PSA (Stultz et al., 1993
, White et al., 1994
); (d) prediction of motifs or potential active sites by MOTIF (at http://www.motif.genome.ad.jp); (e) multiple sequence alignment against the SWISS-PROT database (Bairoch and Apweiler, 1999
) to determine the potentially conserved positions of each target. The information collected was used either as constraints during the threading process or to narrow down the candidate list for fold recognition.
Protein threading by PROSPECT
Each target was threaded against a subset of FSSP (Holm and Sander, 1996) (release of April 1998), unless a smaller set of remote homologs was identified. (A template fold in PROSPECT's template library needs to be pre-processed, which could take a significant amount of time. The development of the PROSPECT system was not completed until a few weeks into the CASP3 prediction season; hence some of the template folds are not pre-processed and put into our template library in time.) PROSPECT found an optimal alignment between the target and each template in the template library. An `optimal' alignment is determined in two steps: (i) finding an optimal alignment between the target and core elements of the template, while penalizing length differences between the corresponding loop regions, using a divide-and-conquer algorithm (Xu et al., 1998a
; Xu,Y. and Xu,D., manuscript submitted) [cores consist of
-helices and ß-strands, as determined by DSSP (Kabsch and Sander, 1983
)]; (ii) alignments between loop regions are done separately after core elements are aligned, using a sequencesequence alignment algorithm like SmithWaterman (Smith and Waterman, 1981
).
The quality of an alignment between a target and a template's core elements is measured by a linear combination of (a) Es, a singleton fitness term which measures how well a residue-type fits a particular environment, defined by the original residue type in this template position, its secondary structure and solvent accessibility; (b) Ep, a pairwise contact term which measures how preferable to have a pair of particular residue types in close contact (up to 15 Å between ß-carbons) and (c) El, a term which penalizes the length differences of corresponding loops. More specifically, the following scoring function is used to measure an alignment ({S2, T1}, {S4, T2}, . . ., {S2M, TM}; odd-numbered Sis are aligned with loops) between a sequence s and a series of cores T1, . . ., TM:
|
where {S1, S2, . . ., S2M+1} is a partition of s (each Si is a substring of s) with ||S2i|| = ||Ti||, for all i 2 [1, M]; X[k] represents the kth element of X; loop(Ti, Ti+1) denotes the loop length between core Ti and Ti+1 and || · || means the size of a set.
The divide-and-conquer algorithm (Xu et al., 1998a; Xu,Y. and Xu,D., manuscript submitted) of PROSPECT finds an alignment which achieves the global optimum of the scoring function (1). A significant and unique property of this algorithm is that it finds the globally-optimal alignment efficiently. The global optimality is achieved by implicitly searching through the whole alignment space, and its computational efficiency is obtained by avoiding explicit examination of a significant portion of the search space, mathematically proven not to contain the global optimum. Table I
summarizes the amount of time used to find an optimal alignment between each target and its best scoring template structure for all the targets except t0075, which PROSPECT incorrectly identified as a new fold. (We did not put the total threading time against the whole template set because the size of this template library varied from target to target.)
|
Information-constrained fold recognition
PROSPECT allows a user to incorporate structure-related information about a specific target during the threading process. This is done by modifying the objective function (1) based on the user's input, and by generalizing the divide-and-conquer algorithm. Four types of constraints are currently allowed: (a) distance constraints between certain pairs of residues in the target sequence; (b) consistency with known or predicted secondary structure information; (c) consistency with given partial alignments; (d) the maximum number of core elements allowed to be removed from the template structure in a threading alignment. Constraints have been used in CASP3 to improve both the fold recognition accuracy and the computational efficiency of threading by reducing the alignment space to be searched. A detailed description of how the generalized algorithm deals with these constraints will be given elsewhere.
Pairwise distance constraints
Information about distances between certain pairs of residues (e.g. between their ß-carbons) in the target may be known before its full structure is determined, e.g. disulfide bonds or the residues in an active site. A distance constraint specified as `distance(si, si') [ai, bi]' is enforced in PROSPECT by adding to the objective function (1) a term which penalizes deviation of distance (si, si') between residues si and si' from the specified range [ai, bi]. When the penalty is set so high that no distance (si, si')
[ai, bi] is violated, the distance constraint is a hard constraint; otherwise it becomes a soft constraint. During the search for an optimal alignment, partial alignments with significant violations of the distance constraints are discarded for further consideration. This reduces the size of the alignment search space and hence improves the computational efficiency.
Secondary structure information
It is estimated that the best secondary structure prediction programs identify the secondary structure (helix, strand or loop) for residues of the target sequence with approximately 70% accuracy. Multiple sequence alignments with related proteins could also reveal secondary structure information about particular residues. Secondary structures of small proteins could also be determined by NMR before the full structures are solved (e.g. target t0056 was provided with the NMR-determined secondary structures, six -helices). Effectively using this type of information should help improve both the fold recognition accuracy and computational efficiency. For each residue with specified secondary structure information, PROSPECT adds a term to the objective function (1), which penalizes the difference between the predicted or known secondary structure type and the type of its aligned core secondary structure. A variable weight is used for such a penalty term, based on the confidence level of the secondary structure information. During the CASP3 experiment, one heuristic we used was that a residue in the center of a run of residues, predicted to be a loop with high confidence values (score = 9) by PHD (Rost and Sander, 1993
), can be aligned only with a loop residue in the template. This heuristic did not cause any alignment accuracy problem (on the CASP3 targets), but significantly reduced the computational time. We have applied similar ideas to threading-alignment predictions by other threading programs (Jones et al., 1992
; Crawford, 1999
).
Partial alignments
Certain residues or runs of residues (short fragments) in the target sequence may be conserved among the family members. This information could be obtained through multiple sequence alignments or by literature search (for active sites, for example). A correct alignment between the target sequence and its native-like template structure should be consistent with those (preferred) alignment fragments. A term is added to the objective function (1), which rewards the realization of each such partial alignment during the threading process. A variable weight is assigned to each such rewarding term to reflect the confidence level or the significance of a predicted conserved element. Known partial alignments are also used to reduce the computational time as do other constraints mentioned above.
Unaligned cores
It is well known that similar structural folds may not be fully aligned with the target due to the difference in the structures. Certain cores of the template or segments of the target sequence may have to be removed from the alignment. This is particularly true in cases where the most similar fold in the template library is not from the same protein family as the target sequence. While PROSPECT allows segments of the target, at any position of the sequence, to be unaligned and treats them as loop regions, it allows deletions of cores only at the ends of the template but not in the middle. A user-specified parameter, D, is used to control the maximum number of cores that can be deleted at each end of a template. (D can be automatically set based on the length difference between a query sequence and the template.) Deletion of cores at the ends of a template is achieved by generalizing the threading problem defined in Eqn 1 to the following problem. The goal is to find k1 and k2, with k1, k2 [1, D], such that alignment ({S2k1, Tk1}, . . . {S2(Mk2), TMk2}) optimizes the following function:
|
where Tk1, Mk2 is the partial template from core k1 to core M k2.
This generalized threading problem can be solved using a slight generalization of the divide-and-conquer algorithm (Xu et al., 1998a; Xu,Y. and Xu,D., manuscript submitted). The algorithm compares all the threading alignments with no cores up to D cores removed from each end of the template, and selects the best scoring one. A careful implementation of the algorithm increases the computational time insignificantly, not by D2 times as a naive implementation would do. This generalized approach has provided the flexibility of allowing alignment between a target sequence and a partial template. Deletions of cores in the middle of a template can theoretically be done similarly but this would greatly increase the computational time, due to the overall structure of our divide-and-conquer algorithm. Research is currently under way to develop more efficient ways to handle deletions of cores in the middle of a template.
PROSPECT provides a standard format, which allows a user easily to specify the constraints for a target before the threading process starts; then it automatically modifies the objective function according to the constraints and solves the constrained threading problem.
Post-processing and submission
For each target of CASP3, the fifty best-scoring sequencestructure alignments were selected for post-processing. The post-processing step consisted mainly of (a) checking for compactness (b) generating atomic models for the top 10 alignments which passed the compactness test and (c) ranking the atomic models and their corresponding alignments. Then the top five sequencestructure alignments (called model-1, . . ., model-5) and the top one or two atomic models were selected and submitted as the final predictions, depending on whether their scores were good enough.
(a) Since PROSPECT allows deletions of cores at the ends of a template, the aligned substructure may not always be compact, as measured by its radius of gyration. (Research is currently ongoing to use substructure compactness as a constraint when deleting cores.) Uncompact aligned substructures were removed from further consideration.
(b) All the atomic structures were generated by MODELLER (Sali and Blundell, 1993). Multiple templates were fed to MODELLER as the starting structures if several high-quality template structures (identified as folds similar to the target) were available. Ten structures were generated for each alignment.
(c) We then used WHATIF (Vriend, 1990) and PROCHECK (Laskowski et al., 1993
) to assess (i) the packing and backbone conformations of each structure, (ii) the consistency between the secondary structures predicted by PHD (Rost and Sander, 1993
) and those of the predicted model assigned by DSSP, (iii) the inside/outside occupancies of hydrophobic and hydrophilic residues and (iv) the existence of D-amino residues or cis peptide bonds in the predicted model; and chose the best one to be the atomic model. In case none of the 10 structures had acceptable packing or backbone conformations, we adjusted the alignment and rebuilt the model. We also visually checked the structures and tuned the alignment if necessary.
Post-prediction assessment
To evaluate the prediction accuracy, we have conducted both sequence-independent superposition and sequence-dependent superposition between the actual structure and the predicted structure. Structurestructure alignments in a sequence independent superposition were performed by SARF (Alexandrov, 1996; Alexandrov and Fischer, 1996
). Though the number of structurally alignable residues calculated by SARF is often different from the one calculated by ProSup (Feng and Sippl, 1996
), the difference between the two is generally insignificant. The sequence-dependent superpositions, examples of which are shown in Figures 1 and 6
, were aligned by the protein least squares fitting program ProFit (A.C.R.Martin, University College London). The schematic diagrams of the proteins were made by the Insight II package (Molecular Simulations Inc., 1995), and the C
traces were made by VMD (Humphrey et al., 1996
).
|
|
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
All the structures of predicted model-1 in this category have good alignment to the correct fold. Figure 1 shows two examples of the quality of the models in the sequence-dependent superposition. For target t0068, the core structure of the fold is highly conserved between the target and the template (the C
-R.m.s.d. between the model and the experimental structure is only 1.76 Å for 77% of the structure in the sequence-independent superposition), while there are some conformational differences in the peripheral elements. For target t0074, the secondary structures and the alignments are predicted correctly (no shift error of alignment in the core structure of the fold), while there are some shifts in the relative positions of secondary structures between the template and the experimental structure. Our threading alignment of each model is much better (see Table III
) than the alignment produced by the sequencesequence pairwise alignment program ALIGN (http://www2.igh.cnrs.fr/bin/align-guess.cgi). However, the quality of our alignments is similar to PSI-BLAST (Altschul et al., 1997
) results (the sequence alignments using multiple sequence profile). Unfortunately, we failed to use PSI-BLAST for our CASP3 predictions.
|
In the superfamily category (eight targets), PROSPECT recognized three correct folds, found two closely-similar folds and missed three targets. Prediction for t0053 constitutes a successful example of applying constraints during the threading process. After the first-pass threading of t0053 against the whole template library, we found that the top scoring template (1ak1) has an active site at His183 (Al-Karadaghi et al., 1997). Using the BLOCK search (Henikoff and Henikoff, 1994
), we identified His145 of t0053 as the most likely active site. We then ran PROSPECT to thread t0053 against 1ak1 under the constraint that t0053's His145 has a high probability to be aligned with His183 of 1ak1, and made the final sequencestructure alignment prediction.
It turned out that 1ak1 is actually the fold of the experimental structure. t0053 and 1ak1 have only 11.2% identity in the sequence-independent structurestructure alignment. In a more detailed analysis of the unconstrained alignment, only one of the 13 alignable cores of t0053 was aligned correctly with 1ak1, and one more was aligned with a four-residue shift. Our final prediction with the constraint correctly aligned five cores, and aligned each of the other eight cores with at most a five-residue shift. The C-r.m.s.d. between the predicted three-dimensional structure and the experimental one is 7.7 Å for all the experimentally-solved residues (257 in total); and 174 residues (66% of the target) have a C
-r.m.s.d. of 4.0 Å (Koehl and Levitt, 1999
) (see Figure 2
).
|
|
Models for t0063 and t0080 are closely-similar folds to the experimental structures. Both model -1 of t0063 and its experimental structure have ß-barrel folds, but the former has one domain while the latter has two domains. The overall packing between different segments in t0080 is similar, although some core secondary structures vary. We missed the correct fold for t0044 due to a program bug in PROSPECT. After the bug was fixed, we re-ran PROSPECT on t0044, which placed the correct fold as number 1 scoring template with a good alignment. Unfortunately, the bug was noticed and fixed days after the expiration date for t0044.
Fold
Among the eight targets in this category, PROSPECT recognized the correct fold with a good alignment for one, found closely-similar folds for four and missed three.
PROSPECT identified the correct fold (1tup-A) for target t0067 but as its second choice (model-2). (1tup-A has 101 structurally equivalent residues with t0067 with C-r.m.s.d. of 3.3 Å, one of the best templates in PDB according to the DALI server (Holm and Sander, 1993
).) Figure 4
shows a comparison of the two structures. The core elements of t0067 consist of four antiparallel ß-strands (two ß-hairpin motifs). Its topology is one of the infrequent ones among the eight observed topologies of two sequentially adjacent hairpin motifs (Branden and Tooze, 1999
). Our prediction had the correct topology between the four ß-strands, with a small alignment error. Among the 75 structurally alignable residues (calculated by ProSup) between model-2 and the experimental structure of t0067, 40 residues had alignment shifts of four positions or less. In addition, the two short ß-strands at each side of the four long ß-strands and a small ß-sheet (two ß-strands) at the lower-left of the structures are identified. This example demonstrates that PROSPECT is capable of identifying a fold whose peripheral elements differ significantly from the target. Unfortunately, we did not place the template 1tup-A as the top choice (model-1). The core elements of our model-1 (template 1pbo-A) are also antiparallel ß-strands but they form a ß-barrel.
|
|
With its capability of aligning a target with a partial structure, PROSPECT aligned the small targets t0065, t0073 and t0084 with substructures of the templates 1abv, 1cos-A and 1uby, respectively, though these targets are in the category of ab initio predictions. These alignments with the partial structures directly gave accurate structure predictions, with C-r.m.s.d.s of 4.2, 2.1 and 0.8 Å, respectively, as shown in Figure 6
. Our model-1 for target t0073 is among one of the better models submitted to CASP3 (CASP, 1998
). However, we failed to recognize targets t0052 and t0056 as new folds.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Our first-time performance in CASP (CASP, 1998; Koehl and Levitt, 1999
) testifies to the advantage of having a threading algorithm, like PROSPECT, which guarantees to find the globally optimal sequencestructure alignment. PROSPECT recognized four correct folds in the superfamily/fold categories, in which none of the targets had identifiable sequence homology to any protein in PDB. Among the four targets, two of them had good alignments with the correct folds. Even when PROSPECT failed to identify the best available fold, the selected template often shared motifs and/or overall architecture of the target structure. PROSPECT is good at identifying the best templates for short peptides, as shown by the success in all three peptide targets (two of which are synthetic). This demonstrates that PROSPECT can be a powerful tool for peptide design. In the family category, though the correct fold can be identified without using threading, the alignment generated by the local sequencesequence alignment program BLAST often covers only a small portion of the target sequence. PROSPECT generally produced much better alignments than the global sequencesequence alignment program ALIGN. Although PSI-BLAST can also give a good alignment for a large portion of the target sequence when multiple homologous sequences to the target are available, many protein sequences have no unambiguous homology (defined by BLAST) to other known protein sequences (so-called `orphan genes'), or have too few homologs to produce a reliable multiple sequence profile. In these cases, PROSPECT can be an ideal tool, particularly when threading sequences at the genome scale.
While some success was achieved by our threading method in CASP3, a number of lessons have been learned from where we failed. These lessons should provide guidance for development of more accurate and efficient computational methods for protein structure predictions.
The similarity between some targets and their templates is not very strong, and it is unclear whether the current energy function (using knowledge based singleton and contact potential) is capable of identifying the fold. For example, for target t0075, for which we predicted a new fold, the best match (1rlr) found by DALI can align only 59 residues with z-score 3.5 and r.m.s.d. 4.3 Å. The closest structure neighbor (1jvr) found by VAST (Gibrat et al., 1996) can align only 25 residues out of 110 to the experimental structure. (Depending on the definition of new fold, this target could be accepted as a new fold.) The template 1rlr was ranked by PROSPECT the 59th out of more than 1300 templates, indicating some preference for the fold, but insufficient for accepting it. The big variation in the core elements made the current energy function lose significant recognition power (Russell and Barton, 1994
). A new type of energy function which is more sensitive to the fold than to the atomic coordinates may be needed.
Several new features have been added to the PROSPECT system since CASP3 (see http://compbio.ornl.gov/structure/prospect/ for details). These include (i) a better definition of template cores, which exclude core secondary structures or residues that do not significantly interact with other parts of the protein; (ii) an optional energy term using a position-dependent profile of the target sequence, derived from SAM (Karplus et al., 1998; Park et al., 1998
); (iii) the DALI domain library in addition to the FSSP chain library as the threading templates. In addition, the system runs a PSI-BLAST search before the threading process, and adds any significant hits to the template library. These new additions have significantly improved the performance of PROSPECT in both the category of fold recognition and sequencestructure alignments on 700 pairs of aligned structures of FSSP (Holm and Sander, 1996
). On the CASP3 targets of the fold recognition category, the new version of PROSPECT recognizes four more targets, t0044, t0079, t0081 and t0083, with good alignments while maintaining a similar performance level on other targets. In this exercise, a fold is considered to be correctly recognized if it ranks among the top five (just as in CASP3). Table 4
summarizes the prediction improvement of PROSPECT on these four targets.
|
![]() |
Acknowledgments |
---|
![]() |
Notes |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() |
---|
Alexandrov,N.N. (1996) Protein Engng, 9, 727732.[Abstract]
Alexandrov,N.N. and Fischer,D. (1996) Proteins: Struct. Funct. Genet., 25, 354365.[ISI][Medline]
Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 33893402.
Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 4954.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535542.[ISI][Medline]
Branden,C. and Tooze,J. (1999) Introduction to Protein Structure, 2nd edn. Garland Publishing, Inc., New York.
CASP (1995) Proteins: Struct. Funct. Genet., 23, 295462.[ISI][Medline]
CASP (1998) Third Meeting on the Critical Assessment of Techniques for Protein Structure Prediction, December 1317. Asilomar Conference Center, California.
Crawford,O.H. (1999) Bioinformatics, 15, 6671.
Feng,Z.K. and Sippl,M.J. (1996) Fold. Des., 1, 123132.[ISI][Medline]
Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Curr. Opin. Struct. Biol., 6, 377385.[ISI][Medline]
Henikoff,S. and Henikoff,J.G. (1994) Genomics, 19, 97107.[ISI][Medline]
Holm,L. and Sander,C. (1993) J. Mol. Biol., 233, 123138.[ISI][Medline]
Holm,L. and Sander,C. (1996) Science, 273, 595602.
Holm,L. and Sander,C. (1998) Proteins: Struct. Funct. Genet., 33, 8896.[ISI][Medline]
Humphrey,W.F., Dalke,A. and Schulten,K. (1996) J. Mol. Graphics, 14, 3338.[ISI][Medline]
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) Nature, 358, 8689.[ISI][Medline]
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 25772637.[ISI][Medline]
Karplus,K., Barrett,C. and Hughey,R. (1998) Bioinformatics, 14, 846856.[Abstract]
Koehl,P. and Levitt,M. (1999) Nature Struct. Biol., 6, 108111.[Medline]
Laskowski,R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) J. Appl. Crystallogr., 26, 283291.[ISI]
Molecular Simulations, Inc. (1995) Insight II (Release 95.0) San Diego, Molecular Simulations, Inc., CA.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536540.[ISI][Medline]
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631634.[ISI][Medline]
Park,J., Karplus,K., Barrett,C., Hughey,R., Haussler,D., Hubbard,T. and Chothia,C. (1998) J. Mol. Biol., 284, 12011210.[ISI][Medline]
Rost,B. and Sander,C. (1993) J. Mol. Biol., 232, 584599.[ISI][Medline]
Russell,R.B. and Barton,G.J. (1994) J. Mol. Biol., 244, 332350.[ISI][Medline]
Sali,A. and Blundell,T.L. (1993) J. Mol. Biol., 234, 779815.[ISI][Medline]
Schmidt,R.B., Gerstein,M. and Altman,R.B. (1997) Protein Sci., 6, 246248.
Smith,T.F. and Waterman,M.S. (1981) Adv. Appl. Math., 2, 482489.
Stultz,C.M., White,J.V. and Smith,T. (1993) Protein Sci., 2, 305314.
Vriend,G. (1990) J. Mol. Graphics, 8, 5256.[ISI][Medline]
White,J.V., Stultz,C.M. and Smith,T. (1994) Math. Biosci., 119, 3575.[ISI][Medline]
Xu,Y., Xu,D. and Uberbacher,E.C. (1998a) In Istrail,S., Pevzner,P. and Waterman,M. (eds), The Second Annual International Conference on Computational Molecular Biology. ACM, New York, pp. 285292.
Xu,Y., Xu,D. and Uberbacher,E.C. (1998b) J. Comp. Biol. 5, 597614.
Received February 27, 1999; revised June 16, 1999; accepted August 5, 1999.