Laboratory of Molecular Biology, Division of Basic Sciences, National Cancer Institute, National Institutes of Health, 37 Convent Drive, MSC 4255, Bethesda, MD 20892-4255, USA
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Keywords: comparison/clustering/protein structure alignment/SHEBA
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Fast methods that employ many different strategies have now been reported. In the fragment assembly methods (Fischer et al., 1992; Holm and Sander, 1993
; Boutonnet et al., 1995
; ZuKang and Sippl, 1996
; Shindyalov and Bourne, 1998
), many or all small fragments from two structures are compared and those that are sufficiently similar are selected and assembled into a larger consistently similar set. Many other methods use a similar strategy but use only the secondary structural elements (Artymiuk et al., 1990
; Rufino and Blundell, 1994
; Madej et al., 1995
; Mizuguchi and Go, 1995
; Alesker et al., 1996
; Alexandrov and Fischer, 1996
; Koch et al., 1996
). In another set of methods, an alignment for the whole structure is obtained and refined, typically using a dynamic programming algorithm (Satow et al., 1986
; Taylor and Orengo, 1989
; Zuker and Somorjai, 1989
; Orengo et al., 1993
; Holm and Sander, 1995
; Suyama et al., 1997
; Gerstein and Levitt, 1998
). A problem in comparing structures that are only remotely similar is that there is no natural, objective way of deciding what is to be considered similar (Mizuguchi and Go, 1995
). Different methods can compare the same two protein structures and conclude differently on their similarity. Even when two structures are judged to be similar, different structural alignments can be produced that appear equally valid (ZuKang and Sippl, 1996
). Therefore, it is desirable to have many different methods available that focus on different features of the structure and use different sets of parameters for similarity comparisons (Madej et al., 1995
).
All of the techniques cited above, except that by Suyama et al. (1997), use geometry alone for the comparison, ignoring sequence homology and similarity of the structural environment of the residues. On the other hand, Suyama et al. ignored the three-dimensional (3D) geometry altogether and compared structures on the basis of (position-sensitive) 3D profiles alone (Bowie et al., 1991), which included residue type-dependent solvent accessibility, hydrogen bonds, local secondary structural states and side-chain packing. The 3D profile matching is a one-dimensional (1D) process, much like the sequencesequence alignment and they use the dynamic programming method (Needleman and Wunsch, 1970
) for a fast alignment. The method produced results that are comparable to those from other methods, but with differences characteristic of the method. For example, two-domain structures, with rather different relative orientation of the two domains, could be aligned by the procedure. On the other hand, alignments are often inaccurate; they can be shifted by a few residues or sometimes by an entire secondary structural element (Suyama et al. 1997
).
In the new method reported here, an initial alignment is generated using a profile method like that of Suyama et al. (1997), although we use generic rather than position-sensitive profiles. But this initial 1D alignment is then refined in a refinement cycle that obtains new alignments based on the 3D superposition of currently aligned residues. The profile used in the initial alignment includes sequence homology, similarity of the secondary structural states and the tertiary environment of each residue and is normalized differently from that used by Suyama et al. (1997). The 3D refinement is made using the dynamic programming method first used by Satow et al. (1986) and by many others since (Zuker and Somorjai, 1989; Rufino and Blundell, 1994
; Holm and Sander, 1995
; Alexandrov and Fischer, 1996
; ZuKang and Sippl, 1996
).
Because the method produces the final alignment on the basis of 3D comparisons, it should normally generate alignments that are similar to those produced by other purely geometry-based methods. On the other hand, if an alternative alignment exists that is chemically more sensible than a purely geometric alignment, this method may find the alternative alignment since it starts from an alignment based on the profiles. As will be described in this paper, we have found an instance of such a case in aligning a family of ferredoxins.
The method (SHEBA, for Structural Homology by Environment-Based Alignment) is fast. Its speed allows comparison of each of the 3539 domains, which represent all sequentially homologous groups in the March 1998 release of PDB, against all others. This is a sufficiently large number of structures that z-scores can be calculated for the alignment of a pair of proteins on the basis of aligning each structure of the pair against all others. We report the results of clustering all known protein structures on the basis of this measure of similarity.
![]() |
Materials and methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
A set of 68 benchmark protein pairs (Fischer and Eisenberg, 1996) was used to obtain parameters for the alignment score function. The two proteins of each pair have <25% sequence homology between them, but are clearly homologous in terms of the three-dimensional structure. The reference structural alignment for each pair of this dataset was obtained in a series of preliminary studies (see below). This dataset, along with the reference alignments, is referred to as the aligned protein pair (APP) database.
Scoring function
The alignment score (a-score) for a pair of proteins was calculated as the sum of scores for each aligned pair of residues. The a-score for a pair of aligned residues, i and i', was computed by
![]() | (1) |
The first three terms in this expression represent scores for the sequence homology, for the similarity of the secondary structures to which the residues belong and for the similarity of the solvent accessibility and polarity of the environment of the residues (Bowie et al., 1991). These are explained below. gp is the opening gap penalty and
ii' = 1 or 0 depending on whether there is or is not, respectively, a gap between residues i and i + 1 or residues i' and i' + 1. A gap was allowed between two residues only when the distance between the C
atoms of the corresponding residues in the other protein of the aligned pair (gap distance) was between 3.8 and 8 Å. This range of values was chosen from the frequency of gap distances observed in the APP database (data not shown). A gap extension penalty was not used. The procedure used to determine the value of gp will be described later. The constant term c was set to unity; when summed over all aligned residue pairs, this term gives the number of aligned residue pairs in the alignment. Inclusion of this term tends to maximize the number of aligned residues.
The values for the Hii', Sii' and Eii' terms were obtained by looking up the appropriate entries in tables of values that were pre-calculated using the APP database. In order to set up these tables, two residue pair lists were made from the APP database. One consisted of all the aligned residue pairs in all the protein pairs in the database. The second list was much larger and consisted of a concatenation of all possible npxnp' residue pairs of each protein pair pp', where np and np' are the number of residues in the two proteins p and p'. These are referred to as the aligned and `random' residue pair lists, respectively. For the dataset used, the total number of aligned and random residue pairs was 9202 and 3 867 377, respectively. All the table values were calculated in a similar manner as the logarithms of the ratio of a normalized frequency (PA) observed in the aligned residue pair list versus the corresponding normalized frequency (PR) observed in the random residue pair list. When PA was <0.001 for any frequency, the ratio was set to 1.
Hii' was set equal to Hii' = H(Ri, Ri'), where Ri and Ri' are the amino acid types of residues i and i', respectively and H(Ri, Ri') is a sequence homology matrix. We used the matrix of Gonnet et al. (1992) for H in some preliminary studies, but the final results reported here were obtained using a new matrix computed from the APP database by
![]() | (2) |
where PA(R, R') and PR(R, R') were the normalized frequencies with which the amino acid types R and R' were found paired in the aligned and random residue pair lists, respectively.
Similarly, Sii' was set to Sii' = S(si, si), where si and si' are the secondary structural states of residues i and i', respectively, and S is the secondary structural homology matrix. The S matrix was computed as
![]() | (3) |
where PA(s, t) and PR(s, t) were the normalized frequencies with which a residue in the secondary structural state s was found paired with a residue in the secondary structural state t in the aligned and random residue pair lists, respectively. The secondary structural states used were the eight states defined by Kabsch and Sander (1983) and were assigned to each residue of the known structures using their DSSP program.
The Eii' term was calculated by
![]() | (4) |
where fai and fai' are the accessible surface area fractions (see below) of residues i and i', respectively, and fpi and fpi' are the polar fractions of the environment (see below) of the residues i and i', respectively (Bowie et al., 1991). The accessible surface area fraction, fa, of a residue is the accessible surface area (Lee and Richards, 1971
) of the residue divided by the maximum accessible surface area for the amino acid type of the residue. This latter area was the maximum observed for the same type of residue in the middle of some 110 000 tripeptides. This tripeptide database was generated from the protein structures selected from the November 1996 release of the EMBL list of representative proteins available through web site http://ftp.embl-heidelberg.de/pub/databases/pdb_select (Hobohm and Sander, 1994
). The algorithm used for the accessible surface area calculation was that by Shrake and Rupley (1973). The area of a residue was calculated as the sum of the areas for the C
and the side-chain atoms.
The polar fraction of the environment, fp, is the fraction of the accessible surface area that is either buried by the polar atoms (N, O) or exposed to the solvent. When a surface patch was buried by both polar and non-polar atoms, it was counted as polar.
The function E was given by
![]() | (5) |
where PA(fa) and PR(
fa) were the normalized frequencies of residue pairs with an fa difference
fa in the aligned and random residue pair lists, respectively, and PA(
fp) and PR(
fp) were defined similarly using the fp difference
fp.
Wh, ws and we in Equation 1 are the relative weights for the corresponding terms. The optimum values for these and for gp were determined by minimizing the alignment shifts. The alignment shift of an alignment (trial alignment) with respect to another (reference alignment) is defined by
![]() | (6) |
where i and i' are a pair of residues which are aligned in the reference alignment, but which are separated by rii' number of residues and gaps in the trial alignment, N is the total number of aligned residue pairs in the reference alignment and the summation is over the N residue pairs. Alignment shifts were calculated after the initial alignment (see the alignment procedure in the next section) for the 68 protein pairs of the APP database. By systematically searching the entire range of possible values in discrete steps, it was found that the overall alignment shift was the smallest when wh, ws, we and gp were set to 0.5, 1.0, 1.5 and 5.0, respectively. Thus, the optimum weight is the largest for the accessibility and polar environment term (we) and smallest for the sequence homology term (wh). Small wh was expected since these pairs of proteins were selected to have low sequence homology between them (Fischer and Eisenberg, 1996
). On the other hand, the optimum wh value is definitely non-zero since setting it to zero gives clearly worse results (data not shown). This indicates that use of the sequence homology is helpful in aligning structures even when they appear to have little sequence homology.
Alignment procedure
The optimum alignment between two protein structures was obtained by an iterative procedure which involved the following three steps: in the first step, the initial alignment was obtained by maximizing the a-score using a fast dynamic programming procedure based on the NeedlemanWunsch algorithm (Needleman and Wunsch, 1970). In the second step, Kabsch's equations (Kabsch, 1976
, 1978
) were used to translate and rotate one structure against the other so as to minimize a weighted mean square distance between the C
atoms of the aligned residues. The weighting scheme used is described in the next paragraph. In the third step, a new alignment was obtained from this pair of superimposed structures by using the NeedlemanWunsch algorithm that maximized the number of residue pairs whose C
distances were <3.5 Å. The gap penalty was found to be unnecessary for this procedure. Steps two and three were repeated. The structure superposition/realignment cycle converged rapidly: for the 68 test proteins, the average number of aligned residues after just three cycles was within 0.2 residue of that after nine cycles (data not shown). The results reported in this paper are those obtained using three cycles of refinement.
It should be noted that the initial step involves only one-dimensional (1D) alignment and ignores three-dimensional (3D) relationships among different structural elements. For some protein pairs, the initial 1D alignment produces an alignment that is incompatible in 3D and which the subsequent 3D1D superposition cycles could not resolve. Therefore, we used five different weighting schemes in the Kabsch superposition step: (1) all residues weighted equally, (2) the N-terminal and C-terminal halves weighted in a 1.0:0.1 ratio, (3) the N- and C-terminal halves weighted 0.1:1.0, (4) the N-terminal quarter, middle half and C-terminal quarter weighted 0.1:1.0:0.1 and (5) the N-terminal quarter, middle half and C-terminal quarter weighted 1.0:0.1:1.0. The final alignment chosen was the best among all five alignments generated by these different Kabsch-step weighting schemes.
The degree of structural similarity between two optimally aligned proteins was measured by means of the number of aligned residue pairs (m-value). A pair of residues was considered aligned if their C atoms were <3.5 Å apart. The m-score was defined as the m-value divided by the size (number of amino acids) of the larger protein.
Reference alignments for the 68 protein pairs
The reference alignments were generated using the alignment procedure described above, but using a scoring function that did not require knowledge of the reference alignment. Specifically, the matrix by Gonnet et al. (1992) was used for H(a, b), S(s, t) was set to 1 if s and t were equal and 0 if not and E(fa,
fp) was set equal to the negative of the Euclidean distance in the two-dimensional space of
fa and
fp. For some protein pairs, this procedure failed and the initial alignment had to be found manually using the in-house graphics program GEMM (Syi and Lee, 1988
). During the refinement of the initial alignment, the three weights and the gap penalty were varied specifically for each protein pair. After alignments were found for all the proteins, the S, H and E scoring matrices were determined and new alignments were obtained using the new scoring function.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
The new structure alignment procedure was tested using the 68 protein pairs (Fischer and Eisenberg, 1996) that were used to obtain the score function and the weight and gap penalty parameters. The computer time taken to align a protein pair was <1 s for most of the proteins (Figure 1
). In the reference alignment, the number of aligned residues (m-value) ranged from 26 to 91% of the number of residues of the larger of the two proteins. When the same pairs were aligned using SHEBA, 63 pairs were aligned with m-values within 5% of the reference alignment. The m-values for six other pairs deviated 511% from the reference alignment, but visual inspection clearly indicated that the two alignments were essentially the same for all six pairs. For the remaining two pairs (1aep256bA and 5fd12fxb), SHEBA gave significantly lower match scores than the reference alignment.
|
The other pair (5fdl2fxb) are ferredoxins, for which SHEBA produces another alternative alignment as shown in Figure 2. The 5fd1 protein has two reaction centers. One (Fe3S4) contains three irons and three cysteine residues, as well as four sulfur atoms and the other (Fe4S4) four irons, four cysteine residues and four sulfur atoms, shown respectively at the bottom and top of the 5fd1 structure in Figure 2
. The 2fxb protein has only one reaction center of the Fe4S4 type. In the reference alignment, all four ß-strands and two helices of 2fxb are aligned to the first four ß-strands and two helices of 5fd1 sequentially, resulting in 51 matched residues. However, this aligns the Fe4S4 center of 2fxb with the Fe3S4 center of 5fd1 and matches only three cysteine residues (residues 11, 14 and 61 of 2fxb aligned to the residues 8, 16 and 49 of 5fd1, respectively). In the SHEBA alignment, the Fe4S4 center of 2fxb is aligned to the Fe4S4 center of 5fd1. Again, only three cysteines are aligned (residues 11, 14 and 17 of 2fxb aligned to the residues 39, 42, 45 of 5fd1, respectively), but the spacings between the aligned residues are the same in the two structures and the geometry of the loop that spans these residues is more similar. This alignment matches the N-terminus of 2fxb to the middle of 5fd1 (Figure 2
) and aligns only the first two ß-strands and one helix, producing 34 matched residues. However, the C-terminal half of 2fxb becomes aligned to the N-terminal half of 5fd1 in the three-dimensional structure (Figure 2
). Indeed, if 2fxb molecule is circularly permutated by joining the C- and N-termini and by cutting the protein at a point between the second and third ß-strands (Figure 3
), the number of matched residues increases to 54. This alignment also aligns C61 of 2fxb to C20 of 5fd1, thus matching all four cysteines of the Fe4S4 reaction center of both proteins. We found a number of other ferredoxins that also align to 5fd1 after a similar circular permutation (Table I
). The protein 1blu is particularly notable. It is of similar length to 2fxb and matches 53 residues after the circular permutation, of which 23 (43%) are identical (Figure 4
).
|
|
|
|
The performance of the new procedure was further tested by comparing the alignments it produced with those published using four other structure alignment procedures (Table II). The alignments using SHEBA were essentially the same as those found by Dali (Holm and Sander, 1993
), the MCL method (Boutonnet et al., 1995
), VAST (Gibrat et al., 1996
) and one of the two alternative alignments by ProSup (ZuKang and Sippl, 1996
) in all but three or four cases compared. The exceptional cases are the 2tmvP256bA, 3chy1rcf, 1aba1pbf and 1acx1tnfA pairs, which are described below.
|
The alignment obtained for the 3chy1rcf pair is also an alternative alignment, but of different type. Both structures have an /ß sandwich fold, in which a five-stranded ß-sheet is surrounded by helices 1 and 5 on one side and by helices 2, 3 and 4 on the other side. The protein 1rcf has, in addition, an extended loop after helix 4. The ProSup alignment matches helices 11, 22, 33, 44 and 55 in one alignment and 11, 22, 33 and 55 in the other. In the SHEBA alignment, helices match as 11, 22, 34, 55 and the helix 4 in 3chy occupies the space occupied by the extended loop of 1rcf. Again, it is difficult to tell which of the three alignments is better by visual inspection. The numbers of matched residues are 75, 74 and 82 for the above three alignments, respectively.
For the 1aba1pbf pair, VAST and SHEBA give alignments that appear visually similar, but they were not compared quantitatively because the VAST alignment was not readily available for this pair. SHEBA failed to recognize the 1acx1tnfA pair as being structurally homologous. The failure is understandable because this pair is structurally homologous only when the topological difference is ignored (Holm and Sander, 1993) whereas SHEBA is inherently sensitive to chain connectivity since it uses the dynamic programming algorithm.
All-against-all comparison and clustering
All-against-all protein structure comparisons were made using domains produced by the domain parsing program PUU (Holm and Sander, 1994). The 18 595 protein domains, representing all protein chains of 40 residues or more in the March 1, 1998 release of the Protein Data Bank (PDB), were grouped into 3539 high sequence homology groups (those with at least 50% sequence identity with the representative member of the group). The sequence comparisons made in this process used a fast NeedlemanWunsch algorithm (Needleman and Wunsch, 1970
). The smallest protein was selected as the representative for each group and SHEBA was run for all pairs between these sequence representatives. The degree of structural homology between any two proteins was measured by the m-score, which is defined as the number of structurally aligned residues divided by the number of residues in the larger protein of the two. For clustering, z-scores were used as the distance measure between any two proteins. The z-score zab between protein a and protein b is the m-score between the two, measured in units of the root mean square deviation from the mean of the 3539 m-scores between protein a and all other proteins. We used z-scores rather than m-scores because the random noise background distribution was more uniform among different proteins when z-scores were used than when m-scores were used.
Clustering was done in two stages: main clusters were formed first and then subdivided into sub-clusters. A main cluster is defined as the complete collection of all sequence representatives, all of which are related to all others by a chain of neighbor relations; two proteins from two different main clusters are not related to each other by any chain of neighbor relations. The core of a sub-cluster is defined as the group of proteins, all of which are direct neighbors of each other. A sub-cluster was formed from a core by including the orphans (those which do not belong to any core) that shared more neighbor relations to the core than to any other core.
The number of main clusters and the size of the largest clusters are plotted as a function of the zcutoff value in Figure 5. As expected, all proteins are grouped into a few large clusters when a low zcutoff value is used, whereas almost all proteins form separate clusters when a high zcutoff value is used. In the medium range of zcutoff values, the number of clusters increases continuously with the zcutoff value. However, if single-member clusters are excluded, the number of clusters remains nearly constant in the range of zcutoff values of 4.56. We chose the zcutoff value of 5 mainly because the size of the largest cluster changes quickly in the zcutoff value range 35 and slowly beyond 5. Also, the number of multi-member clusters is nearly at the maximum and rather insensitive to the zcutoff value when the zcutoff value is 5. With this zcutoff value, the number of main clusters is 2043, of which 336 are multi-member clusters.
|
|
|
|
|
|
|
The SHEBA program package can be found at our web site (http://lmbbi.nci.nih.gov). The following additional information is also available at this web site: the scoring matrices H, S and E; the reference and SHEBA alignments for the 68 protein pairs of Fischer and Eisenberg (1996); and the sorted list of the 3539 protein domains with cluster boundaries.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
On the other hand, SHEBA is not overly dependent on sequence homology. An example is the case of the sugar-binding proteins. It is well known (Saper and Quiocho, 1983; Vyas et al., 1991
; Pearl et al., 1994
; Shilton et al., 1996
) that some of these proteins have very low sequence homology to others in the group, yet have closely similar structures. SHEBA correctly groups these proteins into one sub-cluster with rather high m-scores between them (Table V
).
The clustering was done at two levels. A main cluster consists of all proteins that are related to others in the cluster by a chain of neighbor relations. Sub-clusters are formed from core proteins, which are direct neighbors of each other. We did not attempt to build a more elaborate classification scheme and it is difficult to know how these main and sub-clusters relate to the hierarchical schemes such as SCOP (Murzin et al., 1995) and CATH (Orengo et al., 1993
). The number of main clusters varies continuously with the zcutoff value (Figure 5
), but the variation is mainly due to the change in the number of single-member clusters. The number of multi-member clusters is much less sensitive as long as the zcutoff value is >4. When the zcutoff value is set to 5, the total number of main clusters is 2043 and that of multi-member clusters is 336. These numbers are roughly comparable to the 1000 different folds estimated by Chothia (1992), 597 superfamilies and families in the October 1997 release of the SCOP database (Murzin et al., 1995
) and the 1577 sequence families and 949 homologous superfamilies in CATH (Orengo et al., 1993
). The data presented in Figure 5
suggest that one main cause of the variation among these different estimates is the ambiguity in the classification of the single-member clusters. (Single members do not necessarily represent a single experimental structure since each is a representative of a set of structures which have high sequence homology.)
There were 27 781 protein pairs with z-scores >4.0. Of these, 10 379 pairs share <10% sequence identity. These protein structure pairs provide a valuable database, which can be used to study the sequencestructure relations among remotely homologous proteins.
![]() |
Notes |
---|
2 To whom correspondence should be addressed. E-mail: bk{at}nih.gov
![]() |
Acknowledgments |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Alexandrov,N.N. and Fischer D. (1996) Proteins, 25, 354365.[ISI][Medline]
Artymiuk,P.J., Rice D.W., Mitchell E.M. and Willett P. (1990) Protein Eng., 4, 3943.[Abstract]
Bork,P., Holm L. and Sander C. (1994) J. Mol. Biol., 242, 309320.[ISI][Medline]
Boutonnet,N.S., Rooman M.J., Ochagavia M.E., Richelle J. and Wodak S.J. (1995) Protein Eng., 8, 647662.[Abstract]
Bowie,J.U., Luthy R. and Eisenberg D. (1991) Science, 253, 164169.[ISI][Medline]
Branden,C. and Tooze J. (1999). Introduction to Protein Structure. Garland Publishing, New York, pp. 299323
Choe,S., Bennett M.J., Fujii G., Curmi P.M., Kantardjieff K.A., Collier R.J. and Eisenberg D. (1992) Nature, 357, 216222.[ISI][Medline]
Chothia,C. (1992) Nature, 357, 543544.[ISI][Medline]
Fischer,D. and Eisenberg D. (1996) Protein Sci., 5, 947955.
Fischer,D., Bachar O., Nussinov R. and Wolfson H. (1992) J. Biomol. Struct. Dyn., 9, 769789.[ISI][Medline]
Gerstein,M. and Levitt M. (1998) Protein Sci., 7, 445456.
Gibrat,J.F., Madej T. and Bryant S.H. (1996) Curr. Opin. Struct. Biol., 6, 377385.[ISI][Medline]
Gonnet,G.H., Cohen M.A. and Benner S.A. (1992) Science, 256, 14431445.[ISI][Medline]
Hobohm,U. and Sander C. (1994) Protein Sci., 3, 522524.
Holm,L. and Sander C. (1993) J. Mol. Biol., 233, 123138.[ISI][Medline]
Holm,L. and Sander C. (1994) Proteins, 19, 256268.[ISI][Medline]
Holm,L. and Sander C. (1995) ISMB, 3, 179187.[Medline]
Holm,L. and Sander C. (1998) Proteins, 33, 8896.[ISI][Medline]
Kabsch,W. (1976) Acta Crystallogr., A32, 922923.[ISI]
Kabsch,W. (1978) Acta Crystallogr., A34, 827828.[ISI]
Kabsch,W. and Sander C. (1983) Biopolymers, 22, 25772637.[ISI][Medline]
Koch,I., Lengauer T. and Wanke E. (1996) J. Comput. Biol., 3, 289306.[ISI][Medline]
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946950.[ISI]
Lee,B. and Richards F.M. (1971) J. Mol. Biol., 55, 379400.[ISI][Medline]
Madej,T., Gibrat J.F. and Bryant S.H. (1995) Proteins, 23, 356369.[ISI][Medline]
Matthews,B. and Rossmann M. (1985) Methods Enzymol., 115, 397420.[ISI][Medline]
Mizuguchi,K. and Go N. (1995) Protein Eng., 8, 353362.[Abstract]
Murzin,A.G., Brenner S.E., Hubbard T. and Chothia C. (1995) J. Mol. Biol., 247, 546540.
Needleman,S.B. and Wunsch C.D. (1970) J. Mol. Biol., 48, 443453.[ISI][Medline]
Orengo,C.A., Flores T.P., Taylor W.R. and Thornton J.M. (1993) Protein Eng., 6, 485500.[Abstract]
Pearl,L., O'Hara B., Drew R. and Wilson S. (1994) EMBO J., 13, 58105817.[Abstract]
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167339.[Medline]
Rufino,S.D. and Blundell T.L. (1994) J. Comput. Aided Mol. Des., 8, 527.[ISI][Medline]
Saper,M.A. and Quiocho F.A. (1983) J. Biol. Chem., 258, 1105711062.
Satow,Y., Cohen G.H., Padlan E.A. and Davies D.R. (1986) J. Mol. Biol., 190, 593604.[ISI][Medline]
Shilton,B.H., Flocco M.M., Nilsson M. and Mowbray S.L. (1996) J. Mol. Biol., 264, 350363.[ISI][Medline]
Shindyalov,I.N. and Bourne P.E. (1998) Protein Eng., 9, 739747.
Shrake,A. and Rupley J.A. (1973) J. Mol. Biol., 79, 351372.[ISI][Medline]
Suyama,M., Matsuo Y. and Nishikawa K. (1997) J. Mol. Evolut., 44, 163173.
Syi,J.L. and Lee,B. (1988) J. Mol. Graph., 6, 226226.
Taylor,W.R. and Orengo C.A. (1989) J. Mol. Biol., 208, 122.[ISI][Medline]
Vyas,N.K., Vyas M.N. and Quiocho F.A. (1991) J. Biol. Chem., 266, 52265237.
ZuKang,F. and Sippl M.J. (1996) Fold. Des., 26, 451461.
Zuker,M. and Somorjai R.L. (1989) Bull. Math. Biol., 51, 5578.[ISI][Medline]
Received February 1, 2000; revised June 6, 2000; accepted June 10, 2000.