Exploring the conformational diversity of loops on conserved frameworks

Weizhong Li, Shide Liang, Renxiao Wang, Luhua Lai¹ and Yuzhen Han

Institute of Physical Chemistry, Peking University, Beijing 100871, China


    Abstract
 
Loops are structurally variable regions, but the secondary structural elements bracing loops are often conserved. Motifs with similar secondary structures exist in the same and different protein families. In this study, we carried out an all-PDB-based analysis and produced 495 motif families, which are accessible via the Internet. Every motif family contains two or more variable loops spanning a common framework (a pair of secondary structures). The diversity of loops and the convergence of frameworks were examined. We also identified 119 loops that adopt different conformations in different PDB files. These data can give some direction for functional loop design and flexible docking.

Keywords: database/loop/loop conformation/loop modeling/loop structure/motif


    Introduction
 
Proteins are composed of structurally conserved regions and variable regions. Loops, which bridge the regular secondary structure segments, usually show much greater variability in both sequence and structure than α-helices and β-sheets as a result of long-term molecular evolution. The diversity of loops makes a large contribution to the variety of protein functions, but also causes difficulties in protein structure studies. Therefore, loop modeling has been one of the 'hot' topics in research on protein structures and functions.

Loop modeling methods can be divided into knowledge-based analysis (Jones and Thirup, 1986; Blundell et al., 1988; Topham et al., 1993; Rufino et al., 1997) and ab initio computation or conformation searching (Fine et al., 1986; Bruccoleri and Karplus, 1987; Higo et al., 1992; Mas et al., 1992; Collura et al., 1993; Abagyan et al., 1994; Fidelis et al., 1994; Zheng and Kyle, 1996; Zhang et al., 1997). Early statistical analyses found and classified the conserved β-turns and β-hairpins (Venkatachalam, 1968; Richardson, 1981; Sibanda and Thornton, 1985; Milner and Poet, 1986; Milner-White and Poet, 1987; Sibanda et al., 1989). Subsequently, analyses of αα-hairpins (Efimov, 1991; Wintjens et al., 1996), general loops (Leszczynski and Rose, 1986; Ring et al., 1992) and long loops of >10 residues (Martin et al., 1995) were published. Large-scale loop databases were built to meet the needs of homology modeling and protein design (Donate et al., 1996; Kwasigroch et al., 1996; Oliva et al., 1997; Rufino et al., 1997; van Vlijmen and Karplus, 1997). In these studies, loops from the Protein Data Bank (PDB) were classified according to loop length, type of anchoring secondary structures, geometric parameters and sequence features. For example, Donate et al. (1996) compiled a loop database containing 161 conformational classes from 223 proteins and domains. Their database contained not only the general conformational characteristics but also the sequence preference and geometric plasticity of the entire loop family. This database was then used by Rufino et al. (1997) to improve comparative modeling.

In homologous protein families, structural differences occur mainly in loops rather than in the regular secondary structure frameworks. This is also common in some fragments or motifs shared between sequence-unrelated proteins. The motivation for the present study was to collect such information on the structural diversity of loops connecting common secondary structures. This means that loops may have different conformations, but their anchoring secondary structures must superimpose well. Here, three types of structural variability are examined according to the source of the loop: loops from different protein families, loops from homologous families and loops from identical proteins but different PDB files. To meet these goals and provide more information, we adopted an all-PDB-based algorithm, i.e. a method that searches all the PDB files, including mutant structures and proteins bound to different ligands.

In functional protein design, grafting a functional loop on to another known framework is a widely used method. Compared with single or multiple mutations, it is more flexible. However, the success of grafting depends on the consistency between the framework of the template and that of the target. Therefore, a database containing superimposed frameworks can help in selecting template functional loops for grafting.

In a database from our previous work (Li et al., 1999), loops of variable conformations spanning structurally similar frameworks were inspected; 84 motif families were identified and the relationship between loop sequences and loop conformations was examined. We found 43 new loop conformation classes according to the classical loop classification of Donate et al. (1996). However, in that study, only loops with the same length and different sequences were taken into account. Here we removed this restriction so that loops with different lengths and loops with the same sequence were considered as long as they had the same anchoring secondary structures. We also used a larger protein structure database. Hence the present study provides more information.


    Materials and methods
 
A loop and its bounding secondary structure elements together are considered as a structure motif, namely an '[H or E]-LOOP-[H or E]' segment. Here H and E, computed by DSSP (Kabsch and Sander, 1983), represent an α-helix of at least four residues and a β-strand of at least two residues, respectively. The H or E elements on both sides of the loop are together called the framework. Hence the motifs can be categorized into four types, namely EE, EH, HE and HH.
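The motif definition above can be illustrated with a minimal sketch. It assumes that the DSSP output has already been reduced to a per-residue string in which `H` marks α-helix, `E` marks β-strand and any other character marks everything else; the names `Motif`, `MIN_LEN` and `find_motifs` are hypothetical and are not taken from the authors' program. Runs of H or E that are too short to qualify are simply treated as part of the loop, which is a simplification.

```python
import re
from typing import List, NamedTuple


class Motif(NamedTuple):
    kind: str        # "EE", "EH", "HE" or "HH"
    loop_start: int  # 0-based index of the first loop residue
    loop_len: int    # number of residues between the two framework elements


# Minimum framework lengths stated in the text: helix >= 4, strand >= 2 residues.
MIN_LEN = {"H": 4, "E": 2}


def find_motifs(ss: str) -> List[Motif]:
    """Find [H or E]-LOOP-[H or E] segments in a reduced secondary-structure string."""
    # Collect runs of H or E that are long enough to count as framework elements.
    elements = [
        (m.group()[0], m.start(), m.end())
        for m in re.finditer(r"H+|E+", ss)
        if len(m.group()) >= MIN_LEN[m.group()[0]]
    ]
    motifs = []
    # Each pair of consecutive elements brackets one candidate loop.
    for (t1, _, end1), (t2, start2, _) in zip(elements, elements[1:]):
        if start2 > end1:  # skip adjacent elements with no residues in between
            motifs.append(Motif(t1 + t2, end1, start2 - end1))
    return motifs
```

For example, `find_motifs("EEEE---HHHHHH--EE")` yields one EH motif with a three-residue loop and one HE motif with a two-residue loop. The loop-length limits applied later in the procedure are not enforced here.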

The comparison of two motifs can be divided into comparisons of loops and of frameworks. The aim of the present study was to build a database of motif families containing similar frameworks but variable loops. If the fitted root mean square deviation (r.m.s.d.) of frameworks is less than the cut-off (1.0 Å), the two motifs are considered to be in one motif family. When comparing loop conformers in one motif family, two criteria, Cartesian and torsional differences, are applied. The former is the r.m.s.d. of loop backbone heavy atoms when the frameworks are fitted and superimposed. The latter is computed considering all the main-chain torsion angles, φ, ψ and even ω (to distinguish cis and trans ω):

where N is the length of the loop. All |Δφ| are between 0 and 180°, applying Δφ = 360° – |Δφ| when |Δφ| is >180°. The same rule applies to Δψ and Δω. An r.m.s.d. of 2.0 Å or an r.m.s.d.torsion of 60° is selected as the cut-off value. Two loops whose difference is greater than either cut-off are treated as different conformers in a motif family. Several cut-offs were tried and the final values were decided by visual inspection of motifs.
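The torsional criterion can be sketched as follows. The wrap-around rule for angle differences is the one stated above; the normalization is an assumption, since the formula itself is not reproduced in this copy. Here the squared differences of φ, ψ and ω are averaged over all 3N angles, and a different normalization (for example over N residues) would rescale the values. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Torsions = Tuple[float, float, float]  # (phi, psi, omega) of one residue, in degrees


def wrapped_diff(a: float, b: float) -> float:
    """Absolute angular difference folded into the range 0..180 degrees."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def torsion_rmsd(loop_a: Sequence[Torsions], loop_b: Sequence[Torsions]) -> float:
    """Root mean square torsional difference between two loops of equal length."""
    if len(loop_a) != len(loop_b) or not loop_a:
        raise ValueError("loops must be non-empty and of equal length")
    total = sum(
        wrapped_diff(x, y) ** 2
        for res_a, res_b in zip(loop_a, loop_b)
        for x, y in zip(res_a, res_b)
    )
    # ASSUMPTION: average over the 3N torsion angles (phi, psi, omega per residue).
    return math.sqrt(total / (3 * len(loop_a)))
```

Including ω means that a single cis/trans flip contributes a 180° difference, which is why the text mentions it as a way to distinguish cis from trans peptide bonds.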

The all-PDB-based searching algorithm is time consuming owing to the huge size of the PDB. One feasible approach is to group the proteins and domains into fold families first and then search each family. There are already several well-known protein fold databases such as FSSP (Holm et al., 1992; Holm and Sander, 1993, 1994), CATH (Orengo et al., 1997) and SCOP (Murzin et al., 1995). Among these, FSSP is the most suitable for our work, because this database is generated entirely by a computer program and its format can easily be read by other programs.

The FSSP database used in this study contains 1172 fold files covering 9157 chains in the PDB. Every fold file provides the detailed parameters for the residue to residue superimposition of a representative PDB chain and all its structural neighbors. Therefore, the loops connecting similar secondary structures can be derived from these fold files. Our loop-seeking process comprises seven steps:

  1. Identifying superimposed motifs. The FSSP fold file is a multiple 3D alignment of PDB chains. Hence the first step is simply to cut the aligned motifs from each fold file. Depending on the length of the aligned sequence in each fold file, various numbers of aligned motif families are extracted. Each family must contain the same type of motif. Also, each framework element must contain at least four residues for an α-helix or at least two residues for a β-strand.
  2. Grouping similar motif families. Considering two dissimilar representative protein chains A and B, they can both be superimposed with protein chain C. Hence two motif families extracted from fold file A and B may comprise exactly the same motif from protein chain C. If this happens, the two motif families bridged by C are grouped into a larger family. This operation can bring motifs from unrelated proteins together.
  3. Since loops are the most error-prone regions in X-ray crystallography, proteins with poor resolution bring more noise than information. Hence structures with resolution worse than 2.5 Å are excluded from this study. Loops longer than 12 residues or shorter than three residues are also omitted, because long loops are sparsely populated and we do not intend to include shorter linkers such as conserved β-turns, which have already been studied thoroughly.
  4. Several optional operations can be performed at this step according to the nature of the final database. (a) Deleting redundant motif sequences, assuming that the same motif sequence will have the same structure (this is not correct and will be discussed in this paper). For duplicated motif sequences, only the one with the best resolution is kept. This manipulation generates the generic database in this paper. (b) Splitting the motif families so that each contains only motifs from homologous proteins, and then deleting the redundant motif sequences. (c) Splitting the motif family into same-length loop families and deleting redundant motif sequences. (d) When focusing on the conformational change of the same loop, in contrast to (a), only motifs with the same sequence are put into a family.
  5. Deleting the motifs without significant structural variability. A large r.m.s.d.torsion does not result in a large r.m.s.d. in Cartesian space in many cases, but a large r.m.s.d. usually means a large r.m.s.d.torsion. We denote the diversity of loop structure as two types, 'T' and 'C', which stand for torsional and Cartesian space variability of the peptide backbone conformation. Here, 'C' covers 'T' in most cases. The r.m.s.d.torsion values are computed pairwise if loop lengths are identical. If the r.m.s.d.torsion of two loops is <60°, the one with the worse resolution is eliminated, until all the loops have different conformations in torsional space. Then the r.m.s.d.s are recomputed and stored.
  6. To ensure all the frameworks are superimposed within a cut-off (1.0 Å used in this paper), the r.m.s.d.s of frameworks are computed pairwise. Then the motifs with neighboring frameworks are clustered into families using the average linkage clustering algorithm.
  7. The families with only one motif, which cannot demonstrate structural diversity, are ignored so that every family contains at least two conformationally different loops. Based on the pairwise r.m.s.d.s, if a family contains significant Cartesian variability, it is marked as a 'C' family. In addition, a family containing different loop lengths is also considered as 'C'. All other families are marked as 'T'. Although type 'T' families do not show important variability in backbone atoms, owing to the substantial torsional difference the side chains, which are more important to protein functions, will exhibit considerable variability, so they are all retained as 'T' families.
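Two of the steps above lend themselves to a short illustration: the merging of families bridged by a shared motif (step 2) and the removal of loops that are not torsionally distinct (step 5). The sketch below is one possible reading, not the authors' implementation. It assumes that motifs are identified by strings, that `torsion_rmsd` is a caller-supplied function returning the torsional difference in degrees, and that the elimination in step 5 is done greedily in order of resolution; all function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def merge_families(families: Iterable[Set[str]]) -> List[Set[str]]:
    """Step 2: merge motif families that share at least one motif (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for family in families:
        members = list(family)
        for m in members:
            find(m)  # register every motif, including single-member families
        for m in members[1:]:
            parent[find(m)] = find(members[0])

    groups: Dict[str, Set[str]] = {}
    for m in list(parent):
        groups.setdefault(find(m), set()).add(m)
    return list(groups.values())


def prune_similar(
    loops: List[Tuple[str, float]],
    torsion_rmsd: Callable[[str, str], float],
    cutoff: float = 60.0,
) -> List[str]:
    """Step 5: of any two loops closer than the cut-off, keep the better resolved one."""
    kept: List[str] = []
    # Lower resolution values (in Angstrom) are better, so visit those first.
    for name, _resolution in sorted(loops, key=lambda item: item[1]):
        if all(torsion_rmsd(name, other) >= cutoff for other in kept):
            kept.append(name)
    return kept
```

With this greedy ordering a loop is discarded only when a better-resolved loop within 60° has already been kept, so every surviving pair differs by at least the cut-off.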


    Results and discussion
 
Content of the database

We use a simple, unique code to define every motif. This code is a string such as EE1osp-H_99_9 made up of four or five parts: the types of the secondary structures (EE, EH, HE or HH), the PDB code, the chain identifier if applicable, the residue number at which the loop begins in the PDB file and the loop length.
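As an illustration of this naming scheme, the sketch below splits a motif code into its parts. The exact grammar is inferred from the examples quoted in this paper (such as EE1osp-H_99_9 and EH1ako_110_12) and is therefore an assumption: a four-character PDB code beginning with a digit, an optional single-character chain identifier after a hyphen, and an integer start residue. Residue numbers with insertion codes are not handled. `MotifCode` and `parse_motif_code` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class MotifCode(NamedTuple):
    kind: str             # framework type: EE, EH, HE or HH
    pdb: str              # four-character PDB code
    chain: Optional[str]  # chain identifier, or None if the entry has none
    loop_start: int       # residue number at which the loop begins
    loop_len: int         # loop length in residues


# ASSUMPTION: grammar inferred from the example codes quoted in the text.
_PATTERN = re.compile(
    r"^(EE|EH|HE|HH)([0-9][A-Za-z0-9]{3})(?:-([A-Za-z0-9]))?_(-?\d+)_(\d+)$"
)


def parse_motif_code(code: str) -> MotifCode:
    """Split a motif code such as 'EE1osp-H_99_9' into its components."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a recognizable motif code: {code!r}")
    kind, pdb, chain, start, length = match.groups()
    return MotifCode(kind, pdb, chain, int(start), int(length))
```

For instance, `parse_motif_code("EE1osp-H_99_9")` gives type EE, PDB code 1osp, chain H, start residue 99 and loop length 9.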

The motif families in Figure 1 will be helpful in understanding the content of the database and the meaning of diversity of loops. This database is an assembly of such motif families having overlapped secondary structure elements and variable loops, irrespective of whether the loops have equal length or come from the same protein family.



Fig. 1. Some motif families. The Cα atoms in loops are drawn as small balls. (a) Family EE72 has six motifs with loop lengths of 4–11. (b) Family EH10 contains six motifs; the loops have four or five residues. (c) Family HE61 has three motifs with 7–9 residues in the loop. (d) There are three 8–9-residue loops in family HH42.

 
Following our algorithm, 495 motif families were found from the FSSP and PDB. Some features are summarized in Table I. The populations of the motif families range from 2 to 12, with an average just above 2. Table II shows the number of motif families of different populations. During database generation, the single-member families are omitted because they cannot show structural variability. However, they may become variable families as the PDB grows. As shown in Table II, the number of single-member families is as high as 4704. Therefore, this database can readily be extended as the PDB grows.


Table I. Some features of the motif database
 

Table II. Distribution of motif families with different conformers
 
This database covers information not only about the variability of loops but also the stability of frameworks. It is clear that the total number of protein folds is limited and many proteins are composed of similar motifs as structural elements. The motifs can appear in proteins that are unrelated in sequence and even topologically different. This database provides many cases where different proteins share the same frameworks. For example, family EH41 contains five motifs, EH1ako_110_12, EH1pud_202_6, EH2tmd-A_672_9, EH3lad-A_316_10 and EH5rub-A_165_8, but these five motifs come from four completely different fold types according to the SCOP classification (Murzin et al., 1995):

1ako DNase I-like

1pud beta/alpha (TIM)-barrel

2tmd-A(646–729) A nucleotide-binding domain

3lad-A(278–348) FAD/NAD(P)-binding domain

5rub-A(138–157) beta/alpha (TIM)-barrel

Therefore, this motif database can also be considered as a structural module library providing basic templates for building whole proteins. In addition, the modules have remarkable flexibility in loop regions.

Hypervariable loops in antibody

The remarkable antigen-binding ability of immunoglobulins depends on the nature of their architecture. The variability of only six hypervariable loops enables them to bind countless antigens. This is a good example to illustrate the diversity of loops. Thus much effort has been devoted to studying the structures of hypervariable loops, including conformation classification (Chothia and Lesk, 1987; Chothia et al., 1989; Martin and Thornton, 1996; Al-Lazikani et al., 1997) and modeling (Martin et al., 1989; Vasmatzis et al., 1994; Pellequer and Chen, 1997).

Here, we do not attempt to reproduce the canonical classification of hypervariable loops, and our strict algorithm is not suitable for finding all of them. First, loops from structures with poor resolution and loops longer than 12 residues are excluded. Second, motifs whose frameworks have unmatched DSSP definitions, or whose frameworks cannot be superimposed within 1 Å, cannot be grouped into a family. Third, if a non-antibody loop and a hypervariable loop have the same structure and the latter has worse resolution, the hypervariable loop is eliminated.

Although we omit some hypervariable loops, our method brings other valuable results. There are many cases where antibodies and other proteins share the same motif framework. For example, motif family EE2 contains three H3 hypervariable loops and three loops from other proteins:


Loop sequences are boxed. The loop, which is defined by DSSP, may be different from the classical hypervariable loop definition.

The structures of the six loops are shown in Figure 2. This suggests that the non-antibody loops can be considered as a supplement to the hypervariable loops and that they may be grafted on to the frameworks of immunoglobulins in antibody engineering.



Fig. 2. Motif family EE2 contains three antibody hypervariable H3 loops and three loops from other proteins. The Cα atoms are drawn as circles and the H3 loops are drawn as thick lines.

 
Homologous loops

If the algorithm is followed as option (b) in step 4 (see Materials and methods) so that only the motifs from homologous proteins (with sequence identity >30%) are grouped, another motif sub-database is derived. The structure and content of the new sub-database are similar to those of the original database. This sub-database comprises 180 motif families (109 EE families, 27 EH families, 25 HE families and 19 HH families), and covers 393 motifs (244 EE motifs, 60 EH motifs, 51 HE motifs and 38 HH motifs). It is useful if we only consider the variability of loops in some specific protein families.

Loops of equal length

As in the previous section, only the motifs with the same length are grouped [see step 4 (c) above]. We produced one more motif sub-database; it is similar to but much larger than our previous database (Li et al., 1999) owing to the improved method. In our earlier work, we found 84 motif families with length from 2 to 12. Every motif family in that database had to contain more than five loops, whether or not they had the same conformation. The current search produced a total of 177 motif families (112 EE families, 24 EH families, 27 HE families and 14 HH families). Every motif family must have at least two loops of different conformation.

If the loops are of the same length, then it is possible to determine the relationship between loop structures and loop sequences. Our previous study demonstrated that only in a few cases (24 out of 84 families) did similar loop sequences or motif sequences result in similar three-dimensional structures. We obtained a similar conclusion with this sub-database. In addition, an interesting result concerning the correlation between the number of loop conformers and loop length is derived from this sub-database. One might expect that families with longer loops would have more conformations, but this was not so. The average numbers of conformers of motif families with loop length 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 are 2.1, 2.2, 2.2, 2.2, 2.1, 2.3, 2.1, 2.2, 2.0 and 2.1, respectively. Hence the length has a limited impact on the variability of the conformation of loops when both ends are anchored on a common framework.
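The flatness of these averages can be made explicit with a few lines of arithmetic. The values are those quoted above; the mean computed here is unweighted, because the number of families at each loop length is not given in the text, so it should be read only as a rough summary. The names `avg_conformers` and `summarize` are hypothetical.

```python
from statistics import mean
from typing import Dict

# Average number of conformers per motif family, keyed by loop length (from the text).
avg_conformers: Dict[int, float] = {
    3: 2.1, 4: 2.2, 5: 2.2, 6: 2.2, 7: 2.1,
    8: 2.3, 9: 2.1, 10: 2.2, 11: 2.0, 12: 2.1,
}


def summarize(values: Dict[int, float]) -> Dict[str, float]:
    """Return the smallest, largest and unweighted mean of the per-length averages."""
    vals = list(values.values())
    return {"min": min(vals), "max": max(vals), "unweighted_mean": mean(vals)}


summary = summarize(avg_conformers)
```

The per-length averages span only 2.0 to 2.3, which is the basis for the statement that loop length has a limited impact on the number of conformers.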

Same loop with different conformation

Loops are often involved, and sometimes essential, in molecular recognition processes. Many theoretical simulation methods, usually called docking, have been applied to predict the binding between receptor and ligand. However, in most docking programs, conformational flexibility remains an unsolved problem. In recent years, some docking algorithms have been able to simulate partly flexible docking by tolerating van der Waals bumps, or by rotating rotatable bonds of the ligand and of protein side chains (Rosenfeld et al., 1995; Desmet et al., 1997; Jones et al., 1997). Nevertheless, these programs can hardly deal with conformational changes of the protein backbone in receptor–ligand association.

In this study, we attempted to survey the structural changes of loops in different environments. This might give some suggestions regarding flexible docking. In order to derive this information, step 4 of the algorithm described in Materials and methods is carried out using option (d). In this step, the motifs having the same loop sequence are grouped into a smaller motif family. Hence the final result comprises only the conformational changes of loops. This study found 119 motif families (72 EE, 25 EH, 13 HE and 9 HH) with remarkable conformational differences in the loops. All these loops and their sequences are listed in Table III. Most of these loops have only two different conformations. Six loops show three kinds of conformations and two loops exhibit four different conformations.


Table III. Loops with variable conformations
 
Here we used the torsional difference to compare two conformations of loops in the algorithm, so all these 119 loops have obvious variability in the backbone torsion angles. In many cases, large torsional differences do not correspond to equally large differences in Cartesian coordinates. Among these 119 loops, 26 show significant changes in Cartesian space, and they are marked with a 'C' in the column after the sequence in Table III. The other 93 loops, exhibiting variability only in torsional space, are marked with a 'T'. Examples of these two types of variable loop are shown in Figure 3.
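The family counts reported in this and the preceding sections can be cross-checked with a short script. The numbers are copied from the text; the dictionary layout and the helper name `inconsistent` are illustrative only.

```python
from typing import Dict, List, Tuple

# (reported total, reported breakdown); breakdowns are EE, EH, HE, HH unless noted.
reported: Dict[str, Tuple[int, List[int]]] = {
    "homologous families": (180, [109, 27, 25, 19]),
    "homologous motifs": (393, [244, 60, 51, 38]),
    "equal-length families": (177, [112, 24, 27, 14]),
    "same-sequence families": (119, [72, 25, 13, 9]),
    "same-sequence loops by label (C, T)": (119, [26, 93]),
}


def inconsistent(table: Dict[str, Tuple[int, List[int]]]) -> List[str]:
    """Return the names of entries whose breakdown does not sum to the stated total."""
    return [name for name, (total, parts) in table.items() if sum(parts) != total]


# Fraction of same-sequence loops that also differ in Cartesian space.
c_fraction = 26 / 119
```

All breakdowns sum to their stated totals, and roughly one in five of the 119 variable loops (26 of 119) shows a significant change in Cartesian space.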



Fig. 3. Two kinds of variable loops. (a) Family HE11 has two loops, HE1phr_105_7 and HE1pnt_105_7; their loop sequence is QVKNCRA. The conformations of the two loops differ only in the backbone torsion angles. (b) The two loops in family EE11, EE1bfp_156_4 and EE1ema_156_4, extend out in completely different directions. The sequence is KQKN.

 
One important reason for the notable conformational change of loops is that they bind different ligands. A typical example is family EE60, composed of EE1rst-B_45_8, EE1sle-D_45_8, EE1slg-D_45_8 and EE1sri-A_45_8. This loop, which is from streptavidin, has four conformations (Figure 4). The four structures bind three different peptides (AWRHPQPYY, Ac-CHPQGPPC-NH2 and FSHPQNT) and a small molecule (3',5'-dimethyl-Haba), respectively. From Figure 4, we can see that the loop is isolated in space and has little interaction with other segments of the protein.



Fig. 4. Ligand-induced conformational change of the loop in family EE60. (a) Superposition of the four motifs in EE60, which is composed of EE1rst-B_45_8, EE1sle-D_45_8, EE1slg-D_45_8 and EE1sri-A_45_8. The Cα atoms in loops are drawn as small balls. (b) The superimposition of the four protein chains demonstrates that the loops form one side and the lids of the binding pockets. (c)–(f) The loops bind with different ligands.

 
Although over 100 flexible loops were identified in this study, we also found over 6000 loops without any conformational change between different PDB entries. Loops with conformational changes are therefore rare, and we can estimate that ligand–receptor binding involving backbone changes is also rare. This should be good news for developers of flexible docking programs!

Hints in protein design

The motif family database derived in this study illustrates the structural diversity of loops and the convergence of frameworks. In order to design or modify functional proteins, it is often necessary to transfer functional loops on to a target framework. This database may give some direction for this procedure. As mentioned in the section on hypervariable loops, if a framework similar to the target framework can be found in one of the motif families, then all the loops in that family can be considered as candidates for grafting on to the target framework or for use as templates.

This database along with the database of homologous loops and of equal-length loops are freely available on the Web (http://www.ipc.pku.edu.cn/~liwz/motif.html).


    Appendix A. Total motif families in the database

EE motif

EH motif

HE motif

HH motif
 

    Appendix B. Database of homologous motif families

EE motifs

EH motifs

HE motifs

HH motifs
 

    Appendix C. Database of motif families with the same loop length

EE motifs

EH motifs

HE motifs

HH motifs
 


    Acknowledgments
 
Project 29703001 was supported by the NSFC. Project 863-103-13-03-05 was supported by the Chinese National High Technology Development Program.


    Notes
 
¹ To whom correspondence should be addressed


    References
 
Abagyan,R.A., Totrov,M.M. and Kuznetsov,D.A. (1994) J. Comput. Chem., 15, 488–506.

Al-Lazikani,B., Lesk,A.M. and Chothia,C. (1997) J. Mol. Biol., 273, 927–948.

Blundell,T., Carney,D., Gardner,S., Hayes,F., Howlin,B., Hubbard,T., Overington,J., Sigh,O.A., Sibanda,B.L. and Sutcliffe,M. (1988) Eur. J. Biochem., 172, 513–520.

Bruccoleri,R.E. and Karplus,M. (1987) Biopolymers, 26, 137–168.

Chothia,C. and Lesk,A.M. (1987) J. Mol. Biol., 196, 901–917.

Chothia,C. et al. (1989) Nature, 342, 877–883.

Collura,V., Higo,J. and Garnier,J. (1993) Protein Sci., 2, 1502–1510.

Desmet,J., Wilson,I.A., Joniau,M., De Maeyer,M. and Lasters,I. (1997) FASEB J., 11, 164–172.

Donate,L.E., Rufino,S.D., Canard,L.H.J. and Blundell,T.L. (1996) Protein Sci., 5, 2600–2616.

Efimov,A.V. (1991) Protein Engng, 4, 245–250.

Fidelis,K., Stern,P.S., Bacon,D. and Moult,J. (1994) Protein Engng, 7, 953–960.

Fine,R.M., Wang,H., Shenkin,P.S., Yarmush,D.L. and Levinthal,C. (1986) Proteins, 1, 342–362.

Higo,J., Collura,V. and Garnier,J. (1992) Biopolymers, 32, 33–43.

Holm,L. and Sander,C. (1993) J. Mol. Biol., 233, 123–138.

Holm,L. and Sander,C. (1994) Nucleic Acids Res., 22, 3600–3609.

Holm,L., Ouzounis,C., Sander,C., Tuparev,G. and Vriend,G. (1992) Protein Sci., 1, 1691–1698.

Jones,T.A. and Thirup,T. (1986) EMBO J., 5, 819–822.

Jones,G., Willett,P., Glen,R.C., Leach,A.R. and Taylor,R. (1997) J. Mol. Biol., 267, 727–748.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Kwasigroch,J., Chomilier,J. and Mornon,J. (1996) J. Mol. Biol., 259, 855–872.

Leszczynski,J.F. and Rose,G.D. (1986) Science, 234, 849–855.

Li,W., Liu,Z. and Lai,L. (1999) Biopolymers, 49, 481–495.

Martin,A.C.R. and Thornton,J.M. (1996) J. Mol. Biol., 263, 800–815.

Martin,A.C.R., Cheetham,J.C. and Rees,A.R. (1989) Proc. Natl Acad. Sci. USA, 86, 9268–9272.

Martin,A.C.R., Toda,K., Stirk,H.J. and Thornton,J.M. (1995) Protein Engng, 8, 1093–1101.

Mas,M.T., Smith,K.C., Yarmush,D.L., Aisaka,K. and Fine,R.M. (1992) Proteins: Struct. Funct. Genet., 14, 483–498.

Milner,W.E. and Poet,R. (1986) Biochem. J., 240, 289–292.

Milner-White,E.J. and Poet,R. (1987) Trends Biochem. Sci., 12, 189–192.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Oliva,B., Bates,P.A., Querol,E., Avilés,F.X. and Sternberg,M.J.E. (1997) J. Mol. Biol., 266, 814–830.

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.

Pellequer,J.L. and Chen,S.W. (1997) Biophys. J., 73, 2359–2375.

Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.

Ring,C.S., Kneller,D.G., Langridge,R. and Cohen,F.E. (1992) J. Mol. Biol., 224, 685–699.

Rosenfeld,R., Vajda,S. and DeLisi,C. (1995) Annu. Rev. Biophys. Biomol. Struct., 24, 677–700.

Rufino,S.D., Donate,L.E., Canard,L.H.J. and Blundell,T.L. (1997) J. Mol. Biol., 267, 352–367.

Sibanda,B.L. and Thornton,J.M. (1985) Nature, 316, 170–174.

Sibanda,B.L., Blundell,T.L. and Thornton,J.M. (1989) J. Mol. Biol., 206, 759–777.

Topham,C., McLeod,A., Eisenmenger,F., Overington,J.P., Johnson,M.S. and Blundell,T.L. (1993) J. Mol. Biol., 229, 194–220.

van Vlijmen,H.W.T. and Karplus,M. (1997) J. Mol. Biol., 267, 975–1001.

Vasmatzis,G., Brower,R. and Delisi,C. (1994) Biopolymers, 34, 1669–1680.

Venkatachalam,C.M. (1968) Biopolymers, 6, 1425–1436.

Wintjens,R.T., Rooman,M.J. and Wodak,S.J. (1996) J. Mol. Biol., 255, 235–253.

Zhang,H., Lai,L., Wang,L., Han,Y. and Tang,Y. (1997) Biopolymers, 41, 61–72.

Zheng,Q. and Kyle,D.J. (1996) Proteins: Struct. Funct. Genet., 24, 209–221.

Received January 14, 1999; revised August 19, 1999; accepted September 23, 1999.