Institute of Physical Chemistry, Peking University, Beijing 100871, China
Abstract |
---|
Keywords: database/loop/loop conformation/loop modeling/loop structure/motif
Introduction |
---|
Loop modeling methods can be divided into knowledge-based analysis (Jones and Thirup, 1986; Blundell et al., 1988; Topham et al., 1993; Rufino et al., 1997) and ab initio computation or conformation searching (Fine et al., 1986; Bruccoleri and Karplus, 1987; Higo et al., 1992; Mas et al., 1992; Collura et al., 1993; Abagyan et al., 1994; Fidelis et al., 1994; Zheng and Kyle, 1996; Zhang et al., 1997). Early statistical analyses identified and classified the conserved β-turns and β-hairpins (Venkatachalam, 1968; Richardson, 1981; Sibanda and Thornton, 1985; Milner and Poet, 1986; Milner-White and Poet, 1987; Sibanda et al., 1989). Subsequently, analyses of α-α-hairpins (Efimov, 1991; Wintjens et al., 1996), general loops (Leszczynski and Rose, 1986; Ring et al., 1992) and long loops of >10 residues (Martin et al., 1995) were published. Large-scale loop databases were built to meet the needs of homology modeling and protein design (Donate et al., 1996; Kwasigroch et al., 1996; Oliva et al., 1997; Rufino et al., 1997; van Vlijmen and Karplus, 1997). In these studies, loops from the Protein Data Bank (PDB) were classified according to loop length, type of anchoring secondary structures, geometric parameters and sequence features. For example, Donate et al. (1996) compiled a loop database containing 161 conformational classes from 223 proteins and domains. Their database contained not only the general conformational characteristics but also the sequence preference and geometric plasticity of each entire loop family. This database was then used by Rufino et al. (1997) to improve comparative modeling.
In homologous protein families, structural differences occur mainly in loops rather than in the regular secondary structure framework. The same is often true of fragments or motifs shared by sequence-unrelated proteins. The motivation for the present study was to collect this structural diversity for loops that connect common secondary structures: the loops may adopt different conformations, but their anchoring secondary structures must superimpose well. Three types of structural variability are examined, according to the source of the loops: loops from different protein families, loops from homologous families, and loops from identical proteins in different PDB files. To meet these goals and provide more information, we adopted an all-PDB-based algorithm, which searches all PDB files, including mutant structures and proteins bound to different ligands.
In functional protein design, grafting a functional loop on to another known framework is a widely used method that is more flexible than single or multiple mutation. However, the success of grafting depends on the consistency between the framework of the template and that of the target. A database of superimposed frameworks can therefore help in selecting template functional loops for grafting.
In the database from our previous work (Li et al., 1999), loops of variable conformation spanning structurally similar frameworks were inspected; 84 motif families were identified and the relationship between loop sequences and loop conformations was examined. We found 43 new loop conformation classes according to the classical loop classification of Donate et al. (1996). However, in that study only loops with the same length and different sequences were taken into account. Here we removed this restriction, so that loops of different lengths and loops with the same sequence were considered as long as they had the same anchoring secondary structures. We also used a larger protein structure database. Hence the present study provides more information.
Materials and methods |
---|
The comparison of two motifs can be divided into comparisons of loops and of frameworks. The aim of the present study was to build a database of motif families containing similar frameworks but variable loops. If the fitted root mean square deviation (r.m.s.d.) of the frameworks is less than the cut-off (1.0 Å), the two motifs are considered to belong to one motif family. When comparing loop conformers within one motif family, two criteria are applied: the Cartesian and the torsional difference. The former is the r.m.s.d. of the loop backbone heavy atoms when the frameworks are fitted and superimposed. The latter is computed from all the main-chain torsion angles, φ, ψ and even ω (to distinguish cis and trans):
|
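The two loop criteria can be illustrated with a minimal Python sketch. It assumes that the loop coordinates have already been transformed into the frame in which the frameworks are superimposed, and it assumes a root-mean-square form for the torsional difference with periodic wrapping of the angles; the function names and this exact form are illustrative assumptions and are not necessarily the expression used in this study.

```python
import math

def cartesian_rmsd(coords_a, coords_b):
    """R.m.s.d. between two equally long lists of (x, y, z) positions that
    are ALREADY in a common frame (frameworks fitted beforehand)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((pa - pb) ** 2
                for a, b in zip(coords_a, coords_b)
                for pa, pb in zip(a, b))
    return math.sqrt(total / len(coords_a))

def wrap_deg(delta):
    """Map an angle difference in degrees on to the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0

def torsional_difference(angles_a, angles_b):
    """Assumed form: root-mean-square difference between two equally long
    lists of main-chain torsion angles (phi, psi, omega), in degrees."""
    if len(angles_a) != len(angles_b) or not angles_a:
        raise ValueError("torsion lists must be non-empty and of equal length")
    squared = [wrap_deg(a - b) ** 2 for a, b in zip(angles_a, angles_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical one-residue example given as (phi, psi, omega): the second
# conformer differs only by a cis peptide bond (omega 0 instead of 180).
trans_loop = [-60.0, -45.0, 180.0]
cis_loop = [-60.0, -45.0, 0.0]
print(torsional_difference(trans_loop, cis_loop))
```

Including ω in the angle list is what lets a cis/trans switch register as a large torsional difference even when φ and ψ are unchanged.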
The all-PDB-based searching algorithm is time consuming owing to the huge size of the PDB. One feasible approach is to group the proteins and domains into fold families first and then search each family. Several well-known protein fold databases already exist, such as FSSP (Holm et al., 1992; Holm and Sander, 1993, 1994), CATH (Orengo et al., 1997) and SCOP (Murzin et al., 1995). Among these, FSSP is the most suitable for our work, because it is generated entirely by a computer program and its format can easily be read by other programs.
The FSSP database used in this study contains 1172 fold files covering 9157 chains in the PDB. Every fold file provides the detailed parameters for the residue to residue superimposition of a representative PDB chain and all its structural neighbors. Therefore, the loops connecting similar secondary structures can be derived from these fold files. Our loop-seeking process comprises seven steps:
Results and discussion |
---|
We use a simple, unique code to define every motif. This code is a string such as EE1osp-H_99_9, made up of four or five parts: the types of the secondary structures (EE, EH, HE or HH), the PDB code, the chain identifier if applicable, the number of the first loop residue in the PDB file and the loop length.
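As an illustration, a motif code of this kind could be split into its parts as sketched below. The separator layout (a hyphen before the chain identifier and underscores before the residue number and the loop length) is inferred from the single example EE1osp-H_99_9, so the pattern, the `MotifCode` type and the `parse_motif_code` function are assumptions rather than part of the database software.

```python
import re
from typing import NamedTuple, Optional

class MotifCode(NamedTuple):
    ss_types: str          # EE, EH, HE or HH
    pdb_code: str          # four-character PDB code
    chain: Optional[str]   # chain identifier, or None when absent
    loop_start: int        # number of the first loop residue in the PDB file
    loop_length: int       # number of residues in the loop

# Assumed layout: <types><pdb>[-<chain>]_<start>_<length>
_PATTERN = re.compile(
    r"^(EE|EH|HE|HH)"          # types of the two anchoring secondary structures
    r"([0-9][A-Za-z0-9]{3})"   # PDB code
    r"(?:-([A-Za-z0-9]))?"     # optional chain identifier
    r"_(-?\d+)_(\d+)$"         # first loop residue and loop length
)

def parse_motif_code(code: str) -> MotifCode:
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a recognised motif code: {code!r}")
    ss, pdb, chain, start, length = match.groups()
    return MotifCode(ss, pdb, chain, int(start), int(length))

print(parse_motif_code("EE1osp-H_99_9"))
```

A code without a chain identifier (the four-part case) is parsed with `chain` set to `None`.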
The motif families in Figure 1 illustrate the content of the database and what is meant by the diversity of loops. The database is an assembly of such motif families, with overlapping secondary structure elements and variable loops, irrespective of whether the loops have equal length or come from the same protein family.
1ako DNase I-like
1pud beta/alpha (TIM)-barrel
2tmd-A(646–729) A nucleotide-binding domain
3lad-A(278–348) FAD/NAD(P)-binding domain
5rub-A(138–157) beta/alpha (TIM)-barrel
Therefore, this motif database can also be considered a structural module library that provides basic templates for building larger protein structures. In addition, the modules have remarkable flexibility in their loop regions.
Hypervariable loops in antibodies
The remarkable antigen-binding ability of immunoglobulins depends on their architecture. The variability of only six hypervariable loops enables them to bind countless antigens, which makes them a good example of loop diversity. Much effort has therefore been devoted to studying the structures of hypervariable loops, including conformational classification (Chothia and Lesk, 1987; Chothia et al., 1989; Martin and Thornton, 1996; Al-Lazikani et al., 1997) and modeling (Martin et al., 1989; Vasmatzis et al., 1994; Pellequer and Chen, 1997).
Here, we do not attempt to reproduce the canonical classification of hypervariable loops, and our strict algorithm is not suited to finding all of them. First, loops from structures with poor resolution and loops longer than 12 residues are excluded. Second, motifs whose frameworks have unmatched DSSP definitions, or whose frameworks cannot be superimposed within 1 Å, are not grouped into a family. Third, if a non-antibody loop and a hypervariable loop have the same structure and the latter has the worse resolution, the hypervariable loop is eliminated.
Although we omit some hypervariable loops, our method yields other valuable results. There are many cases where antibodies and other proteins share the same motif framework. For example, motif family EE2 contains three H3 hypervariable loops and three loops from other proteins:
|
Loop sequences are boxed. The loops are defined by DSSP and may therefore differ from the classical hypervariable loop definition.
The structures of the six loops are shown in Figure 2. This suggests that non-antibody loops can be considered as a supplement to hypervariable loops and that they may be grafted on to the frameworks of immunoglobulins in antibody engineering.
If option (b) in step 4 of the algorithm is followed (see Materials and methods), so that only motifs from homologous proteins (with sequence identity >30%) are grouped, another motif sub-database is derived. The structure and content of the new sub-database are similar to those of the original database. This sub-database comprises 180 motif families (109 EE families, 27 EH families, 25 HE families and 19 HH families) and covers 393 motifs (244 EE motifs, 60 EH motifs, 51 HE motifs and 38 HH motifs). It is useful when only the variability of loops within specific protein families is of interest.
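The homology criterion can be sketched as a simple identity test on a pair of pre-aligned sequences. How the sequence identity was actually obtained in this study is not specified here, so the choice of denominator (aligned, non-gap columns), the gap character and the function names below are assumptions made for illustration only.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the columns in which neither
    of the two pre-aligned sequences has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def is_homologous(aligned_a, aligned_b, cutoff=30.0):
    """True when the identity exceeds the cut-off (>30% in the text)."""
    return sequence_identity(aligned_a, aligned_b) > cutoff

# Hypothetical aligned fragments, not taken from the database.
print(sequence_identity("ACDE-G", "ACQEFG"))
print(is_homologous("AAAA", "ACDE"))
```

With this test, two motifs would enter the same homologous family only if their parent chains pass `is_homologous` in addition to the framework criterion.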
Loops of equal length
As in the previous section, a further motif sub-database was produced, this time by grouping only motifs with the same loop length [see step 4 (c) above]. It is similar to, but much larger than, our previous database (Li et al., 1999) owing to the improved method. In our earlier work, we found 84 motif families with loop lengths from 2 to 12; every motif family in that database had to contain more than five loops, whether or not they had the same conformation. The current search produced a total of 177 motif families (112 EE families, 24 EH families, 27 HE families and 14 HH families). Every motif family must have at least two loops of different conformation.
If the loops are of the same length, it is possible to examine the relationship between loop structures and loop sequences. Our previous study showed that only in a few cases (24 out of 84 families) did similar loop sequences or motif sequences result in similar three-dimensional structures. We reached a similar conclusion with this sub-database. In addition, this sub-database reveals an interesting result concerning the correlation between the number of loop conformers and loop length. One might expect families with longer loops to have more conformations, but this was not so. The average numbers of conformers in motif families with loop lengths of 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 are 2.1, 2.2, 2.2, 2.2, 2.1, 2.3, 2.1, 2.2, 2.0 and 2.1, respectively. Hence length has a limited impact on the conformational variability of loops when both ends are anchored on a common framework.
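The weak dependence on length can be checked directly from the ten averages quoted above. The following sketch fits an ordinary least-squares line to them; the fit is an illustration added here and was not part of the original analysis.

```python
lengths = list(range(3, 13))  # loop lengths 3 to 12
mean_conformers = [2.1, 2.2, 2.2, 2.2, 2.1, 2.3, 2.1, 2.2, 2.0, 2.1]

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

overall_mean = sum(mean_conformers) / len(mean_conformers)
spread = max(mean_conformers) - min(mean_conformers)
slope = least_squares_slope(lengths, mean_conformers)
print(f"mean {overall_mean:.2f}, spread {spread:.1f}, slope {slope:+.3f} per residue")
```

The averages stay between 2.0 and 2.3 around a mean of 2.15, and the fitted slope is close to zero (about −0.009 conformers per additional residue), which is consistent with the limited impact of length noted above.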
Same loop with different conformation
Loops are often involved, and sometimes essential, in molecular recognition processes. Many theoretical simulation methods, usually called docking, have been applied to predict the binding between receptor and ligand. However, conformational flexibility remains an unsolved problem in the various docking programs. In recent years, some docking algorithms have been able to simulate partly flexible docking by tolerating van der Waals bumps or by rotating the rotatable bonds of the ligand and of protein side chains (Rosenfeld et al., 1995; Desmet et al., 1997; Jones et al., 1997). Nevertheless, these programs can hardly deal with conformational changes of the protein backbone in receptor–ligand association.
In this study, we attempted to survey the structural changes of loops in different environments, which might offer some suggestions for flexible docking. To derive this information, step 4 of the algorithm described in Materials and methods is run with option (d). In this step, the motifs having the same loop sequence are grouped into a smaller motif family, so the final result comprises only conformational changes of loops. This study found 119 motif families (72 EE, 25 EH, 13 HE and 9 HH) with remarkable conformational differences in the loops. All these loops and their sequences are listed in Table III. Most of these loops have only two different conformations. Six loops show three kinds of conformations and two loops exhibit four different conformations.
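The grouping by identical loop sequence and the counting of distinct conformations can be sketched as follows. The greedy counting rule, the 1.0 Å loop threshold, the placeholder motif names and the pairwise differences are all hypothetical and serve only to illustrate the idea; they do not reproduce the procedure or the data behind Table III.

```python
from collections import defaultdict

def group_by_loop_sequence(motifs):
    """motifs: iterable of (motif_name, loop_sequence) pairs from one motif
    family. Returns only the sequences seen in at least two motifs."""
    groups = defaultdict(list)
    for name, sequence in motifs:
        groups[sequence].append(name)
    return {seq: names for seq, names in groups.items() if len(names) > 1}

def count_conformers(names, difference, threshold):
    """Greedy count: a loop opens a new conformer when it differs from every
    representative kept so far by more than the threshold."""
    representatives = []
    for name in names:
        if all(difference(name, rep) > threshold for rep in representatives):
            representatives.append(name)
    return len(representatives)

# Placeholder motifs and made-up pairwise loop differences (in Å).
motifs = [("loop_a", "GDGS"), ("loop_b", "GDGS"),
          ("loop_c", "GDGS"), ("loop_d", "NKT")]
pair_difference = {
    frozenset(("loop_a", "loop_b")): 0.3,
    frozenset(("loop_a", "loop_c")): 2.4,
    frozenset(("loop_b", "loop_c")): 2.5,
}

def lookup(x, y):
    return pair_difference[frozenset((x, y))]

families = group_by_loop_sequence(motifs)
print({seq: count_conformers(names, lookup, threshold=1.0)
       for seq, names in families.items()})
```

In this toy input the three GDGS loops collapse into two conformers, the situation reported for most of the same-sequence loops above.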
Hints in protein design
The motif family database derived in this study illustrates the structural diversity of loops and the convergence of frameworks. To design or modify functional proteins, it is often necessary to transfer functional loops on to a target framework, and this database may give some guidance for this procedure. As mentioned in the section on hypervariable loops, if a framework similar to the target framework can be found in a motif family, then all the loops in that family can be considered as candidates for grafting on to the target framework or for use as templates.
This database, along with the databases of homologous loops and of equal-length loops, is freely available on the Web (http://www.ipc.pku.edu.cn/~liwz/motif.html).
Appendix A. Total motif families in the database |
---|
Appendix B. Database of homologous motif families |
---|
Appendix C. Database of motif families with the same loop length |
---|
Acknowledgments |
---|
Notes |
---|
References |
---|
Al-Lazikani,B., Lesk,A.M. and Chothia,C. (1997) J. Mol. Biol., 273, 927–948.
Blundell,T., Carney,D., Gardner,S., Hayes,F., Howlin,B., Hubbard,T., Overington,J., Sigh,O.A., Sibanda,B.L. and Sutcliffe,M. (1988) Eur. J. Biochem., 172, 513–520.
Bruccoleri,R.E. and Karplus,M. (1987) Biopolymers, 26, 137–168.
Chothia,C. and Lesk,A.M. (1987) J. Mol. Biol., 196, 901–917.
Chothia,C. et al. (1989) Nature, 342, 877–883.
Collura,V., Higo,J. and Garnier,J. (1993) Protein Sci., 2, 1502–1510.
Desmet,J., Wilson,I.A., Joniau,M., De Maeyer,M. and Lasters,I. (1997) FASEB J., 11, 164–172.
Donate,L.E., Rufino,S.D., Canard,L.H.J. and Blundell,T.L. (1996) Protein Sci., 5, 2600–2616.
Efimov,A.V. (1991) Protein Engng, 4, 245–250.
Fidelis,K., Stern,P.S., Bacon,D. and Moult,J. (1994) Protein Engng, 7, 953–960.
Fine,R.M., Wang,H., Shenkin,P.S., Yarmush,D.L. and Levinthal,C. (1986) Proteins, 1, 342–362.
Higo,J., Collura,V. and Garnier,J. (1992) Biopolymers, 32, 33–43.
Holm,L. and Sander,C. (1993) J. Mol. Biol., 233, 123–138.
Holm,L. and Sander,C. (1994) Nucleic Acids Res., 22, 3600–3609.
Holm,L., Ouzounis,C., Sander,C., Tuparev,G. and Vriend,G. (1992) Protein Sci., 1, 1691–1698.
Jones,T.A. and Thirup,T. (1986) EMBO J., 5, 819–822.
Jones,G., Willett,P., Glen,R.C., Leach,A.R. and Taylor,R. (1997) J. Mol. Biol., 267, 727–748.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Kwasigroch,J., Chomilier,J. and Mornon,J. (1996) J. Mol. Biol., 259, 855–872.
Leszczynski,J.F. and Rose,G.D. (1986) Science, 234, 849–855.
Li,W., Liu,Z. and Lai,L. (1999) Biopolymers, 49, 481–495.
Martin,A.C.R. and Thornton,J.M. (1996) J. Mol. Biol., 263, 800–815.
Martin,A.C.R., Cheetham,J.C. and Rees,A.R. (1989) Proc. Natl Acad. Sci. USA, 86, 9268–9272.
Martin,A.C.R., Toda,K., Stirk,H.J. and Thornton,J.M. (1995) Protein Engng, 8, 1093–1101.
Mas,M.T., Smith,K.C., Yarmush,D.L., Aisaka,K. and Fine,R.M. (1992) Proteins: Struct. Funct. Genet., 14, 483–498.
Milner,W.E. and Poet,R. (1986) Biochem. J., 240, 289–292.
Milner-White,E.J. and Poet,R. (1987) Trends Biochem. Sci., 12, 189–192.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Oliva,B., Bates,P.A., Querol,E., Avilés,F.X. and Sternberg,M.J.E. (1997) J. Mol. Biol., 266, 814–830.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Pellequer,J.L. and Chen,S.W. (1997) Biophys. J., 73, 2359–2375.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.
Ring,C.S., Kneller,D.G., Langridge,R. and Cohen,F.E. (1992) J. Mol. Biol., 224, 685–699.
Rosenfeld,R., Vajda,S. and DeLisi,C. (1995) Annu. Rev. Biophys. Biomol. Struct., 24, 677–700.
Rufino,S.D., Donate,L.E., Canard,L.H.J. and Blundell,T.L. (1997) J. Mol. Biol., 267, 352–367.
Sibanda,B.L. and Thornton,J.M. (1985) Nature, 316, 170–174.
Sibanda,B.L., Blundell,T.L. and Thornton,J.M. (1989) J. Mol. Biol., 206, 759–777.
Topham,C., McLeod,A., Eisenmenger,F., Overington,J.P., Johnson,M.S. and Blundell,T.L. (1993) J. Mol. Biol., 229, 194–220.
van Vlijmen,H.W.T. and Karplus,M. (1997) J. Mol. Biol., 267, 975–1001.
Vasmatzis,G., Brower,R. and DeLisi,C. (1994) Biopolymers, 34, 1669–1680.
Venkatachalam,C.M. (1968) Biopolymers, 6, 1425–1436.
Wintjens,R.T., Rooman,M.J. and Wodak,S.J. (1996) J. Mol. Biol., 255, 235–253.
Zhang,H., Lai,L., Wang,L., Han,Y. and Tang,Y. (1997) Biopolymers, 41, 61–72.
Zheng,Q. and Kyle,D.J. (1996) Proteins: Struct. Funct. Genet., 24, 209–221.
Received January 14, 1999; revised August 19, 1999; accepted September 23, 1999.