Centre for Protein Analysis and Design, University of Bath, Claverton Down, Bath BA2 7AY, UK
Abstract |
---|
Keywords: antibody modelling algorithm/knowledge-based screens/WAM/WEB
Abbreviations: CDR, complementarity determining region; Fv, antibody fragment containing the variable region; L1-3 and H1-3, light- and heavy-chain CDRs, respectively; VFF, valence force field; r.m.s.d., root mean square deviation.
Introduction |
---|
Antibody modelling has an advantage over protein modelling in general in that only the Fv needs to be modelled, the constant region being conserved. Further, the majority of the Fv itself, the framework, is highly conserved in structure between different antibodies and can be modelled using the most sequence-homologous known framework (see Pedersen et al., 1994). For CDR modelling, five of the six CDRs (all except H3) frequently fall into one of between one and 10 canonical classes, a set for each CDR. Members of a canonical class all have approximately the same backbone conformation. This is determined by the loop length and the presence of a number of key residues, both in the CDR and in the framework (Foote and Winter, 1992), which hold the CDR in a given conformation by hydrogen bonding, electrostatic and/or hydrophobic interactions. So, to model an unknown CDR, the sequence is examined, the appropriate canonical class assigned and the most sequence-homologous known CDR used. For each loop except L2, a few examples fall outside existing canonical classes, and, along with the H3 loop, must be modelled in other ways (see below). It is likely that further canonical classes will be revealed as more crystal structures are solved, although to date no strictly `canonical' classification has been possible for H3.
The difficulty of modelling the H3 loop is due to the extensive variability in both sequence and structure between different antibodies. There are essentially three approaches: knowledge-based methods, such as database searching, where the database loop that matches most closely in sequence and length [either from antibodies or from the entire Protein Data Bank (PDB) (Berman et al., 2000)] is used as the model (e.g. as in ABGEN; Mandal et al., 1996); ab initio methods, such as the CONGEN conformational search (Bruccoleri and Karplus, 1987); and a combination of both (Martin et al., 1991). More recent analysis suggests that knowledge-based methods will eventually provide the degree of accuracy required for most applications of modelling, such as antibody humanization (Reichmann et al., 1988; Pedersen et al., 1994; Roguska et al., 1994, 1996). However, for the time being, a combination of empirical and ab initio methods is likely to be necessary where accuracy approaching that of a medium-resolution X-ray structure is required.
Methods and results |
---|
In the following sections, we outline the current procedure used within WAM and indicate where the protocol differs from AbM.
Framework regions and canonical CDRs
The framework and canonical backbones are built as in AbM. Following the observation that chi-1 torsion angles of certain, but not all, residue side chains in canonical loops are conserved (see Web site), the CONGEN iterative side chain placement algorithm (Bruccoleri and Karplus, 1987), used in the original AbM, was replaced by a procedure in which the side chains are built using a chi-1 biased maximum-overlap procedure. This attempts to overlap as many as possible of the atoms of the side chain being modelled with the template side chain, while avoiding steric clashes. The non-conserved side chains are excluded at this stage and built later, at the same time as the non-conserved H3 side chains (see below).
Most of the algorithm developments relate to modelling of non-canonical CDRs, in particular H3. Analysis of antibody structures has revealed the existence of certain preferred interactions across the loop (Rees et al., 1996) and, further, that the majority of H3 loops have a kinked backbone conformation, involving the four C-terminal residues, that can be defined by a set of sequence-based rules (Shirai et al., 1996; Morea et al., 1998). For loops of up to 11 residues, we have observed that the majority show semi-conserved conformations (up to 2.5 Å r.m.s.d. between individual examples) and solvent accessibility profiles (Whitelegg, 1998). This has allowed a modification of the published `kinked' rules for H3 loops, leading to an improved prediction. The analysis on which this improvement is based is described in more detail below.
For other non-canonical examples of L1, L2, L3, H1 and H2, further small but significant changes to the original AbM algorithm have been made. First, construction of these CDRs uses only the C-alpha to C-alpha database search, replacing the more complicated combined database/conformational search algorithm (CAMAL; Martin et al., 1989, 1991). This has been made possible because the number of antibody structures is now much greater than when the CAMAL algorithm was introduced; at that time, a conformational search procedure was found to be essential to saturate the available CDR conformational space. A series of benchmarks (see Website) using the database method has revealed no deterioration in the accuracy of these non-canonical CDRs, while considerable savings in computer time are achieved. Second, the stand-alone implementation of the VFF (Dauber-Osguthorpe et al., 1988) is used to screen the energies of the conformations generated, rather than the modified Eureka implementation used in AbM.
Modified algorithm for H3 loops
H3 loops predicted to belong to the `kinked' family on the basis of sequence rules are built using the modified algorithm presented below, using either CAMAL (loops of eight residues or longer) or a C-alpha to C-alpha database search (loops of seven residues or shorter). Loops falling outside the `kinked' definition are built using a further modification in which the full range of H3 loops is used to generate the database search constraints. For these loops, the final set of conformations is then screened by energy, rather than by the newer knowledge-based methods (see below).
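The routing described above can be summarized as a small decision function. This is only a minimal sketch: the function name, dictionary keys and string labels are illustrative and are not part of WAM, and the text does not state which search procedure is applied to non-kinked loops, so that entry is left unspecified.

```python
def choose_h3_route(loop_length, kinked):
    """Summarize how an H3 loop would be routed under the rules stated above.

    All names and labels are hypothetical; only the branching follows the text.
    """
    if loop_length < 1:
        raise ValueError("loop_length must be positive")
    if not kinked:
        # Outside the kinked definition: constraints are derived from the full
        # range of H3 loops and the final set is screened by energy.
        return {"search": None, "constraints": "all H3", "screen": "energy"}
    # Kinked loops: CAMAL for eight residues or longer, database search otherwise.
    search = "CAMAL" if loop_length >= 8 else "C-alpha database search"
    return {
        "search": search,
        "constraints": "kinked H3",
        "screen": "accessibility, r.m.s.d. or VFF",
    }


if __name__ == "__main__":
    for length, kinked in [(7, True), (8, True), (10, False)]:
        print(length, kinked, choose_h3_route(length, kinked))
```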
Database search.
The C-alpha to C-alpha database search (Martin et al., 1989) is performed, with two modifications. First, only the set of `kinked' H3 crystal structures is used to derive the distance constraints, in order to bias the database hits in favour of `kinked' structures. Second, the database hits are subjected to 45 rounds of steepest-descent minimization after grafting on to the framework. Minimization, not included in AbM, has been introduced to avoid the situation where some low r.m.s.d. database loops are rejected because they sit above an energy threshold after construction by the CONGEN procedure.
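The idea of deriving distance constraints from a restricted set of reference loops can be illustrated as follows. This is a sketch under stated assumptions, not the published search: the function names, the `tolerance` parameter and the use of simple minimum/maximum ranges per residue pair are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def ca_distance_ranges(reference_loops, tolerance=0.0):
    """Return {(i, j): (low, high)} C-alpha distance ranges for each residue pair.

    reference_loops: list of loops, each a list of (x, y, z) C-alpha coordinates;
    all loops must have the same length (e.g. kinked H3 loops of one length).
    tolerance: hypothetical widening of each range, in the coordinate units.
    """
    n = len(reference_loops[0])
    if any(len(loop) != n for loop in reference_loops):
        raise ValueError("reference loops must all have the same length")
    ranges = {}
    for i, j in combinations(range(n), 2):
        dists = [math.dist(loop[i], loop[j]) for loop in reference_loops]
        ranges[(i, j)] = (min(dists) - tolerance, max(dists) + tolerance)
    return ranges


def satisfies_constraints(candidate, ranges):
    """True if every C-alpha pair distance in the candidate lies within range."""
    return all(
        low <= math.dist(candidate[i], candidate[j]) <= high
        for (i, j), (low, high) in ranges.items()
    )
```

Restricting `reference_loops` to kinked structures narrows the ranges, which is what biases the hits towards kinked conformations.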
Conformational search and energy minimization.
The CONGEN (Bruccoleri and Karplus, 1987) and chain closure (Go and Scheraga, 1970) algorithms are used (as in AbM) to rebuild the central five-residue segment of each database loop after its excision. However, the definition of `central' is not fixed. For each candidate, the rebuild region is moved to contain those residues with the highest solvent exposure; for example, investigations of known `kinked' H3 structures have revealed that the most exposed regions lie towards the N-terminal regions of these loops.
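Moving the rebuild region to the most exposed residues can be pictured as a sliding-window search. The sketch below assumes that exposure is summarized as one relative accessibility value per residue and that the window with the largest summed accessibility is chosen; both the function name and this scoring choice are assumptions, and the values in the usage example are invented.

```python
def most_exposed_window(accessibility, width=5):
    """Return (start, end) of the contiguous window with the highest summed exposure.

    accessibility: per-residue relative accessibility values along the loop.
    width: number of residues to rebuild (five in the procedure described above).
    The end index is exclusive; ties are resolved in favour of the earliest window.
    """
    if len(accessibility) < width:
        raise ValueError("loop is shorter than the rebuild window")
    starts = range(len(accessibility) - width + 1)
    best = max(starts, key=lambda s: sum(accessibility[s:s + width]))
    return best, best + width


if __name__ == "__main__":
    # Hypothetical accessibilities (%) for an eight-residue loop.
    profile = [60, 70, 55, 40, 35, 10, 5, 20]
    print(most_exposed_window(profile))
```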
A preliminary set of side chain conformations is built, using the CONGEN iterative algorithm, to enable an initial screen of loop candidates to be carried out; previous work (Whitelegg, 1998) has shown that building the final side chain conformations on loops before they have been fully minimized can lead to steric clashes.
Each loop conformation is then subjected to 15 rounds of steepest-descent minimization by VFF to relieve any residual high energy in the loops and the conformations are clustered to remove duplicates using a modification of the original AbM procedure.
Clustering. In the first step, all conformations within a starting value, typically 0.5 Å r.m.s.d., of the first conformation are `clustered' with the first conformation; the first conformation to be excluded from this set forms the basis of the second cluster, where all conformations within 0.5 Å r.m.s.d. of it are clustered with that conformation, and so on. Then the median conformation of each initial cluster is calculated and clusters are merged if their median conformations (as opposed to an arbitrary first conformation of the cluster, as in AbM) are within the clustering r.m.s.d./torsion resolution (a step value higher than the previous stage). This is repeated until a target number of clusters, dependent on the initial number of conformations, is produced. Where structures are modelled using the database search only, a clustering process is already included. However, an energy screen is performed so that only conformations within 50 kcal of the lowest energy of the set are retained.
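The two-stage clustering and the energy window can be sketched as below. Several points are assumptions made for illustration and are not taken from WAM: conformations are compared by r.m.s.d. without refitting (i.e. they are taken to share the framework coordinate frame), the `median' conformation is approximated by the cluster member with the smallest summed r.m.s.d. to the other members, and the cutoff is raised by a fixed `step` at each merging round.

```python
import math


def rmsd(a, b):
    """r.m.s.d. between two equal-length lists of (x, y, z) tuples, no refitting."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def leader_clusters(conformations, cutoff=0.5):
    """Group indices: each cluster is seeded by the first conformation not yet assigned."""
    remaining = list(range(len(conformations)))
    clusters = []
    while remaining:
        seed = remaining[0]
        members = [k for k in remaining
                   if rmsd(conformations[seed], conformations[k]) <= cutoff]
        clusters.append(members)
        remaining = [k for k in remaining if k not in members]
    return clusters


def medoid(conformations, members):
    """Stand-in for the median conformation: member closest to all the others."""
    return min(members, key=lambda a: sum(
        rmsd(conformations[a], conformations[b]) for b in members))


def merge_until(conformations, clusters, target, cutoff=0.5, step=0.25):
    """Merge clusters whose representatives are close, raising the cutoff each round."""
    if target < 1:
        raise ValueError("target must be at least 1")
    while len(clusters) > target:
        cutoff += step
        reps = [conformations[medoid(conformations, c)] for c in clusters]
        groups = leader_clusters(reps, cutoff)
        clusters = [sorted(m for k in g for m in clusters[k]) for g in groups]
    return clusters


def within_energy_window(energies, window=50.0):
    """Indices of conformations within `window` of the lowest energy in the set."""
    lowest = min(energies)
    return [k for k, e in enumerate(energies) if e - lowest <= window]
```

Because the merging round reuses the same leader procedure on the cluster representatives, the only difference between the two stages in this sketch is what is compared (individual conformations versus representatives) and the cutoff used.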
Final screening. Three methods are used to screen the clustered sets of conformations:
Accessibility profiles.
It has been observed (Whitelegg, 1998) among `kinked' antibody H3 crystal structures ranging in length from 7 to 14 residues that certain residues are frequently exposed (>30% relative accessibility), whereas others are persistently buried (<30% relative accessibility). Using this accessibility `pattern', a screen for the conformations generated has been devised. First, side chains are built on each clustered conformation (see below). Each subsequent conformation is scored based on the amount that the accessibility of each key residue deviates from the mean observed accessibility in crystal structures, divided by the standard deviation as a measure of the certainty of the observed accessibilities:
[Equation: accessibility score for a conformation; not recoverable from the source image]
The accessibility screen is an effective test for a `kink' (Shirai et al., 1996), since the features of the `kink' include a buried H3:C-2 (C-terminus minus 2) residue and an exposed H3:C-terminal residue, but it also conveys the additional information of the solvent exposure of the central part of the loop and whether or not the N-terminal residue is buried.
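A scoring scheme of the kind described in words above (deviation from the mean observed accessibility, divided by the standard deviation) might look like the following sketch. The exact published form of the score is not reproduced here; summing absolute deviations over the key residues is an assumption, as are the function names and the data layout.

```python
def accessibility_score(observed, profile):
    """Sum of |observed - mean| / sd over key residues; lower means a closer match.

    observed: {position: relative accessibility (%)} for one model conformation.
    profile: {position: (mean, sd)} derived from crystal structures (hypothetical layout).
    """
    total = 0.0
    for position, (mean, sd) in profile.items():
        if sd <= 0:
            raise ValueError("standard deviation must be positive")
        total += abs(observed[position] - mean) / sd
    return total


def select_by_accessibility(models, profile):
    """Index of the model whose key residues best match the profile."""
    return min(range(len(models)), key=lambda i: accessibility_score(models[i], profile))
```

Dividing by the standard deviation means that positions whose accessibility is tightly conserved in crystal structures contribute more to the score than positions that vary widely.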
r.m.s.d. screen.
This screen is based on similarity, as measured by r.m.s.d., to the known H3 structures having the same length as the modelling candidate. It has been observed that H3 loops with the same or similar lengths have similar conformations in many cases (described in detail on the Website; first seen by de la Paz et al., 1986). A similarity score for each of the clustered conformations is calculated:
[Equation: r.m.s.d. similarity score for a conformation; not recoverable from the source image]
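As an illustration of a length-matched similarity screen, the sketch below scores each candidate by its mean r.m.s.d. to the known loops of the same length and selects the lowest. This is not the published score: taking the mean, comparing coordinates without refitting (assuming all loops sit in a common framework frame) and the function names are all assumptions.

```python
import math


def rmsd(a, b):
    """r.m.s.d. between two equal-length lists of (x, y, z) tuples, no refitting."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def similarity_score(candidate, known_loops):
    """Mean r.m.s.d. of a candidate to known loops of the same length (lower is closer)."""
    same_length = [k for k in known_loops if len(k) == len(candidate)]
    if not same_length:
        raise ValueError("no known loops of this length")
    return sum(rmsd(candidate, k) for k in same_length) / len(same_length)


def select_by_rmsd(candidates, known_loops):
    """Index of the candidate closest, on average, to the known same-length loops."""
    return min(range(len(candidates)),
               key=lambda i: similarity_score(candidates[i], known_loops))
```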
H3 and canonical non-conserved side chain modelling. This step is required since certain side chain conformations in canonical non-H3 CDRs and all of the H3 side chains are non-conserved. H3 and canonical non-conserved side chains (as well as all side chains on any non-canonical non-H3 loops) are built simultaneously on the final selected backbone (for the r.m.s.d. and VFF backbone only screens) or the entire clustered set (for the accessibility and VFF side chain screens), using either the CONGEN iterative algorithm or the dead-end elimination algorithm.
The preferred method depends on the screen used. For the r.m.s.d. and VFF backbone screens (where backbone energy only is being used as the screen and therefore only one conformation needs to have side chains built), the dead-end elimination algorithm is used. For VFF with side chains or the accessibility screen, side chains need to be built on the entire set of clustered conformations. Therefore, CONGEN is used in preference to the dead-end elimination algorithm since, although slightly less accurate and less theoretically sound, it is better able to deal with the processing of multiple conformations in a timely fashion.
Table I shows the results from modelling 19 test cases of known crystal structure using the three possible methods available in WAM to screen the final conformations.
It can be seen that the energy-based VFF method is not particularly discriminating, selecting a conformation of 2.0 Å or less in only nine out of 19 examples. This was the screening method used in AbM and, although only four of the 19 antibodies modelled here were also modelled using the AbM package, the results show essentially the same trend. By contrast, while overall the accessibility screen does little better than VFF, with a conformation of 2.0 Å or less in 10 out of 19 examples, it is notably effective at short H3 lengths (nine residues or fewer), selecting a conformation of 2.0 Å or less in 10 out of 11 examples. This is because for shorter loops, a greater percentage of the loop consists of residues with conserved accessibility values, whereas for longer loops, there are generally more apex residues with variable accessibility between individual examples.
The most successful screen is the r.m.s.d. screen. This method performs well, selecting a conformation of 2.0 Å r.m.s.d. or less (measured on N, Cα, C and Cβ atoms) in 16 out of 19 examples. On the whole, the r.m.s.d. screen performs better than merely selecting the closest crystal structure loop of that length by sequence similarity using the Dayhoff mutation matrix. Those structures which perform poorly using the r.m.s.d. screen (e.g. 1kem and 1kb5) are examples of structures which deviate considerably from the average conformation for that length, for sequence-dependent steric reasons (see Website).
Buried side chains are also modelled well using a combination of the r.m.s.d. screen and the dead-end elimination method for side chain construction, with a backbone plus buried side chain r.m.s.d. of 2.0 Å or less for non-H3 loops in almost all cases (of the two exceptions, CDR-H2 in 1kb5 is a non-canonical loop with a backbone r.m.s.d. of 2.2 Å, while for CDR-L3 in 2fbj the value is entirely due to a poorly placed tryptophan which is only semi-buried and is close to a number of exposed side chains with high r.m.s.d.). The r.m.s.d. values for CDR-H3 mirror those for backbone only, in that well-modelled backbones lead to well-modelled side chains whereas mild inaccuracies in the backbone are exacerbated in side chain construction (e.g. r.m.s.d. values of 1.7 and 2.7 Å for backbone only and backbone plus side chains, respectively, for the CDR-H3 of 2jel). In all these instances the poorly modelled side chains correspond to regions of the loop where the backbone deviation is the greatest.
Conclusion
WAM offers an improved algorithm for CDR-H3 modelling. The results presented here show that the knowledge-based approaches applied have been more effective than the original energy-based approaches. For shorter H3 loops of up to nine residues, accurate models (1.7 Å r.m.s.d. or better for nine out of 11 examples) can now be constructed, while for loops of 10–12 residues a slight increase in the range of r.m.s.d. values is seen (1.3–2.7 Å r.m.s.d.). While the accessibility and the r.m.s.d. methods provide equivalent accuracies for kinked loops of up to nine residues, the accessibility score method is preferred since it is better able to screen those antibodies containing outlying H3 conformations. By contrast, the r.m.s.d. score is the method of choice for H3 loops of 10–12 residues, where fewer residues have a well-defined accessibility profile.
There are some remaining problems to be solved. First, for loops longer than 12 residues there are too few database examples for either knowledge-based method to show significant improvement over an energetic screen such as VFF. Furthermore, the apices of the longer H3 loops vary between the known structures (perhaps required for conformational change during binding), reducing the inherent advantages of a consensus, knowledge-based approach. For these cases, methods that explicitly take account of flexibility (such as molecular dynamics simulation) will need to be developed. Second, those kinked H3 loops of atypical conformation (such as 1kem and 1kb5 above) cannot be constructed accurately by the r.m.s.d. screen. Until such time as examples of similar conformation appear in the PDB, allowing sequence-structure relationships to be devised (this also applies to non-kinked H3s for which there are only a small number known at present), it will only be possible to have moderate confidence in the models produced. Having said that, our own work (see the Website) and that of Shirai's group (Shirai et al., 1999) have allowed the definition of some loose rules for the kinked outliers.
The other remaining problem relates to the small number of non-canonical CDRs within the L1–L3 and H1, H2 sets. Studies by Chothia and co-workers (Al-Lazikani et al., 1997) and our own examination of the newer structures deposited in the Protein Data Bank suggest that most of the exceptions will probably turn out to be members of new canonical classes.
Appendix. Worked example |
---|
Framework and canonicals
The first stage is to determine:
For this example, the most homologous light chain framework is 1dba, whereas for the heavy chain it is 1cgs.
The canonicals are as shown.
Thus, the light chain framework is built using that of 1dba, resequencing residues using maximum overlap if necessary, and the heavy chain framework is built using that of 1cgs. The two chains are combined into a single model by fitting on an average β-barrel generated from the known Fv structures. The frameworks of loops L1 to H2 are deleted and rebuilt using the canonicals from 1mlb, 1fgn, 1vfa, 2fbj and 1igt, respectively. Finally, the conformationally conserved canonical side chains (N.R.J.Whitelegg and A.R.Rees, unpublished results) are built, using maximum-overlap resequencing.
The H3 loop: construction
The eight-residue H3 loop in this instance is preceded by Arg and features Asp at the residue preceding the C-terminal, specifying a kinked conformation (Shirai et al., 1996). Therefore, CAMAL (the combined database/conformational search) is used, in this instance with C-alpha to C-alpha distance constraints derived from kinked H3 structures only. For this example, the database search generated 151 hits, while after rebuilding the apex using CONGEN, a total of 14 108 conformations were produced. On clustering this was reduced to 69.
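The two sequence features named in this example can be checked with a few lines of code. The sketch below is deliberately limited to those two features (Arg immediately preceding the loop and Asp at the residue preceding the C-terminal residue); it does not implement the full published rule set, and the function name and the test sequence are hypothetical.

```python
def has_kink_features(preceding_residue, h3_sequence):
    """Check the two kink-associated features mentioned in the worked example.

    preceding_residue: one-letter code of the residue immediately before H3.
    h3_sequence: one-letter sequence of the H3 loop, N- to C-terminal.
    """
    if len(h3_sequence) < 2:
        raise ValueError("H3 sequence must contain at least two residues")
    arg_before_loop = preceding_residue.upper() == "R"
    asp_before_c_terminal = h3_sequence[-2].upper() == "D"
    return arg_before_loop and asp_before_c_terminal


if __name__ == "__main__":
    # Invented eight-residue sequence, used only to exercise the function.
    print(has_kink_features("R", "GSYYAMDY"))
```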
The H3 loop: final screen
This is an eight-residue kinked CDR and therefore the accessibility screen is the method of choice, as it is a relatively short loop and the known structures have a well-defined accessibility profile. Since side chains are required to be in place for this screen, side chains need to be built on all the clustered conformations. As CONGEN is a quicker method than dead-end elimination, it is the default algorithm here, although the user may alter this if a slight time penalty is acceptable in order to achieve a more theoretically sound model. (Note that if the r.m.s.d. screen were used, the screening would require only the backbone conformation and therefore only the final backbone, selected by the screen, would require side chain construction. In that case, dead-end elimination would be the default method.)
All non-conserved side chains (H3 and canonical CDRs) are built simultaneously using CONGEN, which iteratively places side chains at each residue until an acceptable global minimum is reached according to energy convergence. After side chain construction each conformation is energy-minimized (by steepest descent, using VFF) to relieve steric clashes and the accessibility screen performed on each model to select the final conformation, which is that most closely matching the typical accessibility profile of eight-residue kinked H3s.
The final model is then made available as a PDB file for visualization by any standard graphics software package.
References |
---|
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.
Bruccoleri,R.E. and Karplus,M. (1987) Biopolymers, 26, 137–168.
Chothia,C. and Lesk,A.M. (1987) J. Mol. Biol., 196, 901–917.
Chothia,C. et al. (1989) Nature, 342, 877–883.
Chothia,C., Lesk,A.M., Gherardi,E., Tomlinson,I.M., Walter,G., Marks,J.D., Llewellyn,M.B. and Winter,G. (1992) J. Mol. Biol., 227, 799–817.
Darsley,M.J. and Rees,A.R. (1985) EMBO J., 4, 383–392.
Dauber-Osguthorpe,P., Roberts,V.A., Osguthorpe,D.J., Wolff,J., Genest,M. and Hagler,A.T. (1988) Proteins, 4, 31–47.
de la Paz,P., Sutton,B.J., Darsley,M.J. and Rees,A.R. (1986) EMBO J., 5, 415–425.
De Maeyer,M., Desmet,J. and Lasters,I. (1997) Fold. Des., 2, 53–66.
Feldmann,R.J., Potter,M. and Glaudemans,C.P.J. (1981) Mol. Immunol., 18, 683–698.
Foote,J. and Winter,G. (1992) J. Mol. Biol., 224, 487–499.
Go,N. and Scheraga,H.A. (1970) Macromolecules, 3, 178–187.
Kabat,E.A. and Wu,T.T. (1972) Proc. Natl Acad. Sci. USA, 69, 960–964.
Kabat,E.A., Wu,T.T. and Bilofsky,H. (1977) J. Biol. Chem., 252, 6609–6616.
Lasters,I., De Maeyer,M. and Desmet,J. (1995) Protein Eng., 8, 815–822.
Mandal,C., Kingery,B.D., Anchin,J.M., Subramaniam,S. and Linthicum,D.S. (1996) Nature Biotechnol., 14, 323.
Martin,A.C.R., Cheetham,J.C. and Rees,A.R. (1989) Proc. Natl Acad. Sci. USA, 86, 9268–9272.
Martin,A.C.R., Cheetham,J.C. and Rees,A.R. (1991) Methods Enzymol., 203, 121–153.
Morea,V., Tramontano,A., Rustici,M., Chothia,C. and Lesk,A.M. (1998) J. Mol. Biol., 275, 269–294.
Padlan,E.A. and Davies,D.R. (1975) Proc. Natl Acad. Sci. USA, 72, 819–823.
Padlan,E.A., Davies,D.R., Pecht,I., Givol,D. and Wright,C. (1976) Cold Spring Harbor Symp. Quant. Biol., 41, 627–637.
Pedersen,J.T., Searle,S.M.J., Henry,A.H. and Rees,A.R. (1992) Immunomethods, 1, 126.
Pedersen,J.T., Henry,A.H., Searle,S.M.J., Guild,B.C., Roguska,M. and Rees,A.R. (1994) J. Mol. Biol., 235, 939–973.
Ponder,J.W. and Richards,F.M. (1987) J. Mol. Biol., 193, 775–791.
Rees,A.R., Martin,A.C.R., Webster,D., Cheetham,J.C. and Roberts,S. (1990) Biophys. J., 57, A384.
Rees,A.R., Searle,S.M.J., Henry,A.H. and Pedersen,J.T. (1996) In Sternberg,M.J.E. (ed.), Protein Structure Prediction. Oxford University Press, Oxford, pp. 141–172.
Reichmann,L., Clark,M., Waldmann,H. and Winter,G. (1988) Nature, 332, 323–327.
Roguska,M.A. et al. (1994) Proc. Natl Acad. Sci. USA, 91, 969–973.
Roguska,M.A. et al. (1996) Protein Eng., 9, 895–904.
Shirai,H., Kidera,A. and Nakamura,H. (1996) FEBS Lett., 399, 1–8.
Shirai,H., Kidera,A. and Nakamura,H. (1999) FEBS Lett., 455, 188–197.
Stanford,J.M. and Wu,T.T. (1981) J. Theor. Biol., 88, 421–439.
Tomlinson,I.M., Cox,J.P.L., Gherardi,E., Lesk,A.M. and Chothia,C. (1995) EMBO J., 14, 4628–4638.
Whitelegg,N.R.J. (1998) PhD Thesis, University of Bath.
Wu,T.T. and Kabat,E.A. (1970) J. Exp. Med., 132, 211–250.
Received July 17, 2000; revised October 19, 2000; accepted October 23, 2000.