Max-Planck-Institut für Biochemie, Abteilung Strukturforschung, Am Klopferspitz, 82152 Martinsried (bei München), Germany
Abstract
Keywords: protein design/protein folding/protein topology/Ramachandran map/ubiquitin
Introduction
Rather than simulating the detailed torsional dynamics at all times, our algorithm focuses on the evolution of the torsional constraints of the chain, as determined by the basin (attractive paraboloid-like region) that each residue is visiting within its Ramachandran (φ, ψ) map (Fernández and Berry, 2000; Fernández, 2001). A residue must first find the basin that contains a specific (φ, ψ) value before it can reach that value. Our algorithm exploits this fact: folding is subordinate to a coarser process determined by interbasin hopping. The transition rate for this discretized process is modulated according to the extent of structural involvement of the residues at each time. The whole set of basin occupancies is specified by the so-called local topology matrix (LTM), whose time evolution describes the evolving torsional constraints that guide the folding process. Thus, the algorithm operates iteratively: random basin search with modulated rates, generation of the most probable conformation realizing the LTM and, based on this geometry, attribution of the new basin-transition rates needed to generate the next LTM. The basic operating premises and their experimental validation are given in Methods.
The ab initio treatment sketched above appears to be a suitable tool for identifying the residues that must be in a specific Ramachandran basin (local topology) for the chain to control or quench structural fluctuations. Such quenching is a clear signature that a nucleus has formed, beyond which structural development becomes an energetically downhill process (Krantz et al., 2000). Thus, the main thrust of this paper is to engineer a shortened sequence that retains the nucleus topology of the wild-type sequence while it is mutated, so that this topology can be realized by a concrete conformation representing a free energy minimum. This automated engineering procedure might be applicable to most, if not all, two-state folders for which a single folding transition state is experimentally discernible (Sosnick et al., 1996; Krantz et al., 2000) and computationally detectable as a sudden quenching of structural fluctuations (Fernández, 2001).
Methods
The basin-hopping rate attributed to this discretized process is modulated according to the extent of structural involvement of each residue at each time. In turn, the extent of structural involvement is evaluated in thermodynamic terms (and as such requires a semiempirical intramolecular effective potential). It is the gain in free energy that would result from the virtual move in which the given residue changes its Ramachandran basin. This thermodynamic change takes as reference state the lowest free energy structure associated with the previous set of Ramachandran basin assignments. The evaluation requires: (i) an effective intramolecular potential that treats the solvent implicitly. It therefore contains many-body correlations accounting for the environments that the chain itself creates as it folds onto itself. (ii) A computation of the microcanonical lake areas of the Ramachandran basins is needed to determine the destination basin of each residue that changes basin (such a computation is given in Fernández, 2001).
Once the hopping rate, or number of steps required to change basin, has been computed, the most probable target basin is the one with the largest lake area or microcanonical entropy, as indicated in Fernández (2001). The whole set of basin occupancies is specified by the LTM, whose time evolution provides a digital description of the evolving torsional constraints that guide the folding process. Thus, the algorithm operates iteratively: random basin search with modulated rates, generation of the most probable conformation realizing the LTM and, based on this geometry, attribution of the new basin-hopping rates needed to generate the next LTM. The basic operating premises and their experimental validation vis-à-vis existing mutational data on a specific protein are sketched below.
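The target-basin rule can be sketched as an argmax over basin lake areas. This is a minimal illustration, not the implementation of Fernández (2001). The basin labels and lake-area numbers are hypothetical placeholders. The function name `destination_basin` is likewise hypothetical. Excluding the residue's current basin is an assumption that follows from the residue changing basin.

```python
# Hypothetical sketch of the destination-basin rule: a residue that leaves its
# current basin moves to the remaining basin with the largest lake area
# (microcanonical entropy). Labels and areas are placeholders.

def destination_basin(current: str, lake_areas: dict[str, float]) -> str:
    """Return the basin with the largest lake area other than `current`."""
    candidates = {basin: area for basin, area in lake_areas.items() if basin != current}
    if not candidates:
        raise ValueError("no alternative basin available")
    return max(candidates, key=candidates.get)


# Illustrative lake areas (arbitrary units), not values from Fernández (2001).
areas = {"A": 0.31, "B": 0.42, "C": 0.19, "D": 0.08}
print(destination_basin("B", areas))  # A
print(destination_basin("A", areas))  # B
```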
The algorithm has been designed with two purposes in mind:
(i) Make considerably long timescales (milliseconds to seconds) accessible at the expense of some structural and temporal resolution, while retaining the inherent geometric constraints of the semiflexible peptide backbone.
(ii) Introduce a crude model for folding cooperativity by incorporating three-body correlations that account for conformation-dependent environments. Such correlations rescale the two-body interactions (pairwise energy contributions) depending on the proximity of a third body.
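The exact functional form of the three-body rescaling is not given here. The sketch below uses one simple hypothetical form to make the idea concrete. Each pairwise energy ε_ij is multiplied by (1 + α·n_ij), where n_ij counts third residues lying within a cutoff of the i–j midpoint. The names `rescaled_pair_energies`, `alpha` and `cutoff`, and their default values, are all illustrative assumptions.

```python
import math


def rescaled_pair_energies(coords, eps, cutoff=5.0, alpha=0.5):
    """Rescale pairwise energies eps[(i, j)] by nearby third bodies.

    Hypothetical form: E_ij = eps_ij * (1 + alpha * n_ij), where n_ij counts
    residues k (k != i, j) within `cutoff` of the midpoint of i and j.
    """
    energies = {}
    for (i, j), e in eps.items():
        mid = tuple((a + b) / 2 for a, b in zip(coords[i], coords[j]))
        n_ij = sum(
            1
            for k, c in enumerate(coords)
            if k not in (i, j) and math.dist(c, mid) <= cutoff
        )
        energies[(i, j)] = e * (1 + alpha * n_ij)
    return energies


coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 1.0, 0.0), (20.0, 0.0, 0.0)]
eps = {(0, 1): -1.0, (0, 3): -1.0}  # hypothetical two-body energies
print(rescaled_pair_energies(coords, eps))
```

With these coordinates, residue 2 sits next to the 0–1 pair, so that interaction is strengthened. The isolated 0–3 pair keeps its bare value.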
In view of this, the basic tenets of the model are:
(i) The backbone torsional motion is constrained by the basin of attraction in the Ramachandran map topography visited by each residue at a given time. Thus, local torsional constraints change only when residues perform interbasin hopping.
(ii) Interbasin hopping is slower than intrabasin exploration, and the search for a particular target (φ, ψ) region is contingent on the residue first finding the basin that contains this region. Thus, an efficient exploration of conformation space is achieved by adopting a discretized `modulo basin' representation of torsional states.
(iii) A state of the chain within this description is specified as a `word' of basins, assigning one basin to each residue. Such words, known as LTMs, must be translated into geometries before they can be interpreted in terms of standard structural motifs. The conformational realization of each LTM is described in Fernández (2001) and requires: (1) choosing (φ, ψ) points for each residue from within the basin assigned to it in the LTM; (2) optimizing the resulting conformations by minimizing a semiempirical intramolecular potential.
(iv) The LTM evolution is determined by interbasin transitions whose rates are modulated according to the extent of structural involvement of the residues at the time. Interbasin hopping rates decrease as a pattern compatible with a structural motif is suddenly recognized in the LTM and increase if patterns are dismantled due to the formation of out-of-consensus bubbles.
Thus, the algorithm consists of pattern-recognition-and-feedback iterations in which residues change their rates of interbasin hopping according to the information encoded in the structural pattern recognized in the latest LTM generated. To `read' any such pattern in the LTM, an optimized torsional coordinate representation is needed in order to identify new non-bonded interactions. However, the structural detail does not participate per se in the dynamics: it is only retrieved intermittently in order to assess the hopping rates of the less structurally constrained residues.
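The pattern-recognition-and-feedback loop can be illustrated with a toy model. Here an LTM is a string with one basin letter per residue. This is a minimal sketch under strong simplifications. The basin alphabet `BASINS` is hypothetical. The "motif" test (a run of at least four identical basins) is an invented stand-in for real structural pattern recognition. The two hopping probabilities are placeholders. Conformation generation and the energy-based rate assessment are omitted entirely. Because the mask is recomputed every step, residues in a dismantled motif automatically regain the faster rate.

```python
import random

BASINS = "ABCD"  # hypothetical basin alphabet


def motif_mask(ltm: str, min_run: int = 4) -> list[bool]:
    """Mark residues inside runs of >= min_run identical basins (toy 'motif')."""
    mask = [False] * len(ltm)
    start = 0
    for i in range(1, len(ltm) + 1):
        # Close the current run at the end of the word or at a basin change.
        if i == len(ltm) or ltm[i] != ltm[start]:
            if i - start >= min_run:
                for k in range(start, i):
                    mask[k] = True
            start = i
    return mask


def step(ltm: str, rng: random.Random, p_free: float = 0.2, p_locked: float = 0.01) -> str:
    """One iteration: each residue hops with a probability lowered inside motifs."""
    mask = motif_mask(ltm)
    out = list(ltm)
    for n, locked in enumerate(mask):
        p = p_locked if locked else p_free
        if rng.random() < p:
            # Hop to a different basin chosen at random (toy target rule).
            out[n] = rng.choice([b for b in BASINS if b != ltm[n]])
    return "".join(out)


rng = random.Random(0)
ltm = "AAAABCDABCCCC"
for _ in range(50):
    ltm = step(ltm, rng)
print(ltm)
```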
A quantity directly accessible from such simulations is L(t), the number of residues performing interbasin hopping at time t (Fernández, 2001). This quantity lets us estimate the extent of structural fluctuations at each time. It thereby characterizes the nucleus, through the LTM at the time t* that marks a sudden decrease in L(t). To validate this methodology, we show how to compute Fersht's Φ-values (Fersht, 2000) for the well-studied chymotrypsin inhibitor 2 (CI2; PDB accession number 1COA, N = 64; Jackson and Fersht, 1991; Muñoz and Eaton, 1999).
From the computationally accessible LTM(t) pattern for CI2 (compare with Figure 1), we obtain t* = 0.74 ms. We also determine the time τ (= 1.16 ms) at which the coarsely resolved structural fluctuations cease completely [L(t) = 0 for t > τ]. We then define the dynamic parameter F(n) = t(n)/τ. The F(n) plot for CI2 is displayed in Figure 6a. The thick-line abscissas in regions 14–17 and 47–48 mark where F(n) is closest to the critical value F* = t*/τ ≈ 0.64. This holds within the uncertainty in the determination of t*, represented by the band flanked by dashed lines.
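The F(n) bookkeeping is simple arithmetic, sketched below. Only t* = 0.74 ms and τ = 1.16 ms come from the text. The per-residue t(n) values and the uncertainty `dt_star` are hypothetical placeholders, not the CI2 data behind Figure 6a. The function names are illustrative.

```python
def f_values(t_n: dict[int, float], tau: float) -> dict[int, float]:
    """F(n) = t(n) / tau for each residue n."""
    return {n: t / tau for n, t in t_n.items()}


def near_critical(t_n: dict[int, float], t_star: float, tau: float, dt_star: float) -> list[int]:
    """Residues whose F(n) lies within F* +/- dt_star/tau, with F* = t*/tau."""
    f_star = t_star / tau
    half_width = dt_star / tau
    return sorted(n for n, f in f_values(t_n, tau).items() if abs(f - f_star) <= half_width)


T_STAR, TAU = 0.74, 1.16  # ms, CI2 values quoted in the text
print(round(T_STAR / TAU, 3))  # 0.638

# Hypothetical t(n) values (ms) and uncertainty; NOT the CI2 data of Figure 6a.
t_example = {14: 0.73, 15: 0.75, 30: 0.20, 47: 0.72}
print(near_critical(t_example, T_STAR, TAU, dt_star=0.03))  # [14, 15, 47]
```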
Results
The pattern [LTM = LTM(t), 0 < t < 7 ms] displayed in Figure 1 corresponds to the most reproducible successful run for mammalian ubiquitin (Ub, PDB accession number 1ubi), consisting of 7×10⁷ iterations (Sosnick et al., 1996; Krantz et al., 2000). The simulation was performed at T = 316 K, a temperature demanding cooperativity and concertedness for structure survival. This feature is apparent in Figure 1: periods of overall stasis are flanked by periods of extensive large-scale fluctuations engaging up to 79% of the chain. The target stationary LTM at 7 ms is virtually identical to that of the crystallographic structure displayed in the last, detached row of Figure 1. Only the local topology of a single internal residue differs. The local topologies of the residues at both the N- and C-termini are crystal artifacts. Moreover, the Hamming distance between the contact map associated with the LTM at 7 ms (Figure 2; Fernández, 2001) and that of the native fold is 1.02%, marking the success of the simulation. The contact maps are obtained directly from conformations realizing specific LTMs. The optimization of such conformations is governed by the semiempirical intramolecular potential given in Fernández (2001).
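A Hamming distance between contact maps counts the residue pairs on which the two maps disagree. The text does not define contacts or say how the percentage is normalized. The sketch below therefore makes three assumptions, all illustrative rather than the paper's definitions. A contact is a pair with coordinate distance ≤ 8 Å. Pairs must be at least three residues apart in sequence. The percentage is taken over all such eligible pairs.

```python
import math

Coord = tuple[float, float, float]


def contact_map(coords: list[Coord], cutoff: float = 8.0, min_sep: int = 3) -> set[tuple[int, int]]:
    """Pairs (i, j), j >= i + min_sep, whose distance is within `cutoff` (assumed Å)."""
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def hamming_percent(map_a: set, map_b: set, n: int, min_sep: int = 3) -> float:
    """Percentage of eligible residue pairs on which two contact maps disagree."""
    eligible = sum(max(0, n - i - min_sep) for i in range(n))
    return 100.0 * len(map_a ^ map_b) / eligible


# Toy five-residue chain folded back on itself.
folded = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]
native = contact_map(folded)
print(native)
print(hamming_percent(native, {(0, 3)}, n=5))
```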
The topology and kinetics of formation of the Ub nucleus, accessible from Figure 1, have eluded proton-exchange and other kinetic probes (Sosnick et al., 1996; Krantz et al., 2000). This suggests a backbone H-bond pattern vulnerable to solvation, typical of a β-sheet system (Schonbrunner et al., 1997). The contact map for the optimized conformation realizing the nucleus topology LTM(t*) is displayed in Figure 3. This nucleus involves three elements: the antiparallel (1–15) β-sheet, a displaced (1–5)(58–63) parallel β-sheet and part of the native context-dependent (22–33) helix. This motif cannot be inferred from local propensities alone (Lacroix et al., 1998). Comparison between Figures 2 and 3 reveals that the nucleus contains about 64% of the native structure.
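The text does not state how the 64% figure is computed. One natural reading is the fraction of native contacts that are already present in the nucleus contact map. The sketch below implements that reading. It is an assumption, and the contact sets are toy placeholders rather than the data of Figures 2 and 3.

```python
def native_fraction(nucleus_contacts: set, native_contacts: set) -> float:
    """Percentage of native contacts that are also present in the nucleus map."""
    if not native_contacts:
        raise ValueError("native contact set is empty")
    return 100.0 * len(nucleus_contacts & native_contacts) / len(native_contacts)


# Toy contact sets (residue-index pairs); not Ub data.
native = {(1, 12), (1, 13), (2, 14), (3, 60)}
nucleus = {(1, 12), (2, 14), (6, 29)}  # (6, 29) is non-native and is ignored
print(native_fraction(nucleus, native))  # 50.0
```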
(i) The topology of the chain remains invariant throughout the entire mutational pathway: the basin assigned to residue n is fixed according to the nth entry in the LTM(t*) given in Figure 1.
(ii) The nuclear residues that are most difficult to organize mutate at the slowest rate. The number of mutations, M(n), that nuclear residue n undergoes every 100 iterations of the topologically restricted Metropolis-like simulation is M(n) = 100 × [(t* − t(n))/t*], where the brackets denote the nearest integer.
(iii) For each new block of mutations, a single most probable chain conformation is generated by choosing (φ, ψ) values within the pre-assigned basins according to the PROCHECK distribution (Laskowski et al., 1993). Once a particular residue has been chosen to occupy a specific position in the chain, the (φ, ψ) coordinates at that location are chosen to be the most probable for that residue in the fixed basin.
(iv) A block of mutations is accepted if and only if its associated (φ, ψ) conformation is lower in energy than that resulting from the previous iteration. The semiempirical intramolecular potential adopted is the one successfully used to fold Ub (Fernández, 2001).
Starting with Ub*, and after 8.2×10⁷ topologically restricted Metropolis-like block mutational iterations, we obtain the sequence Ub** (Figure 4). The associated optimized conformation of Ub** is displayed in Figure 5b. It reproduces the native Ub fold motif with an r.m.s. displacement of 1.7 Å in the conserved nucleus region, and lies 7.43 kcal/mol below that of Ub*. The invariant regions in the transition Ub* → Ub** are the `belated' nuclear residues with t(n) closest to t* (indicated by dashed underlining in Figure 4). The most actively mutated regions are precisely those needed to recover the native parallel β-sheet lost in Ub*. These residues, responsible for the turning, docking and locking of the terminal β-sheet, are marked by solid underlining in Figure 4 and indicated with an asterisk in Figure 5.
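An r.m.s. displacement between two folds is normally computed after optimal rigid-body superposition. The text does not say which superposition method was used. The sketch below assumes the standard Kabsch (SVD) algorithm on matched coordinate sets, such as Cα atoms of the conserved nucleus region. The atom selection and the function name are illustrative assumptions.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Sanity check: a rotated and translated copy should superpose exactly.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q))  # ~0 up to floating-point rounding
```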
Essentially, the mutational process has produced a sequence able to sustain a stable fold while retaining the nucleus topology along the entire mutational pathway. In so doing, it has led to the reconstruction of the Ub native fold motif.
To reach a stable conformation, the mutational process produced a short, highly flexible turn engaging residues G39, G42 and W44. The latter acts as a hydrophobic seal for the turn. Since water is organized around W44, the adjacent H-bonds that buttress the new parallel and antiparallel β-sheets (Figure 5b) become effectively desolvated and thereby stabilized. This `locking of the β-strand docking' induced by W44 is probably a kinetic inhibitory mechanism (Fernández, 2001). Dismantling the buttressing H-bonds shown in Figure 5b would expose the backbone, and the water organized around W44 would have to be disrupted to solvate it. Thus, facing diminished competition from surrounding water molecules, the harnessing H-bonds are effectively protected.
This primitive flexible turn motif is detectable in immunoglobulin-binding proteins such as PDB.1igd (Derrick and Wigley, 1994), in which the region G43-V44-D45-G46-V47-W48 performs the docking and locking of the terminal β-strands.
The extensive mutation of Ub* leaves the first 33 residues qualitatively unchanged. It produces a new `turn-dock-lock' motif by increasing flexibility in the (38–43) region while protecting the recovered native motif. Significantly, the first 33 residues in Ub* and Ub** have 38% homology with the hyperthermophilic variant of the streptococcal protein G B1 domain (pdb.1gb4; Malakauskas and Mayo, 1998; Cregut and Serrano, 1999). This homology includes identity within the regions conserved between Ub* and Ub** (Figure 4). The rest of the Ub* sequence shows virtually no homology with pdb.1gb4.
On the other hand, the recovery of the native fold motif in Ub** (Spector et al., 1999) required forming a new `turn-dock-lock' motif. This demanded extensive mutation of the non-conserved (34–56) portion of the molecule. Significantly, Ub** is 92% homologous (modulo equivalent residues, such as L and I) to the hyperthermophilic protein (Figure 4), while the r.m.s. displacement between their respective folds is 2.02 Å. This suggests that the Ub** fold shown in Figure 5b is a more `primitive' design of the Ub nuclear motif. In it, the finely tuned, complex (38–60) Ub loop has been replaced with the highly flexible and harnessing (39–44) region.
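A homology percentage that counts equivalent residues as matches can be sketched as a per-position comparison of two pre-aligned, equal-length sequences. Only the L/I equivalence is named in the text. The grouping below contains just that pair, and any further groups would be assumptions. Alignment and gap handling are outside this sketch. The function name is hypothetical.

```python
EQUIVALENT = (frozenset("LI"),)  # only L/I is named in the text; extend as needed


def percent_similarity(a: str, b: str, groups=EQUIVALENT) -> float:
    """Percentage of aligned positions that are identical or in the same group."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, non-empty and equal in length")

    def same(x: str, y: str) -> bool:
        return x == y or any(x in g and y in g for g in groups)

    matches = sum(same(x, y) for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_similarity("MLKV", "MIKA"))  # 75.0 (M=M, L~I, K=K, V/A differ)
```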
Hence, a mutational pathway retaining the Ub basic topology and with implications for protein engineering (Cordes et al., 1999) has been elucidated using dynamical information on the wild-type folding history.
Discussion
Notes
Acknowledgments
References
Cordes,M.H., Walsh,N.P., McKnight,C.J. and Sauer,R.T. (1999) Science, 284, 325–328.
Cregut,D. and Serrano,L. (1999) Protein Sci., 8, 271–282.
Dahiyat,B.I. and Mayo,S.L. (1997) Science, 278, 82–87.
Derrick,J.P. and Wigley,D.B. (1994) J. Mol. Biol., 243, 906–918.
Fernández,A. (2001) J. Chem. Phys., 114, 2489–2502.
Fernández,A. and Berry,R.S. (2000) J. Chem. Phys., 112, 5212–5222.
Fernández,A., Colubri,A. and Berry,R.S. (2000) Proc. Natl Acad. Sci. USA, 97, 14062–14066.
Fersht,A. (2000) Proc. Natl Acad. Sci. USA, 97, 1525–1529.
Jackson,S.E. and Fersht,A.R. (1991) Biochemistry, 30, 10428–10435.
Krantz,B., Moran,L.B., Kentsis,A. and Sosnick,T.R. (2000) Nat. Struct. Biol., 7, 62–71.
Lacroix,E., Viguera,A.R. and Serrano,L. (1998) J. Mol. Biol., 284, 173–188.
Laskowski,R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) J. Appl. Crystallogr., 26, 283–291.
Malakauskas,S.M. and Mayo,S.L. (1998) Nat. Struct. Biol., 5, 470–475.
Muñoz,V. and Eaton,W.A. (1999) Proc. Natl Acad. Sci. USA, 96, 11311–11316.
Schonbrunner,N., Pappenberger,G., Scharf,M., Engels,J. and Kiefhaber,T. (1997) Biochemistry, 36, 9057–9065.
Sosnick,T.R., Mayne,L. and Englander,S.W. (1996) Proteins: Struct. Funct. Genet., 24, 413–426.
Spector,S., Young,P. and Raleigh,D.P. (1999) Biochemistry, 38, 4128–4136.
Voigt,C.A., Mayo,S.L., Arnold,F.H. and Wang,Z.G. (2001) Proc. Natl Acad. Sci. USA, 98, 3778–3783.
Received May 16, 2001; revised July 28, 2001; accepted September 25, 2001.