The building block folding model and the kinetics of protein folding

Chung-Jung Tsai1 and Ruth Nussinov1,2,3

1 Intramural Research Support Program, SAIC, Laboratory of Experimental and Computational Biology, NCI-Frederick, Bldg 469, Rm 151, Frederick, MD 21702, USA and 2 Sackler Institute of Molecular Medicine, Department of Human Genetics, Medical School, Tel Aviv University, Tel Aviv 69978, Israel


    Abstract
Here we show that, qualitatively, the building blocks folding model accounts for three-state versus two-state protein folding. It is also consistent with the faster versus slower folding rates of two-state proteins. Specifically, we illustrate that the sizes of the building blocks, their mode of association in the native structure, the number of ways in which they can combinatorially assemble, their population times and the way they are split in the iterative, step-by-step structural dissection that yields the anatomy trees explain a broad range of folding rates. We further show that proteins with similar general topologies may have different folding pathways, and hence different folding rates. On the other hand, the effect of mutations resembles that of changes in conditions, shifting the population times and hence the energy landscapes. Hence, together with the secondary structure type and the extent of local versus non-local interactions, a coherent, consistent rationale for folding kinetics can be outlined, in agreement with experimental results. Given the native structure of a protein, these guidelines enable a qualitative prediction of the folding kinetics. We further describe these in the context of the protein folding energy landscape. Quantitatively, in principle, the diffusion–collision model can be used for the building block association. However, the folding rates of the building blocks, and the traps in their formation and association, need to be considered.

Keywords: anatomy/building block/folding kinetics/folding model/protein folding/two-state versus three-state


    Introduction
A number of experimental and theoretical results have recently illustrated that protein folding rates are largely determined by the native protein topology (e.g. Riddle et al., 1997; Kim et al., 1998; Martinez et al., 1998; Perl et al., 1998; Plaxco et al., 1998; Alm and Baker, 1999a,b; Galzitskaya and Finkelstein, 1999; Muñoz and Eaton, 1999). Analysis of the general trends shown by proteins that fold with two-state kinetics (Jackson, 1998) has revealed that they are generally quite small (normally under 100 residues). Additionally, two other factors have been shown to play a role: the secondary structure type, with α-helices forming faster than β-strands, and the sequence separation between residues that are in spatial contact (Plaxco et al., 1998; Tsai et al., 1999a,b, 2000). Yet, not all α-helical proteins fold at a fast rate. Additionally, some questions have been raised with regard to the role played by topology and hence by sequential distance separation. It has been argued that proteins having similar topologies may fold with different rates. Further, external conditions that change the environment alter the folding rates (Chiti et al., 1999; Ionescu and Matthews, 1999). Similarly, rates may be changed by mutations without a change in the native topologies (Jackson, 1998; Ionescu and Matthews, 1999). Here we show that qualitatively, the building block folding model (Tsai et al., 1998, 1999a,b) can explain remarkably well kinetic data for the folding rates of proteins that fold largely with simple, two-state kinetics. Furthermore, it explains the folding rates of proteins that fold with two-state kinetics as compared with those of proteins that fold via three-state kinetics, with observed stable intermediates.

A building block is a highly populated, transient structural entity composed of a contiguous sequence fragment. Its stability derives from local interactions within the sequence segment. A building block may twist or open up during the hydrophobic collapse, changing its conformation. The analogy between folding and binding is consistent with viewing protein folding as a hierarchical process, involving a combinatorial assembly of such a set of conformationally fluctuating building blocks. Such an assembly leads to the formation of compact, independently folding hydrophobic units, to domains, and subsequently to entire monomers. Protein folding can be viewed as a process of intra-molecular recognition, the outcome of mutual stabilization between the highly populated local structural elements. Folding and binding are similar processes, governed by similar principles. In both, the driving force is the hydrophobic effect (Tsai et al., 1998, 1999a,b).

Two-state systems have been considered to constitute the simplest models of protein folding. In two-state protein folding, only the unfolded and the native states are populated forms of the protein. Numerous studies have been devoted to two-state systems (reviewed in Jackson, 1998; and in Alm and Baker, 1999a, and references therein). In two-state folding proteins, a linear behavior is observed in plots of the natural logarithm of the rate constants for unfolding and refolding as a function of the concentration of the denaturant (Jackson, 1998). Furthermore, in two-state folding proteins, the thermodynamic parameters calculated from kinetic data, namely the change in the free energy of unfolding in the absence of the denaturant and the constant related to the average fractional change in the extent to which the residues become exposed upon unfolding, correlate with those derived from equilibrium measurements. In contrast, proteins considered to fold via a three-state folding pathway exhibit one or more populated, stable intermediate states.
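The linear behavior described above can be written out explicitly. The following is a minimal sketch, assuming the commonly used linear form ln k_f = ln k_f(water) - m_f[D] and ln k_u = ln k_u(water) + m_u[D], where [D] is the denaturant concentration. The function names, parameter names and any numerical values passed to them are hypothetical and are not taken from Table I.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)

def ln_rate(ln_k_water, slope, denaturant):
    """ln k as a linear function of denaturant concentration (M)."""
    return ln_k_water + slope * denaturant

def ln_k_observed(ln_kf_water, m_f, ln_ku_water, m_u, denaturant):
    """ln of the observed relaxation rate, k_obs = k_f + k_u.

    Refolding slows with denaturant (slope -m_f); unfolding speeds up
    (slope +m_u). Plotted against [D], this gives a V-shaped curve.
    """
    k_f = math.exp(ln_rate(ln_kf_water, -m_f, denaturant))
    k_u = math.exp(ln_rate(ln_ku_water, +m_u, denaturant))
    return math.log(k_f + k_u)

def delta_g_unfolding(ln_kf_water, ln_ku_water, temperature=298.0):
    """Kinetic estimate of the unfolding free energy in water (kJ/mol)."""
    return -R * temperature * (ln_ku_water - ln_kf_water)
```

For a two-state folder, the value returned by `delta_g_unfolding` would be expected to agree with the free energy obtained from equilibrium denaturation; a systematic disagreement is one indication of a populated intermediate.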

Here we show that qualitatively, the building blocks folding model provides a coherent, consistent rationale for the folding kinetics of two- and three-state proteins, in agreement with experiments. Previously, among the important determinants of the folding rates, two factors have been shown to stand out: the secondary structure type and the relative contact order (Plaxco et al., 1998). The latter essentially measures the weight of sequential, local, versus non-local interactions in any given protein molecule. Baker and colleagues have recently shown a particularly nice correlation between the experimentally derived rates of protein folding and their relative contact order, for small single domain proteins (Plaxco et al., 1998; Alm and Baker, 1999a). However, neither the relative contact order nor the building block folding model accounts perfectly for the kinetics of protein folding. Both approaches have advantages and disadvantages. The relative contact order suffers from four major limitations: first, it does not account for the effect of mutations on the rate of folding; second, similarly, it does not account for the different folding kinetics observed under different conditions; third, it applies only to small, two-state folding proteins; and fourth, a residue-based model cannot put the folding process in the context of the energy landscape. A residue-based approach cannot capture the directed down-hill folding pathway, as it basically relates to a random search process. As we illustrate below, no such limitations exist in the building block folding model. The concept of the combinatorial assembly of the conformationally fluctuating set of building blocks, with their most highly populated conformations being those observed in the native state, accounts for these remarkably well. Hence, for example, mutations and different solvent conditions will result in changing the energy landscape, the outcome of shifts in the populations of the building blocks (Tsai et al., 1999c; Kumar et al., 2000). Thus, for example, the experimental observation of an intermediate state, upon changing the solvent conditions, in a protein previously cited as an example of two-state kinetics (Park et al., 1997) is consistent with the building block folding model. On the other hand, unlike the relative contact order, the building block model is unable to provide a quantitative assessment of the folding kinetics, and hence does not yield a numerical correspondence with experimentally derived folding rates.
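For comparison, the relative contact order discussed above is straightforward to compute. The sketch below assumes the usual definition attributed to Plaxco et al. (1998): the average sequence separation of contacting residue pairs, divided by the chain length. The contact lists are hypothetical toy data, and the choice of which residue pairs count as being in contact is left to the caller.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of contacts, normalized by chain length.

    contacts: iterable of (i, j) residue-index pairs in spatial contact.
    n_residues: total number of residues in the chain.
    """
    contacts = list(contacts)
    if not contacts or n_residues <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)

# Hypothetical contact lists for a 20-residue chain.
local_contacts = [(1, 4), (5, 8), (9, 12)]         # mostly sequential
non_local_contacts = [(1, 18), (3, 16), (5, 14)]   # mostly long-range

print(relative_contact_order(local_contacts, 20))
print(relative_contact_order(non_local_contacts, 20))
```

A chain dominated by local contacts gives a low value, and one dominated by non-local contacts gives a high value. Being residue-based, the measure is unchanged by a mutation or a change of solvent that leaves the native contacts intact, which is the first two of the limitations listed above.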

Table I provides a summary of the proteins used in this work. Fourteen proteins have been analyzed. Most of the proteins are two-state folders. They belong to the β-protein, β-sandwich domain and α/β folds. Table I lists the observed folding rates for each of the cases discussed below. The cases and values are taken from the review by Jackson (Jackson, 1998).


Table I. Observed folding rates and the potential origin of the kinetic traps
 

    Methods
We have recently devised a building blocks cutting algorithm (Tsai et al., 2000). Given a 3-D protein structure, it is iteratively cut to uncover its anatomy tree. Through a progressive top-down dissecting procedure, we produce domains, then hydrophobic folding units (HFU), and finally, at the end of this multi-level procedure, a set of conformationally fluctuating building blocks. The obtained anatomy tree illustrates the most likely folding pathway(s). The tree itself already points to the kinetics of folding, to whether the protein is a fast or a slow folder, to potential intermediates and to the probability of misfolding. According to the building blocks folding model, if we were to cut out building blocks from the polypeptide chain, the most populated conformation of a building block-peptide is likely to be similar to that of the building block when it forms part of the protein in its native state.

The scoring function

The scoring function is based on one that has been successfully applied to locate HFUs (Tsai and Nussinov, 1997). The HFU scoring function has four ingredients: compactness, hydrophobicity, degree of isolatedness and number of segments. However, since a building block has only one segment, only the first three ingredients are used in the current scoring function for locating building blocks. The scoring function we have designed is fragment-size-independent. The new function is a linear combination of the three elements. Each quantity is calculated as the deviation from the averaged value of known protein structures. The new scoring function, ScoreB.B., is:

Z, H and I are, respectively, the compactness, the hydrophobicity and the degree of isolatedness of a candidate fragment. For their calculation see Tsai et al. (Tsai et al., 2000). The corresponding arithmetic average, XAvg, and standard deviation, XDev, are determined from a non-redundant dataset of 930 representative single-chain proteins. Averages and standard deviations with superscript 1 are calculated with respect to fragment size; averages and standard deviations with superscript 2 are calculated as a function of the fraction of the fragment size to the whole protein.
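The structure described in words above, a linear combination of deviations from reference averages under two normalizations, can be sketched as follows. This is an illustration only and is not the published ScoreB.B.: the equal weights, the sign of each term and all names (`Reference`, `standardized`, `score_bb`) are assumptions introduced here, and the reference values in the usage example are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reference:
    avg: float  # plays the role of XAvg from the reference dataset
    dev: float  # plays the role of XDev from the reference dataset

def standardized(value, ref):
    """Deviation from the reference average in units of its standard deviation."""
    return (value - ref.avg) / ref.dev

def score_bb(values, refs_by_size, refs_by_fraction, signs=None):
    """Sum of standardized deviations of Z, H and I (assumed form).

    values: dict with keys "Z", "H", "I" for one candidate fragment.
    refs_by_size: references indexed by fragment size (superscript 1).
    refs_by_fraction: references indexed by fraction of the protein (superscript 2).
    signs: assumed sign of each contribution; defaults to +1 for all three.
    """
    signs = signs or {"Z": 1.0, "H": 1.0, "I": 1.0}
    total = 0.0
    for name in ("Z", "H", "I"):
        term = standardized(values[name], refs_by_size[name])
        term += standardized(values[name], refs_by_fraction[name])
        total += signs[name] * term
    return total

# Invented reference values, used for both normalizations for brevity.
refs = {"Z": Reference(1.5, 0.5), "H": Reference(0.6, 0.1), "I": Reference(0.4, 0.2)}
print(score_bb({"Z": 2.0, "H": 0.7, "I": 0.6}, refs, refs))
```

Expressing every quantity in units of its own standard deviation is what makes terms with different physical scales comparable, which is the point of the size-independent design described above.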

The size-independent stability form of the function assumes that fragments of different sizes have equal averaged conformational stability. However, the linearity of the hydrophobicity (H) and of the isolatedness (I) in the region of fragment sizes >150 residues suggests that the true fragmental stability should reflect this trend. Therefore, rather than using the statistical values, the HAvg and IAvg in the scoring function are calculated from a fitted straight line, reflecting the relative size-dependent stability.
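Replacing tabulated averages by a fitted straight line can be illustrated with an ordinary least-squares fit. The sketch below is an assumption about how such a fit might be done; the source does not state the fitting method, and the data points are invented.

```python
def fit_line(sizes, values):
    """Ordinary least-squares fit of values = slope * size + intercept."""
    n = len(sizes)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (size, value) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def average_from_fit(size, slope, intercept):
    """Size-dependent reference average read off the fitted line."""
    return slope * size + intercept

# Invented averages of H for three large fragment sizes.
slope, intercept = fit_line([150, 200, 250], [1.0, 2.0, 3.0])
print(average_from_fit(300, slope, intercept))
```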

The cutting procedure

Step 1: Locating a basket of building blocks (relatively stable contiguous fragments). For each protein 3-D structure, all fragment candidates are assigned a stability score. We collect a basket of building blocks by locating all local minima on the fragment map. We define a local minimum as the highest-scoring fragment in a specified local region. A local region can be specified in an absolute or a relative way. To locate compact substructures in a fragment map, Zehfus (Zehfus, 1993) used an absolute local quasi-circle region, with a distance of four residues from a candidate fragment. We have found that a relative local region works better for locating building blocks. Hence, we have adopted Zehfus' quasi-circle definition as the area of a local region in our algorithm. However, we took the radius of the circle as a variable, set to 7.5% of the size of a candidate fragment. To locate all building blocks, every fragment candidate is checked to see if it is the highest scoring within its local region. If a candidate is the highest scoring, we register it in the basket of building blocks.
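Step 1 can be sketched in a few lines. The version below assumes that the fragment map is stored as a dictionary from (position, size) to score and that the local region is a Euclidean circle in the (position, size) plane; both the data layout and the exact shape of the quasi-circle are assumptions made for illustration. Only the 7.5% relative radius is taken from the text.

```python
import math

def locate_building_blocks(fragment_map, radius_fraction=0.075):
    """Return fragments that score highest within their own local region.

    fragment_map: dict mapping (position, size) -> stability score.
    The radius of the local region scales with the candidate's size.
    Candidates that tie with a neighbor are both kept.
    """
    basket = []
    for (pos, size), score in fragment_map.items():
        radius = radius_fraction * size
        is_best = all(
            other_score <= score
            for (p, s), other_score in fragment_map.items()
            if (p, s) != (pos, size) and math.hypot(p - pos, s - size) <= radius
        )
        if is_best:
            basket.append((pos, size, score))
    return sorted(basket)

# Invented toy map: three nearby candidates and one isolated candidate.
toy_map = {(10, 40): 5.0, (11, 40): 4.0, (12, 41): 6.0, (50, 40): 1.0}
print(locate_building_blocks(toy_map))
```

With a relative radius, a long candidate must outscore a wider neighborhood than a short one, so small shifts of a large fragment are not registered as separate building blocks.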

Step 2: A recursive top-down splitting process. It is practically universally agreed that the folding process does not follow a single pathway. In constructing an anatomy tree, our goals are twofold: first, an anatomy tree should straightforwardly yield the most likely folding pathway(s); and second, it should identify the set of the most likely building blocks which, via a process of combinatorial assembly, form the final, native protein conformation. Consequently, the anatomy of the structure is organized as a tree which grows upside down. The root node is the starting node, and it constitutes the entire native protein. Each node corresponds to a contiguous fragment. A node can sprout multiple branches to create child nodes. Child nodes are created through a multi-splicing process. If a new node does not produce a child, it is an end node. The level of a node is a function of the number of steps needed to trace back to the root node. The tree-growth process stops when no new child nodes can be created. At the end, the set of end nodes constitutes the most likely building blocks and the tree organization depicts the most likely folding pathway. Our ability to generate such anatomy trees further illustrates that the ‘building block’ folding model leads to a hierarchical folding.
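The tree organization described in Step 2 maps onto a simple recursive data structure. The sketch below uses hypothetical names (`Node`, `end_nodes`, `level_of`) and counts the level as the number of steps back to the root, which is one possible reading of the definition above; the fragment boundaries are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    start: int  # first residue of the contiguous fragment
    end: int    # last residue of the contiguous fragment
    children: list = field(default_factory=list)

    def add_child(self, start, end):
        child = Node(start, end)
        self.children.append(child)
        return child

    def is_end_node(self):
        return not self.children

def end_nodes(node):
    """Collect the end nodes, i.e. the most likely building blocks."""
    if node.is_end_node():
        return [node]
    found = []
    for child in node.children:
        found.extend(end_nodes(child))
    return found

def level_of(root, target, level=0):
    """Number of steps needed to trace target back to the root node."""
    if root is target:
        return level
    for child in root.children:
        found = level_of(child, target, level + 1)
        if found is not None:
            return found
    return None

# Invented example: a 100-residue protein cut twice.
root = Node(1, 100)
first = root.add_child(1, 40)
second = root.add_child(41, 100)
third = second.add_child(41, 70)
fourth = second.add_child(71, 100)
print([(n.start, n.end) for n in end_nodes(root)])
```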

The multi-splicing procedure

Unlike other top-down binary splicing algorithms, our recursive top-down multi-splicing procedure does not limit the number of branches at any level. Starting from a node fragment and the basket of building blocks generated above, the search for multiple cuts proceeds as follows. We search the basket for a set of fragments that covers the entire node fragment. The search algorithm follows several rules. First, a short overlap between building blocks is permitted, with the overlapping segment being not more than seven residues. Second, if an unassigned segment is less than 15 residues, it is left unassigned. Otherwise, the segment is assigned to be a low-score building block, not listed in the original building block basket. A short unassigned segment may be a linker between two large building blocks. A long low-score fragment may also be a fragment linker or a building block that has opened up. Third, except for the root node, a node cannot have only one branch-child node. Fourth, a node becomes an end node if there are no two building blocks with scores above a threshold. Fifth, the sum of the first two fragment scores is used to rank all cut building block candidates.
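The first two rules can be expressed as a check on one candidate set of fragments. The sketch below is not the full search: it only validates the overlaps and labels the uncovered segments for a given candidate cover. The function names and labels are hypothetical; the limits of seven and 15 residues are those stated above.

```python
def label_gap(gap, min_block=15):
    """Short uncovered segments stay unassigned; long ones become low-score blocks."""
    length = gap[1] - gap[0] + 1
    return "unassigned" if length < min_block else "low-score building block"

def classify_cover(node, fragments, max_overlap=7, min_block=15):
    """Check one candidate cover of a node fragment.

    node: (start, end), inclusive residue numbers.
    fragments: list of (start, end) building blocks from the basket.
    Returns labeled segments in sequence order, or None if two
    fragments overlap by more than max_overlap residues.
    """
    start, end = node
    segments = []
    cursor = start  # first residue not yet covered
    for f_start, f_end in sorted(fragments):
        if cursor - f_start > max_overlap:
            return None  # overlap with the preceding fragments is too long
        if f_start > cursor:
            gap = (cursor, f_start - 1)
            segments.append((gap, label_gap(gap, min_block)))
        segments.append(((f_start, f_end), "building block"))
        cursor = max(cursor, f_end + 1)
    if cursor <= end:
        gap = (cursor, end)
        segments.append((gap, label_gap(gap, min_block)))
    return segments

# Invented example: a five-residue overlap and a 19-residue uncovered stretch.
print(classify_cover((1, 100), [(1, 40), (36, 60), (80, 100)]))
```

A full implementation would enumerate candidate covers from the basket, discard those for which this check returns None, apply the remaining rules and rank the survivors by score.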

Assembly of hydrophobic folding units

In addition, at each branching level, we obtain HFUs by combinatorially assembling the collection of building blocks. The validity of such a procedure follows directly from the building block folding model. Preliminary results indicate a significant improvement compared to our previous HFU cutting algorithm (Tsai and Nussinov, 1997). The detailed procedure will be reported elsewhere (Tsai and Nussinov, 2001).

The adenylate kinase example

Figure 1 illustrates an example of the anatomy tree of adenylate kinase (PDB code: 1aky; Bernstein et al., 1977), a complex, three-state folding protein. Figure 1A depicts the most likely folding pathway of the protein, marked on its fragment map. The blue horizontal lines are the local minima. The building blocks drawn in red are those which take part in the most likely folding pathway. Inspection of the figure illustrates that the building blocks assemble in multiple routes to finally give the native fold at the top. The complexity of the fold is immediately apparent from the non-binary nature of the (red) pathway. The detailed step-by-step micro-paths of adenylate kinase are shown in Figure 1B. The branches which sprout from a node together form that node. Each branch is a building block, and its beginning and end positions and its score are labeled. Additionally, the hydrophobic folding unit to which the building block is assigned in the combinatorial assembly process is noted (in parentheses, and at the top right-hand side of the figure). The combinatorial assembly process forming the HFU is carried out for each level of cutting. The non-contiguity of the building block fragments constituting the HFU is apparent, consistent with the complexity of the fold. Figure 1C depicts the step-by-step dissection into building blocks (top row) and assembly into the HFU (bottom row), in a graphical representation. Going through Figure 1B and C we can construct the most likely folding pathways of adenylate kinase, based on our algorithm and scoring function. On the other hand, by going through Figure 1A, and inspecting the fragment map along with the scores assigned to each of the building block minima (not shown), we can identify the alternate folding pathways.





Fig. 1. The anatomy tree of adenylate kinase. (A) The 2-D fragment map. The X and Y co-ordinates represent the fragment location and fragment size, respectively. Local minima in the fragment map are indicated by solid circles. The associated horizontal lines for these minima reflect the building block sizes. The building blocks, at each level, which participate in the most likely pathway are drawn in red, to distinguish them from the entire local minima collection, drawn in blue. The red lines connecting the red building blocks indicate the parent–child relationship. The blue building block fragments may participate in alternate folding pathways. (B) The detailed anatomy tree of adenylate kinase. Starting with the entire protein as a parent node (solid circle), the branches (open circles) are linked to their corresponding parent node. Next, each child becomes a new starting parent node. If a new parent node does not produce any children, it is an end node of the corresponding building block. The vertical bars are drawn to reflect the size of the end building block node. Each building block is labeled with its stability score, and a letter in parentheses. The letter indicates to which hydrophobic folding unit (A, B, C) the building block belongs. The HFU and their scores are noted at the top right-hand side of the figure. (C) The pictorial, step-wise cutting of adenylate kinase (top row) and the combinatorial assembly into folding units (bottom row). Four levels of building blocks cutting are illustrated for this protein. In the building block assignments, the following colors are assigned from the N- to the C-terminus, in this order: red, green, yellow, blue, cyan. In the HFU, the color assignments are in alphabetical order: green, red and yellow.

 
Inspection of Figure 1C illustrates the critical importance of the N-terminus building block (in red, at the second level of cutting, in the top row of the figure). This building block mediates the interactions of other building blocks. Hence, this building block acts as an unsliced intra-molecular chaperone (Ma et al., 2000). The green building block at this second level of cutting is also a hydrophobic folding unit (bottom row, last two drawings). This green building block might not be very stable; however, it is still likely to have a higher population time than all alternate conformations. Here we show that analysis of the building blocks and their mutually stabilizing associations into larger, more stable units rationalizes kinetic data on the rates of protein folding.

Non-native interactions among building blocks

Fast and slow folding rates are determined by the number and depths of the on-track traps. Bumps can be the outcome of three types of cases: (i) non-native interactions between building blocks; (ii) non-native interactions within the building blocks, with more such non-native interactions expected for larger building blocks than for shorter ones; and (iii) barriers which need to be crossed to enable association between native building blocks. The higher the barriers, the slower the rates. Since the building blocks cutting is based entirely on native structures, the potential occurrence of non-native interactions can only be deduced. Hence, here the fast and slow folding rates are distinguished only qualitatively. If no intermediate state (thermodynamic or kinetic) has been observed, the folding process of the protein encounters no significant deep well traps on its mainstream folding path. In general, in two-state proteins the folding rate is inversely proportional to the likelihood of non-native interactions. Building blocks with larger sizes are expected to have more non-native interactions during their folding, and to spend more time reaching their native state. Sequentially interacting building blocks are expected to be much more successful in avoiding non-native interactions than non-sequential ones. By utilizing a faster kinetic probing technique (Roder and Shastry, 1999), some previously thermodynamically defined two-state proteins have been reported as possessing kinetically detectable intermediate structures (Park et al., 1997). The ability to straightforwardly interpret this kinetic observation lends further support to the building block folding model.

On the other hand, an observable intermediate state is caused by an unavoidable on-track trap on the protein folding paths. An intermediate is observed if the trap is deep enough on a time scale which enables it to be distinguished as a three-state, as compared to a two-state folding. The trap can be a non-native interaction within the building block, or alternatively, it can be a barrier-crossing in the association of the native building block conformations. The effect of a mutation might be to create strong non-native interactions. When a mutation reduces non-native interactions, the folding rate is increased. Examples are discussed below.
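The qualitative statement that higher barriers give slower rates can be made concrete with a simple exponential dependence. The sketch below assumes an Arrhenius-type form, rate proportional to exp(-barrier/RT), which is a generic textbook relation and not part of the building block cutting procedure; the barrier values in the usage example are invented.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)

def relative_rate(barrier_kj_per_mol, temperature=298.0):
    """Rate relative to a barrierless process, exp(-barrier / RT)."""
    return math.exp(-barrier_kj_per_mol / (R * temperature))

# Invented barrier heights, for illustration only.
for barrier in (0.0, 10.0, 20.0):
    print(barrier, relative_rate(barrier))
```

Because the dependence is exponential, a modest deepening of a trap, for example through a mutation that strengthens a non-native interaction, can change the rate by orders of magnitude, which is why such effects can move a protein between apparent two-state and three-state behavior.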

In the next section, we go through well-studied cases of proteins with two-state kinetics and faster or slower folding rates. We illustrate that proteins with similar general topologies may be cut differently, yielding different building block cuttings and different folding routes (Jackson, 1998; Ionescu and Matthews, 1999).


    Results
Two-state folding kinetics: correlation with the building blocks folding model

Figures 2–5 illustrate the results of the cutting into building blocks of a sample of proteins for which there are available kinetic data (Jackson, 1998). Below we describe these results, classifying them in terms of their folds. Cases which are not shown here can be viewed at our web site (http://protein3d.ncifcrf.gov/tsai/anatomy.html), where the cuttings into building blocks and the anatomy trees of the entire PDB are displayed.



Fig. 2. Three examples of two-state proteins, illustrating the qualitative correspondence between the building blocks cuttings and the folding kinetics. (A) The activation domain of procarboxypeptidase A2, ADAh2 (PDB code: 1PBA), a fast folding protein. The red and green building blocks in the third level of cutting (on the right-hand side of the figure) constitute one building block in the second level of cutting (all red, on the left-hand side of the figure). (B) The muscle acylphosphatase (PDB code: 1APS), a slow folding protein. The red and the green building blocks are generated as such directly at the second level of cutting. (C) The histidine-containing phosphocarrier protein (PDB code: 1HDN), a slow folding protein. Details are given in the text.

 


Fig. 3. The cuttings of two-state β-proteins. (A) The cold shock protein (PDB code: 1CSP). This protein is a fast folder. SH3 domains: (B) the src domain (1SRL), a relatively fast folding two-state protein. (C) PI3 kinase (1PKS), a slow two-state folder.

 


Fig. 4. The third level of cutting into building blocks of two-state, α/β-proteins. (A) The spliceosomal protein U1A (1URN), a fast folding protein. The third level of cutting is depicted here. (B) Ubiquitin (1UBQ), a fast folding protein. The figure depicts the results of the third level of the building blocks cutting. (C) FKBP12 (1FKB), a slow two-state folder, the third level of cutting. (D) CI2 (1COA), a relatively fast folding protein. (E) Villin (2VIK), a fast folding protein. The figure illustrates the third level of cutting.

 


Fig. 5. Cutting into building blocks of three-state folding proteins. (A) Che Y (5CHY) is a slow folding protein. The figure depicts its building blocks cutting at the third level. (B) Ribonuclease H (2RN2), a slow three-state folding protein.

 
β-Proteins. A cold shock protein (PDB code: 1CSP). Figure 3A illustrates the cuttings of this β-barrel protein, here taken from Bacillus subtilis. This is a two-state protein (Schindler et al., 1995), which folds at a very fast rate. There are two building blocks, with no possible mis-association between them. With no apparent potential non-native interactions that may take place as the protein folds, the building blocks folding model predicts fast folding kinetics.

SH3 domains. Here we illustrate two cases. The first, an src domain (1SRL), is a relatively fast folding two-state protein (Grantcharova and Baker, 1997). The second, PI3 kinase (1PKS), is a slow two-state folder (Guijarro et al., 1998). Figure 3B and C illustrate, respectively, the cuttings of these two proteins. 1SRL has two building blocks, which fold independently and associate. 1PKS is cut into three building blocks already at the second level of cutting. Further, the interaction between the building blocks is non-sequential, with the C-terminus building block (in yellow) mediating the interactions between the red (N-terminus) and its sequentially adjoining green building blocks. Mis-association of the building blocks, taking place during the combinatorial assembly process, would yield sequential, non-native interactions between the red and green building blocks. Such intermediates may get trapped, slowing the folding. This protein constitutes an interesting case study for simulations, which could conceivably capture this non-native conformer on a folding trajectory. Hence, here we have two proteins with rather similar β-barrel, SH3 domain topologies. One is a slow folder, the other illustrates fast folding kinetics. The building blocks dissections straightforwardly rationalize, qualitatively, this difference in folding rates. A third potential example, tendamistat (2AIT), has an S–S bond, and hence is not analyzed here.

β-Sandwich domains. Tenascin (1TEN) is a slow folding protein (Clarke et al., 1997). The building blocks cuttings illustrate three building blocks already in the second level of cutting. One of these is long (46 residues). While the interactions between the building blocks are sequential, during the combinatorial assembly of this β-sandwich protein the building blocks can switch places, introducing non-native contacts between the native building block conformations. In this protein there are a number of potential ways in which the building blocks can associate, affecting the folding rate.

α/β-Proteins. Spliceosomal protein U1A (1URN) is a two-state, fast folding protein (Silow and Oliveberg, 1997). Figure 4A illustrates the third level of cutting, into three building blocks. However, in the second level (not shown) this protein is cut into two building blocks, similar to the case of 1PBA, shown in Figure 2A, with the red and the green building blocks constituting one block. This suggests that in this sequentially folding protein, the red–green building block association is more important than other potential associations. Furthermore, a non-native green–yellow association would involve flipping out of the red building block, with a likely red–yellow collision. Hence, in the potential green–yellow versus green–red building block binding competition, the latter dominates, as observed both in the second to third level building blocks cutting anatomy and by inspection of the spatial arrangement of the building blocks in the third level. Figure 4B illustrates the third level of the building blocks cutting of wild-type ubiquitin (1UBQ), a fast folding protein (Khorasanizadeh et al., 1993). As in 1URN (and in 1PBA; Figure 2A), in the second level two building blocks are produced, with the red and green building blocks shown here constituting a single building block, illustrating the stronger interactions between these building block segments. On the other hand, in Hpr, the histidine-containing phosphocarrier protein (1HDN; Figure 2C), three building blocks are produced already at the second level of cutting. A closer inspection of Figure 2C reveals the reason. The spatial arrangement of the building blocks in 1HDN is such that the interactions between the red and the green building blocks are mediated by the yellow C-terminus building block fragment. Hence, in the absence of the yellow, during the combinatorial assembly non-native interactions between the red and green building block fragments can take place, slowing the rate of folding of this two-state protein. FKBP12 (1FKB) is also a slow two-state folder (Jackson, 1998). Figure 4C gives the third level of cutting. The red building block at the N-terminus leads to the green, to the yellow, to the blue and finally to a short unassigned region. Inspection of the figure illustrates that the interactions between the red and its sequentially connected green building block are mediated by the yellow building block, producing a non-sequentially folding protein. As in 1HDN, in the absence of the yellow and its attached unassigned region, non-native interactions between the red and the green would take place, slowing the folding rate. Consistently, in the second level of cutting, the red and the green building blocks are already separated (not shown).

The activation domain of procarboxypeptidase A2 (ADAh2, 1PBA; Figure 2A) is a fast folder (Villegas et al., 1995), whereas the muscle acylphosphatase (1APS; Figure 2B) illustrates slow folding kinetics (van Nuland et al., 1998b). As noted above, the building block cuttings of these proteins follow the same principles. In 1PBA a single building block at the N-terminus of the protein at the second level of cutting is split into two (red and green) building blocks in the third level. On the other hand, in 1APS three building blocks are produced already at the second level. Inspection of Figure 2B illustrates the reason: the red N-terminus building block fragment mediates the interactions between the green and the yellow C-terminus block, resulting in non-sequential interactions. Further, since the red is a long fragment, it may take a longer time to reach its native state. Consequently, if the red does not attain its native conformation, the relatively unstable and non-compact yellow might collapse onto the green, adopting an alternate conformation. Thus, this protein does not illustrate independent folding, and association, of its building block fragments. These results explain the observations made by Dobson and colleagues (Chiti et al., 1999), namely that up to 11% trifluoroethanol (TFE), the α-helix formation in the red building block is enhanced. Since the native conformation of this building block is critical, the increase in the percentage of alcohol increases the folding rate. However, beyond 11% the rate slows down, since in this case the β-strands might also adopt a non-native helical conformation. Further, in the absence of the alcohol, the yellow building block might interfere with the slower formation of the native conformation of the red building block fragment. Such a situation does not arise in 1PBA (Figure 2A).

Hence, as in the case of the two SH3 domain proteins above, although the topology of the two proteins is similar in overall shape, that is, α-helices stacked against a β-sheet, the details of the path the chain follows yield different building block fragment cuttings, rationalizing the difference in the observed kinetics.
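The distinction drawn above between sequential and non-sequential folders can be sketched in code. The following is a minimal illustration, not the authors' procedure: it assumes that each building block is named by a color, that `order` lists the blocks from N- to C-terminus, and that `contacts` records which pairs of blocks touch directly in the native structure. The function names and the example contact set are hypothetical. A sequence-adjacent pair without a direct contact is taken here to be one whose association is mediated by another block.

```python
def mediated_pairs(order, contacts):
    """Return sequence-adjacent block pairs that lack a direct native contact.

    order    -- building block names from N- to C-terminus
    contacts -- set of frozensets, each naming two blocks in direct contact
    """
    return [
        (a, b)
        for a, b in zip(order, order[1:])
        if frozenset((a, b)) not in contacts
    ]


def is_sequential_folder(order, contacts):
    """True if every sequence-adjacent pair of blocks is in direct contact."""
    return not mediated_pairs(order, contacts)


# Hypothetical three-block example: the first block contacts both others,
# but the second and third blocks (adjacent in sequence) do not touch.
example_order = ["red", "green", "yellow"]
example_contacts = {
    frozenset(("red", "green")),
    frozenset(("red", "yellow")),
}

print(mediated_pairs(example_order, example_contacts))
print(is_sequential_folder(example_order, example_contacts))
```

In this toy case the green and yellow blocks are reported as a mediated pair, which is the situation the text associates with a higher chance of non-native interactions when the mediating block is not yet formed.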

Figure 4D illustrates the cutting of CI2 (1COA; Jackson and Fersht, 1991). Two building blocks are observed, consistent with the fast folding kinetics of this protein. On the other hand, the length of the green building block suggests that it might take longer to reach its native conformation.

Figure 4E depicts the cutting of villin (2VIK), a fast folding protein (Choe et al., 1998), into building blocks. The figure illustrates the third level of cutting. In the second level of cutting (not shown), the red and green building blocks constitute a single building block fragment. The two observed hydrophobic cores (Choe et al., 1998) are within the building blocks. One hydrophobic core is within the green and the second within the yellow. The red building block enhances the green on one side of the hydrophobic core (which is the reason for the red and green forming a single building block in the second level of cutting), and the blue enhances both, through a continuation of the green into the yellow, on the upper side of the figure. Villin folds practically sequentially and has long helices and strands, which also contribute to its high folding rate.

Three-state folding kinetics: correlation with the building blocks folding model

Here we focus on two examples: Che Y and ribonuclease H. Che Y (5CHY) is a slow folding protein (Muñoz et al., 1994), despite being a sequential folder. Figure 5A depicts its third level of building block cutting. The protein is TIM-barrel like, with a β–α–β repeating unit. Its five building blocks are stable, form independently and are likely to have high population times. On the face of it, we would have expected this protein to be a fast folder. On the other hand, considerations of the density of states rationalize its slow folding kinetics. At, say, 65% native peptide bond formation, there are many possible combinations of native intra- and inter-building block interactions. At, say, 80% there are considerably fewer. Hence, as the population of the native peptide bonds increases (Muñoz and Eaton, 1999), the density of states of potential combinations (i.e. entropy) is reduced, without a sufficiently large decrease in the free energy to compensate for this reduction. This leads to a barrier in the association of the building blocks forming the native contacts. This effect is particularly strong for a sequentially folding protein, like Che Y. By correctly associating four building blocks, as compared to three building blocks, we decrease the density of states (Lopez-Hernandez and Serrano, 1996). It is instructive to compare this situation with that of villin, a fast folding protein (Choe et al., 1998). In villin, in associating more building blocks we also decrease the density of states, as in Che Y. However, since in villin there are two hydrophobic cores, the decrease in free energy is sufficient to compensate for the decrease in entropy, leading to its fast kinetics. Theoretical considerations suggest that in general, in the intermediate state there would be some non-native interactions, largely between the building blocks, rather than within them. Hence, in the intermediate state, in sequentially folding proteins, most of the interactions are native, possibly explaining the success of native contacts-based models (e.g. Muñoz and Eaton, 1999). Additionally, the native interactions that have already formed between the building blocks can involve different combinations of building blocks.
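The density-of-states argument can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not a model taken from this work: it counts whole building blocks rather than peptide bonds, takes the number of ways of having k of n blocks correctly associated as the binomial coefficient C(n, k), assigns every associated block the same stabilizing energy `energy_per_block`, and works in units where k_B T = 1. All names and parameter values are hypothetical.

```python
from math import comb, log


def free_energy_profile(n_blocks, energy_per_block, temperature=1.0):
    """Toy free energy G(k) = -energy_per_block * k - T * ln C(n, k).

    k runs from 0 (no blocks associated) to n_blocks (native state).
    The entropy term is the log of the number of ways to choose which
    k blocks are associated; units are chosen so that k_B = 1.
    """
    return [
        -energy_per_block * k - temperature * log(comb(n_blocks, k))
        for k in range(n_blocks + 1)
    ]


def barrier_height(profile):
    """Largest uphill climb relative to the lowest point reached so far."""
    lowest_so_far = profile[0]
    barrier = 0.0
    for g in profile[1:]:
        barrier = max(barrier, g - lowest_so_far)
        lowest_so_far = min(lowest_so_far, g)
    return barrier


# Five blocks: the number of combinations falls from 10 (k = 3) to 5 (k = 4).
weak = free_energy_profile(5, energy_per_block=1.0)    # entropy loss not compensated
strong = free_energy_profile(5, energy_per_block=2.0)  # entropy loss compensated

print(barrier_height(weak))    # ln 5 - 1, about 0.61, at the final association step
print(barrier_height(strong))  # 0.0, the profile is downhill throughout
```

With the weaker stabilization per block, the loss of combinatorial entropy in the last association steps outweighs the energy gained and a barrier appears; with the stronger stabilization the profile is downhill throughout. This mirrors, in a deliberately simplified form, the contrast drawn in the text between Che Y and villin.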

This situation is reminiscent of α-tryptophan synthase, which has a TIM-barrel fold (2WSY; Zitzewitz et al., 1999). Through stop-codon mutagenesis, Matthews and colleagues have generated a series of fragments with the same N-termini. Their studies have indicated that all eight fragments that they constructed were capable of forming secondary structures, and most fold co-operatively. Interestingly, however, the addition of the fourth, sixth and eighth β-strands leads to a distinct increase in structure, co-operativity and/or stability. Zitzewitz et al. propose that this indicates the modular assembly of the β–α–β structural element. In particular, all fragments that contained the first four β–α elements showed a co-operative unfolding transition at high concentrations of urea, with reduced stability as compared to the full length protein. This non-homogeneous increase in stability suggests the presence of an intermediate state. Hence, in both the α-tryptophan synthase and Che Y cases, there are intermediate state(s). In both, by omitting one building block the remaining building blocks would collapse to a more stable and compact conformation, with non-native interactions formed between the building blocks, with consequent barriers. We have recently reviewed the experimental and theoretical literature, providing evidence consistent with the proposition that the most populated conformation of a building block-peptide is likely to be similar to that of the building block when it forms part of the protein in its native state (Tsai and Nussinov, 2001; Tsai et al., 2001).

Figure 5B depicts the second level of building block cutting of ribonuclease H (2RN2), a slow three-state folding protein (Raschke and Marqusee, 1997). There are two domains in this protein, with the N-terminal red building block fragment mediating the interactions between the green, yellow and blue building blocks. In the absence of the red building block, non-native interactions are likely to take place between the sequentially connected green and yellow, and the yellow and blue building blocks. This case is reminiscent of dihydrofolate reductase (7DFR; Gegg et al., 1997), except that the red building block is considerably larger.


    Discussion
 
According to the building blocks folding model, the rate of protein folding depends on a number of criteria. First, the building block size: the larger the block, the slower the folding. Second, the number of building blocks in the protein. Third, their sequential/non-sequential mode of association in the native structure. Fourth, their pattern of cuttings in consecutive levels of the anatomy tree. Fifth, the population times of the building blocks. In general, the folding rate will be fast if: (i) the building blocks are relatively short and hence the number of their potential conformations is smaller; (ii) they fold independently of each other; and (iii) their association is sequential. Further, the folding rate may be expected to be high if the building blocks which are generated from a longer, contiguous fragment at a given level of cutting belong to a single building block in a previous level of cutting. On the other hand, the rate is expected to be slower if the two building blocks are already cut into two separate units at the earlier cutting level. The fact that they are split right at the outset of the cutting procedure indicates that their association is not very stable. In the first example here, the red and green building blocks in the third level of cutting (Figure 2A) constituted one building block in the second level of cutting. This protein (the activation domain of procarboxypeptidase A2, ADAh2; Villegas et al., 1995, PDB code: 1PBA) folds fast. In the second example, in Figure 2B, the red and the green building blocks are generated as such directly at the second level of cutting. This protein (PDB code: 1APS, muscle acylphosphatase; van Nuland et al., 1998b) folds slowly. Under such circumstances, there is a higher probability of non-native interactions between the building blocks. The folding rate is also expected to be slower if the interactions of two consecutive building blocks are mediated by another building block (in yellow), whose sequential position follows both (histidine-containing phosphocarrier protein, Figure 2C; van Nuland et al., 1998a, PDB code: 1HDN). These considerations are in addition to the secondary structure type, to the relative contact order (Plaxco et al., 1998) and to the protein size.
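The criterion about consecutive cutting levels can be expressed as a simple check on an anatomy tree. The sketch below assumes, for illustration only, that each building block is a contiguous residue range written as a `(start, end)` tuple and that each cutting level is a list of such ranges; the function names and the residue numbers are hypothetical and are not taken from any of the proteins discussed.

```python
def parent_block(block, previous_level):
    """Return the block of the previous level that contains `block`, or None."""
    start, end = block
    for p_start, p_end in previous_level:
        if p_start <= start and end <= p_end:
            return (p_start, p_end)
    return None


def split_late(block_a, block_b, previous_level):
    """True if both blocks belong to one building block at the previous level.

    In the qualitative picture above, such a pair is expected to associate
    stably; a pair already separated at the previous level is not.
    """
    parent_a = parent_block(block_a, previous_level)
    return parent_a is not None and parent_a == parent_block(block_b, previous_level)


# Hypothetical anatomy tree levels (residue ranges are made up).
level_2 = [(1, 40), (41, 80)]
level_3 = [(1, 18), (19, 40), (41, 80)]

print(split_late(level_3[0], level_3[1], level_2))  # True: one block at level 2
print(split_late(level_3[1], level_3[2], level_2))  # False: already separate at level 2
```

A check of this kind only encodes the fourth criterion; block size, population times and the mode of association would have to be considered alongside it.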

Qualitatively, the building blocks folding model can provide some guidelines to explain the available kinetic data for two-state and three-state folding proteins. Together with the relative contact order (Plaxco et al., 1998), and the different rates of formation of α-helices and of β-structures, the kinetics can be better understood.
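For reference, the relative contact order of Plaxco et al. (1998) is the average sequence separation between residues in native contact, normalized by the chain length: CO = (1/(L·N)) Σ ΔS_ij, where N is the number of contacts, L the number of residues and ΔS_ij the sequence separation of the contacting residues i and j. A minimal sketch follows; how contacts are defined (for example by a distance cutoff) is left to the caller, and the function name and example contact lists are illustrative assumptions.

```python
def relative_contact_order(contacts, n_residues):
    """Average sequence separation of native contacts divided by chain length.

    contacts   -- iterable of (i, j) residue index pairs in native contact
    n_residues -- total number of residues in the chain (L)
    """
    contacts = list(contacts)
    if not contacts or n_residues <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (len(contacts) * n_residues)


# Hypothetical 10-residue chains: mostly local versus mostly non-local contacts.
local_contacts = [(1, 4), (2, 5), (3, 6)]
nonlocal_contacts = [(1, 10), (2, 9), (3, 8)]

print(relative_contact_order(local_contacts, 10))
print(relative_contact_order(nonlocal_contacts, 10))
```

The chain with mostly local contacts gives the lower value. Because this measure is residue-based, it does not by itself distinguish how the contacts are distributed among building blocks.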

The folding rates are the outcome of several criteria relating to the building block sizes, their population times, their sequential/non-sequential arrangement in the native protein, and the number of ways they are likely to associate. Through utilization of the anatomy trees, we show that proteins with similar general topologies may have different folding pathways and kinetics. These depend on the details of the topologies and on the path the backbone actually follows. The fact that acylphosphatase and ADAh2 fold with different rates (Jackson, 1998; Chiti et al., 1999; Ionescu and Matthews, 1999) is not surprising. While overall their topologies are similar, the details differ, resulting in altered patterns of building block cuttings. In APS, the C-terminus is inserted between the red and green building blocks (Figure 2), which are already split at the second cutting step, unlike the situation in ADA. This suggests that when considering folding rates and topologies, we should consider not only the type of fold, such as α/β, α + β, β, etc., but also the details of the path the backbone follows.

In general, neither the relative contact order nor the building block cuttings are affected by mutations, since the details of the topologies are unchanged. Yet, mutations can affect the folding rates. Mutations can increase (or decrease) the population times of some conformations (e.g. α-helices or β-strands) with respect to others, shifting the equilibrium (Kumar et al., 2000). Mutations change the conditions in a manner similar to that observed on changing the external environment (Tsai et al., 1999c), such as pH, temperature, ionic strength or denaturing agents such as urea, either by lowering the barriers or by deepening the wells. Simulations might distinguish between these two mechanisms.
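The idea that a mutation shifts the equilibrium between conformations can be illustrated with Boltzmann-weighted populations. The sketch below is a generic statistical-mechanics illustration rather than a calculation from this work: it assumes a small set of conformations with assigned free energies, expressed in units of RT, and treats a mutation as a change in one of those free energies. The function name and the numerical values are hypothetical.

```python
from math import exp


def populations(free_energies, rt=1.0):
    """Equilibrium populations p_i = exp(-G_i / RT) / sum_j exp(-G_j / RT)."""
    weights = [exp(-g / rt) for g in free_energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Two hypothetical conformations of a building block, equally stable at first.
wild_type = populations([0.0, 0.0])

# A hypothetical mutation that stabilizes the first conformation by 1 RT.
mutant = populations([-1.0, 0.0])

print(wild_type)  # equal populations
print(mutant)     # the first conformation is now the more populated one
```

Deepening one well in this way raises the population of that conformation at the expense of the others, which is the sense in which mutations and changes in external conditions act alike in the discussion above.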

Current data suggest that evolution has not optimized protein sequences and structures for fast folding (Kim et al., 1998; Alm and Baker, 1999a; Plaxco et al., 2000). Since folding is generally on a time scale of seconds to minutes, in most cases reducing the folding times is not of prime biological importance. By lowering the barrier heights, molecular chaperones or intra-molecular chaperones (Ma et al., 2000) change the landscape, speeding up protein folding. Additionally, co-translational folding, particularly of sequentially folding proteins in eukaryotic systems, reduces the probability of misfolding and aggregation.

The building blocks model is based on native interactions, derived from native protein conformations. The model states that folding involves associations of building blocks that are mostly in their native conformations. These conformations might not be stable, but they are the most highly populated ones. On the other hand, the relative contact order is residue-based. While a residue-based combinatorial assembly deteriorates to a random search, a combinatorial assembly of building blocks implies considerations of population times, and hence implicitly indicates a ‘restriction’ in the conformational space search. The funnel shape of the energy landscape already by itself suggests that folding proceeds by a combination of building blocks, rather than through a combinatorial assembly of residues. Thus, considerations of building blocks, as compared to residues, simplify the landscape. The more stable the non-native associations, the deeper the traps and the higher their population times. Nevertheless, these traps also serve a purpose, in trapping the native building blocks conformations and shifting the equilibrium in their favor.

Despite the progress that has been made in the computational methodologies for predicting the folding rates, currently there is no method which can predict the folding rate accurately. With the exception of ab initio calculations, to date all methods are based on the native state (Alm and Baker, 1999b; Galzitskaya and Finkelstein, 1999; Muñoz and Eaton, 1999), without taking into account non-native interactions.


    Notes
 
3 To whom correspondence should be addressed. E-mail: ruthn@ncifcrf.gov


    Acknowledgments
 
We thank Drs Buyong Ma, Sandeep Kumar and, in particular, Jacob V. Maizel for many helpful discussions. The research of R. Nussinov in Israel has been supported in part by grant number 95-00208 from the BSF, by the Center of Excellence administered by the Israel Academy of Sciences, by the Magnet grant, by a Ministry of Science grant and by the Tel Aviv University Basic Research grants. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number NO1-CO-56000. The content of this publication does not necessarily reflect the view or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government. The publisher or recipient acknowledges right of the US Government to retain a non-exclusive, royalty-free license in and to any copyright covering the article.


    References
 
Alm,E. and Baker,D. (1999a) Curr. Opin. Struct. Biol., 9, 189–196.

Alm,E. and Baker,D. (1999b) Proc. Natl Acad. Sci. USA, 96, 11305–11310.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Chiti,F., Taddei,N., Webster,P., Hamada,D., Fiaschi,T., Ramponi,G. and Dobson,C.M. (1999) Nat. Struct. Biol., 6, 380–386.

Choe,S.E., Matsudaira,P.T., Osterhout,J., Wagner,G. and Shakhnovich,E.I. (1998) Biochemistry, 37, 14508–14518.

Clarke,J., Hamill,S.J. and Johnson,C.M. (1997) J. Mol. Biol., 270, 771–778.

Galzitskaya,O.V. and Finkelstein,A.V. (1999) Proc. Natl Acad. Sci. USA, 96, 11299–11304.

Gegg,C.V., Bowers,K.E. and Matthews,C.R. (1997) Protein Sci., 6, 1885–1892.

Grantcharova,V.P. and Baker,D. (1997) Biochemistry, 36, 15685–15689.

Guijarro,J.I., Morton,C.J., Plaxco,K.W., Campbell,I.D. and Dobson,C.M. (1998) J. Mol. Biol., 276, 657–667.

Ionescu,R.M. and Matthews,C.R. (1999) Nat. Struct. Biol., 6, 304–307.

Jackson,S.E. (1998) Fold. Des., 3, R81–R91.

Jackson,S.E. and Fersht,A.R. (1991) Biochemistry, 30, 10428–10435.

Khorasanizadeh,S., Peters,I.D., Butt,T.R. and Roder,H. (1993) Biochemistry, 32, 7054–7063.

Kim,D.E., Gu,H. and Baker,D. (1998) Proc. Natl Acad. Sci. USA, 95, 4982–4986.

Kumar,S., Ma,B., Tsai,C.J., Sinha,N. and Nussinov,R. (2000) Protein Sci., 9, 10–19.

Lopez-Hernandez,E. and Serrano,L. (1996) Fold. Des., 1, 43–55.

Ma,B., Tsai,C.J. and Nussinov,R. (2000) Protein Eng., 13, 617–627.

Martinez,J.C., Pisabarro,M.T. and Serrano,L. (1998) Nat. Struct. Biol., 5, 721–729.

Muñoz,V. and Eaton,W.A. (1999) Proc. Natl Acad. Sci. USA, 96, 11311–11316.

Muñoz,V., Lopez,E.M., Jager,M. and Serrano,L. (1994) Biochemistry, 33, 5858–5866.

Park,S.H., O'Neil,K.T. and Roder,H. (1997) Biochemistry, 36, 14277–14283.

Perl,D., Welker,C., Schindler,T., Schroder,K., Marahiel,M.A., Jaenicke,R. and Schmid,F.X. (1998) Nat. Struct. Biol., 5, 229–235.

Plaxco,K.W., Simons,K.T. and Baker,D. (1998) J. Mol. Biol., 277, 985–994.

Plaxco,K.W., Larson,S., Ruczinski,I., Riddle,D.S., Thayer,E.C., Buchwitz,B., Davidson,A.R. and Baker,D. (2000) J. Mol. Biol., 298, 303–312.

Raschke,T.M. and Marqusee,S. (1997) Nat. Struct. Biol., 4, 298–304.

Riddle,D.S., Santiago,J.V., Bray-Hall,S.T., Doshi,N., Grantcharova,V.P., Yi,Q. and Baker,D. (1997) Nat. Struct. Biol., 4, 805–809.

Roder,H. and Shastry,M.C.R. (1999) Curr. Opin. Struct. Biol., 9, 620–626.

Schindler,T., Herrler,M., Marahiel,M.A. and Schmid,F.X. (1995) Nat. Struct. Biol., 2, 663–673.

Silow,M. and Oliveberg,M. (1997) Biochemistry, 36, 7633–7636.

Tsai,C.J. and Nussinov,R. (1997) Protein Sci., 6, 24–42.

Tsai,C.J. and Nussinov,R. (2001) Cell Biochem. Biophys., 34, 209–235.

Tsai,C.J., Xu,D. and Nussinov,R. (1998) Fold. Des., 3, R71–R80.

Tsai,C.J., Kumar,S., Ma,B. and Nussinov,R. (1999a) Protein Sci., 8, 1181–1190.

Tsai,C.J., Ma,B. and Nussinov,R. (1999b) Proc. Natl Acad. Sci. USA, 96, 9970–9972.

Tsai,C.J., Maizel,J.V. and Nussinov,R. (1999c) Protein Sci., 7, 73–87.

Tsai,C.J., Maizel,J.V. and Nussinov,R. (2000) Proc. Natl Acad. Sci. USA, 97, 12038–12043.

Tsai,C.J., Ma,M., Kumar,S., Wolfson,H. and Nussinov,R. (2001) CRC Crit. Rev. Biochem. Mol. Biol., in press.

van Nuland,N.A.J. et al. (1998a) Biochemistry, 37, 622–637.

van Nuland,N.A.J. et al. (1998b) J. Mol. Biol., 283, 883–891.

Villegas,V. et al. (1995) Biochemistry, 34, 15105–15110.

Zehfus,M.H. (1993) Proteins, 16, 293–300.

Zitzewitz,J.A., Gualfetti,P.J., Perkons,I.A., Wasta,S.A. and Matthews,C.R. (1999) Protein Sci., 8, 1200–1209.

Received January 20, 2001; revised June 8, 2001; accepted July 18, 2001.