The importance of loop length in the folding of an immunoglobulin domain

Caroline F. Wright, John Christodoulou, Christopher M. Dobson and Jane Clarke1

University of Cambridge, Department of Chemistry, Lensfield Road, Cambridge CB2 1EW, UK

1 To whom correspondence should be addressed. E-mail: jc162@cam.ac.uk


    Abstract
 
Immunoglobulin (Ig)-like proteins have been shown to fold following formation of a nucleus comprising interactions between residues that are distant in the primary sequence. What role do the loops connecting these nucleus residues play? Here, the importance of loops connecting β-strands in different sheets of the Ig fold is investigated by insertion of five glycine residues into the B–C loop of an Ig domain from human titin, TI I27. The folding pathway of this elongated ‘pseudo wild-type’ TI I27 is probed using protein engineering and Φ-value analysis. The Φ-values calculated for mutants within the pseudo wild-type protein indicate that the folding nucleus in wild-type TI I27 is conserved, supporting the hypothesis that the inter-sheet loop is not critical to the formation of a long-range folding nucleus.

Keywords: immunoglobulin/loop/nucleation condensation/protein folding/titin


    Introduction
 
A recurring theme in protein folding studies is the nucleation-condensation mechanism of folding (Fersht, 1995; Mirny et al., 1998; Vendruscolo et al., 2001; Fersht and Daggett, 2002; Dobson, 2003). In this mechanism of folding, long-range tertiary interactions between residues far removed in sequence define the correct chain topology and lead to collapse of the tertiary structure around an extended nucleus. Another model of protein folding is provided by the hierarchical mechanism, in which short-range, local interactions dominate folding. In a hierarchical system, folding begins with the formation of marginally stable secondary structures local in sequence, such as β-hairpins or α-helices, which then interact to produce intermediates of increasing tertiary complexity (Baldwin, 1989).

The study of structurally homologous proteins allows us to investigate how different sequences can produce a similar fold, whether the folding mechanism is conserved and what features of the native state topology are important in driving folding (Clarke et al., 1999; Gunasekaran et al., 2001). Individual residues play differing roles in stabilizing the transition state on the folding landscape (Vendruscolo et al., 2001) and both folding rates and mechanisms appear to be largely determined by the topology of the native state (Chiti et al., 1999; Debe et al., 1999; Baker, 2000; Makarov and Plaxco, 2003). The ‘fold approach’ (Clarke et al., 1999) seeks to compare proteins from diverse superfamilies that are functionally unrelated, share little sequence homology and are linked only by a common fold. Since the proteins share no conservation of function, the evolutionary pressures upon them are different. This approach allows the advantages of a particular fold to be assessed in terms of structural stability or flexibility and folding properties alone.

The immunoglobulin-like (Ig-like) fold is one of the most common structural motifs in the protein database, and is subdivided into superfamilies with no detectable functional or evolutionary relationships (Jones, 1993; Bork et al., 1994). These domains all have a β-sandwich structure of two interacting antiparallel β-sheets with a Greek Key topology. The Ig-like fold, which includes the Ig superfamily, has over 12 000 members in the current Pfam database (Bateman et al., 2000). Do all proteins with this structural motif fold in a common manner?

Previous work has characterized the folding mechanism of several different Ig domains: the 27th I-type domain from the human muscle protein titin (Fowler and Clarke, 2001), the third fibronectin type III domain from human tenascin (Clarke et al., 1997; Hamill et al., 2000b), the tenth fnIII domain from human fibronectin (Cota et al., 2001) and a V-type Ig domain from rat CD2 (Lorch et al., 1999). Using Φ-value analysis (Fersht et al., 1992; Fersht, 1998), a ring of structurally equivalent residues within the core of the proteins has been identified as being well structured in the transition state. On the basis of these results it has been proposed that the proteins share a common nucleation-condensation folding pathway (Fersht, 1995; Mirny et al., 1998; Clarke et al., 1999) in which the folding nucleus is formed in the transition state by tertiary interactions between these equivalent residues in strands B, C, E and F (Figure 1). The Φ-values are high for the residues within the core and decrease as the distance from the nucleus increases. It is proposed that topologically restricted regions such as the B–C and E–F inter-sheet loops are also obligated to form in the transition state, whilst regions more distant from the folding core, such as the N- and C-termini, are unrestrained and hence free to fold at a later stage (Hamill et al., 2000b; Fowler and Clarke, 2001). Such regions have also been shown to unfold early (Paci and Karplus, 2000). If this hypothesis is correct, the inter-sheet loops do not play a critical role in initiating folding, but are pulled in later by the central nucleus.



 
Fig. 1. Structure of a β-sandwich Ig domain. Ribbon representation of TI I27, with the side chains of the residues of the nucleus (I23, W34, L58 and F73) shown (Fowler and Clarke, 2001). The linking loop sequences between the folding nucleus residues are highlighted: B–C, black and E–F, white. Figure generated using Molscript (Kraulis, 1991).

 
In the work described in this paper, the 27th Ig domain from titin (TI I27) (Pfuhl and Pastore, 1995; Politou et al., 1995; Squire, 1997; Linke et al., 1998) is used as a model system to investigate the role of these inter-sheet loops in the Ig domains. This example is a particularly interesting one to choose because, although the folding data point towards a classic nucleation-condensation mechanism for this domain, more recently we have shown that the anomalous unfolding kinetics of TI I27 can be explained by the existence of at least one other parallel (un)folding pathway (Wright et al., 2003). The transition states along the two pathways differ greatly in terms of compactness, as is indicated by their βT values. (βT provides a measure of structure in the transition state relative to the native state, ranging from totally unstructured, βT = 0, to a degree of structure equivalent to the native state, βT = 1.) In the absence of denaturant, the nucleation-condensation pathway (denoted pathway L, for low denaturant) is dominant, in which the whole polypeptide chain is involved in a highly compact transition state, with a βT value of ≥0.95. However, as the concentration of chemical denaturant is increased, this highly structured transition state becomes progressively destabilized relative to another, less compact transition state along a parallel pathway (denoted pathway H, for high denaturant) with a βT of ~0.7, which therefore becomes more highly populated (Figure 2).



 
Fig. 2. Two-dimensional representation of free energy profiles for two parallel pathways, L and H. Under normal folding conditions, the more compact transition state L (TSL) is lower in energy than transition state H (TSH), and so is the dominant folding pathway from the denatured state (D) to the native state (N). If the B–C loop is important for formation of TSL but not TSH, then elongation of this loop will destabilize TSL relative to TSH such that the dominant folding pathway is moved away from the native state, from pathway L to H.

 
Structural characterization of the two transition states by Φ-value analysis (Wright et al., 2003) revealed that the inter-sheet loops play different roles in the transition states of the two pathways. Whilst the E–F loop is involved in both transition states, the shorter B–C loop is only highly structured in transition state L whereas in pathway H it is totally unstructured. This difference provides an excellent situation in which to test the importance of the inter-sheet loops to the early stages of folding for the Ig domains. If a short inter-sheet loop is crucial to the formation of the folding nucleus, then elongation of the B–C loop in TI I27 should destabilize transition state L whilst having relatively little effect on transition state H. Therefore, we would expect pathway H to become more dominant at lower concentrations of denaturant and hence increase the upwards curvature observed in the denaturant-dependent unfolding kinetics. On the other hand, if the B–C loop is unimportant to the formation and stability of the nucleus in transition state L, this latter pathway will continue to be dominant.


    Materials and methods
 
Cloning and purification

The production of plasmids in an ampicillin-resistant T7 expression vector with EcoRI and BamHI nuclease sites either side of the cDNA insert has been described previously (Carrion-Vasquez et al., 1999). Insertion of the five sequential glycine residues into the plasmid was carried out using a standard overlap extension PCR procedure (for a schematic of the procedure used see Nagi and Regan, 1997). Site-directed mutagenesis was performed using the QuikChange Kit from Stratagene on this elongated pseudo wild-type plasmid. The identity of the pseudo wild-type and mutant proteins was confirmed by DNA sequencing and ESI mass spectrometry. The mutants used in this work are listed in Table II. Protein expression was carried out in Escherichia coli C41 cells (Miroux and Walker, 1996). Uniformly 13C- and 15N-labelled TI I27 pseudo wild-type protein was expressed using M9 minimal media with 13C-glucose and 15N-ammonium chloride as the sole sources of carbon and nitrogen, respectively. Proteins were concentrated to ~200 µM using Sartorius Vivaspin concentrators, stored at 4°C and used within a week of production.


 
Table II. Summary of the thermodynamic and kinetic properties of the mutations made to pseudo wild-type TI I27

 
Stability

The stabilities of the pseudo wild-type and mutant proteins were determined using GdmCl-induced equilibrium denaturation (Carrion-Vasquez et al., 1999). The protein concentration was 1 µM in each case, and all experiments were carried out in PBS buffer (10 mM sodium phosphate, 137 mM NaCl, 2.7 mM KCl, pH 7.4) with 5 mM DTT at 25°C. Samples were allowed to equilibrate overnight to ensure that equilibrium was achieved. The excitation wavelength was 280 nm and the emission wavelength was 320 nm, where the difference in signal intensity between the native and denatured states is at a maximum. The data were fitted to a two-state transition (Clarke and Fersht, 1993) using the program KaleidaGraph (Synergy Software).
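
As an illustration only (this is not the KaleidaGraph fit used in this work), the two-state model can be sketched by assuming that the free energy of unfolding depends linearly on denaturant concentration. The function names are hypothetical and the sloping fluorescence baselines of a full fit are omitted. The input values are the pseudo wild-type stability and mean mD-N reported in this paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fraction_unfolded(denaturant_m, dg_water, m_value, temp_k=298.15):
    """Fraction of denatured protein for a two-state N <-> D transition.

    Assumes dG([D]) = dg_water - m_value * [D] (linear free energy relation).
    """
    dg = dg_water - m_value * denaturant_m
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temp_k)))


def midpoint(dg_water, m_value):
    """Denaturant concentration [D]50% at which half the protein is folded."""
    return dg_water / m_value


# Pseudo wild-type values from this paper: dG = 5.1 kcal mol^-1,
# mean m = 2.92 kcal mol^-1 M^-1.
d50 = midpoint(5.1, 2.92)
print(round(d50, 2))                                # midpoint in M GdmCl
print(round(fraction_unfolded(d50, 5.1, 2.92), 2))  # 0.5 by construction
```

At the midpoint the free energy of unfolding is zero, so half of the molecules are folded regardless of the m-value.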

The change in free energy on mutation, ΔΔGD-N, was determined from the product of the mean mD-N and the change in [D]50% (Fersht et al., 1992), and the difference in the effect of mutation between wild-type and pseudo wild-type proteins was calculated using Equation (1):

(1)
where [D]50% is the concentration of denaturant at which 50% of the protein is folded for wild-type (wt) and pseudo wild-type (pwt) proteins, <mD-N>wt is the mean mD-N value determined from all stability measurements of wild-type and its mutant proteins (2.50 ± 0.04 kcal mol–1 M–1) (Fowler and Clarke, 2001) and <mD-N>pwt is the error-weighted mean mD-N value determined from all measurements of pseudo wild-type and its mutant proteins (2.92 ± 0.06 kcal mol–1 M–1).
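
A minimal sketch of this calculation follows. It assumes that ΔΔGD-N is the mean m-value multiplied by the shift in [D]50% on mutation, and that the difference between backgrounds is taken as wild-type minus pseudo wild-type; that sign convention is an assumption, not taken from Equation (1). The function names and the midpoint values in the example are hypothetical.

```python
M_WT = 2.50   # <mD-N>wt, kcal mol^-1 M^-1 (Fowler and Clarke, 2001)
M_PWT = 2.92  # <mD-N>pwt, kcal mol^-1 M^-1 (this work)


def ddg_d_n(mean_m, d50_parent, d50_mutant):
    """Change in free energy of unfolding on mutation (kcal mol^-1)."""
    return mean_m * (d50_parent - d50_mutant)


def background_difference(ddg_wt, ddg_pwt):
    """Difference in the effect of one mutation between the two backgrounds.

    Assumed convention: wild-type value minus pseudo wild-type value.
    """
    return ddg_wt - ddg_pwt


# Hypothetical midpoints (M GdmCl), for illustration only.
ddg_wt = ddg_d_n(M_WT, 3.0, 2.0)
ddg_pwt = ddg_d_n(M_PWT, 1.75, 1.25)
print(round(ddg_wt, 2), round(ddg_pwt, 2),
      round(background_difference(ddg_wt, ddg_pwt), 2))
```

A positive difference under this convention would mean that the mutation is less destabilizing in the elongated protein than in the wild-type.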

Refolding kinetics

Refolding measurements were carried out using an Applied Photophysics stopped-flow fluorimeter (Carrion-Vasquez et al., 1999). The final protein concentration was between 1 and 2 µM in each case, and all experiments were carried out in PBS buffer (10 mM sodium phosphate, 137 mM NaCl, 2.7 mM KCl, pH 7.4) with 5 mM DTT at 25°C. Refolding was followed by the increase in fluorescence with emission above 320 nm, except for the mutated protein L58A, where, as a result of the effect of the mutation on the fluorescence, a cut-off filter of 335 nm was used. Between 4 and 10 traces were taken at each concentration of denaturant and averaged. The traces fitted well to a double-exponential function, with a drift term to account for baseline instability (due to protein photolysis over the long measurement times required), using the program KaleidaGraph (Synergy Software). The major, fast phase accounted for ~70–80% of the fluorescence amplitude. It is probable that the slower phase reflects the refolding of proteins limited by the rate of proline isomerization (TI I27 has four prolines, all in the trans conformation in the native state); only the major refolding phase is considered here.
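
The functional form described above can be sketched as follows. This is an assumed parameterization (two exponentials, a constant offset and a linear drift), not the exact expression entered into KaleidaGraph, and all names and numbers are hypothetical.

```python
import math


def refolding_signal(t, a_fast, k_fast, a_slow, k_slow, offset, drift):
    """Double-exponential fluorescence trace with a linear drift term.

    Amplitudes are negative when the signal increases during refolding.
    """
    return (a_fast * math.exp(-k_fast * t)
            + a_slow * math.exp(-k_slow * t)
            + offset
            + drift * t)


def fast_phase_fraction(a_fast, a_slow):
    """Share of the total amplitude carried by the fast (major) phase."""
    return abs(a_fast) / (abs(a_fast) + abs(a_slow))


# Hypothetical parameters: the fast phase carries 75% of the amplitude.
print(refolding_signal(0.0, -0.75, 5.0, -0.25, 0.1, 2.0, 0.0))
print(fast_phase_fraction(-0.75, -0.25))
```

Only the rate constant of the fast phase would be carried forward into the kinetic analysis, in line with the treatment described in the text.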

Comparison of equilibrium and kinetic data

To confirm that the kinetic intermediate found to be populated during the folding of wild-type TI I27 at low denaturant concentrations was destabilized to the extent of being unavailable in the pseudo wild-type proteins, the kinetic stability (ΔGkin) was calculated using Equation (2):

(2)
where R is the universal gas constant and T is the temperature (in K). Kkin is given by Equation (3):

(3)
where kf is the refolding rate in water, ku is the unfolding rate in water and Kiso = 0.25 is the proportion of molecules whose folding is limited by proline isomerization. In a two-state system, the equilibrium and kinetic data are equivalent (Jackson and Fersht, 1991).
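
For a strict two-state system with no correction for proline isomerization, the kinetic stability reduces to ΔGkin = RT ln(kf/ku). The sketch below assumes that simplified form and deliberately leaves out the Kiso term of Equation (3). The function name and the rate constants in the example are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def kinetic_stability(kf_water, ku_water, temp_k=298.15):
    """Free energy of unfolding from rate constants in water (kcal mol^-1).

    Simplified two-state form: dG_kin = RT ln(kf / ku).
    No proline isomerization correction is applied.
    """
    return R_KCAL * temp_k * math.log(kf_water / ku_water)


# Hypothetical rate constants (s^-1), for illustration only.
print(round(kinetic_stability(10.0, 1.0e-3), 2))
```

Agreement between this quantity and the equilibrium ΔGD-N is the test for two-state behaviour described in the text.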

Calculation of βT

The βT value gives a measure of the compactness of the transition state relative to the native state in terms of the magnitude of buried surface area exposed to solvent on unfolding. It is defined as:

(4)
where mku and mkf are the gradients of a plot of the logarithm of the unfolding and refolding rates, respectively, against denaturant concentration.
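
One common convention, assumed here and not necessarily the exact form of Equation (4), expresses βT as the refolding gradient divided by the sum of the refolding and unfolding gradients, using magnitudes. Written as a ratio, the result does not depend on whether natural or base-10 logarithms were used for the rate plots. The function name and the gradients in the example are hypothetical.

```python
def beta_t(m_kf, m_ku):
    """Tanford-style beta value from kinetic gradients.

    Assumed form: |m_kf| / (|m_kf| + |m_ku|).
    0 means the transition state is as exposed as the denatured state,
    1 means it is as compact as the native state.
    """
    return abs(m_kf) / (abs(m_kf) + abs(m_ku))


# Hypothetical gradients (M^-1); refolding slopes are negative.
print(beta_t(-3.0, 1.0))  # 0.75
```

A small unfolding gradient relative to the refolding gradient therefore indicates a compact, native-like transition state.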

Φ-value analysis

Φ-values for folding were determined by analysis of the pseudo wild-type and mutant proteins on the assumption of a two-state scheme using Equation (5) (Fersht, 1998):

\Phi = \frac{\Delta\Delta G_{D-\ddagger}}{\Delta\Delta G_{D-N}}    (5)
where ΔΔGD-‡ is the change in the difference in free energy between the denatured state (D) and the folding transition state (‡) upon making a conservative mutation. It is calculated from the refolding data using Equation (6):

\Delta\Delta G_{D-\ddagger} = RT \ln\left(\frac{k_f^{pwt}}{k_f^{mut}}\right)    (6)
where k_f^{pwt} and k_f^{mut} are the refolding rate constants for pseudo wild-type and its mutant proteins, respectively.
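
The calculation can be sketched as follows, assuming the standard definitions Φ = ΔΔGD-‡/ΔΔGD-N and ΔΔGD-‡ = RT ln(kf,pwt/kf,mut) (Fersht, 1998). The function names and the rate constants in the example are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def ddg_transition_state(kf_pwt, kf_mut, temp_k=298.15):
    """Change in D-to-transition-state free energy on mutation (kcal mol^-1)."""
    return R_KCAL * temp_k * math.log(kf_pwt / kf_mut)


def phi_value(kf_pwt, kf_mut, ddg_d_n, temp_k=298.15):
    """Phi-value for folding from refolding rates and equilibrium ddG."""
    return ddg_transition_state(kf_pwt, kf_mut, temp_k) / ddg_d_n


# Hypothetical example: the mutation slows refolding five-fold and
# destabilizes the native state by 2.0 kcal mol^-1.
print(round(phi_value(5.0, 1.0, 2.0), 2))
```

A Φ-value near 1 indicates that the mutated side chain makes native-like interactions in the transition state, whereas a value near 0 indicates that the site is as unstructured as in the denatured state. The ratio becomes unreliable when ΔΔGD-N is close to zero, which is why a mutation with no effect on stability cannot be used.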

NMR

Backbone resonances for pseudo wild-type TI I27 (15NH, 13Cα and 13Cβ) were assigned by means of 3D CBCA(CO)NH and HNCACB experiments (Bax, 1994) recorded at 700 MHz. Processing was performed using Felix2000 (Accelrys Inc.) with the Assign module used for analysis.


    Results
 
Sequence alignment

If the Ig domains fold via a similar mechanism, the core nucleating residues should be conserved throughout the range of Ig domains. Similarly, if a short inter-sheet loop is required for the initial events of folding, its length should be well conserved. To investigate this, we surveyed the conservation of inter-sheet loop lengths (as defined by sequence distance between conserved core residues) in multiple Ig domains.

Domains with an Ig-like fold can be divided into superfamilies, in which the sequence and structural similarity between members suggests that a common evolutionary origin is possible; within these superfamilies domains may be grouped again into families in which either sequence or functional and structural similarities imply that a common evolutionary origin is very likely (Harpaz and Chothia, 1994; Chothia and Jones, 1997; Halaby et al., 1999). The Ig domains are split into many families, including the V-set, containing predominantly Ig antibodies (Al-Lazikani et al., 1997), and the I-set, of which TI I27 is a member (Harpaz and Chothia, 1994).

In order to assess the structural conservation of the putative nucleus residues (that generally have large, buried hydrophobic side chains) and the length of the sequences connecting them (which include the inter-sheet loops) for proteins within the same family as TI I27, a structure-based sequence alignment was performed for the I-set only (Figure 3). Telokin, the ‘classical’ I-type Ig domain (Harpaz and Chothia, 1994; Bateman and Chothia, 1995; Chothia and Jones, 1997), was compared with the 20 other domains identified as I-type in the SCOP database (Lo Conte et al., 2000). The sequence distances between nucleus residues are also compared with those in the 40 I-type Ig domains in the I-band of titin modelled using sequence homology to TI I27 (Improta et al., 1996; Fraternali and Pastore, 1999) and the 30 domains in twitchin modelled using sequence homology to Ig 18' of twitchin (Fong et al., 1996).



 
Fig. 3. Structural alignment of I-set Ig domains, using telokin as the reference structure (Fong et al., 1996). (a) Residues shown in upper case have the same structure as telokin at that position whilst those shown in lower case have different structures. Residues in bold were used to align the sequences. Putative residues in the folding nucleus are highlighted, and the B–C linker region between them is shaded grey. (b) Stereo view of all 21 aligned structures superimposed and coloured by RMSD (from dark = same backbone structure, to light = very differently structured backbones) from telokin. The side chain of the tryptophan residue in the C strand is also shown, as this is the most highly conserved residue throughout the domains.

 
The structurally equivalent positions proposed to contain the folding nucleus in TI I27 are occupied by relatively well conserved hydrophobic residues throughout the domains. The lengths of the sequences between the nucleus residues, which contain the B–C and E–F loops, are also fairly well conserved throughout the Ig superfamilies (Table I and Figure 3). The structure of the E–F loop is particularly highly conserved throughout the Ig family as it is important for structural stability (Hamill et al., 2000a); however, there is no such constraint on the B–C loop. The length of the adjoining sequence between the putative nucleus residues in the B and C strands (Ile23 and Trp34 for TI I27) varies from 11 to 17 residues throughout all the data sets, though the standard deviation is low and the mean for all data sets is 12, the same as for TI I27 itself.


 
Table I. Average B–C linker lengths and the standard deviation within the sets used for Ig domains

 
Within the V-type Ig domains, the hypervariable regions in both heavy and light chain antibodies are in structurally equivalent positions to the B–C loop within the I-type Ig domains. In contrast to the I-type domains, the loop length is highly variable within the V-type domains (Chothia and Lesk, 1987) as it is used to modulate antibody specificity. The hypervariable loops themselves vary from four to 14 residues in length and are conformationally extremely diverse (Chothia and Lesk, 1987). Therefore, a fixed loop length cannot be required for the correct folding of these domains.

Loop elongation

We tested the importance of inter-sheet loop length to the nucleation-condensation mechanism of folding by inserting five glycine residues into the B–C inter-sheet loop of TI I27, between positions E27 and P28. The resulting elongated protein was fully folded and, relative to wild-type TI I27 (Improta et al., 1996), showed only minor structural rearrangement as observed by NMR chemical shift changes (Figure 4). The only significant chemical shift changes are to residues surrounding the insertion (residues 26–30 within the B–C inter-sheet loop) and the neighbouring loops that are close in space to the B–C loop in the wild-type structure (in particular, residue 53 within the D–E loop).



 
Fig. 4. Structural effects of loop elongation. NMR chemical shift changes indicate only minor structural rearrangement in pseudo wild-type TI I27. 1H and 15N changes (Δ p.p.m.) are shown for each residue present in wild-type TI I27 (Improta et al., 1996). The position of the insertion of five glycine residues between positions 27 and 28 is indicated. The changes are localized to the B–C loop around the insertion and the neighbouring D–E loop interacting with it. (Note that there are also small changes at positions 42 and 78 due to polymorphisms in the sequence of wild-type TI I27 used in this study.)

 
Choice of mutation

This elongated construct was used as a ‘pseudo wild-type’ protein in which a set of point mutations was made to probe the folding mechanism for comparison with wild-type TI I27. A variety of non-disruptive mutations, previously characterized in wild-type TI I27 (Fowler and Clarke, 2001), were constructed within the pseudo wild-type to probe its folding behaviour (Table II). These mutations were chosen to include the nucleus residues as well as residues in each of the B–C and E–F inter-sheet loops. Where possible, mutations were made that changed the wild-type residue to alanine. A mutation of glycine to alanine was also made within the 5G insert, at the third position (GIIIA), in order to determine a Φ-value for the loop itself within the elongated domain.

Stability

The stabilities of all the proteins were measured by equilibrium denaturation and the data fit well to a two-state equation (Clarke and Fersht, 1993). We determined a weighted mean mD-N value for the pseudo wild-type protein and all of its mutational variants of 2.92 ± 0.06 kcal mol–1 M–1. (Note that this weighted mean mD-N value excludes mD-N for mutants where [D]50% is <0.8 M, as it is not possible to fit the data reliably.) The pseudo wild-type mD-N is significantly higher than the wild-type average mD-N (2.50 ± 0.04 kcal mol–1 M–1) (Fowler and Clarke, 2001), which may be due to an increase in the change in solvent accessible surface area upon unfolding within elongated proteins. Since the thermodynamic data for the pseudo wild-type proteins were generally recorded at a significantly lower concentration of denaturant than the wild-type proteins, this may also be due to variation in mD-N with denaturant concentration (Fersht, 1998).

The free energy of unfolding (ΔGD-N) of the pseudo wild-type protein itself is 5.1 ± 0.1 kcal mol–1, a destabilization (ΔΔGD-N) of 2.6 ± 0.1 kcal mol–1 relative to the wild-type protein (Fowler and Clarke, 2001). The mutations destabilized the pseudo wild-type protein by between 0.06 and 3.41 kcal mol–1 (Table II and Figure 5). Mutations of residues near the extended loop were less destabilizing in the pseudo wild-type protein than in wild-type TI I27, whilst those further from the insertion destabilized the protein to a similar extent as the same mutation in the wild-type (within the error limits).



 
Fig. 5. Equilibrium denaturation curves for pseudo wild-type TI I27 and its mutants. Wild-type TI I27 is shown as a reference (Fowler and Clarke, 2001).

 
Kinetics

Both refolding and unfolding kinetics were measured (Figure 6) for the pseudo wild-type protein, and a kinetic stability (ΔGkin) of 4.9 ± 0.1 kcal mol–1 was calculated. This value is the same within error as the equilibrium stability of 5.1 ± 0.1 kcal mol–1, indicating that the pseudo wild-type protein folds with two-state kinetics. Whilst there is significant population of a folding intermediate below 1 M GdmCl in wild-type TI I27 (Fowler and Clarke, 2001), this intermediate has been destabilized and depopulated in the pseudo wild-type protein such that folding is two-state at all concentrations of denaturant studied here. The refolding rate of wild-type TI I27 is significantly reduced by extension of the B–C loop, which is expected from the predicted increase in contact order (Plaxco et al., 1998) from 15.8 to 16.7 due to the 5G insertion. This reduction in folding rate in the elongated protein is probably due in part to the increased entropic cost of ordering the loop in the transition state (Fersht, 2000).



 
Fig. 6. (a) Refolding and (b) unfolding kinetics for the pseudo wild-type TI I27 protein and its mutational variants. (None shows any evidence for a folding intermediate or for the existence of parallel pathways.)

 
Plots of the unfolding kinetics of wild-type TI I27 against denaturant concentration showed unusual upward curvature which was interpreted in terms of parallel pathways (Wright et al., 2003). In contrast, the unfolding kinetics for the pseudo wild-type protein fit well to a linear dependence upon denaturant. This change in kinetic behaviour indicates that there are no longer multiple accessible pathways available for unfolding (within the experimentally observable range). The elongation of the B–C loop has altered the characteristics of the folding landscape such that any parallel pathways previously accessible to the wild-type protein have become experimentally indistinguishable from the dominant pathway.

Both the refolding and unfolding kinetics were also investigated for all mutational variants of the pseudo wild-type protein (Figure 6) except GIIIA. The refolding kinetics for all variants fit well to a linear dependence upon denaturant concentration. None of the mutants showed any upward curvature in the unfolding kinetics, though several had significant downward curvature, possibly due to movement of the transition state towards the native state along a single pathway (Hammond, 1955; Matouschek and Fersht, 1993; Oliveberg et al., 1998; Sanchez and Kiefhaber, 2002). Weighted average equilibrium and refolding m-values were used to calculate an average βT value of 0.87 ± 0.02; for unfolding, only data which fitted well to a linear dependence on denaturant (with an error of ≤0.01 in the unfolding m-value) were used to calculate an average βT value of 0.88 ± 0.02. Φ-values were calculated in water from refolding data only for each pseudo wild-type mutant (Table II).


    Discussion
 
In the present study, we have used TI I27 as a representative member of the I-type Ig domains, which are proposed to fold via a similar long-range nucleation-condensation mechanism with a structurally equivalent ring of nucleus residues distant in sequence (Clarke et al., 1999). Is the B–C inter-sheet loop critical to this folding mechanism? We have addressed this question in two ways: first, by structural alignment of multiple Ig domains to investigate the conservation of both core hydrophobic residues and B–C inter-sheet loop length, and secondly by insertion of five glycine residues into the B–C loop of TI I27 to examine the importance of this loop to the folding nucleus.

Sequence alignment

The results from the sequence alignment of the I-type Ig domains present an intriguing conundrum when compared with the V-type domains. The length of the region connecting the conserved hydrophobic residues in the B and C strands within the I-type domains, which includes the B–C loop as well as a portion of the B and C β-strands themselves, is well conserved between domains and generally has a relatively short mean length of 12 residues. However, this region corresponds to the hypervariable regions in the heavy and light chain antibodies with V-type Ig structure, where the length and amino acid composition are used to modulate antibody specificity (Lantto and Ohlin, 2002). Clearly, then, the length of the inter-sheet loops can vary greatly within the Ig β-sandwich domains without preventing folding to the correct structure. This variation is much smaller in the E–F loop due to the conservation of the tyrosine corner for structural stability (Hamill et al., 2000a).

Although increasing contact order predicts that proteins with longer loops will fold more slowly, due to the chain entropy contribution to the folding barrier (Plaxco et al., 1998), this does not mean that interactions within the loops themselves are important in the initial steps of folding. If a short inter-sheet loop were required for formation and stabilization of the nucleus, then this nucleation mechanism could not be the same as the one used by V-type domains. On the other hand, if TI I27 can still fold via the same mechanism once the B–C loop has been elongated, then the nucleation-condensation mechanism may also be more generally applicable to other proteins with this Greek Key β-sandwich fold, in which the major structural variation between domains lies in the loop regions.

Structure of the pseudo wild-type protein

The chemical shift data indicating the location of structural changes in TI I27 that result from the insertion of five glycine residues between residues 27 and 28 in the B–C loop are shown in Figure 4. The majority of the structural changes observed are confined to the regions of the protein that are in close proximity to the loop itself: residues surrounding the insertion and other portions of the protein which pack directly against the B–C loop in the wild-type. The structures of wild-type and pseudo wild-type proteins are essentially identical and the β-sandwich fold is maintained, indicating that a short B–C loop is not required for formation of this structure. Therefore, this pseudo wild-type protein provides a suitable model in which to test the importance of the inter-sheet loop length to the folding of Ig domains.

Although elongation of the B–C loop in TI I27 does not substantially alter its structure, it causes a large destabilization of 2.6 kcal mol–1. Why does the addition of five glycines destabilize the protein so much? Previous studies on insertion of polyglycine residues into long, flexible loops such as those of CI2 (Ladurner and Fersht, 1997) and various SH3 domains (Viguera and Serrano, 1997; Martinez et al., 1999; Grantcharova et al., 2000) showed a minimal effect on protein stability (0.1 kcal mol–1 per residue added), whilst insertion into short, inflexible loops such as that of Rop (Nagi and Regan, 1997; Nagi et al., 1999) produced a larger effect on stability (0.26 kcal mol–1 per residue). Indeed, insertion of 60–80 amino acids into a flexible loop of an SH2 domain (Scalley-Kim et al., 2003) resulted in a much smaller loss of free energy than would be predicted from polymer theory (Chan and Dill, 1988). The destabilization reported here (0.5 kcal mol–1 per residue) on insertion of glycine residues into the short inter-sheet loop of TI I27 is significantly larger than previously observed, indicating the importance of this loop in maintaining the stability and structural integrity of the domain. By elongating the B–C inter-sheet loop, stabilizing interactions are lost not only between the B–C loop and adjacent residues (e.g. the D–E loop) but also perhaps within the core itself.
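
The per-residue figure follows directly from the values reported in this paper, as the short calculation below illustrates. The variable and function names are hypothetical; the literature values are those quoted in the paragraph above.

```python
def per_residue_cost(total_ddg, residues_inserted):
    """Average destabilization per inserted residue (kcal mol^-1)."""
    return total_ddg / residues_inserted


# Values reported in this paper: 2.6 kcal mol^-1 for a 5-glycine insertion.
ti_i27 = per_residue_cost(2.6, 5)

# Per-residue costs quoted above for earlier loop-insertion studies.
flexible_loops = 0.1   # CI2 and SH3 domains
rop_short_loop = 0.26  # Rop

print(round(ti_i27, 2))                   # 0.52
print(round(ti_i27 / rop_short_loop, 1))  # fold increase over Rop
print(round(ti_i27 / flexible_loops, 1))  # fold increase over flexible loops
```

On these numbers the cost per inserted glycine in TI I27 is about twice that reported for Rop and about five times that reported for long, flexible loops.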

Mutants of TI I27 pseudo wild-type protein

All mutants of the pseudo wild-type TI I27 were destabilizing (Table II and Figure 5), though they fall into two distinct classes:

  1. Mutations distal to the extended loop: V13A, L36A, L58A, C63A, M67A, F73L and A82G. These mutations were as destabilizing to the pseudo wild-type protein as to wild-type TI I27, indicating very little structural change throughout the majority of the protein.
  2. Mutations near the extended loop or in regions of the protein interacting with it: V4A, I23A, L25A, V30A and G32A. These mutations were significantly less destabilizing in the pseudo wild-type protein than in wild-type TI I27, in particular V30A and G32A. This effect may be attributed to the reduced strength of long-range inter-sheet interactions within the core of the pseudo wild-type protein, such that the mutations are less deleterious than would otherwise be the case. The same mutation is more destabilizing in the wild-type protein (Fowler and Clarke, 2001) as more interactions are present between residues in this more compact structure. This putative loss of interactions in the pseudo wild-type may also account for the large destabilization of the protein due to elongation of the loop. The reduction in observed destabilization of the protein upon mutation may also be due to more effective reorganization within the pseudo wild-type relative to the wild-type protein, due to the greater flexibility of the extended B–C loop, thus allowing the protein to regain some of the stability lost on mutation. In the unusual case where a mutation actually adds interactions into the core of the protein (G32A), the pseudo wild-type protein simply reorganizes the hydrophobic packing to accommodate the extra methyl group.

The remaining glycine to alanine mutant at the third position within the 5G insert had no effect on the stability of the protein and thus could not be used for Φ-value analysis.

Folding mechanism

Previous work has examined the effect of altered contact order (Plaxco et al., 1998) on the folding mechanism of a specific protein by circular permutation. In some cases, such as the protein S6 from Thermus thermophilus (Miller et al., 2002), the permuted proteins all fold with similar rates despite having a wide range of relative contact orders; there is an accompanying change in overall topology and the transition state moves from a globally diffuse to a locally condensed nucleus as a result of reversing loop lengths (Lindberg et al., 2002). In the case of the src SH3-domain (Grantcharova and Baker, 2001), circularization alters the folding and unfolding rates substantially as well as changing the folding mechanism by delocalizing the transition state structure. These changes in mechanism are presumed to be a result of the change in chain topology. In the case of pseudo wild-type TI I27, however, the overall chain topology is conserved, although the contact order increases from 15.8 to 16.7 relative to the wild-type, which is predicted to result in a decreased folding rate.
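As an illustration of how a loop insertion can change the contact order while leaving the topology untouched, the following minimal sketch computes a relative contact order in the sense of Plaxco et al. (1998), i.e. the mean sequence separation of contacting residues divided by the chain length. The function names, the percentage scaling and the toy contact list are assumptions made for illustration only; they are not the contact map of TI I27 and do not reproduce the values quoted above.

```python
from typing import Iterable, List, Tuple

Contact = Tuple[int, int]  # (i, j): residue numbers of two residues in contact


def relative_contact_order(contacts: Iterable[Contact], chain_length: int) -> float:
    """Relative contact order in percent: 100 * sum(|i - j|) / (L * N)."""
    pairs = list(contacts)
    if not pairs or chain_length <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in pairs)
    return 100.0 * total_separation / (chain_length * len(pairs))


def insert_residues(contacts: Iterable[Contact], after: int, n: int) -> List[Contact]:
    """Renumber contacts after inserting n residues following residue `after`."""
    def shift(residue: int) -> int:
        return residue + n if residue > after else residue
    return [(shift(i), shift(j)) for i, j in contacts]


# Hypothetical toy contact list for a 20-residue chain (not TI I27 data).
toy_contacts = [(2, 18), (4, 16), (5, 9), (12, 15)]
co_before = relative_contact_order(toy_contacts, 20)
# Insert five residues after residue 10: only contacts that span the
# insertion point become more distant in sequence.
co_after = relative_contact_order(insert_residues(toy_contacts, after=10, n=5), 25)
```

In this toy case the long-range contacts span the insertion, so the summed sequence separation grows faster than the chain length and the relative contact order rises. Whether it rises or falls in a real protein depends on how many native contacts bridge the inserted segment.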

Mutations in the nucleus of wild-type TI I27 can cause a change in folding pathway from a nucleation-condensation mechanism (via transition state L) to a more hierarchical mechanism in which local contacts dominate the early stages of folding (via transition state H) (Wright et al., 2003). Similarly, if the residues in the B–C loop were critical to the formation of the nucleus in pathway L, then elongating the loop, which makes the B–C inter-sheet loop more remote from the core, should have the same effect, i.e. switch the folding pathway from L to H.

If the folding nucleus remains identical despite this elongation, mutants of the pseudo wild-type (outside the elongated loop region) would be expected to have approximately the same effect on the folding characteristics of the domain as the identical mutation in the wild-type protein. Therefore, the pattern of the pseudo wild-type Φ-values should be similar to that for the wild-type from refolding or unfolding along pathway L (low denaturant, nucleation pathway) and, crucially, the nucleus itself will be conserved. If on the other hand the length of the B–C loop is critical to the formation and stability of the folding nucleus, then transition state L will be destabilized relative to transition state H (high denaturant pathway) by the elongation of the B–C loop. If this is the case, the pseudo wild-type protein will fold via this more hierarchical mechanism in which short-range contacts initiate folding. This transition state is much less structurally compact as folding is initiated by formation of local structure around the E–F loop; the pattern of the pseudo wild-type Φ-values would be similar to that for the wild-type from unfolding in pathway H. Crucially, the high Φ-values of the nucleus residues will not be conserved.

The βT value of 0.88 for the pseudo wild-type protein lies closer to that of pathway L (βT = 0.95) than pathway H (βT = 0.74) in wild-type TI I27. However, it is clear from the βT value that the transition state for folding of the pseudo wild-type protein has moved away from the native state relative to its position along the folding pathway for wild-type TI I27.
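For readers unfamiliar with the βT parameter, the sketch below uses one common definition, in which βT is the fraction of the total kinetic m-value that is associated with the refolding limb of a chevron plot. The function name and the m-values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not the fitted parameters for any TI I27 variant.

```python
def tanford_beta(m_kf: float, m_ku: float) -> float:
    """beta_T = m_kf / (m_kf + m_ku), with both m-values given as positive magnitudes.

    A value near 1 places the transition state close to the native state in
    terms of solvent exposure; a value near 0 places it close to the
    denatured state.
    """
    if m_kf < 0 or m_ku < 0 or (m_kf + m_ku) == 0:
        raise ValueError("m-values must be non-negative and not both zero")
    return m_kf / (m_kf + m_ku)


# Hypothetical kinetic m-values (M^-1), for illustration only.
beta_compact = tanford_beta(m_kf=2.2, m_ku=0.3)   # transition state near native
beta_expanded = tanford_beta(m_kf=1.5, m_ku=1.0)  # transition state less native-like
```

On this definition, a fall in βT corresponds to a larger share of the denaturant dependence appearing in the unfolding limb, which is what is meant by the transition state moving away from the native state.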

Φ-values for pseudo wild-type TI I27 were investigated by studying the mutations V4A, V13A, I23A, L25A, L36A, L58A, C63A, M67A and A82G. The Φ-values of those mutations which produced no significant change in the stability of the protein (V30A, G32A and the G to A mutant within the insert) were not calculated, as the associated errors would be so large as to make them invalid. Overall, the Φ-values in the pseudo wild-type protein encompass a smaller range of values (from 0.1 to 0.7) than wild-type TI I27 (from 0.0 to 0.8; Fowler and Clarke, 2001), indicating that the nucleus has become somewhat more diffuse as a result of elongation of the B–C loop (Table II). This more diffuse transition state is reflected in the average βT value of 0.88, implying that the shorter B–C loop is important in maintaining a compact structure within the transition state as well as stabilizing the native state.
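The reason that Φ-values become unreliable for mutations with little effect on stability can be seen from a short sketch of the standard ratio definition, Φ = ΔΔG(transition state) / ΔΔG(D–N), with first-order error propagation. The function names, the temperature of 298 K, the sign convention and all numerical inputs are assumptions made for illustration; none of them are data from Table II.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_ts_from_rates(kf_wt: float, kf_mut: float, temperature: float = 298.0) -> float:
    """Change on mutation in the free energy of the transition state relative to D.

    Computed from refolding rate constants as RT ln(kf_wt / kf_mut), in kcal/mol.
    """
    return R_KCAL * temperature * math.log(kf_wt / kf_mut)


def phi_with_error(ddg_ts: float, err_ts: float, ddg_dn: float, err_dn: float):
    """Return (phi, sigma_phi) for phi = ddg_ts / ddg_dn.

    The uncertainty uses first-order propagation of independent errors, so it
    scales with 1 / ddg_dn and grows rapidly as the stability change shrinks.
    """
    if ddg_dn == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    phi = ddg_ts / ddg_dn
    sigma = math.sqrt((err_ts / ddg_dn) ** 2 + (ddg_ts * err_dn / ddg_dn ** 2) ** 2)
    return phi, sigma


# Hypothetical free energy changes (kcal/mol), each with a 0.1 kcal/mol error.
phi_large, sigma_large = phi_with_error(1.0, 0.1, 2.0, 0.1)  # strongly destabilizing mutant
phi_small, sigma_small = phi_with_error(0.1, 0.1, 0.2, 0.1)  # barely destabilizing mutant
```

Both hypothetical mutants have the same nominal Φ, but the uncertainty for the barely destabilizing one is an order of magnitude larger, which is why such mutants are usually excluded from the analysis.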

Decisively, the high Φ-values observed for the nucleus residues within the nucleation-condensation pathway for wild-type TI I27 remain high in the pseudo wild-type protein, indicating that the folding nucleus is unchanged: the pattern of Φ-values in the pseudo wild-type follows that for pathway L but not pathway H (Figure 7). A plot of the Φ-values for the pseudo wild-type protein has an R² value of 0.7 versus wild-type Φ-values for pathway L, and 0.2 versus wild-type Φ-values in pathway H.
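A comparison of this kind can be sketched as the squared Pearson correlation between two sets of Φ-values measured at the same positions. The helper below is a minimal illustration under that assumption; the function name and the two short value lists are hypothetical and are not the Φ-values of Table II, nor necessarily the exact fitting procedure used for Figure 7.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between paired values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical phi-values at four positions (illustration only).
phi_variant = [0.1, 0.3, 0.5, 0.7]
phi_reference_similar = [0.0, 0.3, 0.6, 0.8]    # same pattern as the variant
phi_reference_different = [0.6, 0.2, 0.7, 0.3]  # unrelated pattern

r2_similar = r_squared(phi_variant, phi_reference_similar)
r2_different = r_squared(phi_variant, phi_reference_different)
```

A high value indicates that residues with high Φ in one protein also have high Φ in the other, which is the sense in which the pattern of Φ-values is said to be conserved.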



Fig. 7. Comparison between Φ-values for the pseudo wild-type protein with the wild-type I27 in both pathways L and H, calculated from refolding and unfolding data. The pattern of Φ-values for the pseudo wild-type protein is similar to that of the wild-type protein in the nucleation-condensation pathway L. A plot of the Φ-values for the pseudo wild-type protein has an R² value of 0.6 and 0.7 versus wild-type Φ-values for pathway L calculated from refolding and unfolding, respectively, and 0.2 versus wild-type Φ-values in pathway H.

 
The similarity between the pseudo wild-type Φ-values and those for the wild-type protein on pathway L provides strong evidence that folding of the pseudo wild-type proceeds via a nucleation-condensation mechanism, in which long-range tertiary interactions dominate folding. Therefore, the length of the B–C loop is not critical to the stability or formation of the nucleus. The nucleation-condensation mechanism of folding, in which the residues in the nucleus define the folding transition state without requiring stabilization from residues in the loop regions, is very robust to changes in the lengths of loops connecting nucleus residues.

Interestingly, none of the mutants of pseudo wild-type TI I27 showed any upwards curvature in the unfolding kinetics. Elongation of the B–C loop seems to disfavour pathway H. There are two possible explanations for this. The first is that the βT for pathway L is now lower than in wild-type TI I27, which means that TSL, being less structured, will be destabilized less quickly relative to TSH by increasing denaturant concentration. The second is that the transition state in pathway L has, in fact, been stabilized by the elongation of the B–C loop, relative to transition state H. It is possible that the highly structured TSL may be over-packed in the wild-type protein. The increased flexibility of the elongated B–C loop in the pseudo wild-type protein may relieve some of this strain by reducing over-packing in the core, and hence stabilize TSL relative to TSH. Both these possibilities would have the effect of depopulating pathway H over the experimentally observable range. Alternatively, owing to the now small difference in m-values for TSL and TSH, any gradual change of flux would result in only a very small change in the slope of the unfolding rate with denaturant concentration, which could be hidden by experimental noise.
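The last point can be made concrete with a minimal sketch of two parallel unfolding pathways, assuming that the observed rate constant is the sum of two terms that each depend exponentially on denaturant concentration. The function names and all parameter values are hypothetical and were chosen only to show the trend; they are not fitted values for TI I27.

```python
import math


def ln_ku_obs(denaturant: float, k_l: float, m_l: float, k_h: float, m_h: float) -> float:
    """ln of the observed unfolding rate when pathways L and H run in parallel."""
    return math.log(k_l * math.exp(m_l * denaturant) + k_h * math.exp(m_h * denaturant))


def max_curvature(k_l: float, m_l: float, k_h: float, m_h: float,
                  d_min: float = 0.0, d_max: float = 8.0, steps: int = 800) -> float:
    """Largest second derivative of ln k_u,obs over [d_min, d_max] (central differences)."""
    h = (d_max - d_min) / steps
    best = 0.0
    for n in range(1, steps):
        d = d_min + n * h
        second = (ln_ku_obs(d + h, k_l, m_l, k_h, m_h)
                  - 2.0 * ln_ku_obs(d, k_l, m_l, k_h, m_h)
                  + ln_ku_obs(d - h, k_l, m_l, k_h, m_h)) / (h * h)
        best = max(best, second)
    return best


# Hypothetical parameters (k in s^-1, m in M^-1), for illustration only.
well_separated = max_curvature(k_l=1e-4, m_l=1.0, k_h=1e-6, m_h=2.0)
nearly_equal = max_curvature(k_l=1e-4, m_l=1.0, k_h=1e-6, m_h=1.2)
```

For this model the curvature of the unfolding limb can never exceed a quarter of the squared difference between the two m-values, so when the m-values of TSL and TSH approach each other the upward curvature shrinks quadratically and becomes hard to distinguish from noise.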

Conclusion

These results may help to explain why the Ig-like domain is such a common structural unit throughout biology, across many protein families that are functionally unrelated. The β-sandwich fold provides a very robust structural scaffold upon which evolution can act without altering either the structure or folding of the domain. By using a nucleation-condensation mechanism to fold, the evolutionary pressure upon the protein to fold reliably to its native state structure is relieved from the inter-sheet loops. This decoupling of function from folding allows the loops to be modulated in both sequence and length depending upon functional requirements, as is indicated by their use as ligand binding sites in the case of Ig antibodies.


    Acknowledgments
 
We thank Trevor Rutherford, Dafydd Jones, Peter Grice, Susan Fowler, Kathryn Scott, Lucy Randles and Annette Steward for their help and advice. We acknowledge the use of the Biomolecular NMR Facility in the Department of Chemistry, Cambridge University (EPSRC GR/R23787). The authors of this work are all supported by the Wellcome Trust. J.C. is a Wellcome Trust Research Fellow.


    References
 
Al-Lazikani,B., Lesk,A.M. and Chothia,C. (1997) J. Mol. Biol., 273, 927–948.

Baker,D. (2000) Nature, 405, 39–42.

Baldwin,R.L. (1989) TIBS, 14, 291–294.

Bateman,A. and Chothia,C. (1995) Nat. Struct. Biol., 2, 1068–1074.

Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Howe,K.L. and Sonnhammer,E.L.L. (2000) Nucleic Acids Res., 28, 263–266.

Bax,A. (1994) Curr. Opin. Struct. Biol., 4, 738–744.

Bork,P., Holm,L. and Sander,C. (1994) J. Mol. Biol., 242, 309–320.

Carrion-Vasquez,M., Oberhauser,A.F., Fowler,S.B., Marszalek,P.E., Broedel,S.E., Clarke,J. and Fernandez,J.M. (1999) Proc. Natl Acad. Sci. USA, 96, 3694–3699.

Chan,H.S. and Dill,K.A. (1988) J. Chem. Phys., 90, 492–509.

Chiti,F., Taddei,N., White,P.M., Bucciantini,M., Magherini,F., Stefani,M. and Dobson,C.M. (1999) Nat. Struct. Biol., 11, 1005–1009.

Chothia,C. and Jones,E.Y. (1997) Annu. Rev. Biochem., 66, 823–862.

Chothia,C. and Lesk,A.M. (1987) J. Mol. Biol., 196, 901–917.

Clarke,J. and Fersht,A.R. (1993) Biochemistry, 32, 4322–4329.

Clarke,J., Hamill,S.J. and Johnson,C.M. (1997) J. Mol. Biol., 270, 771–778.

Clarke,J., Cota,E., Fowler,S.B. and Hamill,S.J. (1999) Structure, 7, 1145–1153.

Cota,E., Steward,A., Fowler,S.B. and Clarke,J. (2001) J. Mol. Biol., 305, 1185–1194.

Debe,D.A., Carlson,M.J. and Goddard III,W.A. (1999) Proc. Natl Acad. Sci. USA, 96, 2596–2601.

Dobson,C.M. (2003) Nature, 426, 884–890.

Fersht,A.R. (1995) Biochemistry, 92, 10869–10873.

Fersht,A.R. (1998) Structure and Mechanism in Protein Science. W.H. Freeman and Company, New York.

Fersht,A.R. (2000) Proc. Natl Acad. Sci. USA, 97, 1525–1529.

Fersht,A.R. and Daggett,V. (2002) Cell, 108, 573–582.

Fersht,A.R., Matouschek,A. and Serrano,L. (1992) J. Mol. Biol., 224, 771–782.

Fong,S., Hamill,S.J., Proctor,M., Freund,S.M.V., Benian,G.M., Chothia,C., Bycroft,M. and Clarke,J. (1996) J. Mol. Biol., 264, 624–639.

Fowler,S.B. and Clarke,J. (2001) Structure, 9, 355–366.

Fowler,S.B., Best,R.B., Toca-Herrera,J.L., Rutherford,T.J., Steward,A., Paci,E., Karplus,M. and Clarke,J. (2002) J. Mol. Biol., 322, 841–849.

Fraternali,F. and Pastore,A. (1999) J. Mol. Biol., 290, 581–593.

Grantcharova,V.P. and Baker,D. (2001) J. Mol. Biol., 306, 555–563.

Grantcharova,V.P., Riddle,D.S. and Baker,D. (2000) Proc. Natl Acad. Sci. USA, 97, 7084–7089.

Gunasekaran,K., Eyles,S.J., Hagler,A.T. and Gierasch,L.M. (2001) Curr. Opin. Struct. Biol., 11, 83–93.

Halaby,D.M., Poupon,A. and Mornon,J.-P. (1999) Protein Eng., 12, 563–571.

Hamill,S.J., Cota,E., Chothia,C. and Clarke,J. (2000a) J. Mol. Biol., 295, 641–649.

Hamill,S.J., Steward,A. and Clarke,J. (2000b) J. Mol. Biol., 297, 165–178.

Hammond,G.S. (1955) J. Am. Chem. Soc., 77, 334–338.

Harpaz,Y. and Chothia,C. (1994) J. Mol. Biol., 238, 528.

Improta,S., Politou,A.S. and Pastore,A. (1996) Structure, 4, 323–337.

Jackson,S.E. and Fersht,A.R. (1991) Biochemistry, 30, 10428–10435.

Jones,E.Y. (1993) Curr. Opin. Struct. Biol., 3, 846–852.

Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.

Ladurner,A.G. and Fersht,A.R. (1997) J. Mol. Biol., 273, 330–337.

Lantto,J. and Ohlin,M. (2002) J. Biol. Chem., 277, 45108–45114.

Lindberg,M., Tangrot,J. and Oliveberg,M. (2002) Nat. Struct. Biol., 9, 818–822.

Linke,W.A., Stockmeier,M.R., Ivemeyer,M., Hosser,H. and Mundel,P. (1998) J. Cell Sci., 111, 1567–1574.

Lo Conte,L., Ailey,B., Hubbard,T.J.P., Brenner,S.E., Murzin,A.G. and Chothia,C. (2000) Nucleic Acids Res., 28, 257–259.

Lorch,M., Mason,J.M., Clarke,A.R. and Parker,M.J. (1999) Biochemistry, 38, 1377–1385.

Makarov,D.E. and Plaxco,K.W. (2003) Protein Sci., 12, 17–26.

Martinez,J.C., Viguera,A.R., Berisio,R., Wilmanns,M., Mateo,P.L., Filimonov,V.V. and Serrano,L. (1999) Biochemistry, 38, 549–559.

Matouschek,A. and Fersht,A.R. (1993) Proc. Natl Acad. Sci. USA, 90, 7814–7818.

Miller,E.J., Fischer,K.F. and Marqusee,S. (2002) Proc. Natl Acad. Sci. USA, 99, 10359–10363.

Mirny,L.A., Abkevich,V.I. and Shakhnovich,E. (1998) Proc. Natl Acad. Sci. USA, 95, 4976–4981.

Miroux,B. and Walker,J.E. (1996) J. Mol. Biol., 260, 289–298.

Nagi,A.D. and Regan,L. (1997) Fold. Design, 2, 67–75.

Nagi,A.D., Anderson,K.S. and Regan,L. (1999) J. Mol. Biol., 286, 257–265.

Oliveberg,M., Tan,Y.-J., Silow,M. and Fersht,A.R. (1998) J. Mol. Biol., 277, 933–943.

Paci,E. and Karplus,M. (2000) Proc. Natl Acad. Sci. USA, 97, 6521–6526.

Pfuhl,M. and Pastore,A. (1995) Structure, 3, 391–401.

Plaxco,K.W., Simons,K.T. and Baker,D. (1998) J. Mol. Biol., 277, 985–994.

Politou,A.S., Thomas,D.J. and Pastore,A. (1995) Biophys. J., 69, 2601–2610.

Sanchez,I.E. and Kiefhaber,T. (2002) J. Mol. Biol., 325, 367–376.

Scalley-Kim,M., Minard,P. and Baker,D. (2003) Protein Sci., 12, 197–206.

Squire,J.M. (1997) Curr. Opin. Struct. Biol., 7, 247–257.

Vendruscolo,M., Paci,E., Dobson,C.M. and Karplus,M. (2001) Nature, 409, 641–645.

Viguera,A.R. and Serrano,L. (1997) Nat. Struct. Biol., 4, 939–946.

Wright,C.F., Lindorff-Larsen,K., Randles,L.G. and Clarke,J. (2003) Nat. Struct. Biol., 10, 658–662.

Received March 1, 2004; revised May 21, 2004; accepted May 24, 2004.

Edited by Susan Marqusee




