A native-like artificial protein from antisense DNA

Nicolas Fischer¹, Lutz Riechmann and Greg Winter

Division of Protein and Nucleic Acid Chemistry, MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK

¹ To whom correspondence should be addressed at: NovImmune SA, 64 avenue de la Roseraie, CH-1211 Geneva 4, Switzerland. e-mail: nfischer@novimmune.com


    Abstract
 
We describe the creation of folded chimaeric proteins by combining a designed polypeptide segment (bait), derived from a β-sheet of a human antibody variable domain, with random polypeptide segments encoded by human cDNA fragments. The repertoire of polypeptides was displayed on the surface of filamentous bacteriophage, and folded polypeptides were selected by proteolysis. One of these, 2a6, was readily expressed in the Escherichia coli cytoplasm as a soluble and protease-resistant protein and could be purified after heating the bacterial lysate to 90°C. Soluble 2a6 is dimeric, and its CD spectrum is consistent with components of both α and β structure. 2a6 unfolds cooperatively and reversibly on heating or in urea, with a folding free energy of 11.4 kcal mol⁻¹ for the transition between folded dimer and unfolded monomer, and it refolds without forming detectable aggregates. Its stability and folding properties are therefore typical of native proteins. Sequence analysis revealed that the cDNA segment in 2a6 was recruited from the antisense strand of a human gene, suggesting that antisense sequences can provide a reservoir for the evolution of soluble and stable proteins.

Keywords: in vitro protein evolution/novel protein folds/phage display/proteolytic selection


    Introduction
 
Most large proteins are composed of multiple domains, each of which is usually able to fold independently (Hosszu et al., 1997; Mayr et al., 1997). In many proteins these individual domains are non-homologous in both sequence and architecture, suggesting that recombination of non-homologous gene segments may have been an important mechanism during protein evolution (Murzin et al., 1995). Accordingly, the dramatic increase in protein structural and functional diversity observed in eukaryotes has been attributed to the organization of their genes into exons and introns, which increases the probability of productively joining different coding DNA sequences through exon shuffling (Blake, 1978; Bogarad and Deem, 1999).

While protein domains are ~100–250 residues long (Chothia et al., 2003), typical exons encode about 40 amino acid residues on average (Fedorova and Fedorov, 2003). This suggests that the assembly of several domains in one polypeptide must have involved the shuffling of either large exons or groups of exons (Blake, 1983; Murzin et al., 1995). Correspondingly, it has been proposed that individual protein domains may have arisen by the recombination of smaller exons (Blake, 1983). Some support for this proposal comes from studies in experimental evolution, in which novel proteins were derived by random combination of non-homologous polypeptide segments of about 40 amino acid residues. In these studies, ‘bait’ DNA encoding the N-terminal half of a β-barrel domain was fused with fragmented genomic Escherichia coli DNA and cloned for display on filamentous bacteriophage. Phage displaying folded polypeptides were selected by proteolysis; in most cases the protease-resistant polypeptides comprised genomic fragments in their natural reading frames. Furthermore, only those comprising natural reading frames were soluble when expressed in the cytoplasm of E.coli (Riechmann and Winter, 2000).

Here we have extended these studies by using bait DNA encoding a designed polypeptide of a different architecture. This bait encoded two β-strands of the variable domain of a human immunoglobulin Vκ chain, joined by a glycine spacer to a third strand, in an attempt to retain a three-stranded β-sheet architecture. The bait DNA was combined with random human cDNA and the resulting chimaeric proteins were displayed on filamentous bacteriophage. Stably folded domains were enriched by proteolysis; the selected chimaeric domains were then expressed as soluble proteins and their biochemical properties analysed.


    Materials and methods
 
Protein bait and library construction

For cloning we used the pHEN-D-TAG phagemid vector, modified from pHEN1, which contains the H102A mutant of barnase between the pelB leader peptide and the gene for protein 3 (p3) of fd phage (Hoogenboom et al., 1991; Meiering et al., 1992). DNA fragments encoding residues 1–26 and 68–77 of a human immunoglobulin Vκ light chain (O12/DPK9) (Cox et al., 1994) were amplified by polymerase chain reaction (PCR). Two different sets of primers encoding either two or three glycines were used, so that assembly of the PCR products yielded two different linker lengths in the final construct. During amplification, the mutation C23S was introduced into the N-terminal part of the Vκ domain. The polypeptides encoded by the resulting DNA fragments are LQDIQMTQSP SSLSASVGDR VTITSRASGG GGTDFTLTIS SGAQ and LQDIQMTQSP SSLSASVGDR VTITSRASGG GTDFTLTISS GAQ; in both, the Vκ-derived segments are DIQMTQSPSSLSASVGDRVTITSRAS (residues 1–26) and GTDFTLTISS (residues 68–77). The N-terminal LQ and C-terminal GAQ residues are partially encoded by the PstI and SacI restriction sites present at the 5'- and 3'-ends of the PCR products, respectively. These PCR products were cloned between the PstI and SacI sites of pHEN-D-TAG to obtain the pVκ-bait2 and pVκ-bait3 vectors. In these constructs, the barnase/Vκ-bait fragments are out of frame relative to the p3 gene. The control vectors pDPK-9 and pVκ-Bait were obtained by amplifying the whole Vκ DPK-9 domain or the Vκ-Bait (containing three glycines) with appropriate primers so that, after cloning into pHEN-D-TAG, a continuous open reading frame between barnase and p3 is restored. Restoration of the reading frame introduces an opal stop codon before p3. This stop codon reduces expression levels, and presumably the toxicity associated with expressing p3 fusions, while still allowing sufficient display on phage (Riechmann and Winter, 2000).
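The composition of the two bait polypeptides can be checked with a short Python sketch. It assumes, from the residue counts given above, that each bait consists of the N-terminal LQ, Vκ residues 1–26 (with C23S), a linker of two or three glycines, Vκ residues 68–77 and the C-terminal GAQ; the constant and function names are hypothetical.

```python
# Hypothetical helper: rebuild the two bait polypeptides from their parts.
VK_1_26 = "DIQMTQSPSSLSASVGDRVTITSRAS"   # Vκ residues 1–26, Cys23 replaced by Ser
VK_68_77 = "GTDFTLTISS"                  # Vκ residues 68–77
N_TERM, C_TERM = "LQ", "GAQ"             # partly encoded by the PstI and SacI sites


def bait(n_glycines: int) -> str:
    """Return the bait sequence with a linker of n_glycines glycines."""
    return N_TERM + VK_1_26 + "G" * n_glycines + VK_68_77 + C_TERM


# Sequences as printed in the text (spaces mark blocks of ten residues).
BAIT3 = "LQDIQMTQSP SSLSASVGDR VTITSRASGG GGTDFTLTIS SGAQ".replace(" ", "")
BAIT2 = "LQDIQMTQSP SSLSASVGDR VTITSRASGG GTDFTLTISS GAQ".replace(" ", "")

assert bait(3) == BAIT3 and len(BAIT3) == 44
assert bait(2) == BAIT2 and len(BAIT2) == 43
assert VK_1_26[22] == "S"  # position 23 (1-based) carries the C23S mutation
print(len(bait(2)), len(bait(3)))
```

Under this assumption the last glycine before TDFTLTISS is Vκ residue G68 rather than part of the linker, which is why the two- and three-glycine constructs show three and four consecutive glycines.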

Human mRNA isolated from HeLa cells was used for a first-strand synthesis reaction with oligo-dT primers. Single-stranded DNA was amplified in 30 cycles of random PCR using 20 pmol/ml oligonucleotide SN6 (5'-GAG CCT GCA GAG CTC CGG NNN NNN-3') and an annealing temperature of 30°C. PCR products were amplified for another 30 cycles after adding 500 pmol/ml oligonucleotide NOARG (5'-CGT GCG AGC CTG CAG AGC TCA GG-3'), with annealing at 52°C. Products between 150 and 250 bp were excised from an agarose gel and reamplified (20 cycles) with oligonucleotide NOARG. The PCR products were digested with SacI for cloning into SacI-digested and dephosphorylated pVκ-bait2 and pVκ-bait3 vectors. The ligation products were electroporated into E.coli strain TG1, and phages were rescued with the trypsin-sensitive KM13 helper phage (Kristensen and Winter, 1998).
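Both amplification primers carry recognition sites for PstI (CTGCAG) and SacI (GAGCTC), the enzymes used for cloning. The minimal Python sketch below locates these sites; the dictionary and function names are illustrative, and because both recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences of the two cloning enzymes (both palindromic).
SITES = {"PstI": "CTGCAG", "SacI": "GAGCTC"}

# Primer sequences from the text, 5'->3', spaces removed (N = any base).
PRIMERS = {
    "SN6": "GAGCCTGCAGAGCTCCGGNNNNNN",
    "NOARG": "CGTGCGAGCCTGCAGAGCTCAGG",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every exact match of site in seq."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


# Palindromic sites read the same on both strands.
assert all(reverse_complement(site) == site for site in SITES.values())

for name, primer in PRIMERS.items():
    hits = {enzyme: find_sites(primer, site) for enzyme, site in SITES.items()}
    print(name, hits)
```

It reports PstI at position 4 and SacI at position 9 in SN6, and PstI at 9 and SacI at 14 in NOARG (0-based).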

Proteolytic selections and screenings

Approximately 10¹¹ colony-forming units were incubated for 10 min at 10°C with 200 nM TPCK-treated trypsin (Sigma) in TBS-Ca buffer (25 mM Tris, 137 mM NaCl, 1 mM CaCl₂, pH 7.4). Phages were then mixed with one volume of 4% Marvel-PBS and transferred to a streptavidin-coated microtitre plate carrying bound biotinylated C40A/C82A mutant of barstar (Hartley, 1993; Lubienski et al., 1993). Resistant phages were captured for 1 h at room temperature via the N-terminal barnase tag. Wells were washed 20 times with PBS, then for 5 min with 50 mM DTT in PBS to release proteolysed phages that remained bound via disulfide bridges, followed by five additional PBS washes. Bound phages were eluted with 0.1 M glycine, pH 2.2, for 5 min and neutralized with one-tenth volume of 1 M Tris, pH 8. Eluted phages were used to infect TG1 cells for propagation.

Phage supernatants were screened by proteolysis in situ after capture on barstar-coated wells. Washes were performed as described above and bound phages were detected by ELISA with an anti-M13 phage antibody–horseradish peroxidase conjugate (Amersham). For proteolysis in solution, 10¹⁰ purified phages were treated with trypsin for 5 min at different temperatures before the protease was inactivated with Pefabloc (Roche) and the phages were captured on immobilized barstar.

Protein expression, purification and analysis

DNA fragments encoding chimaeric proteins were amplified by PCR with appropriate oligonucleotides and recloned, using HindIII and BamHI restriction sites, into the bacterial expression vector pQE30 (Qiagen), which encodes an N-terminal hexahistidine tag. During amplification the opal stop codon (TGA) was converted into TGG (Trp). Soluble chimaeric proteins therefore carry the N-terminal tag MRGHH HHHHG SQ and the C-terminal tag WAKLN.

For expression, exponentially growing bacteria (0.5 l cultures in 2 l conical flasks) were induced for 4 h at 30°C. Proteins were extracted from the soluble fraction of the bacterial cytoplasm with B-Per bacterial protein extraction reagent (Pierce), according to the manufacturer’s instructions, and purified on nitrilotriacetic acid (NTA) agarose (Qiagen). Alternatively, the 2a6 protein was prepared by resuspending the induced bacterial pellet in 10 mM Tris, pH 8.0, and heating it at 90°C for 10–15 min. The suspension was allowed to cool and was centrifuged, and refolded 2a6 was purified from the supernatant using NTA agarose. No difference was detected between 2a6 samples prepared by either method. The protein was further purified by gel filtration on a Superdex-75 column (Amersham). The expected molecular weight of the protein was confirmed by surface-enhanced laser desorption/ionization (SELDI) mass spectrometry (Ciphergen).

Soluble 2a6 protein was cross-linked in the presence of 1% glutaraldehyde for 2 min at 25°C using 5, 10 or 16 µM 2a6 monomer equivalents and analysed by SDS–PAGE.

Biophysical analyses

Circular dichroism (CD) spectra were recorded with a Jasco J-720 spectropolarimeter, and the temperature was adjusted with a Jasco PTC-348WI temperature controller. 2a6 dimer (12 µM monomer equivalents) in PBS was heated at 60 or 80°C/h, with very similar results, and its ellipticity during denaturation was followed at 225 nm. Data were fitted to a two-state model for a transition between unfolded monomer and folded dimer using equation 17 in Mateu and Fersht (1998). The midpoint of thermal unfolding (Tm) and the enthalpy change of unfolding (ΔH) were inferred from the thermal denaturation curve, assuming a ΔCp of 12 cal mol⁻¹ K⁻¹ per residue (Pace et al., 1989) (ΔCp = 2808 cal mol⁻¹ K⁻¹ for the 2a6 dimer).

Fluorescence measurements were made using a Hitachi F-4500 spectrofluorimeter; 2a6 (1.5 µM monomer equivalents) was equilibrated in different concentrations of urea in PBS for at least 16 h at 15°C, and its fluorescence emission was followed at 360 nm (excitation at 280 nm). Data were fitted to a two-state model for a direct transition between an unfolded monomer (Mu) and a folded dimer (Df) using equation 17 in Mateu and Fersht (1998).
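Both equilibrium fits use a two-state model in which a folded dimer unfolds directly into two unfolded monomers. The Python sketch below is one common way of writing such a model; it is not a transcription of equation 17 of Mateu and Fersht (1998) and is not claimed to reproduce the fitted values. It assumes K_u = [Mu]²/[Df] with a 1 M standard state, extrapolation of ΔG(T) from Tm with the Gibbs–Helmholtz equation, and a linear urea dependence ΔG = ΔG_water − m[urea]; all function names are hypothetical. It also shows the residue count implied by the ΔCp used for the thermal fit.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def fraction_unfolded(dg: float, temp: float, pt: float) -> float:
    """Fraction of monomer units unfolded for Df <-> 2 Mu.

    dg is the unfolding free energy (kcal/mol) for K_u = [Mu]^2/[Df] (1 M
    standard state); pt is the total concentration in monomer equivalents (M).
    With fu the unfolded fraction, K_u = 2*pt*fu^2/(1 - fu); the positive root
    of that quadratic is returned.
    """
    k = math.exp(-dg / (R * temp))
    return (-k + math.sqrt(k * k + 8.0 * pt * k)) / (4.0 * pt)


def dg_thermal(temp: float, tm: float, dh: float, dcp: float, pt: float) -> float:
    """Gibbs-Helmholtz extrapolation of the unfolding free energy from tm (K).

    At tm half the protein is unfolded, so K_u = pt and dG(tm) = -R*tm*ln(pt).
    dh is in kcal/mol and dcp in kcal/(mol K).
    """
    return (dh * (1.0 - temp / tm)
            + dcp * (temp - tm - temp * math.log(temp / tm))
            - R * temp * math.log(pt))


def dg_urea(urea: float, dg_water: float, m: float) -> float:
    """Linear extrapolation dG = dG_water - m*[urea] (m in kcal/mol/M)."""
    return dg_water - m * urea


def urea_midpoint(dg_water: float, m: float, temp: float, pt: float) -> float:
    """[urea] at which fu = 0.5, i.e. where dG = -R*T*ln(pt)."""
    return (dg_water + R * temp * math.log(pt)) / m


# 12 cal/(mol K) per residue: 2808 cal/(mol K) for the dimer corresponds to
# 2808 / 12 = 234 residues, i.e. 117 residues per monomer.
residues_per_monomer = 2808 / 12 / 2
dcp_dimer = 2 * residues_per_monomer * 12 / 1000.0  # kcal/(mol K)

tm, pt = 273.15 + 39.0, 12e-6
print(fraction_unfolded(dg_thermal(tm, tm, 82.0, dcp_dimer, pt), tm, pt))  # ≈ 0.5
```

Because the dimer midpoint satisfies K_u = Pt rather than K_u = 1, the midpoint temperature and urea concentration of such a model depend on protein concentration, which is why the concentrations are quoted alongside Tm and [urea]50%.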

Folding and unfolding of 2a6 (2 µM monomer equivalents) at 25°C were measured using a stopped-flow fluorimeter (Applied Photophysics SX17). Excitation was at 280 nm and emission was measured with a 360 nm cut-off filter.

For refolding experiments, 56 µM 2a6 was denatured in 4 M urea in PBS, pH 7, for 1 h at 25°C. Samples were then diluted to 22 µM 2a6 in 2.2 M urea in PBS to allow a stopped-flow analysis at lower denaturant concentrations. For refolding, denatured 2a6 was mixed with a 10-fold excess of different concentrations of urea (0–0.9 M) in PBS. The fluorescence decay of four traces was averaged and fitted to an equation combining a first-order and a second-order rate:

$$F(t) = F_{\mathrm{final}} + A_1 \exp(-k_1 t) + \frac{A_2}{P_t k_2 t + 1}$$

where F(t) is the time-dependent fluorescence signal, Ffinal the fluorescence signal at infinite time, A1 and A2 the changes in fluorescence signal due to the first and second reactions, k1 and k2 the rate constants for the first and second reactions, Pt the concentration of 2a6 monomer equivalents and t the time. Fits to a single first-order reaction, a single second-order reaction or two first-order reactions were unsatisfactory.
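The second-order term in this equation has the form of the solution of d[M]/dt = −k2[M]² with [M](0) = Pt, namely [M](t)/Pt = 1/(Pt k2 t + 1). The Python sketch below evaluates the fitting function and checks that closed form against a simple Euler integration; it also computes the final concentrations after mixing one volume of sample with ten volumes of urea solution, as in the refolding protocol. The function names are hypothetical, and k2 = 4.56 × 10⁵ M⁻¹ s⁻¹ is used only as an illustrative value of the order of the dimerization rate constant reported for 2a6.

```python
import math


def refolding_signal(t, f_final, a1, k1, a2, k2, pt):
    """F(t) = F_final + A1*exp(-k1*t) + A2/(Pt*k2*t + 1)."""
    return f_final + a1 * math.exp(-k1 * t) + a2 / (pt * k2 * t + 1.0)


def monomer_fraction_euler(k2, pt, t_end, steps=100_000):
    """Integrate d[M]/dt = -k2*[M]^2 from [M](0) = pt with Euler steps; return [M]/pt."""
    m, dt = pt, t_end / steps
    for _ in range(steps):
        m -= k2 * m * m * dt
    return m / pt


def after_mixing(c_sample, c_buffer, excess=10):
    """Concentration after mixing 1 volume of sample with `excess` volumes of buffer."""
    return (c_sample + excess * c_buffer) / (excess + 1)


pt = after_mixing(22e-6, 0.0)      # protein: 22 µM -> 2 µM monomer equivalents
urea_min = after_mixing(2.2, 0.0)  # carried-over urea with urea-free buffer: 0.2 M
k2 = 4.56e5                        # M^-1 s^-1, illustrative dimerization rate constant
t = 5.0                            # s
closed_form = 1.0 / (pt * k2 * t + 1.0)
numeric = monomer_fraction_euler(k2, pt, t)
print(f"Pt = {pt * 1e6:.1f} uM, urea = {urea_min:.2f} M")
print(f"closed form {closed_form:.5f}, Euler {numeric:.5f}")
```

The carried-over 0.2 M urea matches the lowest urea concentration used for the refolding trace in Figure 3A; the amplitudes passed to `refolding_signal` would come from a fit and are not given in the text.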

For unfolding experiments, native (dimeric) 2a6 (22 µM monomer equivalents) in PBS was mixed with a 10-fold excess of different concentrations of urea (0.5–4.5 M) in PBS. The fluorescence increase of four traces was averaged and fitted to an equation comprising two first-order rates:

$$F(t) = F_{\mathrm{final}} + A_1 \exp(-k_{-1} t) + A_2 \exp(-k_{-2} t)$$

where F(t) is the time-dependent fluorescence signal, Ffinal the fluorescence signal at infinite time, A1 and A2 the changes in fluorescence signal due to the first and second reactions, k–1 and k–2 the rate constants for the first and second reactions and t the time. Curve fitting to an equation for a single first-order rate reaction was unsatisfactory.

One-dimensional proton nuclear magnetic resonance (1D ¹H NMR) experiments were performed on a Bruker AMX-500 instrument with protein at 300 µM monomer equivalents in 20 mM phosphate, 0.1 M NaCl, pH 6.2, in 93% H₂O/7% D₂O.


    Results
 
Library design and proteolytic selection

The two N-terminal β-strands (residues 1–26, with a Cys23Ser mutation) and the seventh β-strand (residues 68–77) of an immunoglobulin Vκ chain, which form contiguous strands of a β-sheet, were fused genetically through either two or three glycine residues. This polypeptide bait was cloned into a phagemid vector as a C-terminal fusion to barnase (used as an N-terminal affinity tag), followed by the minor phage coat protein p3, and thereby displayed on bacteriophage. The bait proved to be trypsin sensitive, suggesting that it was not folded (Kristensen and Winter, 1998; Sieber et al., 1998; Finucane and Woolfson, 1999; Riechmann and Winter, 2000; Martin et al., 2001). The bait was then fused at its C-terminus with polypeptide fragments encoded by randomly amplified human cDNA of around 150–250 bp to create a repertoire of 1.3 × 10⁸ clones, which was selected by proteolysis with trypsin followed by capture on immobilized barstar via the N-terminal barnase tag. Selected phages were eluted at acidic pH and propagated in bacteria.

After two rounds of selection, phages with deletions started to dominate the repertoire (as analysed by PCR) and cDNA inserts of more than about 200 bp were PCR amplified from the population of phages and recloned into the phagemid vector for a third and final round of selection. Monoclonal phages isolated after the second and third rounds of selection were bound to immobilized barstar and proteolysed in situ with trypsin. Proteolytic resistance was assessed by detection of the phages remaining bound to the plate by ELISA. The cDNA inserts from 16 clones retaining at least 75% activity after proteolysis were sequenced and revealed seven different sequences, all of which were identified in the human genome. Three inserts (1c6, 3h8, 2f2) originated from coding sequences in the reading frame of the parent gene, three other inserts (1e4, 2a6, 1b5) were from antisense strands and one insert (3c12) corresponded to a 3' untranslated region (Table I). All of the clones contained the longer version of the linker (three glycines). When treated with trypsin over a range of temperatures, five clones (1e4, 2a6, 1c6, 3c12, 3h8) proved more resistant than the others (Figure 1A); the 2a6 clone dominated the library after the third round of selection.


Table I. Sequence and origin of the selected cDNA fragments
 



Fig. 1. (A) Proteolytic resistance of selected chimaeric polypeptides displayed on phage. After proteolysis at the indicated temperatures in solution, resistant phages were captured on immobilized barstar and detected by ELISA. Phages displaying the whole Vκ variable domain or the Vκ-bait only were used as positive and negative controls, respectively. (B) Proteolysis of 2a6. Soluble 2a6 (20 µM monomer equivalents) was incubated at the indicated temperature with trypsin, chymotrypsin or thermolysin (20 nM) for 5 min and separated by SDS–PAGE. A non-proteolysed control was loaded in the second lane. Under similar conditions, the N-terminal fragment of 3c12 and other unfolded proteins were fully degraded. (C) Cross-linking of 2a6 in the presence of 1% glutaraldehyde for 2 min at 25°C using (a) 5, (b) 10 and (c) 16 µM 2a6 monomer equivalents. The same amount of protein was run in lanes a, b and c. No protein bands with a molecular weight of more than 28 kDa were seen. Monomer (m) and dimer (d) bands are indicated. (D) Amino acid sequence of soluble 2a6 including tags.

 
Characterization of 2a6

DNA encoding these five protease-resistant chimaeric polypeptides was recloned into a bacterial cytoplasmic expression vector encoding an N-terminal hexahistidine tag. The proteins 2a6 and 3c12 remained soluble, whereas the other polypeptides formed insoluble inclusion bodies. 3c12 suffered significant proteolytic degradation during expression and/or purification. 2a6, however, could be purified without detectable degradation (several milligrams from 1 l of shaker flask culture), both under native conditions and after heating of the lysates.

We probed the stability and folded nature of the soluble 2a6 protein by its resistance to proteolysis with trypsin, chymotrypsin and thermolysin at increasing temperatures (Figure 1B). 2a6 is largely resistant to all three proteases despite the presence of numerous potential proteolytic cleavage sites in its sequence (Figure 1D). The 2a6 protein was further purified by size-exclusion chromatography on a calibrated Superdex column. Its apparent molecular weight of 26 kDa (MWcalc. = 12 382 Da) suggests that the predominant oligomeric state of 2a6 is a dimer. The dimeric nature of 2a6 was confirmed by cross-linking of purified 2a6 (with 5, 10 and 16 µM monomer equivalents) in PBS with 1% glutaraldehyde for 2 min at 25°C, as cross-linked dimers but no higher order oligomers were seen (Figure 1C).

Further evidence for the folded nature of 2a6 was obtained from the one-dimensional ¹H NMR spectrum, which shows dispersion of amide proton chemical shifts to values downfield of 9 p.p.m. and of methyl group shifts to values around 0 p.p.m. (not shown). Chemical shift dispersion is indicative of a folded state because it arises from the variety of magnetic microenvironments present in folded proteins (Wüthrich, 1986). The CD spectrum of 2a6 was also consistent with components of both α and β structure (Figure 2A). An α-helical content of ~10% was estimated from the ellipticity at 208 nm (Greenfield and Fasman, 1969). This is consistent with the consensus secondary structure prediction (Combet et al., 2000) for 2a6, which suggests 17% helical structure (residues 66–75 and 105–114) and 21% β-structure (residues 14–16, 30–36, 45–50, 77–79 and 86–90).
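The percentages quoted for the secondary structure prediction can be recovered from the listed residue ranges. The Python sketch below assumes a chain length of 117 residues per monomer, the value implied by the ΔCp of 2808 cal mol⁻¹ K⁻¹ for the dimer at 12 cal mol⁻¹ K⁻¹ per residue. It also includes the linear estimate of helical fraction from the mean residue ellipticity at 208 nm in the form usually attributed to Greenfield and Fasman (1969), f_H = (−[θ]208 − 4000)/(33 000 − 4000); the measured [θ]208 of 2a6 is not given in the text, so the value used below is hypothetical.

```python
# Predicted secondary structure of 2a6 (residue ranges from the text, inclusive).
HELIX = [(66, 75), (105, 114)]
STRAND = [(14, 16), (30, 36), (45, 50), (77, 79), (86, 90)]
N_RESIDUES = 117  # assumption: 2808 / (2 * 12) residues per monomer


def count_residues(ranges):
    """Total number of residues covered by inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def percent(ranges, n=N_RESIDUES):
    """Percentage of the chain covered by the given ranges."""
    return 100.0 * count_residues(ranges) / n


def helix_fraction_208(theta_208):
    """Helical fraction from mean residue ellipticity at 208 nm (deg cm^2 dmol^-1),
    using linear reference values of 4000 (no helix) and 33000 (all helix)."""
    return (-theta_208 - 4000.0) / (33000.0 - 4000.0)


print(f"helix {percent(HELIX):.0f}%, beta {percent(STRAND):.0f}%")
print(f"hypothetical [theta]208 = -6900 -> {100 * helix_fraction_208(-6900):.0f}% helix")
```

The first line prints 17% helix and 21% β, matching the prediction quoted above (20 and 24 of 117 residues); the second shows that a [θ]208 of about −6900 deg cm² dmol⁻¹ would correspond to the ~10% helical content estimated from CD.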



Fig. 2. Spectroscopic analyses and equilibrium denaturation. (A) Mean residue molar ellipticity of 5 µM 2a6 dimer recorded at 20°C. (B) Equilibrium denaturation of 2a6 by heat. 2a6 dimer (12 µM monomer equivalents in PBS) was heated at a rate of 80°C/h and its ellipticity during denaturation was followed at 225 nm. The continuous line represents the best fit of the data set to a model for a direct transition between unfolded monomer and folded dimer. (C) Fluorescence emission spectra of native 2a6 dimer (solid line; maximum at 333 nm) and urea-denatured 2a6 (broken line; maximum at 356 nm). Measurements were performed at 15°C and 2a6 was incubated in 8 M urea for 20 h at 15°C before measurement. (D) Urea-induced equilibrium denaturation of 2a6 (1.5 µM monomer equivalents) was measured by fluorescence intensity at 360 nm. The continuous lines represent the best fit of the data sets to a model for a direct transition between unfolded monomer and folded dimer.

 
Stability and folding of 2a6

The fact that soluble and folded 2a6 could be so readily isolated from the bacterial lysate after heating to 90°C prompted us to analyse its stability and folding properties in more detail. Thermal denaturation of 2a6, followed by CD at 225 nm, was fully reversible and gave a sharp sigmoidal melting curve (consistent with a cooperative process) with a midpoint (Tm) of 39°C at a concentration of 12 µM 2a6 monomer equivalents (Figure 2B). A conformational stability ΔG(Mu/Df) of 10.9 kcal mol⁻¹ at 298 K was calculated by fitting the equilibrium data to a two-state model for a direct transition between unfolded monomer (Mu) and folded dimer (Df). The enthalpy change (ΔH) was 82 kcal mol⁻¹, and a heat capacity change of 12 cal mol⁻¹ K⁻¹ per residue was assumed.

The degree of 2a6 unfolding in different concentrations of urea was followed by the fluorescence emission of the single tryptophan residue close to its C-terminus. The change in fluorescence emission between native and urea-unfolded 2a6 suggests that W113 is located in a more hydrophobic environment when folded (Figure 2C), as the emission maximum of tryptophan is shifted to shorter wavelength (Schmid, 1989). Fitting the sigmoidal transition curve to a simple two-state model for a direct transition between unfolded monomer (Mu) and folded dimer (Df) yielded a urea concentration for 50% denatured protein, [urea]50%, of 1.4 M, ΔG(Mu/Df) = 11.8 kcal mol⁻¹ and an m value of 3.9 kcal mol⁻¹ M⁻¹ (Figure 2D).

Folding and unfolding rates of 2a6 in urea at 25°C were measured by stopped-flow fluorescence (Figure 3). Two reaction rates were observed during folding and extrapolated to 0 M denaturant as k1 = 0.0408 s⁻¹ (m1 = 2.1 kcal mol⁻¹ M⁻¹) and k2 = 4.56 × 10⁵ M⁻¹ s⁻¹ (m2 = 1.8 kcal mol⁻¹ M⁻¹). Two reaction rates were also observed during unfolding and were extrapolated to 0 M denaturant as k–1 = 0.00975 s⁻¹ (m–1 = 1.1 kcal mol⁻¹ M⁻¹) and k–2 = 0.0353 s⁻¹ (m–2 = 1.2 kcal mol⁻¹ M⁻¹). The rates k1 and k–1 were determined by fitting the Chevron plot (Figure 3C). The rates k2 and k–2 were determined by linear extrapolation of ln kobs to 0 M denaturant (Figure 3D).
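The Chevron fit combines the folding and unfolding limbs of a two-state monomer transition. A minimal Python sketch of that function is shown below. For the exponents m[urea] to be dimensionless the m values must be in M⁻¹, so this sketch assumes that m values quoted in kcal mol⁻¹ M⁻¹ are divided by RT (about 0.59 kcal mol⁻¹ at 25°C) before use. The function names are hypothetical, and the position of the Chevron minimum is derived here rather than taken from the text.

```python
import math

R = 1.987e-3  # kcal mol^-1 K^-1
T = 298.15    # K (25 °C)


def ln_kobs(urea, kf, mf, ku, mu):
    """ln k_obs = ln[kf*exp(-mf*[urea]) + ku*exp(mu*[urea])], with mf, mu in M^-1."""
    return math.log(kf * math.exp(-mf * urea) + ku * math.exp(mu * urea))


def chevron_minimum(kf, mf, ku, mu):
    """[urea] at which k_obs is smallest (derivative of the sum set to zero)."""
    return math.log(mf * kf / (mu * ku)) / (mf + mu)


# Extrapolated monomer rate constants (s^-1) and m values (kcal mol^-1 M^-1).
k1, m1 = 0.0408, 2.1
k_minus1, m_minus1 = 0.00975, 1.1
mf, mu = m1 / (R * T), m_minus1 / (R * T)  # assumed conversion to M^-1

for urea in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"{urea:.1f} M urea: ln k_obs = {ln_kobs(urea, k1, mf, k_minus1, mu):.2f}")
print(f"minimum near {chevron_minimum(k1, mf, k_minus1, mu):.2f} M urea")
```

At 0 M urea k_obs reduces to k1 + k–1, and far into either limb one exponential dominates, which gives the two near-linear arms of the Chevron plot.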



Fig. 3. Folding kinetics measured by stopped-flow fluorescence experiments. (A) Refolding kinetics of denatured 2a6 (2 µM) in 0.2 M urea. The black line represents the fit to an equation for a combination of a first-order and a second-order reaction. Inset: fitting of the refolding curve to rate equations comprising a single first-order (quadruple broken line), a single second-order (double broken line), two first-order rates (broken line) or a combination of one first-order and one second-order rate (solid line). Traces of the observed data points are shown in grey. (B) Unfolding kinetics of folded 2a6 (2 µM) in 4.1 M urea. The black line represents the fit to an equation for a combination of two first-order reactions. Inset: fitting of the unfolding curve to equations comprising one (broken line) or two first-order rates (solid line). Traces of the observed data points are shown in grey. (C) Chevron plot of ln kobs against urea concentration for the folding (circles) and unfolding (triangles) of the 2a6 monomer (at 2 µM). Observed rate constants for monomer folding (k1) and unfolding (k–1) were fitted to ln kobs = ln[k1,water exp(–m1[urea]) + k–1,water exp(m–1[urea])], with k1,water = 0.0408 s⁻¹, m1 = 2.1 kcal mol⁻¹ M⁻¹, k–1,water = 0.00975 s⁻¹ and m–1 = 1.1 kcal mol⁻¹ M⁻¹. (D) Plot of ln kobs against urea concentration for the transition between folded 2a6 monomer and dimer. Circles and triangles indicate observed rate constants, determined with 2 µM 2a6 monomer equivalents, for dimerization and monomerization, respectively. Linear extrapolation to 0 M denaturant of ln k2,obs (up to 0.9 M urea) and ln k–2,obs (above 1.5 M urea) yields k2,water = 4.56 × 10⁵ M⁻¹ s⁻¹, m2 = 1.8 kcal mol⁻¹ M⁻¹ and k–2,water = 0.0353 s⁻¹, m–2 = 1.2 kcal mol⁻¹ M⁻¹. Unfolding rate constants were derived from curve fitting of the stopped-flow data (A and B) to two first-order reactions (k–1, k–2). Folding rate constants were determined from fitting to one first-order (k1) and one second-order reaction (k2).

 
While curve fitting of the stopped-flow traces did not allow us to attribute the observed rates (k1, k–1, k2, k–2) to specific steps in the folding pathways, a rational analysis of their values did. As the starting point for the folding pathway of 2a6 is the unfolded monomer and the final product the folded dimer, the observation of two distinct rates during folding and unfolding requires the existence of an intermediate form of 2a6. We suggest that this corresponds to the folded monomer.

The two unfolding rates can be put in sequential order. During unfolding, the faster rate (k–2 = 0.0353 s⁻¹ when extrapolated to 0 M denaturant) must precede the slower rate (extrapolated k–1 = 0.00975 s⁻¹); otherwise only the rate-determining slower rate would be observable. One of these first-order unfolding rates must describe the reverse of the single first-order reaction (k1) observed during refolding. This rate must be k–1, because only the k–1 rates (and not the k–2 rates) observed at different urea concentrations can be combined with the observed k1 rates in a Chevron plot (Figure 3C) using equation 18.6 in Fersht (1998):

$$\ln k_{\mathrm{obs}} = \ln\left[k_{1,\mathrm{water}}\exp(-m_1[\mathrm{urea}]) + k_{-1,\mathrm{water}}\exp(m_{-1}[\mathrm{urea}])\right]$$

Hence the unfolding reaction described by k–1 is the second step during unfolding and the reverse of the reaction described by k1. As the folding reaction described by k1 is first order (i.e. unimolecular), k1 must be the rate constant for folding of the 2a6 monomer. The second-order refolding rate k2 must then describe the second step during refolding (that is, the dimerization of folded 2a6 monomers), with k–2 describing the dissociation of this dimer during unfolding. Their values at 0 M denaturant were determined by linear extrapolation of the respective ln kobs (Figure 3D).
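The ordering argument can be illustrated with a sequential scheme A → B → C with first-order rate constants ka and kb, for which the intermediate population is [B](t)/[A]0 = ka/(kb − ka)·(exp(−ka t) − exp(−kb t)). The Python sketch below, an illustration rather than the analysis used in this work, compares the peak intermediate population when the faster step comes first (ka = k–2, kb = k–1) with the reverse order. It uses the extrapolated unfolding rate constants and treats dimer dissociation as first order when populations are counted in monomer units; the function names are hypothetical.

```python
import math


def intermediate(t, ka, kb):
    """[B](t)/[A]0 for A -(ka)-> B -(kb)-> C starting from pure A (ka != kb)."""
    return ka / (kb - ka) * (math.exp(-ka * t) - math.exp(-kb * t))


def peak_intermediate(ka, kb):
    """Time of the maximum of [B] and the maximal fraction reached."""
    t_peak = math.log(kb / ka) / (kb - ka)
    return t_peak, intermediate(t_peak, ka, kb)


k_fast, k_slow = 0.0353, 0.00975  # s^-1: extrapolated k–2 and k–1

for label, ka, kb in (("fast step first", k_fast, k_slow),
                      ("slow step first", k_slow, k_fast)):
    t_peak, b_max = peak_intermediate(ka, kb)
    print(f"{label}: intermediate peaks at {b_max:.2f} after {t_peak:.0f} s")
```

With these values the intermediate reaches about 61% of the molecules when the faster step comes first but only about 17% in the reverse order, so the intermediate accumulates far more when the faster step precedes the slower one.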

The conclusions regarding the folding and unfolding rates are summarized in the proposed folding pathway of 2a6 (Figure 4). Assigning the rate constants to specific steps of the folding pathway makes it possible to determine the free energy difference between the reactants and products of each step from the ratio of its forward and backward rate constants, according to:



Fig. 4. Proposed folding pathway for 2a6 from unfolded monomer via folded monomer to folded dimer. Rate constants for the transitions between unfolded monomer (Mu) and folded monomer (Mf) and between folded monomer and folded dimer (Df) are given below the diagram. The free energy of unfolding for the transition between unfolded and folded monomer (ΔG(Mu/Mf)) was estimated from RT ln(k1/k–1) to be 0.85 kcal mol⁻¹, and ΔG(Mf/Df) for the transition between folded monomer and folded dimer was estimated from RT ln(k2/k–2) to be 9.7 kcal mol⁻¹. ΔG(Mu/Df) for the (apparent) direct transition between unfolded monomer and folded dimer was taken as the average (11.4 kcal mol⁻¹) of the values determined by thermal and by urea denaturation.

 
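$$\Delta G_{\mathrm{Mu/Mf}} = RT\ln(k_1/k_{-1}) = 0.85\ \mathrm{kcal\,mol^{-1}}$$

for the ΔG between unfolded and folded monomer and

$$\Delta G_{\mathrm{Mf/Df}} = RT\ln(k_2/k_{-2}) = 9.7\ \mathrm{kcal\,mol^{-1}}$$

for the ΔG between folded monomer and folded dimer (k2/k–2 is an association constant in M⁻¹, referred to a 1 M standard state).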

Combining these two ΔG values as 2ΔG(Mu/Mf) + ΔG(Mf/Df), where the factor of 2 accounts for the two monomers that fold per dimer, yields ΔG(Mu/Df) = 11.4 kcal mol⁻¹ between unfolded monomer and folded dimer. This value is in reasonable agreement with the ΔG(Mu/Df) determined from the equilibrium unfolding experiments (urea unfolding, 11.8 kcal mol⁻¹; thermal unfolding, 10.9 kcal mol⁻¹). Hence the very short-lived accumulation of the folded monomer during unfolding does not significantly affect the determination of the folding free energy in the equilibrium denaturation experiments, where this intermediate is presumed absent.
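The arithmetic of this thermodynamic cycle can be checked with a few lines of Python. The sketch assumes T = 298.15 K (the 25°C of the stopped-flow experiments) and R = 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹, and evaluates each step as ΔG = RT ln(k_forward/k_backward). It takes k2 = 4.56 × 10⁵ M⁻¹ s⁻¹, the value consistent with the 9.7 kcal mol⁻¹ quoted and with the dimer association rate given in the Discussion; the function name is hypothetical.

```python
import math

R = 1.987e-3  # kcal mol^-1 K^-1
T = 298.15    # K

# Extrapolated rate constants at 0 M urea.
k1, k_minus1 = 0.0408, 0.00975  # s^-1: monomer folding / unfolding
k2, k_minus2 = 4.56e5, 0.0353   # M^-1 s^-1 and s^-1: dimerization / dissociation


def stability(k_forward, k_backward):
    """Unfolding free energy (kcal/mol) of a step from its rate-constant ratio."""
    return R * T * math.log(k_forward / k_backward)


dg_mu_mf = stability(k1, k_minus1)  # unfolded -> folded monomer
dg_mf_df = stability(k2, k_minus2)  # two folded monomers -> folded dimer (1 M standard state)
dg_mu_df = 2 * dg_mu_mf + dg_mf_df  # two monomers fold per dimer

print(f"{dg_mu_mf:.2f} {dg_mf_df:.1f} {dg_mu_df:.1f}")
```

It prints 0.85, 9.7 and 11.4 (kcal mol⁻¹), matching the values above.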


    Discussion
 
We have extended our earlier attempts to create new folded proteins (Riechmann and Winter, 2000) by using a designed rather than a natural polypeptide bait. The bait was derived by linking together topologically adjacent strands of a β-sheet from an immunoglobulin Vκ domain and was then fused to random polypeptide segments derived from human cDNA. One of the selected chimaeric proteins, 2a6, was purified as a soluble, intact and folded polypeptide and characterized in detail.

2a6 forms homodimers that unfold cooperatively and reversibly during both thermal and chemical denaturation. Furthermore, the 2a6 dimer is largely resistant to proteolysis, both as a phage-displayed fusion protein and in its soluble form. Other indicators of the folded nature of 2a6 are its CD spectrum, which is consistent with elements of both β and α structure, and its distinct oligomeric state as a dimer. Unfortunately, the NMR signals for 2a6 are too broad to allow determination of its solution structure, and we have so far been unable to obtain crystals of 2a6 for X-ray structure determination. It is therefore unclear whether the designed β-sheet structure of the bait is retained within 2a6. Moreover, the helical structure indicated by the CD data, which the secondary structure prediction places mainly in the C-terminal half, makes it unlikely that 2a6 folds into an immunoglobulin-like domain. However, the biochemical and biophysical analyses show that this novel domain is fully folded and has hallmarks of native proteins that are not found in less compact, native-like structures defined as molten globules (DeGrado et al., 1999).

The folding stability of 2a6 is ~11–12 kcal mol⁻¹, a value typical for dimers of natural proteins. ΔG for the unfolding of the Arc repressor dimer (2 × 53 residues) is 10 kcal mol⁻¹ (Milla and Sauer, 1994). ΔG of the leucine zipper peptide GCN-4 dimer (2 × 33 residues) is 10.5 kcal mol⁻¹ (Zitzewitz et al., 1995), and that of the E2 DNA-binding domain dimer (2 × 80 residues) of human papillomavirus is 11 kcal mol⁻¹ (Mok et al., 1996). Likewise, the folding and unfolding rates of the 2a6 dimer (4.6 × 10⁵ M⁻¹ s⁻¹ and 0.035 s⁻¹) are in line with those seen in natural dimers (Jackson, 1998). The unfolding rate of the 2a6 monomer (0.0098 s⁻¹) is also within normal values for natural monomeric proteins (Jackson, 1998), whereas the monomer folding rate (0.041 s⁻¹) is rather slow, which is reflected in the very low stability of the transient 2a6 monomer (0.85 kcal mol⁻¹).

The favourable, native-like folding properties are reflected in the ability of 2a6 to refold readily, apparently without aggregation, from heat-denatured bacterial lysates during purification. The refolding rates of 2a6 are also consistent with the absence of alternative, unproductive folding pathways. If present, such pathways often lead to transiently accumulating aggregates at lower denaturant concentrations and to non-linear behaviour of the folding arm of the Chevron plot (Figure 3) (Silow and Oliveberg, 1997).

Dimer formation by 2a6 may be related to our selection process. Two out of four stably folded chimaeric proteins, in which the N-terminal half of CspA was recombined with random DNA fragments of bacterial origin, were found to be multimeric [1c2 and 1b11 in Riechmann and Winter (2000), whose oligomerization status was wrongly described in the original reference]. Although the use of a helper phage and the opal stop codon within gene 3 of the phagemid (see Materials and methods) should favour the display of monomers on most phage, it does not exclude the presence of multimers on a proportion of the phage. Indeed, because the fusions are made to the multivalent phage coat protein p3 (Model and Russel, 1988), the selection may favour multimers over monomers in two respects. First, multimers are more likely to be stable and to resist proteolysis; second, phage bearing protease-resistant multimers should be captured more readily owing to the greater avidity of binding of the multimeric barnase affinity tags.

A number of recent studies have aimed to generate novel proteins from partially or fully randomized polypeptide libraries, using various selection or screening strategies (Davidson and Sauer, 1994; Davidson et al., 1995; Doi et al., 1997, 1998; Keefe and Szostak, 2001). However, in most cases the selected polypeptides were insoluble or poorly structured when expressed on their own and had to be characterized in the presence of urea or guanidinium chloride. We have shown that combinatorial shuffling of natural DNA fragments of human (this study) or bacterial (Riechmann and Winter, 2000) origin can be used successfully to create folded protein domains. In our earlier work, the three proteins that were soluble when expressed in E.coli proved to combine natural reading frames (Riechmann and Winter, 2000). Here we have shown that the protein 2a6 comprises not only a bait based on a designed β-sheet but also an antisense read of a human gene. This suggests that non-coding DNA fragments could be mobilized in protein evolution through recombination events to generate soluble folded proteins directly. Such recruitment would be expected to increase the evolutionary potential of genomes and would complement exon shuffling as a productive way of recombining segments of coding DNA (Kolkman and Stemmer, 2001).


    Acknowledgements
 
We thank I.Lavenir for mass spectrometry analysis and S.Freund, U.Mayor and C.Johnson for help with the biophysical analysis. N.F. acknowledges the support of a Human Frontier Science Program long-term fellowship (LT0158/2000-M).


    References
 
Blake,C. (1978) Nature, 273, 267.

Blake,C. (1983) Nature, 306, 535–537.

Bogarad,L.D. and Deem,M.W. (1999) Proc. Natl Acad. Sci. USA, 96, 2591–2595.

Chothia,C., Gough,J., Vogel,C. and Teichmann,S.A. (2003) Science, 300, 1701–1703.

Combet,C., Blanchet,C., Geourjon,C. and Deleage,G. (2000) Trends Biochem. Sci., 25, 147–150.

Cox,J.P., Tomlinson,I.M. and Winter,G. (1994) Eur. J. Immunol., 24, 827–836.

Davidson,A.R. and Sauer,R.T. (1994) Proc. Natl Acad. Sci. USA, 91, 2146–2150.

Davidson,A.R., Lumb,K.J. and Sauer,R.T. (1995) Nat. Struct. Biol., 2, 856–864.

DeGrado,W.F., Summa,C.M., Pavone,V., Nastri,F. and Lombardi,A. (1999) Annu. Rev. Biochem., 68, 779–819.

Doi,N., Itaya,M., Yomo,T., Tokura,S. and Yanagawa,H. (1997) FEBS Lett., 402, 177–180.

Doi,N., Yomo,T., Itaya,M. and Yanagawa,H. (1998) FEBS Lett., 427, 51–54.

Fedorova,L. and Fedorov,A. (2003) Genetica, 118, 123–131.

Fersht,A.R. (ed.) (1998) Structure and Mechanism in Protein Science. Freeman, San Francisco, pp. 540–572.

Finucane,M.D. and Woolfson,D.N. (1999) Biochemistry, 38, 11613–11623.

Greenfield,N. and Fasman,G.D. (1969) Biochemistry, 8, 4108–4116.

Hartley,R.W. (1993) Biochemistry, 32, 5978–5984.

Hoogenboom,H.R., Griffiths,A.D., Johnson,K.S., Chiswell,D.J., Hudson,P. and Winter,G. (1991) Nucleic Acids Res., 19, 4133–4137.

Hosszu,L.L., Craven,C.J., Parker,M.J., Lorch,M., Spencer,J., Clarke,A.R. and Waltho,J.P. (1997) Nat. Struct. Biol., 4, 801–804.

Jackson,S.E. (1998) Fold. Des., 3, R81–R91.

Keefe,A.D. and Szostak,J.W. (2001) Nature, 410, 715–718.

Kolkman,J.A. and Stemmer,W.P. (2001) Nat. Biotechnol., 19, 423–428.

Kristensen,P. and Winter,G. (1998) Fold. Des., 3, 321–328.

Lubienski,M.J., Bycroft,M., Jones,D.N. and Fersht,A.R. (1993) FEBS Lett., 332, 81–87.

Martin,A., Sieber,V. and Schmid,F.X. (2001) J. Mol. Biol., 309, 717–726.

Mateu,M.G. and Fersht,A.R. (1998) EMBO J., 17, 2748–2758.

Mayr,E.M., Jaenicke,R. and Glockshuber,R. (1997) J. Mol. Biol., 269, 260–269.

Meiering,E.M., Serrano,L. and Fersht,A.R. (1992) J. Mol. Biol., 225, 585–589.

Milla,M.E. and Sauer,R.T. (1994) Biochemistry, 33, 1125–1133.

Model,P. and Russel,M. (1988) In Calender,R. (ed.), The Bacteriophages. Plenum Press, New York, pp. 375–456.

Mok,Y.-K., De Prat Gay,G., Butler,P.J. and Bycroft,M. (1996) Protein Sci., 5, 310–319.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Pace,C.N., Shirley,B.A. and Thomson,J.A. (1989) In Creighton,T.E. (ed.), Protein Structure, a Practical Approach. IRL Press, Oxford, pp. 311–330.

Riechmann,L. and Winter,G. (2000) Proc. Natl Acad. Sci. USA, 97, 10068–10073.

Schmid,F. (1989) In Creighton,T.E. (ed.), Protein Structure, a Practical Approach. IRL Press, Oxford, pp. 251–285.

Sieber,V., Plückthun,A. and Schmid,F.X. (1998) Nat. Biotechnol., 16, 955–960.

Silow,M. and Oliveberg,M. (1997) Proc. Natl Acad. Sci. USA, 94, 6084–6086.

Wüthrich,K. (1986) In Wüthrich,K. (ed.) NMR of Proteins and Nucleic Acids. Wiley, New York, pp. 26–39.

Zitzewitz,J.A., Bilsel,O., Luo,J.B., Jones,B.E. and Matthews,C.R. (1995) Biochemistry, 34, 12812–12819.

Received October 2, 2003; accepted October 14, 2003. Edited by Alan Fersht