PoPMuSiC, an algorithm for predicting protein mutant stability changes. Application to prion proteins

Dimitri Gilis and Marianne Rooman

Ingénierie Biomoléculaire, Université Libre de Bruxelles, CP 165/64, 50 avenue Roosevelt, 1050 Brussels, Belgium


    Abstract
 
A novel tool for computer-aided design of single-site mutations in proteins and peptides is presented. It proceeds by performing in silico all possible point mutations in a given protein or protein region and estimating the stability changes with linear combinations of database-derived potentials, whose coefficients depend on the solvent accessibility of the mutated residues. Upon completion, it yields a list of the most stabilizing, destabilizing or neutral mutations. This tool is applied to mouse, hamster and human prion proteins to identify the point mutations that are the most likely to stabilize their cellular form. The selected mutations are essentially located in the second helix, which presents an intrinsic preference to form β-structures, with the best mutations being T183->F, T192->A and Q186->A. The T183 mutation is predicted to be by far the most stabilizing one, but should be considered with care as it blocks the glycosylation of N181 and this blockade is known to favor the cellular to scrapie conversion. Furthermore, following the hypothesis that the first helix might induce the formation of hydrophilic β-aggregates, several mutations that are neutral with respect to the structure's stability but improve the helix hydrophobicity are selected, among which is E146->L. These mutations are proposed as good candidates for experimental testing.

Keywords: computer experiments/conformational change/database-derived potentials/mutant stability/sequence design


    Introduction
 
The design and analysis of proteins and peptides with a modified sequence are interesting in several respects. First, they improve the understanding of the relationships between sequence and structure. Indeed, information about the interactions that stabilize the tertiary structure of proteins can be obtained by determining the stability changes caused by mutations in their sequence. By comparing the changes in unfolding and activation free energies upon mutation, one can moreover get information about the structures that are already formed in the folding transition state (for a review, see Fersht and Serrano, 1993).

On a more practical level, rational protein design has numerous applications related to the modification of some specific physico-chemical or biological properties. These modifications involve, for instance, increasing the protein solubility to enhance or to maintain its activity under unusual pH or temperature conditions. Whatever the property that is changed, one has to check that the introduced mutations do not alter the protein structure and stability too much. Of course, the experimental determination of the change in folding free energy between wild-type and mutant proteins leads to the most reliable stability information, but it is time-consuming and cannot be used to test all possible mutations in a protein. It is therefore useful to have predictive methods to reduce the number of mutations to be tested experimentally.

Several theoretical methods have been developed to predict stability changes caused by mutations in proteins. The earliest involved free energy calculations with detailed atomic models coupled to semi-empirical potentials (Basch et al., 1987; Tidor and Karplus, 1991), but they are so computationally expensive that testing a large set of mutations is impractical. Since then, other approaches have been developed which are based on a simplified description of the protein and of the energy function used to evaluate the stability changes. A first one combined a simplified force field with a search in a limited conformational space (Lee and Levitt, 1991; Lee, 1994) or with a homology modeling procedure (Lee, 1995). Wang et al. (1996) used mean force field calculations to evaluate the compatibility between amino acids and their protein environment. Others have tested the performance of database-derived potentials, in particular hydrophobic potentials (Koehl and Delarue, 1994), secondary structure potentials (Muñoz and Serrano, 1994), residue contact potentials (Miyazawa and Jernigan, 1994) and distance-dependent residue–residue interaction potentials (Sippl, 1995). Ota et al. (1995) have developed a pseudo-energy potential to evaluate the sequence–structure compatibility of mutated proteins. Their energy function is a modified version of the 3D profile introduced by Bowie et al. (1991), where the tertiary structure is converted into a 1D string representing the environment of each residue of the sequence. Still others predict the effect of buried mutations by evaluating the energetic cost of cavity formation when mutating a large residue into a smaller one (Eriksson et al., 1992; Rashin et al., 1997).

Simpler approaches take into account the physico-chemical properties of the amino acids (van Gunsteren and Mark, 1992) or some structural properties of the environment of the mutated residue, such as the number of α-carbon atoms (Shortle et al., 1990) or the number of methyl and methylene groups (Serrano et al., 1992). Damborsky (1998) related these physico-chemical properties to stability data coming from sets of proteins that contain systematic substitutions at certain positions along the sequence. Finally, some approaches use amino acid substitution tables, derived from tolerated amino acid replacements in families of homologous proteins (Steipe et al., 1994; Topham et al., 1997).

The performance of all these methods is fairly good in general. However, their predictive value may sometimes be questioned. Indeed, many of them are only applied to a few residues or to residues that present common characteristics, situated for instance at the same buried position in the protein. The most extensively tested methods are simultaneously applied to mutations in only two different proteins. A method that gives predictions correlating well with experiment for any point mutation, irrespective of the protein or peptide environment, is clearly lacking.

The PoPMuSiC program for computer-aided design of single-site mutations is intended to fill this gap. In brief, it evaluates the changes in stability of a given protein or peptide under all possible single-site mutations, either in the whole sequence or in a region specified by the user, and returns a list of the most stabilizing or destabilizing mutations or of the mutations that do not affect stability. It uses different combinations of database-derived potentials, according to the solvent accessibility of the mutated residues. This program is an extension of our procedure for predicting stability changes upon point mutations perturbing only slightly the structure of the mutated proteins (Gilis and Rooman, 1996, 1997, 1999). This procedure has been tested on 344 experimentally studied mutations introduced at 132 different sites in seven different proteins and a synthetic peptide. It can therefore reasonably be expected to have a universal prediction value.

In the second part of this paper, PoPMuSiC is applied to mouse, hamster and human prion proteins. Our objective is to identify mutations that could stabilize the cellular prion form and hence limit the conversion of the cellular into the scrapie form, which is the basis of a set of neurodegenerative diseases.


    Description of the PoPMuSiC program
 
Program summary

The only input that PoPMuSiC requires is the wild-type protein or peptide structure in PDB (Bernstein et al., 1977) format. The output that it generates contains the n (with n a number given by the user) mutations that are, according to the user's wishes, the most destabilizing, the most stabilizing or neutral. By default, mutations are performed on the whole sequence, but the user may limit the mutations to a specified region of the sequence.

A schematic description of the program is given in Figure 1. It proceeds by first computing effective potentials derived from a set of known protein structures (as described in the section Database-derived potentials). If the protein to be mutated has a sequence identity of >25% with respect to a protein from this set, this protein is excluded when calculating the potentials. The program then reads the coordinates of the protein to be mutated from the PDB file, positions the average side chain centroids and computes the backbone torsion angle domains (as explained in the section Protein representation). Then, it mutates at each sequence position the wild-type amino acid into the 19 other amino acids and evaluates the changes in folding free energy caused by each of these mutations (as described in the section Evaluation of folding free energy changes). All these mutations are then classified as a function of increasing or decreasing folding free energy changes or as a function of smallest folding free energy changes in absolute value, according to the chosen option.
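The scanning and ranking steps described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the PoPMuSiC implementation: the names `scan_mutations` and `rank`, the tuple layout and the `ddg` callback (standing in for the potential-based evaluation) are assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# (sequence position, wild-type residue, mutant residue, computed ddG in kcal/mol)
Mutation = Tuple[int, str, str, float]


def scan_mutations(
    sequence: Dict[int, str],
    ddg: Callable[[int, str, str], float],
) -> List[Mutation]:
    """Evaluate the 19 possible substitutions at every position of `sequence`.

    `ddg` is a hypothetical callback returning the folding free energy change
    of one substitution; it stands in for the potential-based evaluation.
    """
    results = []
    for pos, wild in sorted(sequence.items()):
        for mutant in AMINO_ACIDS:
            if mutant != wild:
                results.append((pos, wild, mutant, ddg(pos, wild, mutant)))
    return results


def rank(results: List[Mutation], mode: str, n: int) -> List[Mutation]:
    """Return the n most stabilizing, most destabilizing or most neutral mutations."""
    if mode == "stabilizing":      # most negative ddG first
        key = lambda m: m[3]
    elif mode == "destabilizing":  # most positive ddG first
        key = lambda m: -m[3]
    elif mode == "neutral":        # smallest |ddG| first
        key = lambda m: abs(m[3])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(results, key=key)[:n]
```

Restricting the scan to a region of the sequence amounts to passing only the positions of that region in `sequence`.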



 
Fig. 1. Schematic description of the PoPMuSiC program.

 
All these procedures are detailed in the following sections. It has to be stressed that the PoPMuSiC program is fairly fast. For a protein of 150 residues, for example, the evaluation of all possible mutations takes only a few minutes of CPU time on a 300 MHz Linux PC and this time increases only linearly with protein length.

Protein representation

Proteins are represented by the main chain atoms N, Cα, C and O, by the side chain atom Cβ and by the pseudo-atom Cµ, corresponding to the geometric average of all heavy side chain atoms of a given amino acid type in a dataset of known structures (Kocher et al., 1994); for glycine, the Cµ pseudo-atom and the Cβ are positioned on the Cα. The Cµ therefore has a well-defined position for each amino acid type, which means that side chain degrees of freedom are neglected.

The backbone conformation of a protein is represented by the (φ, ψ, ω) torsion angles of its residues. These angles are grouped into seven domains called A, C, B, P, G, E and O, whose limits can be found in Rooman et al. (1991). Domain O groups all cis-conformations (ω ≈ 0°). The other six domains correspond to trans-conformations (ω ≈ 180°): A groups α-helical and C 3₁₀-helical structures, B corresponds to β-like extended and P to polyproline-like extended conformations, and G and E have positive φ angles, mirror-symmetrical with A/C and B/P, respectively.

Database-derived potentials

The potentials we use to evaluate the protein conformations are derived from observed frequencies of sequence and structure patterns in a dataset consisting of 141 high-resolution (≤2.5 Å) protein X-ray structures, with <25% sequence identity or no structure similarity, listed in Wintjens et al. (1996). We consider two types of potentials, called torsion and Cµ–Cµ potentials.

Torsion potentials describe only local interactions along the sequence. They take into account the propensities of (φ, ψ, ω) backbone torsion angle domains and pairs of (φ, ψ, ω) domains to be associated with a given amino acid. For this purpose we consider the seven (φ, ψ, ω) domains described in the previous section. Two variants of the torsion potential are used, called torsion_short-range and torsion_middle-range. Both are computed from propensities of a (φ, ψ, ω) domain t_i at position i along the sequence, or of pairs of domains (t_i, t_j) at positions i and j, to be associated with an amino acid a_k at position k. The two variants differ in the sequence window considered: k – 1 ≤ i, j ≤ k + 1 for the torsion_short-range potential and k – 8 ≤ i, j ≤ k + 8 for the torsion_middle-range potential. The precise expression of the folding free energy ΔG_torsion computed from these propensities can be found in Gilis and Rooman (1996).

The Cµ–Cµ potentials are distance potentials dominated by non-local, hydrophobic interactions. They are based on propensities of pairs of amino acids (a_i, a_j) at positions i and j along the sequence to be separated by a spatial distance d_ij, calculated between the average side chain centroids Cµ (Kocher et al., 1994). We consider two variants of Cµ–Cµ potentials. The first one, called Cµ–Cµ_long-range, describes purely non-local interactions along the sequence and only takes into account residues separated by at least 15 residues along the sequence, i.e. j ≥ i + 16. The second one, simply called Cµ–Cµ potential, although dominated by non-local interactions, possesses a local interaction component. The non-local component is obtained by considering together the frequencies of all residues separated by seven sequence positions and more, thus with j ≥ i + 8. The local component is obtained by computing separately the frequencies of residues separated by one to six positions along the sequence, for i + 1 < j < i + 8. The resulting folding free energies ΔG_Cµ–Cµ are given in Gilis and Rooman (1997).
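The sequence windows and separation classes stated above can be written out explicitly. The sketch below only encodes the index bookkeeping given in the text; the function names and the string labels are assumptions made for illustration, and the potentials themselves (defined in Gilis and Rooman, 1996, 1997) are not reproduced.

```python
from typing import Optional


def torsion_window(k: int, variant: str) -> range:
    """Positions i, j whose torsion domains are associated with the residue at k.

    short-range:  k - 1 <= i, j <= k + 1
    middle-range: k - 8 <= i, j <= k + 8
    """
    half_width = {"short-range": 1, "middle-range": 8}[variant]
    return range(k - half_width, k + half_width + 1)


def cmu_pair_class(i: int, j: int, long_range_only: bool = False) -> Optional[str]:
    """Classify a residue pair (i < j) for the Cmu-Cmu frequency statistics.

    Returns None when the pair does not contribute to the chosen variant.
    """
    if j <= i:
        raise ValueError("expected i < j")
    separation = j - i
    if long_range_only:
        # Cmu-Cmu long-range variant: only pairs with j >= i + 16.
        return "non-local" if separation >= 16 else None
    if separation >= 8:
        # All pairs with j >= i + 8 are pooled into one non-local class.
        return "non-local"
    if separation >= 2:
        # i + 1 < j < i + 8: one separate class per sequence separation.
        return f"local:{separation}"
    return None  # j = i + 1 lies outside the ranges stated in the text
```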

Evaluation of folding free energy changes

To estimate the stability changes caused by single-site mutations, we compute the folding free energy changes as

    ΔΔG = ΔG(S_m, C_m) – ΔG(S_w, C_w),

where C_m and C_w are the mutant and wild-type conformations and S_m and S_w the mutant and wild-type sequences, respectively. With this convention, ΔΔG is positive when the mutation is destabilizing and negative when it is stabilizing. The conformations C_m and C_w of the mutant and wild-type protein are assumed to be nearly identical. More precisely, the backbone conformations are taken as identical and only the position of the Cµ pseudo-atom, which is amino acid dependent, is different in the mutant and wild-type structures.

The folding free energies of the wild-type and mutant proteins are computed with linear combinations of the torsion and Cµ–Cµ potentials described in the previous section. Previous analyses (Gilis and Rooman, 1996, 1997, 1999) have shown that the combination that gives the best evaluation of the ΔΔG depends on the solvent accessibility A of the mutated residue. These analyses were performed on a test set of 344 experimentally studied mutations introduced in seven different proteins and a synthetic peptide. They revealed that the weight of the local interactions described by torsion potentials increases when approaching the protein surface, whereas the weight of tertiary interactions described by Cµ–Cµ potentials increases when penetrating the protein core. The solvent accessibility A of a mutated residue is here defined as its solvent-accessible surface in the wild-type structure, computed by SurVol (Alard, 1991), over its solvent-accessible surface in an extended tripeptide Gly–X–Gly (Rose et al., 1985).

The mutations can be divided into three subsets. When the mutated residue is at the surface, with a solvent accessibility A ≥ 50%, the optimal folding free energy has been shown to be equal to


The correlation coefficient between measured and computed ΔΔGs is equal to 0.87 on 91% of the 106 mutations; the excluded mutations are suspected to modify the backbone conformation (most of them are prolines) or to modify the stability of the unfolded state (Gilis and Rooman, 1996). To estimate the reliability of the computed ΔΔGs, we calculate the percentage of mutations that have their measured ΔΔG in a certain interval I centered around their computed ΔΔG. We find that 90% of the considered surface mutations have their ΔΔG_measured in the interval I = ΔΔG_(A ≥ 50%) ± 0.42 kcal/mol and 70% of them within I = ΔΔG_(A ≥ 50%) ± 0.27 kcal/mol.
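The two quality measures used here, the linear correlation coefficient and the half-width of the interval I that contains a given percentage of the measured values, can be computed as in the following sketch. The function names are illustrative assumptions, and the convention of taking the smallest half-width reaching the requested coverage is a choice made for this example rather than a documented detail of the original analysis.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_within(measured: Sequence[float], computed: Sequence[float],
                    half_width: float) -> float:
    """Fraction of mutations whose measured ddG lies in computed ± half_width."""
    hits = sum(abs(m - c) <= half_width for m, c in zip(measured, computed))
    return hits / len(measured)


def half_width_for_coverage(measured: Sequence[float], computed: Sequence[float],
                            coverage: float) -> float:
    """Smallest half-width such that at least `coverage` of the mutations fall inside."""
    errors = sorted(abs(m - c) for m, c in zip(measured, computed))
    needed = max(1, math.ceil(coverage * len(errors)))
    return errors[needed - 1]
```

With the surface subset as input, `half_width_for_coverage(..., 0.9)` and `half_width_for_coverage(..., 0.7)` would correspond to the 0.42 and 0.27 kcal/mol figures quoted above, up to the interval convention actually used by the authors.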

When the mutated residue is half buried, half exposed to the solvent, with a solvent accessibility between 20 and 40%, the optimal folding free energy is


In this case, the correlation coefficient between measured and computed ΔΔGs is equal to 0.80 on 94% of the 68 mutations (Gilis and Rooman, 1997). The confidence in the computed ΔΔGs is, however, lower than for surface mutations. We find that 90% of these mutations have their ΔΔG_measured in the interval I = ΔΔG_(20 ≤ A ≤ 40%) ± 1.28 kcal/mol and 70% of them within I = ΔΔG_(20 ≤ A ≤ 40%) ± 0.81 kcal/mol.

Finally, when the mutated residue is totally buried in the protein core, with a solvent accessibility ≤20%, the optimal folding free energy is


The correlation coefficient between measured and computed ΔΔGs is equal to 0.80 on all the 119 mutations. For 90% of these mutations the ΔΔG_measured falls in the interval I = ΔΔG_(A ≤ 20%) ± 2.30 kcal/mol and for 70% of them within I = ΔΔG_(A ≤ 20%) ± 1.46 kcal/mol. When restricting to mutated and mutant amino acids of similar size, with their radii differing by at most 0.2 Å, the correlation coefficient between measured and computed ΔΔGs increases up to 0.87 (Gilis and Rooman, 1997) and the computed ΔΔGs become more reliable: for 90% of these mutations we have I = ΔΔG_(A ≤ 20%) ± 1.87 kcal/mol and for 70% of them I = ΔΔG_(A ≤ 20%) ± 1.19 kcal/mol. Note that not only the errors on the computed ΔΔGs are higher for buried than for surface mutations, but also the ΔΔGs themselves (in absolute value), which explains why the correlation coefficients remain good for buried mutations.

When the mutated residue has a solvent accessibility between 40 and 50%, we do not evaluate its folding free energy. We have indeed seen that in this case, the solvent accessibility of the mutated residue is not a good measure to guide the choice of the optimal potential (Gilis and Rooman, 1997).
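The choice of regime by solvent accessibility, together with the error intervals reported for each regime, can be summarized as a small lookup. This is a sketch: the function and constant names are assumptions, as is the treatment of the boundary values (A = 20% is assigned to the buried class and A = 40% to the intermediate class, since the text lists both limits inclusively). The coefficients of the linear combinations themselves are not reproduced here.

```python
from typing import Optional

# (90% half-width, 70% half-width) of the interval I in kcal/mol, as reported above.
ERROR_HALF_WIDTHS = {
    "surface": (0.42, 0.27),       # A >= 50%
    "intermediate": (1.28, 0.81),  # 20% <= A <= 40%
    "buried": (2.30, 1.46),        # A <= 20%
}


def relative_accessibility(asa_in_protein: float, asa_in_gly_x_gly: float) -> float:
    """Solvent accessibility A in percent: surface in the wild-type structure
    over surface in an extended Gly-X-Gly tripeptide."""
    return 100.0 * asa_in_protein / asa_in_gly_x_gly


def accessibility_regime(a_percent: float) -> Optional[str]:
    """Return the regime used to pick the combination of potentials.

    Returns None for 40% < A < 50%, where no ddG is evaluated.
    """
    if a_percent >= 50.0:
        return "surface"
    if a_percent > 40.0:
        return None
    if a_percent > 20.0:
        return "intermediate"
    return "buried"
```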

Discussion

Earlier tests of our procedure for predicting stability changes upon point mutations (Gilis and Rooman, 1996, 1997, 1999) confirm the good performance of the PoPMuSiC design program. We obtained correlation coefficients between measured and computed ΔΔGs between 0.80 and 0.87 on 279 out of a set of 296 mutations, excluding mutations of residues with solvent accessibility in the 40–50% range. These mutants were introduced in various environments of seven different proteins and a synthetic peptide, thereby suggesting the universality of our approach. Let us emphasize again that this property is unique among mutant stability prediction programs.

Yet in spite of its predictive power, PoPMuSiC admittedly suffers from limitations. First, we found no combination of potentials allowing evaluation of the ΔΔGs of mutations of residues with a solvent accessibility between 40 and 50%. We only know there is a probability of 0.5 for ΔΔG being correctly evaluated with the surface residue potential. Future work will therefore be devoted to the search for parameters other than the solvent accessibility to select the optimal combination of potentials. Second, if a mutation causes drastic structural rearrangements in the protein, our evaluation of ΔΔG breaks down as it is based on the hypothesis that the wild-type and mutant structures are very similar. Since prolines can particularly be suspected to modify the backbone structure, we prefer to exclude mutations involving prolines or at least handle the corresponding predictions with care. Note, however, that the point mutations affecting the protein structure constitute a small minority of all possible mutations and that overlooking them only entails a small limitation.

Furthermore, while the correlation coefficients remain high whatever the environment of the mutated residue, the errors in the computed ΔΔGs are more variable. On the one hand, the stability change of mutations of solvent-accessible residues is evaluated with fairly good precision. The errors in this case are of the order of ±0.3–0.4 kcal/mol and thus roughly comparable to the errors in the experimentally measured ΔΔGs. At the other extreme, the errors on mutations of buried residues are of the order of ±1.2–1.9 kcal/mol and are even higher for mutations for which the sizes of the mutant and mutated amino acids differ significantly, probably because these lead to more important structural rearrangements. However, it should be noted that our program is extremely fast and capable of estimating the ΔΔGs of thousands of mutations in a few seconds. It therefore cannot be expected to compete with more time-consuming approaches based on an all-atom description and detailed force fields. The purpose of our program is not to predict exactly the stability change of a given mutation, but rather to propose a limited set of possible mutations likely to have the required (de)stabilization properties.

Moreover, when several structures of a given protein or of homologous proteins are known, an improved procedure can be devised consisting of combining the results obtained by PoPMuSiC and selecting the common predictions in all the structures. This procedure allows one to reduce the errors due to the roughness of the computations and to structural imprecisions entailing, for example, modifications in the solvent accessibility of the mutated residues. The confidence in the common predictions is difficult to estimate, but may certainly be expected to be much higher than that of the separate predictions.
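The idea of retaining only the predictions common to several structures can be sketched as a set intersection. The example below assumes that the structures share a residue numbering (for homologous proteins this presupposes an alignment) and that a mutation is kept when it is predicted to be stabilizing in every structure; the function name, the dictionary layout and the averaging of the retained values are illustrative assumptions.

```python
from typing import Dict, List, Tuple

MutationKey = Tuple[int, str]  # (sequence position, mutant amino acid)


def consensus_predictions(
    per_structure: List[Dict[MutationKey, float]],
    threshold: float = 0.0,
) -> Dict[MutationKey, float]:
    """Keep mutations with ddG < threshold in all structures; report the mean ddG."""
    if not per_structure:
        return {}
    # Only mutations evaluated in every structure can be compared.
    common = set(per_structure[0])
    for predictions in per_structure[1:]:
        common &= set(predictions)
    consensus = {}
    for key in common:
        values = [predictions[key] for predictions in per_structure]
        if all(v < threshold for v in values):
            consensus[key] = sum(values) / len(values)
    return consensus
```

For neutral or destabilizing candidates the same intersection applies with a different acceptance test on `values`.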

PoPMuSiC therefore appears to be an efficient tool to restrict the number of mutations that must be tested experimentally or with more detailed theoretical methods. It can, for instance, be used in conjunction with physico-chemical requirements, such as the modification of solubility or functional properties. In this case, PoPMuSiC can be used to propose a set of mutations that do not affect the protein stability but modify the required physico-chemical properties. In another type of application, it can be used first to identify the protein regions or secondary structures where the number of stabilizing or destabilizing mutations is highest and which are thus most likely to be stabilized or destabilized. Within these regions, PoPMuSiC can then give more detailed information about the specific positions and mutant amino acids that have a (de)stabilizing effect. This is the approach followed in the second part of this paper, where we apply PoPMuSiC to predict mutations that are likely to stabilize the cellular form of mouse, hamster and human prion proteins.


    Application of PoPMuSiC to prion proteins
 
Description of the prion proteins

The prion proteins (PrP) occur in two isoforms, called cellular (PrPC) and scrapie (PrPSc). Under circumstances that are not well understood, the PrPC form is converted into PrPSc. This conversion is at the basis of a group of human and animal neurodegenerative diseases (Prusiner, 1991; Cohen and Prusiner, 1998; Prusiner et al., 1998). It has been demonstrated to involve a major conformational change, with helices transformed into β-structures. Indeed, spectroscopic measurements have shown that the PrPC form contains essentially α-helices, whereas PrPSc has a high β-sheet content (Pan et al., 1993; Safar et al., 1993; Pergami et al., 1996).

There is no experimental evidence about the structure of the scrapie PrPSc form apart from its high β-sheet content. In contrast, the structure of the cellular PrPC form has been determined by nuclear magnetic resonance spectroscopy, for mouse, hamster and human prions (Riek et al., 1996; James et al., 1997; Zahn et al., 2000); their PDB entries are 1ag2, 1b10 and 1qlx, respectively (Bernstein et al., 1977). The sequence identity between them is of the order of 90% (see Fig. 2) and their structures superimpose with a backbone root mean square deviation between 1.7 and 2.9 Å. They are thus fairly similar. Only the sequence regions [124–226] have been solved, the N-terminal parts [1–123] having no well-defined structure. The [124–226] structures consist of three helices, denoted H1, H2 and H3, whose limits are [144–153], [172–192] and [200–222] in mouse PrPC. H1 is flanked by two short β-strands, which form an anti-parallel β-sheet.



 
Fig. 2. Histograms representing the sum of the ΔΔGs for the stabilizing mutations in PrPC as a function of the sequence position. Top histogram, mouse PrPC; middle histogram, hamster PrPC; bottom histogram, human PrPC. The amino acid sequences are indicated above the histograms. The helices, defined according to DSSP (Kabsch and Sander, 1983), are colored red and the β-strands blue. The names of the secondary structures are indicated in the top row. The underlined residues interact with protein X. Symbols x indicate that the solvent accessibility of the mutated residue is in the 40–50% range and hence that the ΔΔG could not be reliably estimated.

 
The conformational change that transforms PrPC into PrPSc is not as simple as it appears at first sight. It seems to be mediated by another protein, referred to as protein X (Telling et al., 1995). Not much is known about this protein, except that it binds to some well-defined residues at the surface of the cellular form PrPC. These residues have experimentally been shown to involve three glutamines and a valine in mouse PrP (Kaneko et al., 1997): Q168 in the turn preceding the H2 helix, Q172 corresponding to the first residue in helix H2 and V215 and Q219 in helix H3 (Figure 2). The side chains of V215 and Q219 protrude from the same side of helix H3 and form a discontinuous epitope with the spatially adjacent residues Q168 and Q172 of loop H1–H2.

Several inherited forms of prion-based diseases are characterized by specific point mutations in PrP. Some of these mutations have been shown to destabilize the PrPC form but others are neutral, thereby suggesting that destabilization is not a general mechanism underlying the formation of PrPSc (Liemann and Glockshuber, 1999); this conclusion must be considered with care, however, as nothing is known about the effect of these mutations on the stability of PrPSc. Another result indicating the complexity of the PrPC to PrPSc conversion is that glycosylation may modulate the efficiency of the conversion (Lehmann and Harris, 1997). In particular, the blockade of glycosylation appears to promote acquisition of scrapie-like properties.

Recently, Morrissey and Shakhnovich (1999) proposed a model of prion infectivity based on the properties of helix H1. They stressed that H1 is highly charged and entirely composed of hydrophilic residues; it is actually the most soluble helix in all PDB structures. It has moreover a low capacity to form hydrophobic contacts with the protein core. They performed CHARMM energy calculations showing that β-sheet-like aggregates of more than two H1 fragments are energetically favorable. Their model is based on the particular properties of the H1 helix, but is not incompatible with the possible binding of PrPC to protein X.

There therefore appear to be several possible scenarios to enhance or block the PrPC to PrPSc conversion. The first is to modify the affinity of PrP for protein X and requires substituting the residues near the PrP/protein X interface (Kaneko et al., 1997). A second one consists of decreasing the solubility of helix H1 so as to enhance its hydrophobic contacts with the protein core. A last one is to modify the relative stability of the two isoforms PrPC and PrPSc, without touching the amino acids binding to protein X. Note that stabilizing the cellular form will modulate the conversion, whether this conversion is under kinetic or under thermodynamic control. Indeed, if the conversion is under thermodynamic control, increasing the difference in stability between the cellular and the scrapie forms will have an influence on the equilibrium constant. If the kinetic control prevails, stabilizing the cellular form will heighten the kinetic barrier, assuming the transition state energy to remain unchanged (Cohen and Prusiner, 1998).
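The size of the effect discussed here can be illustrated with the Boltzmann factor. The sketch below is a back-of-the-envelope estimate under strong assumptions that are not established in the text: the mutation lowers the free energy of PrPC by |ΔΔG| and leaves both the PrPSc form (thermodynamic control) and the transition state (kinetic control) untouched. Under these assumptions the equilibrium constant or the conversion rate is multiplied by exp(ΔΔG/RT). The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def conversion_factor(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Multiplicative change of the PrPC -> PrPSc equilibrium constant (or rate).

    Assumes that only the free energy of PrPC changes, by ddG (negative when
    PrPC is stabilized), while PrPSc and the transition state are unaffected.
    A value below 1 means that the conversion is disfavored.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


if __name__ == "__main__":
    for ddg in (-0.5, -1.0, -2.0):
        print(f"ddG = {ddg:+.1f} kcal/mol -> factor {conversion_factor(ddg):.3f}")
```

Since nothing is known about the effect of a given mutation on PrPSc, these factors should be read as orders of magnitude only.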

We shall focus in this paper on the last two scenarios. With the help of the PoPMuSiC algorithm, we shall in a first stage identify mutations that are likely to modify the intrinsic stability of PrPC in mouse, hamster and human PrPs, with a view to modifying their propensities of interconversion. In a second stage, we shall identify mutations that decrease the hydrophilicity of H1 and enhance or at least maintain the stability of the PrPC form.

Application of PoPMuSiC to mouse, hamster and human PrPC

First we use the PoPMuSiC design program to identify the single-site mutations that increase most the intrinsic stability of the cellular form of mouse, hamster and human PrPs, with the aim of proposing mutations that decrease the conversion of the cellular form into the scrapie form. In particular, we perform in silico, in the three PrPs, the 19 possible mutations at each sequence position and compute the associated changes in folding free energy ΔΔG. We then calculate, for each position separately, the sum of the ΔΔGs of the subset of stabilizing mutations. These values give an estimation of whether it is possible to stabilize the structure by mutating certain residues. They are plotted in Figure 2.
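The per-position quantity defined above, and its average over a secondary structure element, can be computed as in this sketch. The input layout (one tuple per evaluated mutation) and the function names are assumptions; positions whose solvent accessibility falls in the 40–50% range are simply expected to be absent from the input.

```python
from typing import Dict, Iterable, Tuple


def stabilizing_sums(results: Iterable[Tuple[int, str, float]]) -> Dict[int, float]:
    """Sum, for each position, the ddG values of its stabilizing (ddG < 0) mutations.

    `results` holds (position, mutant amino acid, ddG) for every evaluated mutation.
    Positions without any stabilizing mutation get a sum of 0.
    """
    sums: Dict[int, float] = {}
    for position, _mutant, ddg in results:
        sums.setdefault(position, 0.0)
        if ddg < 0.0:
            sums[position] += ddg
    return sums


def mean_per_residue(sums: Dict[int, float], first: int, last: int) -> float:
    """Average of the per-position sums over the evaluated residues in [first, last]."""
    values = [s for position, s in sums.items() if first <= position <= last]
    if not values:
        raise ValueError("no evaluated position in the requested segment")
    return sum(values) / len(values)
```

With the helix limits given earlier for mouse PrPC, `mean_per_residue(sums, 144, 153)` would give the per-residue value for H1, and similarly for H2 [172–192] and H3 [200–222].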

Let us focus on the helices H1, H2 and H3. For the three PrPs considered, the amount of possible stabilization is highest in H2 and lowest in H1. Indeed, the sum of negative ΔΔGs is equal to –0.3 kcal/mol per residue on the average for H1, to –1.2 kcal/mol for H2 and to –0.7 kcal/mol for H3 (Table I). This indicates that H2 is the least stable helix and thus the most likely to be stabilized. Strikingly, this helix has exactly the same sequence in mouse, hamster and human PrPs.


 
Table I. Mean value of the ΔΔGs of the stabilizing mutations per residue, for the helices H1, H2 and H3 in mouse, hamster and human PrPs
 
Inspection of Figure 2 shows that among all residues of H2, T183 has by far the most negative sum of stabilizing ΔΔGs in all three PrPs: this sum is equal to –7.8 kcal/mol on average. T183 is therefore the best candidate, out of all H2 residues, to be mutated with a view to stabilizing the PrPC structure. Seven other H2 residues can also be favorably mutated in all three PrPs. These are Q172, N173, Q186, H187, T188, V189 and T192. The specific mutations predicted to yield the lowest ΔΔGs in the three PrPs are listed in Table II. The best candidate is T183->F.


 
Table II. The mutations in helix H2 that stabilize most the cellular form PrPC in mouse, hamster and human PrP
 
To understand what happens in helix H2, we applied the programs Prelude and Fugue (Rooman et al., 1991, 1992), which use database-derived potentials to predict the local structure of proteins or protein segments in the absence of tertiary interactions; these structures are described in terms of seven (φ, ψ, ω) backbone torsion angle domains. The result is unambiguous: there is a very strong prediction for extended structure in the sequence region spanning residues 181–194, that is, in the C-terminal half of the H2 helix, in mouse, hamster and human PrPs. Thus, in the absence of interactions with the rest of the sequence, this fragment should preferentially adopt an extended conformation. In contrast, the N-terminal part of helix H3 is predicted to present a strong intrinsic preference for a helical conformation.

We also performed predictions with Prelude and Fugue on the three T183->F mutant PrPs. This mutation does not abolish the extended structure predictions in the H2 helix, but drastically diminishes their strength.

It therefore seems that the H2 helix has an intrinsic propensity to form extended β-structures in the absence of the rest of the chain. Moreover, the known interaction sites of the prion protein with the protein that mediates the conformational change are situated in helix H3 and in the H1–H2 loop (Figure 2) and only involve the first residue of H2. Hence, changes in H2 probably do not affect this interaction. Helix H2, and in particular residue T183, therefore appears to be the ideal candidate to mutate with a view to stabilizing the cellular form PrPC and hence decreasing the PrPC to PrPSc conversion.

This conclusion is, however, somewhat hasty. Indeed, residue T183 is necessary for allowing glycosylation of residue N181 and mutating it would block this process. This is not advisable because of the role of glycosylation in folding and maturation of nascent chains (Helenius, 1994). Moreover, in the case of PrP, the blockade of glycosylation has been shown to favor the PrPSc form (Lehmann and Harris, 1997). In summary, the mutation of T183 is predicted to stabilize PrPC and to decrease the β-structure propensity of H2, but provokes the blockade of the glycosylation site, thereby promoting the PrPC to PrPSc conversion. These opposite tendencies, as well as the possible role in maturation, make the prediction of the actual effect of this mutation difficult. It is striking, however, that the position in H2 that is the least optimized through evolution with regard to stability is necessary for glycosylation.

The other mutations in H2 predicted to stabilize PrPC are H187->W, T192->A and Q186->A. When performing predictions with Prelude and Fugue on these three mutations, we see that only T192->A and Q186->A diminish the strength of the extended structure prediction. Although we cannot be sure of the effect of H187->W on PrPSc, as we do not know its structure, and as Prelude and Fugue only consider local interactions along the chain, we prefer to reject a mutation predicted to increase the intrinsic preference for β-structures, dominant in PrPSc. We therefore only retain the mutations T192->A and Q186->A. Note, however, that these mutations are predicted to be significantly less stabilizing than some of the mutations involving T183.

The results plotted in Figure 2 also reveal sequence regions outside the helices that are likely to be stabilized upon mutation. In particular, Q160, which is situated in the β2-strand or just before it depending on the structure, has a sum of stabilizing ΔΔGs between –6.9 and –11.6 kcal/mol in mouse, hamster and human PrPs. This residue does not make contacts with protein X and is rather buried in the protein core. The most favorable mutation in the three PrPs is Q160->C, as measured by ΔΔGs between –1.7 and –1.9 kcal/mol. The next most favorable mutations are listed in Table III. They involve residues situated in or near β2 and in the β1-H1 loop.


Table III. The mutations in non-helical regions that most stabilize the cellular form PrPC in mouse, hamster and human PrP
 
The application of Prelude and Fugue to the Q160->C mutants of mouse, hamster and human prions shows that this mutation strengthens the prediction of the β2-strand. Hence, although this mutation seems to stabilize the cellular form PrPC, it might very well also stabilize the β-rich scrapie form PrPSc. This mutation is therefore not a reliable candidate for blocking the PrPC to PrPSc conversion. Moreover, as it introduces a Cys residue, it could be suspected to interfere with the formation of the disulfide bridge that links the only two Cys residues of the wild-type sequence.

In a second stage, we use PoPMuSiC to propose mutations that decrease the hydrophilicity of helix H1 without destabilizing it too much. Indeed, according to the scenario of Morrissey and Shakhnovich (1999), H1 could be involved in hydrophilic β-sheet-like aggregates, owing to its high hydrophilicity and particular properties. As shown above, H1 is the most stable of the three helices and presents very few stabilizing mutations. Among the hydrophilic residues of H1 (for the definition of hydrophobic/hydrophilic residues, see Table IV), only E146 and R151 can favorably mutate and even then not in all three PrP species. Moreover, we restrict ourselves to mutations of totally or partially buried residues, as our purpose is to increase the hydrophobic core; we therefore drop R151 and all other residues with solvent accessibility in the 50–100% range. Table IV contains the least destabilizing mutations of the subset so defined. The two best candidate mutations are E146->C and E146->L, which present an increase in hydrophobicity of 7.7 and 7.6, respectively, according to the scale of Woese et al. (1966). The latter mutation is, however, predicted to have better stability properties than the former: it is computed as stabilizing in mouse and human PrPC and destabilizing in hamster PrPC.
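The selection described in this paragraph amounts to a filter followed by a sort, which the sketch below illustrates. The three polar requirement values are those commonly quoted for the Woese et al. scale and reproduce the differences of 7.7 and 7.6 given in the text, but they are an assumption here and should be checked against the original table. The candidate records, their field names and their numerical values are hypothetical.

```python
# Polar requirement values ASSUMED for the Woese et al. (1966) scale;
# a lower value corresponds to a more hydrophobic residue.
POLAR_REQUIREMENT = {"E": 12.5, "L": 4.9, "C": 4.8}


def hydrophobicity_gain(wild_type, mutant, scale=POLAR_REQUIREMENT):
    """Decrease in polar requirement when wild_type is replaced by mutant."""
    return round(scale[wild_type] - scale[mutant], 1)


def select_candidates(mutations, max_accessibility=50.0):
    """Keep mutations of buried or partially buried residues that increase
    hydrophobicity, ordered from least to most destabilizing (ddG ascending).
    """
    kept = [
        m for m in mutations
        if m["accessibility"] < max_accessibility
        and hydrophobicity_gain(m["wt"], m["mut"]) > 0
    ]
    return sorted(kept, key=lambda m: m["ddg"])


# Invented candidates, for illustration only (accessibility in %, ddG in kcal/mol).
toy_candidates = [
    {"label": "a", "wt": "E", "mut": "C", "accessibility": 30.0, "ddg": 0.6},
    {"label": "b", "wt": "E", "mut": "L", "accessibility": 30.0, "ddg": 0.2},
    {"label": "c", "wt": "E", "mut": "L", "accessibility": 80.0, "ddg": -0.5},
    {"label": "d", "wt": "L", "mut": "E", "accessibility": 10.0, "ddg": -0.1},
]
selected = select_candidates(toy_candidates)
print([m["label"] for m in selected])
print(hydrophobicity_gain("E", "C"), hydrophobicity_gain("E", "L"))
```

Candidate `c` is dropped because its residue is too exposed and candidate `d` because it would make the helix more hydrophilic; the remaining candidates are ranked by predicted stability change only.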


Table IV. The mutations of hydrophilic into hydrophobic residues in helix H1 that stabilize or do not destabilize too much the cellular form PrPC in mouse, hamster and human PrP
 
Prelude and Fugue reveal that the E146->L mutant enhances the intrinsic stability of helix H1 in the case of mouse and hamster PrPC and does not modify it in the case of human PrPC. In contrast, the E146->C mutant systematically decreases the intrinsic stability of H1. These results support the above prediction that the mutation E146->L increases the stability of the cellular form PrPC. Although the structure of the scrapie form is not known, they could also be taken to suggest that this mutation does not increase the stability of the β-rich scrapie form PrPSc. The same argument does not hold for E146->C, because it seems to favor β-structures, which are more numerous in the scrapie form. This mutation moreover involves a Cys residue which could be problematic for disulfide bond formation. We therefore do not retain it. In summary, if it were to turn out that the PrPC to PrPSc conversion follows a scenario of the type proposed by Morrissey and Shakhnovich (1999), the mutation E146->L would be a good candidate to limit this conversion.


    Discussion
 
The second part of this paper has the double aim of showing how the PoPMuSiC program can be applied to real systems to select the best mutational candidates and of gaining information on the residues that are responsible for the conformational change turning harmless prions into disease-causing proteins. We actually used PoPMuSiC to propose mutations likely to render the PrPC to PrPSc conversion more difficult. All possible mutations in mouse, human and hamster PrPs were introduced in silico and their ΔΔGs estimated, which led us to establish a profile representing for each residue the sum of the ΔΔGs of the stabilizing mutations (Figure 2). This profile allows one to detect rapidly the residues or sequence regions whose structure is less stable and could be stabilized upon mutation.

This profile indicated that residues Q160 and T183 present the most potentially stabilizing mutations in the three considered PrPCs, mouse, hamster and human. The former residue is located in or just before the β2-strand and the latter in the middle of the H2 helix. The mutations of Q160 predicted to stabilize the PrPC form are, however, also predicted to stabilize the β2-strand. They may therefore be suspected to enhance the stability of the putative scrapie structure PrPSc and are therefore dropped. The mutations of T183 are predicted to favor PrPC and to decrease the intrinsic preference for β-structures, but imply the blockade of the glycosylation of N181, thereby favoring the scrapie form (Lehmann and Harris, 1997). They are thus on the one hand not advisable and on the other hand particularly interesting and deserve further study.

Excluding the above two mutations, the next best candidates are T192->A and Q186->A. They are not only predicted to stabilize the PrPC structure, but also to decrease drastically the intrinsic propensity of the H2 helix to adopt an extended conformation in the absence of interactions with the rest of the chain. These mutations therefore appear to be good candidates to shift the equilibrium state towards PrPC or to heighten the kinetic barrier, thereby decreasing the rate of PrPC to PrPSc conversion without touching its interaction with the mediator protein X.

Following the proposition (Morrissey and Shakhnovich, 1999) that the highly hydrophilic helix H1 could be at the basis of the conversion of the cellular into the scrapie form by inducing β-sheet-like aggregates, we used PoPMuSiC to suggest appropriate mutations in H1. As this helix is fairly stable by itself, only slightly stabilizing or destabilizing mutations could be found. Among these, the best compromise between the stability requirements and the hydrophobicity increase of H1 is given by the mutation E146->L.

As shown in the first part of this paper, the errors in some of the ΔΔGs computed with the PoPMuSiC program are non-negligible, especially those of buried residues. We would emphasize, however, that since we compare results on three PrP species with different sequences and structures and focus on the mutations that present the same tendencies in the three species, the confidence that may be attached to the predictions increases.

As an additional test of our procedure, we predicted with PoPMuSiC the stability modifications of point mutations in PrP corresponding to inherited forms of prion-based diseases and compared the predictions with the experimentally measured values obtained by Liemann and Glockshuber (1999). Note that these mutations are either destabilizing or neutral and are therefore not picked up by our procedure. Furthermore, as they were introduced in murine PrP, whose structure has not been determined, we could not use it. We used instead the mouse, hamster and human PrP structures, computed with PoPMuSiC the folding free energy change of each mutation in the three separate structures and calculated the average. With this procedure, we found a good correlation of 0.80 between predicted and measured folding free energy changes for all nine mutations but one. The mutation whose folding free energy change is not well predicted is T183->A. However, the experimentally measured value may not be accurate in this case, as this mutant does not seem to fold according to the two-state model (Liemann and Glockshuber, 1999). This test therefore confirms the reliability of our predictions. However, the relation between the destabilization of PrPC and PrPSc formation is not clear, as some inherited PrP mutations are neutral with respect to stability; this could be related to the fact that the PrPC to PrPSc conversion is mediated by another protein. We might nevertheless argue that the stabilization of PrPC, which is the issue considered in this paper, should render the conversion more difficult, whatever the mediation mechanism may be.
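The averaging and comparison steps of this test can be sketched as follows. The text does not state which correlation coefficient was used; the sketch assumes a Pearson linear correlation. The mutation labels (`m1` to `m3`) and all numerical values are invented and do not correspond to the measurements of Liemann and Glockshuber.

```python
import math


def mean_over_structures(ddg_by_structure):
    """Average the predicted ddG of each mutation over all structures.

    ddg_by_structure maps a structure name to {mutation_label: ddG}.
    Every structure is expected to contain the same mutation labels.
    """
    tables = list(ddg_by_structure.values())
    return {
        mutation: sum(table[mutation] for table in tables) / len(tables)
        for mutation in tables[0]
    }


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Invented predictions and measurements, for illustration only.
toy_predicted = {
    "mouse":   {"m1": 1.0, "m2": 2.0, "m3": 3.0},
    "hamster": {"m1": 2.0, "m2": 3.0, "m3": 4.0},
    "human":   {"m1": 3.0, "m2": 4.0, "m3": 5.0},
}
toy_measured = {"m1": 1.0, "m2": 2.0, "m3": 3.0}

averaged = mean_over_structures(toy_predicted)
labels = sorted(toy_measured)
r = pearson([averaged[m] for m in labels], [toy_measured[m] for m in labels])
print(averaged, r)
```

Averaging over the three structures before correlating means that a mutation counts once in the comparison, whatever the number of structures in which it was evaluated.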

We would like to end by calling for experimenters willing to perform the PrP mutations proposed in this paper, as this would, of course, constitute the only conclusive test of our method, hypotheses and conclusions. In particular, the scenario of H1 inducing the formation of β-aggregates could be tested by performing the E146->L mutation in H1 or the other mutations listed in Table IV. The increase in the stability of the cellular form PrPC by the mutations T183->F, T192->A and Q186->A or those given in Table II would also constitute an interesting test, which would at the same time verify whether the stability increase in PrPC is sufficient to block the PrPC to PrPSc conversion, thereby specifying indirectly the role of protein X in the conformational change and of the glycosylation of residue N181.


    Notes
 
1 To whom correspondence should be addressed


    Acknowledgments
 
We thank Martine Prévost for interesting discussions on the prion protein. D.G. benefits from a 'FIRST-Université' grant of the Walloon Region. M.R. is Senior Research Associate at the Belgian National Fund for Scientific Research (FNRS).


    References
 
Alard,P. (1991) PhD Thesis, Université Libre de Bruxelles.

Basch,P.A., Singh,U.C., Langridge,R. and Kollman,P.A. (1987) Science, 236, 564–568.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Bowie,J.U., Luthy,R. and Eisenberg,D. (1991) Science, 253, 164–170.

Cohen,F.E. and Prusiner,S.B. (1998) Annu. Rev. Biochem., 67, 793–819.

Damborsky,J. (1998) Protein Eng., 11, 21–30.

Eriksson,A.E., Baase,W.A., Zhang,X.-J., Heinz,D.W., Blaber,M., Baldwin,E.P. and Matthews,B.W. (1992) Science, 255, 178–183.

Fersht,A.R. and Serrano,L. (1993) Curr. Opin. Struct. Biol., 3, 75–83.

Gilis,D. and Rooman,M. (1996) J. Mol. Biol., 257, 1112–1126.

Gilis,D. and Rooman,M. (1997) J. Mol. Biol., 272, 276–290.

Gilis,D. and Rooman,M. (1999) Theor. Chim. Acta, 101, 46–50.

Helenius,A. (1994) Mol. Biol. Cell, 5, 253–265.

James,T.L. et al. (1997) Proc. Natl Acad. Sci. USA, 94, 10086–10091.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Kaneko,K., Zulianello,L., Scott,M., Cooper,C.M., Wallace,A.C., James,T.L., Cohen,F.E. and Prusiner,S.B. (1997) Proc. Natl Acad. Sci. USA, 94, 10069–10074.

Koehl,P. and Delarue,M. (1994) Proteins: Struct. Funct. Genet., 20, 264–278.

Kocher,J.-P.A., Rooman,M.J. and Wodak,S.J. (1994) J. Mol. Biol., 235, 1598–1613.

Lee,C. (1994) J. Mol. Biol., 236, 918–939.

Lee,C. (1995) Fold. Des., 1, 1–12.

Lee,C. and Levitt,M. (1991) Nature, 352, 448–451.

Lehmann,S. and Harris,D.A. (1997) J. Biol. Chem., 272, 21479–21487.

Liemann,S. and Glockshuber,R. (1999) Biochemistry, 38, 3258–3267.

Miyazawa,S. and Jernigan,R.L. (1994) Protein Eng., 7, 1209–1220.

Morrissey,M.P. and Shakhnovich,E.I. (1999) Proc. Natl Acad. Sci. USA, 96, 11293–11298.

Muñoz,V. and Serrano,L. (1994) Proteins: Struct. Funct. Genet., 20, 301–311.

Ota,M., Kanaya,S. and Nishikawa,K. (1995) J. Mol. Biol., 248, 733–738.

Pan,K.-M. et al. (1993) Proc. Natl Acad. Sci. USA, 90, 10962–10966.

Pergami,P., Jaffe,H. and Safar,J. (1996) Anal. Biochem., 236, 63–73.

Prusiner,S.B. (1991) Science, 252, 1515–1522.

Prusiner,S.B., Scott,M.R., De Armond,S.J. and Cohen,F.E. (1998) Cell, 93, 337–348.

Rashin,A.A., Rashin,B.H., Rashin,A. and Abagyan,R. (1997) Protein Sci., 6, 2143–2158.

Riek,R., Hornemann,S., Wider,G., Billeter,M., Glockshuber,R. and Wuthrich,K. (1996) Nature, 382, 180–182.

Rooman,M.J., Kocher,J.-P.A. and Wodak,S.J. (1991) J. Mol. Biol., 221, 961–979.

Rooman,M.J., Kocher,J.-P.A. and Wodak,S.J. (1992) Biochemistry, 31, 10226–10238.

Rose,G.D., Geselowitz,A.R., Lesser,G.J., Lee,R.H. and Zehfus,M.H. (1985) Science, 229, 834–838.

Safar,J., Roller,P.P., Gajdusek,D.C. and Gibbs,C.J.J. (1993) Protein Sci., 2, 2206–2216.

Serrano,L., Kellis,J.T.,Jr, Cann,P., Matouschek,A. and Fersht,A.R. (1992) J. Mol. Biol., 224, 783–804.

Shortle,D., Stites,W.E. and Meeker,A.K. (1990) Biochemistry, 29, 8033–8041.

Sippl,M.J. (1995) Curr. Opin. Struct. Biol., 5, 229–235.

Steipe,B., Schiller,B., Plückthun,A. and Steinbacher,S. (1994) J. Mol. Biol., 240, 188–192.

Telling,G.C., Scott,M., Mastrianni,J., Gabizon,R., Torchia,M., Cohen,F.E., DeArmond,S.J. and Prusiner,S.B. (1995) Cell, 83, 79–90.

Tidor,B. and Karplus,M. (1991) Biochemistry, 30, 3217–3228.

Topham,C.M., Srinivasan,N. and Blundell,T.M. (1997) Protein Eng., 10, 7–21.

van Gunsteren,W.F. and Mark,A.E. (1992) J. Mol. Biol., 227, 389–395.

Wang,Y., Lai,L., Han,Y. and Tang,Y. (1996) Protein Eng., 9, 479–484.

Wintjens,R.T., Rooman,M.J. and Wodak,S.J. (1996) J. Mol. Biol., 255, 235–253.

Woese,C.R., Dugre,D.H., Dugre,S.A., Kondo,M. and Saxinger,W.C. (1966) Cold Spring Harbor Symp. Quant. Biol., 31, 723–736.

Zahn,R., Liu,A., Luhrs,T., Calzolai,L., Von Schroetter,C., Garcia,F.L., Riek,R., Wider,G., Billeter,M. and Wuthrich,K. (2000) Proc. Natl Acad. Sci. USA, 97, 145–150.

Received May 18, 2000; revised November 1, 2000; accepted November 9, 2000.