Understanding the relationship between the primary structure of proteins and their amyloidogenic propensity: clues from inclusion body formation

Susan Idicula-Thomas and Petety V. Balaji¹

School of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India

¹ To whom correspondence should be addressed. E-mail: balaji{at}iitb.ac.in


    Abstract
 
Amyloid formation is dependent to a considerable extent on the amino acid sequence of the protein. The present study delineates certain sequence-dependent features that are correlated with amyloidogenic propensity. The analyses indicate that amyloid formation is favored by lower thermostability and increased half-life of the protein. There seems to be a certain degree of bias in the composition of order-promoting amino acids in the case of amyloidogenic proteins. Based on these parameters, a prediction function for the amyloidogenic propensity of proteins has been created. The prediction function has been found to rationalize the reported effect of certain mutations on amyloid formation. It seems that a higher sheet propensity of residues that constitute the first seven residues of a helical structure in a protein might increase the propensity for a helix to sheet transition in that region under denaturing conditions.

Keywords: aliphatic index/discordant stretch/instability index/sheet propensity/thermostability


    Introduction
 
Amyloid and prion diseases result from the transformation of a protein from a soluble, native form to an ordered, insoluble fibril; this is accompanied by loss of biological function and/or gain of toxicity (Sipe, 1992; Dobson, 1999; Kallberg et al., 2001; Zerovnik, 2002). A number of proteins are now known to form amyloid fibrils under partially denaturing conditions (Dobson, 1999). Many of the amyloidogenic proteins are also known to be intrinsically unstructured; the high conformational flexibility, resulting from the absence of significant secondary or tertiary structures, enables these proteins to form fibrils more readily than the tightly packed globular proteins (Uversky, 2003).

Several factors have been found to contribute to amyloid formation. These include high protein concentration, proteolysis, mutations, local change in pH at membranes, oxidative or heat stress (Zerovnik, 2002; DuBay et al., 2004) and the presence of molecules such as serum amyloid P component, metal ions, apolipoprotein E and proteoglycans (Fink, 1998). Nevertheless, the primary structure of the protein is critical since the propensity of the protein to form fibrils is ultimately dictated by its amino acid sequence (Fink, 1998; Zanuy and Nussinov, 2003). Some of the sequence-related parameters that have been implicated in deciding the rate of aggregation of a polypeptide are hydrophobicity, hydrophobic–hydrophilic patterning, charge (Chiti et al., 2003; DuBay et al., 2004), high β-sheet propensity (Kallijarvi et al., 2001; Tjernberg et al., 2002) and low β-turn propensity (Kallijarvi et al., 2001). However, the contribution of these factors to fibril formation varies among proteins. For example, sequence-dependent factors such as secondary structure propensity, peptide length, pI and hydrophobicity were found not to affect the amyloidogenicity of β2-microglobulin peptides; only the high content of aromatic side chains was found to be a major determinant (Jones et al., 2003). This is a possible reason for the absence of any prominent sequence or structural characteristics among the known human amyloidogenic proteins (Sipe, 1992; Horwich, 2002).

Proteins that form inclusion bodies on overexpression in Escherichia coli and those that form amyloids share certain similarities. Amyloid and inclusion body formation, being different manifestations of the aggregation phenomenon, are influenced by factors such as increased protein concentration, nature of folding intermediates and in vivo half-life of the protein (Fink, 1998; Horwich, 2002). Amyloid fibrils have a lower helical content and higher sheet content than the corresponding native protein (Kallberg et al., 2001). Similar secondary structural changes have also been observed in the inclusion bodies of some proteins (Przybycien et al., 1994). Another feature shared by these proteins is their sensitivity to point mutations (Wetzel, 1994; Villegas et al., 2000; Horwich, 2002). Certain mutations which alter the aggregation propensity of the proteins can be explained based on hydrophobicity, solvent accessibility and charge of the amino acid residues involved in the mutation (Dale et al., 1994; Malissard and Berger, 2001; Chiti et al., 2003; Monti et al., 2004).

The present study was aimed at delineating the relationship of the primary structure of a protein to its amyloid-forming propensity and at predicting the potential transition-prone helices within the protein. Towards this objective, datasets of proteins were created to represent amyloid-forming proteins (dataset A), intrinsically unstructured proteins (dataset U), inclusion body-forming proteins (dataset I) and proteins which are soluble on overexpression in E. coli (dataset S) (Table S1, available as Supplementary data at PEDS Online). The sequences of the proteins in these datasets were analyzed to identify features that are unique to amyloidogenic proteins. It was found that the amyloidogenic proteins are enriched in order-promoting residues with high sheet propensity and have low thermostability and increased in vivo half-life. Based on these features, an index AP was heuristically determined. The correlation of AP with amyloidogenic propensity was established by predicting the amyloidogenicity of peptides that have been experimentally shown to form amyloids (dataset P) and by rationalizing the reported effect of mutations on amyloid formation.


    Methods of analyses
 
Databases and software

PubMed, available at the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov), was used to access the MEDLINE bibliographic database. The DSSP secondary structure assignments of proteins with known three-dimensional structures were taken from the DSSP database (Kabsch and Sander, 1983). NCBI and Swiss-Prot databases were used for procuring protein sequences for the various datasets created in this study. The software SPSS 10.0 (www.spss.com) was used to perform discriminant analysis and the Kruskal–Wallis test.

Creation of datasets

Fifty-four amyloid-forming proteins were identified from PubMed and, of these, 36 were randomly chosen to constitute the training dataset (dataset A; Table S1) and the remaining 18 constitute the test dataset (dataset Atest). Similarly, datasets of natively unstructured proteins, datasets U (training; 36 proteins) and Utest (test; 18 proteins), were created. Proteins which are soluble on overexpression in E. coli constitute datasets S (training; 27 proteins) and Stest (test; 13 proteins). The datasets I (training; 115 proteins) and Itest (test; 57 proteins) consist of proteins that form inclusion bodies on overexpression in E. coli. The training datasets were used to identify the factors that vary significantly among them and the test datasets were used to validate the observations made from the analysis of the training datasets. The sizes of the training and test datasets are in an ~2:1 ratio and the proteins were segregated into these two datasets randomly. The NCBI accession numbers of proteins in all the above datasets are given in Table S1. Dataset P consists of 133 peptides experimentally shown to form amyloid fibrils (Table S2) and this dataset was used as a test dataset. Dataset SP consists of 162 780 proteins deposited in the Swiss-Prot database as of October 11, 2004 (Release 45.7).

Calculation of instability index

The instability index of a protein, IIP, was calculated as

$$\mathrm{IIP} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}(x_i y_{i+1})$$

where L is the number of amino acids in the protein and DIWV(x_i y_{i+1}) is the instability weight value for the dipeptide x_i y_{i+1}, with the summation extending over the length of the protein. This index has been used to predict if a protein is stable (IIP < 40; in vivo half-life is >16 h) or unstable (IIP > 40; in vivo half-life is <5 h; Guruprasad et al., 1990).
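As an illustration only, the instability index can be sketched in Python, assuming the usual form of the Guruprasad et al. (1990) index, i.e. 10/L times the sum of the dipeptide weights. The function name `instability_index` and the toy weight table are hypothetical; the published 400-entry DIWV table is not reproduced here and would have to be supplied by the user.

```python
def instability_index(sequence, diwv):
    """Return (10 / L) * sum of DIWV over all consecutive dipeptides.

    `diwv` maps two-letter dipeptide strings to instability weight values.
    A missing dipeptide raises KeyError rather than being silently ignored.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("at least two residues are needed")
    total = sum(diwv[seq[i:i + 2]] for i in range(len(seq) - 1))
    return 10.0 / len(seq) * total


# Hypothetical toy weights for demonstration, NOT the published DIWV values.
toy_diwv = {"AC": 1.0, "CD": -2.0, "DA": 4.0}
toy_value = instability_index("ACDA", toy_diwv)  # 10/4 * (1.0 - 2.0 + 4.0)
```

With a complete weight table, the returned value would be compared with the threshold of 40 quoted above.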

Calculation of aliphatic index

The aliphatic index AI was calculated as

$$\mathrm{AI} = X(\mathrm{Ala}) + a\,X(\mathrm{Val}) + b\,[X(\mathrm{Ile}) + X(\mathrm{Leu})]$$

where X(Ala), X(Val), X(Ile) and X(Leu) are mole percent (i.e. 100 × mole fraction) of Ala, Val, Ile and Leu in the protein. The coefficients a (=2.9) and b (=3.9) are the relative volumes of Val and of Leu/Ile side chains to that of Ala side chain; this index has been suggested as a positive factor for the increase in thermostability of globular proteins (Ikai, 1980).
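A minimal Python sketch of this calculation follows, assuming the standard form of the index from Ikai (1980), AI = X(Ala) + a·X(Val) + b·[X(Ile) + X(Leu)], and a one-letter sequence as input. The function name and the example sequence are illustrative and are not taken from the study.

```python
def aliphatic_index(sequence, a=2.9, b=3.9):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        # 100 x mole fraction of the given one-letter residue
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))


# Made-up ten-residue sequence: one each of A, V, I, L plus six Gly.
example_ai = aliphatic_index("AVILGGGGGG")  # 10 + 2.9*10 + 3.9*20
```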

Calculation of the frequencies of occurrence of amino acid residues

The frequency of occurrence F_{A,X} of an amino acid residue A in a given dataset X was calculated as

$$F_{A,X} = \frac{N_{A,X}}{T_X}$$

where N_{A,X} is the total occurrence of residue A in the dataset X and T_X is the total number of residues in the same dataset.

Calculating the net orderliness of a protein

The disorder-promoting amino acids are Ala, Arg, Gln, Glu, Gly, Lys, Pro and Ser and the order-promoting amino acids are Asn, Cys, Ile, Leu, Phe, Trp, Tyr and Val (Uversky et al., 2000; Williams et al., 2001; Tompa, 2002). The net orderliness of a protein ORD is calculated as the difference between the frequencies of occurrence of the order- and disorder-promoting residues in the protein.
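The definition of ORD can be sketched as follows. This is an illustrative reading that assumes frequencies are expressed as fractions of the sequence length (the text does not say whether fractions or percentages were used); the names `ORDER`, `DISORDER` and `net_orderliness` are hypothetical.

```python
# One-letter codes for the two residue classes listed in the text.
DISORDER = set("ARQEGKPS")  # Ala, Arg, Gln, Glu, Gly, Lys, Pro, Ser
ORDER = set("NCILFWYV")     # Asn, Cys, Ile, Leu, Phe, Trp, Tyr, Val


def net_orderliness(sequence):
    """Fraction of order-promoting minus fraction of disorder-promoting residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_order = sum(1 for residue in seq if residue in ORDER)
    n_disorder = sum(1 for residue in seq if residue in DISORDER)
    return (n_order - n_disorder) / len(seq)


# Made-up sequence: 4 order-promoting, 5 disorder-promoting, 1 neutral (Thr).
example_ord = net_orderliness("NCILAAAAAT")  # (4 - 5) / 10
```

Residues in neither class (Asp, His, Met and Thr) count towards the sequence length but not towards either frequency.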

Calculating the helix and sheet propensity of tripeptides

The propensity P_{XYZ,SS} of a tripeptide XYZ to be present in the secondary structure type SS (SS is α for α-helix and β for β-strand) was calculated as

$$P_{XYZ,SS} = \frac{N_{XYZ,SS} / N_{XYZ,all}}{N_{SS} / N_{all}}$$

where N_{XYZ,SS} is the number of tripeptides XYZ found in secondary structure type SS, N_{XYZ,all} is the total number of tripeptides XYZ in all the proteins, N_{SS} represents the total number of tripeptides in secondary structure type SS and N_{all} represents the total number of tripeptides in all proteins. A tripeptide XYZ is considered as found in SS if all the three residues constituting the tripeptide reside in SS. Accordingly, the helix and sheet propensities for all 8000 tripeptides were calculated using the DSSP database (Kabsch and Sander, 1983; www.cmbi.kun.nl/gv/dssp).
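The counting described above can be sketched in Python. The sketch assumes that the propensity is the ratio (N_XYZ,SS / N_XYZ,all) / (N_SS / N_all), that each protein is given as a pair of equal-length strings (sequence and per-residue DSSP codes), and that DSSP code `H` denotes α-helix and `E` denotes β-strand. The function name and the toy input are hypothetical.

```python
from collections import Counter


def tripeptide_propensities(proteins, ss_code):
    """Propensity of each observed tripeptide for one secondary structure type.

    `proteins` is an iterable of (sequence, dssp_string) pairs.
    A tripeptide counts as "in SS" only if all three residues carry `ss_code`.
    """
    n_all = Counter()    # occurrences of each tripeptide anywhere
    n_in_ss = Counter()  # occurrences of each tripeptide fully inside SS
    for seq, ss in proteins:
        if len(seq) != len(ss):
            raise ValueError("sequence and DSSP string differ in length")
        for i in range(len(seq) - 2):
            tri = seq[i:i + 3]
            n_all[tri] += 1
            if ss[i:i + 3] == ss_code * 3:
                n_in_ss[tri] += 1
    total_all = sum(n_all.values())
    total_ss = sum(n_in_ss.values())
    if total_ss == 0:
        return {}  # no tripeptide lies in this SS type; propensity undefined
    background = total_ss / total_all
    return {tri: (n_in_ss[tri] / n_all[tri]) / background for tri in n_all}


# Toy input for demonstration only (not real DSSP data).
toy_proteins = [("AAAAGG", "HHHHCC"), ("AAAGGG", "CCCCCC")]
helix_propensity = tripeptide_propensities(toy_proteins, "H")
```

In the toy input, AAA occurs three times, twice fully in helix, and two of the eight tripeptides overall are helical, so its helix propensity is (2/3)/(2/8).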

Calculating the sheet propensity of a protein based on the secondary structure propensity of the constituting tripeptides

The tripeptide (TP)-based sheet propensity of a protein SPTP was calculated as



L being the number of residues in the protein and L – 2 the total number of tripeptides in the protein.


    Results and discussion
 
Amino acid composition

Residues Ala, Arg, Gln, Glu, Gly, Lys, Pro and Ser have been classified as disorder-promoting amino acids whereas the amino acids Asn, Cys, Ile, Leu, Phe, Trp, Tyr and Val have been classified as order-promoting amino acids (Uversky et al., 2000; Williams et al., 2001; Tompa, 2002). Among the order-promoting residues, Asn, Cys, Trp and Tyr occur more frequently in proteins of datasets A and I compared with proteins of dataset S (Table I). As reported earlier (Williams et al., 2001), dataset U proteins are enriched in disorder-promoting residues (Table I). Proteins of dataset S do not show any significant bias in their amino acid composition with regard to order- or disorder-promoting amino acids.


Table I. Frequencies of occurrence and secondary structure propensities of residues in datasets I, A, S and U

 
The average ORD value, defined as the difference in the frequencies of occurrence of order- and disorder-promoting residues, is lowest for dataset U (Table II); this is as expected since intrinsically unstructured proteins are enriched in disorder-promoting amino acids (Williams et al., 2001). The order-promoting residues, with the exception of Leu and Asn, have a higher sheet propensity; in contrast, disorder-promoting residues, with the exception of Gly and Ser, have a higher helix propensity (Table I). Hence the tripeptide-based sheet propensity SPTP is lowest for dataset U (Table II). The average ORD and SPTP values for dataset S are lower than those for dataset I.


Table II. Comparison of the mean values of the sequence-based features of proteins across the various training datasets

 
The observation that some of the order-promoting residues (having higher sheet propensity) are favored in aggregation-prone datasets A and I and not favored in dataset S assumes significance in the light of the view that a helix to sheet transition operates in amyloid and inclusion body formation (Przybycien et al., 1994; Kallberg et al., 2001). Lysozyme (Booth et al., 1997) and prion proteins (Pan et al., 1993; Inouye and Kirschner, 1997), which are mainly helical in the native form, are also known to form amyloid fibrils with β-sheet structure. The helix to strand conversion occurs prior to or during amyloid fibril formation (Kallberg et al., 2001). Raman spectroscopic studies of RTEM β-lactamase showed that proteins in the inclusion body have a lower amount of α-helix and a higher amount of β-strands than the native form (Przybycien et al., 1994).

Instability index

The instability index of proteins that have an in vivo half-life of <5 h was shown to be >40. For proteins whose instability index is <40, the in vivo half-life was shown to be >16 h. Hence this index could be used to compare the metabolic stabilities of two or more proteins (Guruprasad et al., 1990).

The average instability index IIP of the proteins of datasets A and I is lower than that of dataset S (Table II). Intrinsically unstructured proteins (dataset U) are reported to be highly sensitive to protease action and to have a low in vivo lifetime (Tompa, 2002); this is reflected in their high average values of IIP. Hence dataset A and I proteins seem to have a longer in vivo half-life compared with dataset S proteins.

This inference gains significance from the observation that the lifetime of partially folded intermediates influences the propensity of the protein to aggregate. Longer lived partially folded intermediates are known to have a greater chance of interaction with other partially folded intermediates; they would also exhaust the available molecular chaperones that otherwise prevent protein aggregation by interacting with them in an in vivo system (Fink, 1998; Rosen et al., 2002).

Aliphatic index

The aliphatic index AI of proteins of thermophilic bacteria has been found to be higher and the index can be used as a measure of thermostability of proteins (Ikai, 1980). This index is directly related to the mole fraction of Ala, Ile, Leu and Val in the protein (Ikai, 1980). The mean aliphatic index AI of proteins of datasets A, I and U is lower than that of the proteins of dataset S (Table II). However, it should be noted that the residues Ile, Leu and Val are both order-promoting and involved in increasing the aliphatic index and therefore a decrease in AI of the dataset may be reflected in a decrease in the ORD (e.g. dataset A) and vice versa.

Apart from the AI, a few other parameters have also been associated with an increase in thermostability, namely a higher ratio of (Glu + Lys) to (Gln + His) (Farias et al., 2004) and a higher content of Glu, Lys and Arg and fewer uncharged polar residues (Ser, Thr, Asn and Gln) (Haney et al., 1999). As expected, the (Glu + Lys)/(Gln + His) ratio is higher in dataset S proteins (2.0) than in proteins of dataset I (1.9) or A (1.5). Although the AI of dataset U proteins is lower, the above ratio is higher (2.3) than for dataset S proteins because of their high content of Glu and Lys residues. The content of Arg residues is also lower for datasets A and I than dataset S. The contents of Thr, Asn and Gln are higher for datasets A, U and I than dataset S (Table I). These observations suggest that the aggregation-prone datasets A and I are more thermolabile than dataset S proteins. In fact, thermolabile folding intermediates have been suggested to contribute to protein aggregation by exhausting the in vivo supply of chaperonins (King et al., 1996).

Identification of features that correlate with the amyloidogenic propensity of a protein

The Kruskal–Wallis test is a non-parametric test that can be used to compare values from three or more datasets. The AI, SPTP, ORD and IIP values of the datasets A, U, I and S were compared using this test. In addition to the above factors, the net charge (NC) of the protein was also included, since the role of charge of the protein on amyloid formation has also been reported (Chiti et al., 2003). The results of this test show that the parameters AI, SPTP, ORD and IIP, but not the net charge, vary significantly between the datasets A, U, I and S (Table S3). The average aliphatic index AI was plotted against the average instability index of the protein IIP, the average tripeptide-based sheet propensity SPTP and the mean orderliness ORD for all the datasets and a linear regression trendline was fitted using Microsoft Excel (Figure S1; panels A–C; available as Supplementary data at PEDS Online). The datasets A and I are on the same side of the trendline whereas the dataset U is on the same side of the trendline as dataset S. However, when the mean orderliness ORD is plotted against the mean tripeptide-based sheet propensity SPTP (Figure S2; panel D), dataset U falls in the category of aggregation-prone proteins. This is understandable since most of the proteins of dataset U are shown to be amyloidogenic, although they share many features of dataset S proteins. These observations suggest that the parameters AI, SPTP, ORD and IIP relate to the amyloidogenic/aggregation behavior of proteins, since, in all these four cases, the datasets A and I clearly fall on the same side of the trendline opposite to the side occupied by dataset S. Dataset U proteins share commonalities with both aggregation-prone (datasets A and I) and soluble proteins (dataset S).
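For readers unfamiliar with the test, the Kruskal–Wallis H statistic can be sketched in plain Python as below. This is a textbook formulation with the usual tie correction, not the SPSS procedure used in the study; the function name and the three sample groups are made up for illustration. Library routines such as `scipy.stats.kruskal` compute the same statistic together with a p-value.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j

    rank_sum_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_sum_term - 3.0 * (n + 1)
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Made-up values standing in for one feature measured in three datasets.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The statistic is then compared with a chi-squared distribution with (number of groups − 1) degrees of freedom to judge significance.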

Discriminant analysis

The proteins of the datasets A, Atest, U, Utest, I, Itest, S and Stest (Table S1) were pooled and were represented by the factors AI, SPTP, ORD, IIP and NC. Discriminant analysis by the stepwise method was performed to validate the correlation between these variables and the amyloidogenic propensity. The prediction accuracy was determined by leave-one-out cross-validation. The analysis was done in two ways. (1) In a binary classification of the 54 amyloidogenic proteins (i.e. datasets A and Atest) against 500 randomly chosen proteins from the Swiss-Prot database, 35 of the 54 amyloidogenic proteins were classified as amyloidogenic (Table S4). Except for the net charge, all the other four factors were considered to be significant for classification. (2) A four-class classification was performed by considering the amyloidogenic proteins (datasets A and Atest), intrinsically unstructured proteins (datasets U and Utest), inclusion body-forming proteins (datasets I and Itest) and proteins soluble on overexpression in E. coli (datasets S and Stest). As in the binary classification, all factors except net charge were considered to be significant for classification. However, only 29 of the 54 amyloidogenic proteins (datasets A and Atest) were classified as amyloidogenic (Table S4). The lower classification accuracy of the four-class method compared with the binary classification is attributable to the fact that three of the four classes considered for classification are aggregation-prone, viz. datasets A, U and I.

The classification function obtained by discriminant analysis cannot be used for the prediction of amyloidogenic proteins since the datasets are of small size and are also not normally distributed. Nevertheless, it can be inferred that the factors AI, SPTP, ORD and IIP vary significantly among the datasets considered in this study.

AP—an index for amyloidogenic propensity

A heuristic approach was undertaken wherein the amyloidogenic propensity of a protein or a peptide, denoted by the index AP, is correlated with the parameters AI, SPTP, ORD and IIP. AP is calculated as follows:

where S(AI) is the score based on the aliphatic index and is equal to 1.7 if the aliphatic index AI < 70 or –0.3 if AI ≥ 70. A discontinuous score has been used for AI since the mean AI for datasets A and U is <70, whereas it is >70 for datasets S and I. Such a clear differentiation is not shown by the other factors for the datasets A, U, I and S (Figure S2), hence a continuous score was used for these three factors. The average values of SPTP, ORD and IIP for the training datasets are 0.32, –0.16 and 37.2, respectively. The value of IIP had to be scaled down, since it is nearly two orders of magnitude higher than that of the other two parameters. Various scaling factors were tried and the best classification results for the training datasets were obtained with a scaling factor of 10⁻³ (data not shown). The AP index was derived by optimization of the classification results for the training datasets A, U, I and S alone.

A protein/peptide is inferred to have a high propensity for amyloid formation if AP > 0. It was observed that 75, 53 and 33% of proteins in the datasets A, U and I, respectively, are predicted to be amyloidogenic. None of the proteins of the dataset S were predicted to have a high propensity for amyloid formation; 32% of the proteins deposited in the Swiss-Prot database were predicted to be amyloidogenic.

Validation of AP as an index for amyloidogenic propensity

AP-based predictions were done on the test datasets to validate the use of AP as an index for amyloidogenic propensity. As expected, datasets Atest and Utest have a higher percentage of amyloidogenic proteins (72 and 83%, respectively); 21% of dataset Itest proteins are predicted to be amyloidogenic whereas none of the dataset Stest proteins are predicted to be amyloidogenic. Of the 133 peptides shown experimentally to form amyloid fibrils (dataset P), 77% are predicted to be amyloidogenic (Table III). The amyloidogenicity of peptides predicted for a protein could be ranked based on the AP index. A peptide with a higher AP index would be predicted to be more amyloidogenic than a peptide with a lower value of AP.


Table III. Results of AP-based predictions on the various datasets

 
The amyloidogenic sites within a protein can be predicted based on AP by scanning the sequence of this protein using a window of predefined length. Windows of different lengths can be used since peptides ranging in size from four to several residues long have been shown to form amyloids. If the parent proteins of the dataset P peptides that were predicted to be amyloidogenic (77% of the dataset) are scanned in this way, these peptides would be identified as amyloidogenic. However, experimental validation is needed to confirm the prediction for other sites.

It has been observed that the intrinsic effects of mutations on the rates of aggregation of unfolded polypeptides can be correlated with changes in physicochemical properties such as hydrophobicity, secondary structure propensity and charge (Chiti et al., 2003). A total of 40 point mutants which display a difference in the rate of aggregation relative to the wild-type protein were considered for validating the predictions based on AP. The change in the rate of aggregation was predicted correctly for 33 of the 40 point mutations, i.e. the increase or decrease in amyloidogenicity accompanying these mutations was reflected in their AP index (Table S5).

Amyloidogenicity is also dictated by structural features of the protein such as the helix stability (Andreola et al., 2003), β-strand stability, presence of unsatisfied hydrogen bonds, buried uncompensated charges, solvent accessibility of the amyloidogenic peptide/strand (Thirumalai et al., 2003), β-turn propensity (Kallijarvi et al., 2001) and the secondary structure of the peptide in the soluble state (Soto et al., 1995). The nature of the folding intermediates (Horwich, 2002) and the presence/absence of various kinds of interactions in an oligomeric intermediate involved in fibril formation such as hydrogen bonding, electrostatic and hydrophobic interactions also play a crucial role in dictating the aggregation propensity/rate of a polypeptide (Horwich, 2002; Thirumalai et al., 2003). In addition, experimental conditions also contribute to deciding the amyloidogenicity of a peptide (DuBay et al., 2004). For these reasons, a quantitative correlation between amyloidogenicity and the AP index has not been attempted at this stage. The inclusion of the secondary structure information in the AP index is expected to enhance the prediction accuracy; however, this needs the availability of the 3-D structures of many more amyloidogenic proteins.

Prediction of transition-prone helices within a protein

The heuristic approach was extended to predict the transition-prone helices within proteins with known secondary structure, since a helix to β-strand transition is seen to be frequently associated with amyloid formation (Kallberg et al., 2001; Forloni et al., 2002). The protein is scanned for such helices with a seven-residue sliding window and a heptapeptide is identified as a potential amyloidogenic helix if the following four conditions are satisfied:

  1. the SPTP of the seven-residue window is >0.14;
  2. all seven residues in the sliding window are α-helical;
  3. the residue immediately preceding the seven-residue window is not α-helical; and
  4. the amyloid-forming propensity AP of the heptapeptide is >0.
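The four conditions can be expressed as a sliding-window scan, sketched here in Python. The sketch makes several assumptions that are not in the text: secondary structure is supplied as a per-residue DSSP string in which `H` marks α-helix, a window starting at the very first residue is treated as having no helical predecessor, and the SPTP and AP calculations are passed in as caller-supplied functions (`sptp` and `ap` are hypothetical parameters, as are the function name and the stand-in scorers in the example).

```python
def find_transition_prone_helices(sequence, dssp, sptp, ap,
                                  window=7, sptp_cutoff=0.14):
    """Return (start, segment, sptp_value) for every window meeting all four conditions."""
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string differ in length")
    hits = []
    for start in range(len(sequence) - window + 1):
        # Condition 2: every residue in the window is alpha-helical.
        if dssp[start:start + window] != "H" * window:
            continue
        # Condition 3: the preceding residue, if any, is not alpha-helical.
        if start > 0 and dssp[start - 1] == "H":
            continue
        segment = sequence[start:start + window]
        score = sptp(segment)
        # Conditions 1 and 4: SPTP above the cutoff and AP above zero.
        if score > sptp_cutoff and ap(segment) > 0:
            hits.append((start, segment, score))
    return hits


# Stand-in scorers for demonstration only; they are NOT the SPTP or AP of the study.
def toy_sptp(segment):
    return segment.count("V") / len(segment)


def toy_ap(segment):
    return 1.0


toy_hits = find_transition_prone_helices(
    "GGVVVVVVVAAAAAAAGG", "CCHHHHHHHHHHHHHHCC", toy_sptp, toy_ap)
```

Because of condition 3, only a window that begins at the first residue of a helix can qualify, so each helix contributes at most one candidate; when several helices qualify, `max(toy_hits, key=lambda hit: hit[2])` would select the one with the highest SPTP.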

When a protein is predicted to have more than one putative transition-prone helix, the helix with the highest value of SPTP is considered as the critical transition-prone helix for that protein. It might be possible that the additional helices picked up may also be potential amyloidogenic sites. Here, the size of the sliding window considered for the prediction is optimized to seven residues. If the size of the window is reduced or increased from seven residues, it is seen that the number of predicted potential helices increases or decreases steeply (data not shown).

The proteins wherein a helical region has been shown experimentally to form amyloid fibrils (the data on such helical fragments are limited) or those that have been predicted to have such discordant helices (the protein is shown to form amyloid fibrils) were analyzed to predict the transition-prone helices. All nine proteins showed the presence of transition-prone helices and, in seven of the eight proteins, the transition-prone helices predicted to be critical in amyloid formation were part of the previously identified α/β discordant stretch (Kallberg et al., 2001; Table S6). The transition-prone helices were also identified correctly for all three proteins whose amyloidogenic helices have been identified experimentally (Table S6).

The N-terminal region of human amylin peptide has been found to influence the overall kinetics of fibril formation of the peptide (Goldsbury et al., 2000). The N-cap residues play a role in helix stabilization (Serrano and Fersht, 1989; Parker and Hefford, 1998; Iqbalsyah and Doig, 2004). It is probable that the presence of residues having a higher sheet propensity at the N-cap positions initiates a helix to sheet transition under denaturing conditions. This assumption comes in the light of the observation that a helix to sheet transition accompanies amyloid formation (Kallberg et al., 2001) and mutations that decrease the stability of individual helices in proteins are associated with prion diseases (Kazmirski et al., 1995). Fibril formation was also observed to be inhibited by association of the amyloidogenic peptide with ligands that could prevent a helical segment from forming a β-strand (Andreola et al., 2003).

Intrinsically unstructured proteins and proteins overexpressed in the soluble form share common features

Proteins belonging to datasets S and U display certain similarities. Both of them have higher net negative charge (Uversky et al., 2000; Tompa, 2002; data not shown) and are rich in residues with a higher helix propensity (Table I). Proteins of datasets S and U have a higher mean instability index IIP (40 and 46, respectively; Table II) than proteins of other datasets. The higher instability index of datasets S and U suggests that these proteins have a lower in vivo half-life than proteins of other datasets. It has been reported that intrinsically unstructured proteins have a high proteolytic sensitivity and hence a lower in vivo half-life (Tompa, 2002). However, a feature of dataset U proteins that is not shared by dataset S proteins is the low average aliphatic index as reflected by the lower frequencies of occurrence of Ala, Ile, Leu and Val in dataset U proteins than dataset S proteins; in this respect, dataset U proteins are similar to datasets A and I proteins (Table I).

Conclusions

The aim of this study was to understand the sequence characteristics of proteins that are prone to amyloid formation. The analyses reveal that amyloidogenicity is correlated with thermostability, in vivo half-life and the presence of order-promoting residues with high sheet propensity. These parameters were used to define a prediction function (AP index) for amyloidogenicity. It was observed that 72% of the known amyloidogenic proteins and 77% of the peptides shown experimentally to form amyloid fibrils were correctly classified based on the AP index. The prediction accuracy can be increased by considering other features, such as the nature of the folding intermediate and the environmental conditions. However, to date such information is available for only a few amyloidogenic proteins.


    Acknowledgements
 
The authors are grateful to Drs R.V.Hosur and S.Noronha for their valuable suggestions and help. S.Idicula-Thomas is grateful to the Council of Scientific and Industrial Research, India, for a fellowship.


    References
 
Andreola,A., Bellotti,V., Giorgetti,S., Mangione,P., Obici,L., Stoppini,M., Torres,J., Monzani,E., Merlini,G. and Sunde,M. (2003) J. Biol. Chem., 278, 2444–2451.

Booth,D.R. et al. (1997) Nature, 385, 787–793.

Chiti,F., Stefani,M., Taddei,N., Ramponi,G. and Dobson,C.M. (2003) Nature, 424, 805–808.

Dale,G.E., Broger,C., Langen,H., D'Arcy,A. and Stuber,D. (1994) Protein Eng., 7, 933–939.

Dobson,C.M. (1999) Trends Biochem. Sci., 24, 329–332.

DuBay,K.F., Pawar,A.P., Chiti,F., Zurdo,J., Dobson,C.M. and Vendruscolo,M. (2004) J. Mol. Biol., 341, 1317–1326.

Farias,S.T., Van Der Linden,M.G., Rego,T.G., Araujo,D.A. and Bonato,M.C. (2004) In Silico Biol., 4, 0030.

Fink,A.L. (1998) Fold. Des., 3, R9–R23.

Forloni,G. et al. (2002) Neurobiol. Aging, 23, 957–976.

Goldsbury,C., Goldie,K., Pellaud,J., Seelig,J., Frey,P., Muller,S.A., Kistler,J., Cooper,G.J. and Aebi,U. (2000) J. Struct. Biol., 130, 352–362.

Guruprasad,K., Reddy,B.V. and Pandit,M.W. (1990) Protein Eng., 4, 155–161.

Haney,P.J., Badger,J.H., Buldak,G.L., Reich,C.I., Woese,C.R. and Olsen,G.J. (1999) Proc. Natl Acad. Sci. USA, 96, 3578–3583.

Horwich,A. (2002) J. Clin. Invest., 110, 1221–1232.

Ikai,A. (1980) J. Biochem., 88, 1895–1898.

Inouye,H. and Kirschner,D.A. (1997) J. Mol. Biol., 268, 375–389.

Iqbalsyah,T.M. and Doig,A.J. (2004) Protein Sci., 13, 32–39.

Jones,S., Manning,J., Kad,N.M. and Radford,S.E. (2003) J. Mol. Biol., 325, 249–257.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637; http://www.cmbi.kun.nl/gv/dssp.

Kallberg,Y., Gustafsson,M., Persson,B., Thyberg,J. and Johansson,J. (2001) J. Biol. Chem., 276, 12945–12950.

Kallijarvi,J., Haltia,M. and Baumann,M.H. (2001) Biochemistry, 40, 10032–10037.

Kazmirski,S.L., Alonso,D.O., Cohen,F.E., Prusiner,S.B. and Daggett,V. (1995) Chem. Biol., 2, 305–315.

King,J., Haase-Pettingell,C., Robinson,A.S., Speed,M. and Mitraki,A. (1996) FASEB J., 10, 57–66.

Malissard,M. and Berger,E.G. (2001) Eur. J. Biochem., 268, 4352–4358.

Monti,M., Garolla di Bard,B.L., Calloni,G., Chiti,F., Amoresano,A., Ramponi,G. and Pucci,P. (2004) J. Mol. Biol., 336, 253–262.

Pan,K.M. et al. (1993) Proc. Natl Acad. Sci. USA, 90, 10962–10966.

Parker,M.H. and Hefford,M.A. (1998) Biotechnol. Appl. Biochem., 28, 69–76.

Przybycien,T.M., Dunn,J.P., Valax,P. and Georgiou,G. (1994) Protein Eng., 7, 131–136.

Rosen,R., Biran,D., Gur,E., Becher,D., Hecker,M. and Ron,E.Z. (2002) FEMS Microbiol. Lett., 207, 9–12.

Serrano,L. and Fersht,A.R. (1989) Nature, 342, 296–299.

Sipe,J.D. (1992) Annu. Rev. Biochem., 61, 947–975.

Soto,C., Castano,E.M., Frangione,B. and Inestrosa,N.C. (1995) J. Biol. Chem., 270, 3063–3067.

Thirumalai,D., Klimov,D.K. and Dima,R.I. (2003) Curr. Opin. Struct. Biol., 13, 146–159.

Tjernberg,L., Hosia,W., Bark,N., Thyberg,J. and Johansson,J. (2002) J. Biol. Chem., 277, 43243–43246.

Tompa,P. (2002) Trends Biochem. Sci., 27, 527–533.

Uversky,V.N. (2003) Cell. Mol. Life Sci., 60, 1852–1871.

Uversky,V.N., Gillespie,J.R. and Fink,A.L. (2000) Proteins, 41, 415–427.

Villegas,V., Zurdo,J., Filimonov,V.V., Aviles,F.X., Dobson,C.M. and Serrano,L. (2000) Protein Sci., 9, 1700–1708.

Wetzel,R. (1994) Trends Biotechnol., 12, 193–198.

Williams,R.M., Obradovic,Z., Mathura,V., Braun,W., Garner,E.C., Young,J., Takayama,S., Brown,C.J. and Dunker,A.K. (2001) Pac. Symp. Biocomput., 89–100.

Zanuy,D. and Nussinov,R. (2003) J. Mol. Biol., 329, 565–584.

Zerovnik,E. (2002) Eur. J. Biochem., 269, 3362–3371.

Received July 31, 2004; revised December 3, 2004; accepted March 9, 2005.

Edited by Luis Serrano




