Prediction of protein secondary structure content

Wei-min Liu² and Kuo-Chen Chou¹

Computer-Aided Drug Discovery, Pharmacia and Upjohn, Kalamazoo, MI 49007-4940 and ¹Department of Computer and Information Science, Indiana University Purdue University Indianapolis, Indianapolis, IN 46202-5132, USA


    Abstract
 
All existing algorithms for predicting the content of protein secondary structure elements have been based on the conventional amino-acid-composition, in which no sequence coupling effects are taken into account. In this article, an algorithm was developed for predicting the content of protein secondary structure elements based on a new amino-acid-composition, in which sequence coupling effects are explicitly included through a series of conditional probability elements. The prediction was examined by a self-consistency test and an independent-dataset test. Both showed a remarkable improvement when the current algorithm was used to predict the contents of α-helix, β-sheet, β-bridge, 3₁₀-helix, π-helix, H-bonded turn, bend and random coil. Examples of the improved accuracy obtained by introducing the new amino-acid-composition, as well as its impact on the study of protein structural class and biological function, are discussed.

Keywords: 1st-order coupled components/α-helix/β-sheet/β-bridge/3₁₀-helix/π-helix/H-bonded turn/bend/random coil


    Introduction
 
One of the biggest challenges in molecular biology is to predict the three-dimensional (3D) structure of a protein from its amino acid sequence alone. Toward this goal, approaches targeting different levels and aspects of protein structure have been developed, such as secondary structure prediction (see, for example, Chou and Fasman, 1978; Fasman, 1989), structural class prediction (see, for example, Nakashima et al., 1986; Chou, 1989; Chou, 1995; Chou and Zhang, 1995; Bahar et al., 1997; Liu and Chou, 1997), domain class prediction (Chou et al., 1998) and secondary structure content prediction (Krigbaum and Knutton, 1973; Muskal and Kim, 1992; Zhang et al., 1996, 1998). Compared with structural class prediction, secondary structure content prediction is both more basic and more difficult. 'More basic' refers to the fact that protein structural classes, as classified by many investigators, are based solely on the percentages of secondary structure content (see, for example, Klein and DeLisi, 1986; Nakashima et al., 1986; Chou, 1989; Kneller et al., 1990; Chou, 1995; Liu and Chou, 1997), although a more reasonable scheme for classifying structural classes is based on the evolutionary relationships of proteins and on the principles that govern their 3D structure (Murzin et al., 1995). Nevertheless, the classification thus obtained is still closely correlated with the percentages of secondary structure content (Chou et al., 1998). 'More difficult' refers to the fact that the odds of a correct prediction are much lower for secondary structure content than for structural class. Structural class prediction only has to pick one class among a few possible clusters, whereas content prediction deals with a continuous target, i.e. it has to pick one value among infinitely many possible values.
As demonstrated in a recent study on predicting protein subcellular locations (Chou and Elrod, 1999), the fewer the categories to be discriminated, the higher the rate of correct prediction. The following illustration makes this concrete. Let α and β denote the percentages of α-helices and β-sheets in a protein. For structural classes, proteins with α ≥ 40% and β ≤ 5% are classified as the α-protein class, proteins with α ≤ 5% and β ≥ 40% as the β-protein class, and so forth (see, for example, Chou, 1995). Now suppose that a secondary structure content prediction method gives α = 40% and β = 0% for a protein whose observed values are α = 60% and β = 5%. The deviations are then |60% − 40%| = 20% = 0.20 for the α-helix content and |5% − 0%| = 5% = 0.05 for the β-sheet content. Such deviations represent a significant error in secondary structure content prediction; in structural class prediction, however, the protein is correctly predicted as an α-protein without any error at all. This example shows how difficult it is to develop an accurate method for predicting the secondary structure content of proteins, which is probably why much less work has been done on content prediction than on structural class prediction. It is well known that a priori knowledge of secondary structure content can provide useful information for determining protein structure. In particular, it is closely relevant to experimental methods such as circular dichroism (CD) spectroscopy (Sreerama and Woody, 1994) and Raman spectroscopy (Bussian and Sander, 1989).
In view of its importance, and to tackle the challenge of its difficulty, the present study focuses on secondary structure content prediction.
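The illustration above can be replayed in a few lines of Python. This is only a sketch: it encodes just the two class thresholds quoted in the text (α ≥ 40% with β ≤ 5%, and α ≤ 5% with β ≥ 40%) and lumps every other case into a hypothetical "other" label, so it is not a complete structural-class scheme.

```python
# Replays the example in the text: a content error that matters for
# content prediction can be invisible to structural-class prediction.
# Only the two thresholds quoted in the text are encoded; every other
# combination is lumped into "other" (a simplification for illustration).


def structural_class(alpha: float, beta: float) -> str:
    """Assign a coarse structural class from helix/sheet fractions (0..1)."""
    if alpha >= 0.40 and beta <= 0.05:
        return "alpha"
    if alpha <= 0.05 and beta >= 0.40:
        return "beta"
    return "other"  # alpha/beta, alpha+beta, etc. are not distinguished here


predicted = {"alpha": 0.40, "beta": 0.00}
observed = {"alpha": 0.60, "beta": 0.05}

# Absolute content errors |observed - predicted| for each element.
errors = {key: abs(observed[key] - predicted[key]) for key in observed}

print(structural_class(**predicted), structural_class(**observed))
print({key: round(err, 2) for key, err in errors.items()})
```

Both calls return the same class, even though the α-helix content is off by 0.20.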

In a pioneering study, Krigbaum and Knutton (1973) introduced a multiple linear regression (MLR) algorithm to predict the secondary structure content of a protein from its amino acid composition. Muskal and Kim (1992) approached the problem differently, developing a tandem neural network method that took into account the protein's amino acid composition, molecular weight and heme presence. More recently, by incorporating some nonlinear terms as well as knowledge of protein structural class, Zhang et al. (1996, 1998) proposed a new approach to predict the amount of secondary structure in a globular protein. According to their report, their predictions are better than those of Krigbaum and Knutton (1973) and Muskal and Kim (1992). However, Zhang's method requires a priori knowledge of the structural class of the query protein in order to predict its secondary structure content, which limits its applicability. Moreover, the amino-acid-composition used in all the aforementioned methods is the 0th-order coupled composition, defined by
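$$\Psi_0 = \left[\, P(\mathrm{A}) \;\; P(\mathrm{C}) \;\; P(\mathrm{D}) \;\; P(\mathrm{E}) \;\; \cdots \;\; P(\mathrm{Y}) \,\right] \tag{1}$$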

where A, C, D, E, ..., and Y are the single-letter codes of the 20 amino acids, and P(A) is the proportion of amino acid A (alanine) in a given protein, P(C) the proportion of C (cysteine), P(D) the proportion of D (aspartic acid), and so forth. As eqn 1 shows, each amino acid component is treated independently, i.e. coupling effects among the 20 amino acid components are not incorporated at all. This is why the composition is called 0th-order coupled, as denoted by the subscript 0 of Ψ in eqn 1.
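Eqn 1 is straightforward to compute. The following Python sketch returns the 20 proportions in the alphabetical order of one-letter codes used above; the function name `composition_0th` is hypothetical, and skipping non-standard letters is an assumption of the sketch rather than something the text specifies.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical order of one-letter code,
# matching the ordering A, C, D, E, ..., Y used in eqn 1.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_0th(sequence: str) -> list[float]:
    """Return the 20 proportions P(A), P(C), ..., P(Y) of eqn 1.

    Residues outside the 20 standard letters (e.g. X, B, Z) are skipped;
    that filtering rule is an assumption of this sketch.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    n = len(residues)
    return [counts[aa] / n for aa in AMINO_ACIDS]


comp = composition_0th("ACDAAC")
print(dict(zip(AMINO_ACIDS, comp))["A"])  # 0.5
```

Each residue contributes to exactly one component, regardless of its neighbours, which is what "0th-order" means here.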

Obviously, the 0th-order-coupled system is the lowest approximation. If we wish to incorporate the coupling effects of residues along a sequence so as to reflect more accurately the reality in a protein, how can we develop a method to predict its secondary structure content? The present study was initiated in an attempt to deal with this problem.


    Algorithm
 
When the coupling effect between a residue and its adjacent residue is taken into account, the proportion factors in eqn 1 should be replaced by 1st-order conditional proportions, and the number of factors increases from 20 to 20 × 20 = 400; i.e. the 0th-order coupled amino-acid-composition $\Psi_0$ should be replaced by the 1st-order coupled composition formulated by:
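$$\Psi_1 = \left[\, P(\mathrm{A}|\mathrm{A}) \;\; P(\mathrm{C}|\mathrm{A}) \;\; \cdots \;\; P(\mathrm{Y}|\mathrm{A}) \;\; P(\mathrm{A}|\mathrm{C}) \;\; P(\mathrm{C}|\mathrm{C}) \;\; \cdots \;\; P(\mathrm{Y}|\mathrm{Y}) \,\right] \tag{2}$$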

where P(C|A) is the proportion of amino acid C occurring along a protein sequence from the N- to the C-terminus, given that A has occurred immediately preceding it; P(D|C) is the proportion of amino acid D occurring along the same sequence, given that C has occurred just preceding D; and so forth.

Generally speaking, if the coupling effects of the l (l = 2, 3, ...) closest neighboring residues are to be considered, eqn 1 should be generalized to an lth-order coupled amino-acid-composition consisting of $20^{l+1}$ components, each corresponding to an lth-order conditional proportion. As one might expect, a higher-order coupled system is much more complicated to analyze. The treatment in this paper is therefore confined to the 1st-order coupled system; i.e. only the coupling effect of the immediately adjacent amino acids is taken into account, as formulated by eqn 2.

The current method is established on the basis of eqn 2, which formulates a conditional probability contribution from each amino acid in the sequence given that it is immediately preceded by a particular one of the 20 amino acids. Accordingly, the 1st-order coupled amino-acid-composition (eqn 2) introduced here involves explicit representation of sequential properties that are not included in the conventional amino-acid-composition, or the 0th-order coupled amino-acid-composition, as formulated by eqn 1.

Suppose the 20 native amino acids are denoted by $X_i$ (i = 1, 2, ..., 20) in the alphabetical order of their single-letter codes, i.e. $X_1$ = A, $X_2$ = C, ..., $X_{20}$ = Y; then, according to the normalization condition, we have
$$\sum_{i=1}^{20}\sum_{j=1}^{20} P(X_j \mid X_i) = 1 \tag{3}$$

For brevity, the 400 components in eqn 2 are denoted by $y_1, y_2, \ldots, y_{400}$. The rationale of the current method is that the secondary structure content of a protein is correlated with its amino-acid-composition, and that this correlation is reflected more accurately by the 1st-order coupled composition than by the 0th-order one. Thus, the content of a secondary structural element in a protein, e.g. α-helix, can be estimated by the following equation:
$$\alpha = \frac{n_\alpha}{n} = F_\alpha(y_1, y_2, \ldots, y_{400}) \tag{4}$$

where α is the α-helix content, $n_\alpha$ the number of residues in the α-helices of the protein, n its total number of residues, and $F_\alpha(y_1, y_2, \ldots, y_{400})$ a function to be determined. Expanding $F_\alpha$ in a Taylor series about $y_1 = y_2 = \cdots = y_{400} = 0$, we have
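$$\alpha = (F_\alpha)_0 + \sum_{i=1}^{400}\left(\frac{\partial F_\alpha}{\partial y_i}\right)_0 y_i + \frac{1}{2!}\sum_{i=1}^{400}\sum_{j=1}^{400}\left(\frac{\partial^2 F_\alpha}{\partial y_i\,\partial y_j}\right)_0 y_i\,y_j + \cdots \tag{5}$$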

where the subscript 0 means that the corresponding term is evaluated at $y_1 = y_2 = \cdots = y_{400} = 0$. Since all $y_i$ (i = 1, 2, ..., 400) in a real protein are generally ≪ 1, with an average of 1/400 = 0.0025, and the derivatives are bounded in real-world situations, the third and higher-order terms in eqn 5 can be neglected. Thus, approximately,
$$\alpha \approx c_0^{\alpha} + \sum_{i=1}^{400} c_i^{\alpha}\, y_i \tag{6}$$

where $c_0^{\alpha} = (F_\alpha)_0$ and $c_i^{\alpha} = (\partial F_\alpha / \partial y_i)_0$. The coefficients $c_i^{\alpha}$ (i = 0, 1, ..., 400) can be determined from a training dataset by the following procedure.

Suppose a training dataset contains N proteins, indexed by k, and the 400 coupled components of the kth protein are denoted by $y_{k,1}, y_{k,2}, \ldots, y_{k,400}$. To determine the coefficients of eqn 6, we define the objective function
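$$Q_\alpha = \sum_{k=1}^{N}\left[d_k^{\alpha} - \left(c_0^{\alpha} + \sum_{i=1}^{400} c_i^{\alpha}\, y_{k,i}\right)\right]^2 \tag{7}$$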

where $d_k^{\alpha}$ is the observed α-helix content of the kth protein in the training dataset, derived from its DSSP file (Kabsch and Sander, 1983) as in Chou et al. (1998). Determining the coefficients $c_i^{\alpha}$ (i = 0, 1, ..., 400) amounts to finding the minimum of $Q_\alpha$, and hence to solving the following set of linear algebraic equations:
$$\frac{\partial Q_\alpha}{\partial c_i^{\alpha}} = 0, \qquad i = 0, 1, \ldots, 400 \tag{8}$$

This procedure is essentially the least-squares solution of a multiple regression problem. It can be shown that eqn 8 usually has a unique solution if N, the number of proteins in the training dataset, is equal to or greater than 401 (see Appendix A); the least-squares solution can also be obtained by singular value decomposition. In this way all the coefficients $c_i^{\alpha}$ (i = 0, 1, ..., 400) in eqn 7 are derived, and substituting them into eqn 6 immediately gives the desired equation for predicting the α-helix content of a query protein.
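The fit of eqns 7–8 is ordinary linear least squares with an intercept. A minimal NumPy sketch follows, assuming the components of the N proteins are stacked into an N × p matrix `Y` and the observed contents into a vector `d` (both names, and `fit_rule_parameters`, are hypothetical). It uses `np.linalg.lstsq`, which is SVD-based, and runs on a small synthetic problem (p = 5 instead of 400) purely for illustration. One practical point: if every row of `Y` sums to 1, the constant column equals the sum of the other columns, so XᵀX is singular; `lstsq` still returns a least-squares solution (the minimum-norm one).

```python
import numpy as np


def fit_rule_parameters(Y: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Least-squares estimate of (c_0, c_1, ..., c_p) for d ~ c_0 + Y @ c[1:].

    Y: (N, p) matrix of composition components, one row per protein.
    d: (N,) observed contents of one secondary structure element.
    """
    # Prepend the constant column (the dummy variable y_{k,0} = 1).
    X = np.column_stack([np.ones(len(Y)), Y])
    # lstsq minimizes ||d - X c||^2 via SVD; for a rank-deficient X it
    # returns the minimum-norm minimizer.
    coeffs, *_ = np.linalg.lstsq(X, d, rcond=None)
    return coeffs


rng = np.random.default_rng(0)
Y = rng.random((50, 5))                  # toy data: 50 "proteins", 5 components
true_c = np.array([0.1, 0.3, -0.2, 0.05, 0.0, 0.4])
d = true_c[0] + Y @ true_c[1:]           # noise-free synthetic targets
print(np.round(fit_rule_parameters(Y, d), 6))
```

With noise-free synthetic targets and more proteins than coefficients, the fit recovers `true_c` up to rounding error.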

Following a similar procedure, we can also predict the content of β-sheets, their parallel and antiparallel fractions, and the contents of β-bridges, 3₁₀-helices, π-helices, H-bonded turns, bends and random coils for a given protein. Accordingly, in parallel to eqn 6, a general formulation for predicting all the secondary structure elements can be written as
$$\Phi = c_0^{\Phi} + \sum_{j=1}^{400} c_j^{\Phi}\, y_j \tag{9}$$

where Φ is a general symbol for the secondary structure elements, and $c_j^{\Phi}$ (j = 0, 1, 2, ..., 400) are called the 1st-order coupled 'rule-parameters' for predicting the content of element Φ. When Φ = 'α', eqn 9 yields the content of α-helices; when Φ = 'β', the content of β-sheets; when Φ = 'parallel', the content of parallel β-sheets; when Φ = 'antiparallel', the content of antiparallel β-sheets; when Φ = 'bridge', the content of β-bridges; when Φ = '3₁₀', the content of 3₁₀-helices; when Φ = 'π', the content of π-helices; when Φ = 'H-bond', the content of H-bonded turns; when Φ = 'bend', the content of bends; and when Φ = 'coil', the content of random coils. By definition, a secondary structure content must lie between 0 and 1 (see eqn 4). Therefore, if eqn 9 gives Φ > 1 or Φ < 0, Φ is set to 1 or 0, respectively. Such cases occurred very rarely.
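Applying eqn 9 to a query protein is a dot product plus an intercept, followed by the clipping rule just described. A minimal Python sketch with made-up coefficients (`predict_content` is a hypothetical name, and the three-coefficient example stands in for the 401 real rule-parameters):

```python
import numpy as np


def predict_content(coeffs: np.ndarray, y: np.ndarray) -> float:
    """Evaluate Phi = c_0 + sum_j c_j * y_j (eqn 9), then clip to [0, 1].

    coeffs: (p + 1,) rule parameters for one element, c_0 first.
    y: (p,) composition components of the query protein.
    """
    raw = coeffs[0] + float(np.dot(coeffs[1:], y))
    # A content is a fraction of residues, so out-of-range values are
    # reset to the nearest bound, as the text prescribes.
    return min(max(raw, 0.0), 1.0)


coeffs = np.array([0.2, 1.5, -0.5])      # hypothetical rule parameters
print(predict_content(coeffs, np.array([0.3, 0.1])))
```

The same function serves every element Φ; only the coefficient vector changes.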

For comparison, we also give the corresponding equations based on the conventional amino-acid-composition (eqn 1). Following procedures parallel to the derivation above, they are obtained as follows.
$$\Phi = b_0^{\Phi} + \sum_{j=1}^{20} b_j^{\Phi}\, x_j \tag{10}$$

where

$$x_j = P(X_j), \qquad j = 1, 2, \ldots, 20 \tag{11}$$

is the proportion of amino acid $X_j$ in the protein whose secondary structure contents are to be predicted (see eqn 1), and $b_j^{\Phi}$ (j = 0, 1, 2, ..., 20) are the 0th-order coupled 'rule-parameters' for predicting the content of element Φ, derived from the following equations:
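$$\frac{\partial}{\partial b_j^{\Phi}} \sum_{k=1}^{N}\left[d_k^{\Phi} - \left(b_0^{\Phi} + \sum_{i=1}^{20} b_i^{\Phi}\, x_{k,i}\right)\right]^2 = 0, \qquad j = 0, 1, \ldots, 20 \tag{12}$$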

where $x_{k,1}, x_{k,2}, \ldots, x_{k,20}$ are the 20 0th-order coupled components (see eqn 1) of the amino-acid-composition of the kth protein in the training dataset, and $d_k^{\Phi}$ is a general symbol for the observed content of secondary structure element Φ in the kth protein. When Φ = 'α', it becomes $d_k^{\alpha}$ of eqn 7, which is simply the observed α-helix content of the kth protein. As mentioned above, the observed value of $d_k^{\Phi}$ (Φ = 'α', 'β', 'bridge', '3₁₀', 'π' or any other secondary structural element) can be derived from the DSSP file (Kabsch and Sander, 1983) of the kth protein in the training dataset.

Comparing eqns 10–12 with eqns 7–9 shows that the prediction by eqn 10 takes no account of sequence-coupling effects: the conditional probability terms associated with the 1st-order coupled rule-parameters in eqn 9 reduce to independent amino-acid-composition terms (see eqns 10 and 11).


    Results and discussion
 
As mentioned in the Algorithm section and shown in Appendix A, a unique solution for the coefficients $c_i^{\Phi}$ (i = 0, 1, 2, ..., 400) in eqn 9 requires that the training dataset contain at least 401 proteins. In the current study, 628 proteins of known structure, with no more than 25% similarity between any two sequences, were selected for the training dataset; their PDB codes are listed in Table I. These proteins were used to derive the rule parameters $c_i^{\Phi}$ (i = 0, 1, 2, ..., 400) for predicting secondary structure content.


Table I. The PDB codes of the 628 proteins in the training dataset

 
The results were examined by a self-consistency test and an independent-dataset test. Three error measures were used to evaluate prediction quality:

the average absolute error $\delta_\Phi$ for each secondary structure element
$$\delta_\Phi = \frac{1}{N}\sum_{k=1}^{N}\left|\Phi_k - d_k^{\Phi}\right|$$

the standard deviation $\sigma_\Phi$ for each secondary structure element


and the overall average error $\langle\delta\rangle$
$$\langle\delta\rangle = \frac{1}{\Lambda}\sum_{\Phi}\delta_\Phi$$

where Φ = 'α', 'β', ..., or 'coil', $\Phi_k$ is the predicted content of secondary structure element Φ in the kth protein, $d_k^{\Phi}$ is the corresponding observed content, and Λ is the total number of secondary structure elements considered, i.e. 10 in the current study.
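The two averaged error measures can be sketched in Python as follows: δ_Φ as the mean absolute difference between predicted and observed contents over the proteins tested, and ⟨δ⟩ as the mean of δ_Φ over the elements considered. The numbers are toy values, not results from the paper, the function names are hypothetical, and σ_Φ is not covered by this sketch.

```python
import numpy as np


def mean_abs_error(predicted, observed) -> float:
    """delta_Phi: average |Phi_k - d_k^Phi| over the proteins tested."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed)))


def overall_error(per_element: dict[str, float]) -> float:
    """<delta>: mean of delta_Phi over the Lambda elements considered."""
    return sum(per_element.values()) / len(per_element)


# Toy numbers (not from the paper): three proteins, two elements.
deltas = {
    "alpha": mean_abs_error([0.5, 0.2, 0.1], [0.4, 0.2, 0.3]),
    "beta": mean_abs_error([0.1, 0.3, 0.0], [0.1, 0.5, 0.0]),
}
print(deltas, round(overall_error(deltas), 4))
```

In the paper Λ = 10, so `per_element` would hold one entry per secondary structure element.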

Self-consistency test

In this test, the rule parameters derived from the 628 proteins in Table I by eqns 7–8 were used to predict the secondary structure content of the same proteins by eqn 9. The resulting 10 sets of 1st-order coupled rule parameters (each containing 401 coefficients), for predicting the contents of α-helices, β-sheets, their parallel and antiparallel proportions, β-bridges, 3₁₀-helices, π-helices, H-bonded turns, bends and random coils, are given in Appendix B. The results of the self-consistency test for the 628 proteins in Table I are given in Table II: the average absolute errors for the prediction of α-helices and β-sheets are 0.056 and 0.046, with standard deviations of 0.008 and 0.005, respectively. For the other secondary structure elements, except the parallel and antiparallel β-strand proportions, the average errors were all ≤0.020 with standard deviations ≤0.001. The average absolute errors for the parallel and antiparallel β-strand proportions are relatively large. The overall average error for all 10 secondary structure elements is 0.062; excluding these two elements, it becomes 0.028, indicating excellent self-consistency of the 1st-order coupled composition regression algorithm. The predicted and observed contents of α-helices and β-sheets in each of the 628 proteins are compared in Figure 1a and b, respectively.


Table II. Prediction errors in the self-consistency tests using the 0th- and 1st-order coupled amino-acid-composition algorithms, respectively, for the 628 proteins in Table I
 


Fig. 1. Comparison of the predicted and observed contents of (a) α-helices and (b) β-sheets for the 628 proteins of Table I in the self-consistency test using the 1st-order coupled algorithm.

 
For comparison, the self-consistency test was also performed on the same protein dataset with the prediction algorithm based on the conventional amino-acid-composition (eqns 10–12); the results are also listed in Table II. Owing to space limitations, the corresponding 0th-order coupled rule-parameters are not given here. As Table II shows, the average error for each of the 10 secondary structure elements obtained by the 0th-order coupled algorithm (eqn 10) was almost twice that of the 1st-order coupled algorithm (eqn 9). The overall average error for all 10 elements is 0.119 for the 0th-order coupled algorithm versus 0.062 for the 1st-order coupled algorithm, indicating a significant improvement in self-consistency once the coupling effect is taken into account.

Although the prediction errors reported above are very small, they are merely the results of a self-consistency test on a limited number of proteins. In the self-consistency test, the secondary structure content of each protein in the training dataset is predicted with coefficients derived from the same dataset; in other words, the rule parameters already include information about the protein being tested. This inevitably gives an overly optimistic error estimate because of the memorization effect. Nevertheless, the self-consistency test is necessary because it reflects the consistency of a prediction method, especially of its algorithm: a prediction algorithm cannot be deemed good if it is not self-consistent. The self-consistency test is thus necessary but not sufficient for evaluating a prediction method. As a complement, a cross-validation examination based on an independent testing dataset is given below.

Independent-dataset test

Testing on proteins not present in the training dataset is important because it reflects the effectiveness of a prediction method, in particular whether the training dataset contains sufficient information about all the important features to yield high prediction quality in practice. For cross-validation, an independent testing dataset of 52 proteins with known structures was constructed (Table III). The sequence similarity between any two proteins in this dataset, or between a protein in this dataset and any protein in the training dataset (Table I), is no more than 35%. The secondary structure contents of these proteins were calculated with the rule parameters derived from the training dataset by the 0th- and 1st-order coupled algorithms, respectively. The predicted contents of α-helices and β-sheets, together with the observed values, are listed in Table III. For each of the 52 proteins, the α-helix and β-sheet contents predicted by the 1st-order coupled algorithm are much closer to the observed values than those predicted by the 0th-order coupled algorithm.


Table III. Predicted results for 52 independent proteins using the 0th- and 1st-order coupled amino-acid-composition algorithms, respectively
 
It is intriguing that the improvement in prediction quality gained by taking the 1st-order coupling effect into account may provide new structural or even biological insight by correcting errors made by the 0th-order coupled algorithm. For example, the hydrophobic protein from soybean (1hyp.pdb) is a typical α protein (Figure 2a): its observed α-helix content is 47% and its β-sheet content 0%. According to the predictions of the 0th-order coupled algorithm, however, its α-helix and β-sheet contents are 24% and 16%, respectively (Table III), so its overall structure would be incorrectly classified as α/β or α + β. Predicted by the 1st-order coupled algorithm, the α-helix and β-sheet contents of the same protein are 45% and 7%, respectively (Table III), so its structural class would be correctly assigned as α (Chou et al., 1998). The A-chain of PF3 single-stranded DNA binding protein (1pfsA.pdb) is a β protein (Figure 2b), with observed α-helix and β-sheet contents of 0% and 63%. The corresponding contents predicted by the 0th-order coupled algorithm are 24% and 23% (Table III), again leading to an incorrect structural class assignment, whereas the 1st-order coupled algorithm predicts 3% and 63%, so the protein would be correctly assigned as a β protein (Chou et al., 1998). Likewise, the I-chain of subtilisin Carlsberg (1cseI.pdb) and acylphosphatase (1aps.pdb) belong to the α/β (Figure 2c) and α + β (Figure 2d) classes, respectively. According to the predictions of the 0th-order coupled algorithm (Table III), both would be incorrectly assigned as β proteins; in both cases the error is corrected by the predictions of the 1st-order coupled algorithm (Table III).



Fig. 2. The overall structural features of (a) 1hyp.pdb, the hydrophobic protein from soybean (Lehmann et al., 1989), (b) 1pfsA.pdb, the A-chain of PF3 single-stranded DNA binding protein (Folmer et al., 1995), (c) 1cseI.pdb, the I-chain of subtilisin Carlsberg (Bode et al., 1987) and (d) 1aps.pdb, acylphosphatase (Pastore et al., 1992). See Table III for the observed contents of α-helices and β-sheets in these proteins and the corresponding values predicted by the 0th-order and 1st-order coupled algorithms.

 
There is much evidence that the overall structural features of a protein are correlated with its biological function. For example, the low-frequency motion (such as an accordion-like or breathing-like motion) of α-helices, β-sheets or β-barrels in some proteins is vitally important for their biological function (Chou, 1988, 1989), and the mode of this low-frequency collective motion is determined by the content or size of the α-helices and β-sheets as well as by their arrangement in the protein. Also, many enzymes have an overall α/β structure (Farber and Petsko, 1990). It has recently been found that the singular points of protein β-sheets are usually near the active site and may help to place the catalytic residues in their proper relative positions (Liu and Chou, 1998). Accordingly, the improved accuracy of the 1st-order coupled algorithm in predicting α-helix and β-sheet contents might provide useful insights into not only a protein's overall structural features but also its biological function.

It should be pointed out that, although in principle the algorithm formulated here can predict the percentages of parallel and antiparallel β-sheets in a protein, its results for these are considerably poorer than for the other secondary structure elements. Improving them may require incorporating additional effects into the algorithm.


    Conclusion
 
The conventional amino-acid-composition defined in eqn 1 is a 0th-order coupled composition. Using instead the 1st-order coupled composition formulated by eqn 2 can improve the prediction quality of protein secondary structure content. For example, the average absolute errors for predicting the contents of α-helices and β-sheets were 0.056 and 0.046 in the self-consistency test (Table II), whereas with the prediction algorithm based on the 0th-order coupled composition the corresponding errors were 0.103 and 0.090, respectively. This is consistent with results showing that incorporating the sequence coupling effect can improve the prediction quality of protein secondary structure (Chou, 1997a,b; Chou and Blinn, 1997). Moreover, examples showed that the improved accuracy can correct errors made by the 0th-order coupled algorithm about the overall structure of a protein, which may in turn provide useful insight into its biological function.


    Appendix A
 
For the reader's convenience, we show here that eqn 8 usually has a unique solution if N, the number of proteins in the training dataset, is equal to or greater than 401. Substituting eqn 7 into eqn 8, we obtain
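$$\sum_{k=1}^{N}\left(d_k^{\alpha} - \sum_{j=0}^{400} c_j^{\alpha}\, y_{k,j}\right) y_{k,i} = 0, \qquad i = 0, 1, \ldots, 400$$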

where $y_{k,0} = 1$ is a dummy variable. In matrix form, the above equation can be written as
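$$\mathbf{X}^{\mathrm{T}}\left(\mathbf{D}_\alpha - \mathbf{X}\,\mathbf{C}_\alpha\right) = \mathbf{0}$$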

where
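$$\mathbf{X} = \begin{bmatrix} 1 & y_{1,1} & y_{1,2} & \cdots & y_{1,400} \\ 1 & y_{2,1} & y_{2,2} & \cdots & y_{2,400} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & y_{N,1} & y_{N,2} & \cdots & y_{N,400} \end{bmatrix}$$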

the superscript T is the transposition operator, and
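$$\mathbf{C}_\alpha = \begin{bmatrix} c_0^{\alpha} & c_1^{\alpha} & \cdots & c_{400}^{\alpha} \end{bmatrix}^{\mathrm{T}}, \qquad \mathbf{D}_\alpha = \begin{bmatrix} d_1^{\alpha} & d_2^{\alpha} & \cdots & d_N^{\alpha} \end{bmatrix}^{\mathrm{T}}$$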

Accordingly, we have
$$\mathbf{X}^{\mathrm{T}}\mathbf{X}\,\mathbf{C}_\alpha = \mathbf{X}^{\mathrm{T}}\mathbf{D}_\alpha$$

If $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is invertible, $\mathbf{C}_\alpha$ has the unique solution
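$$\mathbf{C}_\alpha = \left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{D}_\alpha$$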

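Because $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is a 401 × 401 matrix and $\mathrm{rank}(\mathbf{X}^{\mathrm{T}}\mathbf{X}) = \mathrm{rank}(\mathbf{X}) \le \min(N, 401)$, its invertibility requires N ≥ 401. When N ≥ 401 and the N proteins selected for the training dataset are not homologous to one another, $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is usually invertible.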
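The rank argument can be checked numerically. This NumPy sketch forms X = [1 | Y], solves the normal equations XᵀX C = Xᵀ D, and raises an error when XᵀX is rank-deficient. It uses a toy size of p = 20 components instead of 400, random data, and a hypothetical function name. With fewer proteins than coefficients (N < p + 1) the matrix is always singular; with N ≥ p + 1 and generic data it is invertible.

```python
import numpy as np


def normal_equation_solve(Y: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Solve (X^T X) C = X^T D for C, with X = [1 | Y]."""
    X = np.column_stack([np.ones(len(Y)), Y])
    XtX = X.T @ X
    # rank(X^T X) = rank(X) <= min(N, p + 1): with fewer proteins than
    # coefficients the matrix is singular and C is not unique.
    if np.linalg.matrix_rank(XtX) < XtX.shape[0]:
        raise np.linalg.LinAlgError("X^T X is singular; C is not unique")
    return np.linalg.solve(XtX, X.T @ d)


rng = np.random.default_rng(1)
p = 20                                   # toy number of components
for n_proteins in (10, 40):
    Y = rng.random((n_proteins, p))
    d = rng.random(n_proteins)
    try:
        normal_equation_solve(Y, d)
        print(n_proteins, "unique solution")
    except np.linalg.LinAlgError:
        print(n_proteins, "singular: need N >= p + 1 =", p + 1)
```

N ≥ p + 1 is necessary but not sufficient: duplicated or near-identical rows (for example, highly homologous proteins) can still make XᵀX singular or ill-conditioned.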


    Appendix B
 
The ten sets of 1st-order coupled rule-parameters for eqn 9 to calculate the contents of secondary structure elements

 

    Acknowledgments
 
This work was supported in part by a grant from the National Science Foundation (NSF DUE-9555408). We also thank the two anonymous referees, whose constructive comments were very helpful for improving the paper.


    Notes
 
² To whom correspondence should be addressed


    References
 
Bahar,I., Atilgan,A.R., Jernigan,R.L. and Erman,B. (1997) Proteins, 29, 172–185.

Bode,W., Papamokos,E. and Musil,D. (1987) Eur. J. Biochem., 166, 673–692.

Bussian,B.M. and Sander,C. (1989) Biochemistry, 28, 4271–4277.

Chou,K.C. (1988) Biophys. Chem., 30, 3–48.

Chou,K.C. (1995) Proteins Struct. Funct. Genet., 21, 319–344.

Chou,K.C. (1997a) J. Peptide Res., 49, 120–144.

Chou,K.C. (1997b) Biopolymers, 42, 837–853.

Chou,K.C. and Blinn,J.R. (1997) J. Protein Chem., 16, 575–595.

Chou,K.C. and Elrod,D.W. (1999) Protein Engng, 12, 107–118.

Chou,K.C. and Zhang,C.T. (1995) Crit. Rev. Biochem. Mol. Biol., 30, 275–349.

Chou,K.C., Liu,W., Maggiora,G.M. and Zhang,C.T. (1998) Proteins Struct. Funct. Genet., 31, 97–103.

Chou,P.Y. (1980) Amino Acid Composition of Four Classes of Proteins. In Abstracts of Papers, Part I, Second Chemical Congress of the North American Continent, Las Vegas.

Chou,P.Y. (1989) Prediction of Protein Structural Classes from Amino Acid Composition. In Fasman,G.D. (ed.), Prediction of Protein Structure and the Principles of Protein Conformation. Plenum Press, New York, pp. 549–586.

Chou,P.Y. and Fasman,G.D. (1978) Adv. Enzymol. Relat. Subj. Biochem., 47, 45–148.

Dubchak,I., Holbrook,S.R. and Kim,S.-H. (1993) Proteins, 16, 79–91.

Farber,G.K. and Petsko,G.A. (1990) Trends Biochem. Sci., 15, 228–234.

Fasman,G.D. (1989) The Development of the Prediction of Protein Structure. In Fasman,G.D. (ed.), Prediction of Protein Structure and the Principles of Protein Conformation. Plenum Press, New York, pp. 317–358.

Folmer,R.H., Nilges,M., Konings,R.N. and Hilbers,C.W. (1995) EMBO J., 14, 4132–4142.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Krigbaum,W.R. and Knutton,S.P. (1973) Proc. Natl Acad. Sci. USA, 70, 2809–2813.

Lehmann,M.S., Pebay-Peyroula,E., Cohen-Addad,C. and Odani,S. (1989) J. Mol. Biol., 210, 235–236.

Liu,W. and Chou,K.C. (1997) J. Protein Chem., 17, 209–217.

Liu,W. and Chou,K.C. (1998) Protein Sci., 7, 2324–2330.

Muskal,S.M. and Kim,S.-H. (1992) J. Mol. Biol., 225, 713–727.

Nakashima,H., Nishikawa,K. and Ooi,T. (1986) J. Biochem., 99, 152–162.

Pastore,A., Saudek,V., Ramponi,G. and Williams,R.J.P. (1992) J. Mol. Biol., 224, 427–440.

Sreerama,N. and Woody,R.W. (1994) J. Mol. Biol., 242, 497–507.

Zhang,C.T., Zhang,Z. and He,Z. (1996) J. Protein Chem., 15, 775–786.

Zhang,C.T., Zhang,Z. and He,Z. (1998) J. Protein Chem., 17, 261–272.

Received March 9, 1999; revised July 24, 1999; accepted August 5, 1999.