Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
1 To whom correspondence should be addressed. e-mail: hpark@cs.umn.edu
Abstract
Keywords: directed acyclic graph scheme/position-specific scoring matrix/protein structure prediction/secondary structure/support vector machines
Introduction
A support vector machine (SVM) constructs an optimal separating hyperplane, one that maximizes the margin (i.e. the distance between the hyperplane and the nearest data point of each class), by mapping the input space into a high-dimensional feature space. The mapping is determined by a kernel function. Training with SVMs has crucial advantages: convergence is fast, typically 1–2 orders of magnitude faster than for neural networks (NNs) (Ding and Dubchak, 2001); SVMs tend not to over-fit; and the problem can be formulated as the minimization of a convex quadratic function, which is easier to solve (Vapnik, 1995, 1998; Burges and Schölkopf, 1997; Osuna et al., 1997; Burges, 1998; Cristianini and Shawe-Taylor, 2000; Hua and Sun, 2001). The previous study of secondary structure prediction using support vector machines (Hua and Sun, 2001) achieved good results by using frequency profiles with evolutionary information and by removing the influence of noise and outliers, discarding a fraction of samples that are hard to predict because they are located near the optimal separating hyperplane. However, that prediction level does not compare favorably with recent results of neural network approaches.
The recent approaches based on neural networks, for example PSIPRED and Jnet, have been successfully advanced by PSI-BLAST PSSM (position-specific scoring matrix) profiles (Jones, 1999), which are derived by an iterative strategy from sequences that have remote similarities. Jones (Jones, 1999) expected that other secondary structure prediction methods would show measurable improvements in accuracy by using PSI-BLAST profiles instead of the multiple sequence alignment approach, and Hua and Sun (Hua and Sun, 2001) have already pointed out that it is possible to achieve significant improvement by incorporating PSI-BLAST-generated profiles into the SVM approach.
In this paper, we show the improvement in prediction accuracy achieved by new tertiary classifiers and their jury decision system, by efficient methods for handling unbalanced data and by a new optimization strategy for support vector machines that maximizes the Q3 measure. We apply this to PSI-BLAST profiles, in order to improve the current prediction level and to show that the support vector machine approach is a valid method for secondary structure prediction. We further investigate a new way to reduce the influence of noise and outliers by using the theoretical relationships in the soft margin support vector machine. Training sets with an unbalanced number of data items in each class can produce an ill-balanced binary classifier with low recall for the smaller class; such a classifier may not produce a good final prediction result, in spite of high prediction accuracy of each binary classifier constituting the cascaded tertiary classifier. We adopted the one-versus-one scheme and the directed acyclic graph (DAG) scheme (Heiler, 2002) for handling the three-class problem, since these demonstrate better performance for multi-class classification (Hsu and Lin, 2002). We built a jury decision system over all the designed tertiary classifiers to obtain better prediction accuracy. Here we show that SVMpsi achieves the most accurate published Q3 and SOV94 scores on the RS126 (Rost and Sander, 1993) and CB513 (Cuff and Barton, 1999) data sets. In the fifth Critical Assessment of Structure Prediction (CASP5) experiment, we predicted the most accurate structure among all groups for five proteins; the average Q3 and SOV3 scores for SVMpsi were 79.10% and 79.38%, respectively. The results demonstrate that SVMpsi is one of the most promising methods for protein secondary structure prediction.
Materials and methods
The secondary structure is assigned from the experimentally determined tertiary structure by DSSP (Kabsch and Sander, 1983), STRIDE (Frishman and Argos, 1995) or DEFINE (Richards and Kundrot, 1988). We use DSSP since it has been the most widely used secondary structure definition. It has eight secondary structure classes: H (α-helix), G (3₁₀-helix), I (π-helix), E (β-strand), B (isolated β-bridge), T (turn), S (bend) and '-' (rest). Reduction from the eight classes to the three states helix (H), sheet (E) and coil (C) is then done by one of the following methods:
1. H,G and I to H; E to E; all other states to C
2. H,G to H; E,B to E; all other states to C
3. H,G to H; E to E; all other states to C
4. H to H; E,B to E; all other states to C
5. H to H; E to E; all other states to C
The choice of 8- to 3-state reduction method can alter the apparent prediction accuracy (Cuff and Barton, 1999). Although we could expect an accuracy increase from using method 5, we used different methods for different data sets to provide a fair comparison of our results with other methods; the details are discussed in the subsection presenting our data sets. The five mappings are sketched in code below.
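A minimal sketch of the five reductions as lookup tables, assuming any DSSP code not listed falls through to coil (the helper name is ours):

```python
# Sketch of the five DSSP 8-to-3 state reductions described above.
# Any DSSP code not explicitly mapped falls through to coil (C).
REDUCTIONS = {
    1: {"H": "H", "G": "H", "I": "H", "E": "E"},
    2: {"H": "H", "G": "H", "E": "E", "B": "E"},
    3: {"H": "H", "G": "H", "E": "E"},
    4: {"H": "H", "E": "E", "B": "E"},
    5: {"H": "H", "E": "E"},
}

def reduce_to_three_states(dssp_string: str, method: int) -> str:
    """Map an 8-state DSSP assignment string to H/E/C."""
    mapping = REDUCTIONS[method]
    return "".join(mapping.get(s, "C") for s in dssp_string)

# Example: method 2 (used for RS126 and CB513 in this paper)
assert reduce_to_three_states("HGIEBTS-", 2) == "HHCEECCC"
```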
We tested protein secondary structure prediction using PSI-BLAST profiles and designed classifiers for the three-class problem based on the binary classifiers generated by SVMs. The final position-specific scoring matrices from PSI-BLAST searches against the SWALL (Bairoch and Apweiler, 2000) non-redundant protein sequence database are used. We applied PFILT (Jones et al., 1994; Jones and Swindells, 2002) to mask out regions of low-complexity sequence, coiled-coil regions and transmembrane spans. For PSI-BLAST, an E-value inclusion threshold of 0.001 and three iterations were used to search the non-redundant sequence database.
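The paper predates BLAST+, but as a hedged illustration an equivalent profile generation with the modern psiblast tool might look like the following; the database and file names are placeholders:

```python
import subprocess

# Generate a PSSM with three PSI-BLAST iterations and an inclusion
# E-value of 0.001, matching the settings described above.
# "swall_nr" and the file names are placeholders, not the paper's paths.
subprocess.run(
    [
        "psiblast",
        "-query", "target.fasta",         # filtered with PFILT beforehand
        "-db", "swall_nr",                # non-redundant sequence database
        "-num_iterations", "3",
        "-inclusion_ethresh", "0.001",
        "-out_ascii_pssm", "target.pssm", # final position-specific scoring matrix
    ],
    check=True,
)
```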
The position-specific scoring matrix has 20 × N elements, where N is the length of the target sequence and each element represents the log-likelihood of a particular residue substitution, based on a weighted average of BLOSUM62 (Henikoff and Henikoff, 1992) matrix scores, at a given alignment position in the template. The profile matrix elements in the range [−7, 7] are scaled to the [0, 1] range by a fixed scale function of the raw profile value x; we selected this function, after testing various scale functions, to maximize the Q3 score. As in the PHD coding scheme, we used a sliding window method (Qian and Sejnowski, 1988; Rost and Sander, 1993). In order to allow a window to extend over the N-terminus and the C-terminus, an additional 21st unit was appended for each residue. Therefore, each input vector has 21 × w components, where w is the sliding window size. The window is shifted residue by residue through the protein chain. We constructed three one-versus-rest classifiers, each of which determines whether the secondary structure of a residue is in a particular state or not (H/¬H, E/¬E, C/¬C), and three one-versus-one classifiers (H/E, E/C and C/H).
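A minimal sketch of this encoding, with a logistic curve standing in for the paper's unspecified scale function (the actual function was chosen empirically to maximize Q3 and is not reproduced here):

```python
import numpy as np

def scale(x):
    # Stand-in scale function mapping raw PSSM values to [0, 1];
    # the paper selected its own function empirically.
    return 1.0 / (1.0 + np.exp(-x))

def encode_windows(pssm: np.ndarray, w: int = 15) -> np.ndarray:
    """Encode an N x 20 PSSM into N sliding-window vectors of 21 * w features.

    The 21st unit per window position flags cells that fall outside the
    chain (over the N- or C-terminus)."""
    n = pssm.shape[0]
    scaled = scale(pssm)                       # elementwise, N x 20
    half = w // 2
    feats = np.zeros((n, w * 21))
    for i in range(n):
        for k, j in enumerate(range(i - half, i + half + 1)):
            if 0 <= j < n:
                feats[i, k * 21 : k * 21 + 20] = scaled[j]
            else:
                feats[i, k * 21 + 20] = 1.0    # terminus indicator
    return feats

# Example: a random 100-residue profile with window length 15
X = encode_windows(np.random.randint(-7, 8, size=(100, 20)).astype(float))
assert X.shape == (100, 15 * 21)
```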
Prediction accuracy assessment
Several standard performance measures were used to assess prediction accuracy. Q3 is the three-state overall percentage of correctly predicted residues:

Q3 = (pH + pE + pC)/t × 100%

where t is the total number of residues. The correlation coefficient (CH, CE, CC) introduced by Matthews (Matthews, 1975) is

Ci = (pi ri − ui oi) / [(pi + ui)(pi + oi)(ri + ui)(ri + oi)]^1/2

where pi is the number of correctly predicted residues in conformation state i, ri the number of those correctly rejected, ui the number of those incorrectly rejected (false negatives) and oi the number of those incorrectly predicted to be in the class (false positives), for i = H, E, C. The per-residue accuracy (QH, QE, QC; QHpre, QEpre, QCpre) for each type of secondary structure (Hua and Sun, 2001) was also calculated as

Qi = pi/(pi + ui) × 100%

and

Qipre = pi/(pi + oi) × 100%

where conformation state i can be H, E or C.
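A small self-contained sketch of these measures (the function name is ours):

```python
import math

def prediction_measures(true_ss: str, pred_ss: str, state: str = "H"):
    """Q3 over all residues plus Matthews correlation, per-state accuracy
    (Qi) and precision (Qi_pre) for one conformation state."""
    assert len(true_ss) == len(pred_ss)
    t = len(true_ss)
    q3 = 100.0 * sum(a == b for a, b in zip(true_ss, pred_ss)) / t
    # p: true positives, r: true negatives, u: false negatives, o: false positives
    p = sum(a == state and b == state for a, b in zip(true_ss, pred_ss))
    r = sum(a != state and b != state for a, b in zip(true_ss, pred_ss))
    u = sum(a == state and b != state for a, b in zip(true_ss, pred_ss))
    o = sum(a != state and b == state for a, b in zip(true_ss, pred_ss))
    denom = math.sqrt((p + u) * (p + o) * (r + u) * (r + o)) or 1.0  # avoid /0
    c_i = (p * r - u * o) / denom
    q_i = 100.0 * p / (p + u) if p + u else 0.0
    q_i_pre = 100.0 * p / (p + o) if p + o else 0.0
    return q3, c_i, q_i, q_i_pre

print(prediction_measures("HHEECC", "HHECCC", state="E"))
```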
The segment overlap measure (SOV) evaluates secondary structure prediction by secondary structure segments rather than by individual residues (Rost and Sander, 1994; Zemla et al., 1999). SOV is calculated as

SOV = 100 × (1/N) Σi Σ(s1,s2)∈S(i) [(minov(s1, s2) + δ(s1, s2)) / maxov(s1, s2)] × len(s1)

where S(i) is the set of all overlapping pairs of segments (s1, s2) in conformation state i, len(s1) is the number of residues in segment s1, minov(s1, s2) is the length of the actual overlap and maxov(s1, s2) is the total extent of the segment pair. The quality of match of each segment pair is taken as the ratio of the overlap of the two segments, minov(s1, s2), to the total extent of that pair, maxov(s1, s2). The definitions of δ and of the normalization factor N differ between SOV94 (Rost and Sander, 1994) and SOV99 (Zemla et al., 1999). We calculated SOV94 for RS126 and CB513 to compare the results, since the PHD (Rost and Sander, 1994), PSIPRED (Jones, 1999) and SVMfreq (Hua and Sun, 2001) methods used SOV94.
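A minimal sketch of the segment-pair core of this measure, with δ left as a parameter since its definition differs between SOV94 and SOV99 (segment extraction and function names are our own):

```python
def segments(ss: str, state: str):
    """Yield (start, end) half-open intervals of maximal runs of `state`."""
    start = None
    for i, s in enumerate(ss + "#"):            # sentinel closes the last run
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            yield (start, i)
            start = None

def sov_pair_score(s1, s2, delta=0):
    """Score one overlapping pair: (minov + delta)/maxov * len(s1).

    delta is left at 0 here; SOV94 and SOV99 each define their own
    (different, bounded) allowance term."""
    minov = min(s1[1], s2[1]) - max(s1[0], s2[0])   # actual overlap
    maxov = max(s1[1], s2[1]) - min(s1[0], s2[0])   # total extent of the pair
    if minov <= 0:
        return 0.0                                   # not an overlapping pair
    return (minov + delta) / maxov * (s1[1] - s1[0])

obs, pred = "CHHHHCCEEC", "CCHHHHCEEC"
for s1 in segments(obs, "H"):
    for s2 in segments(pred, "H"):
        print(sov_pair_score(s1, s2))   # 3/5 * 4 = 2.4 for this example
```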
Training and testing data sets
For comparing our new results with previously published results (Hua and Sun, 2001) that used a frequency-based coding scheme, we selected the non-homologous RS126 and CB513 data sets. The results show that PSI-BLAST profiles are also helpful in improving accuracy in the SVM approach. The CB513 set includes the CB396 data set and almost all proteins of RS126, except nine homologues for which the SD significance score is >5 (Cuff and Barton, 1999). The SD score is a more stringent measure of sequence similarity than the percentage identity, since it corrects for bias due to the length and composition of sequences.
We prepared a data set of 480 proteins by removing from CB513 the proteins that have <30 residues and those for which only a few sequences were found in the first iteration of PSI-BLAST. The 16 proteins shorter than 30 residues were removed since it has been shown that they do not have well defined secondary structure (Cuff and Barton, 1999). The prepared KP480 data set may not be the same as the 480-protein data set of Jnet (Cuff and Barton, 2000), although both are generated from CB513 by removing proteins shorter than 30 residues. Each data set is divided into seven folds with a similar number of proteins and a similar composition of secondary structure, to perform cross-validation tests.
In addition to the cross-validation tests for assessing the performance of the prediction method, we prepared a blind test set of 136 protein sequences that were not used in the training set. The test set was prepared using a structural similarity criterion, so that it contains no protein from the same fold family, i.e. the CATH (Orengo et al., 1997) T-level, as any protein in the CB513 training set. Each protein sequence of the test set represents a unique protein fold. Only highly resolved structures (resolution <1.8 Å) whose length is >60 and <600 residues were included in the blind test set. The structural similarity criterion is more stringent than the SD score, which is a measure of pairwise sequence similarity (Cuff and Barton, 1999; Jones, 1999); hence there is no pair of similar sequences between the training and blind test sets. We used 8- to 3-state reduction method 2 for the RS126 data set to provide a fair comparison of our results with those of other methods such as PHD, DSC, PREDATOR, NNSSP and their consensus method (Cuff and Barton, 1999), although the PHD (Rost and Sander, 1993) and SVMfreq (Hua and Sun, 2001) methods based on frequency profiles used reduction method 1. Reduction method 4 (H to H; E, B to E; all other states to C) was used for the KP480 set for comparison with the Jnet result, which is based on the same reduction; in Jnet, the 3₁₀-helix was removed since it involves only a weak (about 1 kcal/mol) hydrogen bond and so does not represent core secondary structure. We adopted reduction method 2, one of the most widely used, for the 7-fold cross-validation test on the CB513 data set and for the blind test of 136 non-redundant sequences, in order to compare the prediction performance of our method with that of other methods.
Results
In an L1 soft margin support vector machine (Vapnik, 1995, 1998), we need to select a kernel function and the regularization parameter C in each binary classifier in order to construct a classifier for multiple classes. The primal formulation of the soft margin SVM maximizes the margin and minimizes the training error simultaneously by solving the following optimization problem:

minimize (1/2)‖w‖² + C Σi ξi
subject to yi(wᵀφ(xi) + b) ≥ 1 − ξi and ξi ≥ 0 for i = 1, ..., n

where xi represents an input vector, yi = ±1 according to whether xi is in the positive or negative class, n is the number of training data and C is a parameter that controls the trade-off between the margin and the classification error represented by the slack variables ξi. The separating hyperplane in the mapped high-dimensional feature space can be represented as wᵀφ(x) + b = 0, where w is the solution of the primal formulation and φ(·) is a non-linear function which maps the input space into a higher dimensional space.

The corresponding dual quadratic programming problem, with the application of a kernel function K(xi, xj) = φ(xi)ᵀφ(xj), can be written as

maximize Σi αi − (1/2) Σi Σj αi αj yi yj K(xi, xj)
subject to 0 ≤ αi ≤ C and Σi αi yi = 0

where the αi are the solutions of the dual formulation. The dual formulation of the soft margin SVM with control parameter C shows that the influence of a single training example is limited by C. Our extensive tests show that the RBF (radial basis function) kernel, defined as

K(xi, xj) = exp(−γ‖xi − xj‖²)

is appropriate for complex classification problems when the parameters γ and C are selected by the optimization process described below.
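A minimal sketch of one binary classifier under these settings, using scikit-learn's SVC as a stand-in for the SVMlight implementation the authors used (γ = 0.05 and C = 1.0 are the values reported below for RS126; the data here are random placeholders):

```python
import numpy as np
from sklearn.svm import SVC

# One binary classifier, e.g. H versus not-H, on window-encoded profiles.
# X: (n_samples, 21 * w) feature matrix; y: +1 for H, -1 otherwise.
rng = np.random.default_rng(0)
X = rng.random((200, 21 * 15))
y = np.where(rng.random(200) > 0.5, 1, -1)

clf = SVC(kernel="rbf", gamma=0.05, C=1.0)
clf.fit(X, y)

# Signed distance to the separating hyperplane (used later for the
# SVM_MAX_D tertiary classifier and the reliability index).
d = clf.decision_function(X[:5])
print(clf.predict(X[:5]), d)
```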
Since multi-class classification in the support vector machine is built from binary classifiers, the criteria for selecting the optimal parameters γ and C in each binary classifier play a critical role. A common practice is to choose the parameters that maximize the accuracy (i.e. the number of correct predictions) of each binary classifier. However, the optimization criteria should depend on the performance measure of the final results that we would like to optimize; in protein secondary structure prediction, two of the most commonly used measures are Q3 and SOV3. We illustrate this point with the Q3 measure and the three-class classifier built upon two binary classifiers, which determine membership in H/¬H (H versus not H) in step 1 and E/C (E versus C) in step 2. With the total number t of training data items, Q3 can be represented as

Q3 = (pH + pE + pC)/t

where

t = pH + rH + uH + oH

using the notation introduced before. To reflect the fact that the value of Q3 depends on the results from both steps 1 and 2, Q3 may be rewritten in various ways, including

Q3 = pH/t + [#(¬H)/t] × [(pE + pC)/#(¬H)]

where #(¬H) denotes the number of data items not in H. The difficulty comes from the fact that the result of step 2 depends on the result of step 1, and there is no easy way to reflect this in the expression for Q3. We therefore chose our optimized parameters by fine tuning based on the accuracy, recall and precision of each step. The recall (R) and precision (P) for H and ¬H in step 1 are defined as

R(H) = pH/(pH + uH), P(H) = pH/(pH + oH)
R(¬H) = rH/(rH + oH), P(¬H) = rH/(rH + uH)

Unlike in query processing, it is important in classification to consider the recall and precision of both the positive and the negative class. The optimized parameters, chosen based on the results of each step in both binary classifiers, are γ = 0.05 and C = 1.0 on the RS126 and KP480 data sets, and γ = 0.05 and C = 2.5 on the CB513 data set, for the PSI-BLAST profiles.
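A minimal sketch of the kind of grid search implied here, scoring candidate (γ, C) pairs by per-class recall and precision rather than raw accuracy alone (the function name and the worst-case scoring rule are our own illustrative choices, not the paper's exact criterion):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

def search_params(X, y, gammas=(0.01, 0.05, 0.1), Cs=(0.5, 1.0, 2.5)):
    """Pick (gamma, C) by the worst of recall/precision over both classes,
    an illustrative stand-in for the paper's fine-tuning criterion."""
    best, best_score = None, -1.0
    for g in gammas:
        for C in Cs:
            pred = cross_val_predict(SVC(kernel="rbf", gamma=g, C=C), X, y, cv=3)
            scores = []
            for cls in (+1, -1):                        # e.g. H and not-H
                tp = np.sum((y == cls) & (pred == cls))
                recall = tp / max(np.sum(y == cls), 1)
                precision = tp / max(np.sum(pred == cls), 1)
                scores += [recall, precision]
            if min(scores) > best_score:
                best, best_score = (g, C), min(scores)
    return best
```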
In a soft margin SVM, the support vectors satisfy the following relationships:

αi = 0 ⇒ yi(wᵀφ(xi) + b) ≥ 1
0 < αi < C ⇒ yi(wᵀφ(xi) + b) = 1
αi = C ⇒ yi(wᵀφ(xi) + b) ≤ 1

where a training point xi is a support vector (SV) only when αi > 0. To reduce the influence of noise and outliers, after finding the support vectors in the training stage, the support vectors that are close to the optimal separating hyperplane or on the wrong side of it can be partly or totally removed by ignoring those with αi = C in the retraining stage. However, this strategy had no significant effect on the prediction results when our PSI-BLAST-based encoding scheme was used, which shows that the PSI-BLAST profile makes secondary structure prediction robust in the presence of noise and outliers.
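A sketch of this filtering step with scikit-learn, which stores yi·αi in dual_coef_, so bound support vectors are those with |dual coefficient| = C (function name ours):

```python
import numpy as np
from sklearn.svm import SVC

def retrain_without_bound_svs(X, y, gamma=0.05, C=1.0):
    """Train, drop training points whose alpha_i reached the bound C
    (candidate noise/outliers), then retrain on the remainder."""
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    alphas = np.abs(clf.dual_coef_[0])           # alpha_i of each support vector
    bound_svs = clf.support_[alphas >= C - 1e-8] # indices of alpha_i = C points
    keep = np.setdiff1d(np.arange(len(y)), bound_svs)
    return SVC(kernel="rbf", gamma=gamma, C=C).fit(X[keep], y[keep])
```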
Optimal window length for binary classifiers
The optimal window length of the sliding-window coding scheme was obtained by testing the accuracy for various window sizes. When the window is too short, important classification information may be lost and prediction accuracy decreases; when it is too long, unnecessary noise is included. For convenience, we call our method using PSI-BLAST profiles SVMpsi, and Hua and Sun's method (Hua and Sun, 2001) based on frequency profiles SVMfreq. Table I shows that the optimal window length of SVMpsi is much longer than that of SVMfreq on the RS126 data set. The prediction accuracy of the binary classifiers does not change dramatically when a window length >15 is used, which shows that the SVMpsi method deals effectively with noise. We chose a window length of 15 for all results in this paper; this is slightly smaller than the sliding window length of 17 used in the first-layer (sequence-to-structure) neural network of Jnet (Cuff and Barton, 2000).
There are many ways to design a tertiary classifier for secondary structure prediction based on binary classifiers. We used several methods proposed by Hua and Sun (Hua and Sun, 2001) to compare our results with theirs. Their methods are based on three one-versus-rest binary classifiers (H/¬H, E/¬E, C/¬C) and three one-versus-one binary classifiers (E/C, C/H, H/E). Three cascade tertiary classifiers, SVM_TREE1 (H/¬H, E/C), SVM_TREE2 (E/¬E, C/H) and SVM_TREE3 (C/¬C, H/E), were each made up of two binary classifiers. In the SVM_MAX_D tertiary classifier, the class of a testing sample was assigned as the one corresponding to the largest positive distance to the optimal separating hyperplane among the SVM_TREE1, SVM_TREE2 and SVM_TREE3 classifiers. The SVM_VOTE classifier combines all six binary classifiers by a simple voting principle: the testing sample is predicted to be in state i if the largest number of the six binary classifiers classify it as state i. SVM_JURY uses the jury technique to combine the results of all the tertiary classifiers discussed above.
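A minimal sketch of a maximum-distance tertiary decision in the spirit of SVM_MAX_D, here simplified to take the largest decision value over the three one-versus-rest classifiers (a hedged simplification of the combination described above):

```python
import numpy as np

def max_distance_predict(clf_H, clf_E, clf_C, X):
    """Assign each sample the state whose one-versus-rest SVM reports the
    largest signed distance to its separating hyperplane."""
    d = np.column_stack([
        clf_H.decision_function(X),   # H versus not-H
        clf_E.decision_function(X),   # E versus not-E
        clf_C.decision_function(X),   # C versus not-C
    ])
    return np.array(list("HEC"))[np.argmax(d, axis=1)]
```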
We designed two additional tertiary classifiers based on the one-versus-one scheme and the DAG scheme (Heiler, 2002). The one-versus-one classifier for secondary structure prediction chooses the majority result of the three classifiers H/E, E/C and C/H. Many test results show that one-versus-one classifiers are more accurate than one-versus-rest classifiers, because the one-versus-rest scheme often has to deal with two data sets of very different sizes, i.e. unbalanced training data (Heiler, 2002; Hsu and Lin, 2002). However, a potential problem of the one-versus-one scheme is that the voting may suffer from incompetent classifiers. For example, when the test point is helix (H), the result of the one-versus-one classifier E/C, which is not related to helix, inappropriately contributes to the decision. We can reduce this problem by using the DAG scheme, which classifies a new data point after two binary classifications in the three-class problem, without influence from incompetent classifiers. For example, if the testing point is predicted to be E (not C) by the E/C classifier, then the H/E classifier is applied, whereas if the point is predicted to be not sheet (¬E) by the E/C classifier, the C/H classifier is applied to determine whether it is coil or helix. We developed the JURY2 classifier, which combines the results of SVM_MAX_D, SVM_VOTE, ONEvsONE and DAG.
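A minimal sketch of this DAG decision for one sample (classifier objects and label conventions are our own; each binary classifier is assumed to return one of its two state labels):

```python
def dag_predict(clf_EC, clf_HE, clf_CH, x):
    """DAG tertiary decision: the E/C classifier picks which second
    classifier to consult, so only two binary decisions are made."""
    if clf_EC.predict([x])[0] == "E":     # E versus C, root of the DAG
        # Candidate is E or H; the C/H classifier is now irrelevant.
        return clf_HE.predict([x])[0]     # returns "H" or "E"
    else:
        # Candidate is C or H; the H/E classifier is now irrelevant.
        return clf_CH.predict([x])[0]     # returns "C" or "H"
```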
The results of the one-versus-one scheme and the DAG scheme were better than those of SVM_TREE1, SVM_TREE2 or SVM_TREE3. Moreover, they were comparable to those of SVM_MAX_D or SVM_JURY, although only the three one-versus-one classifiers, instead of all six binary classifiers, were used for the decisions (see Table II). This shows that the one-versus-one scheme and the DAG scheme, which use only one-versus-one classifiers, are good approaches to a three-class classification problem such as protein secondary structure prediction, since using one-versus-one rather than one-versus-rest binary classifiers reduces both the computational complexity and the difficulty of large, unbalanced classification problems.
For handling unbalanced data, we used different penalty parameters in the SVM formulation (Osuna et al., 1997):

minimize (1/2)‖w‖² + C⁺ Σ{i: yi = +1} ξi + C⁻ Σ{i: yi = −1} ξi

Using different penalty parameters C⁺ and C⁻, we can resolve the situation where the recall of the smaller class is too low to produce a good secondary structure prediction. We optimized the weight parameters for each binary classifier so as to produce the optimal Q3 or SOV3.
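In scikit-learn terms, asymmetric penalties correspond to per-class weights multiplying the base C; a minimal sketch (the 3:1 weighting is purely illustrative, not the paper's value):

```python
from sklearn.svm import SVC

# C+ and C- realized as class_weight multipliers of the base C: sklearn
# sets the penalty of class i to class_weight[i] * C. Here the positive
# (smaller) class is penalized 3x more strongly for slack, raising its
# recall; the 3.0 is an arbitrary illustration.
clf = SVC(kernel="rbf", gamma=0.05, C=1.0,
          class_weight={+1: 3.0, -1: 1.0})
```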
Reliability index and support vectors
When machine learning approaches are used to predict the secondary structure of a new sequence, it is important to know the reliability of the prediction. We used a reliability index (RI) to express it. A sample that has a large positive distance to the optimal separating hyperplane can be expected to have a high probability of belonging to the positive class, so the reliability index is defined by binning the distance d(i) of the sample in state i to the optimal separating hyperplane of the binary classifier. The thresholds in the reliability index definition are chosen to make the percentage of residues about 22% for RI = 9 and about 12% for RI = 0. Figure 1 shows the average accuracy (Q3) and the percentage of residues covered against the cumulative reliability index of the SVMpsi method for the 136 blind test set proteins; 49.8% of all residues were predicted with RI ≥ 5 and 92% of them were correctly predicted; 22.8% of all residues were predicted with RI = 9 and 96% of them were correctly predicted. The ratio of the number of support vectors to the number of training samples is <50% for each of the six binary classifiers, except the C/¬C binary classifier. This shows that the PSI-BLAST profiles made classification easier than the multiple alignment frequency-based profiles, for which the ratio was about 50%. We developed a protein secondary structure predictor based on SVMlight (Joachims, 1999). As post-processing, a single predicted α-helix residue is changed to the secondary structure of the more reliable prediction (larger distance to the optimal separating hyperplane) at the previous or next residue, since the occurrence of a single-residue helix is not realistic.
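A minimal sketch of that post-processing rule (the function name is ours; d holds per-residue decision distances from the relevant binary classifier):

```python
def smooth_single_helices(pred: list, d: list) -> list:
    """Replace isolated single-residue H predictions with the neighbouring
    prediction whose decision distance d is larger (more reliable)."""
    out = list(pred)
    for i in range(1, len(pred) - 1):
        if pred[i] == "H" and pred[i - 1] != "H" and pred[i + 1] != "H":
            out[i] = pred[i - 1] if d[i - 1] >= d[i + 1] else pred[i + 1]
    return out

print(smooth_single_helices(list("CCHCC"), [0.9, 0.4, 0.2, 0.8, 0.9]))
# -> ['C', 'C', 'C', 'C', 'C']
```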
Discussion
Table III shows that the accuracy of the binary classifiers is significantly improved with SVMpsi. It is interesting that the accuracy of the H/E binary classifier is improved by more than 9%, whereas that of the C/¬C binary classifier is improved by 4.72% on the RS126 data set. Whenever a binary classifier involves the coil class (C/¬C, C/H, C/E), the prediction accuracy is lower. This seems to be because class C comprises states that are not as well defined, so items belonging to class C do not have high within-class consistency. As shown in Table IV, the Q3 and SOV94 of the SVMpsi method based on PSI-BLAST profiles are higher than those of the SVMfreq method based on frequency profiles from multiple sequence alignments (Hua and Sun, 2001), as well as those of PHD, DSC, PREDATOR and NNSSP, for the RS126 data set. The Q3 of SVMpsi also exceeds the 72.0% of the bi-directional recurrent neural network (BRNN) proposed by Baldi et al. (Baldi et al., 1999) on the RS126 set. We can expect higher accuracy if the SVMpsi method is used as a component of a consensus method in conjunction with other good predictors, such as PSIPRED and Jnet. Compared with Hua and Sun's results (Hua and Sun, 2001), the Q3 scores for RS126 and CB513 are improved by 4.9 and 3.1%, respectively, and the SOV94 scores by 5.0 and 3.9%, respectively. This improvement is larger than that of Jnet over PHD (3.1%), so we can say that the improvement from PSI-BLAST profiles is larger for SVMs than for NNs. The SVMpsi method achieves the highest Q3 and SOV94 values published to date on both the RS126 and CB513 data sets. In the blind test of 136 protein sequences, the accuracy weighted by sequence length, the SOV94 score and the SOV99 score were 77.2, 81.8 and 73.9%, respectively. Jones's PSIPRED method based on neural networks (Jones, 1999), which used PSI-BLAST profiles, achieved an overall per-residue accuracy of Q3 = 76.5% and SOV94 = 73.5% on his test set of 187 sequences, after training with over 1000 protein structures. Our results cannot be compared directly with those of PSIPRED, since they used a different training set and test proteins that contain some sequences of the CB513 data set. Cuff and Barton (Cuff and Barton, 2000) obtained Q3 = 75.2% in cross-validated predictions on their 480 non-redundant test proteins when PSI-BLAST profiles were used. We obtained Q3 = 78.5%, SOV99 = 75.6% and SOV94 = 82.8% in 7-fold cross-validated predictions on our KP480 non-redundant test set. A direct comparison between the two methods is not possible, because the prepared KP480 data set may not be exactly the same as the 480-protein data set of Jnet; however, it shows that the SVM approach is another good method for secondary structure prediction. The Q3 on the KP480 data set is higher than that on the CB513 data set, as expected, because removing the sequences shorter than 30 residues and using reduction method 4 (H to helix; E, B to sheet; all other states to coil) instead of method 2 both help to increase prediction accuracy.
We have improved the support vector machine approach to protein secondary structure prediction by new tertiary classifiers and their jury decision, an efficient method to handle unbalanced data and PSSM profiles. This work was motivated by the goal of raising SVM-based prediction to the current level, since neural network approaches have been refined over various architectures and profiles by many researchers. It is not fair to compare only absolute Q3 values when methods are trained on different data sets. To evaluate our method, we therefore participated in the fifth Critical Assessment of Structure Prediction (CASP5) experiment in 2002. Figure 2 shows the results for the predictions that were submitted to CASP5. The average Q3 and SOV3 scores for SVMpsi were 79.10 and 79.38%, respectively. We predicted the most accurate structure among all groups for five proteins, which ranked us fourth among all participating groups; twenty-one groups predicted the most accurate structure for at least one protein, and the first-ranked group did so for seven proteins. It is not possible to state an exact rank with respect to average Q3 or SOV3, since the number of targets is small and the leading groups submitted different numbers of predictions. However, the results show that the SVMpsi method can at least match the current levels of prediction.
In this paper we have focused on the contribution of local interactions to protein secondary structure, using a sliding-window scheme. Tertiary interactions between residues far apart in sequence but close in three dimensions can also be considered (Baldi et al., 1999) to improve prediction accuracy. However, a secondary structure prediction accuracy of >79% was obtained in CASP5 even though only the local contribution was considered, which shows that the local sequence environment of a residue substantially determines its secondary structure. It is possible that our method can be improved by considering long-range interactions. The SVMpsi method can also be improved by using larger training sets that contain new protein structure information, since the CB513 data set used for the current SVMpsi was developed in 1999. A remaining problem is handling huge training data sets with the SVM approach: more memory is required to store the data points while the optimal separating hyperplane is obtained, prediction takes a long time if the ratio of support vectors to data points is large and the optimization of kernel parameters may become difficult because of the computing time. The neural network (NN) approach suffers from local minima, the need to determine an appropriate network structure and a large number of parameters. Although SVMs in turn depend on the choice of kernel, we have shown that they are a comparable method for protein secondary structure prediction. This approach can also be applied to biologically important related problems, such as prediction of solvent accessibility and of disulfide bonding state and connectivity. Although local information is already effectively exploited by the sliding window, it is important to consider the long-range interactions that are a major driving force underlying remote contacts. Using such information, it should be possible to improve the accuracy of protein tertiary structure prediction, which will also be useful for studying protein folding processes.
Acknowledgements
References
Baker,D. and Sali,A. (2001) Science, 294, 93–96.
Baldi,P., Brunak,S., Frasconi,P., Pollastri,G. and Soda,G. (1999) Bioinformatics, 15, 937–946.
Burges,C.J.C. (1998) Data Min. Knowledge Discov., 2, 121–167.
Burges,C.J.C. and Schölkopf,B. (1997) In Mozer,M., Jordan,M. and Petsche,T. (eds), Advances in Neural Information Processing Systems, Vol. 9. MIT Press, Cambridge, MA, pp. 375–381.
Cristianini,N. and Shawe-Taylor,J. (2000) Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge.
Cuff,J.A. and Barton,G.J. (1999) Proteins: Struct. Funct. Genet., 34, 508–519.
Cuff,J.A. and Barton,G.J. (2000) Proteins: Struct. Funct. Genet., 40, 502–511.
Ding,C.H.Q. and Dubchak,I. (2001) Bioinformatics, 17, 349–358.
Frishman,D. and Argos,P. (1995) Proteins: Struct. Funct. Genet., 23, 566–579.
Frishman,D. and Argos,P. (1997) Proteins: Struct. Funct. Genet., 27, 329–335.
Heiler,M. (2002) Diploma Thesis, University of Mannheim.
Henikoff,S. and Henikoff,J.G. (1992) Proc. Natl Acad. Sci. USA, 89, 10915–10919.
Hsu,C.W. and Lin,C.J. (2002) IEEE Trans. Neural Networks, 13, 415–425.
Hua,S.J. and Sun,Z.R. (2001) J. Mol. Biol., 308, 397–407.
Joachims,T. (1999) In Schölkopf,B., Burges,C. and Smola,A. (eds), Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, pp. 41–56.
Jones,D.T. (1999) J. Mol. Biol., 292, 195–202.
Jones,D.T. and Swindells,M.B. (2002) Trends Biochem. Sci., 27, 161–164.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) Biochemistry, 33, 3038–3049.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
King,R.D. and Sternberg,M.J.E. (1996) Protein Sci., 5, 2298–2310.
Matthews,B.W. (1975) Biochim. Biophys. Acta, 405, 442–451.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Osuna,E., Freund,R. and Girosi,F. (1997) Technical Report AI Memo 1602, MIT A.I. Laboratory, Cambridge, MA.
Qian,N. and Sejnowski,T.J. (1988) J. Mol. Biol., 202, 865–884.
Richards,F.M. and Kundrot,C.E. (1988) Proteins: Struct. Funct. Genet., 3, 71–84.
Rost,B. and Sander,C. (1993) J. Mol. Biol., 232, 584–599.
Rost,B. and Sander,C. (1994) J. Mol. Biol., 235, 13–26.
Russell,R.B., Copley,R.R. and Barton,G.J. (1996) J. Mol. Biol., 259, 349–365.
Salamov,A.A. and Solovyev,V.V. (1995) J. Mol. Biol., 247, 11–15.
Vapnik,V. (1995) The Nature of Statistical Learning Theory. Springer, New York.
Vapnik,V. (1998) Statistical Learning Theory. Wiley, New York.
Zemla,A., Venclovas,C., Fidelis,K. and Rost,B. (1999) Proteins: Struct. Funct. Genet., 34, 220–223.
Received January 26, 2003; revised June 10, 2003; accepted June 23, 2003.