Simple consensus procedures are effective and sufficient in secondary structure prediction

Mario Albrecht1,3, Silvio C.E. Tosatto2,3, Thomas Lengauer1 and Giorgio Valle2

1Max-Planck-Institute for Informatics, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany and 2CRIBI Biotechnology Center, University of Padova, Viale G. Colombo 3, 35121 Padova, Italy

M.Albrecht and S.C.E.Tosatto contributed equally to this work.

3 To whom correspondence should be addressed. e-mail: mario.albrecht{at}mpi-sb.mpg.de; silvio{at}cribi.unipd.it


    Abstract
 
We have analyzed the performance of majority voting on minimal combination sets of three state-of-the-art secondary structure prediction methods in order to obtain a consensus prediction. Using three large benchmark sets from the EVA server, our results show a significant improvement in the average Q3 prediction accuracy of up to 1.5 percentage points by consensus formation. Applying an additional trivial filtering procedure that removes predicted secondary structure elements that are too short does not significantly affect the prediction accuracy. Our analysis also provides insight into how similar the results of the combined prediction methods are, and into the higher confidence that can be placed in consistently predicted secondary structure.

Keywords: consensus formation/protein structure prediction/secondary structure


    Introduction
 
Recent improvements in the prediction accuracy have been accomplished not only by incorporating evolutionary information, but also by combining the results of single, independent secondary structure prediction methods into a consensus prediction (Rost, 2001). The original prediction server Jpred computed a consensus of prediction results simply by majority voting (Cuff et al., 1998). Minor method variations such as different weights added to the results did not lead to significantly higher prediction accuracies (Cuff and Barton, 1999; King et al., 2000). The Jpred server has been improved recently (Cuff and Barton, 2000) and now employs a complex combination of neural networks, a method that has also been applied successfully for consensus formation by other groups (Chandonia and Karplus, 1999; King et al., 2000; Petersen et al., 2000). Similar sophisticated approaches use multivariate linear regression (Guermeur et al., 1999) or decision trees (Selbig et al., 1999) trained for optimal method selection. Other method variants apply either cascaded multiple classifiers of secondary structure (Ouali and King, 2000) or a composite secondary structure assembled from the results of several methods (An and Friesner, 2002). The common feature of all these consensus approaches is the use of results from usually more than three secondary structure prediction methods.

We found that a set of only three state-of-the-art methods combined using majority voting is sufficient to achieve similar improvements in the prediction accuracy. This simple approach runs at low computational cost, but uses the currently best prediction servers.

In order to test our approach, we participated in the critical assessment of structure prediction, the CASP5 experiment of the year 2002 (Tramontano, 2003). We combined the prediction results of the three servers PSIPRED, SAM-T02 and SSpro2, which are based on different prediction approaches using neural networks and hidden Markov models. The three servers have shown top performance in former CASP experiments and the continuous automatic evaluation (EVA) of protein structure prediction servers (Eyrich et al., 2001; Rost and Eyrich, 2001) and have higher overall accuracy than older combination methods such as Jpred.

Astonishingly, our method significantly outperformed almost all other methods participating in CASP5 and reached the second rank below a manual expert submission according to the SOV score (Rost et al., 1994; Zemla et al., 1999), normalized with respect to the total number of all 78 target protein domain sequences (for details see http://www.russell.embl.de/casp5/). In particular, our method ranked first regarding the SOV accuracy measure for a subset of 21 target domains unrelated by sequence and with low sequence similarity to known protein structures. Regarding the alternative Q3 score (Rost and Eyrich, 2001), our combination method ranks first for the set of all targets and the subset of sequence-unrelated targets.

Encouraged by the CASP5 results, we decided to investigate our approach on larger benchmark sets obtained from EVA. In particular, we show that our approach always improves the prediction accuracy over the best single method of the three methods combined to form the consensus. Comparing the frequencies of the occurrence of certain majority situations, we are able to draw interesting conclusions on the degree of similarity between results of single prediction methods and on the increased confidence in consistently predicted secondary structure.


    Materials and methods
 
Benchmark sets

In our evaluation, we used the three benchmark sets ‘common2’, ‘common5’ and ‘common6’ from 22 September 2002 with sequences of low identity, as provided by the EVA web site (http://cubic.bioc.columbia.edu/~eva/). The set ‘common2’ contains 121 sequences with 16 858 amino acids, ‘common5’ contains 214 sequences with 44 871 amino acids and ‘common6’ contains 539 sequences with 98 308 residues. Because not all methods have returned predictions for every sequence requested by EVA, not every benchmark set could be combined with the same three methods used for consensus computation (see footnote of Table I).

Consensus formation

For each benchmark set, three single methods of top performance in EVA are selected in order to compute the consensus secondary structure sequence by majority voting. Specifically, we used the results of the following seven prediction methods: PSIPRED (Jones, 1999; McGuffin et al., 2000), SAM-T99 (Karplus et al., 1998), SSpro1 (Baldi et al., 1999), SSpro2 (Pollastri et al., 2002), PHDpsi (Przybylski and Rost, 2002), PROFsec (Rost and Eyrich, 2001) and Jpred (Cuff and Barton, 2000).

Three cases need to be distinguished when forming the consensus sequence per amino acid according to the three possible secondary states α-helix (H), β-strand (E) and other/loop (L). 3:0 votes means consistent prediction among all three methods. 2:1 votes result in the majority decision. The rare case of a tie 1:1:1 is resolved by assuming the L state. Each consensus sequence is annotated with a confidence array, which contains values ranging from 1 to 3 according to the maximum number of identical votes per residue.
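
The voting rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `consensus`, the string representation of predictions and the example sequences are assumptions introduced here.

```python
from collections import Counter


def consensus(pred_a, pred_b, pred_c, tie_state="L"):
    """Majority vote over three equal-length three-state predictions.

    Returns the consensus string and a confidence list holding, per residue,
    the maximum number of identical votes (3, 2 or 1).
    """
    if not (len(pred_a) == len(pred_b) == len(pred_c)):
        raise ValueError("all three predictions must have the same length")
    states, confidence = [], []
    for votes in zip(pred_a, pred_b, pred_c):
        state, count = Counter(votes).most_common(1)[0]
        if count == 1:
            # 1:1:1 tie: all three methods disagree, so fall back to loop (L)
            state = tie_state
        states.append(state)
        confidence.append(count)
    return "".join(states), confidence


# Hypothetical toy predictions; the last residue is a 1:1:1 tie.
cons, conf = consensus("HHHEEL", "HHLEEE", "HELLEH")
print(cons, conf)
```

For the toy input, the consensus is `HHLEEL` with confidence values `[3, 2, 2, 2, 3, 1]`; the final residue is a tie and is therefore assigned L.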

Prediction accuracy

To determine the prediction accuracy, we compared the predicted consensus sequence with the true three-state sequence derived from the DSSP secondary structure assignment of known 3D structures (Kabsch and Sander, 1983). Each of the three possible states H, E and L results from the collapse transformation of the eight DSSP states per residue according to the following schema (Rost and Eyrich, 2001): {G, H, I} -> α-helix (H), {B, E} -> β-strand (E), {S, T, ‘.’} -> other (L).
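
The collapse schema can be written as a simple lookup table. The sketch below is illustrative only; the names `DSSP_TO_THREE` and `collapse_dssp` are assumptions, and it expects the unassigned DSSP state to be encoded as '.', as in the schema above. Other placeholder characters are deliberately rejected rather than guessed.

```python
# Eight DSSP states mapped to the three states H, E and L.
DSSP_TO_THREE = {
    "G": "H", "H": "H", "I": "H",   # helical states -> H
    "B": "E", "E": "E",             # bridge and extended strand -> E
    "S": "L", "T": "L", ".": "L",   # bend, turn and unassigned -> L
}


def collapse_dssp(dssp_states):
    """Collapse a string of DSSP states into a three-state string."""
    try:
        return "".join(DSSP_TO_THREE[s] for s in dssp_states)
    except KeyError as err:
        raise ValueError(f"unexpected DSSP state: {err.args[0]!r}") from None


print(collapse_dssp("GHIBEST."))
```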

For each benchmark set, we computed average Q3 and SOV percentage values (Rost et al., 1994; Zemla et al., 1999; Rost and Eyrich, 2001) as well as the separate percentages QH, QE and QL of residues predicted correctly in the observed H, E and L states, respectively. For each accuracy measure, we calculated the standard error by dividing the standard deviation of the measure by the square root of the benchmark set size. Based on the assumption of a Gaussian distribution of the accuracy measures with similar standard deviations as known from observations, the accuracy difference between two distinct prediction methods is said to be statistically significant if it is larger than the standard error (Rost and Eyrich, 2001).
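
As a rough illustration of these measures, the following sketch computes Q3, the per-state percentages and the standard error for per-protein values. The function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator was used. The SOV measure is more involved and is not covered here.

```python
import math
import statistics


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q_state(predicted, observed, state):
    """Percentage of residues observed in `state` that are predicted correctly.

    Returns None if the state never occurs in the observed sequence.
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if o == state]
    if not pairs:
        return None
    return 100.0 * sum(p == o for p, o in pairs) / len(pairs)


def standard_error(values):
    """Standard deviation divided by the square root of the set size.

    Assumption: the sample standard deviation is used.
    """
    return statistics.stdev(values) / math.sqrt(len(values))


def is_significant(mean_a, mean_b, std_err):
    """Criterion from the text: the difference exceeds the standard error."""
    return abs(mean_a - mean_b) > std_err


# Hypothetical toy data, not taken from the benchmark sets.
print(q3("HHEL", "HHLL"), q_state("HHEL", "HHLL", "L"))
print(standard_error([70.0, 80.0, 90.0]))
```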


    Results and discussion
 
Accuracy improvement

The results of the consensus formation by majority voting using three different benchmark sets are summarized in Table I. The comparison of our consensus approach with the respective best single method demonstrates that the total average Q3 accuracy is increased significantly (in the sense described above) by 1.45, 1.50 and 0.41 percentage points for the three sets ‘common2’, ‘common5’ and ‘common6’, respectively. In addition, the SOV measure is improved by 0.68 percentage points for the ‘common5’ set, while it does not change considerably for the other two sets. Table I also contains the results of the consensus prediction method Jpred as available for the sets ‘common2’ and ‘common5’, but its accuracy is clearly below that of the other methods. For comparison, we included the results of PROFsec, another top-performing single prediction method, in Table I for the ‘common2’ set. Its Q3 prediction accuracy is significantly lower, by 1.27–1.57 percentage points, than the very similar consensus results obtained by combining any three of the four available methods PSIPRED, SAM-T99, SSpro and PROFsec.


 
Table I. Results of the consensus secondary structure prediction for the benchmark sets (a) ‘common2’, (b) ‘common5’ and (c) ‘common6’
 
If one compares the accuracy measures of each of the methods that have been combined to form the consensus, separated according to the observed true H, E and L states, it appears that the consensus formation always improves the accuracy for the L state class (QL) by 0.51–1.55 percentage points. This finding could mean that single methods tend to under-predict the L state.

Filtering of prediction results

We found that the application of a trivial filtering procedure that eliminates α-helices and β-strands that are too short generally neither deteriorates nor ameliorates the prediction accuracy significantly, whether it is applied before the consensus formation, after it, or both (supplementary data is available at Protein Engineering online). This kind of structural filtering can be employed without disadvantages in order to clean up the secondary structure predictions before further processing.
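
A filter of this kind might look like the sketch below. The text does not state the minimum element lengths or what replaces a removed element, so both are assumptions made purely for illustration: helices shorter than three residues and strands shorter than two residues are converted to loop (L). The function name `filter_short_elements` is likewise hypothetical.

```python
from itertools import groupby


def filter_short_elements(states, min_len=None):
    """Replace helix/strand runs shorter than a minimum length with L.

    `min_len` maps a state to its minimum run length. The defaults are
    assumed values, not thresholds taken from the paper.
    """
    if min_len is None:
        min_len = {"H": 3, "E": 2}  # assumed thresholds
    out = []
    for state, run in groupby(states):
        length = len(list(run))
        if length < min_len.get(state, 0):
            state = "L"  # assumption: short elements become loop
        out.append(state * length)
    return "".join(out)


# Hypothetical example: a two-residue helix and a single-residue strand vanish.
print(filter_short_elements("LHHLEELLEHHHHL"))
```

Because the sequence length is preserved, the filter can be applied to single predictions before voting, to the consensus afterwards, or at both stages.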

Frequency of majority situations

The additional analysis of the overall frequency of the three possible types of majority situations 3:0, 2:1 and tie 1:1:1 uncovers that the problematic case of a tie with each of the three single methods predicting a different secondary structure occurs in at most 1% of all cases (Table II). Thus, the tie case can be neglected when applying our consensus approach.


 
Table II. Overall frequencies of the three types of majority situations for the three benchmark sets (%)
 
In contrast, 3:0 consistency appears approximately three times as often as the 2:1 majority. Here, some methods resemble each other more than others. For instance, based on the benchmark set ‘common6’, PROFsec is much more similar to PHDpsi than to PSIPRED: the pair (PROFsec, PHDpsi) has a higher 2:1 frequency of 12.6% than the pair (PSIPRED, PROFsec) and the pair (PSIPRED, PHDpsi) with 6.2 and 3.2%, respectively. In contrast, the 2:1 frequencies of the three methods employed for the sets ‘common2’ and ‘common5’ are not far from each other. Thus, the respective three methods that are combined seem to be equally dissimilar to each other. Together with the higher improvement in prediction accuracy observed for the same two benchmark sets compared with the ‘common6’ set, the rule of thumb may be deduced that the best performance of our approach can be obtained by combining three single methods of top prediction accuracy with approximately equal dissimilarity of their results.
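
The frequencies of the majority situations, including which pair of methods agrees in each 2:1 case, can be tallied as in the following sketch. The function name, the dictionary-based input and the key format of the result are assumptions made for illustration; the method names in the example are placeholders.

```python
from collections import Counter
from itertools import combinations


def majority_situations(predictions):
    """Percentage of residues per majority situation for three methods.

    `predictions` maps a method name to its predicted three-state string.
    A 2:1 case is attributed to the pair of methods that agree.
    """
    names = list(predictions)
    if len(names) != 3:
        raise ValueError("exactly three methods are required")
    seqs = [predictions[n] for n in names]
    if len({len(s) for s in seqs}) != 1 or not seqs[0]:
        raise ValueError("predictions must be non-empty and of equal length")
    counts = Counter()
    for votes in zip(*seqs):
        distinct = len(set(votes))
        if distinct == 1:
            counts["3:0"] += 1
        elif distinct == 3:
            counts["1:1:1"] += 1
        else:
            # exactly one pair of methods agrees in a 2:1 situation
            for j, k in combinations(range(3), 2):
                if votes[j] == votes[k]:
                    counts[f"2:1 {names[j]}+{names[k]}"] += 1
                    break
    total = len(seqs[0])
    return {key: 100.0 * n / total for key, n in counts.items()}


# Hypothetical toy predictions for three placeholder methods A, B and C.
freqs = majority_situations({"A": "HHEL", "B": "HHEE", "C": "HLEH"})
print(freqs)
```

A pair with a markedly higher 2:1 frequency than the other two pairs indicates that those two methods resemble each other more, as discussed above for PROFsec and PHDpsi.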

Prediction confidence

We also verified the intuitive expectation that the confidence in the correctness of the prediction is increased by consensus formation. We found that the Q3 and SOV values computed solely for secondary structure states that are consistently predicted by all three methods are much higher than overall values with an increase of 6.32–6.62 and 5.06–5.46 percentage points for Q3 and SOV, respectively (Table III). Similar results are obtained after separating the Q3 value into the three secondary structure classes: QH and QE are increased by 4.12–6.03 and 2.88–4.31 percentage points, respectively, while QL is increased on average by 5.83–6.20 percentage points.
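
Restricting the Q3 computation to consistently predicted residues can be sketched as follows. This is a simplified per-sequence illustration with a hypothetical function name; averaging over the proteins of a benchmark set, as well as the SOV computation, is left out.

```python
def q3_on_consistent(pred_a, pred_b, pred_c, observed):
    """Q3 restricted to residues on which all three methods agree (3:0).

    Returns (q3_percent, coverage_percent), where coverage is the share of
    residues that are consistently predicted. q3_percent is None if no
    residue is consistently predicted.
    """
    lengths = {len(pred_a), len(pred_b), len(pred_c), len(observed)}
    if len(lengths) != 1 or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    consistent = [
        (a, o)
        for a, b, c, o in zip(pred_a, pred_b, pred_c, observed)
        if a == b == c
    ]
    coverage = 100.0 * len(consistent) / len(observed)
    if not consistent:
        return None, coverage
    correct = sum(a == o for a, o in consistent)
    return 100.0 * correct / len(consistent), coverage


# Hypothetical toy data: two of four residues are consistently predicted.
print(q3_on_consistent("HHEL", "HHEE", "HLEH", "HHLL"))
```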


 
Table III. Improvement of the Q3, QH, QE, QL and SOV measures (%) if their computation is restricted to secondary structure states that are consistently predicted by all three methods for the benchmark sets (a) ‘common2’, (b) ‘common5’ and (c) ‘common6’
 
Conclusions

In summary, we found that a simple consensus approach based on the majority voting of only three prediction methods can be superior to each of the three methods as well as to complex combinations of more than three single prediction methods as employed in Jpred. Our method was shown to work with distinct combinations of different prediction methods on large benchmark sets. Presumably, the success of the method is mainly due to the use of three of the currently best single methods and the noise-filtering properties of a consensus approach, which helps to ignore the training errors of single methods. We believe that any three state-of-the-art prediction methods can be used for the consensus. The method is less expensive computationally than other consensus approaches and has the advantage of not requiring the calibration of involved parameters.


    Acknowledgements
 
The authors are grateful to Alessandro Cestaro and Stefano Toppo for assistance during the CASP5 experiment. Part of the research by M.A. and T.L. has been funded by the Deutsche Forschungsgemeinschaft (DFG) under contract no. LE 491/14-1. S.T. and G.V. are supported by a Telethon (Italy) grant, no. B057-I.


    References
 
An,Y. and Friesner,R.A. (2002) Proteins, 48, 352–366.

Baldi,P., Brunak,S., Frasconi,P., Soda,G. and Pollastri,G. (1999) Bioinformatics, 15, 937–946.

Chandonia,J.M. and Karplus,M. (1999) Proteins, 35, 293–306.

Cuff,J.A. and Barton,G.J. (1999) Proteins, 34, 508–519.

Cuff,J.A. and Barton,G.J. (2000) Proteins, 40, 502–511.

Cuff,J.A., Clamp,M.E., Siddiqui,A.S., Finlay,M. and Barton,G.J. (1998) Bioinformatics, 14, 892–893.

Eyrich,V.A., Marti-Renom,M.A., Przybylski,D., Madhusudhan,M.S., Fiser,A., Pazos,F., Valencia,A., Sali,A. and Rost,B. (2001) Bioinformatics, 17, 1242–1243.

Guermeur,Y., Geourjon,C., Gallinari,P. and Deléage,G. (1999) Bioinformatics, 15, 413–421.

Jones,D.T. (1999) J. Mol. Biol., 292, 195–202.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Karplus,K., Barrett,C. and Hughey,R. (1998) Bioinformatics, 14, 846–856.

King,R.D., Ouali,M., Strong,A.T., Aly,A., Elmaghraby,A., Kantardzic,M. and Page,D. (2000) Protein Eng., 13, 15–19.

McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) Bioinformatics, 16, 404–405.

Ouali,M. and King,R.D. (2000) Protein Sci., 9, 1162–1176.

Petersen,T.N., Lundegaard,C., Nielsen,M., Bohr,H., Bohr,J., Brunak,S., Gippert,G.P. and Lund,O. (2000) Proteins, 41, 17–20.

Pollastri,G., Przybylski,D., Rost,B. and Baldi,P. (2002) Proteins, 47, 228–235.

Przybylski,D. and Rost,B. (2002) Proteins, 46, 197–205.

Rost,B. (2001) J. Struct. Biol., 134, 204–218.

Rost,B. and Eyrich,V.A. (2001) Proteins, 45, Suppl. 5, 192–199.

Rost,B., Sander,C. and Schneider,R. (1994) J. Mol. Biol., 235, 13–26.

Selbig,J., Mevissen,T. and Lengauer,T. (1999) Bioinformatics, 15, 1039–1046.

Tramontano,A. (2003) Nature Struct. Biol., 10, 87–90.

Zemla,A., Venclovas,C., Fidelis,K. and Rost,B. (1999) Proteins, 34, 220–223.

Received March 7, 2003; revised May 24, 2003; accepted June 6, 2003.