A simple method for predicting transmembrane α helices with better accuracy

M. Michael Gromiha

Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan


    Abstract
The prediction of a protein's structure from its amino acid sequence has been a long-standing goal of molecular biology. In this work, a new set of conformational parameters for membrane spanning α helices was developed using the information from the topology of 70 membrane proteins. Based on these conformational parameters, a simple algorithm has been formulated to predict the transmembrane α helices in membrane proteins. A FORTRAN program has been developed which takes the amino acid sequence as input and gives the predicted transmembrane α helices as output. The present method correctly identifies 295 transmembrane helical segments in 70 membrane proteins with only two overpredictions. Furthermore, this method predicts all 45 transmembrane helices in the photosynthetic reaction center, bacteriorhodopsin and cytochrome c oxidase to an 86% level of accuracy and so is better than all other methods published to date.

Keywords: conformational parameter/prediction/topology/transmembrane α helices


    Introduction
Membrane proteins are important for a broad range of processes and functions in biological systems: for example, signal recognition, transport phenomena, energy translocation and conservation in the living cell (von Heijne, 1988; Jennings, 1989; Traxler et al., 1993). While the number of published amino acid sequences of membrane proteins is growing exponentially, there are few three-dimensional structures known to date; examples include the photosynthetic reaction center (Deisenhofer et al., 1985; Deisenhofer and Michel, 1989; Feher et al., 1989), bacteriorhodopsin (Henderson et al., 1990), porin (Weiss et al., 1991) and cytochrome c oxidase (Iwata et al., 1995). Hence, efficient predictive methods can be of help in the modelling of the structures of membrane proteins starting from the amino acid sequence, as obtained from DNA recombinant analysis.

Several methods have been proposed for the prediction of transmembrane α helices in membrane proteins. They are based on several different algorithms: hydrophobicity profiles (Esposti et al., 1990; von Heijne, 1992; Ponnuswamy and Gromiha, 1993; Hirokawa et al., 1998), neural networks (Rost et al., 1995, 1996; Casadio et al., 1996; Lohmann et al., 1996; Aloy et al., 1997), multiple alignment (Cserzo et al., 1994; Persson and Argos, 1994, 1997), consensus procedure (Parodi et al., 1994) and the dense alignment surface method (Cserzo et al., 1997).

In our previous work, we developed a set of conformational parameters for membrane spanning β-strands and applied them successfully to the prediction of transmembrane β-strands in bacterial porins (Gromiha et al., 1997). In this article, a set of conformational parameters for all 20 amino acid residues in the transmembrane α helices of membrane proteins was developed from the topology of 70 membrane proteins. A simple method was proposed for the prediction of transmembrane α helices based on the conformational parameters. This method identifies the membrane spanning regions of 70 membrane proteins to an accuracy of 97% and predicts all the transmembrane segments in three proteins with known three-dimensional structures (the photosynthetic reaction center (PRC) from Rhodopseudomonas viridis, bacteriorhodopsin and cytochrome c oxidase) to an 86% level of accuracy, better than any other previously published method. This algorithm has been automated with a computer program written in FORTRAN; the predictive results are available from the author.


    Materials and methods
Databases

Three sets of data were used for the present study. The first is a training set of 70 membrane proteins whose topology is known experimentally; this data set was used to derive the conformational parameters. Each protein in this set contains from one to 12 membrane spanning segments, and the list of proteins, along with the number of transmembrane segments, is shown in Table I. The amino acid sequence and topology of the membrane spanning segments for all 70 membrane proteins were taken from the SWISSPROT database (Bairoch and Boeckmann, 1992). The second is a test set containing the experimental data for three membrane proteins whose three-dimensional structures are known at high resolution: the photosynthetic reaction center (PRC) from R.viridis (Deisenhofer and Michel, 1989), bacteriorhodopsin (Henderson et al., 1990) and cytochrome c oxidase (Iwata et al., 1995). The membrane spanning segments in these proteins were taken from the Protein Data Bank (PDB) of the Brookhaven National Laboratory (Bernstein et al., 1977). The third is a set of 150 non-homologous globular proteins whose structures have been determined at high resolution. This set has representatives from all of the protein structural classes (all-α, all-β, α+β and α/β) and from proteins of varying size. The PDB codes of the proteins used in the present work are given in Table II.


Table I. Training set of proteins used in the present study
 

Table II. PDB codes for the set of globular proteins used in the present study
 
Development of conformational parameters

A conformational parameter set for all 20 amino acid residues has been developed as described below. The frequency of occurrence (%) of each amino acid residue was computed in the transmembrane helical part (fm) of the membrane proteins and in the whole complex (ft). The conformational parameter of a residue is the ratio of these two frequencies:

$$\text{conformational parameter} = \frac{f_m}{f_t}$$

The set of 70 membrane proteins listed in Table I was used to derive the conformational parameters.
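
The calculation can be illustrated with a short Python sketch. It assumes that the parameter is the plain ratio fm/ft of the two percentage frequencies, pooled over all proteins in the training set. The function name, the input layout (a sequence plus 1-based, inclusive helix boundaries) and the toy sequence are hypothetical and are not taken from the author's FORTRAN program.

```python
from collections import Counter


def conformational_parameters(proteins):
    """Return {residue: fm / ft} for a list of annotated proteins.

    proteins: list of (sequence, segments) pairs, where segments is a list of
    (start, end) transmembrane helix boundaries, 1-based and inclusive.
    Assumption: the parameter is the ratio of percentage frequencies.
    """
    tm_counts = Counter()   # residues inside transmembrane helices
    all_counts = Counter()  # residues in the whole sequences
    for sequence, segments in proteins:
        all_counts.update(sequence)
        for start, end in segments:
            tm_counts.update(sequence[start - 1:end])

    tm_total = sum(tm_counts.values())
    all_total = sum(all_counts.values())
    if tm_total == 0 or all_total == 0:
        raise ValueError("need at least one annotated transmembrane residue")

    params = {}
    for residue, n_all in all_counts.items():
        f_m = 100.0 * tm_counts[residue] / tm_total  # % in helices
        f_t = 100.0 * n_all / all_total              # % overall
        params[residue] = f_m / f_t
    return params


# Toy input (not real data): a 4-residue "protein" whose helix covers 1-2.
toy = [("AAKK", [(1, 2)])]
print(conformational_parameters(toy))
```

In this toy case Ala makes up all of the helix but only half of the sequence, so its value is above 1, while Lys never occurs in the helix and scores 0.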

Prediction of transmembrane α helices

Primary rule Consider the amino acid sequence of a membrane protein. If the conformational parameter of a particular amino acid is ≥0.80 (the average value obtained from the set of 70 membrane proteins), then the index of priority assigned to that residue is 1; if the value is <0.80, the index is taken to be zero.

$$\text{priority index}(i) = \begin{cases} 1, & \text{conformational parameter of residue } i \ge 0.80 \\ 0, & \text{conformational parameter of residue } i < 0.80 \end{cases}$$

Here, i varies from 1 to N, where N is the total number of residues.
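
The primary rule amounts to a threshold on a per-residue lookup, which the following Python sketch illustrates. Only the seven parameter values quoted in the Results section of this paper are included (Table III holds the full set), and the names `PARAMS`, `THRESHOLD` and `priority_indices` are hypothetical.

```python
# Conformational parameters quoted in the text of this paper; the remaining
# residues would have to be filled in from Table III.
PARAMS = {
    "A": 1.334, "Q": 0.343, "I": 1.803, "T": 0.838, "G": 0.998,
    "S": 0.82, "P": 0.563,
}
THRESHOLD = 0.80  # average parameter value over the 70 training proteins


def priority_indices(sequence, params=PARAMS, threshold=THRESHOLD):
    """Assign 1 to residues with parameter >= threshold, otherwise 0."""
    return [1 if params[residue] >= threshold else 0 for residue in sequence]


# First five residues of bacteriorhodopsin, as used in the worked example.
print(priority_indices("AQITG"))  # [1, 0, 1, 1, 1]
```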

Secondary rules A set of secondary rules has been formulated to predict the transmembrane helical segments.

S1
Search for a continuous sequence of 18 points with higher priority (priority index = 1), with a maximum of three non-adjacent exceptions; pick up the segments and append the overlapping segments.

S2
Collect all four consecutive overlapping residues with each of the appended segments obtained from S1 (e.g., if the appended segment is 1–25, the overlapping four residues are 1–4, 2–5, . . ., 22–25); check whether two zeros are present within any of these four-residue segments. If so, cut the segment with the high priority residue (priority index = 1) as the terminal one, and select the longer segment as the transmembrane helix.

S3
Longer segments (more than 40 residues) are divided into two segments; the terminal residues are fixed so that each segment contains a minimum number of zeros and sufficient number of residues (minimum of 18 residues) to be a transmembrane helix.
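
Rule S1 can be sketched as a sliding-window search followed by a merge of overlapping windows. The Python sketch below is one literal reading of the rule and not the author's implementation: an exception is a residue with priority index 0, "non-adjacent" is taken to mean that no two exceptions sit next to each other, and no further condition is placed on the window ends. The function name and the synthetic priority string are hypothetical.

```python
def find_candidate_segments(priority, window=18, max_exceptions=3):
    """Return merged (start, end) segments, 1-based and inclusive (rule S1)."""
    segments = []
    for start in range(len(priority) - window + 1):
        # Offsets of low-priority residues inside this window.
        zeros = [k for k in range(window) if priority[start + k] == 0]
        if len(zeros) > max_exceptions:
            continue
        if any(b - a == 1 for a, b in zip(zeros, zeros[1:])):
            continue  # two exceptions are adjacent
        first, last = start + 1, start + window
        if segments and first <= segments[-1][1]:
            # Overlaps the previous accepted window: append it.
            segments[-1] = (segments[-1][0], last)
        else:
            segments.append((first, last))
    return segments


# Synthetic 40-residue priority string (not real data).
low = set(range(1, 9)) | {12, 18, 24, 30} | set(range(35, 41))
priority = [0 if pos in low else 1 for pos in range(1, 41)]
print(find_candidate_segments(priority))  # [(9, 34)]
```

On this synthetic input the accepted windows are 9–26, 10–27, ..., 17–34, which merge into the single segment 9–34.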

Accuracy of prediction The accuracy of predicted segments was computed using the equation

$$\text{Accuracy}\ (\%) = \frac{N - N_u - N_o}{N} \times 100$$

where N, Nu and No are the total number of residues, the number of residues underpredicted and the number of residues overpredicted in a particular protein, respectively.
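
A per-residue accuracy of this kind can be sketched in Python as follows. The sketch assumes that accuracy is the percentage of residues that are neither underpredicted nor overpredicted, that is 100 × (N − Nu − No)/N. The helper names and the example segments are hypothetical.

```python
def residue_set(segments):
    """Expand 1-based, inclusive (start, end) segments into residue numbers."""
    return {pos for start, end in segments for pos in range(start, end + 1)}


def count_errors(observed, predicted):
    """Return (Nu, No): residues missed and residues wrongly predicted."""
    obs, pred = residue_set(observed), residue_set(predicted)
    return len(obs - pred), len(pred - obs)


def prediction_accuracy(n_total, n_under, n_over):
    """Percentage of residues neither underpredicted nor overpredicted."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * (n_total - n_under - n_over) / n_total


# Hypothetical 100-residue protein with one observed helix at 10-30
# and a predicted helix at 12-33.
n_under, n_over = count_errors([(10, 30)], [(12, 33)])
print(n_under, n_over, prediction_accuracy(100, n_under, n_over))
```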


    Results and discussions
Conformational parameter set for membrane spanning α helices

The set of conformational parameters for all 20 amino acid residues is given in Table III. It can be seen that the residues Ala, Cys, Phe, Gly, Ile, Leu, Met, Ser, Thr, Val, Trp and Tyr are more prevalent in the transmembrane α helices. It is interesting to note that all of the aromatic residues prefer the transmembrane regions, consistent with the study of Schiffer et al. (1992) on the importance of tryptophan residues in membrane proteins. Also, proline is not a favored residue (α = 0.563) in the transmembrane regions, as indicated by the studies of Deber et al. (1990) and by studies of the α helices of globular proteins (Barlow and Thornton, 1988; Gromiha and Ponnuswamy, 1995). It is noteworthy that the polar residues Ser and Thr prefer the membrane region of transmembrane helical proteins, although their α values (0.82 and 0.84, respectively) are only marginally higher than the average (0.80).


Table III. Conformational parameters for all 20 amino acid residues in the transmembrane α helices of inner membrane proteins
 
Prediction of membrane spanning helices in membrane proteins

The present predictive method was applied to the set of 70 membrane proteins that was used to derive the conformational parameters. The topology of all these proteins is known experimentally, and the proteins within this set contain single or multiple (two to 12) membrane spanning segments. This method predicts the correct topology of 68 inner membrane proteins with experimentally determined topologies and correctly identifies 295 transmembrane segments with only two overpredictions.

Prediction of transmembrane helices in PRC, bacteriorhodopsin and cytochrome c oxidase

The predictive method was applied to three proteins—PRC, bacteriorhodopsin and cytochrome c oxidase—whose three-dimensional structures have been determined at high resolution. These proteins were not used to derive the conformational parameters and the jack-knife test determines the accuracy of the prediction.

As a working example, here the primary and secondary rules (explained in the methods section) are applied for the prediction of the transmembrane helices in the protein bacteriorhodopsin.

Primary rule Consider the first five residues in the amino acid sequence of bacteriorhodopsin, AQITG. The conformational parameters for these residues are 1.334, 0.343, 1.803, 0.838 and 0.998, respectively. Hence, as per the primary rule, the priority index values of the five residues are 1, 0, 1, 1 and 1, respectively. The computed priority index values for each residue in bacteriorhodopsin are given in Table IV(a).

Secondary rules Step 1 (search for segments with high priority values). First, rule S1 is applied to select the segments. The sequence was searched to find a stretch of 18 high priority residues with a maximum of three non-adjacent exceptions. The overlapping segments thus found—9–26, 10–27, ..., 17–34—were appended to give the first segment, 9–34. A continuous search identified the next segment, 41–72. In a similar manner, the whole sequence was searched. The segments obtained from step 1 are given in Table IV(b) (column 1).

Step 2 (selection of the transmembrane segments). Consider the first segment 9–34; there are no two low priority values among any of the four consecutive residues in this segment. Hence, residues 9–34 were selected as the first segment. A similar pattern was observed for the segments 41–72 and 176–200. For segment 77–102, two low priority values for residues 81 and 84 were present near the N-terminus and hence the segment was cut at residue 81 and the segment 82–102 was selected; for segment 105–162, two low priority residues were observed at positions 158 and 160, near the C-terminus, and it was therefore cut at residue 157 and the segment 105–157 was selected as a transmembrane helix. Similarly, the segment 202–223 was selected as there were two low priority residues at 224 and 226. The selected segments from step 2 are given in column 2 of Table IV(b).
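
The cuts described in step 2 can be reproduced with the Python sketch below. It is one possible reading of rule S2 and is inferred from this worked example, not taken from the author's program. The assumptions are that the segment is cut at the first low-priority residue of the first four-residue window holding two zeros, that the longer piece is kept, and that any low-priority residues left at the ends are trimmed. The function name and the synthetic priority list are hypothetical.

```python
def apply_s2(priority, segment):
    """Trim one S1 segment (1-based, inclusive) under the stated assumptions."""
    start, end = segment
    while end - start + 1 >= 4:
        cut = None
        for w in range(start, end - 2):  # four-residue windows w..w+3
            zeros = [p for p in range(w, w + 4) if priority[p - 1] == 0]
            if len(zeros) >= 2:
                cut = zeros[0]  # assumed cut point: first zero in the window
                break
        if cut is None:
            break
        left, right = (start, cut - 1), (cut + 1, end)
        # Keep the longer of the two pieces.
        if left[1] - left[0] >= right[1] - right[0]:
            start, end = left
        else:
            start, end = right
        # Make sure the piece starts and ends on a high-priority residue.
        while start <= end and priority[start - 1] == 0:
            start += 1
        while end >= start and priority[end - 1] == 0:
            end -= 1
    return (start, end)


# Synthetic priority list: all 1 except the positions named in the text.
priority = [1] * 230
for pos in (81, 84, 158, 160):
    priority[pos - 1] = 0

print(apply_s2(priority, (77, 102)))   # (82, 102)
print(apply_s2(priority, (105, 162)))  # (105, 157)
```

With zeros placed only at the positions named in the text, the sketch returns 82–102 and 105–157, the same segments selected above.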

Step 3 (division of longer segments). A longer segment 105–157 of 53 residues was observed. The segment was divided into two segments in such a way that each of the segments contains a minimum number of low priority values and a sufficient number of residues (minimum of 18 residues) to be a transmembrane helix. Hence, the segments 105–127 and 134–157 were obtained.

The final predicted segments are given in Table IV(b) (last column).

In a similar way, the membrane spanning segments were predicted in the other two membrane proteins. The predicted transmembrane helical segments for all three proteins are shown in Table V, along with the experimentally derived transmembrane segments, presented for comparison.


Table V. Prediction of transmembrane α helices in PRC, bacteriorhodopsin and cytochrome c oxidase
 
From Table V, it can be seen that the present method identifies all of the 45 transmembrane segments and predicts 2026 residues correctly, that is, a predictive accuracy of 86%.

Testing the prediction of transmembrane-like helices in globular proteins

The present method was applied to a set of 150 soluble globular proteins to check whether it predicts any transmembrane segments in these proteins. It correctly classified 99% of the proteins considered as globular, that is, it predicted no transmembrane helix in them. This result indicates that the present method excludes the transmembrane-like helices present in globular proteins. The method was compared with three other methods, DAS (Cserzo et al., 1994), TMPRED (Rost et al., 1995) and SOSUI (Yanagihara et al., 1989), for which the numbers of proteins correctly excluded were 125, 138 and 150, respectively.

Comparison with other methods

Recently, Nikiforovich (1998) proposed a non-statistical procedure for the prediction of transmembrane helical segments based on energy calculations and predicted the membrane spanning helices in PRC, bacteriorhodopsin and cytochrome c oxidase. The predicted results were compared with four other recent methods, SOSUI (Yanagihara et al., 1989), DAS (Cserzo et al., 1994), TMPRED (Rost et al., 1995) and core (Nikiforovich, 1998). The transmembrane α helices for all three proteins were predicted with the present method and the results compared with those of the five methods used by Nikiforovich (1998), together with the results of our earlier method, which is based on the hydrophobicity profile (Ponnuswamy and Gromiha, 1993). The results, presented in Table VI, show that the present method predicts all the transmembrane α helices in cytochrome c oxidase with a higher level of accuracy than all other methods, whereas the methods TMPRED (Rost et al., 1995) and SIGNAL (Nikiforovich, 1998) failed to identify one of the transmembrane segments. The methods SIGNAL (Nikiforovich, 1998) and SURHYD (Ponnuswamy and Gromiha, 1993) have higher accuracies for the protein PRC. The higher accuracy for PRC obtained by our previous method, SURHYD (Ponnuswamy and Gromiha, 1993), may be due to the fact that information from this protein was used to compute the surrounding hydrophobicity scale and the same scale was used for predictive purposes. Also, considering all three proteins on both counts, the number of transmembrane segments correctly predicted and the percentage accuracy of prediction, the present method performs well relative to the other methods. Further, this method predicts the transmembrane α helices for all proteins in the set with an accuracy of >80% and an average accuracy of 86%.


Table VI. Comparison of predictive ability (%) of six other methods with the present method for PRC, bacteriorhodopsin and cytochrome c oxidase
 

    Conclusions
In this paper, a new set of conformational parameters for membrane spanning α helices was developed from the information about the topology of 70 membrane proteins. Then a primary rule was proposed to predict the transmembrane α helices of inner membrane proteins based on the application of conformational parameters. Based on the results, a set of secondary rules was proposed to extract the segments. The present method identified the membrane spanning helices in 70 membrane proteins with an accuracy of 97%. Furthermore, this method predicts all of the transmembrane α helices with an accuracy >80% for the proteins PRC, bacteriorhodopsin and cytochrome c oxidase individually and with an overall accuracy of 86%. These accuracy levels are superior to other methods published to date.

Availability of the program

The executable file, MICHELP, is available from the author and will be distributed upon request.


Table IV.
 

    Acknowledgments
 
The author wishes to thank Dr V.T.Jacob for critical reading of the manuscript.


    References
Aloy,P., Cedano,J., Oliva,B., Aviles,F.X. and Querol,E. (1997) CABIOS, 13, 231–234.

Bairoch,A. and Boeckmann,B. (1992) Nucleic Acids Res., 20, 2019–2022.

Barlow,D.J. and Thornton,J.M. (1988) J. Mol. Biol., 201, 601–609.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Casadio,R., Fariselli,P., Taroni,C. and Compiani,M. (1996) Eur. Biophys J., 24, 165–178.

Cserzo,M., Bernassau,J.-M., Simon,I. and Maigret,B. (1994) J. Mol. Biol., 243, 388–396.

Cserzo,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Protein Engng, 10, 673–676.

Deber,C.M., Glibowicka,M. and Woolley,G.A. (1990) Biopolymers, 29, 149–157.

Deisenhofer,J. and Michel,H. (1989) Science, 245, 1463–1473.

Deisenhofer,J., Epp,O., Sinning,I. and Michel,H. (1985) J. Mol. Biol., 246, 429–457.

Esposti,D.M., Crimi,M. and Venturoli,G. (1990) Eur. J. Biochem., 190, 207–219.

Feher,G., Allen,J.P., Okamura,M.Y. and Rees,D.C. (1989) Nature, 339, 111–116.

Gromiha,M.M. and Ponnuswamy,P.K. (1995) Int. J. Pept. Protein Res., 45, 225–240.

Gromiha,M.M., Majumdar,R. and Ponnuswamy,P.K. (1997) Protein Engng, 10, 497–500.

Henderson,R., Baldwin,J.M., Ceska,T.A., Zemlin,F., Beckmann,E. and Downing,K.H. (1990) J. Mol. Biol., 213, 899–929.

Hirokawa,T., Boon-chieng,S. and Mitaku,S. (1998) Bioinformatics, 14, 378–379.

Iwata,S., Ostermeier,C., Ludwig,B. and Michel,H. (1995) Nature, 376, 660–669.

Jennings,M.L. (1989) Annu. Rev. Biochem., 58, 999–1027.

Lohmann,R., Schneider,G. and Wrede,P. (1996) Biopolymers, 38, 13–29.

Nikiforovich,G.V. (1998) Protein Engng, 11, 279–283.

Parodi,L.A., Granatir,C.A. and Maggiora,G.M. (1994) CABIOS, 10, 527–535.

Persson,B. and Argos,P. (1994) J. Mol. Biol., 237, 182–192.

Persson,B. and Argos,P. (1997) J. Protein Chem., 16, 453–457.

Ponnuswamy,P.K. and Gromiha,M.M. (1993) Int. J. Pept. Protein Res., 42, 326–341.

Rost,B., Casadio,R., Fariselli,P. and Sander,C. (1995) Protein Sci., 4, 521–533.

Rost,B., Fariselli,P. and Casadio,R. (1996) Protein Sci., 5, 1704–1718.

Schiffer,M., Chang,C.-H. and Stevens,F.J. (1992) Protein Engng, 5, 213–214.

Traxler,B., Boyd,D. and Beckwith,J. (1993) J. Memb. Biol., 132, 1–11.

von Heijne,G. (1988) Biochim. Biophys Acta, 947, 307–333.

von Heijne,G. (1992) J. Mol. Biol., 225, 487–494.

Weiss,M.S., Kreusch,A., Schiltz,E., Nestel,U., Welte,W., Weckesser,J. and Schulz,G.E. (1991) FEBS Lett., 280, 379–382.

Yanagihara,N., Suwa,M. and Mitaku,S. (1989) Biophys Chem., 34, 69–77.

Received August 14, 1998; revised February 4, 1999; accepted March 18, 1999.