Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
Abstract |
---|
Keywords: conformational parameter/prediction/topology/transmembrane helices
Introduction |
---|
Several methods have been proposed for the prediction of transmembrane helices in membrane proteins. They are based on several different algorithms: hydrophobicity profiles (Esposti et al., 1990; von Heijne, 1992; Ponnuswamy and Gromiha, 1993; Hirokawa et al., 1998), neural networks (Rost et al., 1995, 1996; Casadio et al., 1996; Lohmann et al., 1996; Alloy et al., 1997), multiple alignment (Cserzo et al., 1994; Persson and Argos, 1994, 1997), a consensus procedure (Parodi et al., 1994) and the dense alignment surface method (Cserzo et al., 1997).
In our previous work, we developed a set of conformational parameters for membrane spanning β-strands and applied them successfully to the prediction of transmembrane β-strands in bacterial porins (Gromiha et al., 1997). In this article, a set of conformational parameters for all 20 amino acid residues in the transmembrane α-helices of membrane proteins was developed from the topology of 70 membrane proteins. A simple method is proposed for the prediction of transmembrane α-helices based on the conformational parameters. This method identifies the membrane spanning regions of 70 membrane proteins to an accuracy of 97% and predicts all the transmembrane segments in three proteins with known three-dimensional structures (PRC from Rhodopseudomonas viridis, bacteriorhodopsin and cytochrome c oxidase) to an 86% level of accuracy, better than any other previously published method. This algorithm has been automated with a computer program written in FORTRAN; the predictive results are available from the author.
Materials and methods |
---|
Three sets of data were used for the present study. The first is a training set of 70 membrane proteins whose topology is known experimentally; this data set was used to derive the conformational parameters. Each protein in this set contains from one to 12 membrane spanning segments, and the list of proteins, along with the number of transmembrane segments, is shown in Table I. The amino acid sequence and topology of the membrane spanning segments for all of the 70 membrane proteins were taken from the SWISSPROT database (Bairoch and Boeckmann, 1992). The second is a test set containing the experimental data for three membrane proteins whose three-dimensional structures are known at high resolution: the photosynthetic reaction center (PRC) from R.viridis (Deisenhofer and Michel, 1989), bacteriorhodopsin (Henderson et al., 1990) and cytochrome c oxidase (Iwata et al., 1995). The membrane spanning segments in these proteins were taken from the Protein Data Bank (PDB) of the Brookhaven National Laboratory (Bernstein et al., 1977). The third is a set of 150 non-homologous globular proteins whose structures have been determined at high resolution. This set has representatives from all of the protein structural classes (all-α, all-β, α+β and α/β) and from proteins of varying size. The PDB codes of the proteins used in the present work are given in Table II.
A conformational parameter set for all 20 amino acid residues was developed as described below. The frequency of occurrence (%) of each amino acid residue was computed in the transmembrane helical part ($f_m$) of the membrane proteins and in the whole protein ($f_t$). The conformational parameter, $P_m$, is computed using the equation

$$P_m = \frac{f_m}{f_t}$$
The set of 70 membrane proteins listed in Table I was used to derive the conformational parameters.
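The calculation can be sketched as follows. This is a minimal illustration, not the author's FORTRAN program: it assumes the conformational parameter is the ratio of the two percentage frequencies (fm/ft), and that each protein is supplied as a sequence plus 1-based, inclusive helix ranges. The function name `conformational_parameters` and the toy input are hypothetical.

```python
from collections import Counter

def conformational_parameters(proteins):
    """proteins: iterable of (sequence, [(start, end), ...]) with 1-based,
    inclusive transmembrane helix ranges. Returns {residue: fm / ft}."""
    tm_counts, all_counts = Counter(), Counter()
    for sequence, helices in proteins:
        all_counts.update(sequence)                      # whole protein
        for start, end in helices:
            tm_counts.update(sequence[start - 1:end])    # helical part only
    tm_total = sum(tm_counts.values())
    all_total = sum(all_counts.values())
    params = {}
    for residue, count in all_counts.items():
        f_m = 100.0 * tm_counts[residue] / tm_total      # % in helices
        f_t = 100.0 * count / all_total                  # % in whole set
        params[residue] = f_m / f_t
    return params

# Toy input (hypothetical, not taken from Table I)
toy = [("AAALLLKKKK", [(1, 6)])]
params = conformational_parameters(toy)
print(params)
```

In the toy input, Ala makes up 50% of the helical residues but 30% of all residues, so its parameter is about 1.67, while Lys never occurs in the helix and gets 0.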
Prediction of transmembrane helices
Primary rule
Consider the amino acid sequence of a membrane protein. If the conformational parameter $P_m(i)$ of the residue at position $i$ is ≥0.80 (the average value obtained from the set of 70 membrane proteins), the index of priority $I_i$ assigned to that residue is 1; if the value is <0.80, the index is taken to be zero:

$$I_i = \begin{cases} 1, & P_m(i) \geq 0.80 \\ 0, & P_m(i) < 0.80 \end{cases}$$
Here, i varies from 1 to N, where N is the total number of residues.
Secondary rules
A set of secondary rules has been formulated to predict the transmembrane helical segments.
S1
Search for a continuous stretch of 18 residues with high priority (priority index = 1), allowing a maximum of three non-adjacent exceptions; pick up these segments and append the overlapping ones.
S2
For each appended segment obtained from S1, collect all overlapping windows of four consecutive residues (e.g., if the appended segment is 1–25, the overlapping four-residue windows are 1–4, 2–5, . . ., 22–25); check whether two zeros are present within any of these four-residue windows. If so, cut the segment with the high priority residue (priority index = 1) as the terminal one, and select the longer segment as the transmembrane helix.
S3
Longer segments (more than 40 residues) are divided into two segments; the terminal residues are fixed so that each segment contains a minimum number of zeros and sufficient number of residues (minimum of 18 residues) to be a transmembrane helix.
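Rule S1 can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions, not the author's program: "three non-adjacent exceptions" is read as at most three zeros in an 18-residue window with no two zeros next to each other, and each window is additionally assumed to begin and end on a high priority residue (an assumption suggested by S2, which keeps high priority residues at the termini). The function name and the synthetic priority vector are hypothetical.

```python
def find_candidate_segments(priority, window=18, max_exceptions=3):
    """priority: list of 0/1 priority indices (position 0 = residue 1).
    Returns merged candidate segments as 1-based inclusive (start, end)."""
    def window_ok(w):
        # Assumption: a window must start and end on a high priority residue
        if w[0] == 0 or w[-1] == 0:
            return False
        zeros = [k for k, p in enumerate(w) if p == 0]
        if len(zeros) > max_exceptions:
            return False
        # Exceptions must not sit next to one another
        return all(b - a > 1 for a, b in zip(zeros, zeros[1:]))

    segments = []
    for i in range(len(priority) - window + 1):
        if window_ok(priority[i:i + window]):
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)   # append overlapping window
            else:
                segments.append((start, end))
    return segments

# Synthetic priorities: residues 9-34 are high priority except residue 29
priority = [0] * 8 + [1] * 20 + [0] + [1] * 5 + [0] * 6
print(find_candidate_segments(priority))  # [(9, 34)]
```

Windows 9–26, 10–27 and so on up to 17–34 qualify and overlap, so they are appended into the single segment 9–34.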
Accuracy of prediction
The accuracy of the predicted segments was computed using the equation

$$\text{Accuracy (\%)} = \frac{N - N_u - N_o}{N} \times 100$$

where $N$, $N_u$ and $N_o$ are the total number of residues, the number of residues underpredicted and the number of residues overpredicted in a particular protein, respectively.
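The accuracy measure can be illustrated with a short sketch. It assumes that accuracy is the percentage of residues that are neither underpredicted nor overpredicted, i.e. 100 × (N − Nu − No)/N. It also assumes that underpredicted residues are observed helix residues missing from the prediction and overpredicted residues are predicted residues outside the observed helices. The function names and the example ranges are hypothetical.

```python
def count_errors(observed, predicted):
    """observed/predicted: lists of 1-based inclusive (start, end) segments.
    Returns (n_under, n_over) as residue counts."""
    obs = {r for s, e in observed for r in range(s, e + 1)}
    pred = {r for s, e in predicted for r in range(s, e + 1)}
    return len(obs - pred), len(pred - obs)

def prediction_accuracy(n_total, n_under, n_over):
    """Percentage accuracy, assuming 100 * (N - Nu - No) / N."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * (n_total - n_under - n_over) / n_total

# Hypothetical 100-residue protein, not taken from the paper's tables
n_under, n_over = count_errors(observed=[(10, 30)], predicted=[(12, 33)])
print(n_under, n_over, prediction_accuracy(100, n_under, n_over))  # 2 3 95.0
```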
Results and discussions |
---|
The set of conformational parameters for all 20 amino acid residues is given in Table III. It can be seen that the residues Ala, Cys, Phe, Gly, Ile, Leu, Met, Ser, Thr, Val, Trp and Tyr are more prevalent in the transmembrane α-helices. It is interesting to note that all of the aromatic residues prefer the transmembrane regions, consistent with the study of Sciffer et al. (1992) on the importance of tryptophan residues in membrane proteins. Also, proline is not a favored residue (conformational parameter = 0.563) in the transmembrane regions, as indicated by studies of Deber et al. (1990) and of the α-helices of globular proteins (Barlow and Thornton, 1988; Gromiha and Ponnuswamy, 1995). It is noteworthy that the polar residues Ser and Thr prefer the membrane region of transmembrane helical proteins, although their parameter values (0.82 and 0.84, respectively) are only marginally higher than the average (0.80).
The present predictive method was applied to the set of 70 membrane proteins that was used to derive the conformational parameters. The topology of all these proteins is known experimentally, and the set contains proteins with single and multiple membrane spanning segments, the latter traversing the membrane from two to 12 times. This method predicts the correct topology of 68 inner membrane proteins with experimentally determined topologies and correctly identifies 295 transmembrane segments with only two overpredictions.
Prediction of transmembrane helices in PRC, bacteriorhodopsin and cytochrome c oxidase
The predictive method was applied to three proteins (PRC, bacteriorhodopsin and cytochrome c oxidase) whose three-dimensional structures have been determined at high resolution. These proteins were not used to derive the conformational parameters and the jack-knife test determines the accuracy of the prediction.
As a worked example, the primary and secondary rules (explained in the Materials and methods section) are applied here to the prediction of the transmembrane helices in bacteriorhodopsin.
Primary rule
Consider the first five residues in the amino acid sequence of bacteriorhodopsin, AQITG. The conformational parameters for these residues are 1.334, 0.343, 1.803, 0.838 and 0.998, respectively. Hence, as per the primary rule, the priority index values of the five residues are 1, 0, 1, 1 and 1, respectively. The computed priority index values for each residue in bacteriorhodopsin are given in Table IV(a).
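The assignment for these five residues can be checked with a few lines of code. This is a minimal sketch: the dictionary holds only the five parameter values quoted above (the full 20-residue set is in Table III), the threshold is read as "greater than or equal to 0.80", and the function name `priority_indices` is hypothetical.

```python
THRESHOLD = 0.80  # average parameter value quoted in the text

# Only the five values quoted for AQITG; not the complete parameter table
PARAMS = {"A": 1.334, "Q": 0.343, "I": 1.803, "T": 0.838, "G": 0.998}

def priority_indices(sequence, params=PARAMS, threshold=THRESHOLD):
    """Return 1 where the residue's parameter is >= threshold, else 0."""
    return [1 if params[res] >= threshold else 0 for res in sequence]

print(priority_indices("AQITG"))  # [1, 0, 1, 1, 1]
```

Only Gln (0.343) falls below the threshold, which reproduces the index values 1, 0, 1, 1 and 1 given in the text.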
Secondary rules
Step 1 (search for segments with high priority values). First, rule S1 is applied to select the segments. The sequence was searched to find a stretch of 18 high priority residues with a maximum of three non-adjacent exceptions. The overlapping segments thus found (9–26, 10–27, ..., 17–34) were appended to give the first segment, 9–34. Continuing the search identified the next segment, 41–72. In a similar manner, the whole sequence was searched. The segments obtained from step 1 are given in Table IV(b) (column 1).
Step 2 (selection of the transmembrane segments). Consider the first segment, 9–34; there are no two low priority values among any of the four consecutive residues in this segment. Hence, residues 9–34 were selected as the first segment. A similar pattern was observed for the segments 41–72 and 176–200. For segment 77–102, two low priority values for residues 81 and 84 were present near the N-terminus and hence the segment was cut at residue 81 and the segment 82–102 was selected; for segment 105–162, two low priority residues were observed at positions 158 and 160, near the C-terminus, and it was therefore cut at residue 157 and the segment 105–157 was selected as a transmembrane helix. Similarly, the segment 202–223 was selected as there were two low priority residues at 224 and 226. The selected segments from step 2 are given in column 2 of Table IV(b).
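The two cuts described in step 2 can be reproduced with a small sketch of rule S2. The text does not state which of the two low priority residues marks the cut; this illustration assumes the cut is made at the first low priority residue of the first four-residue window containing two zeros, after which the longer piece is kept and trimmed to high priority termini. The priorities are synthetic: only the low priority positions named in the text are set to 0, and the function name `trim_segment` is hypothetical.

```python
def trim_segment(priority, start, end):
    """priority: dict {residue number: 0 or 1}; start/end are inclusive.
    Returns the (start, end) of the piece kept by this reading of S2."""
    while True:
        cut = None
        for w in range(start, end - 2):               # windows w..w+3
            zeros = [r for r in range(w, w + 4) if priority[r] == 0]
            if len(zeros) >= 2:
                cut = zeros[0]                        # assumed cut position
                break
        if cut is None:
            return start, end
        left, right = (start, cut - 1), (cut + 1, end)
        start, end = max(left, right, key=lambda s: s[1] - s[0])
        # Keep high priority residues at both termini
        while start <= end and priority[start] == 0:
            start += 1
        while end >= start and priority[end] == 0:
            end -= 1

def synthetic(start, end, zeros):
    """Hypothetical priorities: all 1 except the listed residues."""
    return {r: (0 if r in zeros else 1) for r in range(start, end + 1)}

print(trim_segment(synthetic(77, 102, {81, 84}), 77, 102))     # (82, 102)
print(trim_segment(synthetic(105, 162, {158, 160}), 105, 162)) # (105, 157)
```

Under this assumption both worked cases come out as in the text: 77–102 becomes 82–102 and 105–162 becomes 105–157.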
Step 3 (division of longer segments). A longer segment, 105–157, of 53 residues was observed. The segment was divided into two segments in such a way that each of the segments contains a minimum number of low priority values and a sufficient number of residues (minimum of 18 residues) to be a transmembrane helix. Hence, the segments 105–127 and 134–157 were obtained.
The final predicted segments are given in Table IV(b) (last column).
In a similar way, the membrane spanning segments were predicted in the other two membrane proteins. The predicted transmembrane helical segments for all three proteins are shown in Table V, along with the experimentally-derived transmembrane segments, presented for comparison.
Testing the prediction of transmembrane-like helices in globular proteins
The present method was applied to a set of 150 soluble globular proteins to check whether it predicts any transmembrane segments in these proteins. It correctly identified 99% of these proteins as globular, i.e. it predicted no transmembrane segment in them. This result indicates that the present method largely excludes the transmembrane-like helices present in globular proteins. The method was compared with three other methods, DAS (Cserzo et al., 1994), TMPRED (Rost et al., 1995) and SOSUI (Yanagihara et al., 1989), for which the number of proteins correctly excluded was 125, 138 and 150, respectively.
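To put these counts on the same percentage scale as the figure quoted for the present method, the numbers for the 150 globular proteins work out as follows (plain arithmetic on the values in the text, in the order DAS, TMPRED, SOSUI):

$$\frac{125}{150} \approx 83.3\%, \qquad \frac{138}{150} = 92\%, \qquad \frac{150}{150} = 100\%$$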
Comparison with other methods
Recently, Nikiforovich (1998) proposed a non-statistical procedure for the prediction of transmembrane helical segments based on energy calculations and predicted the membrane spanning helices in PRC, bacteriorhodopsin and cytochrome c oxidase. The predicted results have been compared with four other recent methods, SOSUI (Yanagihara et al., 1989), DAS (Cserzo et al., 1994), TMPRED (Rost et al., 1995) and core (Nikiforovich, 1998). The transmembrane α-helices for all three proteins were predicted with the present method and the results compared with those of the five methods used by Nikiforovich (1998), together with the results of our earlier method, based on the hydrophobicity profile (Ponnuswamy and Gromiha, 1993). The results, presented in Table VI, show that the present method predicts all the transmembrane α-helices in cytochrome c oxidase with a higher level of accuracy than all other methods, whereas the methods TMPRED (Rost et al., 1995) and SIGNAL (Nikiforovich, 1998) failed to identify one of the transmembrane segments. The methods SIGNAL (Nikiforovich, 1998) and SURHYD (Ponnuswamy and Gromiha, 1993) have higher accuracies for the protein PRC. The higher accuracy for PRC of our previous method, SURHYD (Ponnuswamy and Gromiha, 1993), may be due to the fact that information from this protein was used to compute the surrounding hydrophobicity scale and the same scale was then used for predictive purposes. Also, considering all three proteins on both counts, the number of transmembrane segments correctly predicted and the percentage accuracy of prediction, the present method compares well with the other methods. Further, this method predicts the transmembrane α-helices for all proteins in the set with an accuracy of >80% and an average accuracy of 86%.
Conclusions |
---|
Availability of the program
The executable file, MICHELP, is available from the author and will be distributed upon request.
Acknowledgments |
---|
References |
---|
Bairoch,A. and Boeckmann,B. (1992) Nucleic Acids Res., 20, 2019–2022.
Barlow,D.J. and Thornton,J.M. (1988) J. Mol. Biol., 201, 601–609.
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Casadio,R., Fariselli,P., Taroni,C. and Compiani,M. (1996) Eur. Biophys J., 24, 165–178.
Cserzo,M., Bernassau,J.-M., Simon,I. and Maigret,B. (1994) J. Mol. Biol., 243, 388–396.
Cserzo,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Protein Engng, 10, 673–676.
Deber,C.M., Glibowicka,M. and Woolley,G.A. (1990) Biopolymers, 29, 149–157.
Deisenhofer,J. and Michel,H. (1989) Science, 245, 1463–1473.
Deisenhofer,J., Epp,O., Sinning,I. and Michel,H. (1985) J. Mol. Biol., 246, 429–457.
Esposti,D.M., Crimi,M. and Venturoli,G. (1990) Eur. J. Biochem., 190, 207–219.
Feher,G., Allen,J.P., Okamura,M.Y. and Rees,D.C. (1989) Nature, 339, 111–116.
Gromiha,M.M. and Ponnuswamy,P.K. (1995) Int. J. Pept. Protein Res., 45, 225–240.
Gromiha,M.M., Majumdar,R. and Ponnuswamy,P.K. (1997) Protein Engng, 10, 497–500.
Henderson,R., Baldwin,J.M., Ceska,T.A., Zemlin,F., Beckmann,E. and Downing,K.H. (1990) J. Mol. Biol., 213, 899–929.
Hirokawa,T., Boon-chieng,S. and Mitaku,S. (1998) Bioinformatics, 14, 378–379.
Iwata,S., Ostermeier,C., Ludwig,B. and Michel,H. (1995) Nature, 376, 660–669.
Jennings,M.L. (1989) Annu. Rev. Biochem., 58, 999–1027.
Lohmann,R., Schneider,G. and Wrede,P. (1996) Biopolymers, 38, 13–29.
Nikiforovich,G.V. (1998) Protein Engng, 11, 279–283.
Parodi,L.A., Granatir,C.A. and Maggiora,G.M. (1994) CABIOS, 10, 527–535.
Persson,B. and Argos,P. (1994) J. Mol. Biol., 237, 182–192.
Persson,B. and Argos,P. (1997) J. Protein Chem., 16, 453–457.
Ponnuswamy,P.K. and Gromiha,M.M. (1993) Int. J. Pept. Protein Res., 42, 326–341.
Rost,B., Casadio,R., Fariselli,P. and Sander,C. (1995) Protein Sci., 4, 521–533.
Rost,B., Fariselli,P. and Casadio,R. (1996) Protein Sci., 5, 1704–1718.
Sciffer,M., Chang,C.-H. and Stewans,F.J. (1992) Protein Engng, 5, 213–214.
Traxler,B., Boyd,D. and Beckwith,J. (1993) J. Memb. Biol., 132, 1–11.
von Heijne,G. (1988) Biochim. Biophys Acta, 947, 307–333.
von Heijne,G. (1992) J. Mol. Biol., 225, 487–494.
Weiss,M.S., Kreush,A., Shiltz,E., Nestel,U., Welte,W., Weckesser,J. and Schulz,G.E. (1991) FEBS Lett., 280, 379–382.
Yanagihara,N., Suwa,M. and Mitaku,S. (1989) Biophys Chem., 34, 69–77.
Received August 14, 1998; revised February 4, 1999; accepted March 18, 1999.