Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK
1 Present address: Structural Genomics Consortium, University of Oxford, Botnar Research Centre, Oxford OX3 7LD, UK
2 To whom correspondence should be addressed. E-mail: mark{at}biop.ox.ac.uk
Abstract
Keywords: channel/membrane protein/pore/prediction/secondary structure/transmembrane helix
Introduction
Membrane proteins seem to exhibit a more restricted range of folds than their water-soluble counterparts, making them more amenable to structural predictions (Popot and Engelman, 2000; Ubarretxena-Belandia and Engelman, 2001). Only two overall fold classes have been observed for membrane proteins: the α-helix bundle and the β-barrel. In this paper, we are concerned solely with α-helical membrane proteins. β-Barrel membrane proteins are restricted to bacterial outer membrane proteins and their relatives (Koebnik et al., 2000). α-Helical membrane proteins contain one or more transmembrane (TM) helices which consist predominantly of hydrophobic amino acids (Engelman et al., 1980, 1986). These TM helices are generally independently stable in a membrane or membrane-like environment. This resulted in the formulation of the two-state model of membrane protein folding (Popot and Engelman, 1990, 2000), which recently has been modified to include a third state to allow for the incorporation of non-TM helix elements (Engelman et al., 2003). Furthermore, structural studies on proteins involved in aiding the insertion of proteins into membranes (van den Berg et al., 2003) have refined our models of membrane protein folding in vivo. However, the requirement for TM helices to behave as independent folding units seems to hold and underlies methods for predicting the location of TM helices within membrane protein sequences.
At an early stage it was recognized that TM helices could be identified by examination of the amino acid sequences of membrane proteins (Deber et al., 1986; Engelman et al., 1986; Landolt-Marticorena et al., 1993), in particular by searching for ~20 amino acid stretches with a predominance of hydrophobic amino acids (Kyte and Doolittle, 1982). Subsequent improvements in our understanding of amino acid propensities in membrane environments (Li and Deber, 1992; Deber and Goto, 1996) and the results of structural bioinformatics analyses of membrane proteins have led to refinement of this simple rule. For example, the amphipathic aromatic amino acids Trp and Tyr prefer to be located at the interface between the membrane and solution (aromatic belts) (Landolt-Marticorena et al., 1993; Arkin and Brunger, 1998; Ulmschneider and Sansom, 2001). Arg and Lys have an increased abundance in the cytoplasmic loops of bacterial membrane proteins (von Heijne, 1992). By combining analysis of known membrane protein structures (Ulmschneider and Sansom, 2001) with experimental studies of membrane proteins and of model peptides (Killian and von Heijne, 2000), it has been possible to recognize such structural features that stabilize a TM α-helical conformation. Such observations are embedded in most TM helix prediction methods.
The early recognition of the importance of hydrophobic α-helices has resulted in a plethora of methods for prediction of TM helix topology based on analysis of membrane protein sequences (Möller et al., 2001). Early methods used simple hydrophobicity scales (Kyte and Doolittle, 1982; Eisenberg, 1984) to search for potential membrane-spanning helices. Refinements of such methods included consideration of additional TM helix sequence patterns such as the positive-inside rule (von Heijne, 1992). More recent methods use a variety of pattern matching methods (e.g. neural nets and hidden Markov models) to search for TM helices based on sequences of such elements in membrane proteins of known structure. There are currently >12 different programs available for predicting the location of TM helices within a membrane protein sequence. These include ALOM2 (Nakai and Kanehisa, 1992), TMPRED (Hofmann and Stoffel, 1993), DAS (Cserzo et al., 1997), HMMTOP2 (Tusnády and Simon, 1998, 2001), TMHMM2 (Krogh et al., 2001), PHD (Rost et al., 1996), TMAP (Persson and Argos, 1997), TOPPRED2 (Claros and von Heijne, 1994), MEMSAT (Jones et al., 1994), MPEX (White and Wimley, 1999), SPLIT4 (Juretic et al., 2002) and TM-FINDER (Deber et al., 2001).
Until recently, evaluations of TM helix prediction algorithms (Möller et al., 2001; Ikeda et al., 2002) have used low-resolution datasets (e.g. experimentally confirmed TM topologies). A recent comparison with high-resolution structural datasets has been performed (Jayasinghe et al., 2001; Chen et al., 2002). In this paper we compare and evaluate different TM helix prediction methods with respect to the expanded dataset of crystal structures of integral membrane proteins. We also develop a consensus strategy and analyse cases in which prediction is difficult, indicating how prediction comparison/consensus may enhance our understanding of these proteins. We illustrate this via application to the much debated structure of a bacterial voltage-gated K+ channel, KvAP (Jiang et al., 2003).
Methods
Two datasets of TM helices in membrane proteins were used: one redundant and one non-redundant (see Table I and above). The non-redundant dataset excluded homologues of a given membrane protein from different organisms [with the exception of fumarate reductase which has a different topology in Escherichia coli from that found in Wolinella succinogenes (Iverson et al., 1999; Lancaster et al., 1999)]. Thus, proteins were removed if their sequences had an identity of ≥30% following pairwise sequence alignment using Blast2. For example, GlpF has 30% sequence identity with bovine Aqp1. In such cases, the protein with the larger number of TM helices was selected. If the number of TM helices was the same in the two proteins then the higher resolution structure was retained.
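The selection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the `Chain` record, the `identity` callback (standing in for precomputed Blast2 pairwise identities) and the greedy single pass are all assumptions made for the example, and the chain names and values in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    name: str          # hypothetical identifier
    n_tm: int          # number of TM helices in the structure
    resolution: float  # in angstroms; a lower value is a better resolution


def prefer(a, b):
    """Pick which of two homologous chains to keep (rule from the text)."""
    if a.n_tm != b.n_tm:
        return a if a.n_tm > b.n_tm else b
    return a if a.resolution <= b.resolution else b


def non_redundant(chains, identity, cutoff=30.0):
    """Greedy filter: `identity(a, b)` returns percent sequence identity.

    Assumption: one pass over the chains is enough for illustration;
    a real pipeline might need to re-check after each replacement.
    """
    kept = []
    for cand in chains:
        for i, k in enumerate(kept):
            if identity(cand, k) >= cutoff:
                kept[i] = prefer(k, cand)
                break
        else:
            # No homologue already kept, so the candidate is retained.
            kept.append(cand)
    return kept
```

For instance, with three invented chains A (6 helices), B (8 helices) and C (2 helices), where only A and B share 30% identity, the filter keeps B and C.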
TM helices within membrane protein structures were defined using DSSP (Kabsch and Sander, 1983). TM helices may be interrupted by a turn or kink in the helix, e.g. due to the presence of an intra-helical proline (Cordes et al., 2002). For example, a break is observed in TM4 of the Ca2+-ATPase (pdb code 1EUL). If the break or kink was not extensive (i.e. not greater than about three residues as determined by visual inspection), then the two helical segments were considered to form a single TM helix. All TM helices were visually inspected to check the automatic TM helix assignment. Using this method we consider the full extent of each TM helix, including residues which may reside outside the (presumed) limits of the lipid bilayer. This approach was adopted because any attempt to define simply the bilayer spanning element of a TM helix is contingent upon the model used to assign the bilayer spanning element, given the absence of lipid molecules from the majority of crystals of membrane proteins.
TM helix prediction
Locally running versions of the following TM prediction methods were installed: ALOM2 (Nakai and Kanehisa, 1992), DAS (Cserzo et al., 1997), HMMTOP2 (Tusnády and Simon, 1998, 2001), MEMSAT 1.5 (Jones et al., 1994), MEMSAT2 (Jones, 1998), MPEX (White and Wimley, 1999), PHD (Rost et al., 1996), SPLIT4 (Juretic et al., 2002), TMAP (Persson and Argos, 1997), TM-FINDER (Deber et al., 2001), TMHMM2 (Krogh et al., 2001), TMPRED (Hofmann and Stoffel, 1993) and TOPPRED2 (Claros and von Heijne, 1994).
Consensus predictions
A consensus prediction for each sequence was calculated using a simple majority-vote procedure. If at least 9 methods predict a residue to be in a TM helix, it is assigned an H in the consensus. If between 6 and 8 methods (i.e. a majority) predict a TM residue, then an h is assigned. If <6 (but >1) methods predict a TM helix, then a ? is assigned.
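The voting scheme can be illustrated with a short Python sketch. It assumes that each method's output has already been converted to a per-residue string in which `H` marks a predicted TM residue; that encoding, the function name and the `-` symbol used for positions with at most one vote are assumptions of this example, not part of the published script. The threshold of 9 is treated as "9 or more".

```python
def consensus(predictions, strong=9, majority=6):
    """Per-residue vote over equal-length prediction strings.

    predictions: one string per method, 'H' = TM residue, anything
    else = non-TM (an encoding assumed for this sketch).
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")
    out = []
    for pos in range(length):
        votes = sum(1 for p in predictions if p[pos] == "H")
        if votes >= strong:
            out.append("H")   # strong agreement
        elif votes >= majority:
            out.append("h")   # 6 to 8 methods
        elif votes > 1:
            out.append("?")   # 2 to 5 methods
        else:
            out.append("-")   # at most one method (symbol is an assumption)
    return "".join(out)
```

A position receiving 13, 9, 8, 6, 5, 2, 1 or 0 votes is thus reported as H, H, h, h, ?, ?, - or -, respectively.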
Prediction assessment
Prediction accuracy was initially determined using the method described by Tusnády and Simon (1998). The number of observed TM helices ($N_{OBS}$) in the structures was counted, as was the number of predicted TM helices ($N_P$). We also counted the total number of correct predictions ($N_C$). A prediction is considered correct when there is an overlap of at least three amino acids between a predicted and a known TM segment. If $N_C$ was higher than $N_P$ then it was set equal to $N_P$. The total number of protein chains in the datasets was counted ($N_{TOT}$), as was the number of chains for which both the number and location of all TM segments were predicted correctly ($N_{TM}$), i.e. where $N_{OBS} = N_P = N_C$. The efficiency of the TM helix prediction was measured using the ratios $M = N_C/N_{OBS}$ and $C = N_C/N_P$. The per-segment prediction power was then given by the geometric mean $Q_P = 100\,(MC)^{1/2}$. Varying the overlap value from 1 to 9 has been shown to have only a small effect on the prediction success of MPEX (Jayasinghe et al., 2001). Nonetheless, prediction accuracy was additionally measured varying the overlap from 1 to 20 residues.
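A minimal Python sketch of the per-segment measures for a single chain is given below. Helices are represented as inclusive `(start, end)` residue ranges, and a correct prediction is counted per observed helix that is overlapped by any predicted helix; both choices are assumptions of this example, as the text does not specify how matches were counted.

```python
import math


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def per_segment(observed, predicted, min_overlap=3):
    """Return the counts and the M, C and QP measures for one chain."""
    n_obs, n_p = len(observed), len(predicted)
    n_c = sum(
        1 for o in observed
        if any(overlap(o, p) >= min_overlap for p in predicted)
    )
    n_c = min(n_c, n_p)  # NC is capped at NP, as in the text
    m = n_c / n_obs if n_obs else 0.0
    c = n_c / n_p if n_p else 0.0
    return {
        "n_obs": n_obs,
        "n_p": n_p,
        "n_c": n_c,
        "M": m,
        "C": c,
        "QP": 100.0 * math.sqrt(m * c),
        "all_correct": n_obs == n_p == n_c,
    }
```

For example, with three observed helices and two predicted helices, one of which bridges two observed helices, the raw count of three matches is capped at two, giving M = 2/3, C = 1 and QP of about 82.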
A score was also calculated to give a per-residue accuracy for each method, $Q_3 = N^{-1}\sum_i P_i$, where $N$ is the total number of observed residues and $P_i$ the total number of residues correctly assigned to class $i$; the two classes being transmembrane (TM) and non-transmembrane (NTM).
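The per-residue score can be sketched as follows, expressed as a percentage. The string encoding (`H` for TM, `-` for non-TM) is an assumption of this example, as is the definition of the per-class scores as the fraction of observed residues of each class that are correctly predicted.

```python
def q3(observed, predicted):
    """Per-residue accuracy (percent) over two equal-length strings.

    'H' = TM residue, '-' = non-TM residue (encoding assumed here).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pairs = list(zip(observed, predicted))
    p_tm = sum(1 for o, p in pairs if o == "H" and p == "H")
    p_ntm = sum(1 for o, p in pairs if o == "-" and p == "-")
    n_tm = observed.count("H")
    n_ntm = observed.count("-")
    return {
        "Q3": 100.0 * (p_tm + p_ntm) / len(observed),
        # Per-class scores: assumed definition, see the lead-in.
        "Q3TM": 100.0 * p_tm / n_tm if n_tm else 0.0,
        "Q3NTM": 100.0 * p_ntm / n_ntm if n_ntm else 0.0,
    }
```

With an observed assignment `--HHHH----` and a prediction `---HHH----`, nine of ten residues agree, so Q3 is 90%, with 75% for the TM class and 100% for the non-TM class.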
Scoring the ends
To test the accuracy of the various methods in predicting the location of the ends of the TM helices, we used the scoring method summarized in Figure 1. The method can only be applied to helices whose overall location has been correctly predicted. The accuracy of the prediction algorithms was considered at both the N- and C-termini of each helix. If the prediction was an exact match then the method scored zero. The total score is normalized by the number of correctly assigned helices and provides a measure of the accuracy of predicting the ends of the TM helices: the smaller the score, the more accurate the method.
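Since Figure 1 is not reproduced here, the following Python sketch shows only one plausible reading of the termini score: the absolute residue offset at the N-terminus plus that at the C-terminus, averaged over the correctly located helices. This formula is an assumption, chosen to be consistent with an exact match scoring zero and with the score being interpretable as a number of residues per helix.

```python
def termini_score(pairs):
    """Mean end-point error over correctly located helices.

    pairs: list of ((obs_start, obs_end), (pred_start, pred_end)).
    Assumed scoring: |N-terminal offset| + |C-terminal offset| per helix.
    """
    if not pairs:
        raise ValueError("no correctly located helices to score")
    total = sum(
        abs(pred[0] - obs[0]) + abs(pred[1] - obs[1])
        for obs, pred in pairs
    )
    return total / len(pairs)
```

Under this reading, one exactly matched helix and one helix predicted four residues late at the N-terminus and three residues early at the C-terminus give a score of 3.5.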
The E.coli proteome was taken from the Integr8 database (http://www.ebi.ac.uk/integr8/EBI-Integr8-HomePage.do). Each protein sequence was submitted to all of the TM prediction methods (using the consensus prediction script) and also to signal peptide (SP) prediction using both SignalP-NN and SignalP-HMM (http://www.cbs.dtu.dk/services/SignalP/; Bendtsen et al., 2004). An SP was only considered to be present if both SignalP methods predicted an SP. If an SP was predicted and the consensus TM prediction indicated that the first TM helix was present in the first 70 amino acids, then this first TM helix was removed from the consensus prediction. Although it is possible that a few TM helices are removed due to false-positive prediction of SPs, the subsequent consensus prediction should provide at least a lower limit for the percentage of α-helical membrane proteins present in the proteome.
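The signal peptide filter can be sketched as below. The function name and the inputs (two booleans for the SignalP-NN and SignalP-HMM calls, and a list of consensus helices as inclusive residue ranges) are hypothetical; "present in the first 70 amino acids" is interpreted here as the helix starting at or before residue 70, which is an assumption.

```python
def strip_signal_peptide(tm_helices, sp_nn, sp_hmm, window=70):
    """Drop the first consensus TM helix if it is likely a signal peptide.

    tm_helices: list of inclusive (start, end) ranges, ordered by start.
    sp_nn, sp_hmm: whether each SignalP method predicted an SP.
    """
    both_agree = sp_nn and sp_hmm  # an SP counts only if both methods agree
    if both_agree and tm_helices and tm_helices[0][0] <= window:
        return list(tm_helices[1:])
    return list(tm_helices)
```

A chain with consensus helices at residues 5–25 and 90–110 would keep only the second helix when both SignalP methods report an SP, and both helices otherwise.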
Results
The prediction methods were assessed against two datasets of membrane proteins of known 3D structure (Table I). The redundant dataset contains all α-helical membrane proteins of known structure solved to 3.5 Å resolution or better up to a cut-off date of August 2004. This dataset contains 434 TM helices, from 112 polypeptide chains. A non-redundant subset contains 268 TM helices and 73 protein chains. The non-redundant set was created by checking for any proteins with a sequence identity of ≥30%. Where two proteins had a sequence identity of ≥30%, the protein with the larger number of TM helices was selected. If the total number of TM helices was the same in both structures, then the structure with the higher resolution was chosen. Redundancy has been suggested to bias estimations of prediction accuracy (Chen et al., 2002). By using both a redundant and a non-redundant dataset, it is possible to evaluate such bias.
The redundant dataset does not contain any membrane protein that consists solely of a single TM helix. However, a histogram (Figure 2A) of the distribution of numbers of TM helices per polypeptide chain for the 414 TM helices (excluding half-helices) in the redundant dataset reveals a high frequency of polypeptide chains containing a single membrane-spanning segment. This agrees with several analyses of predicted TM proteins (e.g. Arkin et al., 1997), which suggest that the majority of membrane-protein genes encode a chain with just one TM helix. For example, in Figure 2B we show the result of applying a consensus TM helix prediction (see below) to all the predicted membrane proteins of the E.coli genome. Interestingly, although the overall distribution is similar for the membrane protein crystal structures (Figure 2A) and for the predicted membrane proteome (Figure 2B), there is a suggestion of some bias in favour of simpler (i.e. smaller number of TM helices per polypeptide chain) membrane proteins in the structural dataset. This may be due, in part, to a misclassification of a few signal peptides as TM helices, although as described in the Methods section we used SignalP to exclude signal peptides.
Prediction of the number and location of TM α-helices
The various prediction methods were analysed for their ability to predict both the number and location of all TM helices (including half-TMs; Tables II and III). The per-segment values given in the tables were calculated using a requirement of at least a three residue overlap between an observed and predicted helix for a correct prediction (see below for analysis of the effect of overlap).
The per-residue accuracies reveal the percentage of residues correctly predicted as either TM or non-TM, with the Q3 score providing an overall per-residue accuracy. As with the per-segment assignment, ALOM2 performs least well (Q3 = 74%) and SPLIT4 and TMHMM2 perform the best (Q3 = 85 and 83%, respectively). These conclusions do not change whether one employs the redundant or the non-redundant dataset.
Many membrane proteins have large extramembraneous domains. Consequently, there are more non-TM residues (57%) than TM residues (~43%) in both datasets. Hence the accuracy for the non-TM region will tend to dominate the per-residue accuracy. For this reason, we also report the score for both the TM and non-TM classes. For all methods the score for the non-TM class (Q3NTM) is higher than that for its TM counterpart (Q3TM). For Q3TM SPLIT4 and TMAP are the strongest methods with scores of 78 and 76%, respectively, for the non-redundant dataset.
A revealing statistic when analysing the prediction accuracy of a particular method is to consider how often the method accurately predicts both the number and location of TM helices in an entire protein chain, which may correspond more closely to what is required by an experimentalist when running a prediction. For example, it may be misleading to suggest that a prediction method is 90% accurate if it does not succeed in predicting the location of all TM helices in 90% of protein chains tested. With a requirement of a three residue overlap between a predicted and an observed helix for a correct prediction, the methods with the highest NTM values for both datasets are SPLIT4 and HMMTOP2, corresponding to prediction accuracies (by this measure) of 84% for SPLIT4 and 80% for HMMTOP2 (for the non-redundant dataset; the corresponding values for the redundant dataset are 79 and 76%, respectively).
In summary, as might be expected, there is some dependence on the measure used as to which TM prediction methods are best. On balance, SPLIT4, TMHMM2, HMMTOP2 and TMAP all score highly, with SPLIT4 performing well across several measures.
Analysis of predicted TM helix lengths across all proteins in both of the datasets reveals that the majority of methods predict most TM helices to be 17–25 residues in length. In contrast, 48% of TM helices in known structures are >25 residues in length (see above). As a consequence, all methods (except TMAP) have an average predicted helix length less than the average length of ~25 residues calculated from the known structures.
The discrepancies between predicted and observed helix lengths can be visualized more easily from histograms of the TM helix length distributions (Figure 4). SPLIT4 (which performs well in the scoring procedures discussed above) yields a distribution that best overlays the experimental distribution of helix lengths. TM-FINDER, MEMSAT and TMPRED also perform reasonably well using such an overlap as a criterion for prediction accuracy. Several of the methods favour a particular helix length. For example, MPEX has a peak at 19 residues corresponding to 67% of all predictions. TMAP yields peaks at 21 and 29 residues, PHD has a peak at 18 residues and TMHMM2 favours helices containing 23 residues.
Accurate TM helix prediction depends on identifying not only the number of transmembrane helices correctly, but also their start and end residues. This is of particular importance if TM prediction is to be used as a preliminary to three-dimensional modelling of a membrane protein. Therefore, termini scores were calculated for each prediction method (Tables IV and V), providing measures of prediction of the start and end positions of the TM helices (see Methods and Figure 1 for details). Note that the lower the value of a termini score, the better the prediction. For both redundant and non-redundant datasets the results follow the same trend. The three methods that perform the best are TMHMM2, MEMSAT2 and SPLIT4 with overall termini scores of 8.0, 8.3 and 8.2, respectively, for the redundant dataset and scores of 8.4, 8.4 and 8.5 for the non-redundant dataset. These values directly translate to the average number of residues the methods predict incorrectly for a given helix. Hence on average these (best) three methods are inaccurate by just over two turns of α-helix.
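As a rough check on this conversion, assuming the textbook value of about 3.6 residues per turn for an ideal α-helix (a value not stated in the text), termini scores of 8.0 to 8.5 residues correspond to:

```latex
\[
\frac{8.0}{3.6} \approx 2.2
\qquad\text{and}\qquad
\frac{8.5}{3.6} \approx 2.4 \ \text{turns of helix}
\]
```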
The analysis of under- and over-prediction (Tables IV and V) reveals that nearly all of the methods under-predict at the N-terminus. This confirms the observation above that the methods have a tendency to predict the N-terminus after the actual start of the helix. There is also evidence for under-prediction at the C-terminus. For all methods except SPLIT4, under-prediction at the C-terminus is more common than over-prediction.
There are several possible explanations for the dominance of under-prediction. The most likely explanation is that the methods can predict the core hydrophobic region of TM helices but tend to miss the more polar extensions at either end of the helix. This was noted previously for long helices such as in chain D (residues 198–231) of the cytochrome bc1 complex (pdb code 1BGY) (Chen et al., 2002).
All the prediction methods clearly have considerable difficulty in predicting the precise location of the termini. In fairness, we should note that prediction methods have not been designed to predict helix ends accurately. Even the best methods only predict the correct position 9–11% of the time. Similar work on secondary structure prediction of globular proteins has revealed that only one-third of JPRED (44%) and PHD (40%) predictions correctly identify the N-terminus (Wilson et al., 2002) and that under-prediction at the N-terminus occurs for globular proteins as well as membrane proteins.
Sensitivity analysis
In the discussion above, a helix is assigned as being correctly predicted if there is an overlap of at least three residues between the predicted helix and the observed helix. However, it is important to analyse the sensitivity of the prediction scores to the residue overlap used. We therefore analysed the effect of the residue overlap on the per-segment accuracy (QP; see Figure 6). In particular, we wished to examine the extent to which the ranking of prediction methods is sensitive to the overlap definition. The per-segment accuracy as a function of overlap shows that, for all of the methods, varying the overlap value between 1 and 10 residues has little effect on the ranking of the methods. Hence our conclusions as to the best methods are relatively robust to this assumption. The rankings only start to change at high (i.e. >10) overlaps.
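The overlap scan can be illustrated with a compact Python sketch that recomputes QP for each method at every overlap threshold from 1 to 20. The data layout (inclusive residue ranges, a dictionary of predictions keyed by method name) and the per-observed-helix matching are assumptions of this example.

```python
import math


def qp(observed, predicted, min_overlap):
    """Per-segment QP (percent) at a given overlap threshold."""
    if not observed or not predicted:
        return 0.0

    def shared(a, b):
        return min(a[1], b[1]) - max(a[0], b[0]) + 1

    n_c = sum(
        1 for o in observed
        if any(shared(o, p) >= min_overlap for p in predicted)
    )
    n_c = min(n_c, len(predicted))
    m = n_c / len(observed)
    c = n_c / len(predicted)
    return 100.0 * math.sqrt(m * c)


def qp_vs_overlap(observed, predictions_by_method, overlaps=range(1, 21)):
    """Map each method name to its list of QP values, one per threshold."""
    return {
        name: [qp(observed, pred, k) for k in overlaps]
        for name, pred in predictions_by_method.items()
    }
```

Plotting or ranking the resulting curves shows at which threshold a method's score begins to fall; a method whose helices are shifted relative to the observed ones degrades sooner than one whose helices are well centred.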
In order to understand better the secondary structure prediction of membrane proteins, it is useful to compare those proteins for which the methods work well with those for which the methods perform poorly. Therefore, in addition to comparison between methods, we also performed a comparison between proteins. The proteins were ranked in terms of how well their TM helices were predicted. Such a ranking enables us to consider the best and worst case predictions, thus permitting a more detailed analysis of which structural features may confuse TM helix prediction.
For example, if we rank the protein chains on the basis of their termini score for prediction using SPLIT4, then (for the redundant dataset) the mean termini score is 7.9, with a standard deviation of 3.7. A number of single polypeptide chain membrane proteins score significantly worse than average (defined as a termini score of >11.6, i.e. more than the mean plus one standard deviation), including the chloride channel/transporter ClC (1OTS), the sugar transporter LacY (1PV6), the drug transporter AcrB (1IWG) and Ca2+-ATPase (1EUL).
In analysing some of the best and worst case predictions, we made use of a consensus prediction (see Methods for details). The consensus method performs well and is on a par with the best method for each of the various scoring functions. For the redundant dataset the consensus method yielded a per-segment QP score of 93% (close to the best individual score of 94% for SPLIT4), a per-residue Q3 score of 83% (compared with the best individual score of 83% for TMHMM2) and a termini score of 8.6 (compared with 8.4 for TMHMM2 and MEMSAT2).
In Figure 8, we illustrate a best case TM prediction, that of TM1 from bovine rhodopsin (1L9H). Here the agreement between the various methods is good and the consensus prediction corresponds well with the experimental helix location, missing only the first three residues and the final residue. This helix is very hydrophobic (although it does include a proline close to the centre) with an N-terminal tryptophan and two lysines immediately after the helix. At 31 residues long it is (just) within one standard deviation of the mean TM helix length. Hence it presents a clear TM helix signature in its sequence.
Another pore-forming protein with a short helix in the interior of the TM helix bundle is the protein-conducting SecY complex. The α-subunit, which is the pore-forming subunit, is formed by 10 TM helices, which surround a central pore. In the conformation captured in the crystal structure the pore is plugged by a short (four-residue) α-helix, referred to as TM2a (see Figure 11). If one examines the consensus prediction for the SecY α-subunit, it can be seen that whilst there are typically accurate predictions for TM1 and TM2, the short plug helix TM2a is missed by all of the methods, with the exception of TM-FINDER, which predicts it as an N-terminal extension of the TM2 helix. This is consistent with the three-state model of membrane protein folding (i.e. TM2a folds into the cavity formed by the other TM helices) and also with the proposal that the plug helix may leave the pore during the cycle of conformational changes associated with transport through the pore (van den Berg et al., 2003).
We will now consider a membrane protein, the structure of which has generated some debate as to its conformation when present within a membrane and which we therefore omitted from the dataset used to analyse the prediction methods. This is KvAP, a bacterial homologue of eukaryotic voltage-gated K+ channels (Cuello et al., 2004; Elliott et al., 2004; Jiang et al., 2004; Lee and MacKinnon, 2004; Starace and Bezanilla, 2004). Prior to the structure determination, topological data and homology with the simpler K+ channel KcsA had suggested a structure with six TM helices (S1 to S6) per monomer, with the pore-forming half-TM (the P helix) located between S5 and S6. The S4 helix contained multiple basic residues and was proposed to act as the voltage sensor, undergoing a change in orientation when the voltage across a cell membrane changed. The other TM helices (i.e. S1 to S3 and S5) were proposed to form a cavity within which the more polar S4 helix was accommodated. The crystal structure of KvAP was therefore a surprise when it revealed that the S4 helix was located in a membrane surface location, along with the C-terminal half of S3 (S3b; see Figure 12). The S3b–S4 helix hairpin is suggested to form a mobile element which changes orientation (and possibly conformation) in response to changes in transmembrane voltage.
Discussion
We also considered the performance of TM prediction methods from the perspective of the identification of exact start and end residues for TM helices. This is of considerable importance if TM predictions are to be used as a component of attempts to model the three-dimensional folds of membrane proteins. Again SPLIT4 performs well, but even the three best methods are inaccurate by just over two turns of α-helix in terms of predicting the termini. Therefore, other approaches may be required to refine sequence-based predictions of TM helices. One approach may be via molecular simulations of model TM helices in a bilayer environment (Forrest et al., 1999). However, systematic studies of such a simulation-based approach have yet to be performed.
A simple consensus method for TM helix prediction was also explored. Although this does not seem to outperform the best single methods, it is of value for, e.g., genome-wide prediction of membrane protein topologies (J.M.Cuthbertson, D.A.Doyle and M.S.P.Sansom, unpublished results) and for identifying potential α-helical regions which do not present a clear TM helix signal. Such regions may correspond to, e.g., functionally important half-TM helices within more complex membrane proteins. Hence the use of a consensus method can direct one's attention to difficult and/or interesting regions of a prediction. It is also likely that as our experimental understanding of the processes of topogenesis of these more complex regions of membrane proteins improves (Umigai et al., 2003), we will be able to embed such knowledge in improved methods for TM prediction.
We have undertaken a number of case studies of when TM helix prediction methods fail (for known structures) or give mixed signal results (for unknown structures). Albeit from a limited number of cases, our results suggest that failure can arise where half-TMs or more complex secondary structure elements are located within an outer bundle of canonical hydrophobic TM helices. This is consistent with a recent modification (Engelman et al., 2003) of the original two-state model (Popot and Engelman, 1990) of membrane protein folding and with more recent studies of the mechanism of recognition of TM helices by the translocon in the endoplasmic reticulum (Hessa et al., 2005a). In this three-state model, e.g., loops containing half-TMs are inserted after formation of the TM helix bundle. This model is supported by, e.g., recent studies of topogenesis of the pore-forming P-loop of K+ channels (Umigai et al., 2003). Therefore, it seems that the consensus method predicts TM helices that appear to form a core structural framework within which functionally important sections may then be inserted. Hence for K+ channels the pore-lining outer and inner helices are well predicted, whereas the selectivity filter involving the pore half-TM helix is not. A similar situation holds for the ClC channel/transporter and for the aquaporins. Hence a consensus prediction may be able to help in the identification of functionally important regions, especially in combination with sequence homology between functionally related proteins.
In summary, this study not only provides an updated critical evaluation of TM helix prediction methods, but also provides an analysis of (some of) the factors that may limit the success of such approaches. These analyses may aid the development of further predictive approaches for TM helices. For example, future prediction methods may include consideration of some of the additional complexities of TM helices, including: helix lengths and tilts relative to the bilayer plane (Bowie, 1997) and the presence of glycine (Eilers et al., 2000) and of proline (Cordes et al., 2002) residues and of more complex motifs (Senes et al., 2000) within TM helices. These and related advances in the structural bioinformatics of membrane proteins (Chamberlain and Bowie, 2004; Eyre et al., 2004; Gimpelev et al., 2004) may be combined with novel approaches (Pellegrini-Calace et al., 2003; Beuming and Weinstein, 2004; Im and Brooks, 2004; Kim et al., 2004; Kokubo and Okamoto, 2004) to modelling and simulation of membrane protein helix folding. In this way we may hope to advance protocols for the prediction and modelling of the three-dimensional TM fold of membrane proteins.
References
Arkin,I. and Brunger,A. (1998) Biochim. Biophys. Acta, 1429, 113128.[ISI][Medline]
Arkin,I.T., Brünger,A.T. and Engelman,D.M. (1997) Proteins., 28, 465466.[CrossRef][ISI][Medline]
Bendtsen,J.D., Nielsen,H., von Heijne,G. and Brunak,S. (2004) J. Mol. Biol., 340, 783795.[CrossRef][ISI][Medline]
Beuming,T. and Weinstein,H. (2004) Bioinformatics, 20, 18221835.
Booth,P.J. (2000) Biochim. Biophys. Acta, 1460, 414.[ISI][Medline]
Bowie,J.U. (1997) J. Mol. Biol., 272, 780789.[CrossRef][ISI][Medline]
Chamberlain,A.K. and Bowie,J.U. (2004) Biophys. J., 87, 34603469.
Chen,C.P., Kernytsky,A. and Rost,B. (2002) Protein Sci., 11, 27742791.
Claros,M.G. and von Heijne,G. (1994) Comput. Appl. Biosci., 10, 685686.[Medline]
Cordes,F.S., Bright,J.N. and Sansom,M.S.P. (2002) J. Mol. Biol., 323, 951960.[CrossRef][ISI][Medline]
Cserzo,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Protein Eng., 10, 673676.[CrossRef][ISI][Medline]
Cuello,L.G., Cortes,D.M. and Perozo,E. (2004) Science, 306, 491495.
Deber,C.M. and Goto,N.K. (1996) Nat. Struct. Biol., 3, 815818.[CrossRef][ISI][Medline]
Deber,C.M., Brandl,C.J., Deber,R.B., Hsu,L.C. and Young,X.K. (1986) Arch. Biochem. Biophys., 251, 6876.[CrossRef][ISI][Medline]
Deber,C.M., Wang,C., Liu,L.P., Prior,A.S., Agrawal,S., Muskat,B.L. and Cuticchia,A.J. (2001) Protein Sci., 10, 212219.
Dutzler,R., Campbell,E.B., Cadene,M., Chait,B.T. and MacKinnon,R. (2002) Nature, 415, 287294.[CrossRef][ISI][Medline]
Eilers,M., Shekar,S.C., Shieh,T., Smith,S.O. and Fleming,P.J. (2000) Proc. Natl Acad. Sci. USA, 97, 57965801.
Eisenberg,D. (1984) Annu. Rev. Biochem., 53, 595623.[CrossRef][ISI][Medline]
Elliott,D.J.S., Neale,E.J., Aziz,Q., Dunham,J.P., Munsey,T.S., Hunter,M. and Sivaprasadarao,A. (2004) EMBO J., 23, 47174726.
Engelman,D.M., Henderson,R., McLachlan,A.D. and Wallace,B.A. (1980) Proc. Natl Acad. Sci. USA, 77, 20232027.
Engelman,D.M., Steitz,T.A. and Goldman,A. (1986) Annu. Rev. Biophys. Biophys. Chem., 15, 321353.[CrossRef][ISI][Medline]
Engelman,D.M. et al. (2003) FEBS Lett., 555, 122125.[CrossRef][ISI][Medline]
Eyre,T.A., Partridge,L. and Thornton,J.M. (2004) Protein Eng. Des. Sel., 17, 613624.
Forrest,L.R., Tieleman,D.P. and Sansom,M.S.P. (1999) Biophys. J., 76, 18861896.
Gimpelev,M., Forrest,L.R., Murray,D. and Honig,B. (2004) Biophys. J., 87, 40754086.
Hessa,T., Kim,H., Bihlmaier,K., Lundin,C., Boekel,J., Andersson,H., Nilsson,I., White,S.H. and von Heijne,G. (2005a) Nature, 433, 377381.[CrossRef][ISI][Medline]
Hessa,T., White,S.H. and von Heijne,G. (2005b) Science, 307, 1427.
Hofmann,K. and Stoffel,W. (1993) Biol. Chem. Hoppe-Seyler, 374, 166.
Hunt,J.F., Earnest,T.N., Bousche,O., Kalghatgi,K., Reilly,K., Horvath,C., Rothschild,K.J. and Engelman,D.M. (1997a) Biochemistry, 36, 15156–15176.
Hunt,J.F., Rath,P., Rothschild,K.J. and Engelman,D.M. (1997b) Biochemistry, 36, 15177–15192.
Ikeda,M., Arai,M., Lao,D.M. and Shimizu,T. (2002) In Silico Biol., 2, 19–33.
Im,W. and Brooks,C.L. (2004) J. Mol. Biol., 337, 513–519.
Iverson,T.M., Luna-Chavez,C., Cecchini,G. and Rees,D.C. (1999) Science, 284, 1961–1966.
Jayasinghe,S., Hristova,K. and White,S.H. (2001) Protein Sci., 10, 455–458.
Jiang,Q.X., Wang,D.N. and MacKinnon,R. (2004) Nature, 430, 806–810.
Jiang,Y., Lee,A., Chen,J., Ruta,V., Cadene,M., Chait,B.T. and MacKinnon,R. (2003) Nature, 423, 33–41.
Jones,D.T. (1998) FEBS Lett., 423, 281–285.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) Biochemistry, 33, 3038–3049.
Juretic,D., Zoranic,L. and Zucic,D. (2002) J. Chem. Inf. Comput. Sci., 42, 620–632.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.
Katragadda,M., Alderfer,J.L. and Yeagle,P.L. (2001) Biophys. J., 81, 1029–1036.
Killian,J.A. and von Heijne,G. (2000) Trends Biochem. Sci., 25, 429–434.
Kim,S., Chamberlain,A.K. and Bowie,J.U. (2004) Proc. Natl Acad. Sci. USA, 101, 5988–5991.
Koebnik,R., Locher,K.P. and Van Gelder,P. (2000) Mol. Microbiol., 37, 239–253.
Kokubo,H. and Okamoto,Y. (2004) J. Chem. Phys., 120, 10837–10847.
Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L.L. (2001) J. Mol. Biol., 305, 567–580.
Kyte,J. and Doolittle,R.F. (1982) J. Mol. Biol., 157, 105–132.
Lancaster,C.R., Kroger,A., Auer,M. and Michel,H. (1999) Nature, 402, 377–385.
Landolt-Marticorena,C., Williams,K.A., Deber,C.M. and Reithmeier,R.A.F. (1993) J. Mol. Biol., 229, 602–608.
Lee,S.Y. and MacKinnon,R. (2004) Nature, 430, 232–235.
Li,S.C. and Deber,C.M. (1992) FEBS Lett., 311, 217–220.
Marti,T. (1998) J. Biol. Chem., 273, 9312–9322.
Möller,S., Croning,D.R. and Apweiler,R. (2001) Bioinformatics, 17, 646–653.
Nakai,K. and Kanehisa,M. (1992) Genomics, 14, 897–911.
Pellegrini-Calace,M., Carotti,A. and Jones,D.T. (2003) Proteins, 50, 537–545.
Persson,B. and Argos,P. (1997) J. Protein Chem., 16, 453–457.
Popot,J.L. and Engelman,D.M. (1990) Biochemistry, 29, 4031–4037.
Popot,J.L. and Engelman,D.M. (2000) Annu. Rev. Biochem., 69, 881–922.
Rost,B., Fariselli,P. and Casadio,R. (1996) Protein Sci., 5, 1704–1718.
Senes,A., Gerstein,M. and Engelman,D.M. (2000) J. Mol. Biol., 296, 921–936.
Starace,D.M. and Bezanilla,F. (2004) Nature, 427, 548–553.
Terstappen,G.C. and Reggiani,A. (2001) Trends Pharmacol. Sci., 22, 23–26.
Tusnády,G.E. and Simon,I. (1998) J. Mol. Biol., 283, 489–506.
Tusnády,G.E. and Simon,I. (2001) Bioinformatics, 17, 849–850.
Ubarretxena-Belandia,I. and Engelman,D.M. (2001) Curr. Opin. Struct. Biol., 11, 370–376.
Ulmschneider,M.B. and Sansom,M.S.P. (2001) Biochim. Biophys. Acta, 1512, 1–14.
Umigai,N., Sato,Y., Mizutani,A., Utsumi,T., Sakaguchi,M. and Uozumi,N. (2003) J. Biol. Chem., 278, 40373–40384.
van den Berg,B., Clemons,W.M., Collinson,I., Modis,Y., Hartmann,E., Harrison,S.C. and Rapoport,T.A. (2003) Nature, 427, 36–44.
Viklund,H. and Elofsson,A. (2004) Protein Sci., 13, 1908–1917.
von Heijne,G. (1992) J. Mol. Biol., 225, 487–494.
Wallin,E. and von Heijne,G. (1998) Protein Sci., 7, 1029–1038.
White,S.H. (2004) Protein Sci., 13, 1948–1949.
White,S.H. and Wimley,W.C. (1999) Annu. Rev. Biophys. Biomol. Struct., 28, 319–365.
Wilson,C.L., Hubbard,S.J. and Doig,A.J. (2002) Protein Eng., 15, 545–554.
Received April 25, 2005; accepted April 25, 2005.
Edited by Klaus Schulten