Transmembrane helix prediction: a comparative evaluation and analysis

Jonathan M. Cuthbertson, Declan A. Doyle1 and Mark S.P. Sansom2

Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK

1 Present address: Structural Genomics Consortium, University of Oxford, Botnar Research Centre, Oxford OX3 7LD, UK

2 To whom correspondence should be addressed. E-mail: mark@biop.ox.ac.uk


    Abstract
 
The prediction of transmembrane (TM) helices plays an important role in the study of membrane proteins, given the relatively small number (~0.5% of the PDB) of high-resolution structures for such proteins. We used two datasets (one redundant and one non-redundant) of high-resolution structures of membrane proteins to evaluate and analyse TM helix prediction. The redundant (non-redundant) dataset contains structures of 434 (268) TM helices, from 112 (73) polypeptide chains. Of the 434 helices in the redundant dataset, 20 may be classified as ‘half-TM’ as they are too short to span a lipid bilayer. We compared 13 TM helix prediction methods, evaluating each method using per-segment, per-residue and termini scores. Four methods consistently performed well: SPLIT4, TMHMM2, HMMTOP2 and TMAP. However, even the best methods were in error by, on average, about two turns of helix at the TM helix termini. The best and worst case predictions for individual proteins were analysed. In particular, the performance of the various methods and of a consensus prediction method was compared for a number of proteins (e.g. SecY, ClC, KvAP) containing half-TM helices. The difficulty of predicting half-TM helices suggests that current prediction methods successfully embody the two-stage model of membrane protein folding, but do not accommodate a third stage in which, e.g., short helices and re-entrant loops fold within a bundle of stable TM helices.

Keywords: channel/membrane protein/pore/prediction/secondary structure/transmembrane helix


    Introduction
 
Analysis of the genome sequences of several organisms suggests that ~25% of genes may encode integral membrane proteins (Jones, 1998; Wallin and von Heijne, 1998; Krogh et al., 2001). Membrane proteins play many roles in cells, including transmembrane transport, signalling and energy transduction. From a pharmacological perspective, ~50% of drug targets are membrane proteins (Terstappen and Reggiani, 2001). The number of high-resolution structures for membrane proteins is increasing (see http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html for a summary), but membrane protein structures still account for only ~0.5% (White, 2004) of the protein structures in the RCSB. Hence there is a continued need for development and evaluation of optimal methods for prediction of membrane protein structure.

Membrane proteins seem to exhibit a more restricted range of folds than their water-soluble counterparts, making them more amenable to structural predictions (Popot and Engelman, 2000; Ubarretxena-Belandia and Engelman, 2001). Only two overall fold classes have been observed for membrane proteins: the α-helix bundle and the β-barrel. In this paper, we are concerned solely with α-helical membrane proteins. β-Barrel membrane proteins are restricted to bacterial outer membrane proteins and their relatives (Koebnik et al., 2000). α-Helical membrane proteins contain one or more transmembrane (TM) helices which consist predominantly of hydrophobic amino acids (Engelman et al., 1980, 1986). These TM helices are generally independently stable in a membrane or membrane-like environment. This resulted in the formulation of the two-stage model of membrane protein folding (Popot and Engelman, 1990, 2000), which recently has been modified to include a third stage to allow for the incorporation of non-TM-helix elements (Engelman et al., 2003). Furthermore, structural studies on proteins involved in aiding the insertion of proteins into membranes (van den Berg et al., 2003) have refined our models of membrane protein folding in vivo. However, the requirement for TM helices to behave as independent folding units seems to hold and underlies methods for predicting the location of TM helices within membrane protein sequences.

At an early stage it was recognized that TM helices could be identified by examination of the amino acid sequences of membrane proteins (Deber et al., 1986; Engelman et al., 1986; Landolt-Marticorena et al., 1993), in particular by searching for ~20 amino acid stretches with a predominance of hydrophobic amino acids (Kyte and Doolittle, 1982). Subsequent improvements in our understanding of amino acid propensities in membrane environments (Li and Deber, 1992; Deber and Goto, 1996) and the results of structural bioinformatics analyses of membrane proteins have led to refinement of this simple rule. For example, the amphipathic aromatic amino acids Trp and Tyr prefer to be located at the interface between the membrane and solution (aromatic belts) (Landolt-Marticorena et al., 1993; Arkin and Brunger, 1998; Ulmschneider and Sansom, 2001). Arg and Lys have an increased abundance in the cytoplasmic loops of bacterial membrane proteins (von Heijne, 1992). By combining analysis of known membrane protein structures (Ulmschneider and Sansom, 2001) with experimental studies of membrane proteins and of model peptides (Killian and von Heijne, 2000), it has been possible to recognize such structural features that stabilize a TM α-helical conformation. Such observations are embedded in most TM helix prediction methods.

The early recognition of the importance of hydrophobic α-helices has resulted in a plethora of methods for prediction of TM helix topology based on analysis of membrane protein sequences (Möller et al., 2001). Early methods used simple hydrophobicity scales (Kyte and Doolittle, 1982; Eisenberg, 1984) to search for potential membrane-spanning helices. Refinements of such methods included consideration of additional TM helix sequence patterns such as the positive-inside rule (von Heijne, 1992). More recent methods use a variety of pattern matching methods (e.g. neural nets and hidden Markov models) to search for TM helices based on sequences of such elements in membrane proteins of known structure. There are currently >12 different programs available for predicting the location of TM helices within a membrane protein sequence. These include ALOM2 (Nakai and Kanehisa, 1992), TMPRED (Hofmann and Stoffel, 1993), DAS (Cserzo et al., 1997), HMMTOP2 (Tusnády and Simon, 1998, 2001), TMHMM2 (Krogh et al., 2001), PHD (Rost et al., 1996), TMAP (Persson and Argos, 1997), TOPPRED2 (Claros and von Heijne, 1994), MEMSAT (Jones et al., 1994), MPEX (White and Wimley, 1999), SPLIT4 (Juretic et al., 2002) and TM-FINDER (Deber et al., 2001).

Until recently, evaluations of TM helix prediction algorithms (Möller et al., 2001; Ikeda et al., 2002) used low-resolution datasets (e.g. experimentally confirmed TM topologies). More recently, comparisons with high-resolution structural datasets have been performed (Jayasinghe et al., 2001; Chen et al., 2002). In this paper we compare and evaluate different TM helix prediction methods with respect to an expanded dataset of crystal structures of integral membrane proteins. We also develop a consensus strategy and analyse ‘difficult’ predictions, indicating how prediction comparison/consensus may enhance our understanding of these proteins. We illustrate this via application to the much debated structure of a bacterial voltage-gated K channel, KvAP (Jiang et al., 2003).


    Methods
 
Datasets: membrane proteins of known 3D structure

Two datasets of TM helices in membrane proteins were used: one redundant and one non-redundant (see Table I and above). The non-redundant dataset excluded homologues of a given membrane protein from different organisms [with the exception of fumarate reductase which has a different topology in Escherichia coli from that found in Wolinella succinogenes (Iverson et al., 1999; Lancaster et al., 1999)]. Thus, proteins were removed if their sequences had an identity of ≥30% following pairwise sequence alignment using Blast2. For example, GlpF has 30% sequence identity with bovine Aqp1. In such cases, the protein with the larger number of TM helices was selected. If the number of TM helices was the same in the two proteins then the higher resolution structure was retained.
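The selection rule above can be sketched in code. This is a minimal illustration only, not the procedure actually used for this study: the `Chain` record, the `identity` lookup table and the greedy ordering are assumptions introduced here, and the fumarate reductase exception is not handled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one polypeptide chain."""
    name: str
    n_tm_helices: int
    resolution: float  # in Å; a smaller value means higher resolution


def non_redundant(chains: List[Chain],
                  identity: Dict[FrozenSet[str], float],
                  threshold: float = 30.0) -> List[Chain]:
    """Greedy filter: keep a chain unless it is >= threshold % identical
    to a chain that has already been kept.

    Chains are visited in order of preference (more TM helices first,
    then higher resolution), so the preferred member of each homologous
    pair is the one retained. Pairs missing from `identity` are assumed
    to be unrelated (0% identity).
    """
    ranked = sorted(chains, key=lambda c: (-c.n_tm_helices, c.resolution))
    kept: List[Chain] = []
    for chain in ranked:
        if all(identity.get(frozenset((chain.name, k.name)), 0.0) < threshold
               for k in kept):
            kept.append(chain)
    return kept
```

How clusters of more than two homologues were resolved is not stated in the text; the greedy ordering here is one simple choice.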


Table I. Membrane proteins used in this study

 
Defining the TM helices

TM helices within membrane protein structures were defined using DSSP (Kabsch and Sander, 1983). TM helices may be interrupted by a turn or kink in the helix, e.g. due to the presence of an intra-helical proline (Cordes et al., 2002). For example, a break is observed in TM4 of the Ca ATPase (pdb code 1EUL). If the break or kink was not extensive (i.e. not greater than about three residues as determined by visual inspection), then the two helical segments were considered to form a single TM helix. All TM helices were visually inspected to check the automatic TM helix assignment. Using this method we consider the full extent of each TM helix, including residues which may reside outside the (presumed) limits of the lipid bilayer. This approach was adopted because any attempt to define simply the bilayer spanning element of a TM helix is contingent upon the model used to assign the bilayer spanning element, given the absence of lipid molecules from the majority of crystals of membrane proteins.
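The merging of helical segments across short breaks can be illustrated as follows. This is a sketch under stated assumptions: the input is a per-residue DSSP state string in which 'H' denotes α-helix, the function name `helix_segments` and the `max_break` parameter are hypothetical, and the fixed three-residue threshold stands in for the visual inspection described above.

```python
import re
from typing import List, Tuple


def helix_segments(dssp: str, max_break: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) indices (0-based, inclusive) of helical segments.

    Runs of 'H' separated by a break of at most `max_break` non-helical
    residues are joined into a single segment.
    """
    runs = [(m.start(), m.end() - 1) for m in re.finditer(r"H+", dssp)]
    merged: List[Tuple[int, int]] = []
    for start, end in runs:
        if merged and start - merged[-1][1] - 1 <= max_break:
            # Short break or kink: extend the previous segment.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

A purely automatic rule of this kind would also join two genuinely distinct helices connected by a very short loop, which is one reason a visual check remains necessary.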

TM helix prediction

Locally running versions of the following TM prediction methods were installed: ALOM2 (Nakai and Kanehisa, 1992), DAS (Cserzo et al., 1997), HMMTOP2 (Tusnády and Simon, 1998, 2001), MEMSAT 1.5 (Jones et al., 1994), MEMSAT2 (Jones, 1998), MPEX (White and Wimley, 1999), PHD (Rost et al., 1996), SPLIT4 (Juretic et al., 2002), TMAP (Persson and Argos, 1997), TM-FINDER (Deber et al., 2001), TMHMM2 (Krogh et al., 2001), TMPRED (Hofmann and Stoffel, 1993) and TOPPRED2 (Claros and von Heijne, 1994).

Consensus predictions

A consensus prediction for each sequence was calculated using a simple majority-vote procedure. If ≥9 methods predict a residue to be in a TM helix, it is assigned an ‘H’ in the consensus. If between 6 and 8 methods (i.e. a majority) predict a TM residue, then an ‘h’ is assigned. If fewer than 6 (but more than 1) methods predict a TM helix, then a ‘?’ is assigned.
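The voting thresholds can be written out as a short function. The following is a minimal sketch rather than the consensus script used in this study: it assumes that each method's output has been reduced to a string of equal length in which 'H' marks a predicted TM residue, and the '-' symbol for residues with at most one vote is an assumption, since the text does not name one.

```python
from typing import Sequence


def consensus(predictions: Sequence[str], strong: int = 9, weak: int = 6) -> str:
    """Combine per-residue TM predictions into a consensus string.

    'H': at least `strong` methods predict a TM residue
    'h': between `weak` and `strong` - 1 methods
    '?': more than one but fewer than `weak` methods
    '-': zero or one method (symbol assumed here)
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")
    out = []
    for column in zip(*predictions):
        votes = sum(1 for state in column if state == "H")
        if votes >= strong:
            out.append("H")
        elif votes >= weak:
            out.append("h")
        elif votes > 1:
            out.append("?")
        else:
            out.append("-")
    return "".join(out)
```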

Prediction assessment

Prediction accuracy was initially determined using the method described by Tusnády and Simon (1998). The number of observed TM helices (NOBS) in the structures was counted, as was the number of predicted TM helices (NP). We also counted the total number of correct predictions (NC). A prediction is considered correct when there is an overlap of at least three amino acids between a predicted and a known TM segment. If NC was higher than NP then it was set equal to NP. The total number of protein chains in the datasets was counted (NTOT), as was the number of chains for which both the number and location of all TM segments were predicted correctly (NTM), i.e. where NOBS = NP = NC. The efficiency of the TM helix prediction was measured using the ratios M = NC/NOBS and C = NC/NP. The per-segment prediction power was then given by the geometric mean QP = $100\,(M \cdot C)^{1/2}$. Varying the overlap value from 1 to 9 has been shown to have only a small effect on the prediction success of MPEX (Jayasinghe et al., 2001). Nonetheless, prediction accuracy was additionally measured varying the overlap from 1 to 20 residues.
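The per-segment score can be sketched as follows. This is an illustrative reading of the definitions above, not the original scoring code: helices are represented as inclusive (start, end) residue numbers, NC is counted as the number of observed helices matched by at least one predicted helix, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two segments (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_correct(observed: List[Segment], predicted: List[Segment],
                  min_overlap: int = 3) -> int:
    """NC: observed helices overlapped by a prediction, capped at NP."""
    n_c = sum(
        1 for obs in observed
        if any(overlap(obs, pred) >= min_overlap for pred in predicted)
    )
    return min(n_c, len(predicted))


def per_segment_score(observed: List[Segment], predicted: List[Segment],
                      min_overlap: int = 3) -> float:
    """QP = 100 * sqrt(M * C), with M = NC/NOBS and C = NC/NP."""
    n_obs, n_pred = len(observed), len(predicted)
    if n_obs == 0 or n_pred == 0:
        return 0.0
    n_c = count_correct(observed, predicted, min_overlap)
    return 100.0 * sqrt((n_c / n_obs) * (n_c / n_pred))
```

In the tables the counts are pooled over all chains in a dataset before M and C are formed; the function above can be applied to pooled lists provided residue numbers from different chains are kept distinct.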

A score was also calculated to give a per-residue accuracy for each method, Q3 = $N^{-1}\sum_i P_i$, where N is the total number of observed residues and $P_i$ the total number of residues correctly assigned to class i; the two classes being transmembrane (TM) and non-transmembrane (NTM).
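A small sketch of the per-residue calculation is given below. It assumes that the observed and predicted assignments are strings of equal length with 'H' for TM residues, expresses the scores as percentages, and takes the per-class scores to be the fraction of observed residues of each class that are correctly predicted; these conventions and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional


def per_residue_scores(observed: str, predicted: str) -> Dict[str, Optional[float]]:
    """Return Q3, Q3TM and Q3NTM (in %) for one pair of assignments."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("assignments must be non-empty and of equal length")
    n = len(observed)
    tm_total = observed.count("H")
    ntm_total = n - tm_total
    tm_correct = sum(1 for o, p in zip(observed, predicted)
                     if o == "H" and p == "H")
    ntm_correct = sum(1 for o, p in zip(observed, predicted)
                      if o != "H" and p != "H")
    return {
        "Q3": 100.0 * (tm_correct + ntm_correct) / n,
        # Per-class scores are undefined if the class is absent.
        "Q3TM": 100.0 * tm_correct / tm_total if tm_total else None,
        "Q3NTM": 100.0 * ntm_correct / ntm_total if ntm_total else None,
    }
```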

Scoring the ends

To test the accuracy of the various methods in predicting the location of the ends of the TM helices, we used the scoring method summarized in Figure 1. The method can only be applied to helices whose overall location has been correctly predicted. The accuracy of the prediction algorithms was considered at both the N- and C-termini of each helix. If the prediction was an exact match then the method scored zero. The total score is normalized for the number of correctly assigned helices and provides a measure of the accuracy of predicting the ends of the TM helices: the smaller the score, the more accurate the method.



Fig. 1. Definition of the method used for scoring the prediction of TM helix start and end points. For the N terminus, the first helical residue is defined as i = 0, under-prediction as i < 0 and over-prediction as i > 0; for the C terminus, the last helical residue is defined as j = 0, under-prediction as j < 0 and over-prediction as j > 0.
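The termini scoring can be sketched in code. This is an interpretation rather than the original implementation: it assumes that each terminus contributes the absolute value of its offset (i or j in Figure 1), and that the N, C and overall scores are averages over the correctly located helices. The function names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers


def termini_offsets(observed: Segment, predicted: Segment) -> Tuple[int, int]:
    """Signed offsets (i, j); negative values indicate under-prediction.

    i < 0: the predicted helix starts after the observed start.
    j < 0: the predicted helix ends before the observed end.
    """
    i = observed[0] - predicted[0]
    j = predicted[1] - observed[1]
    return i, j


def termini_scores(pairs: List[Tuple[Segment, Segment]]) -> Tuple[float, float, float]:
    """Return (N score, C score, overall score) for (observed, predicted) pairs."""
    if not pairs:
        raise ValueError("no correctly located helices to score")
    offsets = [termini_offsets(obs, pred) for obs, pred in pairs]
    n_score = sum(abs(i) for i, _ in offsets) / len(offsets)
    c_score = sum(abs(j) for _, j in offsets) / len(offsets)
    return n_score, c_score, n_score + c_score
```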

 
E.coli membrane proteome

The E.coli proteome was taken from the Integr8 database (http://www.ebi.ac.uk/integr8/EBI-Integr8-HomePage.do). Each protein sequence was submitted to all of the TM prediction methods (using the consensus prediction script) and also to signal peptide (SP) prediction using both SignalP-NN and SignalP-HMM (http://www.cbs.dtu.dk/services/SignalP/; Bendtsen et al., 2004). An SP was only considered to be present if both SignalP methods predicted an SP. If an SP was predicted and the consensus TM prediction indicated that the first TM helix was present in the first 70 amino acids, then this first TM helix was removed from the consensus prediction. Although it is possible that a few TM helices are removed due to false-positive prediction of SPs, the subsequent consensus prediction should provide at least a lower limit for the percentage of α-helical membrane proteins present in the proteome.
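The signal peptide filter amounts to a simple rule, sketched below. The function name and arguments are hypothetical, the two SignalP results are represented as booleans, and "present in the first 70 amino acids" is interpreted here as the helix starting at or before residue 70, which is an assumption.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers, 1-based


def remove_signal_peptide_helix(helices: List[Segment],
                                signalp_nn: bool,
                                signalp_hmm: bool,
                                window: int = 70) -> List[Segment]:
    """Drop the first consensus TM helix if it is likely a signal peptide.

    `helices` must be sorted by start position. The first helix is removed
    only if both SignalP methods predict a signal peptide and the helix
    starts within the first `window` residues.
    """
    if signalp_nn and signalp_hmm and helices and helices[0][0] <= window:
        return list(helices[1:])
    return list(helices)
```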


    Results
 
Datasets: membrane proteins of known structure

The prediction methods were assessed against two datasets of membrane proteins of known 3D structure (Table I). The redundant dataset contains all α-helical membrane proteins of known structure solved to 3.5 Å resolution or better up to a cut-off date of August 2004. This dataset contains 434 TM helices, from 112 polypeptide chains. A non-redundant subset contains 268 TM helices and 73 protein chains. The non-redundant set was created by checking for any proteins with a sequence identity of ≥30%. Where two proteins had a sequence identity of ≥30%, the protein with the larger number of TM helices was selected. If the total number of TM helices was the same in both structures, then the structure with the higher resolution was chosen. Redundancy has been suggested to bias estimations of prediction accuracy (Chen et al., 2002). By using both a redundant and a non-redundant dataset, it is possible to evaluate such bias.

The redundant dataset does not contain any membrane protein that consists solely of a single TM helix. However, a histogram (Figure 2A) of the distribution of numbers of TM helices per polypeptide chain for the 414 TM helices (excluding half-helices) in the redundant dataset reveals a high frequency of polypeptide chains containing a single membrane-spanning segment. This agrees with several analyses of predicted TM proteins (e.g. Arkin et al., 1997), which suggest that the majority of membrane-protein genes encode a chain with just one TM helix. For example, in Figure 2B we show the result of applying a consensus TM helix prediction (see below) to all the predicted membrane proteins of the E.coli genome. Interestingly, although the overall distribution is similar for the membrane protein crystal structures (Figure 2A) and for the predicted membrane proteome (Figure 2B), there is a suggestion of some bias in favour of simpler (i.e. smaller number of TM helices per polypeptide chain) membrane proteins in the structural dataset. This may be due, in part, to a misclassification of a few signal peptides as TM helices, although as described in the Methods section we used SignalP to exclude signal peptides.



Fig. 2. Helix number distributions. (A) Distribution of number of TM helices per polypeptide chain, derived from the 414 TM helices in the redundant dataset (excluding all half-TM helices). (B) Number of TM helices per polypeptide chain, derived from a consensus prediction of TM helices for the E.coli proteome.

 
The distributions of helix lengths in the two datasets were examined (Figure 3). There is no significant difference in the distribution between the redundant and non-redundant datasets (mean ± standard deviation of 25.2 ± 6.0 and 25.3 ± 5.9 residues, respectively). These values are comparable to those from an early analysis performed on a small dataset of 45 TM helices (Bowie, 1997), which yielded an average helix length of 26.4 residues, suggesting that the helix length distribution is by now stable to expansion of the dataset of structures.



Fig. 3. Helix length distributions, for (A) the redundant dataset and (B) the non-redundant dataset. The half-TM component of each distribution is indicated by a vertical arrow.

 
Prediction methods typically search for TM helices 17–25 residues in length. Of the 434 TM helices in the redundant dataset, only 200 (46%) of the helices fall within this range and 208 (48%) of the helices exceed 25 residues in length (Figure 3A). [For the non-redundant set, 46% of the helices fall within this range and 49% are longer than 25 residues (Figure 3B).] Several membrane proteins contain TM helices that do not span the bilayer, for example the pore (P) helix of the potassium channel KcsA (1K4C) and the NPA-containing loops of the aquaporins. These ‘half-TMs’ are shorter in length than conventional TM helices and are expected to be more difficult to predict. The distributions of TM helices given in Figure 3 reveal a small but significant population of half-TMs to be present in both datasets. We will return to half-TMs and prediction difficulties in the discussion below.

Prediction of the number and location of TM α-helices

The various prediction methods were analysed for their ability to predict both the number and location of all TM helices (including half-TMs; Tables II and III). The per-segment values given in the tables were calculated using a requirement of at least a three residue overlap between an observed and predicted helix for a correct prediction (see below for analysis of the effect of overlap).


Table II. Prediction of the number and location of TM helices: redundant dataset

 

Table III. Prediction of the number and location of TM helices: non-redundant dataset

 
On a per-segment basis, most methods score highly, with values for the prediction power (QP) of approximately 90% or higher, except for ALOM2, which scores 80% (non-redundant dataset) to 81% (redundant dataset). SPLIT4 and HMMTOP2 rank highest by this measure; SPLIT4 clearly performs well, although not at the 99% per-segment accuracy reported for a smaller (178 TM helices) non-redundant dataset (Juretic et al., 2002).

The per-residue accuracies reveal the percentage of residues correctly predicted as either TM or non-TM, with the Q3 score providing an overall per-residue accuracy. As with the per-segment assignment, ALOM2 performs least well (Q3 = 74%) and SPLIT4 and TMHMM2 perform the best (Q3 = 85 and 83%, respectively). These conclusions do not change whether one employs the redundant or the non-redundant dataset.

Many membrane proteins have large extramembranous domains. Consequently, there are more non-TM residues (~57%) than TM residues (~43%) in both datasets. Hence the accuracy for the non-TM region will tend to dominate the per-residue accuracy. For this reason, we also report the scores for the TM and non-TM classes separately. For all methods the score for the non-TM class (Q3NTM) is higher than that for its TM counterpart (Q3TM). For Q3TM, SPLIT4 and TMAP are the strongest methods with scores of 78 and 76%, respectively, for the non-redundant dataset.

A revealing statistic when analysing the prediction accuracy of a particular method is to consider how often the method accurately predicts both the number and location of TM helices in an entire protein chain, which may correspond more closely to what is required by an experimentalist when running a prediction. For example, it may be misleading to suggest that a prediction method is ~90% accurate if it does not succeed in predicting the location of all TM helices in 90% of protein chains tested. With a requirement of a three residue overlap between a predicted and an observed helix for a correct prediction, the methods with the highest NTM values for both datasets are SPLIT4 and HMMTOP2, corresponding to prediction accuracies (by this measure) of 84% for SPLIT4 and 80% for HMMTOP2 (for the non-redundant dataset; the corresponding values for the redundant dataset are 79 and 76%, respectively).
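The whole-chain measure can be sketched as follows. This is an illustrative reading of the condition NOBS = NP = NC and not the original code: in addition to equal counts it requires every observed helix to overlap some predicted helix and vice versa, a slightly stricter check adopted here as an assumption so that a spurious prediction cannot be masked by one predicted helix matching two observed helices.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers
Chain = Tuple[List[Segment], List[Segment]]  # (observed, predicted)


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two segments (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def chain_fully_correct(observed: List[Segment], predicted: List[Segment],
                        min_overlap: int = 3) -> bool:
    """True if the number and location of all TM helices are predicted."""
    if len(observed) != len(predicted):
        return False
    every_observed_found = all(
        any(overlap(o, p) >= min_overlap for p in predicted) for o in observed
    )
    every_prediction_real = all(
        any(overlap(o, p) >= min_overlap for o in observed) for p in predicted
    )
    return every_observed_found and every_prediction_real


def percent_chains_correct(chains: List[Chain], min_overlap: int = 3) -> float:
    """100 * NTM / NTOT for a list of (observed, predicted) chains."""
    n_ok = sum(1 for obs, pred in chains
               if chain_fully_correct(obs, pred, min_overlap))
    return 100.0 * n_ok / len(chains)
```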

In summary, as might be expected, there is some dependence on the measure used as to which TM prediction methods are ‘best’. On balance, SPLIT4, TMHMM2, HMMTOP2 and TMAP all score highly, with SPLIT4 performing well across several measures.

Analysis of predicted TM helix lengths across all proteins in both of the datasets reveals that the majority of methods predict most TM helices to be 17–25 residues in length. In contrast, ~48% of TM helices in known structures are >25 residues in length (see above). As a consequence, all methods (except TMAP) have an average predicted helix length less than the average length of ~25 residues calculated from the known structures.

The discrepancies between predicted and observed helix lengths can be visualized more easily from histograms of the TM helix length distributions (Figure 4). SPLIT4 (which performs well in the scoring procedures discussed above) yields a distribution that best overlays the experimental distribution of helix lengths. TM-FINDER, MEMSAT and TMPRED also perform reasonably well using such an ‘overlap’ as a criterion for prediction accuracy. Several of the methods favour a particular helix length. For example, MPEX has a peak at 19 residues corresponding to 67% of all predictions. TMAP yields peaks at 21 and 29 residues, PHD has a peak at 18 residues and TMHMM2 favours helices containing 23 residues.



Fig. 4. Comparison of TM helix length distributions for the X-ray data (red, redundant dataset) and predictions (blue). Distributions are shown for each of the 13 prediction methods compared in this study. A colour version of this figure is available as Supplementary material at PEDS Online.

 
Prediction of TM helix start and end points

Accurate TM helix prediction depends on identifying not only the number of transmembrane helices correctly, but also their start and end residues. This is of particular importance if TM prediction is to be used as a preliminary to three-dimensional modelling of a membrane protein. Therefore, termini scores were calculated for each prediction method (Tables IV and V), providing measures of prediction of the start and end positions of the TM helices (see Methods and Figure 1 for details). Note that the lower the value of a termini score, the better the prediction. For both redundant and non-redundant datasets the results follow the same trend. The three methods that perform the best are TMHMM2, MEMSAT2 and SPLIT4 with overall termini scores of 8.0, 8.3 and 8.2, respectively, for the redundant dataset and scores of 8.4, 8.4 and 8.5 for the non-redundant dataset. These values directly translate to the average number of residues the methods predict incorrectly for a given helix. Hence on average these (best) three methods are inaccurate by just over two turns of α-helix.


Table IV. Success rate at predicting the termini of TM helices: redundant dataset

 

Table V. Success rate at predicting the termini of TM helices: non-redundant dataset

 
The termini scores can be broken down into the N score and C score. The N and C scores for TMHMM2 are 4.4 and 4.0, respectively, for the non-redundant dataset, suggesting that the method misses, on average, one turn of the helix at each end (see Figure 5). It is interesting that with the exception of SPLIT4 and TMAP, which have approximately the same N and C scores, all the methods have higher N scores than C scores (where a higher score corresponds to a poorer prediction).



Fig. 5. Termini score distributions. Examples of termini score distributions are shown for TMHMM2 (using the redundant dataset). (A) The C score distribution and (B) the N score distribution.

 
The majority of methods also have a higher percentage of correct predictions at the C-terminus (C correct). This suggests that most of the methods find the C-terminus position more easily than the N-terminus. TMHMM2 is the strongest method at locating the termini. If a histogram is plotted of all the values that contribute to the overall C score for TMHMM2 (Figure 5A), then a peak is seen at a value of zero. Hence, averaging across the entire dataset, the correct C-terminus position is predicted more frequently than any other location. The peak for a similar plot of the values at the N-terminus (Figure 5B) is offset to a value of –4. A negative value corresponds to under-prediction, suggesting that the method tends on average to predict the N-terminus approximately one turn of the helix after the start of the observed helix.

The analysis of under- and over-prediction (Tables IV and V) reveals that nearly all of the methods under-predict at the N-terminus. This confirms the observation above that the methods have a tendency to predict the N-terminus after the actual start of the helix. There is also evidence for under-prediction at the C-terminus. For all methods except SPLIT4, under-prediction at the C-terminus is more common than over-prediction.

There are several possible explanations for the dominance of under-prediction. The most likely explanation is that the methods can predict the core hydrophobic region of TM helices but tend to miss the more polar extensions at either end of the helix. This was noted previously for long helices such as in chain D (residues 198–231) of the cytochrome bc1 complex (pdb code 1BGY) (Chen et al., 2002).

All the prediction methods clearly have considerable difficulty in predicting the precise location of the termini. In fairness, we should note that prediction methods have not been designed to predict helix ends accurately. Even the best methods only predict the correct position 9–11% of the time. Similar work on secondary structure prediction of globular proteins has revealed that fewer than half of JPRED (44%) and PHD (40%) predictions correctly identify the N-terminus (Wilson et al., 2002) and that under-prediction at the N-terminus occurs for globular proteins as well as membrane proteins.

Sensitivity analysis

In the discussion above, a helix is assigned as being correctly predicted if there is an overlap of at least three residues between the predicted helix and the observed helix. However, it is important to analyse the sensitivity of the prediction scores to the residue overlap used. We therefore analysed the effect of the residue overlap on the per-segment accuracy (QP; see Figure 6). In particular, we wished to examine the extent to which the ranking of prediction methods is sensitive to the overlap definition. From the per-segment accuracy as a function of overlap for all of the methods, varying the overlap value between one and 10 residues has little effect on the ranking of the methods. Hence our conclusions as to the ‘best’ methods are relatively robust to this assumption. The rankings only start to change at high (i.e. >10) overlaps.
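The sensitivity analysis can be reproduced in outline by recomputing the per-segment score over a range of overlap values. The sketch below is self-contained and illustrative only: it assumes helices are given as inclusive (start, end) residue numbers, counts NC as the number of observed helices matched by at least one prediction (capped at NP), and uses hypothetical function names.

```python
from math import sqrt
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue numbers


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two segments (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def per_segment_score(observed: List[Segment], predicted: List[Segment],
                      min_overlap: int) -> float:
    """QP = 100 * sqrt(M * C) for a given minimum overlap."""
    if not observed or not predicted:
        return 0.0
    n_c = sum(
        1 for obs in observed
        if any(overlap(obs, pred) >= min_overlap for pred in predicted)
    )
    n_c = min(n_c, len(predicted))
    return 100.0 * sqrt((n_c / len(observed)) * (n_c / len(predicted)))


def overlap_sensitivity(observed: List[Segment], predicted: List[Segment],
                        overlaps: Iterable[int] = range(1, 21)) -> Dict[int, float]:
    """Map each minimum overlap (default 1 to 20 residues) to QP."""
    return {k: per_segment_score(observed, predicted, k) for k in overlaps}
```

Running this for each method and comparing the resulting curves is one way to check whether a ranking depends on the chosen overlap.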



Fig. 6. Comparative analysis of the per segment accuracy for all 13 methods [(A) ALOM2 to PHD; (B) SPLIT4 to TOPPRED2] as a function of residue overlap for the non-redundant dataset (including half-TM helices). A colour version of this figure is available as Supplementary material at PEDS Online.

 
Increasing the overlap causes a decline in the overall termini score for all methods (Figure 7), i.e. yielding an increase in the accuracy of termini prediction. This is expected since the better the overlap between an observed and predicted helix the more likely it is that the predicted termini will be close to their observed positions. For the more accurate methods, increasing the overlap causes only a small decline in the termini score. The three methods which perform the best for the default overlap value of three (TMHMM2, SPLIT4 and MEMSAT2) continue to have the lowest termini scores across the entire range of overlap values. It is worth noting that while termini prediction accuracy improves as the overlap for a correct prediction is increased, fewer helices are predicted correctly, as confirmed by the per-segment analysis, and so fewer termini are being considered. For example, the non-redundant dataset contains 268 observed helices. For TMHMM2 the total number of helices correctly predicted at an overlap of three residues is 239 (termini score 8.4), compared with 150 for an overlap of 20 residues (termini score 6.5).



Fig. 7. Comparative analysis of the termini scores for all 13 methods as a function of residue overlap for the non-redundant dataset (including half-TM helices). A colour version of this figure is available as Supplementary material at PEDS Online.

 
Best and worst case examples

In order to understand better the secondary structure prediction of membrane proteins, it is useful to compare those proteins for which the methods work well with those for which the methods perform poorly. Therefore, in addition to comparison between methods, we also performed a comparison between proteins. The proteins were ranked in terms of how well their TM helices were predicted. Such a ranking enables us to consider the best and worst case predictions, thus permitting a more detailed analysis of which structural features may ‘confuse’ TM helix prediction.

For example, if we rank the protein chains on the basis of their termini score for prediction using SPLIT4, then (for the redundant dataset) the mean termini score is 7.9, with a standard deviation of 3.7. A number of single polypeptide chain membrane proteins score significantly worse than average (defined as a termini score of >11.6, i.e. more than one standard deviation above the mean), including the chloride channel/transporter ClC (1OTS), the sugar transporter LacY (1PV6), the drug transporter AcrB (1IWG) and the Ca ATPase (1EUL).

In analysing some of the best and worst case predictions, we made use of a consensus prediction (see Methods for details). The consensus method performs well and is on a par with the best method for each of the various scoring functions. For the redundant dataset the consensus method yielded a per-segment QP score of 93% (close to the best individual score of 94% for SPLIT4), a per-residue Q3 score of 83% (compared with the best individual score of 83% for TMHMM2) and a termini score of 8.6 (compared with 8.4 for TMHMM2 and MEMSAT2).

In Figure 8, we illustrate a ‘best’ case TM prediction—that of TM1 from bovine rhodopsin (1L9H). Here the agreement between the various methods is good and the consensus prediction corresponds well with the experimental helix location, missing only the first three residues and the final residue. This helix is very hydrophobic (although it does include a proline close to the centre) with an N-terminal tryptophan and two lysines immediately after the helix. At 31 residues long it is (just) within one standard deviation of the mean TM helix length. Hence it presents a clear TM helix signature in its sequence.



Fig. 8. Comparison of prediction methods for TM helix 1 from bovine rhodopsin. The sequence is given at the top. The grey box indicates the extent of the helix in the X-ray structure (pdb code 1L9H). For each method, the predicted TM helix residues are indicated by stars. The consensus prediction is given at the bottom. A colour version of this figure is available as Supplementary material at PEDS Online.

 
Now let us examine a more complicated example, taken from bacteriorhodopsin (BR). Studies of synthetic peptides corresponding to each of the seven TM helices of BR have shown that peptides TM1 to TM5 are capable of forming independently stable helices when inserted into lipid bilayers (Hunt et al., 1997a,b; Marti, 1998; Booth, 2000). In contrast, a TM7 peptide does not form a stable α-helix in detergent micelles (Hunt et al., 1997a) and is insoluble in DMSO or in chloroform/methanol solutions (Katragadda et al., 2001). In general, C-terminal fragments of BR misfold or only partially fold in the absence of the rest of the protein (Booth, 2000). Interestingly, the consensus method reveals that TM6 and TM7 are less clearly predicted than the first five helices (Figure 9). This suggests that consensus prediction may be able to identify regions in membrane proteins which do not give a clear TM helix signal and which may behave anomalously as autonomous TM helix folding units.



Fig. 9. Consensus prediction applied to TM helices 6 and 7 (highlighted in grey) from bacteriorhodopsin. As discussed in the text, this C-terminal region is less well predicted than the previous five TM helices (not shown). The positions of the helices in the crystal structure (pdb code 1C3W) as defined by DSSP (Kabsch and Sander, 1983) are given by the blue letter ‘H’. The consensus colour scheme is the same as in Figure 8. A colour version of this figure is available as Supplementary material at PEDS Online.

 
The structure of the ClC protein has been solved for both E. coli and Salmonella typhimurium (Dutzler et al., 2002). It is a prokaryotic homologue of eukaryotic ClC Cl− channels, which functions in E. coli as an H+/Cl− exchanger (Accardi and Miller, 2004). The ClC protein is homodimeric, with 16 TM helices per monomer. None of the prediction methods used in this study successfully predicts the number of TM helices. The total number of predicted TM helices ranges from eight to 11. Several of the TM helices are highly tilted and they vary considerably in length (from seven to 38 residues). In Figure 10 we show one ClC monomer with the helix colour based on prediction results (red = strongly predicted, for TMs 1, 2, 4, 8, 9, 10, 13 and 16; yellow = two adjacent TMs predicted as a single helix, for TMs 6 and 7, 11 and 12 and 14 and 15; and green = not predicted). The strongly predicted TMs fully span the membrane and are found on the outer (i.e. membrane-facing) surface of the protein. The yellow and green helices are found inside the outer helix bundle. This may imply [according to the three-state folding model (Engelman et al., 2003)] that these inner helices fold after the outer bundle of strongly predicted (and independently stable) TM helices has already formed.
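The three categories used above (predicted, merged into a single helix, not predicted) can be assigned mechanically once observed and predicted segments are available. The following is a simplified sketch rather than the procedure used to colour Figure 10: it works from a single set of predicted segments instead of the consensus, reuses the default overlap of three residues, and uses made-up coordinates that are not those of ClC.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # (first residue, last residue), inclusive


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify(observed: Dict[str, Segment], predicted: List[Segment],
             min_overlap: int = 3) -> Dict[str, str]:
    """Label each observed helix as 'predicted', 'merged' or 'missed'.

    'merged' means a single predicted segment overlaps two or more
    observed helices by at least min_overlap residues each.
    """
    labels = {name: "missed" for name in observed}
    for pred in predicted:
        hits = [name for name, obs in observed.items()
                if overlap(obs, pred) >= min_overlap]
        for name in hits:
            if len(hits) > 1:
                labels[name] = "merged"
            elif labels[name] == "missed":
                labels[name] = "predicted"
    return labels


# Hypothetical helices: TM2 and TM3 are short and adjacent, TM4 is a half-TM.
observed = {"TM1": (5, 25), "TM2": (40, 52), "TM3": (55, 68), "TM4": (90, 97)}
predicted = [(6, 24), (42, 66)]

print(classify(observed, predicted))
```

In this toy case TM1 is labelled 'predicted', TM2 and TM3 are 'merged' because one long predicted segment covers both, and the short TM4 is 'missed'.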



Fig. 10. Consensus prediction as applied to the ClC chloride channel/transporter (1KPK). Helices coloured in red are strongly predicted. Helices coloured in yellow are predicted as a single helix rather than two separate helices. Helices coloured in green are not predicted by any of the methods. Helices coloured in grey are non-transmembrane. (A) View perpendicular to the bilayer normal, in which dashed lines represent the approximate location of the bilayer. (B) View down the bilayer normal, i.e. rotated 90° relative to (A). Note that the strongly predicted TM helices (red) form the exterior of the TM helix bundle. A colour version of this figure is available as Supplementary material at PEDS Online.

 
Interestingly, a similar situation is found with the aquaporins (Aqp1, GlpF and AqpZ, see Table I). In the aquaporin architecture there are six ‘classical’ TM helices plus two half-TMs that line up end-to-end (in an anti-parallel orientation). Again, none of the methods predicts correctly the half-TMs—instead they miss one or both of the half-TMs or overlap them with another TM helix. Inspection of the Aqp fold reveals that the two half-TMs are somewhat shielded from the surrounding lipid environment.

Another pore-forming protein with a short helix in the interior of the TM helix bundle is the protein-conducting SecY complex. The α-subunit, which is the pore-forming subunit, is formed by 10 TM helices, which surround a central pore. In the conformation captured in the crystal structure the pore is ‘plugged’ by a short (four-residue) α-helix, referred to as TM2a (see Figure 11). If one examines the consensus prediction for the SecY α-subunit, it can be seen that whilst there are typically accurate predictions for TM1 and TM2, the short plug helix TM2a is missed by all of the methods, with the exception of TMFINDER, which predicts it as an N-terminal extension of the TM2 helix. This is consistent with the three-state model of membrane protein folding (i.e. TM2a folds into the cavity formed by the other TM helices) and also with the proposal that the plug helix may leave the pore during the cycle of conformational changes associated with transport through the pore (van den Berg et al., 2003).



Fig. 11. (A) Section from the consensus prediction for the α-subunit of the SecY complex (pdb code 1RHZ) illustrating the failure to predict the short ‘plug’ helix TM2a. (B, C) Two perpendicular views of the SecY α-subunit, showing the TM1 and TM2 transmembrane helices (blue ribbons) and the short ‘plug’ helix (red ribbon). A colour version of this figure is available as Supplementary material at PEDS Online.

 
Application to KvAP

We will now consider a membrane protein, the structure of which has generated some debate as to its conformation when present within a membrane and which we therefore omitted from the dataset used to analyse the prediction methods. This is KvAP, a bacterial homologue of eukaryotic voltage-gated K+ channels (Cuello et al., 2004; Elliott et al., 2004; Jiang et al., 2004; Lee and MacKinnon, 2004; Starace and Bezanilla, 2004). Prior to the structure determination, topological data and homology with the simpler K+ channel KcsA had suggested a structure with six TM helices (S1 to S6) per monomer, with the pore-forming half-TM (the P helix) located between S5 and S6. The S4 helix contained multiple basic residues and was proposed to act as the voltage sensor, undergoing a change in orientation when the voltage across a cell membrane changed. The other TM helices (i.e. S1 to S3 and S5) were proposed to form a cavity within which the more polar S4 helix was accommodated. The crystal structure of KvAP was therefore a surprise when it revealed that the S4 helix was located in a membrane surface location, along with the C-terminal half of S3 (S3b; see Figure 12). The S3b–S4 helix hairpin is suggested to form a mobile element which changes orientation (and possibly conformation) in response to changes in transmembrane voltage.



Fig. 12. Section from the consensus prediction for KvAP (pdb code 1ORQ) illustrating that the S3a, S4 (gating) and P (pore) helices are less well predicted than the other helices. The colour scheme is the same as used in Figures 8, 9 and 11. A colour version of this figure is available as Supplementary material at PEDS Online.

 
When consensus prediction is applied to the KvAP protein sequence, the results are of interest (Figure 12). TM helices S1, S2, S5 and S6 are relatively strongly predicted. However, prediction of the S4 helix is very weak, with the consensus prediction method assigning a string of ‘?’s to this region. Nevertheless, it is intriguing that several of the individual prediction methods do predict a TM helix for S4. Recent experimental studies (Hessa et al., 2005b) indicate that the isolated S4 helix can insert into endoplasmic reticulum membranes, but that S4 is near the threshold for bilayer insertion. S3 is also complex: S3a is not predicted as a TM helix but S3b is. Hence the region which is thought to form the mobile voltage sensor (S3 and S4) has a complex TM prediction ‘signal’. Turning to the pore domain (which is homologous to that in KcsA, see Table I), the P-helix (a half-TM) is only weakly predicted, as was noted above for the half-TMs of ClC and Aqp and for the short ‘plug’ helix (TM2a) of the SecY complex.


    Discussion
 
This study adds to a number of recent comparisons (Jayasinghe et al., 2001; Chen et al., 2002) of TM helix prediction methods against an expanding dataset of X-ray structures of α-helical integral membrane proteins. (Of course, it remains possible that membrane proteins that crystallize most readily may present an atypical distribution of TM helices, but until further data are available it remains difficult to test this.) As might be expected, the exact ranking of the different prediction methods depends on the metric employed to compare them. However, a number of methods consistently perform better than others. In particular, SPLIT4, TMHMM2 and HMMTOP2 perform well according to a number of criteria. If we examine the basis of these three methods, we may be able to understand better the basis of their performance. As discussed by, e.g., Viklund and Elofsson (2004), the hidden Markov models underlying the latter two methods reflect the physical and biological processes underlying membrane protein stability and insertion. SPLIT4 uses a combination of hydrophobicity analysis (based on more than one hydrophobicity scale) and identification of cytoplasmic charge clusters. Taken together, this suggests that optimal prediction is obtained by the method that best reflects the biological and physical principles governing membrane protein architecture.

We also considered the performance of TM prediction methods from the perspective of the identification of exact start and end residues for TM helices. This is of considerable importance if TM predictions are to be used as a component of attempts to model the three-dimensional folds of membrane proteins. Again SPLIT4 performs well, but even the three best methods are inaccurate by just over two turns of α-helix in terms of predicting the termini. Therefore, other approaches may be required to refine sequence-based predictions of TM helices. One approach may be via molecular simulations of model TM helices in a bilayer environment (Forrest et al., 1999). However, systematic studies of such a simulation-based approach have yet to be performed.
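To put the termini error in structural terms, a termini score can be converted into helix turns and an approximate length along the helix axis. The sketch below assumes that the termini score is expressed in residues and uses the standard geometry of an ideal α-helix (3.6 residues per turn, 1.5 Å rise per residue); the function name is hypothetical and the value of 8.4 is the best termini score quoted in Results.

```python
from typing import Tuple

RESIDUES_PER_TURN = 3.6  # ideal alpha-helix
RISE_PER_RESIDUE = 1.5   # angstroms per residue, ideal alpha-helix


def termini_error(score_residues: float) -> Tuple[float, float]:
    """Convert a termini score (assumed to be in residues) into
    helix turns and an approximate length along the helix axis."""
    turns = score_residues / RESIDUES_PER_TURN
    length = score_residues * RISE_PER_RESIDUE
    return turns, length


turns, angstroms = termini_error(8.4)
print(f"{turns:.1f} turns, about {angstroms:.1f} angstroms along the helix axis")
```

Under these assumptions a score of 8.4 corresponds to roughly 2.3 turns, in line with the "just over two turns" stated above.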

A simple consensus method for TM helix prediction was also explored. Although this does not seem to outperform the best single methods, it is of value for, e.g., genome-wide prediction of membrane protein topologies (J.M.Cuthbertson, D.A.Doyle and M.S.P.Sansom, unpublished results) and for identifying potential α-helical regions which do not present a clear TM helix signal. Such regions may correspond to, e.g., functionally important half-TM helices within more complex membrane proteins. Hence the use of a consensus method can direct one's attention to difficult and/or interesting regions of a prediction. It is also likely that as our experimental understanding of the processes of topogenesis of these more complex regions of membrane proteins improves (Umigai et al., 2003), we will be able to embed such knowledge in improved methods for TM prediction.

We have undertaken a number of case studies of when TM helix prediction methods fail (for known structures) or give ‘mixed signal’ results (for unknown structures). Albeit from a limited number of cases, our results suggest that ‘failure’ can arise where half-TMs or more complex secondary structure elements are located within an outer bundle of ‘canonical’ hydrophobic TM helices. This is consistent with a recent modification (Engelman et al., 2003) of the original two-state model (Popot and Engelman, 1990) of membrane protein folding and with more recent studies of the mechanism of recognition of TM helices by the translocon in the endoplasmic reticulum (Hessa et al., 2005a). In this three-state model, e.g., loops containing half-TMs are inserted after formation of the TM helix bundle. This model is supported by, e.g., recent studies of topogenesis of the pore-forming P-loop of K+ channels (Umigai et al., 2003). Therefore, it seems that the consensus method predicts TM helices that appear to form a core structural framework within which functionally important sections may then be inserted. Hence for K+ channels the pore-lining outer and inner helices are well predicted, whereas the selectivity filter involving the pore half-TM helix is not. A similar situation holds for the ClC channel/transporter and for the aquaporins. Hence a consensus prediction may be able to help in the identification of functionally important regions, especially in combination with sequence homology between functionally related proteins.

In summary, this study not only provides an updated critical evaluation of TM helix prediction methods, but also provides an analysis of (some of) the factors that may limit the success of such approaches. These analyses may aid the development of further predictive approaches for TM helices. For example, future prediction methods may include consideration of some of the additional complexities of TM helices, including helix lengths and tilts relative to the bilayer plane (Bowie, 1997), the presence of glycine (Eilers et al., 2000) and proline (Cordes et al., 2002) residues, and more complex motifs (Senes et al., 2000) within TM helices. These and related advances in the structural bioinformatics of membrane proteins (Chamberlain and Bowie, 2004; Eyre et al., 2004; Gimpelev et al., 2004) may be combined with novel approaches (Pellegrini-Calace et al., 2003; Beuming and Weinstein, 2004; Im and Brooks, 2004; Kim et al., 2004; Kokubo and Okamoto, 2004) to modelling and simulation of membrane protein helix folding. In this way we may hope to advance protocols for the prediction and modelling of the three-dimensional TM fold of membrane proteins.


    References
 
Accardi,A. and Miller,C. (2004) Nature, 427, 803–807.

Arkin,I. and Brunger,A. (1998) Biochim. Biophys. Acta, 1429, 113–128.

Arkin,I.T., Brünger,A.T. and Engelman,D.M. (1997) Proteins, 28, 465–466.

Bendtsen,J.D., Nielsen,H., von Heijne,G. and Brunak,S. (2004) J. Mol. Biol., 340, 783–795.

Beuming,T. and Weinstein,H. (2004) Bioinformatics, 20, 1822–1835.

Booth,P.J. (2000) Biochim. Biophys. Acta, 1460, 4–14.

Bowie,J.U. (1997) J. Mol. Biol., 272, 780–789.

Chamberlain,A.K. and Bowie,J.U. (2004) Biophys. J., 87, 3460–3469.

Chen,C.P., Kernytsky,A. and Rost,B. (2002) Protein Sci., 11, 2774–2791.

Claros,M.G. and von Heijne,G. (1994) Comput. Appl. Biosci., 10, 685–686.

Cordes,F.S., Bright,J.N. and Sansom,M.S.P. (2002) J. Mol. Biol., 323, 951–960.

Cserzo,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Protein Eng., 10, 673–676.

Cuello,L.G., Cortes,D.M. and Perozo,E. (2004) Science, 306, 491–495.

Deber,C.M. and Goto,N.K. (1996) Nat. Struct. Biol., 3, 815–818.

Deber,C.M., Brandl,C.J., Deber,R.B., Hsu,L.C. and Young,X.K. (1986) Arch. Biochem. Biophys., 251, 68–76.

Deber,C.M., Wang,C., Liu,L.P., Prior,A.S., Agrawal,S., Muskat,B.L. and Cuticchia,A.J. (2001) Protein Sci., 10, 212–219.

Dutzler,R., Campbell,E.B., Cadene,M., Chait,B.T. and MacKinnon,R. (2002) Nature, 415, 287–294.

Eilers,M., Shekar,S.C., Shieh,T., Smith,S.O. and Fleming,P.J. (2000) Proc. Natl Acad. Sci. USA, 97, 5796–5801.

Eisenberg,D. (1984) Annu. Rev. Biochem., 53, 595–623.

Elliott,D.J.S., Neale,E.J., Aziz,Q., Dunham,J.P., Munsey,T.S., Hunter,M. and Sivaprasadarao,A. (2004) EMBO J., 23, 4717–4726.

Engelman,D.M., Henderson,R., McLachlan,A.D. and Wallace,B.A. (1980) Proc. Natl Acad. Sci. USA, 77, 2023–2027.

Engelman,D.M., Steitz,T.A. and Goldman,A. (1986) Annu. Rev. Biophys. Biophys. Chem., 15, 321–353.

Engelman,D.M. et al. (2003) FEBS Lett., 555, 122–125.

Eyre,T.A., Partridge,L. and Thornton,J.M. (2004) Protein Eng. Des. Sel., 17, 613–624.

Forrest,L.R., Tieleman,D.P. and Sansom,M.S.P. (1999) Biophys. J., 76, 1886–1896.

Gimpelev,M., Forrest,L.R., Murray,D. and Honig,B. (2004) Biophys. J., 87, 4075–4086.

Hessa,T., Kim,H., Bihlmaier,K., Lundin,C., Boekel,J., Andersson,H., Nilsson,I., White,S.H. and von Heijne,G. (2005a) Nature, 433, 377–381.

Hessa,T., White,S.H. and von Heijne,G. (2005b) Science, 307, 1427.

Hofmann,K. and Stoffel,W. (1993) Biol. Chem. Hoppe-Seyler, 374, 166.

Hunt,J.F., Earnest,T.N., Bousche,O., Kalghatgi,K., Reilly,K., Horvath,C., Rothschild,K.J. and Engelman,D.M. (1997a) Biochemistry, 36, 15156–15176.

Hunt,J.F., Rath,P., Rothschild,K.J. and Engelman,D.M. (1997b) Biochemistry, 36, 15177–15192.

Ikeda,M., Arai,M., Lao,D.M. and Shimizu,T. (2002) In Silico Biol., 2, 19–33.

Im,W. and Brooks,C.L. (2004) J. Mol. Biol., 337, 531–519.

Iverson,T.M., Luna-Chavez,C., Cecchini,G. and Rees,D.C. (1999) Science, 284, 1961–1966.

Jayasinghe,S., Hristova,K. and White,S.H. (2001) Protein Sci., 10, 455–458.

Jiang,Q.X., Wang,D.N. and MacKinnon,R. (2004) Nature, 430, 806–810.

Jiang,Y., Lee,A., Chen,J., Ruta,V., Cadene,M., Chait,B.T. and Mackinnon,R. (2003) Nature, 423, 33–41.

Jones,D.T. (1998) FEBS Lett., 423, 281–285.

Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) Biochemistry, 33, 3038–3049.

Juretic,D., Zoranic,L. and Zucic,D. (2002) J. Chem. Inf. Comput. Sci., 42, 620–632.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Katragadda,M., Alderfer,J.L. and Yeagle,P.L. (2001) Biophys. J., 81, 1029–1036.

Killian,J.A. and von Heijne,G. (2000) Trends Biochem. Sci., 25, 429–434.

Kim,S., Chamberlain,A.K. and Bowie,J.U. (2004) Proc. Natl Acad. Sci. USA, 101, 5988–5991.

Koebnik,R., Locher,K.P. and Van Gelder,P. (2000) Mol. Microbiol., 37, 239–253.

Kokubo,H. and Okamoto,Y. (2004) J. Chem. Phys., 120, 10837–10847.

Krogh,A.L.B., von Heijne,G. and Sonnhammer,E.L.L. (2001) J. Mol. Biol., 305, 567–580.

Kyte,J. and Doolittle,R.F. (1982) J. Mol. Biol., 157, 105–132.

Lancaster,C.R., Kroger,A., Auer,M. and Michel,H. (1999) Nature, 402, 377–385.

Landolt-Marticorena,C., Williams,K.A., Deber,C.M. and Reithmeier,R.A.F. (1993) J. Mol. Biol., 229, 602–608.

Lee,S.Y. and MacKinnon,R. (2004) Nature, 430, 232–235.

Li,S.C. and Deber,C.M. (1992) FEBS Lett., 311, 217–220.

Marti,T. (1998) J. Biol. Chem., 273, 9312–9322.

Möller,S., Croning,D.R. and Apweiler,R. (2001) Bioinformatics, 17, 646–653.

Nakai,K. and Kanehisa,M. (1992) Genomics, 14, 897–911.

Pellegrini-Calace,M., Carotti,A. and Jones,D.T. (2003) Proteins, 50, 537–545.

Persson,B. and Argos,P. (1997) J. Protein Chem., 16, 453–457.

Popot,J.L. and Engelman,D.M. (1990) Biochemistry, 29, 4031–4037.

Popot,J.L. and Engelman,D.M. (2000) Annu. Rev. Biochem., 69, 881–922.

Rost,B., Fariselli,P. and Casadio,R. (1996) Protein Sci., 5, 1704–1718.

Senes,A., Gerstein,M. and Engelman,D.M. (2000) J. Mol. Biol., 296, 921–936.

Starace,D.M. and Bezanilla,F. (2004) Nature, 427, 548–553.

Terstappen,G.C. and Reggiani,A. (2001) Trends Pharmacol. Sci., 22, 23–26.

Tusnády,G.E. and Simon,I. (1998) J. Mol. Biol., 283, 489–506.

Tusnády,G.E. and Simon,I. (2001) Bioinformatics, 17, 849–850.

Ubarretxena-Belandia,I. and Engelman,D.M. (2001) Curr. Opin. Struct. Biol., 11, 370–376.

Ulmschneider,M.B. and Sansom,M.S.P. (2001) Biochim. Biophys. Acta, 1512, 1–14.

Umigai,N., Sato,Y., Mizutani,A., Utsumi,T., Sakaguchi,M. and Uozumi,N. (2003) J. Biol. Chem., 278, 40373–40384.

van den Berg,B., Clemons,W.M., Collinson,I., Modis,Y., Hartmann,E., Harrison,S.C. and Rapoport,T.A. (2003) Nature, 427, 36–44.

Viklund,H. and Elofsson,A. (2004) Protein Sci., 13, 1908–1917.

von Heijne,G. (1992) J. Mol. Biol., 225, 487–494.

Wallin,E. and von Heijne,G. (1998) Protein Sci., 7, 1029–1038.

White,S.H. (2004) Protein Sci., 13, 1948–1949.

White,S.H. and Wimley,W.C. (1999) Annu. Rev. Biophys. Biomol. Struct., 28, 319–365.

Wilson,C.L., Hubbard,S.J. and Doig,A.J. (2002) Protein Eng., 15, 545–554.

Received April 25, 2005; accepted April 25, 2005.

Edited by Klaus Schulten