Computational analysis of α-helical membrane protein structure: implications for the prediction of 3D structural models

Tina A. Eyre1, Linda Partridge1 and Janet M. Thornton2,3

1Department of Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT and 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK

3 To whom correspondence should be addressed. E-mail: thornton@ebi.ac.uk


    Abstract
 Top
 Abstract
 Introduction
 Review of previous transmembrane...
 Methods
 Results
 References
 
Relatively little has been known about the structure of α-helical membrane proteins, since until recently few structures had been crystallized. These limited data have restricted structural analyses to the prediction of secondary structure, rather than tertiary folds. In order to address this, this paper describes an analysis of the 23 available membrane protein structures. A number of findings are made that are of particular relevance to transmembrane helix packing: (1) on average lipid-tail-accessible transmembrane residues are significantly more hydrophobic, less conserved and contain different residue types to buried residues; (2) charged residues are not always buried and, when accessible to membrane lipid tails, few are paired with another charge and instead they often interact with phospholipid head-groups or with other residue types; (3) a significant proportion of lipid-tail-accessible charged and polar residues form hydrogen bonds only with residues one turn away in the same helix (intra-helix); (4) pore-lining residues are usually hydrophobic and it is difficult to distinguish them from buried residues in terms of either residue type or conservation; and (5) information was gained about the proportion of helices that tend to contribute to lining a pore and the resulting pore diameter. These findings are discussed with relevance to the prediction of membrane protein 3D structure.

Keywords: membrane protein/structure prediction/transmembrane helix


    Introduction
 
α-Helical transmembrane (TM) proteins are estimated to comprise 20–50% of all proteins (Arkin et al., 1997; Wallin and von Heijne, 1998) and are of great biological significance since they mediate most of the communication between cells and cellular compartments. Practical difficulties in crystallizing TM proteins have led to very few structures being available for analysis. Whilst the location of TM α-helices in sequence is now relatively efficient (von Heijne and Gavel, 1988; Jones et al., 1994; Jayasinghe et al., 2001; Krogh et al., 2001; Moller et al., 2001), our knowledge of the way in which these helices pack together remains limited. This study analyses the TM structures that are available at present to identify the characteristics of residues forming helix–helix interfaces, lining TM pores and accessible to membrane lipid tails. The goal is to identify clues in the sequence conservation, hydrophobicity and residue content of these regions that can be used to guide 3D modelling from sequence information alone. The data are discussed with relevance to the prediction of 3D structural models of TM proteins.


    Review of previous transmembrane protein structure analyses
 
As shown in Figure 1, the central region of the membrane consists of a 30 Å thick hydrophobic region, formed by the hydrocarbon tails of the phospholipids. This is surrounded on either side by a 15 Å thick region formed by the highly polar phospholipid head-groups. The regions of TM helices located in these two environments will be termed the ‘lipid-tail-spanning’ and ‘head-group-spanning’ regions, respectively, throughout this paper. Within each of these regions, the residues of transmembrane helices can be considered to be found in one of three environments: (i) buried against other regions of protein, (ii) accessible to the membrane lipid tails or head-groups or (iii) lining a pore. Throughout this work the characteristics of these regions are analysed and compared. The lipid-tail/head-group-accessible, buried and pore-lining regions of the TM helices are indicated in Figure 1.



 
Fig. 1. Schematic, 2D slice showing the structure and thickness of a typical membrane and several transmembrane helices forming a pore. For analysis, residues are classified into different groups according to their environment, as indicated.

 
The different environments of lipid-tail-accessible and buried residues result in different sequence characteristics. Hence lipid-tail-accessible residues will more often be hydrophobic in character than buried residues, since the protein must be stable when folded in the fatty, hydrophobic membrane. Early analysis of the preferences of residues for buried or lipid-tail-accessible positions was performed on the photosynthetic reaction centre of Rhodobacter sphaeroides (Rees et al., 1989). In this TM protein, the lipid-tail-accessible residues are more hydrophobic than those buried (Rees et al., 1989), whereas in soluble proteins the buried residues are more hydrophobic (Chothia, 1976), reflecting the differing environments in which these classes of protein are found.

Rees et al. (1989) showed that the buried residues of the photosynthetic reaction centre of R. sphaeroides are more conserved than the lipid-tail-accessible residues. This is likely to be because there will be strong selective pressure to conserve buried residues so that they continue to interact favourably with their interaction partners on other helices and maintain the tertiary fold. There will be much less selective pressure for lipid-tail-accessible residues to be conserved, beyond the baseline requirement to be hydrophobic, since mutations in these residues are much less likely to disrupt the folding, and hence the function, of the protein. The present study determines the degree to which these trends in hydrophobicity and conservation hold for a much larger data set of TM proteins.

Several studies have analysed the differences in residue composition between the lipid-tail-accessible and buried regions. It has been observed that, in addition to the interaction of inter-helical hydrophobic residues found in soluble proteins, TM proteins make use of a wide variety of interactions between polar and charged residues during TM helix packing (Adamian and Liang, 2001; Eilers et al., 2002). Javadpour et al. (1999) found trends for glycine, alanine, serine, threonine, cysteine and proline to be preferred in buried positions in four TM proteins and for leucine, isoleucine, glutamine and, unexpectedly, the charged residues to be preferred in lipid-tail-accessible positions. In contrast, a more recent study involving 15 structures, by Ulmschneider and Sansom (2001), found that leucine and isoleucine showed no preference and that the charged residues weakly favoured buried positions. However, these results are likely to be biased by the inclusion of multiple members of four of the protein families within the data set.

Since there has recently been a rapid rise in the number of membrane protein structures solved (twice as many are now available as when the last comprehensive study was performed in 2001), it seems likely that it will now be possible to determine with more confidence which of these previous conflicting results is correct. To our knowledge, the more recent studies analysing a number of structures have not included sequence conservation information. In addition, no studies have analysed whether the observed differences between lipid-tail-accessible and buried residues are sufficient to distinguish between the two environments in a predictive way. These issues are addressed in the current work.

Mainly owing to lack of data, TM structure prediction methods have often been based only on sequence data (Donnelly et al., 1989; Pilpel et al., 1999) or on very small numbers of TM protein structures (Donnelly et al., 1993; Liu et al., 2004). In addition, a number of methods have made use of residue potentials derived from the structures of soluble proteins (Fleishman and Ben-Tal, 2002; Chen and Chen, 2003; Pellegrini-Calace et al., 2003). Given the considerable differences between the packing of TM and soluble protein helices (Rees et al., 1989; Javadpour et al., 1999; Eilers et al., 2000; Jiang and Vakser, 2000; Adamian and Liang, 2001; Ulmschneider and Sansom, 2001), the use of data derived from soluble proteins is likely to limit the accuracy of these predictive methods.

There is therefore a need to identify what features, if any, of TM helix packing would be applicable for use in the prediction of the tertiary structure of TM proteins. In the current work, the amino acid composition of 23 non-homologous α-helical membrane proteins with known structures was analysed. Only polytopic proteins were selected (those spanning the membrane more than once), since those with only one TM helix could contain no buried residues within the membrane-spanning region. Sequence conservation, hydrophobicity and amino acid propensities were compared between lipid-tail-accessible and buried residues. Differences between the two groups of residues with the potential for use in TM protein structure prediction are identified and discussed, in order to determine whether prediction of 3D structural models from protein sequence characteristics alone will be feasible in the near future. Interestingly, it is also found that several charged and polar residues prefer lipid-tail-accessible positions to buried and that many of the hydrogen bonds made by side chains are intra-helical.


    Methods
 
Identification of available transmembrane protein structures

Membrane proteins of known 3D structure were identified from a website of membrane protein resources (http://blanco.biomol.uci.edu/MemPro_resources.html). Proteins with only one TM helix or those that span the membrane with β-sheets were excluded. At the time of writing (July 2004), the structures of 66 α-helical polytopic TM proteins had been determined and deposited in the PDB. If two of these proteins shared more than 20% sequence identity with one another or were known to be from the same family, the relative with the lowest resolution structure was removed from the data set. Forty-three proteins were removed in this way, leaving 23 non-homologous proteins, listed in Table I.
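The redundancy filter described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the `Structure` record, the `identity` lookup table and the greedy best-resolution-first ordering are all assumptions, and family membership is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Structure:
    pdb_id: str
    resolution: float  # in angstroms; a smaller value is a better structure

def filter_homologues(structures, identity, cutoff=20.0):
    """Keep one representative from each group of related structures.

    `identity` is a hypothetical lookup mapping frozenset({id_a, id_b})
    to percent sequence identity; missing pairs are treated as 0%.
    """
    kept = []
    # Visit the best-resolved structures first, so that within any related
    # pair it is the poorer-resolution relative that gets discarded.
    for s in sorted(structures, key=lambda s: s.resolution):
        related = any(
            identity.get(frozenset((s.pdb_id, k.pdb_id)), 0.0) > cutoff
            for k in kept
        )
        if not related:
            kept.append(s)
    return kept
```

With 66 input structures, a filter of this kind would be expected to return the representatives of each family, here 23.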


 
Table I. Non-homologous polytopic membrane proteins with known 3D structure as of July 2004

 
A viral fusion protein (2siv) is excluded from the calculations, owing to its extremely unusual amino acid composition. These unusual properties are likely to be caused by the ability of the protein to exist in both aqueous and membrane-spanning environments, unlike a typical TM protein (Malashkevich et al., 1998).

Eighteen water-soluble proteins for comparison with the membrane proteins were randomly extracted by text searching of the PDB (update version November 2002, containing 19 311 structures) with the keyword ‘protein’. Homologous proteins were removed using the same method as for the TM proteins.

Location of transmembrane helices from 3D coordinates

The algorithm ProSEC (Slidel, 1997) was used to identify all secondary structure elements (α-helices and β-sheets) in each of the proteins. This program uses the assignment of residues to secondary structure classes made by DSSP (Kabsch and Sander, 1983) to define the positions of helices and strands. The angles between sequential pairs of residues are determined and breaks or joins are introduced between the secondary structure elements as appropriate. While ProSEC was designed to assign automatically the handedness of a motif, its output can be set to give a set of vectors defining the start and end point of each α-helix and β-sheet.

The membrane normal was approximated for each protein using an algorithm we developed in Perl named PSlice. Likely TM helices were selected from the helices identified by ProSEC in an iterative process, owing to their length (>15 residues) and approximately parallel arrangement (less than 40° to one another). The membrane normal was calculated as the average of the vectors representing these TM helices. The likely TM-spanning region for each protein was identified by PSlice as the 30 Å thick slice, perpendicular to the membrane normal, for which the surface-accessible residues were maximally hydrophobic. Surface-accessible residues were defined throughout this work as those with a relative accessible surface area, calculated by NACCESS (Hubbard, 1992), of 5% or more.
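The two geometric steps attributed to PSlice, averaging helix vectors and scanning for the most hydrophobic 30 Å slice, can be illustrated with the sketch below. It is written in Python rather than Perl and is not the PSlice code. The function names, the flipping of antiparallel helices, the 1 Å scan step and the convention that a lower score means more hydrophobic are assumptions, and the iterative selection of helices by length and angle is omitted.

```python
import math

def unit(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def membrane_normal(helix_vectors):
    """Average the TM helix axis vectors to approximate the membrane normal."""
    ref = unit(helix_vectors[0])
    total = [0.0, 0.0, 0.0]
    for v in helix_vectors:
        u = unit(v)
        # Assumption: flip antiparallel helices so they do not cancel out.
        if dot(u, ref) < 0:
            u = tuple(-c for c in u)
        total = [t + c for t, c in zip(total, u)]
    return unit(total)

def best_slice(surface_residues, normal, thickness=30.0, step=1.0):
    """Find the slice whose surface residues are most hydrophobic on average.

    surface_residues: list of (ca_xyz, score) for residues whose relative
    accessibility is 5% or more. Assumption: lower score = more hydrophobic.
    Returns (slice_start, mean_score) along the membrane normal.
    """
    heights = [(dot(xyz, normal), score) for xyz, score in surface_residues]
    lo = min(h for h, _ in heights)
    hi = max(h for h, _ in heights)
    n_steps = max(1, int((hi - lo - thickness) / step) + 1)
    best = None
    for i in range(n_steps):
        start = lo + i * step
        inside = [s for h, s in heights if start <= h < start + thickness]
        if inside:
            mean = sum(inside) / len(inside)
            if best is None or mean < best[1]:
                best = (start, mean)
    return best
```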

Throughout this work, the White and Wimley scale (Wimley et al., 1996; Jayasinghe et al., 2001) was used to assess residue hydrophobicity. This scale takes into account the solvation energy of both the amino acid side chain and the peptide backbone. It is derived from the partitioning of peptides, containing the ‘guest’ residue under study, into n-octanol. The scale was selected, first, since it has been shown to predict TM helices with an accuracy of >99% from sequence (Jayasinghe et al., 2001). Second, initial studies suggested that it was also effective for the current work, since a distinct peak of hydrophobicity in the residues along the protein surface could be identified. In many cases other hydrophobicity scales were also used, giving similar results, but only the data derived from the White and Wimley scale are shown.

Assignment of residues to classes for analysis

Lipid-tail-spanning residues were defined as those with their Cα atom within the 30 Å lipid-tail-spanning slice identified by PSlice, according to the 3D coordinates. Head-group-spanning residues were defined as those residues with their Cα atom within either of the 15 Å head-group-spanning slices flanking the lipid-tail-spanning slice, as illustrated in Figure 1. The residues within the lipid-tail- and head-group-spanning regions were defined as membrane-spanning. All other residues were classified as non-membrane-spanning and were excluded from the analyses. The membrane-spanning residues were further subdivided into lipid-tail-spanning accessible, lipid-tail-spanning buried, head-group-spanning accessible and head-group-spanning buried groups, as determined by their accessible surface area calculated by NACCESS.
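The class assignment above amounts to a pair of threshold tests, sketched below. The function name and arguments are hypothetical; the height is assumed to be the Cα position projected onto the membrane normal, and the 5% cut-off is the relative accessibility threshold stated earlier.

```python
def classify_residue(height, rel_accessibility, slice_start,
                     tail_thickness=30.0, head_thickness=15.0,
                     rsa_cutoff=5.0):
    """Assign a residue to one of the analysis classes.

    height: C-alpha coordinate along the membrane normal, in angstroms.
    rel_accessibility: relative accessible surface area, in percent.
    slice_start: lower bound of the lipid-tail-spanning slice.
    """
    tail_lo = slice_start
    tail_hi = slice_start + tail_thickness
    if tail_lo <= height < tail_hi:
        region = "lipid-tail-spanning"
    elif tail_lo - head_thickness <= height < tail_hi + head_thickness:
        region = "head-group-spanning"
    else:
        # Outside both regions: excluded from the analyses.
        return "non-membrane-spanning"
    exposure = "accessible" if rel_accessibility >= rsa_cutoff else "buried"
    return f"{region} {exposure}"
```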

Pore-lining residues were identified manually from a list of accessible residues, by visual inspection of the structures. They are defined as those residues with an accessible surface area of >5% that are not found on the surface of the protein. Only the pore-lining residues within the lipid-tail-spanning region were considered; head-group-spanning pore-lining residues were excluded from the analyses. A few residues that appeared to be snorkelling were excluded from the pore-lining set, since their side chains are likely to be involved in interaction with water or other polar groups in the head-group region, rather than with the contents of the pore. The pore-lining residues were included within the accessible sets of residues for comparisons of buried and accessible residues, due to the difficulty in automatically extracting these residues. However, the inclusion of pore-lining residues is unlikely to be a significant source of error, since the pore-lining residues comprise only 1% of the 6835 residues accessible to lipid tails or head-groups.

Comparison of the secondary structure characteristics of TM and soluble proteins

For each helix defined by ProSEC, the helix length was taken to be the number of residues for which the Cα atom was found within the TM lipid-tail-spanning slice defined by PSlice. By this method the first and last residues were found for each helix. Angular tilts were calculated as the angle between the membrane normal computed by PSlice and the vector connecting the Cα atoms of the first and last residues in each helix. Helix lengths and tilts were then compared between the soluble and TM proteins analysed.
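The tilt calculation is the angle between two vectors. A minimal sketch, assuming the tilt is reported without regard to helix direction (hence the absolute value, which is not stated in the text):

```python
import math

def helix_tilt(first_ca, last_ca, normal):
    """Angle in degrees between a helix axis and the membrane normal.

    The helix axis is the vector from the first to the last C-alpha atom
    inside the lipid-tail-spanning slice.
    """
    axis = [b - a for a, b in zip(first_ca, last_ca)]
    dot = sum(a * n for a, n in zip(axis, normal))
    norm = (math.sqrt(sum(a * a for a in axis))
            * math.sqrt(sum(n * n for n in normal)))
    # Assumption: a helix running "down" has the same tilt as one running "up".
    cos_angle = min(1.0, abs(dot) / norm)
    return math.degrees(math.acos(cos_angle))
```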

Analysis of pore diameter

Pores were found in 12 of the proteins analysed (the K+ channel, aquaporin, ATP synthase H+ channel, the multidrug efflux transporter, adenine nucleotide carrier, inward rectifying K+ channel, the protein-conducting channel, the Clc chloride channel, the small and large mechanosensitive ion channels, formate dehydrogenase and the light-harvesting protein). The pore diameter, at a certain height within the pore, is taken to be the distance, according to the 3D coordinates, between the Cα atoms of the two pore-lining residues at that height that are furthest apart. Hence the pore diameters may be slightly over-estimated since they do not take into account the volume of the side chains. The pore diameters used in this work are the average of three diameters, calculated at different heights in the pore. Both functional and non-functional pores (defined later) are included in the analysis of pore diameter.
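The diameter definition above can be written directly as a maximum over pairwise Cα distances. In this sketch the grouping of pore-lining residues into heights is assumed to have been done beforehand, and the function names are illustrative.

```python
import math
from itertools import combinations

def diameter_at_height(ca_coords):
    """Largest C-alpha to C-alpha distance among pore-lining residues
    found at one height in the pore."""
    return max(math.dist(a, b) for a, b in combinations(ca_coords, 2))

def average_pore_diameter(levels):
    """Mean diameter over several heights.

    levels: one list of C-alpha coordinates per height; the text uses
    three heights per pore.
    """
    diameters = [diameter_at_height(level) for level in levels]
    return sum(diameters) / len(diameters)
```

Because only Cα positions are used, side-chain volume is ignored, which is the source of the over-estimate noted above.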

Comparison of the hydrophobicity of lipid-tail-accessible and buried residues

For each TM helix, the average hydrophobicity score on the White and Wimley scale (Wimley et al., 1996; Jayasinghe et al., 2001) of the membrane-accessible residues was compared to that for the buried residues. The calculation was performed separately for both the lipid-tail-spanning and head-group-spanning regions of the helices. For the lipid-tail-spanning region, of the 455 TM helices in the data set, 217 remained after homologous chains were removed. Fourteen helices contained either no accessible or no buried residues, leaving 203 for analysis. Similarly, for the head-group-spanning region, of the 439 non-homologous helices in the data set, 224 helices contained either no accessible or no buried residues, leaving 215 for analysis.

Throughout this work, the statistical significance, or otherwise, of results was determined by paired or unpaired Student's t-tests as appropriate. Significance is indicated, throughout the text, by the use of a probability value, P, which indicates the probability of the observed result occurring by chance alone.
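For the per-helix comparisons, the paired form of the test works on the difference between the two values measured on each helix. The sketch below computes only the t statistic and degrees of freedom. Converting these to a probability needs the t distribution, for which a statistics library such as SciPy (`scipy.stats.ttest_rel`) would normally be used. The function name is illustrative.

```python
import math

def paired_t(xs, ys):
    """Paired Student's t statistic for two equal-length samples.

    Returns (t, degrees_of_freedom). Each (x, y) pair is assumed to come
    from the same helix, e.g. accessible mean versus buried mean.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, n - 1
```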

Comparison of the sequence conservation of lipid-tail-accessible and buried residues

For this work, PSIBLAST was run, for a maximum of 20 iterations or until convergence, using a threshold of 1 × 10^−40, to identify sequence homologues for each of the proteins of known structure. A relatively stringent threshold was used to select only close homologues, for which function is likely to be conserved. The prediction of residue conservation amongst these homologues was performed by SCORECONS (Valdar and Thornton, 2001). The SCORECONS algorithm scores each residue position of a PSIBLAST-derived (Altschul et al., 1997) multiple sequence alignment in terms of its conservation. The mutation matrix of Jones et al. (1992) is used to determine the likelihood of a particular residue being replaced by another and to calculate a score based on the variability of each position. A score of 0 indicates a lack of conservation at that position, whereas the maximal score of 1 indicates very high sequence conservation. SCORECONS was run using all default parameters.

For each TM helix the average conservation score of the lipid-tail-accessible residues was compared with that for the buried residues. The calculation was performed separately for both the lipid-tail-spanning and head-group-spanning regions of the helices. For the lipid-tail-spanning region, insufficient homologues were found to derive conservation scores for 49 of the 217 non-homologous helices and eight helices lacked either buried or lipid-tail-accessible residues, leaving 160 for analysis. Similarly, for the head-group-spanning region, insufficient homologues were found for 112 of the 439 helices and 162 lacked buried or lipid-tail-accessible residues, leaving 165 for analysis.

Comparison of the preferences of particular residues for lipid-tail-accessible or buried positions

The propensity of each residue type for lipid-tail-accessible positions is calculated as

This method normalizes the data to account for the greater total number of residues that are accessible than buried. As a result, a value of 1 represents that the residue shows no preference for accessible or buried positions. A value of >1 indicates a preference for lipid-tail-accessible positions and a value of <1 represents a preference for buried positions.
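The formula itself is not reproduced in this text. One common form that matches the description (normalized for the unequal group sizes, with a value of 1 meaning no preference) is the ratio of a residue type's frequency among accessible residues to its frequency among buried residues. The sketch below implements that assumed form only, and it may differ from the expression actually used.

```python
from collections import Counter

def accessible_propensity(accessible, buried):
    """Propensity of each residue type for accessible positions.

    Assumed form: (count_acc / total_acc) / (count_bur / total_bur).
    accessible, buried: sequences of one-letter residue codes.
    Residue types absent from either group are skipped, since the ratio
    would be zero or undefined.
    """
    acc, bur = Counter(accessible), Counter(buried)
    n_acc, n_bur = len(accessible), len(buried)
    result = {}
    for aa in set(acc) & set(bur):
        result[aa] = (acc[aa] / n_acc) / (bur[aa] / n_bur)
    return result
```

For example, with accessible residues `"LLLLAG"` and buried residues `"LAA"`, leucine scores 2.0 (prefers accessible positions) and alanine 0.25 (prefers buried positions).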

Hydrogen bond analysis

Hydrogen bonding partners were analysed for the 1047 observed charged residues accessible to lipid tails in the data set. Charged residues were arginine, lysine, glutamate, histidine and aspartate. Polar residues were serine, threonine, asparagine, glutamine and cysteine. The remaining residues were classified as hydrophobic. Hydrogen bonds were detected by HBPlus v 3.0 (McDonald and Thornton, 1994) and classified using an algorithm developed for the purpose. Hydrogen bonds between pairs of main-chain atoms were excluded from the analysis. Intra-helical hydrogen bonds are those that are formed between two residues found three or four positions apart on the same chain. All other hydrogen bonds are classed as inter-helical. Only hydrogen bonds to protein are detected by HBPlus so bonds to head-groups are inferred from visual inspection of the structures. Likely snorkelling residues were defined as those with Cα atoms <8 Å from the head-group-spanning region (the length of the lysine side chain was estimated as ~8 Å, from bond length calculations).
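The classification rules in this paragraph are simple enough to state as code. This is a sketch of the stated definitions only, not the classification algorithm developed for the study; the function names and the three-letter residue codes are assumptions, and the hydrogen bonds themselves are assumed to have been detected already.

```python
CHARGED = {"ARG", "LYS", "GLU", "HIS", "ASP"}
POLAR = {"SER", "THR", "ASN", "GLN", "CYS"}

def residue_class(name):
    """Classify a residue (three-letter code) as in the text."""
    if name in CHARGED:
        return "charged"
    if name in POLAR:
        return "polar"
    return "hydrophobic"

def classify_hbond(chain_a, resnum_a, chain_b, resnum_b):
    """Intra-helical: partners three or four positions apart on the same
    chain. Everything else is classed as inter-helical."""
    if chain_a == chain_b and abs(resnum_a - resnum_b) in (3, 4):
        return "intra-helical"
    return "inter-helical"

def is_likely_snorkelling(distance_to_head_group, cutoff=8.0):
    """True if the C-alpha atom lies less than `cutoff` angstroms from
    the head-group-spanning region."""
    return distance_to_head_group < cutoff
```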

Analysis of pore-lining residues

The characteristics of pore-lining residues were compared with those of buried and lipid-tail-accessible residues. Only functional pores (defined as those through which transport is known to occur) were included in the analysis since a preliminary study showed differences between the residue composition of functional and non-functional pores. (Non-functional pores were more hydrophobic, probably because they are packed with phospholipids in the native structure.) Functional pores were present in the K+ channel, aquaporin, the ATP synthase H+ channel, the multidrug efflux transporter, the adenine nucleotide carrier, the inward rectifying K+ channel, the protein-conducting channel, the Clc chloride channel and the small and large mechanosensitive ion channels. The channels found in formate dehydrogenase and the light-harvesting protein, and also the inter-subunit pores in aquaporin and the multidrug efflux transporter, were classed as non-functional and excluded from the analysis of pore-lining residue types.


    Results
 
Transmembrane protein structures currently available

There are currently 23 non-homologous α-helical polytopic TM proteins with known structure, as listed in Table I. The proteins average ~125 kDa in size, although some are small single polypeptide chain proteins, whereas others are very large protein complexes consisting of up to 20 chains. They contain 455 TM helices in total, with an average of 19 TM helices per protein. The smallest protein, the adenine nucleotide carrier, has six TM helices, whereas the largest, the multidrug efflux transporter, has 36.

Location of transmembrane helices from 3D coordinates

Fifteen TM proteins, from the analysed data set, are shown in the upper panel of Figure 2, illustrating the considerable variation in size and structure amongst TM proteins.



 
Fig. 2. Upper panel: a selection of currently available TM protein structures, showing the membrane-spanning slice in red. Lower panel: views of 10 of the pore-containing proteins, along the membrane normal. IR, inward rectifying.

 
Comparison of the secondary structure characteristics of TM and soluble proteins

Various statistics concerning the number, length and angle of α-helices and β-sheets in TM proteins, and in soluble proteins for comparison, were calculated as described in the Methods section. The results are consistent with those obtained previously with smaller data sets (Bowie, 1997; Ulmschneider and Sansom, 2001). The TM proteins contain an average of 19 TM helices, with an average length of 23 residues. The average length of both helices in soluble proteins and non-TM helices in TM proteins is nine residues. Similarly, the average length of β-strands in both classes of protein is four residues. It can therefore be concluded that, apart from the addition of several longer helices that span the membrane, TM proteins do not differ greatly in their secondary structure composition from non-TM proteins.

The average angular tilt of the TM helices, relative to the membrane normal, was 17 ± 11°. This roughly parallel arrangement of TM helices is likely to facilitate the modelling of their packing in comparison with that for the helices in soluble proteins. However, this does not seem true to the degree that helix tilt can be ignored during modelling (T.Eyre, L.Partridge and J.Thornton, unpublished data). Interestingly, whilst most TM helices are considerably longer than is required to span the membrane, the structures also contain a number of TM helices which only partially span the bilayer. The role of these helices remains unknown. Unfortunately, there appears to be no significant correlation between angle to the membrane normal and helix length (R² = 0.08), excluding the possibility of estimating the tilt of a TM helix from its length during TM protein modelling.

Analysis of pore diameter

Twelve of the data set proteins contain a membrane-spanning pore. Some of these proteins are shown in the lower panel of Figure 2, in a view along the membrane normal, illustrating the arrangement of TM helices. When modelling a channel-containing TM protein from sequence, considerable constraints could be imposed on the number of possible models if one could determine the approximate size of the pore and number of helices lining it. It is therefore significant that there is a relationship between the total number of TM helices and the number of pore-lining helices (R² = 0.67, as shown in Figure 3, bottom). Although based on limited data, this relationship permits the number of pore-lining helices to be easily estimated for any protein thought to contain a pore. Second, it is valuable to note that there is a linear relationship between the number of pore-lining helices and the average pore diameter, as shown in Figure 3, top (R² = 0.83). This information allows estimations to be made about the size of the pore and the number of pore-lining helices that may prove useful in modelling.



 
Fig. 3. (A) Analysis of the relationship between average pore diameter and the number of pore-lining helices. (B) Analysis of the relationship between the total number of TM helices and the number of pore-lining helices. The relationship is approximated by the line y = 0.03x² + 1.75x.
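As a worked illustration of how the fitted curve in the Fig. 3 caption might be used, the sketch below inverts the quadratic to estimate the number of pore-lining helices from the total number of TM helices. The caption does not say which quantity is x and which is y. Here x is assumed to be the number of pore-lining helices and y the total, since the opposite assignment would predict more pore-lining helices than TM helices. The function names are hypothetical.

```python
import math

A, B = 0.03, 1.75  # coefficients of y = 0.03x^2 + 1.75x from the caption

def total_helices(pore_lining):
    """Fitted total number of TM helices for a given number of
    pore-lining helices (assumed axis assignment)."""
    return A * pore_lining ** 2 + B * pore_lining

def pore_lining_helices(total):
    """Invert the fit: positive root of A*x^2 + B*x - total = 0."""
    return (-B + math.sqrt(B * B + 4 * A * total)) / (2 * A)
```

Under this assumption a protein with six TM helices would be estimated to have about three pore-lining helices. Given the reported R² of 0.67, any such estimate is approximate.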

 
Comparison of the hydrophobicity of lipid-tail-accessible and buried residues

From the data set of 203 helices within the lipid-tail-spanning region of the membrane, the accessible residues are found to be significantly more hydrophobic, according to the White and Wimley hydrophobicity scale, than buried ones (mean hydrophobicity scores −0.47 and −0.12, P < 0.0001, for 2510 lipid-tail-accessible and 1528 buried residues, respectively). This is illustrated in Figure 4. A similar result was achieved using the Kyte and Doolittle scale (Kyte and Doolittle, 1982) (data not shown). As expected, the reverse trend is observed in the head-group-spanning region, where the buried residues are more hydrophobic than the accessible ones (P < 0.001), as shown in Figure 4.



 
Fig. 4. Comparison of the hydrophobicity, according to the White and Wimley scale, of lipid-tail/head-group-accessible and buried residues. Plotted is the difference in hydrophobicity between the buried and accessible residues of each helix. Only residues with Cα atoms found within the lipid-tail-spanning region or head-group-spanning region of each protein are considered. See Methods section for details.

 
For predictive power we need to consider not just mean hydrophobicity values but also the strength of the relationship and the frequency with which exceptions occur. We found that in 74% of helices, the mean hydrophobicity is greater for lipid-tail-accessible residues than for buried residues. Therefore, hydrophobicity can make a valuable contribution to prediction but, since the theoretical maximum accuracy of prediction is only 74%, other parameters will be required to increase the performance of the prediction.

Conversely, 26% of helices do not show the expected hydrophobicity characteristics and some have lipid-tail-accessible residues that are considerably less hydrophobic than their buried residues (Figure 4). The situation occurs mainly in helices where the interaction with neighbouring helices is performed by hydrophobic residues, giving a higher than average hydrophobicity of the buried residues. When this is combined with one or more lipid-tail-accessible charged or polar residues, the hydrophobicity of the accessible residues is reduced and in some cases becomes significantly less than the hydrophobicity of the buried residues. Various explanations for the presence of lipid-tail-accessible charged and polar residues are discussed later. These findings confirm that the folding of TM proteins does not simply rely on the principle of burying all hydrophilic residues and exposing all hydrophobic ones.

In the head-group-spanning region, only 61% of helices show the expected trend, with buried residues more hydrophobic than head-group-accessible residues. This much weaker relationship suggests that TM helix packing methods which make use of hydrophobicity data should probably consider the lipid-tail-spanning residues alone.
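The per-helix percentages quoted in this section can be reproduced in outline as below. The sketch assumes that a lower score means more hydrophobic, which is inferred from the reported means (−0.47 for the more hydrophobic accessible residues against −0.12 for buried ones), and that helices lacking either class of residue are skipped, as in the Methods. The function name is illustrative.

```python
def fraction_expected(helices):
    """Fraction of helices whose accessible residues are, on average,
    more hydrophobic than their buried residues.

    helices: list of (accessible_scores, buried_scores), one per helix.
    Assumption: a lower score means a more hydrophobic residue.
    """
    # Skip helices with no accessible or no buried residues.
    usable = [(a, b) for a, b in helices if a and b]
    hits = sum(
        1 for a, b in usable
        if sum(a) / len(a) < sum(b) / len(b)
    )
    return hits / len(usable)
```

Applied to the lipid-tail-spanning region, a calculation of this kind corresponds to the 74% figure reported above. For the head-group-spanning region the comparison would be reversed.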

Comparison of the sequence conservation of lipid-tail-accessible and buried residues

The lipid-tail-spanning residues are more conserved than residues in general, when compared against all of the residues in the TM proteins in this study, as shown in Figure 5. As shown in both Figures 5 and 6, lipid-tail-accessible residues are significantly less conserved in terms of sequence than buried residues in the lipid-tail-spanning region (mean conservation scores are 0.63 and 0.68, respectively, P < 0.001, for 2115 lipid-tail-accessible and 1231 buried residues). Similarly, buried residues also appear to be more conserved than accessible ones throughout the whole protein (i.e. including both membrane- and non-membrane-spanning residues). This is probably due to the requirement of buried residues to be conserved in order to maintain favourable interactions with neighbouring residues.



 
Fig. 5. Comparison of the distributions of conservation scores for buried and lipid-tail-accessible residues within the lipid-tail-spanning region and all residues (membrane-spanning and soluble domains) in 23 TM proteins.

 


 
Fig. 6. Comparison of the sequence conservation scores of lipid-tail/head-group-accessible and buried residues. Plotted is the difference in conservation between the buried and accessible residues of each helix. Only residues with Cα atoms found within the lipid-tail-spanning region or head-group-spanning region of each protein are considered. See Methods section for details.

 
Whilst the average conservation scores of buried and lipid-tail-accessible residues vary greatly between proteins, and even between helices in the same protein, in 73% of helices the buried positions are more conserved than the lipid-tail-accessible ones (Figure 6). A slightly weaker trend is observed for the head-group-spanning residues (also shown in Figure 6), where 65% of helices have buried residues that are more conserved than head-group-accessible residues. Again, this suggests that conservation could be used to identify the buried face of many TM helices, although other information would be needed to improve the accuracy above the maximum of 73%. In addition, and similar to the findings for hydrophobicity, the conservation of residues within the lipid-tail-spanning region is likely to be a slightly more reliable predictor of helix packing than that of residues in the head-group-spanning region.
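The per-helix comparison described above can be illustrated with a minimal sketch. It assumes that each helix is represented by two lists of per-residue conservation scores (buried and accessible); the function name and the toy scores are hypothetical and are not taken from the data set.

```python
from statistics import mean


def fraction_buried_more_conserved(helices):
    """Return the fraction of helices whose buried residues have a higher
    mean conservation score than their accessible residues.

    helices: list of (buried_scores, accessible_scores) tuples, one per helix.
    Helices lacking either class of residue are skipped.
    """
    diffs = [mean(buried) - mean(accessible)
             for buried, accessible in helices
             if buried and accessible]
    if not diffs:
        raise ValueError("no helix has both buried and accessible residues")
    return sum(d > 0 for d in diffs) / len(diffs)


# Hypothetical conservation scores for four helices (illustration only).
toy_helices = [
    ([0.8, 0.7], [0.6, 0.5]),
    ([0.6, 0.6], [0.7, 0.5, 0.9]),
    ([0.9], [0.4, 0.6]),
    ([0.5, 0.7], [0.3]),
]
toy_fraction = fraction_buried_more_conserved(toy_helices)
print(toy_fraction)
```

Applied to real per-helix scores, a value of this kind corresponds to the 73% figure quoted for the lipid-tail-spanning region.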

Comparison of the preferences of particular residues for lipid-tail-accessible or buried positions

Within the lipid-tail-spanning region, certain individual amino acids show a significant preference for either accessible or buried environments, as shown in Figure 7. According to this method, the hydrophobic amino acids leucine, phenylalanine and tryptophan show a preference for lipid-tail-accessible positions. Consistent with the work of Ulmschneider and Sansom (2001), alanine shows a preference for buried positions, but the other hydrophobic residues show no significant preference. This is at first sight somewhat unexpected, given the hydrophobicity of the bilayer. However, it appears to reflect a genuine lack of preference in these residues, with propensity values very close to 1, rather than a lack of statistical power caused by the limited size of the data set. The finding probably reflects both the ability of hydrophobic residues to interact favourably with the membrane lipid tails and their important role in knobs-into-holes packing (Crick, 1953; Langosch and Heringa, 1998) of the TM helices.



 
Fig. 7. Residue preferences for lipid-tail-accessible or buried positions in 23 TM proteins. Only residues in the lipid-tail-spanning region are included. A value of >1 indicates an enrichment in the lipid-tail-accessible positions relative to buried positions. Similarly, a value of <1 indicates an enrichment in the buried positions relative to lipid-tail-accessible positions. For details of propensity calculation, see Methods section.
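As an illustration of how a propensity of this kind behaves, the following sketch assumes that the propensity of a residue type is the ratio of its frequency among lipid-tail-accessible positions to its frequency among buried positions. This definition is an assumption made for illustration; the actual calculation is given in the Methods section. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def accessibility_propensity(accessible, buried):
    """Ratio of a residue's frequency at accessible positions to its
    frequency at buried positions (assumed definition, see lead-in).

    accessible, buried: strings of one-letter residue codes.
    Residue types absent from either class are omitted, since the ratio
    would be zero or undefined.
    """
    acc_counts = Counter(accessible)
    bur_counts = Counter(buried)
    propensities = {}
    for res in set(acc_counts) & set(bur_counts):
        acc_freq = acc_counts[res] / len(accessible)
        bur_freq = bur_counts[res] / len(buried)
        propensities[res] = acc_freq / bur_freq
    return propensities


# Hypothetical residue lists (illustration only, not the real data set).
toy_accessible = "LLLLFFWKRA"
toy_buried = "GGGASALLFC"
toy_propensity = accessibility_propensity(toy_accessible, toy_buried)
print(sorted(toy_propensity.items()))
```

Under this definition a value above 1 indicates enrichment at accessible positions and a value below 1 indicates enrichment at buried positions, matching the reading of Figure 7.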

 
In general, lipid-tail-accessible positions are enriched in aromatic residues (P < 0.05). This may be because such large aromatic residues are difficult to pack efficiently between helices. In addition, aromatic residues are thought to play an important role in the anchoring of TM proteins at the correct height within the bilayer (Yuen et al., 2000), for which an accessible position is likely to be required.

Glycine, serine, alanine, histidine and cysteine show strong preferences for buried positions. These results are consistent with the recent work of several groups who have shown the importance of small polar residues, particularly glycine, alanine, serine and threonine, in the close packing of TM helix interfaces (Javadpour et al., 1999; Eilers et al., 2000; Senes et al., 2000; Ulmschneider and Sansom, 2001). Threonine shows a similar, but weaker and non-significant, preference.

The interaction of these small and polar residues forms the second major method by which TM helices pack, a mechanism that is rarely observed in soluble proteins (Eilers et al., 2002). Their small side chain volumes allow the close approach of polar residues on the facing helix to the backbone C{alpha}s for hydrogen bonding (Senes et al., 2001). The C{alpha}–H···O bond has recently been shown to cluster at interface regions rich in these residues (Senes et al., 2001) and is thought to contribute considerably to protein stability (Shi et al., 2001, 2002).

The polar or charged residues glutamine, lysine and arginine show a strong preference for lipid-tail-accessible positions (P < 0.01). This preference is an unexpected finding, since charged and polar residues are generally thought to be disfavoured in positions accessible to non-polar lipids. Possible explanations are given later, when the hydrogen bonding partners of lipid-tail-accessible charged residues are analysed. Interestingly, glutamine, arginine and lysine all have long and flexible side chains, perhaps facilitating their interaction with groups that are located too far away for other residues to reach. The other charged residues, with shorter side chains, tend to be either buried (glutamate) or evenly distributed between buried and accessible positions (aspartate).

There is very little correlation between the traditional hydrophobicity scales and the propensity of residues to be accessible (Figure 8). Hence a scale that can accurately predict the likely environment (buried or lipid-tail-accessible) of a lipid-tail-spanning residue must take into account these deviations between the two measures. A lipid-accessibility (LA) scale was derived for this purpose from the observed propensities of lipid-tail-spanning residues to be buried. The value that a residue receives in the LA scale, shown in Figure 8, is simply computed by proportionally scaling the residue propensities into the desired range of –0.5 to 0.5. Hence this is not a traditional hydrophobicity scale, representing the water/lipid-tail solubility of a residue, but a measure that encapsulates all of the factors affecting the positioning of a residue in a real TM helix. We hypothesized that this scale would give far more reliable predictions of residue location than a hydrophobicity scale, since it is derived from the observed properties of buried and lipid-tail-accessible TM residues. In contrast, hydrophobicity scales are often based on accessibilities in soluble proteins or on water–octanol or vapour partition coefficients of individual residues. The performance of the LA scale at prediction of buried and lipid-tail-accessible residues has been tested by applying it to modelling of a TM protein with unknown structure (T.Eyre, L.Partridge and J.Thornton, unpublished data).
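The proportional scaling step can be sketched as a linear (min-max) rescaling. This is a minimal illustration that assumes the smallest propensity maps to –0.5 and the largest to 0.5; the exact mapping and sign convention used for the LA scale are not stated here, and the propensity values below are hypothetical.

```python
def rescale_propensities(propensities, low=-0.5, high=0.5):
    """Linearly rescale a {residue: propensity} mapping into [low, high].

    The minimum propensity maps to `low` and the maximum to `high`
    (assumed convention, see lead-in).
    """
    vmin = min(propensities.values())
    vmax = max(propensities.values())
    if vmin == vmax:
        raise ValueError("all propensities are equal; cannot rescale")
    span = vmax - vmin
    return {res: low + (value - vmin) * (high - low) / span
            for res, value in propensities.items()}


# Hypothetical propensities (illustration only, not the published values).
toy_propensities = {"G": 0.5, "L": 1.0, "K": 1.5}
toy_scale = rescale_propensities(toy_propensities)
print(toy_scale)
```

Because the transformation is linear, the ranking of residues and their relative spacing are unchanged; only the numerical range differs from that of the raw propensities.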



 
Fig. 8. Upper panel: plot of the White and Wimley (WW) hydrophobicity scale, the Kyte and Doolittle (KD) hydrophobicity scale, the GES scale (Engelman et al., 1986) and our LA scale against the % lipid-tail-accessibility of each residue. The LA scale is derived from this residue lipid-tail-accessibility data. The correlation coefficient (R²) for each scale with the accessibility data is 1 for the LA scale, 0.07 for the KD scale, 0.002 for the WW scale and 0.14 for the GES scale. The values for a single residue on each scale are joined by a dotted line. Lower panel: % occurrence of each residue in the lipid-tail-spanning region.
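The R² values quoted in the caption can be understood with a short sketch that computes the squared Pearson correlation between a scale and the accessibility data. The numbers below are hypothetical and serve only to show that any scale obtained by linear rescaling of the accessibility data gives R² = 1 by construction, whereas an unrelated scale gives a low value.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical % accessibility for four residue types (illustration only).
toy_accessibility = [30.0, 50.0, 70.0, 90.0]
# A scale derived by linear rescaling of the same data.
toy_derived_scale = [(v - 60.0) / 60.0 for v in toy_accessibility]
# A hypothetical scale unrelated to accessibility.
toy_other_scale = [1.0, -1.0, 1.0, -1.0]

r2_derived = r_squared(toy_accessibility, toy_derived_scale)
r2_other = r_squared(toy_accessibility, toy_other_scale)
print(r2_derived, r2_other)
```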

 
When comparing the LA scale with several hydrophobicity scales, as shown in Figure 8, major differences between them can be seen. Virtually none of the residues receive a similar score in both the LA and hydrophobicity scales, reflecting the lack of suitability of the hydrophobicity scales for prediction in TM proteins. In particular, whilst the traditional hydrophobicity scales suggest that charged and polar residues in the membrane will be located in buried positions owing to their hydrophilicity, the LA scale demonstrates that the opposite is true for arginine, lysine and glutamine. In addition, the hydrophobicity scales suggest a lipid-tail-accessible location for the hydrophobic residues, with polar residues showing less preference for either environment. However, the LA scale shows that polar residues are, in fact, the major residues which make up helix packing contacts and that hydrophobic residues show little preference for accessible or buried positions.

Unfortunately, on the LA scale, the residues with a strong preference for buried or accessible positions are relatively rare within TM helices, as shown in the bottom part of Figure 8. These residues, particularly arginine and lysine, will therefore be unable to make a major contribution to the prediction of helix face accessibility. Conversely, the very common residues, such as leucine and isoleucine, show very little preference for either environment. It is likely that this effect will be the limiting factor in the predictive power of the scale.

Proposed roles for lipid-tail-accessible charged residues

There are several possible reasons to explain why charged residues might occur in positions classified in this study as lipid-tail-accessible. These are: (i) the charged residues may be paired; (ii) the charged residues may be near the interface with the head-group region; (iii) the charged residues may line a pore or water-filled cavity and therefore not be truly lipid-tail-accessible; (iv) the charged residues may be hydrogen bonded to other residues, water or cofactors. The contribution of each of these reasons to the number of lipid-tail-accessible charged residues was assessed, as shown in Figure 9 and discussed in the sections below.



 
Fig. 9. (A) Hydrogen (H) bonding partners and (B) types of hydrogen bonds for the 1047 observed lipid-tail-accessible charged residues in the data set. Hydrogen bonds were detected by HBPlus (McDonald and Thornton, 1994). Main-chain/main-chain bonds are excluded. Since both hydrogen bonding partners are included, each bond contributes twice to the analysis.

 
Pairing of lipid-tail-accessible charges. We tested the hypothesis that some charged residues prefer lipid-tail-accessible positions because they are paired and therefore effectively neutralized, exposing only non-polar chains. Within a hydrophobic environment, these paired charges would provide extremely strong bonds between the helices. This process does seem to occur, but it accounts for only 11% of the 1047 charged residues accessible to lipid tails. It seems that for a charged residue to exist within a lipid-tail-accessible environment it need not necessarily be paired with an opposing charge if its hydrogen bonding requirements can be satisfied in other ways. About 9% of lipid-tail-accessible charged residues form hydrogen bonds only with uncharged residues. This leaves ~80% of lipid-tail-accessible charged residues that do not form hydrogen bonds with any other residue in the protein. These seemingly ‘unsatisfied’ residues will be discussed later.

Hydrogen bonding may occur either between two helices (inter-helix) or between two residues separated by one turn on the same helix (intra-helix). The relative contributions of inter-helix and intra-helix hydrogen bonding are shown in Figure 9.
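The distinction between the two classes can be expressed as a simple rule. The sketch below assumes that each hydrogen-bonded residue is described by a helix identifier and a residue number and that 'one turn' means a sequence separation of at most four residues; both the representation and the threshold are assumptions made for illustration, and the bond list is hypothetical.

```python
from collections import Counter


def classify_hbond(res_a, res_b, max_separation=4):
    """Classify a side-chain hydrogen bond between two residues.

    res_a, res_b: (helix_id, residue_number) tuples.
    max_separation: largest sequence separation still counted as
    'one turn apart' (assumed value).
    """
    helix_a, pos_a = res_a
    helix_b, pos_b = res_b
    if helix_a != helix_b:
        return "inter-helix"
    if 0 < abs(pos_a - pos_b) <= max_separation:
        return "intra-helix"
    return "same-helix-distant"


# Hypothetical hydrogen bonds (illustration only).
toy_bonds = [
    (("H1", 45), ("H2", 88)),
    (("H1", 45), ("H1", 49)),
    (("H3", 120), ("H3", 123)),
    (("H2", 90), ("H4", 160)),
]
toy_counts = Counter(classify_hbond(a, b) for a, b in toy_bonds)
toy_intra_fraction = toy_counts["intra-helix"] / len(toy_bonds)
print(toy_counts, toy_intra_fraction)
```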

Inter-helix hydrogen bonding is observed for 15% of all lipid-tail-accessible charged residues. It may serve to anchor one helix relative to another, maintaining the correct conformation of the protein. Inter-helical hydrogen bonds have been identified in the calcium ATPase with this role (Adamian and Liang, 2003). However, given the long and flexible side chains of arginine and lysine, this anchoring is unlikely to be highly rigid. An alternative role for inter-helical hydrogen bonding may be to provide a driving force in the correct initial folding of a protein, particularly where the protein would otherwise have a tendency to misfold.

Sixty-three of the 1047 lipid-tail-accessible charged residues (6%) form intra-helix hydrogen bonds with residues one turn apart on the same helix. Intra-helical salt bridges were first identified in globular proteins by Marqusee and Baldwin (1987), where they have been shown to increase the stability of helices. They may also serve to kink the helix or to affect its flexibility, but this was not observed during visual examination of the data set TM proteins. Chin and von Heijne (2000) have shown that charge interactions between lysine and aspartate residues placed one turn apart cause a polyleucine helix to be located further into the membrane. This suggests that the intra-helical pairing of the charges reduces the free energy change associated with membrane insertion, although the mechanism by which this occurs is unknown. Perhaps the intra-helical interaction of charged and polar residues plays a similar role.

On average, each protein in the data set has 12 inter-helical and five intra-helical hydrogen bonds (including those between both charged and polar residues), in addition to hydrogen bonds between main-chain atoms. Hence almost one-third of all hydrogen bonds (and a very similar proportion of all lipid-tail-accessible hydrogen bonds) formed between residue side chains are intra-helical. Whilst the importance of inter-helix hydrogen bonding in TM proteins has been noted (Adamian and Liang, 2002, 2003), the significant contribution made by intra-helix interactions has not previously been recognized. This result has implications for the modelling of TM proteins. It suggests that, in general, the presence of charged or polar residues does not provide constraints by which helix–helix interactions can be predicted, since these residues need not be paired with other similar residues on adjacent helices.

Interaction of lipid-tail-accessible charges with head-groups. As described above, ~80% of lipid-tail-accessible charged residues (842 out of 1047) do not form hydrogen bonds with any other residue in the protein. Of these, 25 (3%) formed hydrogen bonds with water. However, given the frequent difficulty in determining the precise location of water molecules in a structure and the low resolution of many membrane protein structures, this figure may be affected by crystallographic errors.

The majority (532 or 63%) of the 842 unsatisfied residues were found close to the lipid-tail/head-group interface. It seems likely that these latter residues are able to reside favourably within the hydrophobic lipid-tail environment because they can ‘snorkel’ their charged groups up into the head-group region where they can form hydrogen bonds with water molecules or with polar atoms of the head groups. Snorkelling can be clearly identified by visual inspection in many of the proteins analysed. The phenomenon is well known for both lysine and arginine (Mishra et al., 1994; de Planque et al., 1999; Killian and von Heijne, 2000), owing to their long, positively charged side chains. Further support for this hypothesis derives from the observation that, on average, 65% of the potential snorkelling residues in each protein are arginine or lysine. In contrast, a value of only 37% would be expected by chance alone, given the relative frequencies of each residue type in the TM lipid-tail-spanning region as a whole. A possible role of snorkelling residues may be in vertically anchoring the protein in the membrane.

Role of lipid-tail-accessible charges in pore lining. Charged residues are often required for the function of the protein, perhaps in the lining of an ion channel pore or the binding of cofactors or ligands. Ten of the 23 proteins in the analysis contain functional pores. Hence some of the charged residues classified as lipid-tail-accessible may actually line a water-filled pore or cavity and so not be truly accessible to lipids, despite being classified within this group owing to their large accessible surface area. However, only 75 of the 6835 lipid-tail-accessible residues (1%) were located within a channel. In addition, only seven charged residues were found lining any of the pores and these each made at least one inter-helical hydrogen bond to another residue. Hence location within a water-filled pore is not solely responsible for satisfying the hydrogen bonding requirements of any accessible charged residues.

On analysis of the 75 pore-lining residues, it was found that 9% were charged, 20% were polar, 7% were aromatic and 50% were non-aromatic hydrophobic residues. The most common pore-lining residues were isoleucine (16%), alanine (16%) and glycine (15%). The pore-lining residues are similar in hydrophobicity to buried residues (which also contain 50% of hydrophobic residues). However, there is a slight enrichment of polar and charged residues and a reduction in the numbers of aromatic residues in pore-lining compared with buried positions. The reason for the relatively high hydrophobicity of pore-lining residues is likely to be that they are suited to efficient flow of polar substrates, such as ions, by preventing strong interactions between substrate and channel that may hinder transport. This finding is in contrast to several previous assumptions that a water-filled pore would tend to be lined with polar residues (Milks et al., 1988; Oiki et al., 1990; Opella et al., 1999; Arechaga et al., 2001). This knowledge is important in the location of pore-lining helix faces when modelling pore-containing TM proteins.

Figure 10 compares the proportions of pore-lining, buried and lipid-tail-accessible residues of each residue type within the lipid-tail-spanning region. It can be seen that there is relatively little difference between the profiles. Hence, distinguishing between pore-lining and buried residues using a predictive method based on residue propensities would be particularly difficult. The problem is probably due, at least in part, to the relatively small number of pore-lining residues available in the data set (75). The finding suggests that considerably more 3D structures of pore-containing TM proteins will be required before predictive methods can identify a channel's pore-lining residues from sequence alone.



 
Fig. 10. A comparison of the proportions of pore-lining, buried and lipid-tail-accessible residues of each residue type in the lipid-tail-spanning region.

 
Similarly, there is no significant difference between the sequence conservation of pore-lining residues and buried residues (Figure 11; the average conservation score was 0.69 for pore-lining and 0.68 for buried residues). This result reflects the strong requirement for the conservation of both buried and pore-lining residues, albeit for different reasons (buried residues are likely to be conserved to maintain structure, whereas the conservation of pore-lining residues is important functionally). Hence sequence conservation is unlikely to be of use in the identification of pore-lining residues.



 
Fig. 11. Comparison of the distribution of sequence conservation scores for 57 pore-lining residues, and residues in buried and lipid-tail-accessible positions in the lipid-tail-spanning region. The mean conservation score is 0.69 for buried residues, 0.69 for pore-lining and 0.59 for lipid-tail-accessible residues. There is no significant difference between the conservation of pore-lining and buried or lipid-tail-accessible residues.

 
Non-hydrogen bonded lipid-tail-accessible charged residues. The remaining 285 (27%) charged, lipid-tail-accessible residues appear to be genuinely accessible to lipid. No source of hydrogen bonding has been identified for them by visual inspection of 3D structures, suggesting that they remain unsatisfied. It is possible that these residues form hydrogen bonds with cofactors or other molecules that are not found in the crystal structures. Alternatively they may occur as a result of slight inaccuracies in the location of the lipid-tail-spanning slice.
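The bookkeeping for the lipid-tail-accessible charged residues can be checked with a few lines of arithmetic. The counts below are those quoted in the text; the variable names and the rounding helper are illustrative only.

```python
# Counts quoted in the text for lipid-tail-accessible charged residues.
total_charged = 1047       # all lipid-tail-accessible charged residues
no_protein_hbond = 842     # no hydrogen bond to any other residue
water_bonded = 25          # of those, hydrogen bonded to water
near_head_groups = 532     # of those, close to the head-group interface

# Residues left without any identified hydrogen bonding partner.
remaining = no_protein_hbond - water_bonded - near_head_groups


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


print(remaining, percent(remaining, total_charged))
print(percent(no_protein_hbond, total_charged))
print(percent(water_bonded, no_protein_hbond))
print(percent(near_head_groups, no_protein_hbond))
```

Note that the 27% figure is expressed relative to all 1047 charged residues, whereas the 3% and 63% figures are expressed relative to the 842 residues that form no hydrogen bond with another residue.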

Discussion

During this study, the 23 currently available non-homologous polytopic TM protein structures were analysed. This number is more than twice that available for any previous analysis. We performed basic analysis of these structures, generally confirming the results of previous, smaller studies by identifying the preferences of different residues for different TM environments. The results clearly show that the majority of TM helix–helix contacts are made by either small polar residues, particularly glycine, alanine and serine, or large hydrophobic residues. The aromatic residues and several of the charged and large polar residues show strong preferences for lipid accessibility.

This work confirms the results of Javadpour et al. (1999), using only four TM proteins, indicating that some charged residues in the lipid-tail region show a preference for lipid-tail-accessible, not buried, positions. However, the conclusions contrast with the more recent work of Ulmschneider and Sansom (2001), who stated a trend for the opposite preference. Despite the larger data set used for the latter analysis (15 TM proteins), the results are likely to have been biased by the inclusion of multiple members of the same protein family.

Possible explanations for the presence of lipid-tail-accessible charged residues have not previously been investigated. Here we show that the majority of lipid-tail-accessible charged residues are not paired, but do satisfy their hydrogen bonding potential in other ways, mainly by snorkelling their charges into the head-group region, but also by interaction with other residue types. Interestingly, a significant proportion of the interactions with other residues are intra-helical.

Any event that occurs significantly more often than would be expected by chance is likely to be favourable in some way. Hence TM lipid-tail-accessible charged residues are likely to confer some advantage to protein folding or function. For intra-helical paired charges, this may consist of increasing the stability of the protein, as has been shown for soluble proteins (Marqusee and Baldwin, 1987) and for a small polyleucine TM helix (Chin and von Heijne, 2000). The next steps will involve further experimental work to determine how this increase in stability is achieved and whether unpaired lipid-tail-accessible charged and polar residues have a similar role.

The thickness of the membrane varies between organisms, particularly between prokaryotes and eukaryotes, owing to different lipid-tail compositions. For example, the hydrophobic length of a C22 phosphatidylcholine bilayer, as is optimal for the KcsA K+ channel, is thought to be ~34 Å (Williamson et al., 2003), whereas that for OmpF in a bacterial membrane is ~25 Å (O'Keeffe et al., 2000). These differences may have caused inaccuracies in the location of lipid-tail- and head-group-spanning residues. At present, the data set is not considered sufficiently large to permit division of the structures into prokaryotic and eukaryotic sets for separate analysis. However, this approach is likely to prove an interesting study in the future, once more structures are available. The present study represents a set highly biased towards prokaryotes (78%) and it is therefore important to focus future structural genomics efforts more towards eukaryotic membrane proteins.

Many features of TM proteins were identified during this work that differ from those found in soluble proteins. TM helices are longer, more parallel and more highly conserved than the helices in soluble proteins. In addition, they contain different residues at buried and accessible positions. The preferences of residues for buried or accessible positions in TM proteins cannot simply be predicted by the use of a traditional hydrophobicity scale, since it is not true that all hydrophobic residues prefer lipid-tail-accessible positions and all hydrophilic residues prefer to be buried. A ‘lipid-accessibility scale’ was developed during this work that represents the residue preferences that are found in TM proteins. Many poorly understood factors involved in residue positioning in TM proteins are encompassed in the LA scale, which shows very little correlation with hydrophobicity scales. The LA scale, together with other knowledge gained during this work, will be of use in the prediction of TM protein structure in the future.


    Acknowledgments
 
The authors gratefully acknowledge the financial support of the BBSRC and the Wellcome Trust throughout this project.


    References
 
Abramson,J., Riistama,S., Larsson,G., Jasaitis,A., Svensson-Ek,M., Laakkonen,L., Puustinen,A., Iwata,S. and Wikstrom,M. (2000) Nat. Struct. Biol., 7, 910–917.

Abramson,J., Smirnova,I., Kasho,V., Verner,G., Kaback,H.R. and Iwata,S. (2003) Science, 301, 610–615.

Adamian,L. and Liang,J. (2001) J. Mol. Biol., 311, 891–907.

Adamian,L. and Liang,J. (2002) Proteins, 47, 209–218.

Adamian,L. and Liang,J. (2003) Cell Biochem. Biophys., 39, 1–12.

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Arechaga,I., Ledesma,A. and Rial,E. (2001) IUBMB Life, 52, 165–173.

Arkin,I.T., Brunger,A.T. and Engelman,D.M. (1997) Proteins, 28, 465–466.

Bairoch,A. and Apweiler,R. (1997) Nucleic Acids Res., 25, 31–36.

Bass,R.B., Strop,P., Barclay,M. and Rees,D.C. (2002) Science, 298, 1582–1587.

Bertero,M.G., Rothery,R.A., Palak,M., Hou,C., Lim,D., Blasco,F., Weiner,J.H. and Strynadka,N.C. (2003) Nat. Struct. Biol., 10, 681–687.

Bowie,J.U. (1997) J. Mol. Biol., 272, 780–789.

Chang,C.H., el-Kabbani,O., Tiede,D. and Norris,J. (1991) Biochemistry, 30, 5352–5360.

Chang,G., Spencer,R.H., Lee,A.T. and Barclay,M.T. (1998) Science, 282, 2220–2226.

Chen,C.M. and Chen,C.C. (2003) Biophys. J., 84, 1902–1908.

Chin,C.N. and von Heijne,G. (2000) J. Mol. Biol., 303, 1–5.

Chothia,C. (1976) J. Mol. Biol., 105, 1–12.

Conroy,M.J., Westerhuis,W.H., Parkes-Loach,P.S., Loach,P.A., Hunter,C.N. and Williamson,M.P. (2000) J. Mol. Biol., 298, 83–94.

Crick,F.H.C. (1953) Acta Crystallogr., 6, 689–697.

De Groot,B.L., Engel,A. and Grubmuller,H. (2001) FEBS Lett., 504, 206–211.

De Planque,M.R., Kruijtzer,J.A., Liskamp,R.M., Kruijff,B. and Killian,J.A. (1999) J. Biol. Chem., 274, 20839–20846.

Donnelly,D., Johnson,M.S., Blundell,T.L. and Saunders,J. (1989) FEBS Lett., 251, 109–116.

Donnelly,D., Overington,J.P., Ruffle,S.V., Nugent,J.H. and Blundell,T.L. (1993) Protein Sci., 2, 55–70.

Doyle,D.A., Morais Cabral,J., Pfuetzner,R.A., Kuo,A., Gulbis,J.M., Cohen,S.L., Chait,B.T. and MacKinnon,R. (1998) Science, 280, 69–77.

Eilers,M., Shekar,S.C., Shieh,T., Smith,S.O. and Fleming,P.J. (2000) Proc. Natl Acad. Sci. USA, 97, 5796–5801.

Eilers,M., Patel,A.B., Liu,W. and Smith,S.O. (2002) Biophys. J., 82, 2720–2736.

Fleishman,S.J. and Ben-Tal,N. (2002) J. Mol. Biol., 321, 363–378.

Girvin,M.E., Rastogi,V.K., Abildgaard,F., Markley,J.L. and Fillingame,R.H. (1998) Biochemistry, 37, 8817–8824.

Hubbard,S. (1992) NACCESS. Department of Biochemistry, University College, London.

Iverson,T.M., Luna-Chavez,C., Cecchini,G. and Rees,D.C. (1999) Science, 284, 1961–1966.

Iwata,S., Lee,J.W., Okada,K., Lee,J.K. and Jap,B.K. (1998) Science, 281, 64–71.

Javadpour,M.M., Eilers,M., Groesbeek,M. and Smith,S.O. (1999) Biophys. J., 77, 1609–1618.

Jayasinghe,S., Hristova,K. and White,S.H. (2001) J. Mol. Biol., 312, 927–934.

Jiang,S. and Vakser,I.A. (2000) Proteins, 40, 429–435.

Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) Comput. Appl. Biosci., 8, 275–282.

Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) Biochemistry, 33, 3038–3049.

Jordan,P., Fromme,P., Witt,H.T., Klukas,O., Saenger,W. and Krauss,N. (2001) Nature, 411, 909–917.

Jormakka,M., Tornroth,S., Byrne,B. and Iwata,S. (2002) Science, 295, 1863–1868.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Killian,J.A. and von Heijne,G. (2000) Trends Biochem. Sci., 25, 429–434.

Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) J. Mol. Biol., 305, 567–580.

Kuo,A., Gulbis,J.M., Antcliff,J.F., Rahman,T., Lowe,E.D., Zimmer,J., Cuthbertson,J., Ashcroft,F.M., Ezaki,T. and Doyle,D.A. (2003) Science, 300, 1922–1926.

Kurisu,G., Zhang,H., Smith,J.L. and Cramer,W.A. (2003) Science, 302, 1009–1014.

Kyte,J. and Doolittle,R.F. (1982) J. Mol. Biol., 157, 105–132.

Langosch,D. and Heringa,J. (1998) Proteins, 31, 150–159.

Liu,W., Markus,E., Patel,A. and Smith,S. (2004) J. Mol. Biol., 337, 713–729.

Locher,K.P., Lee,A.T. and Rees,D.C. (2002) Science, 296, 1091–1098.

Malashkevich,V.N., Chan,D.C., Chutkowski,C.T. and Kim,P.S. (1998) Proc. Natl Acad. Sci. USA, 95, 9134–9139.

Marqusee,S. and Baldwin,R.L. (1987) Proc. Natl Acad. Sci. USA, 84, 8898–8902.

McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 238, 777–793.

Milks,L.C., Kumar,N.M., Houghten,R., Unwin,N. and Gilula,N.B. (1988) EMBO J., 7, 2967–2975.

Mishra,V.K., Palgunachari,M.N., Segrest,J.P. and Anantharamaiah,G.M. (1994) J. Biol. Chem., 269, 7185–7191.

Moller,S., Croning,M.D. and Apweiler,R. (2001) Bioinformatics, 17, 646–653.

Murakami,S., Nakashima,R., Yamashita,E. and Yamaguchi,A. (2002) Nature, 419, 587–593.

Oiki,S., Madison,V. and Montal,M. (1990) Proteins, 8, 226–236.

O'Keeffe,A.H., East,J.M. and Lee,A.G. (2000) Biophys. J., 79, 2066–2074.

Opella,S.J., Marassi,F.M., Gesell,J.J. and Valente,A.P. (1999) Nat. Struct. Biol., 6, 374–379.

Palczewski,K. et al. (2000) Science, 289, 739–745.

Pebay-Peyroula,E., Dahout-Gonzalez,C. and Kahn,R. (2003) Nature, 426, 39–44.

Pellegrini-Calace,M., Carotti,A. and Jones,D.T. (2003) Proteins, 50, 537–545.

Pilpel,Y., Ben-Tal,N. and Lancet,D. (1999) J. Mol. Biol., 294, 921–935.

Rees,D.C., DeAntonio,L. and Eisenberg,D. (1989) Science, 245, 510–513.

Senes,A., Gerstein,M. and Engelman,D.M. (2000) J. Mol. Biol., 296, 921–936.

Senes,A., Ubarretxena-Belandia,I. and Engelman,D.M. (2001) Proc. Natl Acad. Sci. USA, 98, 9056–9061.

Shi,Z., Olson,C.A., Bell,A.J. and Kallenbach,N.R. (2001) Biopolymers, 60, 366–380.

Shi,Z., Olson,C.A., Bell,A.J. and Kallenbach,N.R. (2002) Biophys. Chem., 101–102, 267–279.

Slidel,T. (1997) PhD Thesis, University of London.

Soulimane,T., Buse,G., Bourenkov,G.P., Bartunik,H.D., Huber,R. and Than,M.E. (2000) EMBO J., 19, 1766–1776.

Toyoshima,C., Nakasako,M., Nomura,H. and Ogawa,H. (2000) Nature, 405, 647–655.

Ulmschneider,M.B. and Sansom,M.S. (2001) Biochim. Biophys. Acta, 1512, 1–14.

Valdar,W.S. and Thornton,J.M. (2001) Proteins, 42, 108–124.

Van den Berg,B., Clemons,W.M.,Jr., Collinson,I., Modis,Y., Hartmann,E., Harrison,S.C. and Rapoport,T.A. (2004) Nature, 427, 36–44.

Von Heijne,G. and Gavel,Y. (1988) Eur. J. Biochem., 174, 671–678.

Wallin,E. and von Heijne,G. (1998) Protein Sci., 7, 1029–1038.

Williamson,I.M., Alvis,S.J., East,J.M. and Lee,A.G. (2003) Cell Mol. Life Sci., 60, 1581–1590.

Wimley,W.C., Creamer,T.P. and White,S.H. (1996) Biochemistry, 35, 5109–5124.

Yuen,C.T., Davidson,A.R. and Deber,C.M. (2000) Biochemistry, 39, 16155–16162.

Received April 23, 2004; revised August 25, 2004; accepted August 26, 2004.

Edited by Andrej Sali