Identification of transmembrane protein functions by binary topology patterns

Yoshiaki Sugiyama1, Natalia Polulyakh1,2 and Toshio Shimizu1,3

1Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan
2Present address: Graduate School of Humanity and Science, Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan

3 To whom correspondence should be addressed. e-mail: slsimi{at}si.hirosaki-u.ac.jp


    Abstract
We propose a novel method for identifying and classifying the functions of transmembrane (TM) proteins based on their TM topology [the number of TM segments (tms), the loop length and the N-terminus location]. In this method, the TM topology is expressed as a string of ‘0’ and ‘1’, and this is designated the binary topology pattern (BTP). We focused on TM proteins with up to 12 tms, with the exception of 1 and 9 tms, and classified them into 37 functional groups by the number of tms and the functional annotation. These grouped TM protein sequences were used to determine BTPs which are specific to the individual functional groups. Since the evaluated accuracies (sensitivity, specificity and self-consistency) of these patterns in functional identification were quite high overall, i.e. 0.940, 0.934 and 0.935, respectively, as averaged over the 27 functional groups other than the ‘others’ groups, we confirmed that TM protein function can be identified by the number of tms and the characteristics of loop lengths, i.e. BTPs.

Keywords: binary topology pattern/functional identification/loop length/transmembrane protein/transmembrane topology


    Introduction
Recent studies have revealed that the fraction of transmembrane (TM) proteins in the proteome is almost constant, ~20–30%, irrespective of the diverse range of organisms and genome sizes (Jones, 1998; Wallin and von Heijne, 1998; Mitaku et al., 1999; Stevens and Arkin, 2000; Krogh et al., 2001; Liu and Rost, 2001; Arai et al., 2002). Although TM proteins play important roles in biological organisms, such as serving as highly active mediators between the cell and its environment and catalyzing specific transport of metabolites and ions across the membrane barriers, ~70% of them in individual TM proteomes remain functionally unknown or are not yet well annotated (M.Arai and T.Shimizu, manuscript in preparation). This is a much higher rate than for soluble proteins, e.g. <30% in the case of Escherichia coli (Serres et al., 2001). The reason is that sequence-similarity-based homology searches are less effective in the functional assignment of TM proteins than of soluble proteins. This may be attributed mostly to their structure, in that the polypeptide chains are composed of several hydrophobic regions [TM segments (tms)] with less conservation of the amino acid sequence than for the hydrophilic regions (N-tail loop, connecting loops and C-tail loop).

At the same time, this rather simple structural feature makes the prediction of the secondary structure (TM topology, i.e. the number of tms + loop lengths + N-tail location) from the amino acid sequence an easier task for TM proteins than for soluble proteins. In this context, many TM topology prediction methods have been proposed so far (e.g. Claros and von Heijne, 1994; Jones et al., 1994; Rost et al., 1996; Hirokawa et al., 1998; Sonnhammer et al., 1998; Tusnady and Simon, 1998), although their prediction accuracy is not yet high enough (Moeller et al., 2001; Chen et al., 2002; Ikeda et al., 2002). In order to obtain predictions of higher accuracy in practice, several consensus approaches that combine several of the proposed prediction methods have recently been tried (Promponas et al., 1999; Nilsson et al., 2000, 2002; Bertaccini and Trudell, 2002; Ikeda et al., 2002, 2003; Kall and Sonnhammer, 2002).

One of the reasons why so much effort has been made in developing TM topology prediction methods is that there is a good possibility of classifying and identifying the functions of TM protein sequences from accurate knowledge of their TM topologies. For example, Tusnady et al. (Tusnady et al., 1997) suggested that 12-tms ABC transporter proteins are characterized by a specific and common TM topology pattern and that TM topology pattern analysis may significantly help the search for characteristic domains, in addition to sequence comparisons. From their analysis of four-tms receptors and channel proteins, Clements and Martin (Clements and Martin, 2002) recently proposed a new idea for the functional identification of TM proteins by searching for characteristic patterns in the hydropathy profiles. It has also been reported that the intracellular second and fourth loops of G-protein coupled receptors (GPCRs) are short and their lengths are strongly conserved, while the intracellular sixth loop, which is quite long, varies widely in length (Otaki and Firestein, 2001). The authors indicated the possibility of classifying the GPCR functions according to the loop lengths. From these findings, it seems that, in the evolutionary process, the TM topology has been conserved more rigorously than the amino acid sequence in order to preserve the function of the TM protein.

In this study, we propose a novel method for classifying/identifying TM protein functions based on the TM topology, i.e. the length characteristics of the loops. In this method, the length of each loop is expressed as ‘1’ or ‘0’, depending on whether it is longer or shorter, respectively, than the threshold length defined for each loop, and then the TM topology is treated as a string of ‘0’ and ‘1’, which is named the binary topology pattern (BTP).


    Materials and methods
Functional groups of TM proteins

The data used in this study are TM protein sequences taken from SwissProt 38.0 (Bairoch and Apweiler, 2000). Excluding the TM protein entries with a partly defined sequence (i.e. a fragment) and those with an unknown N-terminus location, we finally obtained 4348 sequences with numbers of tms from one to 30. We focused on 2097 entries with 2–12 tms, with the exception of nine tms, which were classified into 37 functional groups, including 10 ‘others’ groups, according to the functional descriptions in the DE, CC or KW lines of the SwissProt database, as summarized in Table I. TM proteins annotated with a probable, putative or hypothetical function were included only in the ‘others’ groups.


Table I. Functional groups of the TM proteins used in this study
 
As the lengths of the tms described in SwissProt were not constant and varied from entry to entry (i.e. from 10 to 30 residues), we normalized the tms length to 21 residues for all the tms, by extending 10 residues in both the N- and C-terminal directions from the center position of the original tms. The signal peptides were removed from the sequences before the following analysis.
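The normalization step can be illustrated with a short sketch. It assumes 1-based, inclusive segment coordinates and rounds the center down for even-length segments; neither convention is stated in the text. The annotated segment positions below are hypothetical and were chosen only so that the resulting loop lengths reproduce those quoted for CX32.2 (284 residues) in the next subsection.

```python
def normalize_tms(segments, half_width=10):
    """Replace each annotated (start, end) segment by a window of
    2 * half_width + 1 = 21 residues centred on the segment midpoint.
    Coordinates are 1-based and inclusive (an assumption)."""
    normalized = []
    for start, end in segments:
        center = (start + end) // 2  # assumption: round down for even lengths
        normalized.append((center - half_width, center + half_width))
    return normalized


def loop_lengths(seq_len, segments):
    """Return the n + 1 loop lengths (N-tail, connecting loops, C-tail)
    for n non-overlapping segments sorted by position."""
    lengths = []
    prev_end = 0
    for start, end in segments:
        lengths.append(start - prev_end - 1)
        prev_end = end
    lengths.append(seq_len - prev_end)  # C-tail loop
    return lengths


# Hypothetical annotated segments of unequal length (17, 25, 25, 19 residues).
annotated = [(21, 37), (74, 98), (150, 174), (194, 212)]
normalized = normalize_tms(annotated)
print(normalized)
print(loop_lengths(284, normalized))
```

The sketch does not handle windows that run past the sequence ends or overlap a neighbouring segment; how such cases were treated is not described in the text.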

It should be noted here that 26 entries included in the three-tms glutamate receptor group (35 entries in total, see Table I) are originally registered in SwissProt 38.0 as four-tms glutamate receptors. Following the reports that the second tms in previously proposed topology models does not span the membrane but is a membrane pore-lining loop (Hollmann et al., 1994; Anand, 2000), we decided to treat the 26 entries as three-tms glutamate receptors in this study without changing the annotated N-tail location and segment positions for the remaining three segments.

The list of classified TM proteins used in this study is available at ftp://bioinfo.si.hirosaki-u.ac.jp/BTP/.

Binary topology pattern (BTP)

Consider an amino acid sequence belonging to a certain functional group of TM proteins with n-tms. Let li denote the length of the ith loop (1 <= i <= n + 1). Here, l1 means the length of the N-tail loop. Next, we define the threshold length of the ith loop, lti, to be compared with li in order to assign a binary loop length, bi, to the ith loop by using the following criteria: bi = 1 if li >= lti, and bi = 0 if li < lti.

Here, ‘1’ means that the loop is a long one, and ‘0’ a short one. For example, for the case of a four-tms gap junction [gap junction protein CX32.2, SwissProt ID CX32_MICUN (Yoshizaki et al., 1994)] with the loop lengths l = {18, 36, 55, 20, 71} residues, the binary loop lengths are determined as b = (0, 1, 1, 0, 1) with lt = {47, 30, 28, 80, 42} residues, as illustrated in Figure 1.
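The assignment of binary loop lengths can be reproduced with a minimal sketch. The function name is illustrative, and the convention that a loop exactly at the threshold counts as long follows the ‘>=’ and ‘<’ ranges quoted for the two-tms groups in Results and discussion.

```python
def binary_loop_lengths(loop_lengths, thresholds):
    """Return b: 1 for a long loop (length >= threshold), 0 for a short one."""
    if len(loop_lengths) != len(thresholds):
        raise ValueError("one threshold is needed per loop")
    return tuple(1 if l >= t else 0 for l, t in zip(loop_lengths, thresholds))


# Gap junction protein CX32.2 with the four-tms thresholds quoted in the text.
b = binary_loop_lengths((18, 36, 55, 20, 71), (47, 30, 28, 80, 42))
print(b)  # (0, 1, 1, 0, 1)
```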



Fig. 1. The TM topology and binary loop lengths, b, of gap junction protein, CX32.2. This TM protein sequence has a length of 284 residues with loop lengths, li (i = 1–5), of 18, 36, 55, 20 and 71 residues, respectively. In this example, the threshold loop lengths, lt, are set to {47, 30, 28, 80, 42} residues.

 
Next, we calculate the average binary loop length, a, for every functional group by averaging b across all the entries, i.e. ai = (1/N) Σk bi(k), where bi(k) is the binary length of the ith loop of the kth entry and N is the number of entries contained in the functional group.

The lengths of the individual loops vary from sequence to sequence even within a single functional group, although the degree of variation differs from loop to loop, as can be seen in Figure 2. The average binary loop lengths of the first loop (N-tail loop) and the second loop (1–2 loop) change rapidly from 1.0 to 0.0 with an increase of lti in narrow ranges of ~20 and ~35 residues, respectively, indicating that the loop lengths are quite close to each other. In contrast, the lengths of the third loop (2–3 loop), fourth loop (3–4 loop) and fifth loop (C-tail) are much more divergent, the fifth loop in particular.
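As a sketch of this averaging step, the function below computes a for one functional group. The three entries in the example are hypothetical loop-length vectors (only the first is the CX32.2 example from the text), so the resulting values are illustrative and are not those of the real ‘gap junction’ group.

```python
def average_binary_loop_lengths(group_loops, thresholds):
    """Fraction of entries in a group whose ith loop is long (>= threshold).

    group_loops: list of loop-length sequences, one per entry, all of
    length n + 1; thresholds: sequence of n + 1 threshold lengths.
    """
    n_entries = len(group_loops)
    counts = [0] * len(thresholds)
    for loops in group_loops:
        for i, (length, threshold) in enumerate(zip(loops, thresholds)):
            if length >= threshold:
                counts[i] += 1
    return [c / n_entries for c in counts]


# Hypothetical three-entry group; thresholds are the four-tms values from the text.
group = [
    (18, 36, 55, 20, 71),
    (20, 35, 50, 22, 40),
    (19, 37, 60, 18, 90),
]
a = average_binary_loop_lengths(group, (47, 30, 28, 80, 42))
print(a)
```

Calling the function repeatedly while varying one threshold gives the kind of curve plotted in Figure 2.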



Fig. 2. The variation of the average binary loop lengths, a, with the threshold loop length, lti, for the four-tms ‘gap junction’ group with: first loop (N-tail loop) (circles); second loop (1–2 loop) (triangles); third loop (2–3 loop) (squares); fourth loop (3–4 loop) (inverted triangles); and fifth loop (C-tail loop) (diamonds).

 
Then, we calculate the root mean square (r.m.s.) difference, di, of the ith average binary loop length among the functional groups, by using the following equation:

where m is the number of the functional groups with n-tms, and api and aqi are the ith average binary loop lengths of functional groups p and q, respectively. Figure 3 shows the r.m.s. difference, di, as a function of the threshold length, lti, for the individual loops. For each loop, the threshold length giving the maximum r.m.s. difference is taken as the optimum threshold length, i.e. the one at which the average binary loop lengths of the respective groups are most clearly discriminated from each other. In this example, the threshold lengths were obtained as 44–50, 29–31, 27–29, 80 and 42 residues for the first, second, third, fourth and fifth loops, respectively. For the first, second and third loops, whose threshold lengths were not determined uniquely, we adopted the average of each range as the optimum threshold length. This seems to be a proper treatment, since a does not change when the threshold lengths are varied within these ranges (44–50, 29–31 and 27–29 residues). This is not the case for the fourth and fifth loops, for which unique threshold lengths were determined: even a small deviation (a single residue) from the obtained threshold lengths (i.e. 80 and 42 residues, respectively) changes the average binary loop lengths, a. When we take 39 or 43 residues (instead of 42) as the threshold length for the fifth loop, for example, a5 becomes 0.99 or 0.92 (instead of 0.94 for 42 residues). Thus, 47, 30, 28, 80 and 42 are obtained as the optimum threshold lengths for individual loops in the ensemble of the four-tms functional groups, and a for the ‘gap junction’ group, for example, is calculated as (0.00, 1.00, 1.00, 0.00, 0.94) with lt = {47, 30, 28, 80, 42}.
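The threshold search for a single loop can be sketched as follows. The normalization of the r.m.s. difference is an assumption here (the mean is taken over all m(m − 1)/2 pairs of groups); it does not affect which threshold gives the maximum. The group names and loop lengths are hypothetical, and ties are resolved by averaging the tied thresholds, as described above for the first three loops.

```python
from itertools import combinations
from math import sqrt


def fraction_long(lengths, threshold):
    """Average binary loop length of one group for one loop."""
    return sum(1 for l in lengths if l >= threshold) / len(lengths)


def rms_difference(groups, threshold):
    """R.m.s. difference of the average binary loop length among groups.
    Assumption: the mean runs over all pairs (p, q) of the m >= 2 groups."""
    a = [fraction_long(lengths, threshold) for lengths in groups.values()]
    pairs = list(combinations(a, 2))
    return sqrt(sum((p - q) ** 2 for p, q in pairs) / len(pairs))


def optimum_threshold(groups, candidates):
    """Return (optimum, tied): the mean of all candidate thresholds that
    reach the maximum r.m.s. difference, and the tied candidates."""
    scores = {t: rms_difference(groups, t) for t in candidates}
    best = max(scores.values())
    tied = [t for t in candidates if abs(scores[t] - best) < 1e-12]
    return sum(tied) / len(tied), tied


# Hypothetical lengths of one loop in two functional groups.
groups = {"group X": [10, 15, 20], "group Y": [60, 70, 55]}
optimum, tied = optimum_threshold(groups, list(range(1, 101)))
print(optimum, tied[0], tied[-1])
```

In this toy case every threshold from 21 to 55 separates the two groups completely, so the sketch returns the midpoint, 38.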



Fig. 3. The relationships of the r.m.s. difference, d, versus the threshold length, lt, are shown: N-tail loop (circles); 1–2 loop (triangles); 2–3 loop (squares); 3–4 loop (inverted triangles); and C-tail loop (diamonds).

 
As seen in the example of a above, each ai value is not necessarily just 0.0 or 1.0, i.e. it can be larger than 0.0 and smaller than 1.0. We therefore define the permission, ε (0 <= ε < 0.5), with which the average binary loop lengths, a, are binarized to obtain the BTP, p, by applying the following criteria: pi = ‘1’ if ai >= 1 − ε, pi = ‘0’ if ai <= ε, and pi = ‘*’ otherwise,

where ‘*’ is the ‘wild card’ meaning that the binary loop length is not defined for the ith loop. When we set the value of ε to 0.01, for example, the BTP, p, for the ‘gap junction’ group becomes ‘0110*’.
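A minimal sketch of the binarization with the permission, together with a matcher that tests a sequence's binary loop lengths against a pattern. Treating the boundaries as inclusive (so that a permission of zero still yields defined digits) is an assumption, and the function names are illustrative.

```python
def binary_topology_pattern(avg, eps):
    """Binarize average binary loop lengths with permission eps."""
    if not 0 <= eps < 0.5:
        raise ValueError("eps must satisfy 0 <= eps < 0.5")
    digits = []
    for a in avg:
        if a >= 1 - eps:
            digits.append("1")
        elif a <= eps:
            digits.append("0")
        else:
            digits.append("*")  # wild card: loop length not defined
    return "".join(digits)


def matches(pattern, b):
    """True if binary loop lengths b agree with every defined digit."""
    return len(pattern) == len(b) and all(
        p == "*" or int(p) == x for p, x in zip(pattern, b)
    )


# Average binary loop lengths of the 'gap junction' group quoted in the text.
btp = binary_topology_pattern((0.00, 1.00, 1.00, 0.00, 0.94), 0.01)
print(btp)                            # 0110*
print(matches(btp, (0, 1, 1, 0, 1)))  # True (CX32.2)
```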

An appropriate value of ε should be assigned to each functional group so that the obtained BTP has the maximal self-consistency in identifying its relevant function. The self-consistency of the functional identification by the BTP, Sc, is defined as the geometric mean of the sensitivity, Sn, and the specificity, Sp: Sc = (Sn × Sp)^(1/2).

Here, the sensitivity and the specificity are the ratios of the correctly identified entries to the total entries in the group and to the total predicted entries across the functional groups with the same n-tms, respectively.
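These definitions can be written out as a short sketch. The counts in the first call are hypothetical; the two checks at the end use sensitivity and specificity values quoted in Results and discussion (two-tms ‘others’ and four-tms ‘receptor’) and reproduce the self-consistencies reported there.

```python
from math import sqrt


def self_consistency(sn, sp):
    """Geometric mean of sensitivity and specificity."""
    return sqrt(sn * sp)


def accuracies(n_correct, n_group, n_predicted):
    """Sn, Sp and Sc from raw counts.

    n_correct:   entries of the group matched by its own pattern
    n_group:     total entries in the group
    n_predicted: entries matched by the pattern across all groups
                 with the same number of tms
    """
    sn = n_correct / n_group
    sp = n_correct / n_predicted if n_predicted else 0.0
    return sn, sp, self_consistency(sn, sp)


print(accuracies(9, 10, 12))                     # hypothetical counts
print(round(self_consistency(0.514, 0.957), 3))  # 0.701
print(round(self_consistency(0.954, 1.000), 3))  # 0.977
```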

Figure 4 shows how Sc varies with ε for the case of the four-tms ensemble. The self-consistencies increase at first in a range of small ε, and then decrease with increasing ε, except for the receptor group. It is reasonable to employ, for each functional group, the smallest value of ε that gives the maximum Sc. Thus, the values of ε determined for the ‘receptor’, ‘gap junction’ and ‘others’ groups are 0.04, 0.01 and 0.16, respectively, which give the maximum values of Sc to their corresponding BTPs: ‘10010’, ‘0110*’ and ‘0*0**’, respectively. We note that all the patterns thus obtained are exclusive of one another: the binary digits differ in four positions (all except the last position) between ‘receptor’ and ‘gap junction’, in the first position between ‘receptor’ and ‘others’, and in the third position between ‘gap junction’ and ‘others’. Because the BTPs determined with these lt and ε values are exclusive of each other, the individual patterns can discriminate their corresponding functional groups from each other. This means that the appropriate BTPs are determined successfully with these parameter values.
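The exclusivity check on the three four-tms patterns can be reproduced with a small sketch. Two patterns are treated as exclusive when at least one position carries a defined digit in both and the digits differ; the helper name is illustrative.

```python
def exclusive_positions(p, q):
    """1-based positions where both patterns are defined and disagree."""
    return [
        i + 1
        for i, (x, y) in enumerate(zip(p, q))
        if "*" not in (x, y) and x != y
    ]


patterns = {"receptor": "10010", "gap junction": "0110*", "others": "0*0**"}
print(exclusive_positions(patterns["receptor"], patterns["gap junction"]))  # [1, 2, 3, 4]
print(exclusive_positions(patterns["receptor"], patterns["others"]))        # [1]
print(exclusive_positions(patterns["gap junction"], patterns["others"]))    # [3]
```

A non-empty list for every pair means that no sequence can match two of the patterns at once.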



Fig. 4. The self-consistency, Sc, versus the permission, ε, for the four-tms functional groups: ‘receptor’ (circles); ‘gap junction’ (triangles); and ‘others’ (squares).

 
The number of members contained in each functional group differs from group to group within each ensemble of the same n-tms, and in some cases (the seven-tms ensemble in particular) the difference is large. Using the original group sizes themselves in the calculation of Sp and Sc might therefore give an unfair evaluation, and the accuracies were calculated on an equal-membership basis (i.e. a percentage basis).
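One possible reading of the equal-membership basis is sketched below: each group's matched entries are converted to a fraction of that group before the specificity is formed, so that a large group cannot dominate the denominator. This interpretation, the function name and the group sizes are assumptions; the text does not spell out the exact calculation.

```python
def equal_membership_specificity(matched, sizes, target):
    """Specificity of the target group's pattern on a percentage basis.

    matched[g]: number of entries of group g matched by the target pattern
    sizes[g]:   total number of entries in group g
    """
    fractions = {g: matched.get(g, 0) / sizes[g] for g in sizes}
    total = sum(fractions.values())
    return fractions[target] / total if total else 0.0


# Hypothetical ensemble: a small group B next to a much larger group A.
sizes = {"A": 200, "B": 10}
matched_by_b = {"A": 10, "B": 10}  # pattern of B also matches 10 entries of A

count_based = matched_by_b["B"] / sum(matched_by_b.values())
percent_based = equal_membership_specificity(matched_by_b, sizes, "B")
print(count_based, percent_based)
```

With raw counts the ten false hits from the large group halve the specificity of B; on a percentage basis they represent only 5% of group A and the specificity stays close to 1.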


    Results and discussion
The BTPs obtained for the individual functional groups and the identification accuracies are summarized in Tables II–XI, together with the determined values of the parameters, lt and ε.

Two-tms TM proteins

As seen in Table II, the five functional groups, including ‘others’, are discriminated from each other with high accuracies (0.938, 0.929, 0.779, 0.790 and 0.701 for ‘potassium channel’, ‘sodium channel’, ‘receptor’, ‘sensor protein’ and ‘others’, respectively) by using the obtained BTPs. The obtained BTPs, even for the ‘others’ group, are exclusive of one another in at least one digit position. The first position distinguishes the two channel groups from ‘receptor’ and ‘sensor protein’, indicating that the first loops are long (i.e. >=41 residues) for both channels and short (<41 residues) for ‘receptor’ and ‘sensor protein’. For the channels, the second and third loops distinguish the two types in a complementary manner: short (<151 residues, ‘0’) and long (>=209 residues, ‘1’) for the potassium channel, and long (>=151, ‘1’) and short (<209, ‘0’) for the sodium channel. Similarly, the second loop makes a distinction between ‘receptor’ and ‘sensor protein’: long (>=151 residues, ‘1’) for the former and short for the latter (<151, ‘0’).


Table II. BTPs and functional identification accuracies for two-tms TM proteins, lt = {41, 151, 209}
 
The pattern of the ‘others’ group has a rather lower accuracy of 0.701 (sensitivity, 0.514 and specificity, 0.957) compared with the other patterns. We do not, however, need to use the ‘others’ pattern for its function identification in actual applications, since we can put the two-tms TM protein sequences, not identified by any of the patterns of ‘potassium channel’, ‘sodium channel’, ‘receptor’ or ‘sensor protein’ into the ‘others’ group, without applying the ‘others’ pattern itself.
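The fallback procedure described here (apply only the specific patterns and assign everything left over to ‘others’) can be sketched as follows. Because the complete two-tms patterns are given only in Table II, the example uses the four-tms thresholds and the ‘receptor’ and ‘gap junction’ patterns quoted in Materials and methods; the function name and the second test sequence are hypothetical.

```python
FOUR_TMS_THRESHOLDS = (47, 30, 28, 80, 42)
SPECIFIC_PATTERNS = {"receptor": "10010", "gap junction": "0110*"}


def classify(loop_lengths, thresholds, patterns, fallback="others"):
    """Assign a functional group from loop lengths, falling back to
    'others' when no specific pattern matches. The patterns are assumed
    to be mutually exclusive, so at most one of them can match."""
    b = [1 if l >= t else 0 for l, t in zip(loop_lengths, thresholds)]
    for name, pattern in patterns.items():
        if all(p == "*" or int(p) == x for p, x in zip(pattern, b)):
            return name
    return fallback


print(classify((18, 36, 55, 20, 71), FOUR_TMS_THRESHOLDS, SPECIFIC_PATTERNS))
print(classify((10, 10, 10, 10, 10), FOUR_TMS_THRESHOLDS, SPECIFIC_PATTERNS))
```

The first call uses the CX32.2 loop lengths and returns ‘gap junction’; the second, a made-up sequence with uniformly short loops, matches neither specific pattern and falls through to ‘others’.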

Three-tms TM proteins

The BTPs obtained for the four functional groups, except for the ‘others’, are exclusive of each other and identify their respective sequences with quite high self-consistencies: ‘glycoprotein’, 0.966; ‘glutamate receptor’, 1.000; ‘fumarate reductase’, 0.957; ‘kinase’, 0.890, as shown in Table III. The exclusive digits are the first position with ‘1’ for ‘glycoprotein’ and ‘glutamate receptor’, and ‘0’ for ‘fumarate reductase’, ‘kinase’ and ‘others’. Thus, we could successfully perform functional identification of three-tms TM proteins using the obtained patterns, except the ‘others’ group. The BTP obtained for ‘glutamate receptor’ gives perfect identification accuracy, with all the loops being long, in particular the third loop, which distinguishes it from the other groups with a short third loop. As in the case of two-tms TM proteins, we do not need to use the ‘others’ pattern to identify ‘others’ protein sequences in this case.


Table III. BTPs and functional identification accuracies for three-tms TM proteins, lt = {119, 6, 33, 8}
 
Four-tms TM proteins

As shown in Table IV, the BTP of ‘receptor’ identifies only the sequences of the receptor group with high sensitivity, 0.954, and specificity, 1.000 (self-consistency, 0.977). It should be noted that 146 entries identified by the ‘receptor’ pattern belong to the ‘ligand-gated ionic channels family’ with the N-out location, while the seven other entries do not. With the ‘gap junction’ group, the obtained pattern identifies all the ‘gap junction’ entries correctly, and only one ‘others’ entry in error. Furthermore, the identification accuracy of ‘others’ is also still high enough, 0.904 (0.902), in contrast with the low accuracy in the cases of two-tms and three-tms TM proteins. The obtained patterns are exclusive of each other with these lt and ε values, so that the individual patterns can discriminate their corresponding functional groups from each other.


Table IV. BTPs and functional identification accuracies for four-tms TM proteins, lt = {47, 30, 28, 80, 42}
 
Five-tms TM proteins

In the five-tms ‘transporter’ data set, various kinds of transporters are included, such as triose phosphate/phosphate translocator, cytochrome o ubiquinol oxidase subunit III, histidine transport system permease protein, etc., and there is a wide variety in the length of each loop, except for the fifth and sixth loops. This is reflected in the obtained pattern for the ‘transporter’ group, in that only these two positions have a defined binary loop length and the others do not. Nevertheless, we can classify the five-tms protein sequences into two groups, ‘transporter’ and ‘others’ with high enough accuracies, 0.964 and 0.966, respectively, as shown in Table V.


Table V. BTPs and functional identification accuracies for five-tms TM proteins, lt = {17, 12, 54, 26, 13, 22}
 
Six-tms TM proteins

The BTPs obtained with lt = {100, 15, 24, 11, 14, 38, 72} for the three six-tms functional groups show that ‘channel’, ‘MIP channel’ and ‘transporter’ are exclusive of each other, and their self-consistencies are 0.894, 0.934 and 0.849, respectively (Table VI). The ‘MIP channel’ and ‘transporter’ patterns each identify only one ‘others’ entry, even though neither pattern is explicitly exclusive of the ‘others’ pattern. Comparing the ‘MIP channel’ and ‘channel’ patterns shows that not only the long N-tail but also the long 4–5 and C-tail loops distinguish ‘channel’ from ‘MIP channel’. Since the performance of the ‘others’ pattern is not high enough, it is not necessary to actually use this pattern in the six-tms case either.


Table VI. BTPs and functional identification accuracies for six-tms TM proteins, lt = {100, 15, 24, 11, 14, 38, 72}
 
Seven-tms TM proteins

All the BTPs obtained are exclusive of one another, except for the cases between ‘GPCR class A’ and ‘others’, and ‘rhodopsin pump’ and ‘others’ (Table VII). Except for the ‘others’ pattern, the accuracies of the obtained patterns are quite high, for ‘class C’, ‘class E’ and ‘rhodopsin pump’, in particular, which identify themselves perfectly without identifying any entries of other groups. Here, using 22 GPCR sequences which are registered in SwissProt 38.0 but were not used for determining the patterns, we tested the functional identification performance of the obtained patterns. Applying the ‘class A’ pattern to these sequences, we identified 19 entries as ‘GPCR class A’, which are ‘Burkitt’s lymphoma receptor’, ‘chemokine receptor-like protein’, ‘olfactory receptor-like protein’, etc. Out of these 19 sequences, we confirmed 13 sequences that belonged to ‘GPCR class A’. The ‘class B’ pattern identified two sequences, which are ‘glucagon-like peptide 1 receptor precursor’ of ‘GPCR class B’.


Table VII. BTPs and functional identification accuracies for seven-tms TM proteins, lt = {100, 12, 10, 13, 20, 15, 12, 32}
 
Eight-tms TM proteins

As in the five-tms case, the eight-tms ‘transporter’ group is a mixture of various kinds of transporters, such as calcium-transporting ATPase, potassium-transporting ATPase, renal sodium-dependent phosphate transporting protein, etc. As a result, the obtained BTPs are not exclusive of each other and are rather ambiguous. The discrimination ability of the ‘transporter’ pattern, 0.874, is nevertheless still high enough, as shown in Table VIII, since 52 transporters out of 68 sequences are picked up by this pattern.

10-tms TM proteins

As shown in Table IX, the BTPs for the ‘ATPase’, ‘transporter’, ‘exchanger’ and ‘others’ groups have high self-consistencies, 1.000, 0.949, 0.966 and 1.000, respectively. This result means that we can accurately classify 10-tms TM proteins into at least four functional groups. The patterns in Table IX themselves show that each group has its own characteristic loop lengths. For example, almost all the odd-numbered loops of ‘ATPase’ are long, except for the last one. In particular, the 4–5 loop is longer than 199 residues, and such a long loop is not found in the other 10-tms TM proteins. The ‘transporter’ has short N-tail and 2–3 loops, and these characteristics are exclusive of ‘ATPase’. For ‘exchanger’, the pattern was determined at all positions except the 6–7 and 8–9 loops, in spite of the small permission value.


Table IX. BTPs and functional identification accuracies for 10-tms TM proteins, lt = {56, 15, 81, 10, 199, 11, 17, 23, 9, 32, 22}
 
11-tms TM proteins

By using the obtained BTPs, 11-tms TM protein sequences can be classified into two functional groups, ‘exchanger’ and ‘others’ with perfect accuracies, as seen in Table X. The 11-tms ‘exchanger’ TM proteins are characterized by an extremely long sixth (5–6) loop and quite short seventh (6–7) and eighth (7–8) loops.


Table X. BTPs and functional identification accuracies for 11-tms TM proteins, lt = {33, 37, 15, 13, 8, 418, 12, 5, 24, 13, 56, 1}
 
12-tms TM proteins

The BTPs for ‘sodium transporter’, ‘sugar transporter’ and ‘ABC transporter’ have 0.984, 0.949 and 0.923 sensitivity and 0.867, 0.838 and 1.000 specificity, respectively, as shown in Table XI. The three transporter patterns are exclusive of each other and identified only a few ‘others’ entries. We note that the ‘sugar transporter’ and ‘sodium transporter’ patterns identified a fair number of entries of the ‘others’ group in error (i.e. 13 and 8 entries, respectively). It seems that a number of transporter sequences are included in SwissProt without being given a functional annotation of the transporter.


Table XI. BTPs and functional identification accuracies for 12-tms TM proteins, lt = {41, 12, 16, 13, 7, 14, 292, 29, 20, 12, 19, 14, 274}
 
Conclusions

Taken together, the obtained BTPs have high accuracies for consistently identifying the entries of individual functions: the sensitivity, specificity and self-consistency are 0.898, 0.897 and 0.893, respectively, averaged over the 37 functional groups including the ‘others’ group, and 0.940, 0.934 and 0.935, respectively, over the 27 functional groups without the ‘others’ group.

We did not use the information of the N-tail location in this methodology, as some functional groups contain entries with both N-tail locations, although these make up only a small fraction. Incorporating the N-tail location information into the BTP, after improving the prediction performance of the N-tail location, may help to further improve the ability of BTPs in functional classification/identification.

As seen in Table I, some functional groups, i.e. the four ‘transporter’ groups and the two-tms ‘receptor’ group, comprise both eukaryotic and prokaryotic sequences. Nevertheless, the individual BTPs determined for these groups exhibit quite high identification accuracies, indicating that the TM topologies with the same function have been well conserved between prokaryotes and eukaryotes.

We did not deal with single-spanning TM proteins in this study. Since a single-spanning TM protein has only two loops, at most four BTPs are available, which is too few to classify all of the single-spanning proteins. This will be overcome, however, by applying the method in a stepwise manner, where classification into a few unified groups is performed first, followed by subdivision into several lower-level subgroups within the individual upper-level groups. This stepwise approach can also be applied successfully to the functional classification of multi-spanning proteins that have a deep hierarchical class structure, such as GPCRs (Y.Inoue and T.Shimizu, manuscript in preparation).

Finally, we would like to point out that the TM topology pattern is available not only for functional classification/identification, but also for picking out the loops that seem to make the functional differences among the groups in the ensemble with the same n-tms, as already mentioned.


    Acknowledgements
 
This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) ‘Genome Information Science’ (No. 15014203) and a Grant-in-Aid for Scientific Research (C) (No. 14580665) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.


Table VIII. BTPs and functional identification accuracies for eight-tms TM proteins, lt = {31, 12, 61, 16, 40, 30, 29, 20, 52}
 

    References
Anand,R. (2000) Biochem. Biophys. Res. Commun., 276, 157–161.

Arai,M., Ikeda,M. and Shimizu,T. (2002) Gene, 304, 77–86.

Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48.

Bertaccini,E. and Trudell,J.R. (2002) Protein Eng., 15, 443–453.

Chen,C.P., Kernytsky,A. and Rost,B. (2002) Protein Sci., 11, 2774–2791.

Claros,M.G. and von Heijne,G. (1994) Comput. Appl. Biosci., 10, 685–686.

Clements,J.D. and Martin,R.D. (2002) Eur. J. Biochem., 269, 2101–2107.

Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) Bioinformatics, 14, 378–379.

Hollmann,M., Maron,C. and Heinemann,S. (1994) Neuron, 13, 1331–1343.

Ikeda,M., Arai,M., Lao,D.M. and Shimizu,T. (2002) In Silico Biol., 2, 19–33.

Ikeda,M., Arai,M., Okuno,T. and Shimizu,T. (2003) Nucleic Acids Res., 31, 406–409.

Jones,D.T. (1998) FEBS Lett., 423, 281–285.

Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) Biochemistry, 33, 3038–3049.

Kall,L. and Sonnhammer,E.L.L. (2002) FEBS Lett., 532, 415–418.

Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L.L. (2001) J. Mol. Biol., 305, 567–580.

Liu,J. and Rost,B. (2001) Protein Sci., 10, 1970–1979.

Mitaku,S., Ono,M., Hirokawa,T., Boon-Chieng,S. and Sonoyama,M. (1999) Biophys. Chem., 82, 165–171.

Moeller,S., Croning,M.D.R. and Apweiler,R. (2001) Bioinformatics, 17, 646–653.

Nilsson,J., Persson,B. and von Heijne,G. (2000) FEBS Lett., 486, 267–269.

Nilsson,J., Persson,B. and von Heijne,G. (2002) Protein Sci., 11, 2974–2980.

Otaki,J.M. and Firestein,S. (2001) J. Theor. Biol., 211, 77–100.

Promponas,V.J., Palaios,G.A., Pasquier,C.M., Hamodrakas,J.S. and Hamodrakas,S.J. (1999) In Silico Biol., 1, 159–162.

Rost,B., Casadio,R. and Fariselli,P. (1996) In States,D.T., Agarwal,P., Gaasterland,T., Hunter,L. and Smith,R.F. (eds), Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 192–200.

Serres,M.H., Gopal,S., Nahum,L.A., Liang,P., Gaasterland,T. and Riley,M. (2001) Genome Biol., 2, research0035.1–0035.7.

Sonnhammer,E.L., von Heijne,G. and Krogh,A. (1998) In Glasgow,J., Littlejohn,T., Major,F., Lathrop,R., Sankoff,D. and Sensen,C. (eds), Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 175–182.

Stevens,T.J. and Arkin,I.T. (2000) Proteins: Struct. Funct. Genet., 39, 417–420.

Tusnady,G.E. and Simon,I. (1998) J. Mol. Biol., 283, 489–506.

Tusnady,G.E., Bakos,E., Varadi,A. and Sarkadi,B. (1997) FEBS Lett., 402, 1–3.

Wallin,E. and von Heijne,G. (1998) Protein Sci., 7, 1029–1038.

Yoshizaki,G., Patino,P. and Thomas,P. (1994) Biol. Reprod., 51, 493–503.

Received December 28, 2002; revised May 31, 2003; accepted June 8, 2003.




