1Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011 and 2Biomolecular Engineering Research Institute, 6-2-3, Furuedai, Suita, Osaka 565-0874, Japan
3 To whom correspondence should be addressed, at the first address. e-mail: toh{at}kuicr.kyoto-u.ac.jp
Abstract
Keywords: membrane protein/molecular evolution/positive-inside rule/topology
Introduction
Recently, the tertiary structures of aquaporins (Murata et al., 2000; Sui et al., 2001) and glycerol-conducting channels (Fu et al., 2000) have been determined. Aquaporins assist the movement of water molecules through the biological membrane, whereas glycerol-conducting channels are involved in the conduction of glycerol molecules through the membrane. The two proteins show similarity in both amino acid sequence and tertiary structure. Collectively, this group of proteins is referred to here as the aquaporin family. The tertiary structures of two eubacterial ClC chloride ion channels have also been determined (Dutzler et al., 2002). As the name suggests, these proteins are involved in the conduction of chloride ions through the membrane. The tertiary structures and the amino acid sequences of the bacterial channel proteins are highly similar to each other, and the proteins are referred to here as the ClC chloride ion channel family. The members of the aquaporin family do not show significant sequence similarity to those of the ClC chloride ion channel family. In addition, the tertiary structures are different. The difference in amino acid sequence and tertiary structure suggests that the two membrane protein families derive from different evolutionary origins. However, the two protein families share a common structural feature. The primary structures of both aquaporins and ClC chloride ion channels consist of two-fold tandem repeats. Hereafter, the two repetitive units are referred to as the N-terminal domain and the C-terminal domain, according to their positions in the primary structure. Structural studies have revealed that the orientation of the N-terminal domain relative to the membrane is opposite to that of the C-terminal domain (see Figure 1). If an amino acid residue of the N-terminal domain is exposed to the extracellular (cytoplasmic) environment, the corresponding residue of the C-terminal domain is exposed to the cytoplasmic (extracellular) environment, owing to the inverse arrangement of the domains. Therefore, the different arrangement of the two domains suggests the possibility that the N- and the C-terminal domains have evolved under different evolutionary constraints derived from the extracellular and the cytoplasmic environments, even though the two domains are homologous to each other. Furthermore, these constraints should preferentially affect the parts of the domains that are exposed to the cytoplasmic and extracellular environments.
A homologous residue pair in the two domains, each of which constitutes the extracellular or cytoplasmic sides of the proteins, would be exposed to the different environments. Likewise, a homologous residue pair in the two domains, which constitute the pore surface of the channel structure, would also face the different environments.
In this study, the amino acid sequences of the N- and C-terminal domains of the aquaporin family and those of the ClC chloride ion channel family were aligned. If different evolutionary constraints between the N- and C-terminal domains have affected a particular alignment site, it is expected that the amino acid composition or conservation pattern would be different in the two domains. Previously, many groups have investigated methods to evaluate such differences (Livingstone and Barton, 1993; Casari et al., 1995; Lichtarge et al., 1996; Landgraf et al., 1999; Hannenhalli and Russell, 2000; Gu, 2001; Mirny and Gelfand, 2002; Simon et al., 2002; del Sol Mesa et al., 2003; Heo and Meyer, 2003). In this study, we introduced two approaches to evaluate the differences between the domains at each alignment site. One approach evaluates the difference in terms of amino acid composition. At each alignment site, the amino acid compositions of the two domains are estimated and the difference in amino acid composition is calculated using either the cumulative relative entropy (Hannenhalli and Russell, 2000) or the Euclidian distance. The other approach is based on amino acid conservation. In this approach, we developed an ad hoc formula to evaluate the class-specific conservation between the two domains at each alignment site. In either method, a site was predicted to have evolved under different constraints if it showed significant differences between the two domains. If our hypothesis is true, that is, if the N- and the C-terminal domains of aquaporins and ClC chloride ion channels have evolved under different constraints, the amino acid residues corresponding to the alignment sites predicted to be subject to the different constraints would be found at the extracellular and the cytoplasmic sides or the pore surface of the channel proteins.
In spite of the differences in the evaluation method, the same results were obtained. That is, when the residues corresponding to the sites selected by each of the three methods were mapped on the tertiary structures, the residues were clustered at the pore surface of the membrane proteins with statistical significance. In contrast, only a few residues thus obtained were found at the extracellular or cytoplasmic sides of the membrane proteins.
We then modified the site-specific composition for 20 amino acids to be a two-dimensional vector, one of the elements being the sum of the frequencies of Lys and Arg and the other that for the remaining residues. In this case, both the cumulative relative entropy method and the Euclidian distance method improved the sensitivity in detecting the putative sites associated with the positive-inside rule.
Materials and methods
Amino acid sequences used in this study were collected by database searching with PSI-BLAST (Altschul et al., 1997) at the NCBI site (http://www.ncbi.nlm.nih.gov/blast/psiblast.cgi). The sequences thus obtained were then compared with the sequences whose 3D coordinates are available in PDB, to check for insertions and deletions. The sequences of the aquaporin family were compared with the sequences corresponding to 1J4N and 1FX8 of PDB, whereas the sequence corresponding to 1KPL of PDB was used for the comparison in the case of the ClC chloride ion channel family. When partial or complete deletions were observed in at least one hydrophobic segment corresponding to a membrane-spanning region, the sequence was excluded from this study, although a deletion of fewer than five residues at the boundary of a hydrophobic segment was allowed. Sequences including insertions longer than 20 amino acid residues were also excluded, because such long insertions were rare in both protein families. The resulting datasets consisted of 50 aquaporin sequences and 50 ClC chloride ion channel sequences. The fact that the identical number of sequences was obtained in each case was a coincidence; no restriction was imposed on the size of the datasets. The sequence data used in this study are cited with the ID codes of the databases (see the Supplementary material, Table 1, available at PEDS online).
The tertiary structures of bovine aquaporin (PDB code 1J4N) and Salmonella enterica serovar typhimurium ClC chloride ion channel (PDB code 1KPL) were used for mapping of the amino acid residues corresponding to the alignment sites selected by the following procedure.
Sequence alignment
The sequence data of the aquaporin family thus collected were divided into the N- and C-terminal domains. Then, the amino acid sequences of the N-terminal domains and those of the C-terminal domains were aligned with CLUSTAL W 1.74 (Thompson et al., 1994). Finally, the two multiple alignments corresponding to the two domains were integrated into a single multiple alignment by using the profile alignment function of CLUSTAL W. Likewise, a multiple sequence alignment of the N- and the C-terminal domains of the ClC chloride ion channels was constructed.
Amino acid composition-based methods
First, we calculated the amino acid composition of the N-terminal domain and that of the C-terminal domain at each alignment site, according to the multiple alignment of the two domains. We used the method adopted in PSI-BLAST (Altschul et al., 1997) to calculate the site-specific amino acid composition. The weighting method of Henikoff and Henikoff (1994) was used for the residue count. For the calculation of the pseudocount, $\lambda_u$, a parameter for ungapped BLAST, was calculated for each alignment by the Newton-Raphson method (Ewens and Grant, 2001). When more than half of the sequences had gaps at an alignment site, the calculation of the site-specific amino acid composition and the subsequent analysis were skipped for that site.
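The weighting and gap-filtering steps can be illustrated with a minimal Python sketch. It assumes the position-based weighting scheme of Henikoff and Henikoff (1994), ignores gap characters when counting residues, and leaves out the PSI-BLAST pseudocounts entirely; the function names `henikoff_weights` and `site_composition` are hypothetical and not taken from the original study.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def henikoff_weights(alignment):
    """Position-based sequence weights, normalised to sum to 1.

    Assumption: gap characters are simply ignored in each column.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        k = len(counts)  # number of distinct residue types in the column
        if k == 0:
            continue
        for idx, c in enumerate(column):
            if c in counts:
                weights[idx] += 1.0 / (k * counts[c])
    total = sum(weights)
    return [w / total for w in weights]

def site_composition(alignment, weights, site, max_gap_fraction=0.5):
    """Weighted residue frequencies at one site, or None if the site is skipped.

    A site is skipped when more than half of the sequences have a gap there.
    Pseudocounts are NOT added in this sketch.
    """
    column = [seq[site] for seq in alignment]
    n_gaps = sum(1 for c in column if c not in AMINO_ACIDS)
    if n_gaps > max_gap_fraction * len(column):
        return None
    freq = dict.fromkeys(AMINO_ACIDS, 0.0)
    for c, w in zip(column, weights):
        if c in AMINO_ACIDS:
            freq[c] += w
    total = sum(freq.values())
    return {aa: f / total for aa, f in freq.items()}
```

In this sketch the weights would be computed separately for the N-terminal and the C-terminal sub-alignments, so that each domain gets its own composition at every site.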
Next, the difference in amino acid composition between the N- and C-terminal domains at each alignment site was calculated by the cumulative relative entropy method developed by Hannenhalli and Russell (2000). Relative entropy is defined as follows:
$$RE(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i} \qquad (1)$$
where p and q are the site-specific amino acid residue compositions for the two domains, which are estimated by the method described above. The index i indicates that the summation is taken over the 20 amino acid residues. When aligned sequences are classified into two subfamilies, as in this study, the cumulative relative entropy is defined as follows:
$$CRE = \sum_{i} p_i \log \frac{p_i}{q_i} + \sum_{i} q_i \log \frac{q_i}{p_i} \qquad (2)$$
We used the value obtained by Formula 2 to predict the sites subjected to different evolutionary constraints between the N- and C-terminal domains.
In addition to the cumulative relative entropy, we introduced another measure to evaluate the difference in amino acid composition at an alignment site between the two domains:
$$D = \sqrt{\sum_{i} (p_i - q_i)^2} \qquad (3)$$
Formula 3 gives the Euclidian distance between a pair of 20-dimensional vectors.
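The two composition-based measures can be sketched in a few lines of Python. This is an illustration only: it assumes the standard Kullback-Leibler form for relative entropy, a symmetric sum of the two directions for the two-subfamily cumulative relative entropy, and the natural logarithm (the base is not stated in the text). The function names are hypothetical.

```python
import math

def relative_entropy(p, q):
    """Sum of p_i * log(p_i / q_i); terms with p_i == 0 contribute nothing.

    Assumes q_i > 0 wherever p_i > 0, which pseudocounts would guarantee.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cumulative_relative_entropy(p, q):
    """Two-subfamily case: relative entropy taken in both directions."""
    return relative_entropy(p, q) + relative_entropy(q, p)

def euclidean_distance(p, q):
    """Euclidian distance between two composition vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```

Both measures are zero when the two domains have identical compositions at a site and grow as the compositions diverge, which is why large values are taken to flag sites under different constraints.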
Conservation-based method
We developed the following equation to evaluate quantitatively the degree of class-specific conservation at an alignment site:
$S_1$, $S_2$, $S_3$, $\sigma_1$ and $\sigma_2$ are defined as follows:
$$S_1 = \frac{\sum_{i<j} w_i w_j \, s(a(i), a(j))}{\sum_{i<j} w_i w_j}, \qquad \sigma_1 = \sqrt{\frac{\sum_{i<j} w_i w_j \, [s(a(i), a(j)) - S_1]^2}{\sum_{i<j} w_i w_j}}, \qquad i, j \in \text{alignment of the N-terminal domain}$$
where i, j indicate sequences in the alignment, s(a, b) is an element of a score table and $S_1$ is the same as the conservation score defined by Valdar and Thornton (2001). In this analysis, Dayhoff's PAM matrix (Dayhoff et al., 1978) was used. $w_i$, $w_j$ are the Henikoff-Henikoff weights for sequences i and j. a(i) and a(j) are the amino acid residues of sequences i and j at the alignment site under consideration. That is, $S_1$ and $\sigma_1$ are the weighted average and the weighted standard deviation, respectively. In Formula 4, $S_1$ is divided by ($\sigma_1$ + a) to give a conservation score at an alignment site of the N-terminal domain. The parameter a is introduced to avoid division by zero for residues that are invariant within the N-terminal domains. In this study, a was set to 1.0. The definition of $S_2$ is basically the same as that of $S_1$. The difference is that the sequences belong to the alignment of the C-terminal domains.
$$S_3 = \frac{\sum_{i, j} w_i w_j \, s(a(i), a(j))}{\sum_{i, j} w_i w_j}, \qquad i \in \text{alignment of the N-terminal domain}, \quad j \in \text{alignment of the C-terminal domain}$$
The definition of $S_3$ is basically the same as those of $S_1$ and $S_2$. However, one of the sequences comes from the alignment of the N-terminal domain, whereas the other is derived from the alignment of the C-terminal domain. In contrast to $S_1$ and $S_2$, which indicate the degree of conservation within the N- and the C-terminal domains, $S_3$ indicates the degree of conservation between the N- and the C-terminal domains.
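The building blocks of the conservation-based score can be sketched as follows. This is a hedged illustration: it computes only the weighted pairwise averages and standard deviations described above (within one domain, and between the two domains) and does not attempt to combine them into Formula 4. The scoring function passed in is a placeholder; an identity score stands in for the PAM matrix in the usage example, and gap-free columns are assumed. All function names are hypothetical.

```python
import math
from itertools import combinations, product

def weighted_mean_sd(pairs):
    """Weighted mean and weighted standard deviation of (weight, score) pairs."""
    pairs = list(pairs)
    total_w = sum(w for w, _ in pairs)
    mean = sum(w * s for w, s in pairs) / total_w
    var = sum(w * (s - mean) ** 2 for w, s in pairs) / total_w
    return mean, math.sqrt(var)

def within_domain(column, weights, score):
    """Conservation within one domain (S1 or S2) and its spread (sigma).

    column: residues of one domain at a site; weights: sequence weights.
    """
    pairs = ((weights[i] * weights[j], score(column[i], column[j]))
             for i, j in combinations(range(len(column)), 2))
    return weighted_mean_sd(pairs)

def between_domains(col_n, w_n, col_c, w_c, score):
    """Conservation between domains (S3): one sequence from each domain."""
    pairs = ((w_n[i] * w_c[j], score(col_n[i], col_c[j]))
             for i, j in product(range(len(col_n)), range(len(col_c))))
    return weighted_mean_sd(pairs)[0]

def identity_score(a, b):
    """Toy stand-in for a substitution matrix such as PAM (assumption)."""
    return 1.0 if a == b else 0.0
```

For a site that is all Arg in one domain and all Gly in the other, the within-domain scores are maximal and the between-domain score is minimal under the toy identity score, which is the pattern a class-specific conservation measure is meant to reward.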
Results
Fifty amino acid sequences of the aquaporin family were collected by database searching. The sequences were divided into N- and C-terminal domains and a multiple alignment of 100 sequences was constructed. Likewise, 50 amino acid sequences of the ClC chloride ion channel family were collected by database searching. The sequences were also divided into N- and C-terminal domains. The 100 sequences were then aligned. To save space, the two alignments are not shown in this paper. Instead, the alignments are available at a website (see the Supplementary material, Figures A and B).
Two amino acid composition-based methods were applied to the alignments. When the cumulative relative entropy or the Euclidian distance between the N- and C-terminal domains at an alignment site is large, the site is considered to have different features in the two domains, which could be associated with the topological change. The problem then becomes how large a cumulative relative entropy or Euclidian distance must be for an alignment site to be selected as characterizing the differences between the two domains. To investigate this problem, the frequency distributions of the cumulative relative entropies and the Euclidian distances were examined for both the aquaporin family and the ClC chloride ion channel family. The frequency distributions of the cumulative relative entropy and those of the Euclidian distance are shown in Figure 2a, b, d and e. The frequency distributions thus obtained have long tails towards the right side. That is, the forms of the distributions seemed to be similar to that of a gamma distribution. According to the observed data, the parameters of the gamma distributions were estimated and expected frequency distributions were generated based on the gamma distributions. The goodness of fit between the observed and the expected frequency distributions was examined by the $\chi^2$ test. The differences between the observed and the expected frequency distributions were not statistically significant (see the Supplementary material, Table 2). Therefore, the observed frequency distributions were assumed to follow the gamma distribution. Under this assumption, we set the significance level at 5%, and an alignment site whose cumulative relative entropy or Euclidian distance fell in the critical region of the corresponding gamma distribution was selected as a site characterizing the difference between the two domains. We also examined a significance level of 1%. However, the number of alignment sites selected under that significance level was too small (two or fewer). Therefore, we used 5% as the significance level for site selection in this study.
for this analysis. Then, the observed frequency distribution could be approximated as follows:
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} Q_{h,x_i}(x)$$
where $Q_{h,x_i}(x)$ is the kernel function assigned to an observed datum, $x_i$, and is defined as follows:
$$Q_{h,x_i}(x) = \begin{cases} \dfrac{15}{16h} \left[ 1 - \left( \dfrac{x - x_i}{h} \right)^2 \right]^2 & \text{if } |x - x_i| \le h \\ 0 & \text{otherwise} \end{cases}$$
where h is the bandwidth for the quartic function and n indicates the total number of data. Then, we set the significance level at 5% and selected alignment sites whose class-specific scores fell in the critical region of the density estimated with $Q_{h,x_i}(x)$. We examined two empirical bandwidths for h, $2.78\,\sigma n^{-1/5}$ and [Max(data) - Min(data)] x 0.15. However, there was no difference between the sets of alignment sites selected by using the different bandwidths.
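A quartic-kernel density estimate and the resulting cut-off can be sketched in plain Python. The sketch makes several assumptions that are not stated in the text: the kernel is the standard quartic (biweight) function normalised to unit area, the density is the average of the kernels, the critical region is the upper tail, and the tail area is accumulated by a simple midpoint rule. Only the range-based bandwidth is implemented; the function names are hypothetical.

```python
def quartic_kernel(x, xi, h):
    """Quartic (biweight) kernel centred on xi with bandwidth h."""
    u = (x - xi) / h
    if abs(u) > 1.0:
        return 0.0
    return 15.0 / (16.0 * h) * (1.0 - u * u) ** 2

def kde(x, data, h):
    """Kernel density estimate at x: the average of the per-datum kernels."""
    return sum(quartic_kernel(x, xi, h) for xi in data) / len(data)

def range_bandwidth(data, factor=0.15):
    """Empirical bandwidth [max(data) - min(data)] * 0.15."""
    return (max(data) - min(data)) * factor

def upper_critical_value(data, h, alpha=0.05, steps=20000):
    """Score above which the estimated density holds a tail area of alpha.

    Integrates the density from the right with a midpoint rule
    (assumption: the critical region is the upper tail).
    """
    lo, hi = min(data) - h, max(data) + h
    dx = (hi - lo) / steps
    tail, x = 0.0, hi
    for _ in range(steps):
        x -= dx
        tail += kde(x + dx / 2.0, data, h) * dx
        if tail >= alpha:
            return x
    return lo
```

With such a cut-off, an alignment site would be selected when its score exceeds the returned value; comparing the selections under two bandwidths then reproduces the robustness check described above.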
The relationships among the sets of the alignment sites selected by the three methods are shown as Venn diagrams in Figure 3a and b. In the case of the aquaporin family, similar sets of the alignment sites were generated by the three different methods. However, in the case of the ClC chloride ion channel family, each set included many sites, which were selected specifically by a method.
Discussion
The amino acid compositions of the selected sites are shown in the Supplementary material, Table 3a and b. The compositions were calculated with the Henikoff-Henikoff weight, but pseudocounts were not introduced. The residues corresponding to the selected sites of aquaporin (1J4N) and those of ClC chloride ion channel (1KPL) are also shown in the table. The residues corresponding to the sites selected by the cumulative relative entropy were mapped on the tertiary structures, 1J4N and 1KPL (Figure 4a and b). The Supplementary material, Figure C(a)-(d), also gives schematic diagrams for the mapping of the selected residues. Many mapped residues seemed to be located on the pore surface of the channel in both aquaporin and the ClC chloride ion channel. A similar spatial location pattern was observed from the mappings of the residues corresponding to the sites selected by the other two methods (data not shown), although, as described above, the sizes of the intersections of any pair of sets were small in the case of the ClC chloride ion channel family.
The difference in the number of residues constituting the pore surface between 1J4N and 1KPL was considered to be due to the different shapes of the channel structures of 1J4N and 1KPL. The 53 residues of aquaporin (1J4N) were M21, F24, I25, S28, I29, A32, L33, F35, H36, Q43, F58, I62, A75, H76, L77, N78, P79, A80, V81, L85, S88, Q90, T111, L114, T118, L121, N124, S125, G127, N129, T148, L151, V152, V155, L156, T159, D160, R161, R162, I174, V178, H182, G190, C191, G192, I193, N194, P195, A196, R197, S201, V226 and R236. The 33 residues of ClC chloride ion channel (1KPL) were E54, G106, S107, I109, P110, L145, G146, R147, E148, G149, P150, L186, A189, F190, F199, F229, N233, G234, A236, I238, Q277, G315, F317, F348, G355, I356, F357, A358, P359, L444, Y445, I448 and R451. The size of 1J4N is 249 amino acid residues. The frequency of the residues on the pore surface was therefore 53/249 and the frequency of the residues in the other spatial locations was 196/249. 1KPL consists of 451 amino acid residues. The frequency of residues on the pore surface was therefore 33/451 and the frequency of the remaining residues was 418/451. When n residues are randomly selected from the tertiary structure of a channel protein and k out of the n residues fall within the pore surface, the probability follows a binomial distribution with the frequency of pore surface residues as a parameter. When the cumulative relative entropy was used for the selection of alignment sites of the aquaporin family, 16 residues corresponding to the eight alignment sites were obtained (see the Supplementary material, Table 3a). Of these, 11 residues were present on the pore surface. The probability that the number of pore surface residues is 11 when 16 residues are randomly sampled from 1J4N was calculated as follows:
$$P = \binom{16}{11} \left( \frac{53}{249} \right)^{11} \left( \frac{196}{249} \right)^{5}$$
The probability was very small and the null hypothesis of random selection of the residues was rejected with a significance level of 1%. In other words, the clustering of the residues selected by the cumulative relative entropy within the pore surface was statistically significant. Likewise, the clustering of the residues in the ClC chloride ion channel was examined. As shown in the Supplementary material, Table 3b, eight alignment sites had large cumulative relative entropies which fell in the critical region. However, one alignment site did not have corresponding residues, but corresponded to gaps for both the N- and the C-terminal domains of 1KPL. Therefore, 14 residues were selected. Of these, five residues were present on the pore surface. Then, the probability was calculated as follows:
$$P = \binom{14}{5} \left( \frac{33}{451} \right)^{5} \left( \frac{418}{451} \right)^{9}$$
The probability was also small enough to reject the null hypothesis of random selection of the 14 residues with a significance level of 1%.
The residues corresponding to the alignment sites selected by the Euclidian distance were also analyzed in the same manner. As for aquaporin, 14 residues were obtained from the seven selected sites. Of these, nine residues were located on the pore surface. Then, the probability was calculated as follows:
$$P = \binom{14}{9} \left( \frac{53}{249} \right)^{9} \left( \frac{196}{249} \right)^{5}$$
Of 20 residues corresponding to the 10 selected alignment sites of ClC chloride ion channel, seven residues were present on the pore surface. The probability of this case was calculated as follows:
$$P = \binom{20}{7} \left( \frac{33}{451} \right)^{7} \left( \frac{418}{451} \right)^{13}$$
In either protein, the probability was small and the null hypothesis was rejected with a significance level of 1%. That is, the clustering of the residues on the pore surface was statistically significant.
The residues corresponding to the alignment sites selected by the class-specific conservation score were analyzed. As shown in the Supplementary material, Table 3a, six sites were selected for aquaporin, but one of the sites corresponded to a gap in the C-terminal domain of 1J4N. Of nine residues corresponding to the selected sites, six were present on the pore surface. Then, the probability was calculated as follows:
$$P = \binom{9}{6} \left( \frac{53}{249} \right)^{6} \left( \frac{196}{249} \right)^{3}$$
Eleven alignment sites of the ClC chloride ion channel were selected. Excluding gaps, 21 residues of 1KPL corresponded to the 11 alignment sites. Of these, eight residues were present on the pore surface. The probability was calculated as follows:
$$P = \binom{21}{8} \left( \frac{33}{451} \right)^{8} \left( \frac{418}{451} \right)^{13}$$
Therefore, the clustering of the residues on the pore surface was statistically significant in both membrane proteins.
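The binomial calculations for the three methods can be checked with a short Python sketch. The residue counts are those quoted in the text; the function name, the dictionary layout and the use of the point probability P(X = k) are choices made for this illustration.

```python
from math import comb

def pore_surface_probability(n_selected, k_on_pore, n_pore, n_total):
    """Binomial probability that exactly k of n randomly chosen residues
    lie on the pore surface, given n_pore pore residues out of n_total."""
    f = n_pore / n_total
    return comb(n_selected, k_on_pore) * f ** k_on_pore * (1.0 - f) ** (n_selected - k_on_pore)

# (method, protein) -> (selected residues, residues on pore, pore residues, protein length)
CASES = {
    ("cumulative relative entropy", "1J4N"): (16, 11, 53, 249),
    ("cumulative relative entropy", "1KPL"): (14, 5, 33, 451),
    ("Euclidian distance", "1J4N"): (14, 9, 53, 249),
    ("Euclidian distance", "1KPL"): (20, 7, 33, 451),
    ("class-specific conservation", "1J4N"): (9, 6, 53, 249),
    ("class-specific conservation", "1KPL"): (21, 8, 33, 451),
}

if __name__ == "__main__":
    for (method, protein), args in CASES.items():
        print(f"{method:30s} {protein}: P = {pore_surface_probability(*args):.2e}")
```

All six probabilities fall below 0.01, in line with the rejection of random selection at the 1% level reported above.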
Hence, despite the difference in evaluation method or protein family, the clustering of the selected residues on the pore surface was observed with statistical significance. We also applied the evolutionary trace method (Lichtarge et al., 1996) and the quantified evolutionary trace method (Landgraf et al., 1999) to the alignment data. First, we applied the evolutionary trace method developed by Lichtarge et al. (1996). We did not perform iterative division of the alignment, because here we focused only on the division corresponding to the divergence between the N- and C-terminal domains. In the method, two types of conservation are evaluated: invariance and class-specific conservation. The amino acid residues corresponding to such sites are called trace residues. We tried to identify the trace residues from the comparison between the N- and C-terminal domains. However, owing to the high sequence divergence even within the N- or C-terminal domains, only one invariant site was obtained from the domain alignment of the aquaporin family, which corresponded to P79 and P195 of 1J4N. These Pro residues belong to the sequence motif of the aquaporin family, NPA. The site corresponding to N78 and N194 and the site corresponding to A80 and A196 were nearly invariant, but a small number of substituted residues at these sites prevented their selection. The two NPA motifs of aquaporin are present in the loop structures (Murata et al., 2000; Sui et al., 2001), which constitute the pore surface. However, no class-specifically conserved sites were obtained, because the evolutionary trace method requires complete conservation within each class together with divergence between classes. In the case of the ClC chloride ion channel family, two invariant sites were obtained, which corresponded to (G146, G355) and (E202, E414) of 1KPL, respectively. G146 and G355 are constituents of the two motifs of the ClC chloride ion channel, G(K/R)EGP and GXFXP. The two motifs correspond to the segments 146-150 and 355-359 of 1KPL (Dutzler et al., 2002) and constitute the pore surface of 1KPL. Interestingly, of the five sites of the motif sequences, three sites were selected by at least one of the methods described above, that is, the sites corresponding to (R147, I356), (E148, F357) and (G149, A358) of 1KPL (see the Supplementary material, Table 3b). However, class-specifically conserved sites were not detected, owing to the requirement for complete conservation within each class together with divergence among the classes.
Next, we applied the quantified evolutionary trace method (Landgraf et al., 1999) to the alignment data. Roughly, the method evaluates, at each alignment site, the ratio of the degree of conservation within a class to the degree of conservation over all members included in the alignment. Therefore, in the current case, the alignment sites that show relatively high conservation within the N- or C-terminal domains compared with all members were obtained. We selected the alignment sites whose scores were in the top 5% in at least one of the domains. In the case of the aquaporin family, the alignment sites corresponding to (A16, I143), (S28, V155), (F58, I174), (H76, G192), (V81, R197), (L85, S201), (I97, W212) and (Y99, F214) of 1J4N were selected. The underlined residues were present on the pore surface of 1J4N and the probability of finding a number of such residues greater than or equal to the observed number under the assumption of random selection was as follows:
For the ClC chloride ion channel family, the alignment sites corresponding to (G47, G263), (G50, G266), (F53, F269), (E148, F357), (G149, A358), (G185, L397), (F190, V402) and (G196, G408) of 1KPL were selected. The underlined residues were present on the pore surface of 1KPL. Then, the probability was calculated as follows:
In the case of the aquaporin family, the set of the alignment sites selected by the quantified evolutionary trace method is similar to those selected by the three different methods. However, the intersection of the set selected by the quantified evolutionary trace method for the ClC chloride ion channel family with any of the three sets described above was small. Nevertheless, the clustering of the residues corresponding to that selected by the quantified evolutionary trace method on the pore surface was again statistically significant for both the aquaporin family and the ClC chloride ion channel family.
The functional meanings of most of the residues obtained by this analysis remain unknown. The only residue whose functional meaning we could find was R147 of the ClC chloride ion channel. This positively charged residue is present in one of the sequence motifs of the family, G(K/R)EGP, which corresponds to the segment 146-150 of 1KPL. The positive charge of the residue is considered to contribute to the formation of an environment suitable for chloride ion binding (Dutzler et al., 2002). The residues obtained by this analysis may be associated with the adjustment of the conductive function. The clustering of the mapped residues on the pore surface seemed to support this hypothesis, because the pore surface is the center of the conductive activities of the aquaporin family and the ClC chloride ion channel family. At this stage, however, further experimental and theoretical studies are needed to identify the functional meanings of the residues identified in this study.
Modification for sensitive detection of the alignment sites related to the positive-inside rule
When we started this study, we expected that we would be able to detect the alignment sites related to the rules to determine the membrane topology, such as the positive-inside rule. As described above, however, the result of the analysis was different from our expectation. Only two alignment sites selected from the ClC chloride ion channel by our method seemed to reflect the positive-inside rule. One of the sites, which corresponded to H175 and G387, was detected by all the three methods. The residues were not present on the pore surface. As shown in the Supplementary material, Table 3b, the positively charged residues were abundant at the N-terminal site corresponding to H175, whereas no positively charged residues were observed at the C-terminal site corresponding to G387. The accurate boundaries between the membrane and the cytoplasmic or extracellular region of ClC chloride ion channel remain unknown. However, the crystal structure indicated that H175 is located on the cytoplasmic side, whereas G387 is present on the extracellular side. Another site, which corresponded to R174 and A386, was detected by only the cumulative relative entropy method. Since the residues neighbor H175 and G387, respectively, the locations of the residues were considered to be identical with those of H175 and G387. Many positively charged residues were observed at the alignment site corresponding to R174, whereas no positively charged residues were found at the alignment site corresponding to A386 (see the Supplementary material, Table 3). In contrast, no site that seemed to follow the positive-inside rule was detected from the alignment of the aquaporin family by any method.
Focusing on the positive-inside rule, we modified the composition-based methods in order to improve the sensitivity for the detection of the alignment sites related to the rule. The amino acid composition at an alignment site was regarded as a 20-dimensional vector. We reorganized the vector to a two-dimensional vector. One of the elements of the two-dimensional vector was the sum of the frequencies of Arg and Lys, and another element was the sum of the frequencies of the remaining residues. The cumulative relative entropy and the Euclidian distance were then calculated with the two-dimensional vectors.
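The reduction to a two-dimensional vector can be sketched as follows. This is a minimal illustration with hypothetical names (`collapse`, `euclidean_distance`); the compositions in the usage note are made-up numbers, not data from the study.

```python
import math

def collapse(composition, group=("K", "R")):
    """Reduce a residue-frequency dict to [sum over group, sum over the rest]."""
    in_group = sum(composition.get(aa, 0.0) for aa in group)
    return [in_group, sum(composition.values()) - in_group]

def euclidean_distance(p, q):
    """Euclidian distance between two vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical site: positively charged residues abundant in one domain only.
n_site = {"K": 0.3, "R": 0.2, "A": 0.5}
c_site = {"G": 0.9, "K": 0.1}
distance = euclidean_distance(collapse(n_site), collapse(c_site))
```

Passing a different `group`, such as `("A", "R")` or `("T", "C")`, gives the other groupings of residues without changing the rest of the calculation. For the cumulative relative entropy, pseudocounts would still be needed so that neither element of the collapsed vector is zero.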
The frequency distributions of the cumulative relative entropy and those of the Euclidian distance are shown in the Supplementary material, Figures D(a) and (b), respectively. None of the frequency distributions could be approximated by gamma distributions with the parameters estimated from the observed data. Therefore, we used the quartic kernel again to estimate the density functions. We then set the significance level at 5% and selected the alignment sites which fell in the critical region of the estimated density function. As for the aquaporin family, six alignment sites were selected by the cumulative relative entropy method [see the Supplementary material, Figure E(a)]. Of these, the abundance of positively charged residues at the cytoplasmic side was observed for four alignment sites. Likewise, six alignment sites were selected by the Euclidian distance method [see the Supplementary material, Figure E(a)]. Of these, the abundance of positively charged residues at the cytoplasmic side was observed for four alignment sites. Three alignment sites were common between the four sites selected by the two methods. In the case of the ClC chloride ion channel family, 11 alignment sites were selected by the cumulative relative entropy method [see the Supplementary material, Figure E(b)]. Of these, the abundance of positively charged residues at the cytoplasmic side was observed for seven alignment sites. For two of the remaining four sites, the C-terminal residues, which may correspond to the cytoplasmic side, were deleted in 1KPL. However, positively charged residues were abundant at the C-terminal sites corresponding to the gaps. Therefore, the two sites may be associated with the positive-inside rule. Likewise, the Euclidian distance method selected 11 alignment sites [see the Supplementary material, Figure E(b)]. Of these, the abundance of positively charged residues was observed for seven sites. The two alignment sites described above, which have gaps at the C-terminal sites in 1KPL but show high frequencies of positively charged residues, were also detected by the Euclidian distance method. That is, both methods detected nine alignment sites which may be subject to the positive-inside rule. Of these, eight sites were common between the nine sites selected by the two methods. The residues corresponding to the alignment sites thus obtained were mapped on the tertiary structures. Only the mapping of the residues selected by the cumulative relative entropy method is shown in the Supplementary material, Figure F(a) and (b). As described above, the accurate boundaries between the membrane and the cytoplasmic or the extracellular region remain unknown. However, many residues seemed to be present on the cytoplasmic and the extracellular sides of the tertiary structures. The alignment sites corresponding to (R174, A386) and (H175, G387) of 1KPL, which were detected when the 20-dimensional amino acid composition was used, were also detected by both methods.
The results show that only a few residues in surface loops change so as to fulfill the positive-inside rule. It seems that the number of residues is too small to determine the topologies of the membrane proteins. It is possible that our method is not sensitive enough to detect all the residues related to the positive-inside rule. In addition, the requirement for positively charged residues may not be so strong for the topogenesis of multi-spanning membrane proteins. For example, Wessels and Spiess (1988) demonstrated that insertion of multi-spanning membrane proteins requires only one signal sequence at the N-terminus and that the subsequent hydrophobic segments are sequentially integrated into the membrane without recognition by a signal recognition particle. However, the detected change in evolutionary constraints on the positively charged residues associated with the topological inversion suggests that the positively charged residues may play some auxiliary roles to determine the topologies of the membrane proteins, together with other factors such as the N-terminal signal sequences, even if the positive charges may not always be required for the topogenesis.
Nakashima and Nishikawa (1992) reported that alanine and arginine residues tend to be more prevalent in the cytoplasmic segment, whereas threonine and cysteine residues are preferentially located in the extracellular segment. We examined whether these tendencies could be detected by a modified composition-based method. For the modification, two different types of two-dimensional vectors were generated from the site-specific amino acid compositions. In one of the vectors, the sum of the frequencies of Ala and Arg was one element and the sum of the frequencies of the remaining residues was the other element. In the other vector, the sum of the frequencies of Thr and Cys was one element and the sum of the frequencies of the remaining residues was the other element. The distances between the two domains were calculated with each of the two-dimensional vectors and the alignment sites were selected by the same procedure as described above. The residues corresponding to the sites were mapped on the tertiary structures. With either modification, however, no clear tendency in the spatial location of the mapped residues was observed (data not shown). That is, the residues corresponding to the selected sites did not cluster near the boundary between the membrane and the cytoplasmic or the extracellular region.
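The construction of the two-dimensional vectors can be illustrated with a minimal sketch. It assumes that each alignment site of a domain is summarized as a mapping from one-letter residue codes to frequencies and that the Euclidean distance is used to compare the N- and C-terminal domains at that site. The function names and the example compositions are hypothetical and serve only to show the grouping step; the site selection procedure itself is not reproduced here.

```python
import math

# Residue groups taken from the tendencies reported by Nakashima and Nishikawa (1992).
ALA_ARG = frozenset("AR")
THR_CYS = frozenset("TC")


def grouped_vector(site_composition, group):
    """Collapse a site-specific composition into a two-dimensional vector:
    (summed frequency of the group, summed frequency of all other residues)."""
    in_group = sum(f for aa, f in site_composition.items() if aa in group)
    total = sum(site_composition.values())
    return (in_group, total - in_group)


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def site_distance(n_comp, c_comp, group):
    """Distance between the N- and C-terminal domains at one alignment site,
    measured on the grouped two-dimensional vectors."""
    return euclidean(grouped_vector(n_comp, group), grouped_vector(c_comp, group))


# Hypothetical compositions at a single alignment site (illustration only).
n_site = {"A": 0.5, "R": 0.3, "T": 0.2}
c_site = {"A": 0.1, "T": 0.6, "C": 0.3}

d_ar = site_distance(n_site, c_site, ALA_ARG)  # Ala + Arg versus the rest
d_tc = site_distance(n_site, c_site, THR_CYS)  # Thr + Cys versus the rest
print(f"Ala/Arg distance: {d_ar:.3f}, Thr/Cys distance: {d_tc:.3f}")
```

Under these assumptions, sites with large distances would be the candidates passed on to the selection and mapping steps described above.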
Conclusion
As described above, we hypothesized that different evolutionary constraints from the extracellular and cytoplasmic environments affect parts of the two domains of the aquaporins and ClC chloride ion channels, owing to the topological difference between the two domains. According to the hypothesis, the homologous residue pairs derived from the two domains that are present on the extracellular and cytoplasmic sides or on the pore surface of the channel proteins were expected to be subject to different constraints. We used three different methods to investigate how and where the different constraints associated with the topological changes of the two domains affect the aquaporin family and the ClC chloride ion channel family. The sizes of the intersections among the sets of alignment sites detected by the three methods were not always large. With each method, however, the residues corresponding to the selected sites clustered on the pore surface with statistical significance. Thus, half of our hypothesis was supported by the analysis. However, the analysis could not provide a clear answer to the remaining half of our hypothesis. That is, the sensitivity of the three methods in detecting the sites related to the determination of the topology was low. We therefore introduced a small modification into the composition-based methods. Following this modification, many detected sites seemed to be subject to the positive-inside rule. The result supports the remaining half of our hypothesis, although the analysis was restricted to the positive-inside rule. In addition, the results suggest that different lenses are required to see different types of evolutionary constraints. That is, it is important to design proper scoring functions to detect different constraints. In this study, we failed to detect the rules reported by Nakashima and Nishikawa (1992). Designing proper functions to detect these rules remains a goal for future work.
It is important to find new rules that determine the topology of membrane proteins, because such rules would improve the accuracy of topology prediction. When we started this study, one of our interests was finding such rules. Although we did not succeed in finding new topological rules, the successful part of our analysis suggests that new rules could be revealed by comparing homologous domains with inverted topology. This would require the accumulation of structural data with the same features as the aquaporin family and the ClC chloride ion channel family, together with the development of a proper scoring function. In general, we do not have prior knowledge about what kinds of evolutionary constraints are affecting the proteins. Therefore, it is also important to design flexible functions to detect a wide range of different constraints. The design of such functions will also be the subject of future work.
Acknowledgements |
---|
References |
---|
Casari,G., Sander,C. and Valencia,A. (1995) Nat. Struct. Biol., 2, 171-178.
Cedano,J., Aloy,P., Perez-Pons,J.A. and Querol,E. (1997) J. Mol. Biol., 266, 594-600.
Chou,K.C. and Elrod,D.W. (1998) Biochem. Biophys. Res. Commun., 252, 63-68.
Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, DC, pp. 345-358.
del Sol Mesa,A., Pazos,F. and Valencia,A. (2003) J. Mol. Biol., 326, 1289-1302.
Dutzler,R., Campbell,E.B., Candene,M., Chait,B.T. and MacKinnon,R. (2002) Nature, 415, 287-294.
Ewens,W.J. and Grant,G.R. (2001) Statistical Methods in Bioinformatics. An Introduction. Springer, New York.
Feng,Z.-P. (2001) Biopolymers, 58, 491-499.
Fu,D., Libson,A., Miercke,L.J.W., Weltzman,C., Nollert,P., Krucinski,J. and Stroud,R.M. (2000) Science, 290, 481-486.
Gu,X. (2001) Mol. Biol. Evol., 18, 453-464.
Heo,W.D. and Meyer,T. (2003) Cell, 113, 315-328.
Hannenhalli,S.S. and Russell,R.B. (2000) J. Mol. Biol., 303, 61-76.
Henikoff,S. and Henikoff,J.G. (1994) J. Mol. Biol., 243, 574-578.
Landgraf,R., Fisher,D. and Eisenberg,D. (1999) Protein Eng., 12, 943-951.
Lichtarge,O., Bourne,H.R. and Cohen,F.E. (1996) J. Mol. Biol., 257, 342-358.
Livingstone,C.D. and Barton,G.J. (1993) Comput. Appl. Biosci., 9, 745-756.
Mirny,L.A. and Gelfand,M.S. (2002) J. Mol. Biol., 321, 7-20.
Murata,K., Mitsuoka,K., Hirai,T., Walz,T., Agre,P., Heymann,J.B., Engel,A. and Fujiyoshi,Y. (2000) Nature, 407, 599-605.
Nakashima,H. and Nishikawa,K. (1992) FEBS Lett., 303, 141-146.
Nakashima,H. and Nishikawa,K. (1994) J. Mol. Biol., 238, 54-61.
Persson,B. and Argos,P. (1996) Protein Sci., 5, 363-371.
Reinhardt,A. and Hubbard,T. (1998) Nucleic Acids Res., 26, 2230-2236.
Sääf,A., Johansson,M., Wallin,E. and von Heijne,G. (1999) Proc. Natl Acad. Sci. USA, 96, 8540-8544.
Sääf,A., Baars,L. and von Heijne,G. (2001) J. Biol. Chem., 276, 18905-18907.
Simon,A.L., Stone,E.A. and Sidow,A. (2002) Proc. Natl Acad. Sci. USA, 99, 2912-2917.
Sui,H., Han,B.-G., Lee,J.K., Walian,P. and Jap,B.K. (2001) Nature, 414, 872-878.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673-4680.
Valdar,W.S.J. and Thornton,J.M. (2001) Proteins, 42, 108-124.
von Heijne,G. (1986) EMBO J., 5, 3021-3027.
von Heijne,G. (1992) J. Mol. Biol., 225, 487-494.
Webb,A.R. (2002) Statistical Pattern Recognition. Wiley, Chichester.
Wessels,H.P. and Spiess,M. (1988) Cell, 55, 61-70.
Received December 15, 2003; revised February 23, 2004; accepted March 22, 2004. Edited by Janet Thornton.