Conservation and covariance in PH domain sequences: physicochemical profile and information theoretical analysis of XLA-causing mutations in the Btk PH domain

Bairong Shen1 and Mauno Vihinen1,2,3

1Institute of Medical Technology, FI-33014 University of Tampere and 2Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland

3 To whom correspondence should be addressed, at the first address. e-mail: mauno.vihinen{at}uta.fi


    Abstract
 
Mutations that cause X-linked agammaglobulinemia (XLA) appear throughout the Bruton tyrosine kinase (Btk) sequence, including the pleckstrin homology (PH) domain. To analyze the basis of this disease with respect to protein structure, we studied the relationships between PH domain sequences and structures by comparing sequence-based profiles of physicochemical properties and solvent accessibility profiles. The diversity of the distribution of amino acids was measured by calculating entropies for sequences containing mutations at different positions in multiple sequence alignments. Mutual information was calculated to quantify positional covariation. Eight conserved extrema were apparent in all profiles. The majority of the XLA disease-causing mutations in the Btk PH domain were found at positions having significant mutual information, indicating that there are covariant constraints for both structure and function. Together with additional structural analyses, all the XLA mutations that were analyzed could be explained at the molecular level. The method developed here is applicable to the design of mutations for protein engineering.

Keywords: Bruton tyrosine kinase/conservation/covariation/disease-causing mutations/PH domain/profile comparison


    Introduction
 
X-linked agammaglobulinemia (XLA) is an inherited immunodeficiency caused by mutations in the Bruton tyrosine kinase gene (Btk), which encodes a cytoplasmic protein tyrosine kinase (Tsukada et al., 1993; Vetrie et al., 1993). The Btk protein contains five domains: PH (pleckstrin homology), TH (Tec homology), SH3 (Src homology 3), SH2 and kinase. XLA-causing mutations are found in all domains. FGD1 (faciogenital dysplasia) contains the only other confirmed disease-causing mutations in a PH domain; these mutations are related to Aarskog-Scott syndrome (Orrico et al., 2000; Schwartz et al., 2000).

PH domains, which contain 100–120 amino acids (Haslam et al., 1993; Mayer et al., 1993), are found in a wide range of proteins that are involved in intracellular signaling or are constituents of the cytoskeleton. Most PH domains bind either plasma membrane phosphoinositides or cytosolic inositol phosphates. The specific binding to inositol phosphates is important for signal-dependent membrane targeting (Krappa et al., 1999; Lemmon and Ferguson, 2000). Some PH domains also interact with protein kinase C or heterotrimeric G proteins (Abrams et al., 1996; Yao et al., 1997; Lodowski et al., 2003). To date, few computational studies have been published that include homology-based models of PH domain structures (Blomberg and Nilges, 1997; Blomberg et al., 1999; Okoh and Vihinen, 1999). Although many PH sequences are known, the covariant relationships of positions in PH domains have not been investigated. The average sequence identity of PH domains is only 17% (Bateman et al., 2002), but all the three-dimensional structures determined to date are similar, having seven β-strands that form two perpendicular antiparallel β-sheets and one C-terminal α-helix (Hurley and Misra, 2000). The combination of low sequence identity and conserved structure makes PH domains ideal for sequence–structure–function studies, especially for mutual information analysis.

Many protein families contain specific residues that have evolved in a covariant manner (Clarke, 1995; Afonnikov et al., 1997; Pazos et al., 1997; Larson et al., 2000; Madabushi et al., 2002; Saraf et al., 2003; Schueler-Furman and Baker, 2003). Such amino acids, known as covariant or coordinated residues (Vernet et al., 1992; Clarke, 1995), may make contacts that maintain structural stability, form binding sites or catalytic centers, or be otherwise structurally and/or functionally crucial. Several methods can be used to quantify covariation in protein sequences, including chi-squared statistical analysis (Larson et al., 2000) and mutual information (Clarke, 1995). Mutual information is defined as the amount of information that one random variable contains about another; it measures the reduction in the uncertainty of one variable that is obtained by observing the other. This method has been used to measure covariation in protein sequences (Thomas et al., 1996; Larson et al., 2000) and has been successfully applied to the analysis of protein function (Clarke, 1995) and evolution (Mirny and Gelfand, 2002) as well as the identification of RNA recognition motifs (Crowder et al., 2001).

Regions with low entropy (i.e. high information) and significant mutual information generally contribute significantly to a protein’s tertiary structure and/or function (Shenkin et al., 1991). Residues in areas of low entropy are typically conserved in protein families and may be structurally or functionally important. Domains that exhibit low sequence identity from protein to protein (such as PH) contain few conserved sites of low entropy. Conserved covariant sites can be crucial for folding and therefore covariation puts constraints on protein evolution. For example, covariant amino acid ion pairs may form cooperative salt-bridge networks that contribute significantly to structural stability. If one of the charged residues in such a pair is mutated to yield a reversal of charge, then a similar charge reversal must occur in the other residue to ensure structural stability. Other covariant pairs may be formed via hydrophobic contacts, aromatic–aromatic contacts or hydrogen bond donor–acceptor pairs.

In addition to the experimental analysis of the effects of protein mutations (e.g. Bullock et al., 1997; Mateu and Fersht, 1999), several sequence- or structure-based methods have been developed. Sequence analysis methods use sequence similarity information to calculate position-specific probabilities (Ng and Henikoff, 2001; Sunyaev et al., 2001), according to which amino acid substitutions are classified as tolerated or deleterious. Structure-based methods rely mainly on amino acid side-chain rotamer information (Tuffery et al., 1997; Word et al., 2000; Wang and Moult, 2001; Wright and Lim, 2001; Steward et al., 2003). These techniques compare the side-chain chi (χ) angle values with a backbone-independent rotamer library and explore conformational space for the mutant side chain. de la Cruz and co-workers investigated disease-associated mutations by analyzing different properties of amino acids (Ferrer-Costa et al., 2002). It is also possible to combine structure and multiple sequence information. Topham et al. (1997) converted amino acid variations in families of related proteins to propensity and substitution tables, and thereby obtained a quantitative measure of the occurrence of an amino acid in a structural environment and of the probability of substitutions. Multiple regression equations have been derived to predict the folding–unfolding free energy difference for mutations in different secondary structural segments (Gromiha et al., 1999a,b, 2000). These approaches have met with some success, but the problem of protein mutation analysis remains unsolved, since the data sets have been limited or the accuracy of predictions has been relatively low. In the structure-based methods, the flexibility of the protein and the interaction energy between residues are difficult to estimate correctly. In the sequence-based methods, in addition to errors in multiple sequence alignment, there are problems in taking into account the residue contacts within a protein.

Machine learning methods, such as neural networks and support vector machines, are widely used in protein science to predict secondary structures, solvent accessibility, subcellular location and contact maps (Holley and Karplus, 1989; Fariselli et al., 2001; Hua and Sun, 2001; Ahmad and Gromiha, 2002). These methods need large data sets for training and for building models that predict new cases. Since large data sets with clear mutation–structure–function relationships are not available, the usefulness of these methods for mutation analysis is limited.

Mutations in protein sequences can be neutral, beneficial or deleterious. Deleterious or disease-causing point mutations often appear at structurally or functionally important sites. The BTKbase database contains information on >800 XLA cases (Vihinen et al., 1995a, 1999, 2001). With this information at our disposal, it is possible to analyze the basis of XLA mutations toward the goal of understanding disease-causing mechanisms at the molecular level (Vihinen et al., 1995b, 1999). Here, we applied physicochemical profile comparisons, information theory and structural analyses to a detailed study of the XLA-causing mutations in the Btk PH domain. Several groups of highly covariant residues were identified, many of which coincide with disease-causing mutations.


    Materials and methods
 
Sequences, structures and mutations

PH domain mutations (Figure 1) were taken from BTKbase (http://bioinf.uta.fi/BTKbase/) (Vihinen et al., 1999, 2001). PH domain structures used to analyze sequence–structure relationships included Btk (PDB code: 1b55), Grp1 (1fgz), Plc-δ (1mai), spectrin (1btn, 1mph, 1dro), pleckstrin (1pls), β-Ark (1bak), Dapp1 (1fao), dynamin (1dyn) and UNC-89 (1fho). Sequences were aligned based on structural superposition of conserved regions. For information analysis, 161 seed sequences of PH domains, a representative, non-redundant set whose alignment has a total length of 333 positions (including gaps), were retrieved from the Pfam database (Bateman et al., 1999) (see Supplementary material available at http://bioinf.uta.fi/PHdomain.htm).



 
Fig. 1. XLA-causing mutations in the Btk PH domain. Letters above the sequence indicate mutations.

 
Profile analysis

To understand sequence–structure relationships in PH domains, we investigated a number of profiles characterizing the physicochemical properties of proteins. Figure 2 shows a modified Shannon communication model, which was used to describe the information transformation from sequence to structure. The basis of the model is Anfinsen’s hypothesis that the amino acid sequence determines the native state of the protein (Anfinsen, 1973). To understand the fundamental basis of PH domain folding with respect to the distribution of sequence diversity, we did not try to unravel the sequence–structure relationships directly. Rather, we transformed the sequence and structure information into profiles to compare patterns. The profiles of physicochemical properties of PH domains were calculated using the general sliding window averaging technique. The physicochemical properties of amino acids included flexibility (Vihinen et al., 1994), hydropathy (Eisenberg et al., 1984), isotropic surface area (Collantes and Dunn, 1995) and electronic charge concentration (Collantes and Dunn, 1995). All the parameters were scaled to a mean of zero and a standard deviation of 1.00. The calculated electronic charge concentration and hydropathy were also scaled by –1 for comparison with the other profiles. The propensities (P) of the amino acids in a sequence were calculated with a window size of 7. The weights (wj) of residues inside the window were 0.25, 0.50, 0.75, 1.00, 0.75, 0.50 and 0.25. The weighted propensities were summed, averaged and assigned to the residue (i) in the middle of the window.



 
Fig. 2. Modified Shannon information model for the analysis of protein sequence–structure relationships.

 

Insertions are indicated in the profiles with areas of constant values.
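
As an illustration, the weighted sliding window averaging described above can be sketched in Python. This is a minimal sketch rather than the implementation used in the study. The function names, the use of the population standard deviation for scaling, the division by the sum of the weights and the treatment of terminal residues (left undefined) are all assumptions, and the two-letter scale in the usage lines is a toy table, not one of the published property scales.

```python
from statistics import mean, pstdev

# Triangular weights given in the text for a window of seven residues.
WEIGHTS = [0.25, 0.50, 0.75, 1.00, 0.75, 0.50, 0.25]


def standardize(scale):
    """Rescale a per-residue property table to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used.
    """
    values = list(scale.values())
    mu, sigma = mean(values), pstdev(values)
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def window_profile(sequence, scale, weights=WEIGHTS):
    """Weighted window average assigned to the central residue of each window.

    Assumptions: the weighted sum is divided by the sum of the weights, and
    residues too close to a terminus for a full window are returned as None.
    """
    half = len(weights) // 2
    total = sum(weights)
    profile = [None] * len(sequence)
    for i in range(half, len(sequence) - half):
        window = sequence[i - half:i + half + 1]
        profile[i] = sum(w * scale[aa] for w, aa in zip(weights, window)) / total
    return profile


if __name__ == "__main__":
    toy_scale = standardize({"A": 1.0, "L": 3.0})  # hypothetical values
    print(window_profile("LLLALLLL", toy_scale))
```

A profile that is to be compared with the others after sign inversion (as done for hydropathy and electronic charge concentration) can be obtained by multiplying the standardized scale by –1 before calling `window_profile`.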

Information theoretical analysis

The entropies at all positions and the mutual information between any two positions in the multiple sequence alignment were calculated according to the equations

$$H(i) = -\sum_{a_i} P_{a_i} \log P_{a_i}$$

and

$$MI(i,j) = \sum_{a_i} \sum_{a_j} P_{a_i,a_j} \log \frac{P_{a_i,a_j}}{P_{a_i} P_{a_j}}$$

where $P_{a_i}$ and $P_{a_j}$ are the probabilities of amino acids $a_i$ at position i and $a_j$ at position j, respectively, and $P_{a_i,a_j}$ is the probability of the co-occurrence of $a_i$ and $a_j$ at positions i and j (Shenkin et al., 1991; Clarke, 1995). The diversity of the amino acid distribution can also be described using the equation

$$I(i) = \log N - H(i) = \log N + \sum_{a_i} P_{a_i} \log P_{a_i}$$

where N is the size of the amino acid alphabet. The more conserved a site in the alignment, the larger is the information. In the analysis of 161 PH domain sequences, the positions in the aligned sequences were represented by the corresponding sites in the Btk PH domain.
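
The entropy, information and mutual information of alignment columns can be computed along the following lines. This is a hedged sketch, not the code used in the study: the function names are hypothetical, probabilities are estimated as plain observed frequencies, the natural logarithm is assumed (the text does not state the base), and information is taken as log N minus the entropy, which is consistent with the statement that more conserved sites carry more information.

```python
from collections import Counter
from math import log


def entropy(column):
    """Shannon entropy of one alignment column (natural logarithm assumed)."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def information(column, alphabet_size=20):
    """Information of a column, taken here as log N minus the entropy."""
    return log(alphabet_size) - entropy(column)


def mutual_information(col_i, col_j):
    """Mutual information between two columns of equal length."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


if __name__ == "__main__":
    # Toy columns (hypothetical data): the two positions covary perfectly.
    print(entropy("KKEE"), mutual_information("KKEE", "EEKK"))
```

In the toy example, a lysine at one position always pairs with a glutamate at the other and vice versa, so the mutual information equals the entropy of either column.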

Electrostatic potential surface, solvent accessibility and contact analysis

Electrostatic potentials were calculated using a finite difference solution to the non-linear Poisson–Boltzmann equation (Honig et al., 1993). The grid was 20 Å larger than the PH domain, containing 123 grid points in the longest dimension. The solute dielectric constant was set to 2.0. Solvent-accessible surface areas (SASA) (Lee and Richards, 1971) were calculated using a solvent (water) radius of 1.4 Å. To facilitate comparisons with the predicted properties, the fractions of residues exposed to solvent were calculated directly from the experimental structure and were used to generate profiles with the sliding window averaging technique. For NMR entries lacking an averaged structure (1mph, 1pls, 1bak and 1fho), the first structure in the entry was used.

Contact analysis of amino acids in the Btk PH domain was performed with RankViaContact (Shen and Vihinen, 2003) and the CSU service (Sobolev et al., 1999). With RankViaContact, the residue–residue contact energies were calculated and ranked based on a coarse-grained model (Zhang and Kim, 2000). The ranked contact energies were then used to estimate the importance of each residue to the stability of the protein structure. The CSU service was used to measure the contact surface area and contact modes between residues. All other structural analyses were performed with InsightII software (Accelrys).


    Results and discussion
 
To analyze the effects of XLA-causing mutations in the Btk PH domain, we first generated and compared the calculated physicochemical and solvent accessibility profiles for PH domains of known structure. Information/entropies and mutual information of positions in 161 PH domain sequences were then calculated. By combining the structural analysis, energy estimation and information theoretical calculations, we analyzed different types of residue conservation and relation to XLA-causing mutations.

Conserved patterns in sequence and structure profiles

To compare the characteristics of PH domains, a number of profiles were calculated based on amino acid sequence. To correlate the results with structural features, we concentrated on PH domains for which either NMR or crystal structure data were available. Figure 3 shows the alignment of PH domain sequences based on the superposition of their structures. Figure 4 shows the profiles for hydropathy, electronic charge concentration, flexibility and isotropic surface area. Although the average sequence identities of PH domains are very low, the shapes of the profiles are in good agreement. All the profiles have eight extrema located in regions corresponding to the eight conserved secondary structural elements (seven β-strands and one α-helix) that form the PH domain. Extension of the profile analyses to the 161 PH domain seed sequences in Pfam gave qualitatively similar results showing eight local extrema (see Supplementary material).



 
Fig. 3. Alignment of PH domain sequences based on the superposition of three-dimensional structures. The amino acids in bold are involved in secondary structural elements (β-strand or α-helix) that are given above the alignment.

 


 
Fig. 4. Sequence profiles of PH domains. (A) Hydropathy; (B) electronic charge concentration; (C) flexibility; (D) isotropic surface area; (E) solvent-accessible surface area (SASA). The SASAs for the PH domains and their complexes with inositol phosphates are indicated by S0 and S1, respectively.

 
Figure 4E indicates that the eight local extrema also appear in the solvent accessibility profile. Thus, the sequence- and structure-derived profiles are clearly related. These results indicate that PH domains exhibit similar folding characteristics despite having low sequence identities. The agreement between the one-dimensional sequence profiles and the three-dimensional structures facilitates the investigation of sequence–structure relationships.

The extrema in the plots were then examined more closely. The residues in the local extrema in the Btk PH domain are F10 (0.005), L31 (0.121), Y39 (0.083), I56 (0.041), V64 (0.038), F102 (0.028), V113 (0.019) and I125 (0.161). The solvent accessibilities, shown in parentheses, are all <0.2, indicating that all these residues are buried deeply in the hydrophobic core. RankViaContact calculations (Shen and Vihinen, 2003) show that these eight residues ranked in the top 15% in the Btk PH domain for contact energies (F10, I56, V64 and F102 ranked in the top 5%), suggesting that they are important for protein stability. The distribution of residues in positions corresponding to the extrema (see Supplementary material) showed that only positions corresponding to F10, I56 and F102 in Btk are relatively conserved. To understand further why a few conserved PH domain residues are sufficient to facilitate the observed folding of the conserved structure, we calculated the entropy/information and mutual information for positions in the PH domain sequences, as discussed below.

Entropy and information analyses

Entropies or information at each position within multiple sequence alignments are widely used to analyze the diversity of amino acid distribution (Shenkin et al., 1991). The calculated entropies of PH domains indicated that only a few positions are highly conserved (see Supplementary material), which is to be expected given that the sequence identities are very low (17% on average). In protein sequence analysis, three types of residue conservation can be distinguished. Residues can be invariant (type I), replaced by residues with similar physicochemical properties (type II) or covariantly conserved (type III), where both binding/functional partners covary such that their interaction or function is retained. Entropy calculations can be used to identify types I and II. The entropy calculation for type I conservation, where the normal alphabet of 20 amino acids is used, can be found in the literature (Shenkin et al., 1991; Korber et al., 1993; Strait and Dewey, 1996). PH domains contain only one residue that is nearly invariant (W124 in Btk; site 324 in the aligned sequences), the frequency of which is ~98%. To analyze type II conservation, we first simplified the amino acid alphabet to six physicochemically related groups. It is reported that a simplified five-group alphabet exhibits good folding ability if the 20 amino acids are grouped as follows: (V I L F M W Y C), (A H T), (D E), (G P) and (N Q R K S) (Wang and Wang, 1999). We wanted to separate the residues with positive charge and therefore used the following six categories: hydrophobic (V I L F M W Y C), negatively charged (D E), positively charged (R K H), conformational (G P), polar (N Q S) and (A T). The entropy and information of each position in the protein sequence were then calculated by the equations

$$H(i) = -\sum_{g_i} P_{g_i} \log P_{g_i}$$

and

$$I(i) = \log 6 - H(i)$$

where $g_i$ represents the six amino acid groups and $P_{g_i}$ is the probability distribution of the six groups.
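
The reduced-alphabet calculation can be illustrated with a short Python sketch. The six groups are those listed above; everything else is an assumption made for illustration: the group labels (in particular "small" for the unnamed A/T group), the function name, the natural logarithm, the definition of information as log 6 minus the entropy, and the requirement that gap characters be removed from a column before it is passed in.

```python
from collections import Counter
from math import log

# Six physicochemical groups from the text; the labels are illustrative only.
GROUPS = {
    "hydrophobic": "VILFMWYC",
    "negative": "DE",
    "positive": "RKH",
    "conformational": "GP",
    "polar": "NQS",
    "small": "AT",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def group_information(column):
    """Information of an alignment column under the six-group alphabet.

    Assumptions: natural logarithm, information = log 6 - entropy, and a
    column that contains only the 20 standard amino acids (no gaps).
    """
    groups = [AA_TO_GROUP[aa] for aa in column]
    n = len(groups)
    h = -sum((c / n) * log(c / n) for c in Counter(groups).values())
    return log(len(GROUPS)) - h


if __name__ == "__main__":
    # Toy columns (hypothetical): four different hydrophobic residues versus
    # four residues drawn from four different groups.
    print(group_information("VILF"), group_information("VDKG"))
```

A column populated by several different hydrophobic residues reaches the maximum information under this alphabet although it would look variable with the 20-letter alphabet, which is the behavior expected for type II conservation.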

Figure 5A shows the information for the N-terminal residues in PH domains. Position 7 (in Btk PH domain F10) is more conserved than position 5 according to the information calculated based on the alphabet of six groups. Figure 5B shows the actual frequencies of the amino acids. Position 7 is populated by hydrophobic residues only, whereas those at position 5 exhibit different physicochemical properties.



 
Fig. 5. (A) Information calculated from 20-amino acid (hatched bars) and 6-amino acid (empty bars) alphabets. For ease of comparison, the 6-amino acid information was scaled by log 20/log 6 (also see the Supplementary material for information on the full-length PH domain sequences). (B) Distribution of amino acids at positions 5 (hatched bars) and 7 (empty bars).

 
PH domains contain a number of physicochemically conserved sites. In addition to position 7, hydrophobic residues are conserved at positions 48 (Btk PH domain L32), 240 (F102), 324 (W124) and 328 (L128). Aromatic residues are conserved at position 324. These conserved residues are important for structural stability. RankViaContact calculations revealed that the contact energies of Btk residues F10, L32, F102 and L128 are ranked in the top 5% and that W124 is ranked in the top 14% of all residues. Thus, the XLA-causing mutation L32S, which abrogates the hydrophobic character of the residue, can be assigned as a structural mutation.

Mutual information analysis

The mutual information of positions in the multiple sequence alignment reflects covariation. We calculated the mutual information for 89 positions (of 333 in total) where the gap frequency was <0.1. The distribution of the 3916 mutual information values is shown in Figure 6. To sort the significant covariant pairs, P values were used as thresholds (Larson et al., 2000). The significantly covariant pairs having P < 0.01 or 0.01 < P < 0.05 are listed in Tables I and II, respectively, along with structural information. Most of the residues forming the covariant pairs listed in Table I are in direct contact (as indicated in the three-dimensional structure; see Supplementary material). The types of contacts vary and include hydrophobic–hydrophobic, aromatic–aromatic and hydrogen bond donor–acceptor interactions. Hence most of these pairs are important for structural stability. In protein evolution, structural constraints are generally stricter than functional constraints. In the case of PH domains, the overall three-dimensional structures are conserved although the functions and specificities vary widely. For example, the binding specificities of PH domains to inositol phosphates can be classified into four subgroups (Rameh et al., 1997; Kavran et al., 1998; Hurley and Misra, 2000). However, the PH domain in some proteins (e.g. UNC89) has lost this function (Blomberg et al., 2000). The Btk PH domain belongs to the 3-phosphoinositide-binding functional subgroup, a signature motif for which has been reported (Lietzke et al., 2000). This motif in the Btk PH domain can be written F10-X-K12-S14-X10-F25-K26-X-R28-X-F30-X-L32-X4-L37-X-Y39 (X denotes any residue). From Tables I and II, it is evident that all the important residues are among the significant covariant pairs except for L32, which is relatively conserved (type II) as explained above.
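
The selection of low-gap positions and the enumeration of position pairs can be sketched as follows. This is an illustrative sketch only: the function names, the gap character "-" and the representation of the alignment as a list of equal-length strings are assumptions. It also shows where the figure of 3916 comes from, since 89 positions give 89 × 88 / 2 = 3916 unordered pairs.

```python
from itertools import combinations


def low_gap_positions(alignment, max_gap_fraction=0.1, gap="-"):
    """Indices of alignment columns whose gap frequency is below the cutoff."""
    n_seqs = len(alignment)
    keep = []
    for pos in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[pos] == gap)
        if gaps / n_seqs < max_gap_fraction:
            keep.append(pos)
    return keep


def position_pairs(positions):
    """All unordered pairs of positions for which mutual information is computed."""
    return list(combinations(positions, 2))


if __name__ == "__main__":
    toy_alignment = ["AC-", "AG-", "A-T"]  # hypothetical three-sequence alignment
    print(low_gap_positions(toy_alignment))
    print(len(position_pairs(range(89))))
```

The sketch stops at pair enumeration; assigning P values to the resulting mutual information values, as done in the text following Larson et al., is a separate step that is not reproduced here.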



 
Fig. 6. Distribution of 3916 mutual information values for PH domains.

 

 
Table I. Structural analysis of covariant pairs (P < 0.01)
 

 
Table II. Structural analysis of covariant pairs (0.01 < P < 0.05)
 
Figure 7 (center) shows the residues related to covariation with P < 0.01 (red) or 0.01 < P < 0.05 (blue). The residues having lower P values are located in the core of the structure, whereas those with higher P values are on the surface. Of the 21 amino acids involved in the covariant pairs with P < 0.01, 13 (62%) are related to XLA-causing mutations. Six of these (F10, L31, Y39, F102, V113 and I125) appear in the extrema in the profile analysis. The XLA-causing mutations related to type III conservation are listed in Table III. For these residues, the frequencies of co-occurrence of mutated residues and the covariant counterpart of the normal amino acid are generally zero. For mutations K12R, R28H, V64D and S115F, the disease-causing mutated residues coexist with other covariant residues, but the frequencies for these occurrences are very low. This phenomenon may be related to the functional diversity of the PH domain. The mutant residues may occur in other functional subgroups. For example, K12 and R28 are important for 3-phosphoinositide binding (Lietzke et al., 2000). Flexibility and positive charge are important for the interaction because the binding of the Btk PH domain to inositol phosphates requires a conformational change (Baraldi et al., 1999). K and S are the most flexible residues, whereas R is less flexible (Vihinen et al., 1994). Residue K12 is critical for the twist of the β1/β2 and β3/β4 loops and for interaction with inositol phosphates. Hence the mutation K12R is likely to affect the necessary conformational change and therefore the binding affinity. This explanation also applies to the K27R and S14F mutations. Figure 7 (right) shows that the inositol phosphate-binding region of the Btk PH domain is strongly positively charged. Therefore, the mutation of R28 to P, L, C or H, any of which either changes or reduces the positive charge, would be expected to decrease the binding affinity for inositol phosphates.



 
Fig. 7. Left: secondary structures in the Btk PH domain (red, α-helix; blue, β-strand). Center: the locations of covariant residues (red, P < 0.01; blue, 0.01 < P < 0.05). Right: electrostatic surface representations of the Btk PH domain bound to inositol phosphate (green). The molecular surface electrostatic potential is colored from red (–5 kT/e) to blue (+5 kT/e). The center and left structures form a stereo pair.

 

 
Table III. Analysis of XLA-causing mutations in the Btk PH domain
 
Type III conservation can be understood by visualizing the frequencies of amino acids in the two covariant positions. Figure 8A shows that the covariant residues 46 (Btk F30) and 62 (Btk Y39) form a hydrophobic pair. The XLA-causing mutations Y39 to S, H or C disrupt this hydrophobic interaction. Similarly, positions 9 (K12) and 44 (R28) seem to form an odd positively charged pair (Figure 8B). The explanation for this covariation is that both residues make contact with a bound negatively charged inositol phosphate (see Figure 7, right), thereby creating a functionally based covariant pair.
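
Pair distributions of the kind described above can be tabulated from two alignment columns with a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, sequences with a gap at either position are skipped, and the columns in the usage lines are toy data rather than the PH domain alignment.

```python
from collections import Counter


def pair_frequencies(col_i, col_j, gap="-"):
    """Relative frequencies of residue pairs observed at two alignment positions.

    Assumption: sequences with a gap at either position are ignored.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    total = len(pairs)
    return {pair: count / total for pair, count in Counter(pairs).items()}


if __name__ == "__main__":
    # Toy columns (hypothetical data) for two covarying positions.
    freqs = pair_frequencies("FFLK-", "YYFEA")
    print(freqs)
    # A pair that never co-occurs has frequency zero.
    print(freqs.get(("F", "S"), 0.0))
```

Looking up a mutant residue paired with the usual covariant partner in such a table returns zero when that combination never occurs in the alignment, which is the type of evidence used in the text for mutations at covariant sites.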



 
Fig. 8. (A) Distribution of amino acid pairs between site 46 (Btk PH 30F) and site 62 (39Y). (B) Distribution between site 9 (12K) and site 44 (28R).

 
We also found triplets among the covariant residues. The triplets further form larger networks (Figure 9). The more contacts a residue makes, the more constrained it is with respect to acquiring mutations (e.g. Y39 and Y40). Among the 10 residues in the signature motif for 3-phosphoinositide binding (Lietzke et al., 2000), L32 is located at a conserved position and seven other residues (F10, K12, F25, R28, F30, L37 and Y39) are present in the network (Figure 9). Among the 11 residues involved in triplets, six (Y39, Y40, R28, K12, F25 and T33) may harbor XLA-causing mutations. Hence the network actually depicts crucial positions.
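
One way to find triplets in a set of covariant pairs is to treat the pairs as edges of a graph and to search for triangles. The sketch below rests on an assumption: a triplet is taken to mean three positions that are all pairwise covariant, which the text does not define explicitly. The function names are hypothetical and the edge list uses placeholder labels, not the pairs of Tables I and II.

```python
from itertools import combinations


def build_network(pairs):
    """Undirected graph (adjacency sets) built from covariant pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_triplets(graph):
    """Triplets, assumed here to be three mutually covariant positions."""
    triplets = set()
    for node, neighbours in graph.items():
        for a, b in combinations(sorted(neighbours), 2):
            if b in graph[a]:
                triplets.add(tuple(sorted((node, a, b))))
    return sorted(triplets)


if __name__ == "__main__":
    # Placeholder pairs (hypothetical): A, B and C covary with one another.
    network = build_network([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    print(find_triplets(network))
    # Number of covariant partners per position.
    print({node: len(nbrs) for node, nbrs in network.items()})
```

The number of partners per position gives a simple measure of how many covariation constraints a residue is involved in.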



 
Fig. 9. Networks formed by covarying pairs having P < 0.05.

 
The above discussion does not provide an explanation for five XLA mutations, but our analysis suggests the following. L11 is buried in the protein core (solvent accessibility = 0.02) and its contact energy is within the top 15% for the Btk PH domain. The frequencies of G and P at this site are zero, indicating that the L11P mutation may have structural consequences. Since the positively charged K19 is involved in ligand binding (Baraldi et al., 1999), substitution with negatively charged E has an adverse effect on affinity. Finally, missense mutations in the initiation residue M1 (to V, T or I) prevent translation.

In summary, there are altogether 28 XLA-causing mutations in 54 families. The majority of the mutations have been identified in at most two families. However, some appear more frequently. The R28H mutation lies in a well-known mutational hotspot (12 affected families). M1T has been found in four families and R28C, Y40C and I61N each in three unrelated families. R28 has also been mutated to P and L. Eighteen of these mutation types (64%) are under evolutionary constraints and have structural consequences. The frequencies of these mutated amino acids are zero in the multiple sequence alignment. Seven mutation types are in functional positions, where they decrease the binding affinity. Three mutations (eight families) in the initiation codon prevent protein translation.

Conclusion

Sequence-based profiles of physicochemical properties of the Btk PH domain were compared with the three-dimensional structure-derived solvent accessibility profiles. Eight equally located prominent extrema were detected in all profiles. Both energy estimation and information theory indicated that the residues in these extrema are important for protein structure and stability.

Amino acid conservation was investigated on three levels. Type I denotes invariance, apparent from sequence alignment. In type II, physicochemical properties are conserved. To identify type II conservation, information and entropies were calculated with a reduced alphabet of six groups. Since the sequence identity of PH domains is very low, only a small number of residues with type I or II conservation were identified. Type III conservation indicates covariance, inferred by calculating mutual information. Mutations in any of the three types of conserved sites may cause structural destabilization or a loss of function.

By combining information theoretical analysis with structural analysis, we were able to describe the molecular basis for mutations in the Btk PH domain. The frequencies of co-occurrence that are presented in the Supplementary material may be useful for future analyses of novel disease-causing mutations. In addition, our results may also facilitate experimental design; for example, residues that are found to participate in interacting networks or those involved in significant covariant pairs could be tested by site-directed mutagenesis, binding affinity analysis or molecular dynamics simulations. This information may also be helpful for designing mutations that alter protein properties in a predictable way.

Supplementary material

Supplementary material is available at http://bioinf.uta.fi/PHdomain.htm.


    Acknowledgements
 
We gratefully acknowledge the financial support of the Finnish Academy and the Medical Research Fund of Tampere University Hospital.


    References
 
Abrams,C.S., Zhao,W. and Brass,L.F. (1996) Biochim. Biophys. Acta, 1314, 233–238.

Afonnikov,D.A., Kondrakhin,Y.V. and Titov,I.I. (1997) Mol. Biol., 31, 631–637.

Ahmad,S. and Gromiha,M.M. (2002) Bioinformatics, 18, 819–824.

Anfinsen,C.B. (1973) Science, 181, 223–230.

Baraldi,E., Carugo,K.D., Hyvönen,M., Surdo,P.L., Riley,A.M., Potter,B.V., O’Brien,R., Ladbury,J.E. and Saraste,M. (1999) Structure, 7, 449–460.

Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Finn,R.D. and Sonnhammer,E.L. (1999) Nucleic Acids Res., 27, 260–262.

Bateman,A. et al. (2002) Nucleic Acids Res., 30, 276–280.

Blomberg,N. and Nilges,M. (1997) Fold. Des., 2, 343–355.

Blomberg,N., Gabdoulline,R.R., Nilges,M. and Wade,R.C. (1999) Proteins, 37, 379–387.

Blomberg,N., Baraldi,E., Sattler,M., Saraste,M. and Nilges,M. (2000) Structure, 8, 1079–1087.

Bullock,A.N., Henckel,J., DeDecker,B.S., Johnson,C.M., Nikolova,P.V., Proctor,M.R., Lane,D.P. and Fersht,A.R. (1997) Proc. Natl Acad. Sci. USA, 94, 14338–14342.

Clarke,N.D. (1995) Protein Sci., 4, 2269–2278.

Collantes,E.R. and Dunn,W.J. (1995) J. Med. Chem., 38, 2705–2713.

Crowder,S., Holton,J. and Alber,T. (2001) J. Mol. Biol., 310, 793–800.

Eisenberg,D., Schwarz,E., Komaromy,M. and Wall,R. (1984) J. Mol. Biol., 179, 125–142.

Fariselli,P., Olmea,O., Valencia,A. and Casadio,R. (2001) Protein Eng., 14, 835–843.

Ferrer-Costa,C., Orozco,M. and de la Cruz,X. (2002) J. Mol. Biol., 315, 771–786.

Gromiha,M.M., Oobatake,M., Kono,H., Uedaira,H. and Sarai,A. (1999a) J. Protein Chem., 18, 565–578.

Gromiha,M.M., Oobatake,M., Kono,H., Uedaira,H. and Sarai,A. (1999b) Protein Eng., 12, 549–555.

Gromiha,M.M., Oobatake,M., Kono,H., Uedaira,H. and Sarai,A. (2000) J. Biomol. Struct. Dyn., 18, 281–295.

Haslam,R.J., Koide,H.B. and Hemmings,B.A. (1993) Nature, 363, 309–310.

Holley,L.H. and Karplus,M. (1989) Proc. Natl Acad. Sci. USA, 86, 152–156.

Honig,B., Sharp,K.A. and Yang,A.-S. (1993) J. Phys. Chem., 97, 1101–1109.

Hua,S. and Sun,Z. (2001) Bioinformatics, 17, 721–728.

Hurley,J.H. and Misra,S. (2000) Annu. Rev. Biophys. Biomol. Struct., 29, 49–79.

Kavran,J.M., Klein,D.E., Lee,A., Falasca,M., Isakoff,S.J., Skolnik,E.Y. and Lemmon,M.A. (1998) J. Biol. Chem., 273, 30497–30508.

Korber,B.T., Farber,R.M., Wolpert,D.H. and Lapedes,A.S. (1993) Proc. Natl Acad. Sci. USA, 90, 7176–7180.

Krappa,R., Nguyen,A., Burrola,P., Deretic,D. and Lemke,G. (1999) Proc. Natl Acad. Sci. USA, 96, 4633–4638.

Larson,S.M., Di Nardo,A.A. and Davidson,A.R. (2000) J. Mol. Biol., 303, 433–446.

Lee,B. and Richards,F.M. (1971) J. Mol. Biol., 55, 379–400.

Lemmon,M.A. and Ferguson,K.M. (2000) Biochem. J., 350, Pt 1, 1–18.

Lietzke,S.E., Bose,S., Cronin,T., Klarlund,J., Chawla,A., Czech,M.P. and Lambright,D.G. (2000) Mol. Cell, 6, 385–394.

Lodowski,D.T., Pitcher,J.A., Capel,W.D., Lefkowitz,R.J. and Tesmer,J.J. (2003) Science, 300, 1256–1262.

Madabushi,S., Yao,H., Marsh,M., Kristensen,D.M., Philippi,A., Sowa,M.E. and Lichtarge,O. (2002) J. Mol. Biol., 316, 139–154.

Mateu,M.G. and Fersht,A.R. (1999) Proc. Natl Acad. Sci. USA, 96, 3595–3599.

Mayer,B.J., Ren,R., Clark,K.L. and Baltimore,D. (1993) Cell, 73, 629–630.

Mirny,L.A. and Gelfand,M.S. (2002) J. Mol. Biol., 321, 7–20.

Ng,P.C. and Henikoff,S. (2001) Genome Res., 5, 863–874.

Okoh,M.P. and Vihinen,M. (1999) Biochem. Biophys. Res. Commun., 265, 151–157.

Orrico,A., Galli,L., Falciani,M., Bracci,M., Cavaliere,M.L., Rinaldi,M.M., Musacchio,A. and Sorrentino,V. (2000) FEBS Lett., 478, 216–220.

Pazos,F., Helmer-Citterich,M., Ausiello,G. and Valencia,A. (1997) J. Mol. Biol., 271, 511–523.

Rameh,L.E. et al. (1997) J. Biol. Chem., 272, 22059–22066.

Saraf,M.C., Moore,G.L. and Maranas,C.D. (2003) Protein Eng., 16, 397–406.

Schueler-Furman,O. and Baker,D. (2003) Proteins, 52, 225–235.

Schwartz,C.E., Gillessen-Kaesbach,G., May,M., Cappa,M., Gorski,J., Steindl,K. and Neri,G. (2000) Eur. J. Hum. Genet., 8, 869–874.

Shen,B.R. and Vihinen,M. (2003) Bioinformatics, 19, 2161–2162.

Shenkin,P.S., Erman,B. and Mastrandrea,L.D. (1991) Proteins, 11, 297–313.

Sobolev,V., Sorokine,A., Prilusky,J., Abola,E.E. and Edelman,M. (1999) Bioinformatics, 15, 327–332.

Steward,R.E., MacArthur,M.W., Laskowski,R.A. and Thornton,J.M. (2003) Trends Genet., 19, 505–513.

Strait,B.J. and Dewey,T.G. (1996) Biophys. J., 71, 148–155.

Sunyaev,S., Ramensky,V., Koch,I., Lathe,W.,III, Kondrashov,A.S. and Bork,P. (2001) Hum. Mol. Genet., 10, 591–597.

Thomas,D.J., Casari,G. and Sander,C. (1996) Protein Eng., 9, 941–948.

Topham,C.M., Srinivasan,N. and Blundell,T.L. (1997) Protein Eng., 10, 7–21.

Tsukada,S. et al. (1993) Cell, 72, 279–290.

Tuffery,P., Etchebest,C. and Hazout,S. (1997) Protein Eng., 10, 361–372.

Vernet,T., Tessier,D.C., Khouri,H.E. and Altschuh,D. (1992) J. Mol. Biol., 224, 501–509.

Vetrie,D. et al. (1993) Nature, 361, 226–233.

Vihinen,M., Torkkila,E. and Riikonen,P. (1994) Proteins, 19, 141–149.

Vihinen,M. et al. (1995a) Immunol. Today, 16, 460–465.

Vihinen,M., Zvelebil,M.J., Zhu,Q., Brooimans,R.A., Ochs,H.D., Zegers,B.J., Nilsson,L., Waterfield,M.D. and Smith,C.I. (1995b) Biochemistry, 34, 1475–1481.

Vihinen,M., Kwan,S.P., Lester,T., Ochs,H.D., Resnick,I., Väliaho,J., Conley,M.E. and Smith,C.I. (1999) Hum. Mutat., 13, 280–285.

Vihinen,M. et al. (2001) Adv. Genet., 43, 103–188.

Wang,J. and Wang,W. (1999) Nat. Struct. Biol., 6, 1033–1038.

Wang,Z. and Moult,J. (2001) Hum. Mutat., 17, 263–270.

Word,J.M., Bateman,R.C.,Jr, Presley,B.K., Lovell,S.C. and Richardson,D.C. (2000) Protein Sci., 9, 2251–2259.

Wright,J.D. and Lim,C. (2001) Protein Eng., 14, 479–486.

Yao,L., Suzuki,H., Ozawa,K., Deng,J., Lehel,C., Fukamachi,H., Anderson,W.B., Kawakami,Y. and Kawakami,T. (1997) J. Biol. Chem., 272, 13033–13039.

Zhang,C. and Kim,S.H. (2000) Proc. Natl Acad. Sci. USA, 97, 2550–2555.

Received December 29, 2003; revised March 2, 2004; accepted March 18, 2004. Edited by Jane Clarke.




