1 Laboratoire de Toxicocinétique et Pharmacocinétique, Faculté de Pharmacie, Université de la Méditerranée Aix-Marseille II, Marseille, France and 3 Cattedra di Anatomia Patologica, Dipartimento di Ricerche Mediche e Morfologiche, Facoltà di Medicina e Chirurgia, Università degli Studi di Udine, Udine, Italy
2 To whom correspondence should be addressed. E-mail: hongguanglishibahao{at}yahoo.com
Abstract
Keywords: Gaucher disease/glucocerebrosidase/probability/randomness/variant
Introduction
The gene for human glucocerebrosidase is located on chromosome 1q21. To date, ~110 different mutations are known to occur in the glucocerebrosidase gene (Incerti, 1995; Grabowski and Horowitz, 1997; Beutler and Gelbart, 1998), including point mutations, splice junction mutations, deletions, fusion alleles and recombinant alleles (Stone et al., 2000). However, despite so many known variants of the enzyme, little is known about which amino acid sub-sequences in glucocerebrosidase are more sensitive to variants, and it is still difficult to draw a general rule distinguishing the more sensitive sub-sequences from the less sensitive ones. If such a general rule could be drawn, we would not only gain more insight into the relationship between glucocerebrosidase and Gaucher disease but, more importantly, could also direct attention to these sensitive sub-sequences in order to protect them from variants. Moreover, we could in principle even predict the sub-sequences likely to be sensitive to currently unknown variants.
This problem can be approached in different ways, such as empirically (regression analysis), experimentally (artificial and natural mutations) and computationally (multiple sequence comparisons and alignments). A pseudogene for glucocerebrosidase is located 16 kb downstream from the functional gene, sharing 97% exonic sequence homology (Horowitz et al., 1989; Winfield et al., 1997). Many mutations are complex alleles due to recombination events between the gene and pseudogene, such as gene conversion or unequal crossing over (Cormand et al., 2000). However, these explanations still do not answer why some amino acid sub-sequences are sensitive to variants.
A probabilistic approach may contribute to the understanding of this problem. In the past we have used two probabilistic approaches to analyse the primary structures of different proteins, and we hope that these approaches can throw light on the construction of glucosylceramidase and on the related Gaucher disease. In general, our first approach can predict the present and absent amino acid sub-sequences in a protein primary structure. We argue that the randomly predictable present and absent sub-sequences should not be deliberately evolved, whereas the randomly unpredictable present and absent sub-sequences should be deliberately evolved. Accordingly, our first approach can classify the present amino acid sub-sequences as randomly predictable or randomly unpredictable. We suggest that the randomly unpredictable amino acid sub-sequences are more closely related to protein function and that variants in these sub-sequences may lead to dysfunction of the protein. More recently, we found that a mutation that leads to the dysfunction of rat monoamine oxidase B is located in a randomly unpredictable amino acid pair. In contrast, another mutation, which does not affect rat monoamine oxidase B function, is located in randomly predictable amino acid pairs (Wu and Yan, 2001).
In this study, we attempted to use our first random approach to analyse amino acid pairs in human β-glucocerebrosidase with its 109 variants in order to determine which amino acid pairs are more sensitive to variants.
Materials and methods
Amino acid pairs in human glucocerebrosidase
The human glucocerebrosidase precursor is composed of 536 amino acids. We count the first and second amino acids as an amino acid pair, the second and third as another amino acid pair, the third and fourth, and so on, until the 535th and 536th; hence there is a total of 535 amino acid pairs. As there are 20 types of amino acids and each position of a pair can be occupied by any of them, there are theoretically 400 (20²) kinds of amino acid pairs. Human glucocerebrosidase contains 535 amino acid pairs, which is more than the 400 theoretical kinds, hence some of the 400 kinds must appear more than once. Further, we may expect that some of the 400 kinds of theoretical amino acid pairs are absent from human glucocerebrosidase.
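The pair-counting step described above can be sketched in Python as follows. This is a minimal illustration rather than the program used in the study; the function name `count_pairs` and the toy sequence are assumptions introduced here, and the real 536-residue precursor sequence is not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def count_pairs(sequence: str) -> Counter:
    """Count adjacent (i, i + 1) amino acid pairs in a sequence."""
    return Counter(sequence[i:i + 2] for i in range(len(sequence) - 1))


# Hypothetical toy sequence for illustration only (not glucocerebrosidase).
toy = "MAAGSLAAS"
pairs = count_pairs(toy)

print(sum(pairs.values()))    # 8 pairs, i.e. len(toy) - 1
print(len(AMINO_ACIDS) ** 2)  # 400 theoretical kinds of pairs
print(pairs["AA"])            # 2 occurrences of the pair AA
```

A sequence of 536 residues processed this way yields 535 overlapping pairs, in line with the count given in the text.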
Randomly predicted frequency and actual frequency
The randomly predicted frequency is calculated according to the simple permutation principle (Feller, 1968). For example, there are 42 alanines (A) in human glucocerebrosidase, and the predicted frequency of amino acid pair AA would be 3 (42/536 × 41/535 × 535 = 3.213, i.e. 3 after rounding). In fact, we find three AAs in human glucocerebrosidase, so the actual frequency of AA is 3. In general, there are three possible relationships between actual and predicted frequencies: the actual frequency is smaller than, equal to or larger than the predicted frequency.
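The calculation above can be written as a short Python sketch. The function name `predicted_frequency` and its parameters are assumptions made for illustration; the arithmetic follows the worked AA example in the text, where the second factor uses 41 rather than 42 because one alanine is already used for the first position.

```python
def predicted_frequency(n_first: int, n_second: int, length: int,
                        same_residue: bool = False) -> float:
    """Randomly predicted frequency of an amino acid pair.

    n_first, n_second: counts of the first and second amino acid in the protein.
    length: number of amino acids in the protein (536 in the text).
    same_residue: True for pairs such as AA, where one copy of the residue
    is already used for the first position.
    """
    n_pairs = length - 1  # 535 adjacent pairs in a 536-residue protein
    available_second = n_second - 1 if same_residue else n_second
    return n_first / length * available_second / n_pairs * n_pairs


# AA in human glucocerebrosidase: 42 alanines among 536 amino acids.
print(round(predicted_frequency(42, 42, 536, same_residue=True), 3))  # 3.213
```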
Randomly predictable present amino acid pairs
As described in the last section, the frequency of random presence of amino acid pair AA would be 3 and AA actually appears three times in human glucocerebrosidase, so the presence of AA is randomly predictable.
Randomly unpredictable present amino acid pairs
There are 44 serines (S) in human glucocerebrosidase, and the frequency of random presence of amino acid pair AS would be 3 (42/536 × 44/535 × 535 = 3.448), i.e. there would be three ASs in human glucocerebrosidase. However, AS actually appears five times, so the presence of AS is randomly unpredictable. This is a case where the actual frequency is larger than the predicted frequency. In the other case, the actual frequency is smaller than the predicted frequency. For example, there are 60 leucines (L) and 37 prolines (P) in human glucocerebrosidase and the predicted frequency of LP is 4 (60/536 × 37/535 × 535 = 4.142), whereas the actual frequency of LP is 2.
Randomly predictable absent amino acid pairs
There are 23 arginines (R) and eight cysteines (C) in human glucocerebrosidase, and the frequency of random presence of RC would be 0 (23/536 × 8/535 × 535 = 0.343), i.e. the amino acid pair RC would not appear in human glucocerebrosidase, which is true in the real situation. Hence the absence of RC is randomly predictable.
Randomly unpredictable absent amino acid pairs
There are 27 phenylalanines (F) in human glucocerebrosidase, and the frequency of random presence of AF would be 2 (42/536 × 27/535 × 535 = 2.116), i.e. there would be two AFs in human glucocerebrosidase. However, no AF appears in the enzyme, therefore the absence of AF from human glucocerebrosidase is randomly unpredictable.
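The four categories defined in the preceding sections can be summarized in a small Python sketch. The function name `classify` is hypothetical, and rounding the predicted value to the nearest integer is an assumption inferred from the worked examples (3.213 to 3, 3.448 to 3, 4.142 to 4, 0.343 to 0, 2.116 to 2); the text does not state the rounding rule explicitly.

```python
def classify(actual: int, predicted: float) -> str:
    """Classify a pair as randomly predictable/unpredictable and present/absent."""
    expected = round(predicted)  # assumption: nearest-integer rounding
    presence = "present" if actual > 0 else "absent"
    predictability = "predictable" if actual == expected else "unpredictable"
    return f"randomly {predictability} {presence}"


# (actual, predicted) values quoted in the text.
examples = {
    "AA": (3, 3.213),
    "AS": (5, 3.448),
    "LP": (2, 4.142),
    "RC": (0, 0.343),
    "AF": (0, 2.116),
}
for pair, (actual, predicted) in examples.items():
    print(pair, classify(actual, predicted))
```

Under this rounding assumption, the sketch gives the same category as the text for all five examples.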
Variants in randomly predictable and unpredictable amino acid pairs
Our rationale for the determination of variants in randomly predictable and unpredictable present amino acid pairs is based on the finding of our previous study (Wu and Yan, 2001), which is described as follows. There are two mutations in rat monoamine oxidase B. The first mutation occurs at position 139, changing leucine (L) to histidine (H). The amino acids at positions 138 and 140 are proline (P) and alanine (A), hence this mutation changes two amino acid pairs into two others, i.e. PL → PH and LA → HA. As PL and LA are randomly predictable amino acid pairs according to our random analysis, we would not expect the first mutation to lead to a substantial change in enzymatic activity, which is true in the real situation. The second mutation occurs at position 199, changing isoleucine (I) to phenylalanine (F) and leading to the amino acid pair changes II → IF and IS → FS. As IS belongs to the randomly unpredictable amino acid pairs, we would expect the second mutation to bring about a substantial change in enzymatic activity, and this expectation is also true in the real situation. In this manner we hope to determine whether a variant occurs at randomly predictable or unpredictable amino acid pairs in human glucocerebrosidase in order to gain more insight into the relationship between variants and sensitivity of amino acid pairs.
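How a single substitution changes two neighbouring pairs can be sketched in Python. The function name `affected_pairs` and the three-residue fragments are assumptions made for illustration; only the residues at positions 138 to 140 (P, L, A) and the I, I, S context of position 199 are taken from the text.

```python
def affected_pairs(sequence: str, position: int, new_residue: str):
    """Return (substituted, substituting) pairs for a point substitution.

    position is 1-based. A substitution at either end of the sequence
    affects only one pair.
    """
    i = position - 1
    old_residue = sequence[i]
    substituted, substituting = [], []
    if i > 0:  # pair formed with the preceding residue
        substituted.append(sequence[i - 1] + old_residue)
        substituting.append(sequence[i - 1] + new_residue)
    if i < len(sequence) - 1:  # pair formed with the following residue
        substituted.append(old_residue + sequence[i + 1])
        substituting.append(new_residue + sequence[i + 1])
    return substituted, substituting


# Fragment P-L-A (positions 138-140), with L changed to H at the middle position.
print(affected_pairs("PLA", 2, "H"))  # (['PL', 'LA'], ['PH', 'HA'])
# Fragment I-I-S, with the middle I changed to F.
print(affected_pairs("IIS", 2, "F"))  # (['II', 'IS'], ['IF', 'FS'])
```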
Difference between actual and randomly predicted frequencies
For the numerical analysis, we calculate the difference between the actual frequency (AF) and predicted frequency (PF) of affected amino acid pairs, i.e. (AF − PF). For instance, a variant at position 215 replaces A with D, which results in two amino acid pairs, LA and AS, changing to LD and DS, because the amino acid is L at position 214 and S at position 216. The actual frequency and predicted frequency are 6 and 5 for LA, 5 and 3 for AS, 4 and 3 for LD and 2 and 2 for DS, respectively. Hence the difference between actual and predicted frequencies is 3 with respect to the substituted amino acid pairs, i.e. (6 − 5) + (5 − 3), and 1 with respect to the substituting amino acid pairs, i.e. (4 − 3) + (2 − 2). In this way, we can compare the frequency differences in the amino acid pairs affected by variants.
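The worked example for the variant at position 215 can be reproduced with a few lines of Python. The function name `frequency_difference` and the dictionary layout are assumptions for illustration; the (actual, predicted) values are those quoted in the text.

```python
def frequency_difference(pairs) -> int:
    """Sum of (actual - predicted) over an iterable of (actual, predicted) tuples."""
    return sum(actual - predicted for actual, predicted in pairs)


# (actual frequency, predicted frequency) for the pairs affected at position 215.
substituted = {"LA": (6, 5), "AS": (5, 3)}
substituting = {"LD": (4, 3), "DS": (2, 2)}

print(frequency_difference(substituted.values()))   # 3
print(frequency_difference(substituting.values()))  # 1
```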
Results
Of 400 kinds of theoretical amino acid pairs, 134 kinds are absent from human glucocerebrosidase including 37 randomly predictable and 97 randomly unpredictable. Consequently, 535 amino acid pairs in human glucocerebrosidase include only 266 kinds of theoretical amino acid pairs (400 - 134 = 266), i.e. some amino acid pairs should appear more than once. Actually, of 535 amino acid pairs in human glucocerebrosidase, 119 kinds of theoretical amino acid pairs appear once, 82 kinds twice, 37 kinds three times, 17 kinds four times, one kind five times, eight kinds six times, one kind seven times and one kind 13 times.
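As a quick consistency check, the counts reported above can be tallied in Python. The variable names are introduced here for illustration; all numbers come from the text.

```python
# Number of kinds of amino acid pairs appearing a given number of times.
kinds_by_multiplicity = {1: 119, 2: 82, 3: 37, 4: 17, 5: 1, 6: 8, 7: 1, 13: 1}

kinds_present = sum(kinds_by_multiplicity.values())
total_pairs = sum(times * kinds for times, kinds in kinds_by_multiplicity.items())
kinds_absent = 37 + 97  # randomly predictable + randomly unpredictable absences

print(kinds_present)                 # 266
print(total_pairs)                   # 535
print(kinds_present + kinds_absent)  # 400
```

The tallies agree with the 266 kinds present, the 535 pairs in total and the 400 theoretical kinds.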
Of 266 kinds of theoretical amino acid pairs in human glucocerebrosidase, 107 kinds are randomly predictable and 159 kinds are randomly unpredictable. As mentioned above, some kinds of amino acid pairs appear more than once, thus of 535 amino acid pairs in human glucocerebrosidase, 153 pairs are randomly predictable and 382 pairs are randomly unpredictable. We therefore can find how many variants occur with respect to these present amino acid pairs in human glucocerebrosidase (Table I).
As mentioned in the Materials and methods section, a missense mutation leads to two amino acid pairs being substituted by another two, and their actual frequency can be smaller than, equal to or larger than the randomly predicted frequency. Tables II and III detail the situations related to substituted and substituting amino acid pairs, respectively, and the relationship between their actual and randomly predicted frequencies.
Tables I and II indicate that 93.58% of variants occur at randomly unpredictable present amino acid pairs and 6.42% of variants occur in randomly predictable amino acid pairs. These results imply that 159 kinds of randomly unpredictable present amino acid pairs account for 93.58% of variants in human glucocerebrosidase, whereas 107 kinds of randomly predictable present amino acid pairs account for only 6.42% of variants. Moreover, we can see from the ratio in Table I that the chance of occurrence of variants in unpredictable amino acid pairs is far larger than in predictable amino acid pairs. For example, the chance of occurrence of a variant is almost 8-fold higher in the unpredictable kind than in the predictable kind (0.64 vs 0.07). These results strongly support our rationale that the harmful variants are more likely to occur at randomly unpredictable present amino acid pairs, which therefore are more sensitive to the variants.
When looking at the unpredictable pairs in Table II, we find that the majority of these pairs are characterized by one or both substituted pairs whose actual frequency is larger than the predicted frequency (the first three rows in unpredictable pairs). Comparing each variant, we find that the impact of variants is to diminish the difference between actual and predicted frequencies by reducing the actual frequency, which indicates that the variants make the construction of amino acid pairs more randomly predictable. In other words, the variants move the construction of amino acid pairs towards the state that occurs most easily in nature. It is interesting that only five variants occur in amino acid pairs whose actual frequency is smaller than the predicted frequency in both pairs. This suggests that it is difficult for variants to narrow the difference between actual and predicted frequencies by increasing the actual frequency, as this would drive the construction of amino acid pairs against the natural direction.
Table III can be read as follows. The first and second columns indicate the actual and predicted situations in amino acid pairs I and II, the third and fourth columns indicate the number of variants occurring at amino acid pairs I and II and their percentages and the fifth column is the total percentage of our classifications.
Table III shows that 44.04% of variants bring about one or both substituting amino acid pairs which are absent in normal human glucocerebrosidase (AF = 0). Also, 57.80% of variants target one or both substituting amino acid pairs with their actual frequency smaller than the predicted frequency (AF < PF). These phenomena indicate that the amino acid pairs in mutant proteins are more randomly constructed.
Frequency difference of amino acid pairs affected by variants
The difference between actual and predicted frequencies represents a measure of randomness of construction of amino acid pairs, i.e. the smaller the difference, the more random is the construction of amino acid pairs. In particular, (i) the larger the positive difference, the more randomly unpredictable is the presence of amino acid pairs; and (ii) the larger the negative difference, the more randomly unpredictable is the absence of amino acid pairs.
Considering all 109 variants, the difference between actual and predicted frequencies is 1.68 ± 0.17 (mean ± SE, ranging from -2 to 6) for substituted amino acid pairs. This means that the variants occur in the amino acid pairs which appear more than their predicted frequency. Meanwhile, the difference between actual and predicted frequencies is -0.11 ± 0.18 (mean ± SE, ranging from -4 to 5) for substituting amino acid pairs, which implies that the substituting amino acid pairs are randomly constructed in the mutant glucocerebrosidase, as their actual and predicted frequencies are about the same. A striking statistical difference is found between the substituted and substituting amino acid pairs (P < 0.0001). Figure 1 shows the distribution of the difference between actual and predicted frequencies.
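The summary statistic quoted above (mean ± SE) can be computed as in the following Python sketch. The per-variant differences are not listed in the text, so the two lists below are hypothetical toy values used only to show the calculation. The function name `mean_and_se` is likewise an assumption, and no significance test is implemented because the test behind the reported P value is not named.

```python
import math
import statistics


def mean_and_se(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Hypothetical (AF - PF) differences for a handful of variants; not study data.
toy_substituted = [3, 1, 2, 0, 4, 2]
toy_substituting = [1, -1, 0, 0, -2, 1]

for label, values in (("substituted", toy_substituted),
                      ("substituting", toy_substituting)):
    mean, se = mean_and_se(values)
    print(f"{label}: {mean:.2f} ± {se:.2f} (range {min(values)} to {max(values)})")
```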
Discussion
Based on our previous studies [see our review article (Wu and Yan, 2002) for details of references], our argument is that the functional amino acid pairs should be deliberately evolved and hence their actual frequency should differ from the predicted frequency. As the predicted frequency represents the highest chance for construction of amino acid pairs, it is important to find out whether the variants lead the actual frequency to approach the predicted frequency. If so, the protein has a natural tendency towards variants; if not, it does not. The present study demonstrates that human glucocerebrosidase has a natural tendency towards variants.
In this study, the unpredictable amino acid pairs account for 71.40% of 535 amino acid pairs in glucocerebrosidase, and the unpredictable amino acid pairs account for 69.32 ± 4.48% of amino acid pairs in 13 different proteins (Wu and Yan, 2002). If we consider that the proteins chosen in our studies were randomly sampled from the Swiss-Prot data bank, we could estimate that all the proteins might have about 70% randomly unpredictable amino acid pairs in their primary structure.
With respect to randomly unpredictable absent and present amino acid pairs, we are interested in the difference between actual and predicted frequencies, because the predictable absence and presence represent the naturally easiest occurring events, i.e. the construction of amino acid pairs should be the least energy and time consuming. Hence the difference between actual and predicted frequencies should be engineered by the evolutionary process: the larger the difference, the greater the impact of the evolutionary process. A diminishing difference between actual and predicted frequencies has been shown in this study, hence the variants in fact represent a degeneration process inducing Gaucher disease related to the glucocerebrosidase variants.
In this study, we focused our efforts on the linear primary structure, i.e. the one-dimensional structure. Therefore, we used the (i, i + 1) amino acid pairs, which are constructed by means of peptide bonds, rather than the (i, i + 2), (i, i + 3) and (i, i + k) amino acid pairs. On the other hand, we would use the (i, i + 2), (i, i + 3) and (i, i + k) amino acid pairs when considering higher-level protein structures, where these amino acid pairs are constructed by means of S–S bonds, for instance.
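The distinction between (i, i + 1) and (i, i + k) pairs can be made concrete with a short Python sketch. The function name `count_pairs_at_distance` and the toy sequence are assumptions for illustration; the present study uses only k = 1.

```python
from collections import Counter


def count_pairs_at_distance(sequence: str, k: int = 1) -> Counter:
    """Count (i, i + k) amino acid pairs; k = 1 gives adjacent pairs."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return Counter(sequence[i] + sequence[i + k]
                   for i in range(len(sequence) - k))


toy = "ACDA"  # hypothetical toy sequence
print(count_pairs_at_distance(toy, 1))  # pairs AC, CD, DA
print(count_pairs_at_distance(toy, 2))  # pairs AD, CA
print(count_pairs_at_distance(toy, 3))  # pair AA
```

A sequence of length N contains N − k pairs at distance k, so the 535 pairs analysed here correspond to k = 1 and N = 536.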
In this study, we are dealing with amino acid pairs rather than triplets, quadruplets or longer multiplets, because a point mutation is directly related to two amino acid pairs. The first pair is composed of the amino acid preceding the mutated amino acid and the mutated amino acid itself, and the second pair is composed of the mutated amino acid and the amino acid following it, except when the point mutation occurs at the beginning or the end of the amino acid sequence. This means that the neighboring amino acids have direct effects on the stability of the mutated amino acid. On the other hand, the amino acids located beyond the preceding and the following amino acids would have less direct effects on the mutated amino acid, although they still have some effects. These amino acids belong to triplets, quadruplets and longer amino acid sequences. Although two amino acids in a triplet have direct effects on the stability of the mutated amino acid when the mutated amino acid is located in the middle of the triplet, this is still the case of an amino acid pair, and we should consider the indirect effect when the mutated amino acid is not located in the middle of the triplet. Therefore, we consider that we should focus our efforts on the amino acid pairs at the first stage, because the amino acid pairs have a direct effect on the point mutation, and this direct effect can be easily quantified.
References
Balicki,D. and Beutler,E. (1995) Medicine (Baltimore), 74, 305–323.
Berrebi,A., Wishnitzer,R. and Von der Walde,U. (1984) Nouv. Rev. Fr. Hematol., 26, 201–203.
Beutler,E. (1997) Curr. Opin. Hematol., 4, 19–29.
Beutler,E. (2001) Blood, 98, 2597–2602.
Beutler,E. and Gelbart,T. (1998) Blood Cells Mol. Dis., 24, 2–8.
Brady,R.O., Kanfer,J. and Shapiro,D. (1965) Biochem. Biophys. Res. Commun., 18, 221–225.
Chang-Lo,M. and Yam,L.T. (1967) Am. J. Med. Sci., 254, 303–315.
Cormand,B., Diaz,A., Grinberg,D., Chabas,A. and Vilageliu,L. (2000) Blood Cells Mol. Dis., 26, 409–416.
Daneman,A., Stringer,D. and Reilly,B.J. (1983) Radiology, 149, 463–467.
Feller,W. (ed.) (1968) An Introduction to Probability Theory and Its Applications, Vol. I. 3rd edn. Wiley, New York.
Ginsburg,S.J. and Groll,M. (1973) J. Pediatr., 82, 1046–1048.
Grabowski,G.A. and Horowitz,M. (1997) In Zimran,A. (ed.), Gaucher's Disease: Molecular, Genetic and Enzymological Aspects. Baillière's Clinical Haematology, London, pp. 635–656.
Horowitz,M., Wilder,S., Horowitz,Z., Reiner,O., Gelbart,T. and Beutler,E. (1989) Genomics, 4, 87–96.
Incerti,C. (1995) Semin. Hematol., 3(suppl 32), 3–9.
Niederau,C. and Haussinger,D. (2000) Hepatogastroenterology, 47, 984–997.
Petrides,P.E. (1998) Arzneimitteltherapie, 16, 49–51.
Stone,D.L., Tayebi,N., Orvisky,E., Stubblefield,B., Madike,V. and Sidransky,E. (2000) Hum. Mutat., 15, 181–188.
Sun,C.C., Panny,S., Combs,J. and Gutberlett,R. (1984) Pathol. Res. Pract., 179, 101–104.
Winfield,S.L., Tayebi,N., Martin,B.M., Ginns,E.I. and Sidransky,E. (1997) Genome Res., 7, 1020–1026.
Wu,G. and Yan,S.-M. (2001) Biomol. Eng., 18, 23–27.
Wu,G. and Yan,S.-M. (2002) Mol. Biol. Today, 3, 55–69.
Received June 25, 2002; revised January 2, 2003; accepted January 23, 2003.