1 Biomolecular Structure and Modelling Group, Biochemistry & Molecular Biology Department, University College London, Gower Street, London WC1E 6BT and 3 Crystallography Department, Birkbeck College, Malet Street, London WC1E 7HX, UK
Abstract |
---|
Keywords: clustering/function/glycosidase/structure/TIM barrels
Introduction |
---|
Since there are many kinds of carbohydrates in nature, there are many glycosyl hydrolases, i.e. enzymes hydrolysing the glycosidic bond between two carbohydrates or a carbohydrate and a non-carbohydrate moiety. In 1991, Henrissat classified a total of 291 glycosyl hydrolase sequences, corresponding to 39 E.C. numbers, into 35 families (Henrissat, 1991). Since then, he and his collaborator have updated the classification of the enzymes (Henrissat and Bairoch, 1993, 1996) and now there are at least 70 sequence-based families of glycosyl hydrolases (Henrissat and Davies, 1997; http://expasy.cbr.nrc.ca/cgi-bin/lists?glycosid.txt).
Structural representatives for 27 out of the 60 sequence families have been reported and classified based on the structure, catalytic residues and mechanism (Henrissat and Davies, 1997). Of these, at least nine families adopt the eight-stranded TIM barrel fold (Henrissat and Davies, 1997), and five of these families (family-1, 2, 5, 10, 17) have been reported to have two catalytic residues located close to the C-termini of the fourth and seventh β-strands of the barrel structure, with a similar retaining mechanism (Jenkins et al., 1995; Henrissat and Davies, 1997). This superfamily was named the 4/7 superfamily due to the positions of the catalytic residues (Jenkins et al., 1995). However, in the other four sequence families, the catalytic residues occur at different locations in TIM barrel glycosidases and the relationships between them are still not clear.
In this paper, the sequences, structures and functions of all the TIM barrel glycosidases, including 30 distinct sequence families, will be analysed systematically, in order to elucidate more distant relationships between them.
Material and methods |
---|
In the S-level, where structures within each H-level are subdivided into sequence families on the basis of sequence identity, domains clustered in the same sequence families have sequence identities larger than 35% to another member of the family with at least 60% of the larger domain equivalent to the smaller, indicating highly similar structures and functions. For new TIM barrel structures, it is often necessary to check the classification manually. Multi-domain proteins are subdivided into their constituent domains using a consensus procedure (Jones et al., 1998) and each domain is classified individually.
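The S-level rule described above can be sketched in a few lines. This is a minimal illustration, not the CATH implementation: the function names and the input layout (pairwise identity plus number of aligned residues) are assumptions, and single-linkage merging is inferred from the phrase "to another member of the family".

```python
from typing import Dict, List, Tuple

IDENTITY_CUTOFF = 35.0  # percent sequence identity, as stated in the text
OVERLAP_CUTOFF = 60.0   # percent of the larger domain that must be equivalent

def same_family(identity: float, aligned: int, len_a: int, len_b: int) -> bool:
    """True if a domain pair meets both S-level criteria."""
    overlap = 100.0 * aligned / max(len_a, len_b)
    return identity > IDENTITY_CUTOFF and overlap >= OVERLAP_CUTOFF

def cluster_families(lengths: Dict[str, int],
                     pairs: List[Tuple[str, str, float, int]]) -> List[List[str]]:
    """Single-linkage clustering: a domain joins a family if it matches any member.

    `pairs` holds (domain_a, domain_b, percent_identity, aligned_residues);
    this layout is hypothetical and chosen only for the example.
    """
    parent = {name: name for name in lengths}

    def find(x: str) -> str:
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, aligned in pairs:
        if same_family(identity, aligned, lengths[a], lengths[b]):
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in lengths:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Because the linkage is single, two domains below 35% identity to each other can still share a family when a third domain bridges them.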
The PDB codes and E.C. numbers of representative proteins of all the glycosidase sequence families in the PDB are listed in Table I. The nomenclature of the glycosidase family derived by Henrissat is also included in this table (Henrissat, 1991; Henrissat and Bairoch, 1993, 1996) and will be used throughout this paper, using HF-#. Here, # is 1, 2, 5, 10, 13, 14, 17, 18 or 20. As the sequence identity threshold for the CATH sequence families is higher than that of Henrissat, many families in Henrissat's nomenclature are further subdivided in the CATH classification (Table I).
[Equation (1): image not recovered]
[Equation (2): image not recovered]
The catalytic sites of glycosidase TIM barrels were analysed by creating three-dimensional templates generated by the TESS algorithm (Wallace et al., 1997) and the SPASM algorithm (Kleywegt, 1999). TESS templates are created by extracting three reference atoms and a selection of surrounding atoms from the active site of a parent structure. Average templates were obtained by combining information from the parent and closely related structures in order to create consensus templates. The consensus templates were screened against the PDB database to find similar active sites. In contrast, a SPASM template is represented by the Cα atoms and the centre of gravity of the side chain atoms of each residue, which provides a more relaxed requirement for matching than a TESS template. Conservation and structural superposition of the catalytic residues were checked by the CORA algorithm, which generates a multiple alignment of protein structures (Orengo, 1999).
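The reduced, SPASM-style residue representation and the RMSD used to score a template match can be sketched as follows. This is a hedged illustration only: it assumes the two coordinate sets are already superposed and paired atom-for-atom, uses an unweighted centroid for the "centre of gravity", and does not reproduce the search or fitting steps of TESS or SPASM. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def centroid(points: Sequence[Point]) -> Point:
    """Unweighted mean position of a set of atoms (assumption: no mass weighting)."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)

def reduced_residue(ca: Point, side_chain: Sequence[Point]) -> List[Point]:
    """Two pseudo-atoms per residue: the C-alpha atom and the side-chain centroid."""
    return [ca, centroid(side_chain)]

def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally sized, pre-superposed sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal size")
    total = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (p[2] - q[2]) ** 2
                for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

Collapsing each side chain to a single point is what makes this representation more tolerant than an all-atom template: differences in side-chain rotamers shift the centroid far less than they shift individual atoms.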
The PSI-BLAST program (Altschul et al., 1997) was also used to detect weak but biologically relevant sequence similarities by following the procedure used to assign genomic sequences to the CATH database (Pearl et al., 2000). The maximum number of iterations allowed was 20, and the E value for inclusion in the next pass was 0.0005. All sequence segments with E values <0.001 were collected as putative homologous domains in the final run.
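The way the three parameters above interact (iteration cap, inclusion threshold, reporting threshold) can be sketched with a generic iteration loop. This is not PSI-BLAST itself: `search` is a hypothetical caller-supplied function that returns an E value per hit for the current profile membership, and stopping when the included set no longer changes is an assumption about convergence.

```python
from typing import Callable, Dict, FrozenSet, List

MAX_ITERATIONS = 20   # from the text
INCLUSION_E = 0.0005  # E value for inclusion in the next pass (from the text)
REPORT_E = 0.001      # E value cut-off for putative homologues (from the text)

def iterate_profile(query: str,
                    search: Callable[[FrozenSet[str]], Dict[str, float]]) -> List[str]:
    """Iterate a profile search and return hits below the reporting threshold.

    `search` is a stand-in for one search pass; it maps the set of sequences
    currently in the profile to {hit_id: e_value}.
    """
    included = {query}
    hits: Dict[str, float] = {}
    for _ in range(MAX_ITERATIONS):
        hits = search(frozenset(included))
        updated = {query} | {h for h, e in hits.items() if e < INCLUSION_E}
        if updated == included:  # no new sequences entered the profile
            break
        included = updated
    return sorted(h for h, e in hits.items() if e < REPORT_E)
```

A hit can therefore be reported (E < 0.001) without ever contributing to the profile (E < 0.0005), which keeps borderline matches from steering later iterations.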
Results |
---|
Figure 1a illustrates the range of enzyme functions performed by TIM barrels, and the corresponding E.C. class distribution for all glycosidases is shown in Figure 1b. Half of all TIM barrels are classified as hydrolases (primary E.C. number 3) (Figure 1a). Most of these hydrolases belong to the glycosidase superfamily and, according to the E.C. distribution of the glycosidase superfamily, most members are glycosylases (E.C. numbers 3.2.-.-) (Figure 1b). (Glycosylases or glycosyl hydrolases are enzymes with E.C. 3.2.-.-, whilst glycosidases are a subset of the glycosylases with E.C. 3.2.1.-, which specifically hydrolyse O-glycosyl compounds.) Exceptions are narbonin and concanavalin B, which have no enzymatic activity (Hennig et al., 1992, 1995), and cyclodextrin glycosyltransferase (primary E.C. number 2), which acts on a sugar but is a transferase rather than a hydrolase.
To elucidate the structural relationships, PCA was applied to the glycosidase family (Figure 2) based on maximum sequence identity and maximum SSAP scores. The sequence identities vary between 2 and 36%, whilst the SSAP scores are distributed between 58 and 89. The plots in Figures 2A and B, obtained from the matrices of the sequence identities and SSAP scores, correspond to 32.1 and 41.1% of the total information content, respectively, as given by the eigenvalues in the PCA analyses. Both plots reveal four large clusters in this glycosidase family (Table II; Figure 2A, B).
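The link between eigenvalues and "percentage of total information content" can be made concrete with a small sketch. The exact PCA procedure used on the identity and SSAP matrices is not described here, so the following is an assumption-laden illustration: it treats each row of a score table as an observation, builds the covariance matrix, and reports the share of the eigenvalue sum carried by the leading axes. The Jacobi eigenvalue routine and all names are illustrative choices, not the authors' code.

```python
import math
from typing import List

Matrix = List[List[float]]

def covariance(rows: Matrix) -> Matrix:
    """Sample covariance between the columns of a data table."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]

def symmetric_eigenvalues(matrix: Matrix, tol: float = 1e-18,
                          max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_fraction(eigenvalues: List[float], k: int) -> float:
    """Share of the total eigenvalue sum carried by the k largest eigenvalues."""
    return sum(eigenvalues[:k]) / sum(eigenvalues)
```

Under this reading, a two-dimensional plot that carries 32.1% of the information is one whose two plotted eigenvalues sum to 32.1% of the total, which is why the text also inspects the third axis.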
The second cluster, S2, was composed of 14 sequences from six families, HF-1, 2, 5, 10, 14 and 17. The two β-amylases (E.C. 3.2.1.2) in HF-14 were relatively separate from the rest of the S2 subgroup on the PCA plot of sequence identity (Figure 2A). When the third axis was plotted against the first axis of the PCA of sequence identities, this separation of β-amylase from other members of S2 was even stronger (data not shown). However, in the PCA plot of the SSAP scores, the separation of the β-amylase structures from other members in this subgroup was smaller than that derived by sequence (Figure 2B). This means that β-amylases and the other enzymes in the S2 group are conserved structurally, but are more distant in terms of their sequence. Xylanase (E.C. 3.2.1.8) in HF-10 is the closest to the β-amylases among the other five families by sequence and structural criteria (Figure 2). Within the S2 subgroup, sequence identities range from 5 to 33%, whilst the SSAP scores range between 58.7 and 88.9.
The third cluster, S3, comprised only the enzymes from HF-18 and the non-enzyme proteins narbonin and concanavalin B (Hennig et al., 1992, 1995). Furthermore, within this family, chitinase (E.C. 3.2.1.14) was relatively separate from the other member structures on the PCA plot of SSAP scores (Figure 2B), although this separation was not observed in the sequence cluster (Figure 2A). The structural differences in chitinase are due to long insertions in several loops (data not shown). Within the S3 group, sequence identities range from 10 to 33%, whilst the SSAP scores range between 75.9 and 88.6, which means this is the tightest cluster.
The fourth cluster, S4, was composed of chitobiase from HF-20. This cluster was separate from all other clusters and positioned midway between the second and third clusters (S2 and S3). Only when the third axis was plotted against the first axis of the PCA of sequence identities did chitobiase (S4) overlap with narbonin in the S3 cluster, which suggests that the S4 cluster might be distantly related to the S3 cluster (data not shown).
These results indicate that there are at least four different structurally-related groups within the glycosidase TIM barrels.
Summary of functional subgroups of glycosidase TIM barrel family
The glycosidases have been grouped into at least six functional subgroups based on the positions of the catalytic residues on the barrel structures and knowledge of their catalytic mechanisms, which will be described below in detail. These subgroups are summarized in Figure 3a and Table II.
The first functional subgroup, F1, is composed of the S1 members described above, all of which are enzymes of HF-13 and have the retention mechanism (a→a) (Henrissat and Davies, 1997). All of them have two catalytic acidic residues on the C-termini of β-4 and β-5, and another acidic residue on the C-terminal loop of β-7 (Figure 3a). This functional group can be called either the 4/5 superfamily or the 4/5/7 superfamily, in the fashion of the 4/7 superfamily. The substrates of this subgroup range from starch (E.C. 3.2.1.1, 2.4.1.19), glycogen (E.C. 3.2.1.1, 3.2.1.68), dextrin, isomaltose, palatinose (E.C. 3.2.1.10), amylaceous polysaccharides (E.C. 3.2.1.60), maltodextrin (E.C. 2.4.1.19), amylopectin and β-limit dextrin (E.C. 3.2.1.68) to pullulan (E.C. 3.2.1.135), although the TIM barrel fold is relatively conserved. In the glycosidase structures from HF-13, a distinct domain (40–250 amino acids) that protrudes from the catalytic TIM barrel fold between the β-3 strand and α-3 helix varies with enzyme specificity (Janecek et al., 1997). Indeed, the substrates interact with this small domain (data not shown). The broad specificity seems to be due to the protruding domain, rather than the conserved TIM barrel fold itself.
The second structural cluster, S2, is divided into two functional subgroups, F2 and F3. The second functional subgroup, F2, is composed of the S2 cluster members in the PCA analyses from five Henrissat families, HF-1, 2, 5, 10 and 17, which have the retention mechanism (e→e) (Henrissat and Davies, 1997). All of them have the two catalytic acidic residues on the C-termini of β-4 and β-7 (Figure 3a). This functional group has been dubbed the 4/7 superfamily (Jenkins et al., 1995). The substrates of the F2 subgroup are very diverse, ranging from cellulose (E.C. 3.2.1.4, 3.2.1.91), lichenin and β-D-glucan (E.C. 3.2.1.4, 3.2.1.73), cellotetraose (E.C. 3.2.1.91), xylan (E.C. 3.2.1.91), β-D-glucoside, β-D-galactoside, α-L-arabinoside and β-D-fucoside (E.C. 3.2.1.21, 3.2.1.23), β-D-xyloside (E.C. 3.2.1.21), β-D-glucuronoside (E.C. 3.2.1.31) and 1,3-β-D-glucan (E.C. 3.2.1.39) to 6-phospho-β-D-galactoside, phospho-β-D-glucoside (E.C. 3.2.1.85) and thioglucoside (E.C. 3.2.3.1).
The third functional subgroup, F3, is composed of β-amylase in the S2 cluster from HF-14, whose mechanism is inversion (a→e) (Henrissat and Davies, 1997). These enzymes have the two catalytic acidic residues on the C-termini of β-4 and β-7, like the original 4/7 superfamily (Figure 3a). Therefore, this subgroup can also be called the 4/7 superfamily. The substrates of this subgroup are polysaccharides such as starch and glycogen (E.C. 3.2.1.2).
The third structural cluster, S3, is divided into two functional subgroups, F4 and F5. The fourth functional subgroup, F4, is composed of chitinase in the S3 cluster from HF-18, which has the retention mechanism (e→e) (Henrissat and Davies, 1997). This enzyme has the two catalytic acidic residues on the C-termini of β-4 and β-6 (Figure 3a). The substrates of the F4 subgroup are chitin and chitodextrins, which contain N-acetyl-glucosaminide residues.
The fifth functional subgroup, F5, is composed of hevamine and endo-β-N-acetylglucosamidase in the S3 cluster from HF-18, which have the retention mechanism (e→e) (Henrissat and Davies, 1997). Although the enzymes have two acidic residues on the C-terminus of β-4, only one is catalytic, acting as a general acid (Figure 3a). The second functional group is provided by the substrate. The substrates of the F5 subgroup range from chitin (E.C. 3.2.1.14), chitodextrins (E.C. 3.2.1.14, 3.2.1.17) and peptidoglycan (N-acetyl-D-glucosamine) (E.C. 3.2.1.17) to high-mannose glycopeptides and glycoproteins containing the [Man(GlcNAc)2]Asn structure (E.C. 3.2.1.96), all of which contain N-acetyl-glucosaminide residues.
The sixth functional subgroup, F6, is composed of chitobiase of the S4 cluster in the PCA plots from HF-20, which also has the retention mechanism (e→e) (Henrissat and Davies, 1997). This group has the two catalytic acidic residues on the C-termini of β-4 and β-8 (Figure 3a). Therefore, this subgroup can be called the 4/8 superfamily, in the fashion of the 4/7 superfamily. The substrates of this enzyme are the terminal non-reducing N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides, which also have an N-acetyl group.
As the only feature common to the six functional subgroups is a catalytic acid residue on the C-terminus of the β-4 strand, a CORA alignment was performed to determine whether the catalytic residues superpose in the three-dimensional structures. CORA analyses can identify the consensus positions and the most conserved structural characteristics by multiply aligning tertiary structures of proteins (Orengo, 1999). The method is based on the SSAP program and uses residue environments to identify equivalences. In concept it is similar to the 'iterative' PSI-BLAST program, but works on structural alignments rather than sequence alignments. However, the catalytic aspartic acid on β-4 of α-amylase of the F1 subgroup could not be aligned with the corresponding residues of any other subgroup (Figure 3b). Notwithstanding the insertion of a small helix, the catalytic residue on β-4 of the F3 subgroup, β-amylase, could be aligned automatically with the catalytic residue of the F2 subgroup. Surprisingly, the catalytic residues from the F4 subgroup [chitinase (1ctn)], the F5 subgroup [hevamine (2hvm)] and the F6 subgroup [chitobiase (1qba)] could all be aligned together (Figure 3b). Considering that the substrates of all three subgroups are N-acetylated sugar molecules, they might be distantly related to each other. (The whole CORA alignment can be found at http://www.biochem.ucl.ac.uk/bsm/barrel/glyco/).
Analysis of catalytic residues by three-dimensional templates: functional analysis
By using three-dimensional templates, the relative disposition of catalytic residues in the different families was analysed. To cover all the glycosylase enzymes, excluding the non-enzymes narbonin and concanavalin B, at least nine TESS templates were required. These templates are summarized in Table III and Figure 4. For the two subgroups S1 and S4, described above, only one template was required to cover all the members (Table III). In contrast, for the other two subgroups, S2 and S3, four and three templates, respectively, were required to cover all the active sites (Table III). In order to analyse the specificity and sensitivity of the templates, they were scanned against the whole PDB to find whether they matched other proteins which are not glycosidases. Such false hits would mean that the templates were not sufficiently specific. In practice, all templates proved very specific, even between glycosidases. Histograms of the true and false hits, which result from scanning the PDB database using the TESS templates from two functional subgroups (F1, F2), are shown in Figure 4A and B. Here, false hits are matches found to clusters of residues in non-glycosidases or to sites which do not correspond to the active sites within the glycosidases. The template should find all (or most) of the active sites (true hits) with a low RMSD, and any additional random matches (false hits) should be found at higher RMSD values, with a clear separation between true and false matches. The RMSD that separates the true hits from the false hits is defined as the cut-off threshold. In practice, a template may miss some true hits (RMSD > threshold), or find some false hits (RMSD < threshold, but an incorrect match). A template with few, if any, false positives is considered very specific, whereas a template with very few false negatives (missed hits) is considered highly sensitive. Owing to the different specificity and sensitivity, the threshold and the numbers of false positives and false negatives varied for each template (Figure 4A, B).
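The bookkeeping behind "specific" and "sensitive" can be sketched as a simple tally over scan results. This is an illustrative sketch, not part of TESS: the input layout (an RMSD paired with a flag saying whether the match is a genuine active site) and the function names are assumptions, and a match exactly at the threshold is arbitrarily counted as accepted.

```python
from typing import Dict, List, Tuple

def evaluate_template(hits: List[Tuple[float, bool]], threshold: float) -> Dict[str, int]:
    """Tally scan results against an RMSD cut-off.

    Each hit is (rmsd_in_angstrom, is_genuine_active_site).
    """
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for rmsd, genuine in hits:
        accepted = rmsd <= threshold
        if accepted and genuine:
            counts["true_positive"] += 1
        elif accepted and not genuine:
            counts["false_positive"] += 1   # lowers specificity
        elif genuine:
            counts["false_negative"] += 1   # missed active site, lowers sensitivity
        else:
            counts["true_negative"] += 1
    return counts

def sensitivity(counts: Dict[str, int]) -> float:
    """Fraction of genuine active sites recovered below the threshold."""
    found = counts["true_positive"]
    return found / (found + counts["false_negative"])
```

Moving the threshold trades one error for the other: raising it recovers high-RMSD true hits at the cost of admitting more false ones, which is why each template needs its own cut-off.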
Generally, in retaining enzymes, the two carboxyl groups are close together (4.5–5.5 Å), whilst, in inverting enzymes, they are further apart (9–9.5 Å), which allows the insertion of a water molecule (McCarter and Withers, 1994; Davies and Henrissat, 1995). However, there are some exceptions in the glycosidase TIM barrel superfamily. In spite of having the retaining mechanism, the distances are larger than the average in chitinase (S3 structural subgroup, F4 functional subgroup) from HF-18 (8.8 Å) and chitobiase (S4-F6 subgroup) from HF-20 (9.5 Å) (see Figure 4C). In addition, in the case of β-amylase (S2-F3 subgroup) from HF-14, the distance (7.6 Å) is slightly shorter than the average observed for the inverting mechanism (see Figure 4C). In the case of chitobiase (template-9), the retention mechanism is very different from that of many retaining glycosidases, as described in detail below, which explains the longer distance between the two carboxyl carbons.
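The distance criterion can be written down as a small check. This is a sketch under stated assumptions: the typical ranges are taken as 4.5–5.5 Å (retaining) and 9–9.5 Å (inverting), the 0.5 Å tolerance is an arbitrary illustrative choice, and the function names are hypothetical. The label describes only what the geometry would suggest, not the actual mechanism.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

RETAINING_RANGE = (4.5, 5.5)  # Angstrom, typical for retaining enzymes
INVERTING_RANGE = (9.0, 9.5)  # Angstrom, typical for inverting enzymes

def carboxyl_distance(a: Point, b: Point) -> float:
    """Distance between the two carboxyl carbon atoms (requires Python 3.8+)."""
    return math.dist(a, b)

def expected_mechanism(distance: float, tolerance: float = 0.5) -> str:
    """Label a carboxyl-carboxyl distance by the range it falls into.

    `tolerance` widens both ranges and is an assumption of this sketch.
    """
    low, high = RETAINING_RANGE
    if low - tolerance <= distance <= high + tolerance:
        return "retaining-like"
    low, high = INVERTING_RANGE
    if low - tolerance <= distance <= high + tolerance:
        return "inverting-like"
    return "atypical"
```

Applied to the values quoted above, chitinase (8.8 Å) and chitobiase (9.5 Å) would be labelled inverting-like despite being retaining enzymes, and β-amylase (7.6 Å) would fall between the two ranges, which is exactly why distance alone cannot assign the mechanism in this superfamily.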
The active site structures hit by template-1 comprise at least five types of glycosidases, as defined by the E.C. numbers, and one glycosyltransferase (cyclodextrin glycosyltransferase), all from the S1 or F1 subgroup, which corresponds to HF-13 (Table III). The active sites have three conserved acidic residues, an aspartic acid on the C-terminus of β-4, a glutamic acid on the C-terminus of β-5 and an aspartic acid on loop-7, as well as some other active-site residues (see Table III, Figure 3a). The aspartic acid on the C-terminus of β-4 is the nucleophile, whilst the glutamic acid on the C-terminus of β-5 acts as a general acid (Uitdehaag et al., 1999) (see Figure 3a). To make the template specific, four more key residues, which are conserved in the CORA structural alignment and located at the active site, also had to be included in template-1 (Table III). These four residues are also essential for catalysis. Tyr250 and His297 (of 1bf2) seem to be involved in substrate binding, whilst Arg373 can interact with the highly conserved water molecule in the HF-13 enzymes, which lies at the centre of the three catalytic acidic residues and seems to be essential for hydrolysis (Watanabe et al., 1997; Katsuya et al., 1998). Asp292 is involved in bridging between Arg373 and Tyr250, thereby making the orientation of the side chain of Tyr250 favourable for interaction with the substrate sugar residue (Katsuya et al., 1998). There are only three false hits in the histogram, all of which occur in enzymes in HF-13 [chains A and B of α-amylase II (RMSD 2.35 and 3.87 Å, respectively) and cyclodextrin glucanotransferase (RMSD 3.35 Å); see Figure 4A]. For these three false hits, only one residue is incorrectly identified, as the same types of amino acids are located near the active-site residues. In addition, a true hit with a high RMSD (3.63 Å) was also found in chain B of α-amylase II, whose active site has different positions and orientations of the tyrosine and the aspartic acid on loop-7 from template-1 (Figure 4A). The template did not hit any other proteins in the PDB. Therefore, this template is considered very specific and sensitive to the S1-F1 (HF-13) glycosidases.
In the second structural subgroup, S2, four templates were necessary to cover all the active sites (Table III). The glutamic acid on the C-terminus of β-4 acts as a general acid/base, whereas the one on β-7 is either a stabilizer of the carbonium ion or a nucleophile (White et al., 1996; Notenboom et al., 1998; Figure 3a). Using CORA to generate the multiple alignment of the three-dimensional structures, the two glutamic acid residues on the C-termini of β-4 and β-7 were well aligned throughout the structures of HF-1, 2, 5, 10 and 17 (F2 subgroup). This result is consistent with the report on the positions of the catalytic residues of the 4/7 superfamily (Jenkins et al., 1995). Template-2 has two conserved acidic residues, a glutamic acid on the C-terminus of β-4 and another glutamic acid on β-7. The template used these two residues, plus an asparagine adjacent to the first glutamic acid on the C-terminus of β-4. This asparagine is conserved throughout the 4/7 superfamily (HF-1, 2, 5, 10 and 17) (Jenkins et al., 1995). Template-2 showed two peaks of true hits, one with an RMSD lower than 2.0 Å (peak position 1.0–1.1 Å) and a broader one with an RMSD higher than 2.7 Å (see Figure 4B). The former peak contains only the glycosidases from HF-1, 2, 5 and 17, whilst the latter is mostly composed of the HF-10 enzymes. Although a few enzymes from HF-1, 2 and 17 are found among the true hits with an RMSD higher than 2.0 Å, most true hits had a lower RMSD. In contrast, all of the true hits from HF-10 had an RMSD larger than 2.7 Å. As the second peak, with the true hits from HF-10, is buried among false hits, the active sites of the HF-10 enzymes are difficult to identify in practice with this template. Hence, this template can only identify HF-1, 2, 5 and 17, but not HF-10. The false hits, which lie around RMSD 2.0–2.6 Å, are structures from other enzymes, such as anionic trypsin, asparaginase, glucose-specific phosphotransferase and acyl-CoA dehydrogenase, whose folds are not TIM barrels. The residues matched by the template in these proteins were not involved in their active sites. Two of the three residues in the false positive hits tend to be close together in the sequence.
For the active sites from HF-10, which also have two glutamic acids on the C-termini of β-4 and β-7, template-3 was created to hit them specifically, without including the asparagine on the C-terminus of β-4 (Table III). Instead, a histidine residue on β-6 (H723 of 1xyz), which is specific to HF-10, was included in the template (Table III). This histidine is likely to be involved in stabilization of the carboxylate of the nucleophilic glutamic acid on the C-terminus of β-7 and ionization of the nucleophile during catalysis (Notenboom et al., 1998). The CORA alignment indicates that this histidine is replaced by tyrosine in HF-1, 2, 5 and 17, which makes template-3 specific to HF-10. Furthermore, on superposing the first glutamic residue of template-2 (from 1bhg; HF-2) and template-3 (from 1xyz; HF-10), the C atoms of the second glutamic residues from each template were separated by 3.9 Å from each other (Table III, b). Considering the results of the TESS and CORA analyses, the side chains of the two glutamic residues are oriented differently in HF-10 compared with HF-1, 2, 5 and 17, although the relative positions of these residues are similar.
Thioglucosidase (myrosinase) (E.C. 3.2.3.1), which hydrolyses S-glycosyl compounds instead of O-glycosyl compounds, is a natural mutant of β-glucosidase (E.C. 3.2.1.21) from HF-1. The glutamic acid on the C-terminus of β-4 is mutated to a glutamine residue in myrosinase (Burmeister et al., 1997). Therefore, in order to hit the active site of myrosinase, another template, template-4, was created, in which the glutamine residue was included (Table III). The separation of true and false hits is very clear for template-4 (data not shown). This enzyme indicates that the exchange of the acidic residue for a residue with a carboxamide side chain (asparagine or glutamine) might lead to an engineered enzyme that can cleave S-glycosyl compounds.
In another member of the S2 structural group [β-amylase (E.C. 3.2.1.2) from HF-14], loop-4 contains a small helix with a catalytic glutamic acid residue. In this F3 functional subgroup, the second glutamic acid on the C-terminus of β-7 is hydrogen-bonded to a water, which, in turn, attacks the anomeric carbon of the substrate, reversing the configuration of the substrate (Mikami et al., 1994; Totsuka and Fukazawa, 1996). This enzyme uses the inverting mechanism (a→e) (Table II; Henrissat and Davies, 1997). Although β-amylase from HF-14 also has two glutamic acid residues on the C-termini of β-4 and β-7, neither template-2 nor template-3 could hit this active site. Therefore, using two other conserved residues in the active site (Thr342 and Asn381 of 1byb) as well as the two catalytic glutamic acids, template-5 was created to model the active sites in β-amylase (Table III). The two additional residues (Thr342 and Asn381), as well as the glutamic acid on the C-terminus of β-7, are hydrogen-bonded to a water molecule, which may attack the anomeric carbon (Mikami et al., 1994; Totsuka and Fukazawa, 1996). The separation of true and false hits is very clear for template-5 (data not shown). On superposing the side chain atoms of the first glutamic residues from template-3 (1xyz; HF-10) and template-5 (1byb; HF-14), the C atoms of the second glutamic residue from both templates were close to each other (1.5 Å), although the corresponding atoms from template-2 (1bhg; HF-2) and template-5 were separated by 4.5 Å (Table III, b). In contrast, the CORA alignments indicate that only the glutamic acid on the C-terminus of β-4 in β-amylase (HF-14) can be aligned with the other catalytic glutamic acid residues on the C-terminus of β-4 from HF-1, 2, 5, 10 and 17, whilst the second glutamic acid on the C-terminus of β-7 was shifted by one residue (data not shown). Considering both the CORA and TESS analyses, HF-10 seems to be intermediate between the enzymes from the four families HF-1, 2, 5 and 17 and β-amylase from HF-14. Indeed, in the PCA analyses, HF-10 appears closer to the β-amylases than the other S2 proteins (see Figure 2).
In the third structural cluster, S3, three different templates had to be created to cover the active sites, excluding the non-enzymes narbonin and concanavalin B (Table III, a). Chitinase, hevamine and endo-β-N-acetylglucosamidase have two acidic residues, an aspartic acid and a glutamic acid separated by one residue, on the C-terminus of β-4. The side chains of the two residues are hydrogen-bonded to each other in the structures of hevamine and endo-β-N-acetylglucosamidase, although neither narbonin nor concanavalin B has such an acidic-residue pair, as one of these residues has changed, leading to elimination of enzymatic activity (Terwisscha van Scheltinga et al., 1996). However, in the case of chitinase, the corresponding aspartic acid points down to form a hydrogen bond with a different residue instead of pointing up to the catalytic glutamic acid residue (Terwisscha van Scheltinga et al., 1996; see Figure 4C). The distances between the carbon atoms of the carboxyl groups of the two residues indicated in Figure 4C also differ by 2 Å. This leads to the need for a separate template for chitinase, distinct from other members of HF-18 (Table III, a). Instead, an aspartic acid on the C-terminus of β-6 (Asp391 of 1ctn) was proposed to have a negative charge, which would allow it to stabilize the carboxy-anion intermediate (Perrakis et al., 1994). On this basis, template-6 was created (Table III, a). Although the aspartic acid on the C-terminus of β-6 is well conserved in bacterial chitinases from HF-18 (Perrakis et al., 1994), this residue could not be found in hevamine, endo-β-N-acetylglucosamidase and narbonin, using a CORA alignment. Therefore, template-6 is very specific to chitinase enzymes.
In the case of hevamine, the second acidic residue (Glu127 of 2hvm) on the C-terminus of β-4 is thought to be the proton donor. Although the aspartic acid (Asp125 of 2hvm) is conserved, it seems to be too far away from the substrate to be directly involved in catalysis (Terwisscha van Scheltinga et al., 1994). Instead, the N-acetyl group of the substrate was proposed to stabilize the positively charged intermediate, where a covalent bond is formed between the carbonyl oxygen of the N-acetyl group and the axial position of the C1 atom of the glucose residue (Terwisscha van Scheltinga et al., 1995). The Oδ1 atom of the aspartic acid (Asp125 of 2hvm) is close enough to interact with the oxygen atom of the oxazoline group formed from the N-acetyl group (Terwisscha van Scheltinga et al., 1995). In contrast, for endo-β-N-acetylglucosamidase, although the catalytic mechanism is not clear, the first acidic residue seems to stabilize the intermediate (Rao et al., 1999). The CORA alignment showed that other residues in these active sites were not conserved between hevamine and endo-β-N-acetylglucosamidase, so two separate templates were created (template-7 and 8; Table III, a). Although many regions in HF-18 are highly conserved, only a few residues are fully conserved (Terwisscha van Scheltinga et al., 1996), leading to the requirement for separate templates. For the three templates in the S3 group, the separation of the true and false hits is very clear, although template-7 had one true hit with an RMSD larger than the cut-off (data not shown).
In the fourth structural cluster, S4, a single template (template-9) was created based on the only member enzyme, chitobiase from HF-20 (Table III, a). Chitobiase has a glutamic acid on the C-terminus of β-4, which functions as an acid, and another glutamic acid on the C-terminus of β-8 (Tews et al., 1996). However, the second acidic residue does not function as either a nucleophile or an intermediate stabilizer (Tews et al., 1996). As in hevamine, the N-acetyl group of the substrate replaces the nucleophilic base or intermediate stabilizer. Instead, the second acidic residue on the C-terminus of β-8 is located close to the water molecule, which attacks the C1 atom of the substrate to complete the hydrolysis (Tews et al., 1996). An additional residue in the active site, Arg349, had to be included in template-9 to make it specific. This arginine plays a vital role in anchoring the substrate, N-acetyl-glucosamine (Tews et al., 1996). The separation of true and false hits is very clear for template-9 (data not shown).
PSI-BLAST sequence analysis beyond the structural subgroups
Using PSI-BLAST, the relationships beyond the functional subgroups were further analysed. Sequence profiles derived from the two closely related subgroups, F4 (chitinase) and F5 (lysozyme, hevamine), cross-hit each other with low expectation values (E < 2e-16). Similarly, the profiles derived from endoglucanase (1edg; E.C. 3.2.1.4; HF-5) and β-mannanase (1bqcA; E.C. 3.2.1.78; HF-5) in the F2 (endoglucanase) subgroup cross-hit β-amylase (1b9z; E.C. 3.2.1.2; HF-14) in the F3 subgroup, although the E values were higher than the lenient threshold (E values <0.001). However, in reverse, the F3 profile did not hit any member of the F2 subgroup.
Moreover, PSI-BLAST detected relationships beyond the structural subgroups (S1–S4). Cellulase (1ceo, 1eceA; E.C. 3.2.1.4) in the F2 subgroup cross-hits various members of F1 (α-amylase) with significantly low E values (E < 1e-6), although no profiles derived from the F1 subgroup could hit any member of the F2 subgroup.
The relationships beyond the subgroups are summarized in Figure 5. Considering that the substrates of the F4, F5 and F6 subgroups are N-acetylated sugar molecules, and that the catalytic residues on β-4 could be aligned together, the three functional subgroups could be clustered together using the structural data. Looking at the schematic view of these subgroups, the F1 subgroup also has a third acidic residue on the C-terminal loop of β-7 that could correspond to the second acidic residue on the C-terminus of β-7 in the F2 and F3 subgroups (Figure 3a). However, the corresponding residues were not aligned, even in the CORA structure-based alignment.
Discussion |
---|
Two questions remain about the evolutionary relationships in the glycosidases. Firstly, are the four structural groups evolutionarily related? Although the first two groups (S1 and S2) appear related according to PSI-BLAST, the other relationships are still ambiguous (Figure 5). The maximum sequence identities between the structural groups S1–S4 are only between 8 and 12%, whilst the maximum SSAP scores range between 72 and 74, which is not high enough to validate an evolutionary relationship. Some SSAP scores are much lower. There is only one feature common to the six functional subgroups (F1–F6): all of them have the first catalytic acidic residue either on the C-terminus of the fourth strand (β-4) or on the fourth C-terminal loop (loop-4) (see Figure 3a). However, these residues could not all be aligned, even in the multiple structural alignments generated by CORA.
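For readers who want to see how a pairwise identity figure such as the 8–12% quoted above is typically obtained from an alignment, a minimal sketch follows. It assumes identity is counted over columns where both sequences have a residue; conventions for the denominator vary, and the text does not state which was used here. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns aligned without gaps.

    Both inputs must be rows of the same alignment, so of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (not real glycosidase sequences): 6 gap-free columns, 5 matches.
identity = percent_identity("ACDE-FGH", "ACQEWF-H")
print(f"{identity:.1f}%")
```

Dividing by the length of the shorter sequence or by the full alignment length would give lower values for the same alignment, so identities are only comparable when the same convention is used throughout.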
Secondly, if they are related, which subgroup is closest to the ancestral glycosidase? Among the six functional subgroups, only hevamine (F5) has a single domain, whilst the remainder have multiple domains. In particular, the α-amylase (F1) and chitinase (F4) subgroups have a relatively large domain inserted in the TIM barrel (Figure 3). There must have been some domain shuffling for the enzymes to obtain their specific functions during protein evolution. In addition, the sequence identities and structural similarities (SSAP scores) between the subgroups are very low, and β-amylase shows higher structural similarity to three other TIM barrel proteins: pyruvate kinase (SSAP 79.6; 215 residues overlapped) and two phosphate-binding enzymes, phosphoribosyl anthranilate isomerase (SSAP 78.8; 197 residues overlapped) and quinolinic acid phosphoribosyltransferase (SSAP 78.7; 147 residues overlapped). Thus, even with all the data considered here, it is difficult to build a reliable tree that reveals the evolutionary relationships between these proteins and the characteristics of their ancestor.
Notes |
---|
4 To whom correspondence should be addressed. E-mail: thornton{at}biochem.ucl.ac.uk
Acknowledgments |
---|
References |
---|
Burmeister,W.P., Cottaz,S., Driguez,H., Iori,R., Palmieri,S. and Henrissat,B. (1997) Structure, 5, 663–675.
Davies,G. and Henrissat,B. (1995) Structure, 3, 853–859.
Farber,G.K. and Petsko,G.A. (1990) Trends Biochem. Sci., 15, 228–234.
Hennig,M., Schlesier,B., Dauter,Z., Pfeffer,S., Betzel,C., Hohne,W.E. and Wilson,K.S. (1992) FEBS Lett., 306, 80–84.
Hennig,M., Jansonius,J.N., Terwisscha van Scheltinga,A.C., Dijkstra,B.W. and Schlesier,B. (1995) J. Mol. Biol., 254, 237–246.
Henrissat,B. (1991) Biochem. J., 280, 309–316.
Henrissat,B. and Bairoch,A. (1993) Biochem. J., 293, 781–788.
Henrissat,B. and Bairoch,A. (1996) Biochem. J., 316, 695–696.
Henrissat,B. and Davies,G. (1997) Curr. Opin. Struct. Biol., 7, 637–644.
Janecek,S., Svensson,B. and Henrissat,B. (1997) J. Mol. Evol., 45, 322–331.
Jenkins,J., Lo Leggio,L., Harris,G. and Pickersgill,R. (1995) FEBS Lett., 362, 281–285.
Jones,S., Stewart,M., Michie,A., Swindells,M.B., Orengo,C. and Thornton,J.M. (1998) Protein Sci., 7, 233–242.
Katsuya,Y., Mezaki,Y., Kubota,M. and Matsuura,Y. (1998) J. Mol. Biol., 281, 885–897.
Kleywegt,G.J. (1999) J. Mol. Biol., 285, 1887–1897.
McCarter,J.D. and Withers,S.G. (1994) Curr. Opin. Struct. Biol., 4, 885–892.
Mikami,B., Degano,M., Hehre,E.J. and Sacchettini,J.C. (1994) Biochemistry, 33, 7779–7787.
Nagano,N., Hutchinson,E.G. and Thornton,J.M. (1999) Protein Sci., 8, 2072–2084.
Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48, 443–453.
NC-IUBMB (1992) In The Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (ed.), Enzyme Nomenclature, 1992. Academic Press, New York.
Notenboom,V., Birsan,C., Nitz,M., Rose,D.R., Warren,R.A. and Withers,S.G. (1998) Nat. Struct. Biol., 5, 812–818.
Orengo,C.A. (1999) Protein Sci., 8, 699–715.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Pearl,F.M.G., Lee,D., Bray,J.E., Sillitoe,I., Todd,A.E., Harrison,A.P., Thornton,J.M. and Orengo,C.A. (2000) Nucleic Acids Res., 28, 277–282.
Perrakis,A., Tews,I., Dauter,Z., Oppenheim,A.B., Chet,I., Wilson,K.S. and Vorgias,C.E. (1994) Structure, 2, 1169–1180.
Rao,V., Cui,T., Guan,C. and Van Roey,P. (1999) Protein Sci., 8, 2338–2346.
Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208, 1–22.
Terwisscha van Scheltinga,A.C., Kalk,K.H., Beintema,J.J. and Dijkstra,B.W. (1994) Structure, 2, 1181–1189.
Terwisscha van Scheltinga,A.C., Armand,S., Kalk,K.H., Isogai,A., Henrissat,B. and Dijkstra,B.W. (1995) Biochemistry, 34, 15619–15623.
Terwisscha van Scheltinga,A.C., Hennig,M. and Dijkstra,B.W. (1996) J. Mol. Biol., 262, 243–257.
Tews,I., Perrakis,A., Oppenheim,A., Dauter,Z., Wilson,K.S. and Vorgias,C.E. (1996) Nat. Struct. Biol., 3, 638–648.
Todd,A.E., Orengo,C.A. and Thornton,J.M. (2001) J. Mol. Biol., 307, 1113–1143.
Totsuka,A. and Fukazawa,C. (1996) Eur. J. Biochem., 240, 655–659.
Uitdehaag,J.C., Mosi,R., Kalk,K.H., van der Veen,B.A., Dijkhuizen,L., Withers,S.G. and Dijkstra,B.W. (1999) Nat. Struct. Biol., 6, 432–436.
Wallace,A.C., Borkakoti,N. and Thornton,J.M. (1997) Protein Sci., 6, 2308–2323.
Watanabe,K., Hata,Y., Kizaki,H., Katsube,Y. and Suzuki,Y. (1997) J. Mol. Biol., 269, 142–153.
White,A., Tull,D., Johns,K., Withers,S.G. and Rose,D.R. (1996) Nat. Struct. Biol., 3, 149–154.
Received April 17, 2001; revised July 24, 2001; accepted July 31, 2001.