Fold recognition study of [alpha]3-galactosyltransferase and molecular modeling of the nucleotide sugar-binding domain

Anne Imberty5, Cédric Monier, Emmanuel Bettler, Solange Morera1, Paul Freemont2, Manfred Sippl3, Hannes Flöckner3, Wolfgang Rüger4 and Christelle Breton

Centre de Recherches sur les Macromolécules Végétales (associated with Université Joseph Fourier), CNRS, BP 53, F-38041 Grenoble Cedex 9, France, 1Laboratoire d'Enzymologie et de Biologie Structurales, UPR 9063 CNRS, 91198 Gif-sur-Yvette Cedex, France, 2Molecular Structure and Function Laboratory, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX, UK, 3Center of Applied Molecular Engineering, University of Salzburg, Jakob-Haringerstrasse 1, A-5020 Salzburg, Austria and 4Arbeitsgruppe Molekulare Genetik, Fakultät für Biologie, Ruhr Universität, Bochum, Germany

Received on November 6, 1998; revised on December 14, 1998; accepted on December 17, 1998

The structure and fold of the enzyme responsible for the biosynthesis of the xenotransplantation antigen, namely pig [alpha]3-galactosyltransferase, have been studied by computational methods. Secondary structure predictions indicated that [alpha]3-galactosyltransferase and related protein family members, including blood group A and B transferases and Forssman synthase, are likely to consist of alternating [alpha]-helices and [beta]-strands. Fold recognition studies predicted that [alpha]3-galactosyltransferase shares the same fold as the T4 phage DNA-modifying enzyme [beta]-glucosyltransferase. This latter enzyme displays a strong structural resemblance to the core of glycogen phosphorylase b. By using the three-dimensional structures of [beta]-glucosyltransferase and of several glycogen phosphorylases, the nucleotide binding domain of pig [alpha]3-galactosyltransferase was built by knowledge-based methods. Both the UDP-galactose ligand and a divalent cation were included in the model during the refinement procedure. The final three-dimensional model is in agreement with our present knowledge of the biochemistry and mechanism of [alpha]3-galactosyltransferases.

Key words: biosynthesis/xenotransplantation/[alpha]3-galactosyltransferase/[beta]-glucosyltransferase

Introduction

Xenotransplantation is now considered the main potential remedy for the shortage of donor organs (see Nature, special issue 391, 1998). However, when a pig organ is transplanted into a human, hyperacute vascular rejection of the graft occurs because preformed human antibodies recognize the [alpha]Gal(1-3)[beta]Gal terminal carbohydrates present on porcine endothelial cells (Galili, 1991; Cooper et al., 1994). This so-called xeno-antigen is present on the cells of most mammals, with the exception of humans and Old World monkeys. The UDP-Gal:[beta]1-4GlcNAc-R [alpha]3-galactosyltransferase ([alpha]3-GalT, EC 2.4.1.151) is the enzyme responsible for the formation of the [alpha]-Gal epitope and is therefore of great interest in the field of xenotransplantation. Access to the 3D structure of this enzyme would open new routes for the design of inhibitors that could act in vitro or in vivo. It would eventually also allow the enzyme to be engineered with the aim of producing xeno-oligosaccharides or related compounds. Recently, recombinant bovine [alpha]3-GalT has been used for chemoenzymatic synthesis of the [alpha]-Gal epitope (Joziasse et al., 1990; Fang et al., 1998). These oligosaccharides can be immobilized on an affinity column for depleting the anti-Gal antibodies of the recipient (Taniguchi et al., 1996).

Several mammalian [alpha]3-GalT genes have been cloned, namely from pig (Strahan et al., 1995), cow (Joziasse et al., 1989), mouse (Joziasse et al., 1992), and marmoset (Henion et al., 1994). Like all other Golgi-resident eukaryotic glycosyltransferases, these enzymes are type-II membrane proteins consisting of a short N-terminal cytosolic tail, a transmembrane region, a stem, and a C-terminal catalytic domain. Amino acid sequence similarities have been found between the [alpha]3-GalTs and other [alpha]3-GalNAc- and [alpha]3-Gal-transferases that use different oligosaccharide acceptors (Yamamoto et al., 1990; see Table I). Blood group A and B transferases, which make [alpha]3-GalNAc and [alpha]3-Gal linkages, respectively, require fucosylated N-acetyllactosamine as acceptor, whereas the Forssman synthase (Haslam and Baenziger, 1996) requires [beta]-GalNAc. More distant amino acid sequence similarities have been detected with [beta]4-GalTs (Joziasse et al., 1989; Breton et al., 1998).

In the absence of crystal structures of glycosyltransferases, the aim of the present work is to extract structural information from the amino acid sequences. Since [alpha]3-GalTs are not homologous to any protein of known 3D structure, the first step is to search for proteins that could share the same fold. Fold recognition is a theoretical approach which allows the alignment of one sequence with one structure by a process referred to as "threading" (Lemer et al., 1995). In practice, a library of known 3D structures is searched to determine the folds that give the best alignments with the sequence of interest. Threading the sequence against all possible folds, then sorting and ranking the possible solutions, form the three steps of a fold recognition study. Several programs are available to carry out such a process and can be classified into two families based on their algorithms: the prediction-based methods align the predicted secondary structure of the query sequence with the secondary structure elements from known crystal structures (Rost et al., 1997), whereas potential-based methods use mean force potentials derived from a database of known structures (Jones and Thornton, 1996; Vajda et al., 1997; Rooman and Gilis, 1998). Recently, such approaches were successfully used to predict the fold of the C-terminal lectin-like domain of polypeptide-GalNAc transferases (Imberty et al., 1997). Here we have applied these methods to the sequence of pig [alpha]3-GalT, the enzyme responsible for the biosynthesis of the xeno-antigen, and have used homology modeling to build the nucleotide-binding domain of this protein.

Table I. Protein sequences with [alpha]3-galactosyltransferase or related activity; accession number refers to GenBank or Swissprot (unless indicated)
Abbreviation | Name | Origin | aa (b) | Product | Accession number
Pig [alpha]-GalT | [alpha]-Galactosyltransferase | Sus scrofa | 371 | [alpha]Gal(1-3)[beta]Gal(1-4)[beta]GlcNAc-R | L36152
 |  |  |  |  | L36535 (clone pPGT-3)
Bov [alpha]-GalT | [alpha]-Galactosyltransferase | Bos taurus | 368 | [alpha]Gal(1-3)[beta]Gal(1-4)[beta]GlcNAc-R | J04989
Mar [alpha]-GalT | [alpha]-Galactosyltransferase | Callithrix sp. | 376 | [alpha]Gal(1-3)[beta]Gal(1-4)[beta]GlcNAc-R | S71333
Mou [alpha]-GalT | [alpha]-Galactosyltransferase | Mus musculus | 368 | [alpha]Gal(1-3)[beta]Gal(1-4)[beta]GlcNAc-R | M85153
 |  |  |  |  | M26925
Pig bgA | A transferase | Sus scrofa | 364 | [alpha]GalNAc(1-3)[[alpha]Fuc(1-2)][beta]Gal-R | AF050177 (hypothetical)
Dog Forss | Forssman synthase | Canis familiaris | 347 | [alpha]GalNAc(1-3)[beta]GalNAc-R | U66140
Hum bgA (a) | A transferase | Homo sapiens | 354 | [alpha]GalNAc(1-3)[[alpha]Fuc(1-2)][beta]Gal-R | J05175 (A34933, P16442)
 |  |  |  |  | X84746 (complete cDNA)
 |  |  | 338 |  | Y11891 (synthetic)
Hum bgB (a) | B transferase | Homo sapiens | 354 | [alpha]Gal(1-3)[[alpha]Fuc(1-2)][beta]Gal-R | 1609195 (PRF databank)
 |  |  | 338 |  | X91874 (synthetic)
(a) For A and B transferases, partial genes with different lengths are also available for human and monkeys: AF00673, AF016622, AF016624, AF016625, PC1120, PC1165, PC1164, D82835 to D82845 for Homo sapiens, PC1168 for Gorilla gorilla, PC1171, AF052079 to AF052086 for Macaca sp., PC1166, PC1167 for Pan troglodytes, PC1169, PC1172, PC1173, AF0019416 to AF0019418 and AF001427 for Papio hamadryas and PC1169, PC1170 for Pongo pygmaeus.
(b) Number of amino acids.

Table II. Percentage of identities in [alpha]3-galactosyltransferase sequences and blood group related enzymes
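Sequence (region) | Bov [alpha]-GalT | Mar [alpha]-GalT | Mou [alpha]-GalT | Hum bgB | Hum bgA | Pig bgA | Dog Forss
Pig [alpha]-GalT (whole) | 84.1 | 82.4 | 74.8 | 36.9 | 36.8 | 36.1 | 36.3
Pig [alpha]-GalT (catalytic) | 88.6 | 86.4 | 79.9 | 43.8 | 43.4 | 43.4 | 42.1
Bov [alpha]-GalT (whole) | - | 82.4 | 73.6 | 37.4 | 37.8 | 36.4 | 36.6
Bov [alpha]-GalT (catalytic) | - | 88.3 | 80.2 | 45.3 | 44.9 | 44.2 | 43.6
Mar [alpha]-GalT (whole) | - | - | 74.8 | 36.4 | 35.8 | 36.4 | 33.8
Mar [alpha]-GalT (catalytic) | - | - | 79.9 | 45.3 | 45.3 | 43.8 | 41.4
Mou [alpha]-GalT (whole) | - | - | - | 39.9 | 39.6 | 38.5 | 36.9
Mou [alpha]-GalT (catalytic) | - | - | - | 48.9 | 48.9 | 46.4 | 44.0
Hum bgB (whole) | - | - | - | - | 96.9 | 62.8 | 42.8
Hum bgB (catalytic) | - | - | - | - | 97.8 | 74.4 | 50.2
Hum bgA (whole) | - | - | - | - | - | 63.0 | 43.3
Hum bgA (catalytic) | - | - | - | - | - | 75.4 | 50.0
Pig bgA (whole) | - | - | - | - | - | - | 44.1
Pig bgA (catalytic) | - | - | - | - | - | - | 51.5
For each pair of sequences, the upper value is the percentage of identity in the whole sequence and the lower value is the percentage of identity in the conserved catalytic domain. The catalytic domain starts at Glu86 in the Pig [alpha]-GalT sequence.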

Results

Sequence alignment and secondary structure prediction

Pairwise sequence alignment was performed for all [alpha]3-GalT and blood group transferase sequences, and the resulting scores are listed in Table II. Both complete and truncated sequences were considered since the transmembrane and stem regions display higher sequence variation than the catalytic domain. All the [alpha]3-GalT sequences are very similar, with identity scores ranging from 80 to 89% when the variable stem is omitted from the analysis. Blood group A and B transferases form another homogeneous group, the pig bgA sequence being 75% identical to the human enzymes. The Forssman synthase is closer to the A and B transferases (about 50% identity) than to the [alpha]3-GalTs (about 42% identity). Nevertheless, within the catalytic domain, the identity score between any two of these enzymes is always higher than 40%, and they could therefore be aligned with little difficulty.
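As an illustration of how identity scores of this kind can be derived from a pairwise alignment, the following minimal Python sketch counts identical residues over the aligned (non-gap) columns, either for the whole alignment or from a given residue onward (for instance the start of a catalytic domain). The function names and the choice of denominator are assumptions made for this sketch; the ALIGN program may use a different normalization, and the toy sequences are not taken from Table I.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are ignored
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_from_residue(aligned_a: str, aligned_b: str,
                          start_residue: int, gap: str = "-") -> float:
    """Identity restricted to the columns starting at residue `start_residue`
    (1-based, counted on the ungapped sequence A)."""
    residue = 0
    for column, x in enumerate(aligned_a):
        if x != gap:
            residue += 1
            if residue == start_residue:
                return percent_identity(aligned_a[column:], aligned_b[column:], gap)
    raise ValueError("start_residue is beyond the end of sequence A")


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only)
    a = "ACDE-F"
    b = "ACDQGF"
    print(percent_identity(a, b))
    print(identity_from_residue(a, b, 3))
```

Restricting the comparison to a sub-region in this way mirrors the distinction made in Table II between whole-sequence and catalytic-domain identities.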

Figure 1 displays the multiple alignment of all of these sequences. The transmembrane and stem regions were excluded from the alignment since it is known that recombinant [alpha]3-GalTs with truncated N-terminal regions are still active (Baisch et al., 1998; Fang et al., 1998). More precisely, a systematic truncation study on the marmoset [alpha]3-GalT demonstrated that the first 90 amino acids can be deleted without altering the activity (Henion et al., 1994). Human bgA is not displayed in this figure because of its very high sequence identity with the bgB enzyme. Only four amino acids differ between bgA and bgB, i.e., Arg176Gly, Gly235Ser, Leu266Met, and Gly268Ala (Yamamoto et al., 1990), the last two being the most crucial in determining nucleotide-sugar specificity (Yamamoto and Hakomori, 1990; Seto et al., 1997).


Figure 1. Multiple alignment of [alpha]3-GalT and blood group transferase sequences. The numbering above refers to the pig [alpha]3-GalT sequence. The different shades of background with white text differentiate the hydrophobic, basic, acidic, and small side-chain amino acids. Pro and Gly have a black background, whereas Trp and Cys are indicated with black text on a white and gray background, respectively. The consensus sequence is given under the alignment: boldface capital letters are used for fully conserved amino acids and numbers for conserved properties (1 for acidic, 2 for small side chain, 3 for basic, 4 for aromatic, and 5 for hydrophobic). Lowercase letters are used for amino acids with partial conservation. The secondary structure predictions from several programs are also displayed, with H for [alpha]-helix and E for [beta]-strand. When a consensus is obtained, the letter is indicated in boldface.

The secondary structure predictions obtained using several programs are also displayed in Figure 1. Both [beta]-strands and [alpha]-helices are predicted to occur. The reliability of the prediction seems to be higher in the first half of the catalytic domain since the different methods used show an excellent agreement. In the second half (residues 250-359 in the pig sequence), there is some degree of variability in the predictions, depending on the method used. However, from these methods it is clearly apparent that [alpha]3-GalTs adopt an [alpha]/[beta] fold with alternating [alpha]-helices and [beta]-strands.
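The idea of a consensus between several secondary structure predictors can be sketched as a column-wise vote. The short Python example below is only an illustration under stated assumptions: each predictor is represented by a string of equal length using H (helix), E (strand), and C (coil), and the function name and the `min_agree` threshold are hypothetical. The rule actually used to derive the boldface consensus in Figure 1 is not specified here.

```python
from collections import Counter


def consensus_secondary_structure(predictions, min_agree=None):
    """Column-wise consensus of several secondary structure strings.

    predictions: list of equal-length strings (e.g. 'H', 'E', 'C' per residue).
    min_agree:   minimum number of predictors that must agree; defaults to all
                 of them (unanimity). Positions without agreement are marked '-'.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    if min_agree is None:
        min_agree = len(predictions)

    consensus = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        consensus.append(state if count >= min_agree else "-")
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical outputs of three predictors for an 8-residue stretch
    preds = ["HHHCCEEE", "HHHCCEEC", "HHCCCEEE"]
    print(consensus_secondary_structure(preds))               # unanimous positions only
    print(consensus_secondary_structure(preds, min_agree=2))  # majority vote
```

Regions where the unanimous consensus contains many undetermined positions would correspond to the less reliable stretches mentioned above.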

Fold recognition studies of [alpha]3-galactosyltransferases

We present here the results obtained for [alpha]3-GalTs using the ProFIT program, which is based on an energy-function potential (Flöckner et al., 1995, 1997; Sippl and Flöckner, 1996). Two databases of 3D structures were created for the threading process, in addition to the default one provided with the program. The first one contains 105 3D structures corresponding to almost all carbohydrate-interacting enzymes (glycosylhydrolases, lectins, toxins, CGTases, etc.), whereas the second contains 38 structures representative of nucleotide-binding proteins. All the mammalian [alpha]3-GalT sequences were used for the threading calculations. The transmembrane and stem regions were not considered. For each sequence, the best hits from the carbohydrate- and the nucleotide-binding protein databanks are listed in Table III and Table IV, respectively. The class of fold and the topology are described as given in the SCOP database (Murzin et al., 1995). All but two of the predicted folds are composed of a mixture of [alpha]-helices and [beta]-strands. Most of them belong to the ([alpha]/[beta]) class, therefore consisting of alternating [alpha] and [beta] elements, a result which is in agreement with the secondary structure prediction (Figure 1). Several other fold prediction programs (see Materials and methods) were also tested, and they all support an [alpha]/[beta] fold prediction for the [alpha]3-GalTs and more precisely an [alpha]/[beta]/[alpha] domain with parallel [beta]-strands.


Table III. Fold prediction for [alpha]3-GalT sequences performed on a database of all available carbohydrate-recognizing proteins
(a) Equivalent to PDB code 2BGT but with coordinates for all loops (Morera and Freemont, personal communication).
(b) Number of gaps needed for alignment and percentage of amino acids aligned.


Table IV. Fold prediction for [alpha]3-GalT sequences performed on a database of representative nucleotide-recognizing proteins
(a) Number of gaps needed for alignment and percentage of amino acids aligned.

When taking into account the number of hits and the score values, the final ranking of the predictions is: (1) one or two domains consisting of a [beta]-sheet with parallel strands surrounded on each side by [alpha]-helices connecting each strand (described as three layers [alpha]/[beta]/[alpha] in Tables III and IV), (2) the classical ([alpha]/[beta])8 TIM-barrel consisting of a closed barrel of parallel [beta]-strands surrounded by connecting helices, and (3) other types of [alpha]/[beta]/[alpha] layers but with mixed orientations of the [beta]-strands. In the final step, when looking for the best candidate, one should also take into account the binding and catalytic function of each candidate protein. All of the nucleotide phosphorylases, synthases, and elongation factors listed in Table IV have the ability to bind nucleotide diphosphates but do not have any specificity for carbohydrates. In Table III, all proteins can bind carbohydrates but only the [beta]-glucosyltransferase from phage T4 ([beta]-GlcT) also contains a nucleotide-binding domain. This enzyme consists of two non-identical domains that adopt similar topologies: a [beta]-sheet of five or six parallel strands surrounded by [alpha]-helices (Vrielink et al., 1994). [beta]-GlcT catalyzes the transfer of glucose from UDP-glucose to hydroxymethylcytosine in modified DNA (Kornberg et al., 1961). Therefore, based on both the fold recognition study and similarity of function, [beta]-glucosyltransferase from phage T4 probably represents the best candidate for the fold of [alpha]3-GalTs.
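The combination of "number of hits" and "score values" into a ranking of fold types can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function name, the tuple layout, the convention that a higher score is better, and the placeholder fold labels and scores, which are not the values of Tables III and IV.

```python
from collections import defaultdict


def rank_folds(hits):
    """Rank fold types by number of hits, then by mean score.

    hits: iterable of (query_name, fold_label, score) tuples.
    Assumption: a higher score means a better sequence-structure fit.
    Returns a list of (fold_label, n_hits, mean_score), best first.
    """
    scores_by_fold = defaultdict(list)
    for _query, fold, score in hits:
        scores_by_fold[fold].append(score)

    summary = [
        (fold, len(scores), sum(scores) / len(scores))
        for fold, scores in scores_by_fold.items()
    ]
    # More hits first; ties broken by higher mean score, then by label
    summary.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return summary


if __name__ == "__main__":
    # Placeholder hits (hypothetical queries, folds, and scores)
    example_hits = [
        ("seqA", "fold_X", 5.0),
        ("seqB", "fold_X", 4.0),
        ("seqA", "fold_Y", 6.0),
        ("seqC", "fold_X", 3.0),
        ("seqB", "fold_Y", 2.0),
    ]
    for fold, n_hits, mean_score in rank_folds(example_hits):
        print(fold, n_hits, mean_score)
```

A purely numerical ranking of this kind would still need the functional filter described above, namely whether the candidate protein binds both a nucleotide and a carbohydrate.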

Molecular modeling of the C-terminal domain

Since the common feature between the phage [beta]-GlcT and the mammalian [alpha]3-GalTs is the use of a UDP-sugar, it can be hypothesized that their 3D similarities will be stronger in the nucleotide-sugar binding domain. In the crystal structure of the [beta]-GlcT/UDP complex, the nucleotide-sugar binding domain (or at least the nucleotide binding domain) has been identified as the classical Rossmann fold corresponding to the C-terminal domain (Vrielink et al., 1994). The catalytic amino acids cannot be identified, since the mechanism is still unknown. As a consequence of the fold recognition study above, it is predicted that the C-terminal region (about 150 amino acids) of mammalian [alpha]3-GalTs acts as the nucleotide-sugar binding domain, and it is therefore only this domain that can be built by homology methods.

Homology modeling methods can be applied in such a case, and the steps involved are detailed below. In the present case, the main problem was producing a satisfactory alignment between the target pig [alpha]3-GalT sequence and a sequence of known 3D structure. The advantages and limits of threading methods are now well identified. They are able to predict the correct protein fold from a sequence with reasonable confidence but encounter difficulties in producing correct alignments between the sequence of interest and the known structure (Jones and Thornton, 1996). Since no homology for [alpha]3-GalT could be detected by classical alignment methods, a method based on the comparison of hydrophobic structural motifs was used, namely the HCA method (Gaboriaud et al., 1987).

Structural homologues for the [beta]-glucosyltransferase

When the crystal structure of [beta]-GlcT was first published, it was reported that this particular topology, consisting of two Rossmann folds connected by an extended loop, was not shared by other protein structures (Vrielink et al., 1994). Subsequently, several powerful algorithms for identifying similarities in 3D structures were developed and the entire structure of [beta]-GlcT was demonstrated to be topographically equivalent to the catalytic core of the much larger glycogen phosphorylase (GP) (Artymiuk et al., 1995; Holm and Sander, 1995). More interestingly, in the crystal structure of glycogen phosphorylase complexed with the pyridoxal 5[prime]-phosphate (PLP) coenzyme (Acharya et al., 1991), the spatial arrangement of PLP is strikingly similar to that of UDP in [beta]-GlcT. In most of the classifications now available for protein topologies, [beta]-GlcT from phage T4 is clustered with glycogen phosphorylase from rabbit and yeast and with the maltodextrin phosphorylase from E. coli. In the present study, we used the 3D alignment between [beta]-GlcT (2BGU) and one of the 22 rabbit glycogen phosphorylase structures (1GPB) proposed in the CAMPASS database (Sowdhamini et al., 1998). Despite a very low sequence identity score (10.6%), all of the secondary structure elements of [beta]-GlcT can be aligned with those in the GP structure.

Model of the C-terminal domain of pig [alpha]3-GalT

Knowledge-based modeling methods rely on the definition of structurally conserved regions (SCRs) in a family of similar structures, which can then be reproduced in the target sequence. In the structural superfamily of [beta]-GlcT/GP, four crystal structures were selected to serve as the basis of the homology modeling procedure: rabbit glycogen phosphorylase (1GPB) (Acharya et al., 1991), the phosphorylated form of yeast glycogen phosphorylase (1YGP) (Lin et al., 1996), E. coli maltodextrin phosphorylase (1AHP) (O'Reilly et al., 1997), and [beta]-glucosyltransferase from phage T4, which corresponds to the published crystal structure (2BGT) (Vrielink et al., 1994) but with determination of the loops that were missing in the first crystal structure (Morera and Freemont, personal communication). The three phosphorylases selected here are complexed with pyridoxal-5[prime]-phosphate.

The HCA method was used to generate a correct alignment between the pig [alpha]3-GalT and the nucleotide-binding domain of the structures from the library (Figure 2). The HCA plot of the C-terminal domain of pig [alpha]3-GalT exhibits similarities with those of GP and maltodextrin phosphorylase (MP), and the plots could be aligned. The human B transferase was also considered in order to help in determining the most conserved motifs. The numbering of the secondary structure elements corresponds to the [beta]-GlcT crystal structure. All the [beta]-strands of the C-terminal domain ([beta]8 to [beta]11) were assigned. For the [alpha]-helices, only the last one of the C-terminal domain is predicted to be missing in the [alpha]3-GalT sequences. The agreement with the consensus secondary structure prediction displayed in Figure 1 is good, except for the [beta]11 strand, which was predicted to be a helix; however, the same [beta]-strand in [beta]-GlcT was also incorrectly assigned as a helix by the secondary structure predictions.


Figure 2. HCA alignment of the C-terminal domain of [alpha]3-GalT, [beta]-GlcT, and GP. The one-letter code is used for amino acids except for Gly, Pro, Thr, and Ser, which are represented by a diamond, a star, a square, and a square with a dot, respectively. Conserved hydrophobic clusters have been shaded. The limits of the [alpha]-helices and [beta]-strands of the [beta]-GlcT and GP structures are indicated by vertical lines. In these enzymes, the amino acids directly involved in phosphate recognition are given as white text on gray backgrounds, whereas the ones involved in binding other parts of the substrate have a black background. The deduced topology of [alpha]3-GalT is indicated and the secondary structure elements are numbered following the [beta]-GlcT structure.

A 3D model of the pig [alpha]3-GalT C-terminal domain was built using the COMPOSER program (Blundell et al., 1988). Nine structurally conserved regions (SCRs) were defined, each of them including one or two of the secondary structure elements. After local geometry optimization of the loops, the model does not display any stereochemical defects. According to the PROCHECK program (Laskowski et al., 1993), no backbone linkage lies in the disallowed region of the Ramachandran map. Figure 3 (A and B) displays the predicted model for the pig [alpha]3-GalT C-terminal domain. The five [beta]-strands that make up the central sheet are composed of amino acids Phe213 to Val217, Ala239 to Lys243, Asp267 to Tyr270, Phe290 to Leu294, and Tyr314 to Asn318. They are labeled [beta]8 to [beta]12 to follow the nomenclature used in the [beta]-GlcT crystal structure (Vrielink et al., 1994). Only the last small [beta]13 strand of the [beta]-GlcT structure is not present in the pig [alpha]3-GalT model. Helices [alpha]7, [alpha]9, and [alpha]10 could be modeled without difficulty. Only the [alpha]8 helix belongs to a peptide region that was not well conserved among the four structures in the library and was therefore more difficult to model. The C-terminal region consists of three [alpha]-helices. During the first step of homology modeling, the cysteine residues 215 and 295 appeared to be close in space since they are both at the N-terminal side of adjacent strands in the central [beta]-sheet. A disulfide bond could therefore be formed between them and was included in the model. These two cysteine residues appear to be conserved among all [alpha]3-GalTs and blood group transferases (see Figure 1).
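The Ramachandran analysis mentioned above relies on the backbone dihedral angles phi and psi of each residue. As a minimal sketch of the underlying geometry, the Python code below computes a signed dihedral angle from four atomic positions with the standard atan2 formula, and derives phi and psi from the usual backbone atoms. The function names and the test coordinates are assumptions for this illustration; the code does not reproduce PROCHECK or its definition of allowed and disallowed regions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, range -180 to 180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles of one residue: phi = C(i-1)-N-CA-C, psi = N-CA-C-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Simple geometric checks with hypothetical points (cis, trans, and -90 degrees)
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))
```

Each (phi, psi) pair obtained this way is one point on the Ramachandran map; classifying the points requires region boundaries that are not defined in this sketch.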


Figure 3. (A) and (B), Two orthogonal views of the model structure of the pig [alpha]3-GalT nucleotide-binding domain represented using the MOLSCRIPT program (Kraulis, 1991). Numbering of the secondary structure elements follows the [beta]-GlcT crystal structure. The UDP-Gal is represented using ball and stick and the Mg2+ ion by a sphere. (C) Graphical representation of UDP-Gal in the binding site of the model domain of pig [alpha]3-GalT. The yellow lines represent coordination contacts (d < 2.5 Å) around the Mg2+ cation. Hydrogen bonds are displayed by green dotted lines.

Predicted interactions of [alpha]3-GalT with UDP-Gal

The structural equivalent of the [beta]-GlcT nucleotide binding site is an elongated pocket with [alpha]10 and [beta]8 on each side, [beta]11 and [beta]12 at the rear, and the loops between [beta]10 and [alpha]9 and between [beta]11 and [alpha]10 at the bottom and top, respectively (Figure 3). The UDP ligand was docked into this binding site by using the orientation observed for UDP in the crystal structure of [beta]-GlcT. Two of the amino acids that bind the phosphate groups in rabbit GP (Thr676 and Lys568) and [beta]-GlcT (Ser189 and Arg269) are replaced by Asp218 and Glu304 in the pig [alpha]3-GalT model. This creates a negatively charged pocket between the protein and the pyrophosphate group which can accommodate a cation. Since mammalian [alpha]3-GalTs require a divalent cation for activity (van den Eijnden et al., 1985), a Mg2+ cation was included in the model. The other structurally conserved contacts are the double hydrogen bonds between the acidic group of Glu308 and O2 and O3 of ribose (Glu272 in [beta]-GlcT and Lys680 in GP) and the hydrogen bond between the uridine base and the backbone of Ile274 (Ile237 in [beta]-GlcT). After modeling a galactose moiety and optimizing the ligand and the binding site with appropriate energy parameters (Pérez et al., 1995; Petrova and Imberty, unpublished observations), a putative docking mode for UDP-Gal in pig [alpha]3-GalT can be proposed (Figure 3C). In addition to the Glu308-ribose hydroxyl groups and Ile274-uridine hydrogen bonds, electrostatic contacts are also observed between the oxygen of the pyrophosphate and two nitrogen atoms: the side chain of backbone of Gln219 and the side chain of Asn224. These two residues are structurally equivalent to Arg191 and Arg195 in the [beta]-GlcT structure. The galactose residue establishes three hydrogen bonds: Asn223-N[epsis] - O2, Ile301-N - O3, and O6 - Glu302.O[epsis]1. The magnesium coordination involves contacts with the oxygen of the backbone of the pyrophosphate, the two oxygen atoms of the [beta]-phosphate group, and the acidic groups of Asp218 and Glu304.
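Coordination contacts such as those drawn in Figure 3C (d < 2.5 Å around the cation) can be listed with a simple distance filter. The Python sketch below is a minimal illustration: the function name, the dictionary layout, and the coordinates are hypothetical and are not taken from the model; only the 2.5 Å cutoff comes from the figure legend.

```python
import math


def contacts_within(center, atoms, cutoff=2.5):
    """List atoms closer than `cutoff` (in Å) to `center`.

    center: (x, y, z) coordinates of the cation.
    atoms:  dict mapping an atom label to its (x, y, z) coordinates.
    Returns (label, distance) pairs sorted by increasing distance.
    """
    found = []
    for label, xyz in atoms.items():
        distance = math.dist(center, xyz)  # Euclidean distance (Python 3.8+)
        if distance < cutoff:
            found.append((label, round(distance, 3)))
    return sorted(found, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical coordinates in Å, for illustration only
    cation = (0.0, 0.0, 0.0)
    neighbours = {
        "O_carboxylate": (2.0, 0.0, 0.0),
        "O_phosphate": (0.0, 2.4, 0.0),
        "N_distant": (3.0, 0.0, 0.0),
    }
    for label, distance in contacts_within(cation, neighbours):
        print(label, distance)
```

A hydrogen bond search would need additional criteria (donor and acceptor types, angles) that are outside the scope of this distance-only sketch.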

Discussion

When analyzing the sequence conservation among all eukaryotic [alpha]3-Gal or [alpha]-GalNAc transferases, it is notable that Glu308, the amino acid that is hypothesized to be involved in ribose binding, is conserved among all members of this family. In terms of Mg2+ binding, Asp218 is also conserved, whereas Glu304 shows more variation. Asn224 is modeled to bind the [beta]-phosphate of UDP-Gal, and in the family it is generally replaced by a basic amino acid (His or Lys) that can play the same role. Ile274, which is proposed to bind the pyrimidine moiety of UDP-Gal, is often replaced by another hydrophobic amino acid, which would not disrupt the hydrophobic character of the extended loop between [beta]10 and [alpha]9. Among the amino acids that are proposed to be involved in galactose binding, Ile301 and Glu302 are always conserved (with the exception of a Glu to Met mutation in the dog Forssman synthase), while Asn223 is often replaced by an aspartic acid that can still hydrogen bond with the O2 of the galactose. The model proposed for the [alpha]3-GalT C-terminal domain is therefore consistent with all other sequences of the eukaryotic [alpha]3-GalT and blood group transferase protein family.

One intriguing question is why the Leu266Met and Gly268Ala mutations alter the specificity from blood group A to blood group B, i.e., from a GalNAc- to a Gal-transferase. In the present model, these two amino acids are in the loop between [beta]10 and [alpha]9 and are therefore close to the uridine base, whereas one might expect them to interact directly with the sugar moiety, thus providing sugar specificity. However, the change in affinity between the two substrates appears to be due to a difference in kcat rather than in Km (Seto et al., 1997). Furthermore, Leu266 is not absolutely required for blood group A activity since an Ala residue is present at this position in the pig blood group A enzyme. Thus, the difference in enzymatic activity and substrate specificity may be due to a difference in the conformational properties of this particular loop, rather than to a direct binding effect.

The amino acid sequences of the family of glycosyltransferases studied here, i.e., eukaryotic [alpha]3-GalTs and [alpha]3-GalNAcTs, have been previously compared to those of other classes of galactosyltransferases. A local similarity with eukaryotic [beta]4-GalTs first gave rise to the so-called DKKND motif (Joziasse et al., 1989). More recently, using the HCA method, three regions of sequence similarity were identified and labeled I to III (Breton et al., 1998). Region I contains a conserved DVDxxxxD/N motif. From the fold recognition study, this region corresponds to the loop that connects the N-terminal and C-terminal domains and extends into the [beta]8 strand and the following loop. The conserved DVDxxxxD/N motif is located at the end of the [beta]8 strand, and the third residue (D) therefore corresponds to Asp218, which is proposed to interact with Mg2+ in the present model. The D/N corresponds to Asn223, which hydrogen bonds to the oxygen O2 of galactose. Region II of conserved amino acids corresponds to the loop between [beta]10 and [alpha]9. This loop ends with the Ile residue involved in uridine binding. This region also contains the two amino acids that differ between the A and B transferases and that have been suggested to be involved in nucleotide-sugar specificity (Yamamoto and Hakomori, 1990; Seto et al., 1997). Finally, the third conserved region from the sequence comparison study, i.e., region III, corresponds to [alpha]10 and the preceding loop. It appears, therefore, that the three regions which have been demonstrated to be conserved among eukaryotic [alpha]3-GalTs and [beta]4-GalTs correspond to the side, top, and bottom of the UDP-sugar binding site.
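A motif written as DVDxxxxD/N (D, V, D, any four residues, then D or N) translates directly into a regular expression. The Python sketch below shows one way to locate such a motif and to report positions in the numbering of the full-length protein. The function name, the `first_residue_number` parameter, and the toy sequence are assumptions for illustration; the sequence is not the pig [alpha]3-GalT sequence.

```python
import re

# D, V, D, four residues of any type, then D or N.
# The lookahead wrapper lets overlapping occurrences be reported as well.
DVD_MOTIF = re.compile(r"(?=(DVD.{4}[DN]))")

# Looser variant: the DxD motif alone.
DXD_MOTIF = re.compile(r"(?=(D.D))")


def find_motif(sequence, pattern=DVD_MOTIF, first_residue_number=1):
    """Return (start_position, matched_segment) for every occurrence of `pattern`.

    first_residue_number: number of the first residue of `sequence` in the
    full-length protein (useful when a truncated domain is searched).
    """
    hits = []
    for match in pattern.finditer(sequence.upper()):
        start = match.start(1) + first_residue_number
        hits.append((start, match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy fragment (hypothetical), assumed to start at residue 10
    fragment = "AAFLDVDKKKKDAA"
    print(find_motif(fragment, first_residue_number=10))
    print(find_motif(fragment, pattern=DXD_MOTIF, first_residue_number=10))
```

With a motif starting at position s, the third residue (the second Asp of DVD) is at s + 2 and the final D/N at s + 7, which is the spacing between Asp218 and Asn223 described above.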

The DVD motif at the end of the [beta]8 strand seems to be of particular interest. In all families of bacterial and animal galactosyltransferases but one, this particular DxD motif is located just after a cluster of hydrophobic amino acids (Breton et al., 1998) that would correspond to a [beta]-strand according to HCA. Recently, this motif has been identified in other glycosyltransferases such as mannosyltransferases, all of which use nucleoside diphosphate sugars and require divalent cations (Wiggins and Munro, 1998). Using a photolabeling method, it has been demonstrated that mutations in the DxD motif in the large clostridial cytotoxins abolish the binding of the nucleotide-sugar (Busch et al., 1998). These experimental data are in agreement with the present model since we propose that the second Asp residue of this motif is essential for binding the divalent cation associated with the nucleotide binding site. In the present study, a Mg2+ cation was included in the model, since this divalent cation is needed for [beta]-GlcT activity (Josse and Kornberg, 1962), but it should be noted that [alpha]3-GalT displays maximum activity in the presence of the Mn2+ cation (van den Eijnden et al., 1985).

Both [beta]-GlcT and glycogen phosphorylases catalyze the phosphate-dependent formation or breakdown of an O-glycosidic linkage. Their fold similarity could be due either to convergent evolution or to a very remote evolutionary relationship. The latter hypothesis was favored (Artymiuk et al., 1995; Holm and Sander, 1995) on the basis of the strong structural resemblance. However, the missing link in the evolutionary model could not be determined. Holm and Sander (1995) searched for such candidates among glycosyltransferases responsible for glycogen or starch synthesis but could not find sequences with a compatible secondary structure prediction. From the present study, we can postulate that eukaryotic [alpha]3-galactosyltransferases belong to this superfamily. A preliminary report on fucosyltransferases (Breton et al., 1996) indicated that these enzymes could also adopt the [beta]-GlcT fold. Our current hypothesis is that most of the eukaryotic glycosyltransferases, as well as many bacterial ones, display a similar two-domain assembly, one domain being a Rossmann-type nucleotide binding domain.

Materials and methods

Sequence alignment

Pairwise alignments of [alpha]3-GalT and blood group transferase sequences were performed with the ALIGN program from the FASTA package (Myers and Miller, 1988). Multiple alignment of amino acid sequences was performed using the ClustalW method (Thompson et al., 1994).
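For readers unfamiliar with global pairwise alignment, the principle can be sketched with a basic dynamic programming routine. This is not the ALIGN program: it stores the full score matrix rather than working in linear space, and the identity-based scoring with a linear gap penalty is an assumption made to keep the example short.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences by dynamic programming.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions; real programs typically use a substitution matrix."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # residue against residue
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences.
print(global_align("ACGT", "AGT"))
```

When several alignments share the optimal score, the traceback returns only one of them, preferring residue pairs over gaps.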

Secondary structure prediction

Secondary structure predictions were performed using three programs available on WWW servers. (1) The PHDsec method (Rost and Sander, 1993, 1994; http://www.EMBL-Heidelberg.DE/Services/index.html) uses a system of neural networks that extracts conservation weights from a multiple sequence alignment. (2) PREDATOR (Frishman and Argos, 1997; http://www.embl-heidelberg.de/argos/predator/predator_info.html) takes as input a single protein sequence but also uses information from a set of related sequences. (3) The NPS method (http://pbil.ibcp.fr/NPSA/npsa_prediction.html) provides the consensus secondary structure prediction for one sequence using a set of different algorithms. The secondary structure predictions were run using either the pig [alpha]3-GalT sequence or the multiple alignment obtained by ClustalW.
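The idea of combining several predictions into a consensus can be sketched as a per-residue majority vote. This is only an illustration and should not be read as the algorithm of the NPS server: the three-state alphabet (H, E, C), the tie symbol and the toy predictions are assumptions.

```python
from collections import Counter


def consensus_prediction(predictions, undecided="?"):
    """Per-residue majority vote over equal-length prediction strings.

    Positions where the two most frequent states are tied receive the
    `undecided` symbol."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            consensus.append(undecided)
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Hypothetical outputs of three predictors for the same 10-residue segment.
toy_predictions = ["HHHHCCEEEE", "HHHCCCEEEC", "CHHHCEEEEE"]
print(consensus_prediction(toy_predictions))
```

Marking ties explicitly, rather than picking a state arbitrarily, keeps ambiguous positions visible when the consensus is later compared with a template structure.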

Fold prediction

The ProFIT program (Flöckner et al., 1995) from the ProCyon package (King's Beech, Biosoftware Solution, http://www.horus.com/sippl/) was used for fold recognition calculations using all [alpha]3-GalT sequences. The cytoplasmic tail, transmembrane domain, and stem were not included in the query sequence. Several other fold recognition programs were also tested, including TOPITS (Rost, 1995), THREADER2 (Miller et al., 1996), and FORESST (Di Francesco et al., 1997).

Homology modeling

The pig [alpha]3-GalT C-terminal region sequence (160 amino acids) was aligned with the C-terminal domain of [beta]-GlcT and with the corresponding region of glycogen and maltodextrin phosphorylases with the help of the HCA program (Gaboriaud et al., 1987). The hydrophobic clusters were aligned visually, while also taking into account the secondary structure prediction. A library of four crystal structures was prepared containing [beta]-GlcT (Morera and Freemont, personal communication), rabbit and yeast glycogen phosphorylases (PDB codes 1GPB and 1YHP), and E. coli maltodextrin phosphorylase (code 1AHP) from the Brookhaven protein databank (Abola et al., 1997). The pig [alpha]3-GalT sequence, together with the library of four crystal structures, served as input for the COMPOSER program (Blundell et al., 1988) in the Sybyl package (Tripos). Structurally conserved regions (SCRs) shared by the four crystal structures of the library were defined to correspond, where possible, to the secondary structure elements of [beta]-GlcT. All eight loops were modeled using the most similar fragments in the library of four structures. Each loop was submitted to local geometry optimization to relieve steric conflicts. The resulting model was then screened using the PROCHECK program (Laskowski et al., 1993), and backbone linkages lying outside the allowed regions of the Ramachandran map were further optimized. Hydrogen atoms were added to all atoms, and partial atomic charges were derived using the Pullman procedure.
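The Ramachandran screening mentioned above relies on the backbone dihedral angles phi and psi. The sketch below shows how such angles can be computed from atomic coordinates; it is not PROCHECK, it does not define allowed regions, and the function names and the input layout (tuples of x, y, z coordinates) are assumptions made for illustration.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    The angle is 0 for a cis arrangement and +/-180 for a trans one."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (s0 * b1[0], s0 * b1[1], s0 * b1[2]))
    w = _sub(b2, (s2 * b1[0], s2 * b1[1], s2 * b1[2]))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Synthetic points (not real atoms) giving a right-angle torsion.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Residues whose (phi, psi) pair falls outside the regions accepted by the chosen quality-control program would then be the ones selected for further local optimization.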

Docking of UDP-Gal

Before final optimization of the model, UDP-galactose was docked in the binding site. The UDP moiety was given the location observed in the crystal structure of [beta]-GlcT, and the galactose residue was linked and oriented in the only conformation that does not lead to severe steric conflicts. Atom types and energy parameters available for carbohydrates (Pérez et al., 1995) within the TRIPOS force-field (Clark et al., 1989) were used, together with new parameters developed for the sugar-pyrophosphate linkage (Imberty and Petrova, unpublished results). A Mg2+ cation was located between the pyrophosphate group and the protein surface. The optimization of the complex was then run in successive cycles.

Acknowledgments

This work was supported by the following grants: Programme Physique et Chimie du Vivant-CNRS, Immunology Concerted Action 3026PL950004 and Xenotransplantation Project BIO4CT972242 of the BIOTECH program of the European Union. C.B. is a staff member of the Institut National de la Recherche Agronomique.

Abbreviations

GalT, galactosyltransferase; GlcT, glucosyltransferase; GP, glycogen phosphorylase; Gal, galactose; Glc, glucose; GalNAc, N-acetylgalactosamine; UDP, uridine diphosphate.

References

Abola,E.E., Sussman,J.L., Prilusky,J. and Manning,N.O. (1997) Protein Data Bank archives of three-dimensional macromolecular structures. Methods Enzymol., 277, 556-571.

Acharya,K.R., Stuart,D.I., Varvill,K.M. and Johnson,L.N. (1991) Glycogen Phosphorylase b: Description of the Crystal Structure. World Scientific, Singapore.

Artymiuk,P.J., Rice,D.W., Poirrette,A.R. and Willett,P. (1995) Beta-glucosyltransferase and phosphorylase reveal their common theme. Nature Struct. Biol., 2, 115-120.

Baisch,G., Öhrlein,R., Kolbinger,F. and Streiff,M. (1998) On the preparative use of recombinant pig [alpha](1-3)galactosyl-transferase. Bioorg. Med. Chem. Lett., 8, 1575-1578.

Blundell,T., Carney,D., Gardner,S., Hayes,F., Howlin,B., Hubbard,T., Overington,J., Singh,D.A., Sibanda,B.L. and Sutcliffe,M. (1988) Knowledge-based protein modelling and design. Eur. J. Biochem., 172, 513-520.

Breton,C., Bettler,E., Joziasse,D.H., Geremia,R. and Imberty,A. (1998) Sequence-function relationships of prokaryotic and eukaryotic galactosyltransferases. J. Biochem., 123, 1000-1009.

Breton,C., Oriol,R. and Imberty,A. (1996) Sequence alignment and fold recognition of fucosyltransferases. Glycobiology, 6(7), vii-xii.

Busch,C., Hofmann,F., Selzer,J., Munro,S., Jeckel,D. and Aktories,K. (1998) A common motif of eukaryotic glycosyltransferases is essential for the enzyme activity of large clostridial cytotoxins. J. Biol. Chem., 273, 19566-19572.

Clark,M., Cramer,R.D.,III and van den Opdenbosch,N. (1989) Validation of the general purpose Tripos 5.2 force field. J. Comput. Chem., 8, 982-1012.

Cooper,D.K., Koren,E. and Oriol,R. (1994) Oligosaccharides and discordant xenotransplantation. Immunol. Rev., 141, 31-58.

Di Francesco,V., Geetha,V., Garnier,J. and Munson,P.J. (1997) Fold recognition using predicted secondary structure sequences and hidden Markov models of protein folds. Proteins (Suppl.), 1, 123-128.

Fang,J., Li,J., Chen,X., Zhang,Y., Wang,J., Guo,Z., Zhang,W., Yu,L., Brew,K. and Wang,P.G. (1998) Highly efficient chemoenzymatic synthesis of [alpha]-galactosyl epitopes with a recombinant [alpha](1-3)-galactosyltransferase. J. Am. Chem. Soc., 120, 6635-6638.

Flöckner,H., Braxenthaler,M., Lackner,P., Jaritz,M., Ortner,M. and Sippl,M.J. (1995) Progress in fold recognition. Proteins Struct. Funct. Genet., 23, 376-386.

Flöckner,H., Domingues,F.S. and Sippl,M.J. (1997) Protein folds from pair interactions: a blind test in fold recognition. Proteins (Suppl.), 1, 129-133.

Frishman,D. and Argos,P. (1997) 75% accuracy in protein secondary structure prediction. Proteins, 27, 329-335.

Gaboriaud,C., Bissery,V., Benchetrit,T. and Mornon,J.P. (1987) Hydrophobic cluster analysis: an efficient way to compare and analyse amino acid sequences. FEBS Lett., 224, 97-120.

Galili,U. (1991) The natural anti-Gal antibody: evolution and autoimmunity in man. Immunol. Ser., 55, 355-373.

Haslam,D.B. and Baenziger,J.U. (1996) Expression cloning of Forssman glycolipid synthetase: a novel member of the histo-blood group ABO gene family. Proc. Natl Acad. Sci. USA, 93, 10697-10702.

Henion,T.R., Macher,B.A., Anaraki,F. and Galili,U. (1994) Defining the minimal size of catalytically active primate [alpha]1,3 galactosyltransferase: structure-function studies on the recombinant truncated enzyme. Glycobiology, 4, 193-201.

Holm,L. and Sander,C. (1995) Evolutionary link between glycogen phosphorylase and a DNA modifying enzyme. EMBO J., 14, 1293-1995.

Imberty,A., Piller,V., Piller,F. and Breton,C. (1997) Fold recognition and molecular modeling of a lectin-like domain in UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases. Protein Eng., 10, 1353-1356.

Jones,D.T. and Thornton,J.M. (1996) Potential energy functions for threading. Curr. Opin. Struct. Biol., 6, 210-216.

Josse,J. and Kornberg,A. (1962) Glucosylation of deoxyribonucleic acid. III. [alpha] and [beta]-glucosyl transferases from T4-infected Escherichia coli. J. Mol. Biol., 237, 1968-1976.

Joziasse,D.H., Shaper,J.H., Van den Eijnden,D.H., van Tunen,A.J. and Shaper,N.L. (1989) Bovine [alpha]1-3-galactosyltransferase: isolation and characterization of a cDNA clone. Identification of homologous sequences in human genomic DNA. J. Biol. Chem., 264, 14290-14297.

Joziasse,D.H., Shaper,N.L., Salyer,L.S., Van den Eijnden,D.H., van der Spoel,A.C. and Shaper,J.H. (1990) Alpha 1-3-galactosyltransferase: the use of recombinant enzyme for the synthesis of alpha-galactosylated glycoconjugates. Eur. J. Biochem., 191, 75-83.

Joziasse,D.H., Shaper,N.L., Kim,D., Van den Eijnden,D.H. and Shaper,J.H. (1992) Murine alpha 1,3-galactosyltransferase. A single gene locus specifies four isoforms of the enzyme by alternative splicing. J. Biol. Chem., 267, 5534-5541.

Kornberg,S.R., Zimmerman,S.B. and Kornberg,A. (1961) Glucosylation of deoxyribonucleic acid by enzymes from bacteriophage-infected Escherichia coli. J. Biol. Chem., 236, 1487-1493.

Kraulis,P.K. (1991) MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr., 24, 946-950.

Laskowski,R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallogr., 26, 283-291.

Lemer,C.M., Rooman,M.J. and Wodak,S.J. (1995) Protein structure prediction by threading methods: evaluation of current techniques. Proteins, 23, 337-355.

Lin,K., Rath,V.L., Dai,S.C., Fletterick,R.J. and Hwang,P.K. (1996) A protein phosphorylation switch at the conserved allosteric site in GP. Science, 273, 1539-1542.

Miller,R.T., Jones,D.T. and Thornton,J.M. (1996) Protein fold recognition by sequence threading: tools and assessment techniques. FASEB J., 10, 171-178.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.

Myers,E.W. and Miller,W. (1988) Optimal alignments in linear space. Comput. Appl. Biosci., 4, 11-17.

O'Reilly,M., Watson,K.A., Schinzel,R., Palm,D. and Johnson,L.N. (1997) Oligosaccharide substrate binding in Escherichia coli maltodextrin phosphorylase. Nature Struct. Biol., 4, 405-412.

Pérez,S., Meyer,C. and Imberty,A. (1995) Practical tools for accurate modeling of complex carbohydrates and their interactions with proteins. In Pullman,A., Jortner,J. and Pullman,B. (eds.), Modelling of Biomolecular Structures and Mechanisms. Kluwer Academic, Dordrecht, pp. 425-454.

Rooman,M. and Gilis,D. (1998) Different derivation of knowledge-based potentials and analysis of their robustness and context-dependent predictive power. Eur. J. Biochem., 254, 135-143.

Rossmann,M.G., Moras,D. and Olsen,K.W. (1974) Chemical and biological evolution of a nucleotide-binding protein. Nature, 250, 194-199.

Rost,B. (1995) TOPITS: Threading one-dimensional predictions into three-dimensional structures. In Rawlings,C., Clark,D., Altman,R., Hunter,L., Lengauer,T. and Wodak,S. (eds.), The Third International Conference on Intelligent Systems for Molecular Biology (ISMB). AAAI Press, Menlo Park CA, pp. 314-321.

Rost,B. and Sander,C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.

Rost,B. and Sander,C. (1994) Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72.

Rost,B., Schneider,R. and Sander,C. (1997) Protein fold recognition by prediction-based threading. J. Mol. Biol., 270, 471-480.

Seto,O.L., Palcic,M.M., Compston,C.A., Li,H., Bundle,D.R. and Narang,S.A. (1997) Sequential interchange of four amino acids from blood group B to blood group A glycosyltransferase boosts catalytic activity and progressively modifies substrate recognition in human recombinant enzymes. J. Biol. Chem., 272, 14133-14138.

Sippl,M.J. and Flöckner,H. (1996) Threading thrills and treats. Structure, 4, 15-19.

Sowdhamini,R., Burke,D.F., Huang,J., Mizuguchi,K., Nagarajaram,H.A., Srinivasan,N., Steward,R.E. and Blundell,T.L. (1998) CAMPASS: a database of structurally aligned protein superfamilies. Structure, 6, 1087-1094.

Strahan,K.M., Gu,F., Preece,A.F., Gustavsson,I., Andersson,L. and Gustafsson,K. (1995) cDNA sequence and chromosome localization of pig alpha 1,3 galactosyltransferase. Immunogenetics, 41, 101-105.

Taniguchi,S., Neethling,F.A., Korchagina,E.Y., Bovin,N., Ye,Y., Kobayashi,T., Niekrasz,M., Li,S., Koren,E., Oriol,R. and Cooper,D.K. (1996) In vivo immunoadsorption of antipig antibodies in baboons using a specific Gal(alpha)1-3Gal column. Transplantation, 62, 1379-1384.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.

Vajda,S., Sippl,M. and Novotny,J. (1997) Empirical potentials and functions for protein folding and binding. Curr. Opin. Struct. Biol., 7, 222-228.

Van den Eijnden,D.H., Blanken,W.M., Winterwerp,H. and Schiphorst,W.E. (1985) Identification and characterization of an UDP-Gal:N-acetyllactosaminide alpha-1,3-d-galactosyltransferase in calf thymus. Eur. J. Biochem., 134, 523-530.

Vrielink,A., Rüger,W., Driessen,H.P.C. and Freemont,P.S. (1994) Crystal structure of the DNA modifying enzyme [beta]-glucosyltransferase in the absence and the presence of the substrate uridine diphosphoglucose. EMBO J., 13, 3413-3422.

Wiggins,C.A. and Munro,S. (1998) Activity of the yeast MNN1 alpha-1,3-mannosyltransferase requires a motif conserved in many other families of glycosyltransferases. Proc. Natl Acad. Sci. USA, 95, 7945-7950.

Yamamoto,F., Clausen,H., White,T., Marken,J. and Hakomori,S. (1990) Molecular genetic basis of the histo-blood group ABO system. Nature, 345, 229-233.

Yamamoto,F. and Hakomori,S. (1990) Sugar-nucleotide donor specificity of histo-blood group A and B transferases is based on amino acid substitutions. J. Biol. Chem., 265, 19257-19262.


5To whom correspondence should be addressed


Copyright©Oxford University Press, 1999.