Structural similarity and functional diversity in proteins containing the legume lectin fold

Nagasuma R. Chandra1, M.M. Prabu2, K. Suguna2 and M. Vijayan2,3

1 Bioinformatics Centre and 2 Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India


    Abstract
 
Knowledge of structural relationships in proteins is increasingly proving very useful for in silico characterizations and is also being exploited as a prelude to almost every investigation in functional and structural genomics. A thorough understanding of the crucial features of a fold becomes necessary to realize the full potential of such relationships. To illustrate this, structures containing the legume lectin-like fold were chosen for a detailed analysis since they exhibit a total lack of sequence similarity among themselves and also belong to diverse functional families. A comparative analysis of 15 different families containing this fold was therefore carried out, which led to the determination of the minimal structural principles or the determining region of the fold. A critical evaluation of the structural features, such as the curvature of the front sheet, the presence of the hydrophobic cores and the binding site loops, suggests that none of them are crucial for either the formation or the stability of the fold, but are required to generate diversity and specificity to particular carbohydrates. In contrast, the presence of the three sheets in a particular geometry and also their topological connectivities seem to be important. The fold has been shown to tolerate different types of protein–protein associations, most of them exhibiting different types of quaternary associations and some even existing as complexes with other folds. The function of every family in this study is discussed with respect to its fold, leading to the suggestion that this fold can be linked to carbohydrate recognition in general.

Keywords: carbohydrate binding/β-sandwich/structural determinants/structural relationships


    Introduction
 
Examination of the hitherto identified protein folds reveals that the available protein structures cluster into limited regions of the entire conformational space (Holm and Sander, 1995). This means that several protein families share a common structural fold, some of which are obvious from the sequence similarities that they exhibit. It is also well known that protein evolution gives rise to families of structurally related proteins, within which sequence similarities can be extremely low. Such unanticipated relationships in known structures have been identified effectively by structure-based classifications (Holm and Sander, 1996). Several excellent databases featuring structural classifications of protein structures have been developed in recent years [SCOP (Murzin et al., 1995), FSSP (Holm and Sander, 1996), CATH (Orengo et al., 1997)]. These databases serve as useful guidelines to study the overall folds and structures of various proteins encoded by a genome. To realize the full potential of these relationships, it is essential to characterize the structural determinants or the minimal structural principles of the individual folds. The completion of several genome sequences including that of the human genome provides an additional impetus to such characterizations. Structures containing the legume lectin-like fold are so diverse in the families to which they belong that they provide classical examples for investigations of this type.

Lectins are carbohydrate-binding proteins that specifically recognize diverse sugar structures and mediate a variety of biological processes such as cell–cell and host–pathogen interactions, serum glycoprotein turnover and innate immune responses (Vijayan and Chandra, 1999). Lectins are found in most organisms, ranging from viruses and bacteria to plants and animals (Lis and Sharon, 1998). They represent a heterogeneous group of oligomeric proteins that vary widely in size, structure, molecular organization and the constitution of their combining sites. Nonetheless, many of them belong to distinct protein families, classified based on biochemical, functional or structural properties. Although a number of lectins have been well studied, ambiguities still exist in their precise biological roles. A well-studied class of lectins from leguminous plants contains a characteristic fold (Loris et al., 1998; Bouckaert et al., 1999), often referred to as the legume lectin fold or simply the lectin fold. This fold is one of the widely occurring protein folds, represented by 14 distinct protein families in addition to legume lectins. The most striking feature of the fold is the total lack of sequence similarity among different members exhibiting the fold. It is remarkable indeed that different members exhibiting this fold can show as low as 2% sequence identity. The recent addition of the structures of several legume lectins and also many other proteins possessing the fold (Berman et al., 2000; Bettler et al., 2001) enables us to analyse and compare these various structures in order to characterize the structural features of this fold. Here, we seek to compare the structures of the 15 different families and derive common structural features and determinants of the fold. In an attempt to relate fold to function, we also analyse features required for carbohydrate recognition, which happens to be the best recognized function of members with this fold.


    Methods
 
Initial identification of proteins containing the legume lectin-like fold was made using the SCOP (Murzin et al., 1995) and FSSP databases (Holm and Sander, 1996), which was followed by an analysis of related proteins in the CATH (Orengo et al., 1997) and the 3D lectin databases (Bettler et al., 2001). Further, to identify structural homologues, a thorough investigation of all available protein structures in the Protein Data Bank was carried out using two separate structure comparison algorithms, DALI (Holm and Sander, 1995) and VAST (Gibrat et al., 1996). All the identified proteins were analysed for particular features and re-classified based broadly on their known functions. Only those proteins which had a Z-score >2.5, for at least 70% of the contents of the relevant domain, from the DALI comparisons are included here. This study involved an analysis of more than 300 structures, which were obtained from our local repository of coordinates, regularly downloaded from the Protein Data Bank (Berman et al., 2000). The MSI software package (InsightII, version 98) was used to visualize, analyse and manipulate various structures. The solvent accessibility calculations were carried out using the Connolly algorithm (Connolly, 1993). To deduce topology, the sequential connectivity of the individual strands in the three β-sheets was considered. The curvature of the sheet was computed by measuring the virtual angles subtended by the end Cα atoms of the strands at their mid-points. The average distance between two sheets was computed by measuring the distance between the centroids of each sheet using only the Cα atoms of each sheet. Hydrophobic cores were considered present when a minimum of four residues, each with an overall accessibility of <10%, were in contact within a radius of 5 Å of each other, forming a cluster.
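The hydrophobic core criterion stated above can be illustrated with a minimal sketch. It is not the procedure used with InsightII or the Connolly algorithm in this study. It assumes that each residue is reduced to one representative point (for example a side-chain centroid), that accessibilities have already been computed as percentages, and that "in contact" means single-linkage clustering at the 5 Å cutoff. The function name `hydrophobic_clusters` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def hydrophobic_clusters(residues, max_access=10.0, cutoff=5.0, min_size=4):
    """Return clusters of buried residues that satisfy the core criterion.

    residues: dict mapping a residue label to (accessibility_percent, (x, y, z)).
    A cluster is a connected group of buried residues in which each member lies
    within `cutoff` angstroms of at least one other member (an assumption; the
    text does not specify how contacts are chained).
    """
    # Keep only residues below the accessibility threshold.
    buried = {name: xyz for name, (acc, xyz) in residues.items() if acc < max_access}

    # Build a contact graph between buried residues.
    neighbours = {name: set() for name in buried}
    for a, b in combinations(buried, 2):
        if dist(buried[a], buried[b]) <= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with at least `min_size` members.
    clusters, seen = [], set()
    for start in buried:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Toy coordinates (hypothetical, not taken from any PDB entry).
toy = {
    "A": (5.0, (0.0, 0.0, 0.0)),
    "B": (2.0, (3.0, 0.0, 0.0)),
    "C": (8.0, (6.0, 0.0, 0.0)),
    "D": (1.0, (9.0, 0.0, 0.0)),
    "E": (50.0, (4.0, 0.0, 0.0)),   # exposed, so excluded
    "F": (3.0, (30.0, 0.0, 0.0)),   # buried but isolated
}
cores = hydrophobic_clusters(toy)
```

A stricter reading, in which every pair of residues in the cluster must lie within 5 Å, would replace the connected-component search with a clique search.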


    Results and discussion
 
General description of the fold

The structure of concanavalin A (ConA), the first legume lectin to be X-ray analysed, in 1972, exhibited a β-sandwich type of structure (Hardman and Ainsworth, 1972). Subsequently, about 100 more structures involving more than 15 different legume lectins in their complexed and uncomplexed forms have been studied. All of them share the same tertiary structure as ConA in their individual subunits. These include about 10 structures of peanut and winged bean basic and acidic lectins from our laboratory (Banerjee et al., 1994, 1996; Prabu et al., 1998; Manoj et al., 2000). A structure-based classification of proteins places them in a super-family of ConA-like lectins and glucanases, one of the many in the all-β structural class. The subunits of legume lectins are most often made up of single polypeptide chains of ~250 amino acids exhibiting the legume lectin fold. The fold primarily consists of three β-sheets, a 'flat' six-stranded 'back' β-sheet, a small 'top' β-sheet and a curved, seven-stranded 'front' β-sheet, and a number of loops interconnecting the sheets as well as the strands in them (Banerjee et al., 1996). In peanut lectin, for example, 110 out of 228 residues have β-structures; the remaining residues are in loops and β-turns connecting the strands. Legume lectins within themselves exhibit remarkable sequence homologies and structural similarities, despite differences in sugar specificities and quaternary structures. Superpositions of Cα atoms of the β-sheets of individual subunits of legume lectins using various combinations result in root mean square deviation (r.m.s.d.) values ranging from about 0.6 to 2.0 Å.

Analysis of structures in the Protein Data Bank reveals that there are many other families of proteins which exhibit the same legume lectin subunit fold. Structural homologues were identified using DALI (Holm and Sander, 1995) and VAST algorithms (Gibrat et al., 1996), compared with the SCOP and FSSP databases and reclassified based on their known functions, as shown in Table I. The first legume lectin to be X-ray analysed, ConA, is somewhat atypical as the post-translational modification involving a circular permutation results in a mature protein with amino and carboxyl termini at locations different from those in all other legume lectins of known three-dimensional structure. Therefore, instead of ConA, a subunit of tetrameric peanut lectin (2PEL:A), another thoroughly characterized lectin, will be used as a representative of legume lectins in the present study. The highest resolution structure of either the carbohydrate complex where available or of the apo protein from each of the other families was chosen as a representative structure for further analysis. The representative structures chosen are listed in Table I and illustrated in Figure 1. Tumour necrosis factor-α and the viral capsid proteins were also among the structural homologues of peanut lectin. They were not considered in this study, however, since their similarities lie below the cutoff criteria chosen here.


 
Table I. Similarities in structures containing the legume lectin-like fold
 


 
Fig. 1. Ribbon diagrams of the 15 representative structures containing the legume lectin-like fold chosen in this study. All the structures are shown in the same orientation. The PDB codes are given below each structure. The inset shows the superposition of the three sheets in all the structures. Figures 1, 2A and 4 were prepared using MOLSCRIPT (Kraulis, 1991). The back sheets are shown in black, the front sheets in dark grey and the top sheets in light grey. Carbohydrate ligands and metals where present are shown as CPK models.

 
The terms β-sandwich fold and jelly roll fold have also been used in the literature to describe these proteins. Whereas the former term is technically correct, since the legume lectin fold is a type of β-sandwich, use of the latter term is debatable as the legume lectin fold does not strictly conform to the definition of a jelly roll (Chelvanayagam et al., 1992). There are minor variations in topology among the different families of proteins exhibiting this fold. Therefore, strictly, they may be described as belonging to a set of closely related folds. However, as explained later, the differences in topology are very small and the term legume lectin fold will be used to encompass all of them.

Characteristic features of the legume lectin fold

The two main β-sheets, their relative orientation and the third β-sheet. Superposition of the β-strands in all structures with those in the first subunit of peanut lectin reveals that all of them have two β-sheets with a roughly similar mutual orientation. R.m.s.d. values, the number of residues aligned and the structural similarity scores are shown in Table I. The 'back' sheet ranged between four and six strands with seven residues in each strand, in all the structures, while the 'front' sheet showed 5–7 strands with 5–7 residues in each strand. This variation was restricted only to the first two strands in the 'back' sheet and the first three strands in the 'front' sheet. Strand regions corresponding to residues 64–70, 162–168, 173–179 and 186–192 from the back sheet and residues 84–90, 117–124, 136–143 and 149–153 from the front sheet of the peanut lectin structure can therefore be said to be invariant. Bovine spermadhesin and the insecticidal toxin, however, exhibit minor variations on this theme, since only those strands which correspond to the middle strands in both sheets in peanut lectin are present. It must be remembered, however, that these two families deviate the most from the typical legume lectin fold.

The two sheets are approximately parallel to each other and are separated by a distance averaging ~13 Å, as measured by computing the distance between the centroids of the Cα atoms of each sheet. The presence of the 'back' and the 'front' sheets and similarities in their relative orientation, therefore, clearly appear to be characteristic of the legume lectin fold. Several hydrophobic residues present on both sheets have their side chains positioned between the sheets so as to form a hydrophobic cluster, which provides an important source of stability for maintaining the fold. A prominent feature of this hydrophobic cluster is the aromatic side chains of one sheet stacking against those of the other sheet. The observed distance of ~13 Å can be justified in terms of the optimal distance required for such interactions.
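The inter-sheet distance described here reduces to a distance between two centroids. The following is a minimal sketch of that calculation, assuming the Cα coordinates of each sheet are already available as lists of (x, y, z) tuples in Å. The function names and the toy coordinates are hypothetical and serve only to illustrate the arithmetic.

```python
from math import dist


def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(point[i] for point in coords) / n for i in range(3))


def sheet_separation(sheet_a, sheet_b):
    """Distance between the centroids of two sets of C-alpha coordinates."""
    return dist(centroid(sheet_a), centroid(sheet_b))


# Toy example: two flat, parallel 'sheets' placed 13 angstroms apart along z.
back = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
front = [(x, y, 13.0) for x, y, _ in back]
separation = sheet_separation(back, front)
```

Because only centroids are compared, this measure is insensitive to sheet curvature and to a lateral offset that leaves the centroids unchanged; it is best read as a coarse descriptor of sheet packing.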

The third, small 'top' anti-parallel sheet made of five strands with only 2–4 residues in each strand exists in all legume lectins. The strands correspond to residues 25–27, 31–34, 217–220, 71–74 and 160–161 in peanut lectin (Banerjee et al., 1996). This sheet is not explicitly acknowledged as a separate β-sheet in most of the other structures. However, upon examination of their backbone dihedral angles and hydrogen bonding patterns, all the 15 representative structures described here were found to contain segments corresponding to this sheet in an orientation analogous to that observed in peanut lectin. As in the case of the 'back' and the 'front' sheets, here too only the last three strands corresponding to residues 217–220, 71–74 and 160–161 were invariant. A schematic diagram of the fold and a stereo view of the ribbon diagram of a subunit of peanut lectin are shown in Figure 2.



 
Fig. 2. (A) Schematic diagram of the legume lectin-like fold showing the position of the three sheets and the carbohydrate in legume lectins. (B) Stereo view of the ribbon diagram of a subunit of peanut lectin. The colour scheme used is similar to that in Figure 1.

 
Concavity of the front sheet. The front sheet is curved in all legume lectins. In order to determine how important this curvature is for determining the legume lectin fold, the extent of curvature in all 15 structures was analysed by measuring the distances between the ends of the two middle strands and the virtual angles the ends subtend at the mid-points of the strands. The end-distances and virtual angles average ~22 Å and ~150°, respectively, for galectins, Charcot Leyden crystal protein, spermadhesins and the insecticidal toxin, while the average values for legume lectins, arcelins, cellobiohydrolase, glucanases, xylanases, neurexins and tetanus neurotoxin are ~19 Å and ~120°. The pentraxins show values between these two types, with one of the strands showing more curvature than the other. These calculations confirm that whereas the front sheet is almost flat in some proteins, it is significantly curved in most others, giving rise to a concave surface, suggesting that the fold can tolerate considerable variation in this parameter (Table II). This also suggests that the curvature of the front sheet is not critical for either the formation or the stability of the overall fold.
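The two curvature descriptors used here, the end-to-end distance of a strand and the virtual angle its ends subtend at its mid-point, can be sketched as follows. This is an illustration only: it assumes the mid-point is taken as the central Cα atom of the strand, which the text does not state explicitly, and the function names are hypothetical.

```python
from math import acos, degrees, dist, sqrt


def end_distance(first_ca, last_ca):
    """Distance between the C-alpha atoms at the two ends of a strand."""
    return dist(first_ca, last_ca)


def virtual_angle(first_ca, mid_ca, last_ca):
    """Angle (degrees) subtended at the mid-point by the two strand ends.

    180 degrees corresponds to a perfectly straight strand; smaller values
    indicate a more strongly bent strand.
    """
    v1 = [a - b for a, b in zip(first_ca, mid_ca)]
    v2 = [a - b for a, b in zip(last_ca, mid_ca)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = sqrt(sum(x * x for x in v1))
    norm2 = sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return degrees(acos(cosine))


# Toy strands (hypothetical coordinates): one straight, one bent at a right angle.
straight = virtual_angle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0))
bent = virtual_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

On this scale the ~150° average reported for galectins corresponds to a flatter sheet than the ~120° average reported for legume lectins, in line with the longer end-distance (~22 Å versus ~19 Å).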


 
Table II. Descriptors of the fold
 
Hydrophobic cores and surface loops. In addition to the first hydrophobic core between the two main β-sheets, present in all structures, a second hydrophobic patch has also been observed in some structures such as peanut lectin and cellobiohydrolase. Upon examination of these structures, it appears that the curvature of the front sheet, along with the spatial disposition of long loops connecting the strands, is responsible for this. Indeed, only those structures that had a curved front sheet and large loops connecting the strands in it exhibited a second hydrophobic patch, whereas those structures with either a nearly flat front sheet or a curved sheet with short loops did not (Table II). The solvent-accessible surface areas computed for various structures confirm the presence of the first hydrophobic patch in all structures and the presence of the second patch in some, correlating well with the curvature of the front sheet and the presence of large loops. It therefore appears that the second hydrophobic patch is not crucial for the fold. Each family showed a number of varying loops in its member structures and in many cases loops played an important role in carbohydrate binding, e.g. the four large loops in legume lectins. Galectins, on the other hand, demonstrate that the loops can be rather small, as small as just two residues, and yet be capable of carbohydrate binding, as discussed in a later section. Again, this suggests that the loops do not play a critical role in the formation, stability or function of the fold but may give rise to specificity or differences in affinity for various oligosaccharides. An examination of the distribution of charged residues on the structures reveals that in most cases they are clustered on the front sheet irrespective of its extent of curvature.
Charged residues were found in some structures either on the back sheet or on the top sheet, but were often explainable in terms of their quaternary associations.

Topology. The legume lectin fold has three β-sheets, as described in the previous section, and is described in SCOP as exhibiting complex topology. The fold contains primarily a β-sandwich made of strands (1, 18, 6, 13, 14, 15)–(2, 5, 16, 8, 9, 10, 11) with a lid made of strands (3, 4, 17, 7, 12), where each number denotes that assigned to the strand based on its position in the sequence. The pairs formed within the fold are all antiparallel and made of strands 1–18, 2–5, 3–4, 4–17, 5–16, 6–13, 6–18, 7–12, 7–17, 8–9, 8–16, 9–10, 10–11, 13–14 and 14–15. The connectivities of the strands in the three sheets were identical, except for the small differences indicated below, among the superposable parts of all the representative structures, suggesting that topology is an important characteristic feature of the fold. Figure 3 depicts the topological connections in peanut lectin in a schematic diagram. The figure also illustrates how this topology is substantially conserved in all the representative structures in spite of variations in the fold itself. The variations are primarily truncations of the sheets, insertions of additional segments or changes in the positions of the N- and C-termini in the sequence. A classical example is provided by ConA, where the topology of the sheets remains identical with that of peanut lectin except that the N- and C-termini are at different positions. The same phenomenon is observed in the case of both the pentraxins, where the termini are merely frame-shifted by three strands in the sequence. Pentraxins also have two α-helices, one between the first two strands of the back β-sheet and the other between the front and the top sheets. An insertion is also observed in the wings of neuraminidase, where an α-helix is observed between the back and the front sheets, without any changes in either the position of the termini or the topology.
The structure of cellobiohydrolase presents a good example of an entire domain being inserted between the back and the front sheets and two smaller domains, predominantly consisting of α-helices, between the front and the top sheets. Xylanases reveal an insertion of additional strands in their front sheet, leading to two additional β-hairpins. Galectins and the Charcot Leyden protein demonstrate yet another variation, which can be described as a truncation of the first four strands in the sequence, resulting in one strand less in both the main sheets as compared with the situation in peanut lectin. Spermadhesins and the insecticidal toxin show larger truncations within the back and the front sheets, but again the topologies of the existing strands remain similar. These observations suggest that the fold can tolerate insertions and deletions such as those described, without any disturbance to its overall nature.
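The strand orders and the list of antiparallel pairs given for peanut lectin can be cross-checked against each other: strands that are neighbours within a sheet should be exactly the strands that pair. The short sketch below performs that consistency check using only the numbers quoted in the text; the variable and function names are hypothetical.

```python
# Strand order within each sheet, numbered by position in the sequence.
BACK = (1, 18, 6, 13, 14, 15)
FRONT = (2, 5, 16, 8, 9, 10, 11)
TOP = (3, 4, 17, 7, 12)

# Antiparallel strand pairs as listed in the text.
LISTED_PAIRS = {
    (1, 18), (2, 5), (3, 4), (4, 17), (5, 16), (6, 13), (6, 18), (7, 12),
    (7, 17), (8, 9), (8, 16), (9, 10), (10, 11), (13, 14), (14, 15),
}


def neighbour_pairs(order):
    """Pairs of strands that sit next to each other in a sheet."""
    return {tuple(sorted(pair)) for pair in zip(order, order[1:])}


derived_pairs = neighbour_pairs(BACK) | neighbour_pairs(FRONT) | neighbour_pairs(TOP)
all_strands = sorted(BACK + FRONT + TOP)
```

The derived set matches the listed pairs exactly, and the three sheets together account for each of the 18 strands once, so the two descriptions of the topology are mutually consistent.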



 
Fig. 3. Topology diagram of (a) peanut lectin shown in the centre panel and its subtle variations in other structures: (b) neuraminidase, (c) galectin, (d) pentraxin, (e) cellobiohydrolase, (f) xylanase, (g) insecticidal toxin and (h) spermadhesin. The topology diagrams of arcelins, α-amylase inhibitor, tetanus neurotoxin, glucanases and the LNS domains are similar to that of peanut lectin, that of Charcot Leyden crystal protein is similar to that of galectins, while both the pentraxins have the same topological type. The invariant regions in each of them are highlighted in grey. Strands are numbered according to their three-dimensional positions in the respective β-sheets in peanut lectin. Strands in the other structures are numbered with reference to those in peanut lectin. denote those strands that do not match with any strand in peanut lectin.

 
Derivation of minimal structural determinants. Compilation of the residues in the superimposable segments in all structures reveals that they are all close together in their sequential positions, indeed clustered into one long segment of the polypeptide, corresponding to residues 64–192 of peanut lectin. This segment corresponds to a β-hairpin in the front sheet, a double hairpin of the back sheet and two short strands on the top sheet which are basically extensions of two invariant strands in the back sheet. Interestingly, all these regions are contained within one contiguous segment of the polypeptide chain in all the structures studied here. This segment, highlighted in Figure 3, can be described as the minimal structural principle or as the determining region for the legume lectin fold. The minor topological variations observed in some proteins, especially with reference to the positions of their chain termini (e.g. in ConA or the pentraxins), occur between the first and the second strands of the back sheet and do not alter the invariant region in the fold, consistent with the hypothesis that the invariant region is indeed the determining region. The only exception to this general consensus is observed in the structures of the insecticidal toxin and spermadhesin, where a segment corresponding to two strands in the back and front sheets within this determining region is deleted, but there too, the remaining strands of the invariant region are within one sequential segment.
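The extent of the determining region follows directly from the invariant strand ranges quoted earlier for peanut lectin. The sketch below simply recomputes the span from those ranges. It assumes that the second group of ranges in the text (84–90, 117–124, 136–143 and 149–153) belongs to the front sheet, and it includes the two short top-sheet strands (71–74 and 160–161) that lie within the segment; the names used are hypothetical.

```python
# Invariant strand ranges in peanut lectin, as quoted in the text
# (inclusive residue numbers).
BACK_INVARIANT = [(64, 70), (162, 168), (173, 179), (186, 192)]
FRONT_INVARIANT = [(84, 90), (117, 124), (136, 143), (149, 153)]  # assumed front sheet
TOP_SHORT = [(71, 74), (160, 161)]


def span(ranges):
    """Smallest contiguous residue range covering all the given ranges."""
    return min(start for start, _ in ranges), max(end for _, end in ranges)


determining_region = span(BACK_INVARIANT + FRONT_INVARIANT + TOP_SHORT)
segment_length = determining_region[1] - determining_region[0] + 1
```

The computed span, residues 64–192, is 129 residues long and is bounded at both ends by back-sheet strands.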

Quaternary structure

Legume lectins themselves exhibit different types of quaternary structures not only in terms of the number of subunits involved in the oligomeric molecule but also in the nature or type of oligomerization, despite having nearly identical subunit structures. The dimerization of ConA and several other legume lectins involves the association of the two back β-sheets into a contiguous 12-stranded β-sheet in a side-by-side arrangement, with the dyad of the dimer perpendicular to the β-sheet thus formed. Dimerization in peanut lectin, lectin IV of Griffonia simplicifolia, Erythrina corallodendron lectin and winged bean lectins involves a back-to-back arrangement of the two 'back' β-sheets. All tetrameric legume lectins are dimers of dimers, but here again considerable variability exists (Prabu et al., 1999). In this context, the most interesting case is presented by peanut lectin, in which the tetrameric molecule has an 'open' structure without the expected 222 or 4-fold symmetry (Banerjee et al., 1994). The ConA type of dimerization is exhibited by galectins, sialidase, a member of the neuraminidase family, arcelin-1 and the lectin-like inhibitor of amylase, while a back-to-back arrangement is found in the porcine seminal plasma spermadhesin heterodimer. The pentameric pentraxins exhibit yet another type of oligomerization. The association of the A–E subunits appears rather weak although it involves the back β-sheets; the association of the A–B subunits involves only the top β-sheet and some loops around it, thus forming a new mode of dimerization. It is worth mentioning that the invariant region is not associated with quaternary association in any of the structures studied here. Figure 4 illustrates that structures containing the legume lectin-like fold exist as monomers, different types of dimers, tetramers and also as domains of multi-domain proteins.
Differences in quaternary associations have been shown to give rise to differences in oligosaccharide specificities in bulb lectins (Chandra et al., 1999), although no such clear correlations have been discovered so far for the proteins used in this study.



 
Fig. 4. (A) Various types of quaternary associations exhibited by members of the legume lectin-like fold. (B) Examples where legume lectin-like folds exist as domains of multidomain proteins, as in neuraminidase (1KIT), insecticidal toxin CryIIIA (1CIY) and the tetanus neurotoxin (1DLL).

 
Biological role and carbohydrate binding

The main criterion for a protein to be classified as a lectin is its ability to bind a carbohydrate, most often an oligosaccharide (Lis and Sharon, 1998). Lectins are known to mediate a variety of cellular interactions through their ability to bind carbohydrates specifically. It is not surprising, therefore, that different lectins are specific to different carbohydrates. A study of the crystal structures of more than 50 legume lectin–carbohydrate complexes has shown that the lectins bind the carbohydrates at the top of the concave side of the front sheet and involve interactions from the four loops (91–106, 125–135, 75–83 and 211–216). The first three loops are largely common to all legume lectins whereas the fourth one varies in size and conformation and is thought to determine specificity of the lectin. The predominant function of legume lectins, all exhibiting the same fold, is therefore carbohydrate binding (Sharma and Surolia, 1997). The obvious questions that this observation raises are whether the legume lectin-like fold is always involved in carbohydrate binding and whether the fold gives rise to that particular function. A brief description of the families of proteins in this study provides some insight into these questions.

Galectins, represented by 1SLC, are a family of soluble animal lectins that are cation independent and bind to Gal-β(1,4)GlcNAc terminating oligosaccharides (Bourne et al., 1994). They have been implicated in modulation of cell–cell interactions through carbohydrate-mediated recognition. The crystal structures of five galectins reported so far all show that the carbohydrate binds on the front sheet. The front sheet, however, is not curved in galectins and the loops corresponding to the four carbohydrate binding loops on peanut lectin are either very small or are not present. Carbohydrate binding is achieved through interactions of the charged and polar residues on the front sheet, especially an aspartic acid, an asparagine and an arginine. It appears that the longer side chains seen on the front sheet compensate for the loss of the loop regions and succeed in binding the carbohydrate in a somewhat similar position. The human Charcot Leyden crystal (CLC) protein, also referred to as galectin-10 owing to its structural similarity to galectins, is the major autocrystallizing constituent of human eosinophils and basophils during allergic inflammation and is known to possess lysophospholipase activity (Swaminathan et al., 1999). The CLC protein structure possesses a carbohydrate recognition site comprising most of the binding residues that are conserved among galectins. The protein exhibits specific, although weak, binding to mannose, N-acetylglucosamine and lactose. The binding site of mannose in the crystal structure of a complex is seen to be similar to the sugar binding site in galectins.

Arcelin-1, a member of the phytohaemagglutinin family, is a glycoprotein from kidney beans (Phaseolus vulgaris) which displays insecticidal properties and protects the seeds from predation by larvae of various bruchids (Mourey et al., 1998). This lectin-like protein, although devoid of monosaccharide binding properties, exhibits specificity for various glycoproteins such as fetuin and asialofetuin. The related protein arcelin-5 has a different quaternary structure and binds to monosaccharides specifically, on the concave side of the front sheet, involving interactions similar to those observed in legume lectins (Hamelryck et al., 1996). The differences in function between the two arcelins, due to the differences in their carbohydrate-binding properties, have been explained in terms of sequence and structural changes in the two proteins. The seeds of Phaseolus vulgaris contain another protein that inhibits α-amylase in the digestive tract of mammals and coleoptera and the growth of bruchid larvae (Bompard-Gilles et al., 1996). The structure of this protein, determined a few years ago, reveals a lectin-like domain with a tertiary structure very similar to that of ConA. The carbohydrate-binding loops in this protein are truncated, facilitating its binding to amylase.

Pentraxins are pentameric plasma glycoproteins characterized by calcium-dependent ligand binding. Their overall structures have been described earlier to be similar to that of legume lectins (Srinivasan et al., 1996). The loops within and between the β-sheets are much shorter in this family of proteins. The human serum amyloid P component binds to 4,6-cyclic pyruvate acetal of β-D-galactose and all forms of amyloid fibrils through calcium ions (Emsley et al., 1994). Although the positions of calcium ions are different from the metal ion positions in legume lectins, the mode of recognition bears some resemblance in the two families, especially the role played by Gln148 in the serum amyloid component, as compared with that of Asn127 in peanut lectin. The human C-reactive protein, although belonging to the same structural class as that of the serum amyloid component, is functionally a completely different protein (Shrive et al., 1996). It is an acute phase reactant protein that is expressed rapidly as a response to infection or injury and is known to be involved in enhancement of phagocytosis and activation of the complement through its ability to bind to the bacterial polysaccharides. The ligand is expected to bind at the concave side of the front sheet in both members of the pentraxin family.

Many microorganisms produce multiple forms of different 1,4-β-glycosidases in order to hydrolyse plant polysaccharides such as cellulose and xylan. Because of the complex nature of these polysaccharides, different glycosidases are required to complete their hydrolysis. These glycosidases are classified into endo- or exo-enzymes depending on their ability to catalyse the backbone breakage of their polymeric substrate (Wood, 1989). Microbial cellobiohydrolase is a representative of the exoglucanase/cellulase family, evolved to carry out the hydrolysis of cellulose, the major polysaccharide in plants (Divne et al., 1994). Its crystal structure shows that this protein specifically binds to the 1,4-β-D-glucan moiety of cellulose. The glucanases (Keitel et al., 1993), also known as lichenases, represent a distinct family of glucanohydrolases that primarily hydrolyse the disordered amorphous regions of cellulose, cutting at internal glycosidic bonds. The two proteins act in synergy to achieve complete hydrolysis of cellulose. Although there are several differences between the two structures, especially in terms of insertions and the number of loops, both appear to bind carbohydrate moieties on the concave side of the front sheet, involving interactions of residues structurally related to those in legume lectins. In the case of cellobiohydrolase, the extra loop regions and insertions serve to form a tunnel that encloses the product cellobiose (Divne et al., 1994). Xylanases, another class of endoglycosidases, are structurally very similar to glucanases and have their binding sites situated in a cleft on the concave side of the front sheet. In this class of proteins, clear conformational changes have also been observed upon ligand binding (Havukainen et al., 1996). These changes involve the extra insertions present in the front sheet and also the loop regions located close by.

Two lectin-like domains have been observed on either side of the central sialidase domain of Vibrio cholerae neuraminidase (Crennel et al., 1994). Neuraminidase cleaves the glycosidic linkage between a terminal sialic acid and the penultimate sugar in various glycoconjugates. The environment of the small intestine requires the bacteria to secrete several adhesins. It is expected that the lectin-like domains mediate the protein's attachment to the adhesins. The ability of these domains to recognize carbohydrates, however, remains to be proven.

LNS domains are present in diverse proteins such as laminins, neurexins, agrins and slit (Rudenko et al., 1999). The structures of the G domain of laminin A (Hohenester et al., 1999), steroid binding protein (Grishkovskaya et al., 2000) and neurexins (Rudenko et al., 1999) have recently been determined and found to be extremely similar to each other. Neurexins are brain-specific cell surface proteins that are believed to be involved in neuron–neuron recognition and neuron–neuron adhesion. The recently determined crystal structure of the LNS domains responsible for this function reveals β-sandwich motifs with striking similarity to legume lectins. The LNS domains in agrin and laminin A are known to bind to heparin and other glycosaminoglycan components. The ligand-binding sites in these domains, however, differ from that in legume lectins and also among the different members containing this domain. This is not surprising given the broad range of ligands that these domains can bind. The structural homology between neurexin-1β and lectins raises the possibility that LNS domains may have a general function as carbohydrate-binding modules and that in neurexins, protein–carbohydrate interactions might contribute to their cell adhesive properties at neuronal junctions (Rudenko et al., 1999). While it remains to be investigated whether LNS domains in neurexins do indeed bind sugars, their interactions with protein ligands such as α-latrotoxin and neuroligins are well characterized.

Tetanus neurotoxin (TeNT) is the sole causal agent of the pathological condition known as tetanus (Umland et al., 1997). TeNT is a member of the clostridial neurotoxin family, the most potent toxins known. The botulinum toxin family is closely related to it. The extraordinary toxicity arises from two factors: the first is the critical importance of VAMP/synaptobrevin, the toxins' substrate, to neuroexocytosis, and the second is the exquisite transport mechanism exploited by the toxin for delivery to its cytosolic target within the central nervous system. The receptor binding subunit of TeNT plays a dominant role in this delivery process, perhaps through its ability to bind carbohydrates. The structure of the neurotoxin complexed to galactose has been determined recently (Emsley et al., 2000). The characteristic structural features of this protein, such as the topology, the concavity of the front sheet and the presence of the binding site loops, are remarkably similar to those of legume lectins. The crystal structures of several complexes determined by Emsley et al. indicate that tetanus toxin has multiple carbohydrate-binding sites, the site on the domain adopting the β-trefoil fold being the best established. In addition, the β-sandwich domain also has several residues (particularly Tyr909, Glu932, Asp1067 and Asn1069) situated on the concave surface of the front sheet that appear appropriate to bind a carbohydrate molecule, analogous to that in some of the other structures discussed here. Emsley et al. also discuss the possibility of subsite multivalency in these proteins similar to that in lectins. Whether both domains are involved in carbohydrate recognition and, if they are, whether they are involved together or separately, however, still remains to be explored.

The crystal structures of the activated 65 kDa lepidopteran-specific CryIA and CryIIIA toxins from Bacillus thuringiensis, belonging to a large family of Cry proteins, reveal a domain (domain III) containing a β-sandwich structure made of two twisted antiparallel β-sheets forming a face-to-face sandwich (Grochulski et al., 1995). These toxins, also known as insecticidal crystal proteins, are synthesized intracellularly as inactive prototoxins and, when activated in the gut juice, bind to high-affinity sites of the midgut epithelial cells. Analysis of the structural fold of this domain indicates it to be a variant of the jelly roll fold. The minimal determining region of the legume lectin-like fold appears intact in these proteins also, although the domain differs significantly from legume lectins in its topology, the smaller number of strands in each sheet, the loss of concavity of the front sheet and the loss of equivalent binding site loops. The involvement of this domain in receptor recognition has been suggested (Grochulski et al., 1995), although its exact role remains to be identified.

Spermadhesins are a family of conserved proteins known to be important in gamete recognition during fertilization (Romero et al., 1997). Several members of this family have been studied, of which the crystal structures of seminal plasma proteins 1 and 2 (PSP-I and PSP-II) (Varela et al., 1997) and the acidic seminal plasma protein (aSFP) (Romao et al., 1997) are available. Although all members of the spermadhesin family share 60–98% amino acid sequence identity, they are not functionally equivalent. For example, the porcine spermadhesin AWN and its equine homologue HSP-7 display carbohydrate-binding activity through which they bind tightly to the sperm head membrane, whereas the bovine aSFP does not show carbohydrate-binding activity but is thought to stimulate progesterone secretion by granulosa cells. All three structures, however, reveal a CUB domain architecture which is a variant of the jelly roll motif. The three proteins have been shown to superimpose well among themselves, with r.m.s.d.s below 1 Å (Romero et al., 1997). The putative carbohydrate-binding site in PSP-II is suggested to be located at a shallow groove on the protein surface, similar in position to that in legume lectins and galectins (Varela et al., 1997). This is despite the loss of concavity and shortening of loops in the spermadhesins.
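
To make the r.m.s.d. criterion quoted above concrete, the following minimal sketch computes the root mean square deviation between two sets of structurally equivalent atoms (for instance, Cα positions). It assumes that the two coordinate sets have already been optimally superposed and that the atom equivalences are known; the function name and the toy coordinates are hypothetical and are not taken from any of the structures discussed here, nor is this necessarily the procedure used in the studies cited.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the units of the input, e.g. Å)
    between two equally long lists of (x, y, z) tuples.

    Assumes the two sets are already superposed and that the i-th atom
    of coords_a is structurally equivalent to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates (in Å), for illustration only.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x, y + 0.5, z) for x, y, z in ref]  # every atom displaced by 0.5 Å

print(round(rmsd(ref, shifted), 3))  # prints 0.5
```

In practice the superposition itself (a least-squares rotation and translation) would be carried out first; the sketch only covers the final deviation measure.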

Conclusions

The comparison of 15 different families described here suggests that their overall structures have some characteristic features common to all members. The topology is remarkably conserved in all the members, suggesting it to be a strict prerequisite for the fold. The variations that occur in spermadhesin and the insecticidal toxin do not change the nature of the minimal determining region. The study also highlights the fact that the fold is compatible with different quaternary structures that are formed in different families or in different members within a family. The fold appears to tolerate a wide variation in the loops, especially the loops corresponding to those of the carbohydrate-binding site in legume lectins. The predominant function of proteins of these 15 families appears to be carbohydrate binding, through which they mediate higher order biological events, although in some cases the exact nature and specificity of the sugar are yet to be determined. Variations in the binding site and the loop lengths, observed in some cases, probably tailor the different proteins for the differences in ligand specificity that are required to perform a wide range of functions such as those described here and surely many more yet to be determined. The comparative analysis presented here clearly identifies the minimal determining region in the fold and shows how it provides a common scaffold over which local structural variations can be rendered to achieve the flexibility and adaptability required for recognizing diverse carbohydrates. Recognition of such scaffolds in every fold can be useful for automating structural classifications in the future, a need that will grow as the structural genomics projects begin to make headway. More importantly, these scaffolds provide discrete templates for use in fold recognition of a new protein, another acute need that has arisen out of the sequencing of several genomes.


    Notes
 
3 To whom correspondence should be addressed. E-mail: mv{at}mbu.iisc.ernet.in


    Acknowledgments
 
We thank D.V. Nataraj for discussions and help in the early stages of this work and Gosu Ramachandriah for help in preparing the figures. Use of facilities at the Super Computer Education and Research Centre, the Interactive Graphics Based Molecular Modelling Facility and the Distributed Information Centre (both supported by DBT) is acknowledged. Financial assistance from DST is acknowledged.


    References
 
Banerjee,R., Mande,S.C., Ganesh,V., Das,K., Dhanaraj,V., Mahanta,S.K., Suguna,K., Surolia,A. and Vijayan,M. (1994) Proc. Natl Acad. Sci. USA, 91, 227–231.

Banerjee,R., Das,K., Ravishankar,R., Suguna,K., Surolia,A. and Vijayan,M. (1996) J. Mol. Biol., 259, 281–296.

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Bettler,E., Loris,R. and Imberty,A. (2001) 3D Lectin Data Bank (http://www.cermav.cnrs.fr/databank/lectine).

Bompard-Gilles,C., Rousseau,P., Rouge,P. and Payan,F. (1996) Structure, 4, 1441–1452.

Bouckaert,J., Hamelryck,T., Wyns,L. and Loris,R. (1999) Curr. Opin. Struct. Biol., 9, 572–577.

Bourne,Y., Bolgiano,B., Liao,D.I., Strecker,G., Cantau,P., Herzberg,O., Feizi,T. and Cambillau,C. (1994) Nature Struct. Biol., 1, 863–870.

Chandra,N.R., Ramachandraiah,G., Bachhawat,K., Dam,T.K., Surolia,A. and Vijayan,M. (1999) J. Mol. Biol., 285, 1157–1168.

Chelvanayagam,G., Heringa,J. and Argos,P. (1992) J. Mol. Biol., 228, 220–242.

Connolly,M.L. (1993) J. Mol. Graphics, 11, 139–141.

Crennel,S., Garman,E., Laver,G., Vimr,E. and Taylor,G. (1994) Structure, 2, 535–544.

Divne,C., Stahlberg,J., Reinikainen,T., Ruohonen,L., Petterson,G., Knowles,J.K.C., Teeri,T.T. and Jones,T.A. (1994) Science, 265, 524–527.

Emsley,J., White,H.E., O'Hara,B.P., Oliva,G., Srinivasan,N., Tickle,I.J., Blundell,T.L., Pepys,M.B. and Wood,S.P. (1994) Nature, 367, 338–345.

Emsley,P., Fotinou,C., Black,N., Fairweather,F., Charles,I.G., Watts,C., Hewitt,E. and Isaacs,W. (2000) J. Biol. Chem., 275, 8889–8894.

Gibrat,J-F., Madej,T. and Bryant,S.H. (1996) Curr. Opin. Struct. Biol., 6, 377–385.

Grishkovskaya,I., Avvakumov,G.V., Skelnar,G., Dales,D., Hammond,G.L. and Muller,Y.A. (2000) EMBO J., 19, 504–512.

Grochulski,P., Masson,L., Borisova,S., Pusztai-Carey,M., Schwartz,J-L., Brousseau,R. and Cygler,M. (1995) J. Mol. Biol., 254, 447–464.

Hamelryck,T.W., Poortmans,F., Goossens,A., Angenon,G., Van Montagu,M., Wyns,L. and Loris,R. (1996) J. Biol. Chem., 271, 32796–32802.

Hardman,K.D. and Ainsworth,C.F. (1972) Biochemistry, 11, 4910–4919.

Havukainen,R., Torronen,A., Laitinen,T. and Rouvinen,J. (1996) Biochemistry, 35, 9617–9624.

Hohenester,E., Tisi,D., Talts,J.F. and Timpl,R. (1999) Mol. Cell, 4, 783–792.

Holm,L. and Sander,C. (1995) Trends Biochem. Sci., 20, 478–480.

Holm,L. and Sander,C. (1996) Science, 273, 595–602.

Keitel,T., Simon,O., Borriss,R. and Heinemann,U. (1993) Proc. Natl Acad. Sci. USA, 90, 5287–5291.

Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.

Lis,H. and Sharon,N. (1998) Chem. Rev., 98, 637–674.

Loris,R., Hamelryck,T., Bouckaert,J. and Wyns,L. (1998) Biochim. Biophys. Acta, 1383, 9–36.

Manoj,N., Srinivas,V.R., Surolia,A., Vijayan,M. and Suguna,K. (2000) J. Mol. Biol., 302, 1129–1137.

Mourey,L., Pedelacq,J.D., Birck,C., Fabre,C., Rouge,P. and Samama,J.P. (1998) J. Biol. Chem., 273, 12914–12922.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.

Prabu,M.M., Sankaranarayanan,R., Puri,K.D., Sharma,V., Surolia,A., Vijayan,M. and Suguna,K. (1998) J. Mol. Biol., 276, 787–796.

Prabu,M.M., Suguna,K. and Vijayan,M. (1999) Proteins: Struct. Funct. Genet., 35, 58–69.

Romao,M.J., Kolln,I., Dias,J.M., Carvalho,A.L., Romero,A., Varela,P.F., Sanz,L., Topfer-Petersen,E. and Calvete,J.J. (1997) J. Mol. Biol., 274, 650–660.

Romero,A., Romao,M.J., Varela,P.F., Kolln,I., Dias,J.M., Carvalho,A.L., Sanz,L., Topfer-Petersen,E. and Calvete,J.J. (1997) Nature Struct. Biol., 4, 783–788.

Rudenko,G., Nguyen,T., Chelliah,Y., Sudhof,T.C. and Deisenhofer,J. (1999) Cell, 99, 93–101.

Sharma,V. and Surolia,A. (1997) J. Mol. Biol., 267, 433–445.

Shrive,A.K., Cheetham,G.M., Holden,D., Myles,D.A., Turnell,W.G., Volanakis,J.E., Pepys,M.B., Bloomer,A.C. and Greenhough,T.J. (1996) Nature Struct. Biol., 3, 346–354.

Srinivasan,N., Rufino,S.D., Pepys,M.B., Wood,S.P. and Blundell,T.L. (1996) Chemtracts – Biochem. Mol. Biol., 6, 149–164.

Swaminathan,G.J., Leonidas,D.D., Savage,M.P., Ackerman,S.J. and Acharya,K.R. (1999) Biochemistry, 38, 13837–13843.

Umland,T.C., Wingert,L.M., Swaminathan,S., Furey,W.F., Schmidt,J.J. and Sax,M. (1997) Nature Struct. Biol., 4, 788–792.

Varela,P.F., Romero,A., Sanz,L., Romao,M.J., Topfer-Petersen,E. and Calvete,J.J. (1997) J. Mol. Biol., 274, 635–649.

Vijayan,M. and Chandra,N. (1999) Curr. Opin. Struct. Biol., 9, 707–714.

Wood,T.M. (1989) In Coughlan,M.P. (ed.), Enzyme Systems for Lignocellulose Degradation. Elsevier, London, pp. 19–35.

Received March 7, 2001; revised July 11, 2001; accepted July 31, 2001.