Sequence relationships in the legume lectin fold and other jelly rolls

A. Williams and D.R. Westhead¹

School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK


    Abstract
 
Distant sequence relationships in proteins containing the β jelly-roll fold were investigated using sensitive sequence comparison methods, including PSI-BLAST and hidden Markov models (HMMs). A relationship was identified between the RmlC-like and phosphomannose isomerase superfamilies of SCOP (version 1.53), which have been merged in the most recent SCOP release. No other distant sequence relationships linking jelly roll superfamilies were found.

Keywords: β jelly rolls/protein evolution/sequence relationships/structural homology/superfolds


    Introduction
 
Recently, in this journal, Chandra and co-workers reported a detailed structural analysis of the legume lectin fold (Chandra et al., 2001). This fold is adopted by a number of protein families that share little or no sequence similarity. Functionally, these are carbohydrate-binding proteins that specifically recognize diverse sugar structures and mediate a variety of biological processes, such as cell–cell and host–pathogen interactions, serum glycoprotein turnover and innate immune responses (Vijayan and Chandra, 1999). A comparative analysis of 15 different families containing this fold was used to determine the minimal structural principles, or determining region, of the fold. Proteins containing the legume lectin-like fold were identified using the structural databases SCOP (Murzin et al., 1995), FSSP (Holm and Sander, 1996), CATH (Orengo et al., 1997) and the 3D lectin database (Bettler et al., 2001). Further structural homologues in the Protein Data Bank (PDB) (Berman et al., 2000) were found using two separate structural comparison algorithms: DALI (Holm and Sander, 1995) and VAST (Gibrat et al., 1996). Over 300 structures were inspected manually. A critical evaluation of structural features such as the curvature of the front sheet, the presence of hydrophobic cores and the binding-site loops suggested that none of them is crucial for either the formation or the stability of the fold; rather, they are required to generate diversity and specificity for particular carbohydrates. In contrast, the presence of the three sheets in a particular geometry and their topological connectivities are the defining features of the fold.

The legume lectin fold is of interest here because it contains a β jelly roll (Richardson, 1981) sub-topology. This folding topology is found in many protein structures and has been identified as a superfold (Orengo et al., 1994). The defining characteristic of a superfold is its occurrence in many protein superfamilies that have diverse functions and no detectable sequence similarity. This suggests that the fold might have multiple evolutionary origins rather than a single common ancestor, and that many observed instances of the fold result from convergent evolution towards a structure that is advantageous from a physico-chemical point of view. There are relatively few superfolds; examples include the TIM barrel, the immunoglobulin fold, the αβ plait and the β jelly roll.

The β jelly roll consists of four Greek key motifs that adopt an eight-stranded β-sandwich structure (Richardson, 1981). The hydrogen-bonding pattern between adjacent strands is broken in two places, so the structure comprises two four-stranded β-sheets. Both sheets are purely antiparallel, and strands adjacent in sequence lie in different sheets, with the exception of the fourth and fifth strands, which are in the same sheet. The structure therefore contains only one hairpin, all other β–β connections being arches.
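The hairpin/arch bookkeeping described above can be made concrete with a small sketch. It assumes the BIDG/CHEF strand labelling commonly used for viral capsid jelly rolls (strands B to I in sequence order, each sheet listed in spatial order); the labels, sheet orders and the function name `classify_connections` are illustrative and are not taken from this work.

```python
# Hypothetical sketch: classify the connections of an idealized eight-stranded
# jelly roll as hairpins or arches. Strands are labelled B-I in sequence order,
# following the BIDG/CHEF convention (an assumption made for illustration).
from typing import Dict, List, Tuple

# Each sheet lists its strands in spatial (hydrogen-bonding) order.
JELLY_ROLL_SHEETS: Dict[str, List[str]] = {
    "sheet1": ["B", "I", "D", "G"],
    "sheet2": ["C", "H", "E", "F"],
}
SEQUENCE_ORDER: List[str] = ["B", "C", "D", "E", "F", "G", "H", "I"]


def classify_connections(sheets: Dict[str, List[str]],
                         order: List[str]) -> List[Tuple[str, str, str]]:
    """Label each pair of sequence-consecutive strands as 'hairpin' or 'arch'."""
    # Map each strand to (sheet name, position within that sheet).
    location = {strand: (name, i)
                for name, strands in sheets.items()
                for i, strand in enumerate(strands)}
    result = []
    for a, b in zip(order, order[1:]):
        sheet_a, pos_a = location[a]
        sheet_b, pos_b = location[b]
        # A hairpin joins strands that are neighbours in the same sheet; every
        # other connection is counted as an arch (in this topology those all
        # cross between the two sheets).
        if sheet_a == sheet_b and abs(pos_a - pos_b) == 1:
            kind = "hairpin"
        else:
            kind = "arch"
        result.append((a, b, kind))
    return result


if __name__ == "__main__":
    for a, b, kind in classify_connections(JELLY_ROLL_SHEETS, SEQUENCE_ORDER):
        print(f"{a}-{b}: {kind}")
```

With this labelling the E–F connection (fourth to fifth strand) comes out as the single hairpin and the remaining six connections as arches, matching the description in the text.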

In structure classifications such as SCOP and CATH, superfolds are folds (SCOP) or topologies (CATH) that contain several homologous superfamilies. The possible evolutionary origin of superfolds has recently attracted interest, following the publication by Copley and Bork (2000) of sensitive sequence comparisons using PSI-BLAST (Altschul et al., 1997) that provided statistically reliable evidence that at least 12 of the 23 SCOP TIM barrel superfamilies might share a common origin.

Motivated by the results of Copley and Bork, we used sensitive sequence comparison methods, including PSI-BLAST and hidden Markov models (HMMs), to search for evidence of distant sequence relationships between SCOP superfamilies containing the jelly roll topology. This study included, by definition, a thorough sequence-level analysis of all the legume lectin-like proteins studied by Chandra and co-workers. Indeed, the fact that these proteins share the related functions of carbohydrate binding or catalysis involving carbohydrates was one of the reasons we believed that previously unrecognized evolutionary relationships between jelly-roll-containing SCOP superfamilies might exist.


    Methods
 
Construction of the dataset

The β jelly roll is recognized as a superfold by CATH, where most examples are found under the jelly roll topology (2.60.120). In SCOP (version 1.53, which was used throughout this work), there is no single fold-level classification for β jelly rolls, but within the all-beta protein class, folds 9, 12, 13, 17, 18, 21, 22 and 80 are annotated as containing a jelly roll topology. TOPS (Westhead et al., 1998; Gilbert et al., 1999) has two predefined β jelly roll patterns that can be used to search the structural database for jelly roll sub-topologies. In SCOP, 13 of the 15 families studied by Chandra and co-workers are classified in the concanavalin A (Con A)-like lectins/glucanases superfamily, which is described as a sandwich of 12–14 strands in two sheets with complex topology; in CATH this fold is classified as having jelly roll topology. The other two families with the legume lectin-like fold analysed by Chandra and co-workers belong to the galactose-binding domain-like and spermadhesin CUB domain superfamilies, both of which are described as having jelly roll topology in SCOP and CATH.

The initial dataset contained all the proteins with β jelly roll structure from CATH, SCOP and TOPS. This dataset of 920 structures was highly redundant; removing redundancy at the 90% sequence identity level left 182 structures, divided among 20 SCOP superfamilies. Our aim was to seek potential evolutionary relationships between these superfamilies.
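The text does not say which tool or procedure was used to remove redundancy at 90% identity. A common approach is a greedy filter, sketched below under that assumption: entries are visited in order and kept only if they fall below the threshold against every entry already kept. The function name `remove_redundancy` and the `identity` callback are hypothetical; in practice the callback would wrap pairwise alignments of the structures' sequences.

```python
# Hypothetical sketch of greedy redundancy removal at a sequence identity
# threshold. The actual tool and ordering used in the study are not stated.
from typing import Callable, List, Sequence


def remove_redundancy(ids: Sequence[str],
                      identity: Callable[[str, str], float],
                      threshold: float = 0.90) -> List[str]:
    """Keep an entry only if its identity to every kept entry is below threshold.

    `identity(a, b)` is assumed to return a fraction in [0, 1]. Entries are
    visited in the order given, so earlier entries act as representatives.
    """
    kept: List[str] = []
    for candidate in ids:
        if all(identity(candidate, rep) < threshold for rep in kept):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Toy pairwise identities (hypothetical values, not data from the study).
    toy = {frozenset("ab"): 0.95, frozenset("ac"): 0.40, frozenset("bc"): 0.42}

    def toy_identity(x: str, y: str) -> float:
        return toy[frozenset((x, y))]

    print(remove_redundancy(["a", "b", "c"], toy_identity))
```

Because the result depends on visiting order, a real pipeline would usually sort entries first (for example by structure resolution) so that the best representative of each cluster is retained.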

The approach was to carry out sequence similarity searches using the sequences from each superfamily in turn as queries for the PSI-BLAST and HMM search methods. It is generally accepted that evolutionary relationships are transitive (Park et al., 1997): if searches started from two different superfamilies both find similarity to the same intermediate sequence, in the same domain or residue range, a relationship between the superfamilies can be inferred. We refer to the detection of such an intermediate sequence from two superfamilies as ‘cross detection’.
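The cross-detection criterion can be sketched as a filter over search hits: two hits support a link if they come from different superfamilies, hit the same database sequence, and overlap in residue range. The sketch below is an illustration under assumptions, not the authors' code. The `Hit` record, the `min_residues` overlap parameter and the accessions in the usage note are hypothetical; the default E-value cut-off of 1.0 mirrors the Results, where cross detections with E-values above 1.0 were disregarded.

```python
# Hypothetical sketch of "cross detection": two superfamilies are linked when
# their searches hit the same intermediate sequence over overlapping residues.
from dataclasses import dataclass
from itertools import combinations
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Hit:
    superfamily: str  # superfamily whose query produced the hit
    target: str       # database accession of the intermediate sequence
    start: int        # first aligned residue in the target (1-based)
    end: int          # last aligned residue in the target (inclusive)
    evalue: float


def overlap(a: Hit, b: Hit, min_residues: int = 1) -> bool:
    """True if the two hits share at least `min_residues` target residues."""
    return min(a.end, b.end) - max(a.start, b.start) + 1 >= min_residues


def cross_detections(hits: List[Hit], max_evalue: float = 1.0,
                     min_residues: int = 1) -> Set[Tuple[str, str, str]]:
    """Return (superfamily, superfamily, target) triples for candidate links."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    links: Set[Tuple[str, str, str]] = set()
    for a, b in combinations(kept, 2):
        # Hits to different domains of one target do not overlap and are
        # therefore not counted as evidence of a relationship.
        if (a.superfamily != b.superfamily and a.target == b.target
                and overlap(a, b, min_residues)):
            first, second = sorted((a.superfamily, b.superfamily))
            links.add((first, second, a.target))
    return links
```

For example, hits from superfamilies `SF1` and `SF2` covering residues 10–120 and 15–110 of a toy sequence `X1` would be reported as a link, whereas a third superfamily hitting residues 200–300 of `X1` (a different domain) would not.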

PSI-BLAST-based search methods

For each β jelly roll superfamily, a PSI-BLAST analysis was performed in which every sequence in the superfamily was used to initiate a PSI-BLAST search. The maximum number of iterations was restricted to 20, and the E-value threshold for including sequences in the position-specific score matrix was the default (0.001). The PDB, SwissProt and TrEMBL databases were searched.
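As a rough modern equivalent of these search settings, the sketch below builds a command line for the BLAST+ `psiblast` program. This is an assumption for illustration: the original work predates BLAST+, and the file and database names are placeholders. The inclusion threshold is passed explicitly rather than relying on the program's own default, since defaults have differed between BLAST versions.

```python
# Hypothetical sketch: build a BLAST+ psiblast command mirroring the settings
# in the text (20 iterations, inclusion E-value 0.001). File and database
# names are placeholders, not those used in the study.
import shlex
from typing import List


def psiblast_command(query_fasta: str, database: str, out_path: str,
                     iterations: int = 20,
                     inclusion_evalue: float = 0.001) -> List[str]:
    """Return an argument list suitable for subprocess.run()."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        # E-value threshold for including hits in the position-specific matrix
        "-inclusion_ethresh", str(inclusion_evalue),
        "-outfmt", "6",  # tabular output, convenient for parsing hit ranges
        "-out", out_path,
    ]


if __name__ == "__main__":
    cmd = psiblast_command("query.fasta", "trembl_db", "query_hits.tsv")
    print(shlex.join(cmd))
```

Returning an argument list rather than a shell string avoids quoting problems when the list is passed to `subprocess.run`, and one such command would be issued per query sequence in each superfamily.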

HMM-based search methods

Because structure is more highly conserved than sequence during evolution, the alignments used to build the HMMs were structure-based rather than sequence-based. Structural alignments were produced with the STAMP 4.2 package (Russell and Barton, 1992), using the SCAN method, which was chosen because it works particularly well with very diverse structures. SCAN requires a seed domain against which the other domains are scanned and superimposed; for each superfamily we chose the domain with the highest average sequence identity to the others (i.e. the sequence most similar to all the others). All 15 superfamilies containing two or more structures were aligned with STAMP. Two of these, the Viral coat and capsid proteins and the Con A-like lectins/glucanases superfamilies, were too diverse to align as a whole and were therefore split into four sub-groups by constructing a cladogram and aligning the obvious clades. The structural alignments from STAMP were used to build HMMs with the HMMER 2.1.1 software (Durbin et al., 1998; Eddy, 1996, 1998), and the resulting HMMs were used to search the PDB, SwissProt and TrEMBL databases.
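The rule for choosing the STAMP SCAN seed (the domain with the highest average sequence identity to the other members of its superfamily) can be sketched as follows. The function name `pick_scan_seed`, the pair-keyed identity table and the domain identifiers in the usage note are hypothetical; how the pairwise identities themselves were computed is not stated in the text.

```python
# Hypothetical sketch of choosing the STAMP SCAN seed: the domain with the
# highest mean sequence identity to the other members of its superfamily.
from statistics import mean
from typing import Dict, FrozenSet, Sequence


def pick_scan_seed(domains: Sequence[str],
                   identity: Dict[FrozenSet[str], float]) -> str:
    """Return the domain whose average identity to all other domains is highest.

    `identity` maps an unordered pair of domain identifiers to percent identity.
    Ties are broken in favour of the domain listed first.
    """
    if len(domains) < 2:
        raise ValueError("need at least two domains to compare")

    def avg_identity(d: str) -> float:
        return mean(identity[frozenset((d, other))]
                    for other in domains if other != d)

    # max() returns the first maximal element, giving the stated tie-break.
    return max(domains, key=avg_identity)
```

For three toy domains with pairwise identities d1–d2 = 40%, d1–d3 = 30% and d2–d3 = 35%, the averages are 35%, 37.5% and 32.5%, so `d2` would be chosen as the seed.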


    Results and discussion
 
General

Table I lists the β jelly roll superfamilies found in SCOP, CATH and TOPS, using the SCOP classification. Additional columns give the number of non-redundant structures, the average percentage sequence identity within each superfamily and the r.m.s.d. of the structures when aligned using STAMP. The legume lectin-like superfamilies studied by Chandra and co-workers are indicated. Table I shows that the jelly rolls are a very diverse set of proteins, comprising superfamilies with a wide range of functions. Moreover, two-thirds of the superfamilies have an average sequence identity below 30% (the ‘twilight zone’, below which relationships become difficult to detect with current search methods), indicating substantial diversity even within superfamilies.
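The text does not define exactly how the average percentage identity per superfamily was computed. One common convention, assumed here, is to take each pair of aligned sequences, count identical residues over the columns where neither sequence has a gap, and average over all pairs. The function names and the toy alignment in the usage note are hypothetical.

```python
# Hypothetical sketch: average pairwise percent identity of an alignment, with
# identity measured over columns where neither sequence has a gap (one of
# several conventions; the definition used in the study is not stated).
from itertools import combinations
from statistics import mean
from typing import Sequence


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over mutually ungapped alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    return 100.0 * sum(x == y for x, y in aligned) / len(aligned)


def average_identity(alignment: Sequence[str]) -> float:
    """Mean percent identity over all pairs of sequences in the alignment."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    return mean(percent_identity(a, b) for a, b in combinations(alignment, 2))
```

A superfamily would fall in the twilight zone under this convention if `average_identity(...)` returned a value below 30. Other definitions (for example dividing by the length of the shorter sequence) give somewhat different numbers, so values are only comparable when computed the same way.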


Table I. SCOP superfamily classification of the different β jelly roll folds
 
Search results

There were six apparent cross detections between superfamilies. Two were disregarded because of their very high E-values (above 1.0), and the remaining four were investigated. These apparent cross detections linked the spermadhesin CUB domain, galactose-binding domain-like and Con A-like lectins/glucanases superfamilies. However, closer inspection showed that the superfamilies were detecting different domains within the same sequence. To confirm this, the PSI-BLAST results were analysed: the superfamily sequences align to different regions of the cross-detected hit, so these hits are not evidence of an evolutionary relationship.

Analysis of cross detection between RmlC-like and phosphomannose isomerase superfamilies

Unlike the spermadhesin CUB domain, galactose-binding domain-like and Con A-like lectins/glucanases superfamilies, the RmlC-like and phosphomannose isomerase superfamilies are classified in the same SCOP fold, which makes a true cross detection more plausible. Analysis of the PSI-BLAST results shows that both superfamilies detect the same domain in the cross-detected hit. Figure 1 shows the sequence alignments of 1cavB (RmlC-like superfamily) and 1pmi (phosphomannose isomerase superfamily) to the cross-detected hit (TrEMBL accession no. Q41674). The E-value for the RmlC-like superfamily detecting Q41674 is 7.8 × 10⁻¹⁰⁷, and that for the phosphomannose isomerase superfamily is 0.28. Although the phosphomannose isomerase E-value is not low enough to be conclusive, it is objective evidence of a possible distant evolutionary relationship.



Fig. 1. PSI-BLAST alignments showing the cross-detected hit between the phosphomannose isomerase (1pmi) and rmlC-like (1cavB) superfamilies and the intermediate sequence Q41674 (convicilin precursor protein).

 
The SCOP version used in this study was 1.53. From version 1.55 onwards, the phosphomannose isomerase and RmlC-like superfamilies have been merged. Our weak cross detection therefore offers independent and objective confirmation of a change recently introduced into the SCOP view of evolutionary relationships between the jelly rolls.

Conclusion

We have complemented the work of Chandra and co-workers with a detailed and sensitive sequence analysis of jelly-roll-containing folds, and have investigated possible evolutionary relationships between them in detail. Since the seminal work of Copley and Bork, which revealed relationships between TIM barrel superfamilies, it has been important to ask whether similar relationships exist between superfamilies within other superfolds. Except for the case described here, this work gives a negative answer for the jelly roll superfold. Nevertheless, some functional similarities between the families investigated by Chandra and co-workers suggest that evolutionary relationships may be present but not detectable at the sequence level by current methods. We expect that the evolution of this fascinating superfold, and of others, will remain of interest for years to come.


    Notes
 
¹ To whom correspondence should be addressed. E-mail: westhead@bmb.leeds.ac.uk


    Acknowledgments
 
We thank the BBSRC for sponsorship.


    References
 
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Bettler,E., Loris,R. and Imberty,A. (2001) 3D Lectin Data Bank (http://www.cermav.cnrs.fr/databank/lectine).

Chandra,N.R., Prabu,M.M., Suguna,K. and Vijayan,M. (2001) Protein Eng., 14, 857–866.

Copley,R.R. and Bork,P. (2000) J. Mol. Biol., 303, 627–640.

Durbin,R., Eddy,S.R., Krogh,A. and Mitchison,G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (http://hmmer.wustl.edu/).

Eddy,S.R. (1996) Curr. Opin. Struct. Biol., 6, 361–365.

Eddy,S.R. (1998) Bioinformatics, 14, 755–763.

Gibrat,J.-F., Madej,T. and Bryant,S.H. (1996) Curr. Opin. Struct. Biol., 6, 377–385.

Gilbert,D.R., Westhead,D.R., Nagano,N. and Thornton,J.M. (1999) Bioinformatics, 15, 317–326.

Holm,L. and Sander,C. (1995) Trends Biochem. Sci., 20, 478–480.

Holm,L. and Sander,C. (1996) Science, 273, 595–602.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631–634.

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.

Park,J., Teichmann,S.A., Hubbard,T. and Chothia,C. (1997) J. Mol. Biol., 273, 349–354.

Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.

Russell,R.B. and Barton,G.J. (1992) Proteins: Struct. Funct. Genet., 14, 309–323.

Vijayan,M. and Chandra,N. (1999) Curr. Opin. Struct. Biol., 9, 707–714.

Westhead,D.R., Hatton,D.C. and Thornton,J.M. (1998) Trends Biochem. Sci., 23, 35–36.

Received March 5, 2002; revised July 12, 2002; accepted July 24, 2002.




