School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
Keywords: β jelly rolls/protein evolution/sequence relationships/structural homology/superfolds
Introduction
The legume lectin fold is interesting in that it contains a β jelly roll (Richardson, 1981) sub-topology. This folding topology is found in many protein structures and has been identified as a superfold (Orengo et al., 1994). The defining characteristic of a superfold is its appearance in many protein superfamilies that have diverse functions and no detectable sequence similarity. This is evidence that the fold may have multiple evolutionary origins rather than a single common ancestor, and that many observed instances of the fold result from convergent evolution towards a structure that is advantageous from a physico-chemical point of view. There are relatively few superfolds; examples include the TIM barrel, the immunoglobulin fold, the β plait and the β jelly roll.
The β jelly roll consists of four Greek key motifs that adopt an eight-stranded β sandwich structure (Richardson, 1981). The hydrogen-bonding pattern between adjacent strands is broken in two places, so the structure comprises two four-stranded β-sheets. Both sheets are purely antiparallel. Strands adjacent in sequence lie in different sheets, with the exception of the fourth and fifth strands, which lie in the same sheet. The structure therefore contains only one hairpin; all other ββ connections are arches.
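To make the hairpin/arch distinction concrete, the sketch below encodes one commonly used description of the idealised jelly roll, in which the eight strands (numbered 1–8 in sequence order) form one sheet in the spatial order 1-8-3-6 and the other in the order 2-7-4-5 (the "BIDG" and "CHEF" sheets of viral capsid nomenclature). This strand order, the function name `classify_connections` and the labels are illustrative assumptions, not part of the method used in this study.

```python
"""Classify the connections of an idealised jelly roll as hairpins or arches.

Assumption (illustrative): strands are numbered 1-8 in sequence order, and
each sheet lists its strands in their side-by-side spatial order.
"""

from typing import Dict, List, Tuple

# Assumed canonical arrangement: sheet "BIDG" = 1,8,3,6 and sheet "CHEF" = 2,7,4,5.
JELLY_ROLL_SHEETS: List[List[int]] = [
    [1, 8, 3, 6],
    [2, 7, 4, 5],
]


def classify_connections(sheets: List[List[int]]) -> Dict[Tuple[int, int], str]:
    """Label each sequential connection i -> i+1.

    'hairpin': both strands are neighbours in the same sheet.
    'arch': the connection crosses from one sheet to the other.
    'same-sheet, non-adjacent': same sheet but not neighbours.
    """
    # Record (sheet index, position within sheet) for every strand.
    location: Dict[int, Tuple[int, int]] = {}
    for sheet_index, order in enumerate(sheets):
        for position, strand in enumerate(order):
            location[strand] = (sheet_index, position)

    labels: Dict[Tuple[int, int], str] = {}
    # Assumes strands are numbered contiguously from 1.
    for strand in sorted(location)[:-1]:
        sheet_a, pos_a = location[strand]
        sheet_b, pos_b = location[strand + 1]
        if sheet_a != sheet_b:
            label = "arch"
        elif abs(pos_a - pos_b) == 1:
            label = "hairpin"
        else:
            label = "same-sheet, non-adjacent"
        labels[(strand, strand + 1)] = label
    return labels


if __name__ == "__main__":
    for pair, label in classify_connections(JELLY_ROLL_SHEETS).items():
        print(pair, label)
```

Run as a script, it reports a single hairpin, between strands 4 and 5, and arches for the other six connections, matching the description of the fold given above.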
In structure classifications such as SCOP and CATH, superfolds are defined as folds (SCOP) or topologies (CATH) that contain several homologous superfamilies. The possible evolutionary origin of superfolds has attracted recent interest following the work of Copley and Bork (2000), whose sensitive sequence comparisons using PSI-BLAST (Altschul et al., 1997) provided statistically reliable evidence that at least 12 of the 23 SCOP TIM barrel superfamilies may share a common origin.
Motivated by the results of Copley and Bork, we used sensitive sequence comparison methods, including PSI-BLAST and hidden Markov models (HMMs), to search for evidence of distant sequence relationships between SCOP superfamilies containing the jelly roll topology. By design, this study included a thorough sequence-level analysis of all the legume lectin-like proteins studied by Chandra and co-workers. Indeed, the existence of these proteins, which share the related functions of carbohydrate binding or catalysis involving carbohydrates, was one of the reasons we suspected that hitherto unappreciated evolutionary relationships might exist between jelly-roll-containing SCOP superfamilies.
Methods
The β jelly roll is recognized as a superfold by CATH, where most examples are classified under the jelly roll topology (2.60.120). In SCOP (version 1.53, used throughout this work) there is no single fold-level classification for β jelly rolls, but within the all-beta protein class, folds 9, 12, 13, 17, 18, 21, 22 and 80 are annotated as containing a jelly roll topology. TOPS (Westhead et al., 1998; Gilbert et al., 1999) provides two predefined β jelly roll patterns that can be used to search the structural database for jelly roll sub-topologies. In SCOP, 13 of the 15 families studied by Chandra and co-workers belong to the concanavalin A (Con A)-like lectins/glucanases superfamily, which is described as a sandwich of 12–14 strands in two sheets with complex topology; CATH classifies this fold as having jelly roll topology. The other two families with the legume lectin-like fold analysed by Chandra and co-workers belong to the galactose-binding domain-like and spermadhesin CUB domain superfamilies, both of which are described as having jelly roll topology in SCOP and in CATH. The initial dataset contained all proteins with β jelly roll structure in CATH, SCOP and TOPS. This dataset of 920 structures was highly redundant; removing redundancy at the 90% sequence identity level left 182 structures, divided among 20 SCOP superfamilies. Our aim was to seek potential evolutionary relationships between these superfamilies.
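The procedure used to remove redundancy at 90% sequence identity is not specified here. One common approach is a greedy filter: walk through the structures in some priority order (for example, best resolution first) and keep each one only if it is less than 90% identical to every structure already kept. The sketch below assumes a caller-supplied function that returns percentage identity for a pair of domains; `remove_redundancy` and the toy identifiers are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List


def remove_redundancy(
    ids: List[str],
    identity: Callable[[str, str], float],
    threshold: float = 90.0,
) -> List[str]:
    """Greedy filter: keep an entry only if its percentage identity to
    every entry kept so far is below `threshold`.

    `ids` should already be sorted by preference (e.g. best resolution
    first), because earlier entries win when two are redundant.
    """
    kept: List[str] = []
    for candidate in ids:
        if all(identity(candidate, rep) < threshold for rep in kept):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    # Toy identities (percent) between hypothetical domains; missing pairs count as 0.
    toy: Dict[FrozenSet[str], float] = {
        frozenset({"domA", "domB"}): 95.0,
        frozenset({"domA", "domC"}): 40.0,
        frozenset({"domB", "domC"}): 42.0,
    }

    def lookup(a: str, b: str) -> float:
        return toy.get(frozenset({a, b}), 0.0)

    print(remove_redundancy(["domA", "domB", "domC"], lookup))  # ['domA', 'domC']
```

Because the filter is greedy, the surviving set depends on the input order, which is why a sensible priority ordering matters in practice.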
The approach was to run sequence similarity searches with the PSI-BLAST and HMM search methods, using the sequences of each superfamily in turn as queries. It is generally accepted that evolutionary relationships are transitive (Park et al., 1997): if searches started from two different superfamilies both detect similarity to the same intermediate sequence, within the same domain or residue range, a relationship between the two superfamilies can be inferred. We refer to the detection of such a shared intermediate sequence by two superfamilies as cross detection.
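The cross-detection criterion can be expressed as a small filter over search hits. In this sketch each hit records the superfamily whose search found it, the accession of the hit sequence and the aligned residue range; two superfamilies are paired when they hit the same sequence over overlapping residue ranges. The `Hit` record, the 50% overlap threshold and all identifiers are illustrative assumptions, since how "the same domain or residue range" was measured is not specified.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Hit:
    """One database hit from a search seeded with a superfamily member."""
    superfamily: str
    target: str  # accession of the hit sequence
    start: int   # first aligned residue in the target (1-based)
    end: int     # last aligned residue in the target (inclusive)


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Shared residues as a fraction of the shorter aligned region."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return max(shared, 0) / shorter


def cross_detections(hits: Iterable[Hit], min_overlap: float = 0.5) -> Set[Tuple[str, str]]:
    """Pairs of superfamilies that hit the same target in overlapping regions."""
    by_target: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_target.setdefault(hit.target, []).append(hit)

    pairs: Set[Tuple[str, str]] = set()
    for target_hits in by_target.values():
        for a, b in combinations(target_hits, 2):
            if a.superfamily == b.superfamily:
                continue
            # Hits to different domains of one sequence do not overlap and are skipped.
            if overlap_fraction(a, b) >= min_overlap:
                first, second = sorted((a.superfamily, b.superfamily))
                pairs.add((first, second))
    return pairs


if __name__ == "__main__":
    toy_hits = [
        Hit("SF_1", "SEQ_X", 10, 120),
        Hit("SF_2", "SEQ_X", 30, 140),   # same region as SF_1: cross detection
        Hit("SF_3", "SEQ_X", 300, 420),  # different domain of SEQ_X: ignored
    ]
    print(cross_detections(toy_hits))  # {('SF_1', 'SF_2')}
```

Requiring overlapping residue ranges is what separates a genuine shared intermediate from two superfamilies that merely hit different domains of the same multidomain sequence.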
PSI-BLAST-based search methods
A PSI-BLAST analysis was performed for each β jelly roll superfamily, with every sequence in the superfamily used in turn to initiate a search. The maximum number of iterations was set to 20, and the E-value threshold for including sequences in the position-specific scoring matrix was the default (0.001). The PDB, SwissProt and TrEMBL databases were searched.
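As a hedged illustration of this search set-up, the sketch below builds one command line per query sequence and database for NCBI BLAST+ `psiblast`, a later successor to the PSI-BLAST program available when this work was done (not necessarily the version used here). The options `-num_iterations` and `-inclusion_ethresh` correspond to the 20-iteration limit and the 0.001 inclusion threshold described above; the inclusion default in current BLAST+ releases may differ, so it is set explicitly. The database names `pdb`, `swissprot` and `trembl` and the query file name are placeholders for locally formatted BLAST databases and inputs.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def psiblast_commands(
    query_files: Sequence[str],
    databases: Sequence[str],
    iterations: int = 20,
    inclusion_evalue: float = 0.001,
) -> List[List[str]]:
    """One psiblast invocation per (query, database) pair."""
    commands: List[List[str]] = []
    for query in query_files:
        for db in databases:
            out_file = f"{Path(query).stem}_vs_{db}.tsv"
            commands.append([
                "psiblast",
                "-query", query,
                "-db", db,
                "-num_iterations", str(iterations),
                "-inclusion_ethresh", str(inclusion_evalue),
                "-outfmt", "6",  # tabular output
                "-out", out_file,
            ])
    return commands


if __name__ == "__main__":
    # Placeholder file and database names; nothing is executed here.
    for cmd in psiblast_commands(["1cavB.fasta"], ["pdb", "swissprot", "trembl"]):
        print(shlex.join(cmd))
        # To run for real (requires BLAST+ installed):
        # subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems when they are eventually passed to `subprocess.run`.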
HMM-based search methods
Because structure is more highly conserved than sequence during evolution, the HMMs were built from structure-based alignments rather than sequence alignments. Structural alignments were produced with STAMP 4.2 (Russell and Barton, 1992) using its SCAN method, which was chosen because it works particularly well with very diverse structures. SCAN requires a seed domain with which to scan the other domains to be superimposed; for each superfamily we chose the domain with the highest average sequence identity (i.e. the sequence most similar to all the others). All 15 superfamilies containing two or more structures were aligned using STAMP. Two superfamilies, the viral coat and capsid and the Con A-like lectins/glucanases superfamilies, were too diverse to align with STAMP and so were split into four sub-groups, by constructing a cladogram and aligning the obvious clades. The structural alignments from STAMP were used to build HMMs with HMMER 2.1.1 (Durbin et al., 1998; Eddy, 1996, 1998), and the resulting HMMs were used to search the PDB, SwissProt and TrEMBL databases.
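The seed-selection rule for STAMP SCAN described above (choose the member with the highest average sequence identity to all the others) is simple to state in code. This sketch assumes pairwise percentage identities have already been computed and stored in a dictionary keyed by pairs of domain names; `choose_scan_seed` and the toy domains are hypothetical, and ties are resolved in favour of the first domain listed.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def choose_scan_seed(
    domains: Sequence[str],
    identity: Dict[Tuple[str, str], float],
) -> str:
    """Return the domain with the highest mean percentage identity to all others.

    `identity` may store each pair in either order; missing pairs count as 0.
    """
    if len(domains) < 2:
        raise ValueError("need at least two domains to choose a seed")

    def pid(a: str, b: str) -> float:
        return identity.get((a, b), identity.get((b, a), 0.0))

    def mean_identity(d: str) -> float:
        return mean(pid(d, other) for other in domains if other != d)

    # max() keeps the first domain when scores tie.
    return max(domains, key=mean_identity)


if __name__ == "__main__":
    toy = {("d1", "d2"): 30.0, ("d1", "d3"): 40.0, ("d2", "d3"): 10.0}
    print(choose_scan_seed(["d1", "d2", "d3"], toy))  # d1 (mean identity 35%)
```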
Results and discussion
Table I lists the β jelly roll superfamilies found in SCOP, CATH and TOPS, using the SCOP classification. Additional columns give the number of non-redundant structures, the average percentage sequence identity within each superfamily and the r.m.s.d. of the structures when aligned using STAMP. The legume lectin-like superfamilies studied by Chandra and co-workers are indicated. Table I shows that the jelly rolls are a very diverse set of proteins, with superfamilies that have a range of very different functions. Moreover, two-thirds of the superfamilies have an average sequence identity below 30%, within the "twilight zone" where current search methods begin to struggle to detect relationships, indicating substantial diversity even within superfamilies.
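Table I reports two per-superfamily statistics, average percentage sequence identity and r.m.s.d., and both are computed under several conventions in the literature. The sketch below uses one common choice for each, assumed rather than taken from this study: identity is counted over alignment columns where neither sequence has a gap, and r.m.s.d. is computed over equivalenced positions that have already been superimposed (for example by STAMP), so no fitting step is included. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues as a percentage of ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation of paired, already superimposed positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equally long, non-empty coordinate lists")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACNEFG"))  # 80.0 (4 of 5 ungapped columns)
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

A superfamily's average identity would then be the mean of `percent_identity` over all member pairs; whether gapped columns are counted changes the numbers, so the convention should be stated alongside any table of this kind.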
There were six apparent cross detections between superfamilies. Two were disregarded because of their extremely high E-values (above 1.0), and the remaining four were investigated. There appeared to be cross detections between the spermadhesin CUB domain, galactose-binding domain-like and Con A-like lectins/glucanases superfamilies. On closer inspection, however, it became clear that these superfamilies were detecting different domains within the same sequence. Analysis of the PSI-BLAST alignments confirmed that the superfamily sequences align to different regions of the cross-detected hit, so these cases are not evidence of an evolutionary relationship.
Analysis of cross detection between RmlC-like and phosphomannose isomerase superfamilies
Unlike the spermadhesin CUB domain, galactose-binding domain-like and Con A-like lectins/glucanases superfamilies, these two superfamilies are classified in the same fold group, which makes a true cross detection more plausible. Analysis of the PSI-BLAST results shows that both superfamilies detect the same domain in the cross-detected hit. Figure 1 shows the sequence alignments of 1cavB (RmlC-like superfamily) and 1pmi (phosphomannose isomerase superfamily) to the cross-detected hit (TrEMBL accession no. Q41674). The E-value for the RmlC-like superfamily detecting Q41674 is 7.8 × 10⁻¹⁰⁷, and for the phosphomannose isomerase superfamily it is 0.28. Although the phosphomannose isomerase E-value is not low enough to be absolutely conclusive, it is objective evidence of a possible distant evolutionary relationship.
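To put the two E-values in perspective: under the Karlin–Altschul statistics on which BLAST-family E-values rest, the number of chance hits scoring at least as well as an observed one is treated as Poisson-distributed with mean E, so the probability of at least one such hit is P = 1 - exp(-E). The short sketch below (the function name is hypothetical) uses `math.expm1` to keep precision when E is tiny.

```python
from math import expm1


def evalue_to_pvalue(evalue: float) -> float:
    """P = 1 - exp(-E): chance of at least one random hit scoring this well.

    Uses expm1 so that very small E-values are not rounded to zero.
    """
    if evalue < 0:
        raise ValueError("E-values are non-negative")
    return -expm1(-evalue)


if __name__ == "__main__":
    for label, e in [("RmlC-like -> Q41674", 7.8e-107),
                     ("phosphomannose isomerase -> Q41674", 0.28)]:
        print(f"{label}: E = {e:.2g}, P = {evalue_to_pvalue(e):.2g}")
```

For E = 0.28 this gives P ≈ 0.24, roughly a one-in-four chance that a hit this good arises by chance in a search of the same database size, whereas for E = 7.8 × 10⁻¹⁰⁷ P and E are practically identical. This is consistent with treating the phosphomannose isomerase hit as suggestive rather than conclusive.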
Conclusion
We have added to the work of Chandra and co-workers a detailed and sensitive sequence analysis of jelly-roll-containing folds and have investigated possible evolutionary relationships between them in detail. Since the seminal work of Copley and Bork, which revealed relationships between TIM barrel superfamilies, it has been important to ask whether similar relationships exist between superfamilies within other superfolds. For the jelly roll superfold this work has provided a negative answer, except in the case described above; nevertheless, some functional similarities between the families investigated by Chandra and co-workers suggest that evolutionary relationships may exist but remain undetectable at the sequence level by current methods. We expect the evolution of this fascinating superfold, and of others, to remain of interest for years to come.
References
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.
Bettler,E., Loris,R. and Imberty,A. (2001) 3D Lectin Data Bank (http://www.cermav.cnrs.fr/databank/lectine).
Chandra,N.R., Prabu,M.M., Suguna,K. and Vijayan,M. (2001) Protein Eng., 14, 857–866.
Copley,R.R. and Bork,P. (2000) J. Mol. Biol., 303, 627–640.
Durbin,R., Eddy,S.R., Krogh,A. and Mitchison,G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (http://hmmer.wustl.edu/).
Eddy,S.R. (1996) Curr. Opin. Struct. Biol., 6, 361–365.
Eddy,S.R. (1998) Bioinformatics, 14, 755–763.
Gibrat,J.-F., Madej,T. and Bryant,S.H. (1996) Curr. Opin. Struct. Biol., 6, 377–385.
Gilbert,D.R., Westhead,D.R., Nagano,N. and Thornton,J.M. (1999) Bioinformatics, 15, 317–326.
Holm,L. and Sander,C. (1995) Trends Biochem. Sci., 20, 478–480.
Holm,L. and Sander,C. (1996) Science, 273, 595–602.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Orengo,C.A., Jones,D.T. and Thornton,J.M. (1994) Nature, 372, 631–634.
Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.
Park,J., Teichmann,S.A., Hubbard,T. and Chothia,C. (1997) J. Mol. Biol., 273, 349–354.
Richardson,J.S. (1981) Adv. Protein Chem., 34, 167–339.
Russell,R.B. and Barton,G.J. (1992) Proteins: Struct. Funct. Genet., 14, 309–323.
Vijayan,M. and Chandra,N. (1999) Curr. Opin. Struct. Biol., 9, 707–714.
Westhead,D.R., Hatton,D.C. and Thornton,J.M. (1998) Trends Biochem. Sci., 23, 35–36.
Received March 5, 2002; revised July 12, 2002; accepted July 24, 2002.