Identification of conserved residue patterns in small β-barrel proteins

Rohini Qamra, Bhupesh Taneja¹ and Shekhar C. Mande²

Centre for DNA Fingerprinting and Diagnostics, ECIL Road, Nacharam, Hyderabad 500 076, India


    Abstract
 
Our ability to predict the three-dimensional conformation of a polypeptide from its amino acid sequence remains limited despite advances in structure analysis. Analysis of the structures and sequences of protein families with similar secondary structural elements but different topologies might help to address this problem. To understand the principles of barrel formation, we have studied the small β-barrel class of proteins, characterized by four strands (n = 4) and a shear number of 8 (S = 8). Multiple alignments of the protein sequences were generated for the analysis. Positional entropy, used as a measure of residue conservation, indicated that non-polar residues are conserved at the core positions. Another strikingly invariant feature was the presence of a type II β-turn in the barrel proteins considered. A conserved glycyl-aspartyl dipeptide at the β-turn appeared to be important in guiding the protein sequence into the barrel fold. Molecular dynamics simulations of a peptide containing the type II β-turn suggested that aspartate is a key residue in folding the protein sequence into the barrel. Our study suggests that the conserved type II β-turn and the non-polar residues in the barrel core are crucial for folding the protein’s primary sequence into the β-barrel conformation.

Keywords: β-barrel/molecular dynamics/protein folding/SH3/type II β-turn


    Introduction
 
The landmark work of Anfinsen indicated for the first time that the primary structure of a protein dictates its tertiary structure (Anfinsen, 1973). In a sequential protein folding model, the primary structure of a protein initially yields α-helices, β-sheets and turns, the predominant secondary structural elements in proteins. By arranging these simple elements in precise patterns, complex protein structures assemble to achieve the diversity of protein functions. A major goal in understanding how the amino acid sequence of a protein specifies its structure is to understand how these secondary structural elements are organized onto a tertiary scaffold. This requires learning how the properties of individual amino acids are exploited to guide an amino acid sequence into a particular fold. Much progress has been made in the last decade towards understanding the relationship between a protein’s sequence and structure, yet the protein folding problem remains a captivating puzzle.

Researchers have learnt several rules governing the formation of helices and turns. However, the principles behind β-sheet formation are much less well understood (Serrano, 2000). It is therefore especially intriguing to consider how β-sheet proteins, which have complex topologies and involve numerous contacts between residues distant in sequence, acquire their native structure. Several features of β-sheet proteins have been suggested to be important for efficient folding and stability. The overall pattern of hydrophobic and polar amino acids may be a dominant driving force in defining a protein’s topology (Eisenberg et al., 1984; Bowie et al., 1990; Kamtekar et al., 1993). Recognition between amino acid side chains on neighboring β-strands may guide the correct strand register and hence stabilize the resulting β-sheets (Merkel et al., 1999; Mandel-Gutfreund et al., 2001). Another possibility is the formation of turns at critical locations in the protein structure. Turns may be particularly important for anti-parallel sheet formation and hence for defining the protein topology. Supporting this hypothesis, recent studies indicate that residues in the distal loop of the SH3 domain are important for nucleation of protein folding (Martinez and Serrano, 1999; Riddle et al., 1999). Different combinations of these interactions are the most likely determinants of β-sheet topology and hence of protein stability.

In the course of evolution, the three-dimensional structures of proteins are conserved to a greater degree than the sequences that determine them. Residue substitutions that tend to destabilize a particular site would probably be compensated by other substitutions that confer greater stability on the structure. For example, if volume conservation were important to structure and function, a substitution that reduced the volume of the protein core might leave a destabilizing cavity in the core. In this case, it might become necessary to substitute another residue at a position distant in the sequence but near in space. This second substitution should then introduce a larger side chain in order to conserve the overall volume of the core and therefore the overall folded structure. Thus, if structural compensation is a general phenomenon, neighbouring sites in the three-dimensional structure will tend to evolve in a correlated fashion. In the past decade there has been a great deal of progress in developing methods that predict interactions in protein structures from correlated changes in sequence evolution (Altschuh et al., 1988; Shindyalov et al., 1994; Pollock and Taylor, 1997).

In this study, we have undertaken a comprehensive analysis of the sequence and structural variation seen in the small β-barrel proteins. A β-barrel is characterized by two geometric parameters: the number of β-strands in the barrel (n) and the number of β-bridge staggers across the β-sheet (the shear number, S) (Murzin et al., 1994). Within the all-β protein class of the Structural Classification of Proteins (SCOP) database there are five folds that can be grouped together as small β-barrels (Murzin et al., 1995). These barrels are characterized by four β-strands (n = 4) and a shear number of 8 (S = 8) (Murzin et al., 1994). Although the five folds have similar secondary structure composition, each has a distinct topology. The goal of this study was to identify conserved features across these β-barrel folds that may also be important in the initial steps of the folding pathway and in guiding the protein’s primary sequence into a β-barrel with a specific topology. In the work described here, we constructed and analyzed multiple sequence alignments for protein sequences in each of these five barrel folds. We also aligned the structures of the different proteins, within and across the folds. To identify structural features common to the barrel folds at both the sequence and the structural level, we studied conservation and covariation in the SH3-like barrel, GroES-like and PDZ domain-like folds. Molecular dynamics (MD) simulations of a GroES peptide derived from a conserved β-turn were also carried out to address its role as a possible nucleation site in the folding pathway. Combining sequence and structural analysis made it possible to interpret the pattern of conservation seen in the three protein folds.
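The link between n, S and barrel shape can be made concrete with the idealized barrel geometry usually associated with these two parameters: tan α = Sa/(nb) for the strand tilt α, and R = b/(2 sin(π/n) cos α) for the barrel radius R. The sketch below is illustrative only. The rise per residue along a strand (a ≈ 3.3 Å) and the interstrand distance (b ≈ 4.4 Å) are typical values assumed here, not measurements from this study, and `barrel_geometry` is a hypothetical helper.

```python
import math

# Typical beta-sheet constants (assumed values, in angstroms).
RISE_A = 3.3     # a: distance between adjacent residues along a strand
SPACING_B = 4.4  # b: distance between adjacent strands


def barrel_geometry(n: int, s: int, a: float = RISE_A, b: float = SPACING_B):
    """Return (strand tilt alpha in degrees, barrel radius R in angstroms)
    for an idealized barrel with n strands and shear number s."""
    alpha = math.atan((s * a) / (n * b))  # tan(alpha) = S*a / (n*b)
    radius = b / (2.0 * math.sin(math.pi / n) * math.cos(alpha))
    return math.degrees(alpha), radius


tilt, radius = barrel_geometry(n=4, s=8)
# For n = 4 and S = 8, tan(alpha) = 26.4 / 17.6 = 1.5.
print(f"tilt = {tilt:.1f} deg, R = {radius:.2f} A")
```

For n = 4 and S = 8 this gives a strand tilt of about 56° and a radius of about 5.6 Å, showing how a shear number twice the strand number translates into strongly tilted strands in a very narrow barrel.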


    Materials and methods
 
The SCOP database classifies all-β proteins into 93 folds according to their topology and evolutionary relationships (Murzin et al., 1995). Each fold is divided into superfamilies, which are further classified into families; a family groups proteins with residue identities of 30% or greater, or with similar structure and function. The five folds are the SH3-like barrel, GroES-like, PDZ domain-like, N-terminal domains of the minor coat protein g3p, and the Sm motif of the small nuclear ribonucleoproteins (SNRNP). A total of 42 different protein structures are grouped together in the SH3-like barrel fold. In the GroES-like and PDZ domain-like folds, nine and eight different structures have been solved, respectively. Only one structure has been solved for each of the remaining two folds, the N-terminal domains of the minor coat protein g3p and the Sm motif of SNRNP. A representative structure from each family within a fold was considered for the sequence and structure comparisons carried out in this study. Table I lists the initial target sequences used to start the analysis.


Table I. β-Barrel proteins considered for sequence and structure comparisons
 
Structure comparison

A total of 20 structures were considered for structural comparison (Table I). Of these, 13 belonged to the SH3-like barrel fold, three to the GroES-like fold and two to the PDZ domain-like fold. One structure was selected for each of the N-terminal domains of the minor coat protein g3p fold and the Sm motif of small nuclear ribonucleoproteins (SNRNP) fold. Coordinates for each of these proteins were retrieved from the PDB (Bernstein et al., 1977). Superpositions were carried out among structures within the same fold and also across the different β-barrel folds. For inter-fold superpositions, one representative from each fold was used. The representative was a protein from a family for which complete sequence analysis was possible, that is, a family meeting the criterion of at least 30 sequences in the multiple sequence alignment (described below). The structures chosen for the inter-fold comparison were therefore the α-spectrin SH3 domain protein, 1shg; the Escherichia coli GroES, 1aon; and the rat neuronal nitric oxide synthase, 1qav, from the SH3-like barrel, GroES-like and PDZ domain-like folds, respectively. The structures were first superposed by visual inspection and then by least-squares fitting using the lsq commands of O (Jones et al., 1991).
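The least-squares fitting step can be illustrated with the Kabsch (SVD-based) algorithm, which finds the rotation that minimizes the r.m.s. deviation between two sets of equivalent Cα coordinates. This is a swapped-in technique shown as a minimal NumPy sketch, not the lsq implementation in O. It assumes the residue equivalences are already fixed (in the study, by visual inspection), and the coordinates in the usage lines are invented.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """r.m.s. deviation after optimal least-squares superposition of two
    (N, 3) arrays of equivalent C-alpha coordinates (Kabsch algorithm)."""
    if p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # avoid a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotates p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Usage with invented coordinates: a rotated, translated copy gives RMSD ~ 0.
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
                   [4.1, 5.9, 2.7], [1.2, 6.3, 4.4]])
theta = np.radians(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = coords @ rz.T + np.array([10.0, -2.0, 5.0])
print(f"{kabsch_rmsd(coords, moved):.3f}")
```

Because the superposition is rigid, any non-zero r.m.s. deviation reflects genuine structural differences between the equivalent Cα positions, as reported in Tables II and IV.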

Residue conservation

To measure the level of conservation at each position in the alignment, the frequency of occurrence of each amino acid at each position was determined, and the positional entropy was calculated from these frequencies. A positional entropy of n is equivalent to a diversity of n residues, each occurring at the position with a frequency of 1/n. A completely conserved position will thus have a positional entropy of 1. For position i, with residues r = (A, C, D, ..., V, W, Y) occurring at frequencies p_i(r), the entropy H(i) is defined as

$$H(i) = -\sum_{r} p_i(r) \ln p_i(r)$$

This entropy is known as the Shannon informational entropy (Shenkin et al., 1991).

The positional entropy, N(i), is expressed as

$$N(i) = e^{H(i)}$$
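The Shannon entropy H(i) = −Σ p_i(r) ln p_i(r) and the positional entropy exp(H(i)) can be evaluated for a single alignment column as follows. This minimal sketch assumes natural logarithms (any base gives the same positional entropy if the matching exponential is used). As an assumption not stated in the text, it also ignores gap characters ('-') when computing frequencies; `positional_entropy` is a hypothetical helper.

```python
import math
from collections import Counter


def positional_entropy(column: str) -> tuple[float, float]:
    """Return (H(i), exp(H(i))) for one alignment column given as a string
    of one-letter residue codes, using natural logarithms."""
    residues = [r for r in column.upper() if r != "-"]  # gaps ignored (assumption)
    if not residues:
        raise ValueError("column contains no residues")
    total = len(residues)
    freqs = [count / total for count in Counter(residues).values()]
    h = sum(-p * math.log(p) for p in freqs)  # Shannon entropy H(i)
    return h, math.exp(h)  # positional entropy


print(positional_entropy("VVVVVV"))    # fully conserved column
print(positional_entropy("VVLLIIAA"))  # four residues, each at frequency 1/4
```

A fully conserved column gives a positional entropy of 1, and four equally frequent residues give 4. This is why a threshold such as <3 picks out positions effectively restricted to fewer than three residue types.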

Volume correlation

The correlation coefficient between pairs of residue positions in the alignment was calculated as a measure of covariation in side-chain volumes. The side-chain volumes were taken from Harpaz et al. (1994). The pairwise correlation coefficient r(x,y) between two residue positions was expressed as

$$r(x,y) = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

where r = correlation coefficient, n = number of sequences, x = side-chain volume at residue position i and y = side-chain volume at residue position j, and the sums run over the n sequences in the alignment.
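The pairwise volume correlation is a Pearson correlation coefficient computed over the n sequences, with x and y the side-chain volumes at positions i and j in the same sequence. The minimal sketch below uses the computational form n·Σxy − Σx·Σy over the square root of the variance terms. The volumes in the usage lines are hypothetical numbers chosen for illustration, not the Harpaz et al. (1994) values, and `volume_correlation` is a hypothetical helper.

```python
import math


def volume_correlation(x: list[float], y: list[float]) -> float:
    """Pearson correlation r(x, y) between side-chain volumes at two
    alignment positions; x[k] and y[k] come from the same sequence k."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 sequences")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if den == 0:
        return float("nan")  # a position with constant volume has no variance
    return num / den


# Hypothetical volumes (cubic angstroms) at two positions in four sequences:
pos_i = [160.0, 140.0, 160.0, 140.0]
pos_j = [60.0, 90.0, 60.0, 90.0]
print(volume_correlation(pos_i, pos_j))
```

In this invented example, a larger side chain at one position always pairs with a smaller one at the other, so r is −1. This is the compensatory pattern that strongly negative coefficients are taken to indicate.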

Sequence alignment

A total of 20 initial target sequences, corresponding to the representative protein in each family, were considered for the analysis (Table I). Each target sequence was used for a BLAST search (E < 0.001) of the non-redundant database (Altschul et al., 1997). Homologous sequences were retrieved, and the stretch of residues aligned to the initial target domain was extracted from each protein sequence. Two or more domains within a protein sequence were treated as separate sequences; thus, a sequence with two domains was split into two, each corresponding to one domain. These sequences were aligned using the ClustalW program (Thompson et al., 1994). Once the initial alignment was constructed, sequences with a pairwise ClustalW score of >90 were removed to avoid bias in the sequence analysis due to a high degree of similarity. To avoid artifactual results arising from inaccurate sequence alignments, sequences with a score of <25 were also removed. The remaining sequences were realigned such that, in the final alignment, no two sequences had a score of <25 or >90. Families for which fewer than 30 sequences remained in the alignment after this editing were not considered for further analysis.
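The editing step, which leaves no pair of sequences with a score below 25 or above 90, can be sketched as a greedy pruning loop. The text does not state the order in which sequences were removed, so "drop the sequence involved in the most violating pairs first" is an assumption. The pairwise `score` function is assumed symmetric, the realignment step is not modelled, and `prune_alignment`, `protect` and `MIN_SEQUENCES` are hypothetical names.

```python
from itertools import combinations


def prune_alignment(names, score, protect=None, low=25.0, high=90.0):
    """Greedily remove sequences until every remaining pair has
    low <= score(a, b) <= high. `score` is assumed symmetric; `protect`
    names a sequence (e.g. the BLAST query) that is never removed."""
    kept = list(names)
    while True:
        violations = {name: 0 for name in kept}
        for a, b in combinations(kept, 2):
            s = score(a, b)
            if s < low or s > high:
                violations[a] += 1
                violations[b] += 1
        candidates = [name for name in kept if name != protect]
        worst = max(candidates, key=violations.get, default=None)
        if worst is None or violations[worst] == 0:
            return kept
        kept.remove(worst)  # drop the sequence involved in most violations


MIN_SEQUENCES = 30  # families with fewer remaining sequences were excluded

# Usage with hypothetical pairwise scores for a query "t" and three hits.
SCORES = {frozenset(pair): s for pair, s in {
    ("t", "a"): 95.0, ("t", "b"): 50.0, ("t", "c"): 20.0,
    ("a", "b"): 48.0, ("a", "c"): 22.0, ("b", "c"): 30.0}.items()}

kept = prune_alignment(["t", "a", "b", "c"],
                       lambda x, y: SCORES[frozenset((x, y))], protect="t")
print(kept, "usable" if len(kept) >= MIN_SEQUENCES else "too few sequences")
```

Here "a" is removed for being too similar to the query and "c" for being too distant, which leaves a family far below the 30-sequence cut-off.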

Only five of the 20 families in the n = 4, S = 8 β-barrel folds fulfilled the criterion of at least 30 sequences in the multiple alignment. These were the SH3 domain and the C-terminal domain of ribosomal protein L2 families in the SH3-like barrel fold, the GroES and alcohol dehydrogenase-like N-terminal domain families in the GroES-like fold, and the PDZ-domain family in the PDZ domain-like fold. Sequence alignment data for these families were used for statistical analysis.

Molecular dynamics simulations

MD simulations of a small peptide from E.coli GroES were performed using the Discover module of the InsightII molecular modelling package (MSI/Biosys, San Diego, CA, 1997). The simulations used cubic periodic boundary conditions (box dimensions 25 × 25 × 25 Å), with the peptide solvated by water molecules. The effective water density in the solvation box was 0.96 g/cm³. All atoms were treated explicitly, and their interactions were computed using the CVFF force field. The time step was 1 fs. Each simulation began with 100 iterations of energy minimization of the peptide to relax local forces. MD simulations were then performed at 300 K for 500 ps. The starting structure was a seven-residue peptide in its conformation in the native protein, with the type II turn intact. Simulations were performed on the wild-type peptide and on two mutant peptides: in one, the aspartate was mutated to asparagine, and in the other, to alanine. The native-like side chain–main chain hydrogen bond was retained in the aspartate to asparagine mutant.
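The simulation set-up implies some simple arithmetic: how many water molecules fill the box at the stated density, and how many integration steps a 500 ps run at a 1 fs time step requires. This is a back-of-envelope sketch that assumes the box edge of 25 is in ångströms and ignores the volume occupied by the peptide, so the water count is an upper estimate. `water_molecules` is a hypothetical helper.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
WATER_MOLAR_MASS = 18.015  # g/mol
CM3_PER_A3 = 1e-24         # 1 cubic angstrom = 1e-24 cm^3


def water_molecules(box_edge_a: float, density_g_cm3: float) -> float:
    """Approximate number of water molecules filling a cubic box, ignoring
    the volume occupied by the solute (so this is an upper estimate)."""
    volume_cm3 = box_edge_a ** 3 * CM3_PER_A3
    mass_g = density_g_cm3 * volume_cm3
    return mass_g / WATER_MOLAR_MASS * AVOGADRO


TOTAL_PS = 500
TIMESTEP_FS = 1
n_steps = TOTAL_PS * 1000 // TIMESTEP_FS  # 1 ps = 1000 fs

n_water = water_molecules(25.0, 0.96)
print(f"~{n_water:.0f} water molecules, {n_steps} MD steps")
```

Under these assumptions the box holds roughly 500 water molecules, and each trajectory corresponds to 500,000 integration steps.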


    Results
 
The study involved comparison of sequence and structure data for the different four-stranded β-barrel folds. In terms of the number of strands forming a compact globular structure, these are the smallest barrels known. The folds differ essentially in the manner in which the four β-strands are connected, which gives each a unique topology (Figure 1). For each topological class, multiple sequence alignments were generated for the sequence analysis, and the structures of proteins within a β-barrel fold were superimposed on one another.



Fig. 1. Topologies of the β-barrel folds characterized by n = 4 and S = 8 according to the SCOP classification. The folds included in this category are (a) SH3-like barrel, (b) GroES-like, (c) PDZ domain-like, (d) N-terminal domains of the minor coat protein, g3p and (e) Sm motif of small nuclear ribonucleoproteins, SNRNP. The Sm motif of small nuclear ribonucleoproteins, SNRNP fold has the same topology as the SH3-like barrel. β-Strands are indicated by arrows. The boxes indicate the helices in the different folds.

 
Comparisons within the SH3-like barrel fold

Superpositions were carried out among the 13 structures in the SH3-like barrel fold (Table I). These structures superposed well on one another, with a maximum r.m.s. deviation of 2.24 Å for 29 Cα atoms between the DNA-binding domain of HIV-1 integrase, 1ex4 and the diphtheria toxin repressor, 2dtr (Table IIa). The minimum r.m.s. deviation, 1.23 Å for 38 Cα atoms, was seen between the α-spectrin SH3 domain protein, 1shg and the CcdB protein, 2vub.


Table II. Pair-wise r.m.s. deviations (Å) when superimposing structurally equivalent Cα positions within a β-barrel fold. The numbers in parentheses represent the equivalent residues considered during the superpositions. (a) SH3-like barrel fold
 
Of interest is the region at the type II β-turn of SH3-like barrel fold proteins. The turn, referred to as the diverging turn in the SH3 domain (Yi et al., 1998), is present in the loop connecting strands 1 and 2 of the SH3-like barrel fold. Intriguingly, the occurrence of the type II turn in the SH3-like barrel fold proteins seems to be related to the length of the loop preceding the diverging turn. Of the 13 SH3-like barrel fold structures studied (Table I), six contained the type II turn and seven did not. A common feature of the seven structures lacking the type II turn was a short loop between strands 1 and 2 of the barrel. A short intervening loop probably reduces the likelihood of the polypeptide chain deviating from the folded barrel structure. Proteins containing the type II turn included the α-spectrin SH3 domain, 1shg; photosystem I accessory protein, 1psf; diphtheria toxin repressor, 2dtr; nitrile hydratase β-chain, 2ahj; the ribosomal protein L24, 1ffk; and ferredoxin thioredoxin reductase, 1dj7. A stretch of >11 residues in the loop connecting strands 1 and 2 of the β-barrel necessitated the presence of the type II turn, as observed in five of these structures. In these structures, the turn appears to guide the polypeptide into the β-barrel, helping to form the folded barrel structure.

Multiple sequence alignments for each of the 13 proteins considered for structural comparisons were generated as described in Materials and methods. A BLAST search with the amino acid sequence of the α-spectrin SH3 domain (SH3 domain family) gave 219 hits with E < 0.001. Splitting of multi-domain sequences increased this number to 302. Exclusion of sequences with ClustalW scores of <25 or >90 drastically reduced the number of sequences in the final alignment, to 30. A BLAST search with the sequence of the C-terminal domain of ribosomal protein L2 (translation proteins – SH3-like domain family) gave an initial 132 hits with E < 0.001; editing these sequences as described above yielded a final total of 65 sequences homologous to the ribosomal protein. For all the remaining sequences subjected to BLAST search, fewer than 30 sequences remained after editing, so these alignments were not considered for further analysis, for the reasons given in Materials and methods.

The degree of conservation at each position in the multiple alignments was determined using the Shannon informational entropy (Shenkin et al., 1991). Tables IIIa and IIIb give the positional entropy values at the core residue positions for the SH3 domain and the C-terminal domain of ribosomal protein L2 families, respectively. Core residue positions for the representative proteins in each family were identified by calculating the percentage accessibility of side chains using the NACCESS program (Hubbard et al., 1991). Residues with side-chain accessibilities of <7% were considered part of the core. A total of nine core positions were identified in the α-spectrin SH3 domain. The positional entropies were <3 at all nine core positions in the SH3 domain protein (Table IIIa). In the C-terminal domain of ribosomal protein L2, 10 of the 12 core positions showed high residue conservation, as indicated by a positional entropy of <3 at these positions (Table IIIb). Tables IIIa and IIIb also show the prevalence of amino acid residues at the core positions for the two SH3-like barrel fold families, the SH3 domain and the C-terminal domain of ribosomal protein L2. As the data show, these core positions are predominantly occupied by valine, leucine or isoleucine in both families. The other residues in the barrel core are non-polar residues, including phenylalanine, methionine, alanine and glycine. These residues make the core of the barrel in this fold highly hydrophobic. High conservation of non-polar residues at the core positions suggests that a hydrophobic interior is important in maintaining the integrity of the fold.
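The two thresholds used here (side-chain accessibility <7% to define the core, positional entropy <3 to call a position highly conserved) combine into a simple selection. This is a minimal sketch that assumes accessibilities have already been parsed from NACCESS output into a dictionary (no parser is shown) and that each core residue has been mapped to its alignment column. The function name and the input values are hypothetical, not taken from Table III.

```python
def core_and_conserved(accessibility, entropy, max_access=7.0, max_entropy=3.0):
    """accessibility: {residue number: % side-chain accessibility}
    entropy: {residue number: positional entropy of its aligned column}
    Returns (core residues, core residues that are also highly conserved)."""
    core = sorted(pos for pos, acc in accessibility.items() if acc < max_access)
    conserved = [pos for pos in core if entropy[pos] < max_entropy]
    return core, conserved


# Hypothetical values for six residues (not taken from Table III).
access = {101: 2.1, 102: 45.0, 103: 0.0, 104: 5.5, 105: 3.2, 106: 12.0}
ent = {101: 2.4, 102: 6.1, 103: 1.8, 104: 3.4, 105: 2.2, 106: 1.5}
core, conserved = core_and_conserved(access, ent)
print("core:", core, "conserved:", conserved)
```

Note that a buried residue can still be variable (104 here), and a well-conserved residue need not be buried (106). The counts reported in the text refer only to the intersection of the two criteria.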


Table III. Position-specific statistics for occurrence and conservation of residues at the core positions in the five β-barrel families. The first column indicates residue position numbers corresponding to the sequence of (a) the α-spectrin SH3 domain protein, 1shg; (b) C-terminal domain of ribosomal protein L2, 1rl2; (c) the E.coli GroES protein, 1aon; (d) alcohol dehydrogenase-like, N-terminal domain protein, 3bto; and (e) the PDZ-domain protein, 1qav. Occurrence of the most prevalent residue is indicated in the last column.
 
A covariance analysis of residue volumes indicated significantly high correlations between residues at the core positions in the SH3-like barrel (Figure 2). A high negative correlation of –0.97 was seen between residue positions 23 and 44 (residue numbers corresponding to the α-spectrin SH3 domain). An increase in core volume due to a larger side chain at position 23 (mostly leucine or valine) is compensated by a reduction in side-chain volume at the correlated position 44 (mostly valine or glycine) (Table IIIa). Similarly, the volumes of residues 25 and 44 are negatively correlated (correlation coefficient –0.64). Since the side-chain volumes at positions 23 and 25 are both negatively correlated with that at position 44, the correlation coefficient between positions 23 and 25 was expected to be positive. Indeed, it shows a high positive value of 0.68. Another compensatory pair of residues is present at positions 44 and 53. Residue 44 lies in the third strand of the α-spectrin SH3 domain and residue 53 in the fourth strand. A negative correlation of –0.84 is seen between these core positions. Positive correlations of 0.75 between residues 23 and 53 and 0.42 between residues 25 and 53 complete the pattern of compensation of the overall core volume. This observation strongly supports the view that maintaining the total volume of the core is important for keeping the barrel structure intact.



Fig. 2. Stereo-view of the SH3 domain protein, 1shg, showing side chains of the correlated amino acid residues 23, 25, 44 and 53. Figures 2 and 3 were generated using Molscript (Kraulis, 1991).

 
Comparisons within the GroES-like fold

Three representative proteins in the GroES-like fold were considered for the comparative study (Table I). Superposition of proteins within the GroES-like fold was carried out as for the SH3-like barrel fold. The representative proteins superposed well with one another, with a minimum r.m.s. deviation of 1.35 Å for 51 atoms between the horse alcohol dehydrogenase, 3bto and the E.coli GroES, 1aon (Table IIb). Different proteins within the alcohol dehydrogenase-like, N-terminal domain family also superimposed very well on one another (data not shown). The comparisons revealed an overall conservation of the β-barrel core in the representative proteins.

A BLAST search yielded 170 hits for the E.coli GroES protein. Splitting of multi-domain sequences increased this number to 174. Further editing as described for the SH3-like barrel fold, however, reduced the number to 86. An initial 412 hits in a BLAST search for the alcohol dehydrogenase were reduced to 52 sequences in the final alignment after appropriate editing. In the case of the SacY protein, the final alignment contained fewer than 30 sequences. This protein and the corresponding family were therefore omitted from further sequence and structural comparisons.

Core residue positions were identified in the two GroES-like fold proteins, the E.coli GroES and the horse alcohol dehydrogenase. Positional entropy values for the corresponding families at these core positions are shown in Tables IIIc and IIId. Of the 11 core residue positions in the E.coli GroES, eight were highly conserved, with positional entropies of <3. Two of the three high-entropy positions, 84 and 86, were largely occupied by non-polar residues, the most predominant being leucine. The third variable core position, 73, was mostly occupied by threonine. Ten core positions were identified in the alcohol dehydrogenase. High residue conservation is seen at nine of the 10 core positions, indicated by a positional entropy of <3 (Table IIId). These positions were largely occupied by small hydrophobic amino acid residues. The predominant residue at position 152, the only high-entropy position in the alcohol dehydrogenase, was valine. Tables IIIc and IIId show the prevalence of non-polar amino acid residues, including valine, leucine and isoleucine, at the core positions of the GroES-like fold proteins. The presence of polar, uncharged residues at the core positions, however, is not uncommon. As reported earlier, valines at the core positions in these proteins are seen to be mutable into isoleucines but not into leucines (Table III) (Taneja and Mande, 1999). Unlike the SH3-like barrel fold proteins, proteins in the GroES-like fold did not show a high correlation between residue volumes in the barrel core.

Comparisons within the PDZ domain-like fold

The two representative structures of the PDZ domain-like fold, interleukin 16, 1il16 and the neuronal nitric oxide synthase, 1qav, superposed well on one another with an r.m.s. deviation of 1.90 Å for 77 atoms. Proteins within the PDZ domain family, when compared among themselves, also superposed well, with an overall conservation of the β-barrel (data not shown).

A BLAST search with the amino acid sequence of the neuronal nitric oxide synthase (representative of the PDZ-domain family) gave 230 hits. This initial number first rose to 386 owing to splitting of multi-domain sequences, but a final number of 35 sequences was obtained after editing. The number of sequences in the final alignment obtained from the protein interleukin 16 was <30. This protein and the corresponding family were thus not considered for further analysis.

A total of 16 core positions were identified in the neuronal nitric oxide synthase. Of these, 12 positions show high residue conservation, with a positional entropy of <3 at each of these positions (Table IIIe). These core positions are predominantly occupied by valine, leucine or isoleucine. Alanine seems to be the residue of choice at the remaining four positions. As in the GroES-like fold, the valines appear to be mutable to leucines rather than to isoleucines (Table IIIe). Core residue positions did not show a significant correlation among residues in the proteins considered in this fold.

For the representative proteins of the N-terminal domains of the minor coat protein g3p and the Sm motif of small nuclear ribonucleoproteins (SNRNP) families, fewer than 30 sequences remained in the final multiple sequence alignments. Because neither family met the criterion set for sequence analysis, the two families, and hence the corresponding folds, were excluded from the study.

Comparisons across the ß-barrel folds

One of the objectives of the study was to identify similarities and differences across the β-barrel folds. One representative structure from each of the three β-barrel folds, viz. the α-spectrin SH3 domain (SH3-like barrel fold), E.coli GroES (GroES-like fold) and the neuronal nitric oxide synthase (PDZ domain-like fold), was therefore considered for the comparisons. The topologies of the three representative structures differ from one another, as shown in Figure 1. A comparison of the SH3-like barrel and GroES-like topologies shows a 3₁₀ helix interrupting the fourth strand in both β-barrel folds. The first three strands of the barrel form a similar anti-parallel β-sheet in the two folds, yet the two have distinct topologies. The difference lies in the way the fourth β-strand hydrogen bonds with the other strands forming the barrel. In the SH3-like barrel fold, the fourth strand runs anti-parallel to the third β-strand and is followed by the 3₁₀ helix. This short helix juxtaposes the fourth strand with the first, completing the barrel. The 3₁₀ helix in the GroES-like fold, however, juxtaposes the fourth strand with the third to complete the barrel. Figure 1 also shows the topology of the PDZ domain-like fold. This fold contains two helices, one in the region connecting β-strands 2 and 3 and the other between the third and fourth strands.

The difference in topology among the three representative proteins makes it difficult to superpose the corresponding structures. The presence of a common β-barrel structural core, however, may allow comparison of the secondary structural elements forming the barrel in these proteins. Hence, ignoring the topology of the three folds, the β-strands of the representative proteins were superimposed on one another. The structural comparisons yielded two alternative ways in which the β-strands of the three proteins could be superimposed with minimal r.m.s. deviations. In one of the superpositions, the 3₁₀ helix of the α-spectrin domain superposes very well on that of the E.coli GroES (Figure 3a). In this superposition, residues of strands 1, 2 and 3 of the α-spectrin domain align with those in strands 3, 2 and 1 of the E.coli GroES, respectively. The r.m.s. deviations are shown in Table IVa. In the case of the PDZ domain-like fold, the structural alignment superposes strands 2, 3 and 4 of the neuronal nitric oxide synthase onto strands 1, 2 and 3 of the α-spectrin SH3 domain, respectively. In an alternative superposition, strands 4, 1 and 2 of the α-spectrin SH3 domain align with strands 1, 2 and 3 of the E.coli GroES, respectively. The r.m.s. deviations for this superposition are given in Table IVb. Interestingly, this alternative superposition superimposes a type II β-turn present in all three structures (Figure 3b).



Fig. 3. (A) Superposed structures of the representative proteins of the SH3-like barrel, GroES-like and PDZ domain-like folds. The α-spectrin SH3 domain protein (1shg) is shown in pink, the E.coli GroES (1aon) in blue and the neuronal nitric oxide synthase (1qav) in green. The superposed 3₁₀ helix is highlighted in dark blue. (B) Superposed structures of the representative proteins of the SH3-like barrel, GroES-like and PDZ domain-like folds, colored as in (A). The superposed type II β-turn is highlighted in dark blue.

 

Table IV. Pair-wise r.m.s. deviations (Å) when superimposing structurally equivalent Cα positions across the β-barrel folds. The numbers in parentheses represent the equivalent residues considered during the superpositions.
 
The β-turn, referred to as the diverging turn in the SH3 domain, occurs in the intervening loop connecting strands 1 and 2. The turn has previously been reported to play a role in protein folding (Riddle et al., 1999; Larson and Davidson, 2000). A similar β-turn is present just before the third strand in the PDZ domain-like fold. The φ/ψ values at this turn place it in the type II category (φi+1 = –63°, ψi+1 = 140°; φi+2 = 98.6°, ψi+2 = –8.6°). In the GroES-like fold, the turn (φi+1 = –58.8°, ψi+1 = 134.6°; φi+2 = 96.3°, ψi+2 = –104°) is present at the start of the third β-strand, following the dome loop. The presence of the type II turn in the GroES-like and PDZ domain-like folds suggests that the turn may play a role in these folds similar to that observed in the SH3 domain. We therefore considered this turn to be a crucial folding nucleus in the β-barrel folds treated in this study. Further analyses were carried out on the superposition in which the β-turns of all three representative structures superpose on one another (Figure 3b).
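Assigning a turn to the type II category amounts to comparing the φ/ψ angles at positions i+1 and i+2 with idealized values. This minimal sketch uses ideal type II angles of (−60°, 120°) and (80°, 0°). It also adopts a tolerance convention often used in turn classification: all four angles within ±30°, with a single angle allowed to deviate by up to ±40°. Both the ideal values and the tolerances are assumptions of the sketch rather than the criteria stated by the authors, and the function names are hypothetical.

```python
# Idealized type II beta-turn dihedrals (degrees): (phi, psi) at i+1 and i+2.
TYPE_II_IDEAL = ((-60.0, 120.0), (80.0, 0.0))


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def is_type_ii(phi1, psi1, phi2, psi2, tol=30.0, one_loose=40.0):
    """True if all four angles lie within `tol` degrees of the ideal values,
    allowing a single angle to deviate by up to `one_loose` degrees."""
    observed = (phi1, psi1, phi2, psi2)
    ideal = TYPE_II_IDEAL[0] + TYPE_II_IDEAL[1]
    diffs = [angle_diff(o, i) for o, i in zip(observed, ideal)]
    outliers = [d for d in diffs if d > tol]
    return not outliers or (len(outliers) == 1 and outliers[0] <= one_loose)


# PDZ-domain turn values quoted in the text.
print(is_type_ii(-63.0, 140.0, 98.6, -8.6))
```

Wrapping angle differences onto the circle keeps values such as −170° and 170° close together, which matters when comparing dihedral angles.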

Residue conservation

To assess the variability of amino acid residues across the three ß-barrel folds, we compared positional entropies at structurally aligned positions. When the representative structures of the SH3-like barrel, GroES-like and PDZ domain-like folds were superposed, three ß-strands of each structure superimposed well on one another (Figure 3B), and the sequences of the representative proteins were then aligned on the basis of this structural alignment. Residues 9–14, spanning the first ß-strand of the α-spectrin SH3 domain, aligned with residues 38–43 (strand 2) of E.coli GroES and with residues 106–111 (strand 2) of the neuronal nitric oxide synthase. Residues 25–32, spanning the diverging type II turn (26–29) of the α-spectrin SH3 domain, aligned with positions 59–66 of E.coli GroES and 123–130 of the neuronal nitric oxide synthase. Residues 28 and 29 (α-spectrin SH3 numbering) form the i + 2 and i + 3 positions of the type II turn. Residues 57–61, spanning strand 4 of the α-spectrin SH3 domain, aligned structurally with the first ß-strand of E.coli GroES (residues 11–15) and with residues 95–98 of the neuronal nitric oxide synthase. Figure 4 shows the positional entropies at these structurally aligned positions. Of the 19 aligned positions, eight are highly conserved in all three protein families, with positional entropies below 3 in each of the three sequence alignments. Remarkably, five of these eight positions form the core of the barrel in all three structures (Table V). The core positions are largely occupied by small non-polar residues, and their high conservation suggests that core residues are important for the formation and maintenance of the barrel structure.
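
The comparison above can be sketched in a few lines. The code assumes the Shannon form of positional entropy, S = –Σ p(a) log₂ p(a) over the residue types a observed in one alignment column, with gap characters ignored; the exact definition, logarithm base and gap treatment used in this study (Materials and methods) may differ. The threshold of 3 is the one quoted in the text, while the columns and position keys are hypothetical toy data, not the paper's alignments.

```python
import math
from collections import Counter


def positional_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residue distribution in one alignment column.

    Gap characters are ignored (an assumption).
    """
    residues = [r for r in column.upper() if r != gap]
    if not residues:
        raise ValueError("column contains only gaps")
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def conserved_in_all(columns_by_family, threshold=3.0):
    """Positions whose entropy is below threshold in every family's alignment.

    columns_by_family maps a family name to {aligned position: column string};
    positions are assumed to be already structurally aligned across families.
    """
    families = list(columns_by_family.values())
    shared = set.intersection(*(set(f) for f in families))
    return sorted(p for p in shared
                  if all(positional_entropy(f[p]) < threshold for f in families))


# Hypothetical toy columns (not the paper's data).
toy_columns = {
    "SH3": {1: "VVVVIV", 2: "GGGGGG"},
    "GroES": {1: "VIVVVV", 2: "GGGNGG"},
    "PDZ": {1: "ACDEFGHIKL", 2: "GGGGGA"},
}
print(conserved_in_all(toy_columns))  # [2]
```

With base-2 logarithms the maximum entropy for 20 amino acid types is log₂ 20 ≈ 4.32 bits, so a cut-off of 3 retains only columns dominated by a few residue types.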



 
Fig. 4. Positional entropies at the structurally aligned positions of the three ß-barrel folds: SH3-like barrel, PDZ domain-like and GroES-like. Position numbers correspond to those of the α-spectrin SH3 domain (1shg). Positions 9, 11, 25, 31 and 58 form the core in all three protein barrels. Positions 28 and 29 correspond to the i + 2 and i + 3 positions of the type II turn in the three-dimensional structure.

 

 
Table V. Comparison of positional entropies of the ß-barrel folds at the structurally aligned positions. Residue position numbers in the first column correspond to the α-spectrin SH3 domain protein (1shg).
 
Of the remaining three highly conserved positions, two correspond to residues of the type II ß-turn described above. The i + 2 and i + 3 positions of this turn (residues 28 and 29 in the α-spectrin SH3 domain) are highly conserved: the predominant residue at i + 2 is glycine and that at i + 3 is aspartate. High conservation at the corresponding positions of the GroES-like fold (residues 62 and 63 of E.coli GroES) has been reported earlier. Hydrogen bonding between the side-chain carboxylate of the aspartate and the main-chain amide of the first residue of the turn has been suggested to be important in juxtaposing the ß-strands of the barrel, thereby maintaining the barrel structure (Taneja and Mande, 1999). A similar side chain–main chain interaction has been suggested to stabilize the type II ß-turn in the SH3 domain (Larson and Davidson, 2000). The high degree of conservation at the i + 2 and i + 3 positions of the type II turn in all three barrel folds suggests that this turn is important for the formation or maintenance of the ß-barrel structure.
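
The side chain–main chain hydrogen bond described above is usually identified from atomic coordinates. A minimal sketch, assuming a simple distance-only criterion (N···O ≤ 3.5 Å, an assumed cut-off) between the backbone amide N of turn residue i and either carboxylate oxygen (OD1/OD2) of the aspartate at i + 3; angular criteria are omitted. The function name and the coordinates are made up for illustration and are not taken from 1aon or 1shg.

```python
import math

# Assumed maximum donor-acceptor (N...O) distance for a hydrogen bond, in Å.
HBOND_CUTOFF = 3.5


def asp_backbone_hbond(n_i, od1, od2, cutoff=HBOND_CUTOFF):
    """Check for an H-bond between the backbone amide N of turn residue i
    and either carboxylate oxygen (OD1/OD2) of Asp at i + 3.

    Distance criterion only; returns (is_bonded, shortest N...O distance).
    """
    d = min(math.dist(n_i, od1), math.dist(n_i, od2))
    return d <= cutoff, d


# Made-up coordinates in Å, for illustration only (not from any PDB entry).
n_i = (0.0, 0.0, 0.0)
od1 = (2.9, 0.0, 0.0)
od2 = (3.8, 1.1, 0.0)
bonded, dist = asp_backbone_hbond(n_i, od1, od2)
print(bonded, round(dist, 2))  # True 2.9
```

Taking the shorter of the two N···O distances reflects the fact that either carboxylate oxygen of aspartate can act as the acceptor.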

Molecular dynamics simulations

Since the side chain–main chain interaction in the type II ß-turn appears to be important for forming and maintaining the barrel, disrupting this interaction would be expected to destabilize the type II turn and, ultimately, the barrel structure. To investigate the stability of the type II ß-turn upon alteration of the aspartate, we performed MD simulations of a GroES peptide and its mutants. The peptide sequence was VKVGDIV, corresponding to residues 59–65 of E.coli GroES, and the starting conformation was that observed in the crystal structure of the protein. Additional simulations were carried out with the aspartate replaced by asparagine in one case and by alanine in the other. Any change in the conformation of the type II turn should be immediately evident from changes in its dihedral angles.
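
The three simulated peptides differ only at GroES residue 63, the aspartate at the i + 3 position of the turn (turn residues i to i + 3 are 60–63, since 62 and 63 are the i + 2 and i + 3 positions). The sketch below shows how the mutant sequences map onto the protein numbering; `mutate` is a hypothetical helper, and building the starting coordinates and running the simulations requires a modelling and MD package that is not shown.

```python
NATIVE = "VKVGDIV"   # E. coli GroES residues 59-65
FIRST_RESIDUE = 59   # GroES number of the first peptide residue


def mutate(seq, resnum, new_aa, first=FIRST_RESIDUE, expected=None):
    """Return seq with the residue at protein position resnum replaced by new_aa.

    If expected is given, check that the original residue matches it.
    """
    idx = resnum - first
    if not 0 <= idx < len(seq):
        raise IndexError(f"residue {resnum} is outside the peptide")
    if expected is not None and seq[idx] != expected:
        raise ValueError(f"residue {resnum} is {seq[idx]}, not {expected}")
    return seq[:idx] + new_aa + seq[idx + 1:]


peptides = {
    "native": NATIVE,
    "D63N": mutate(NATIVE, 63, "N", expected="D"),
    "D63A": mutate(NATIVE, 63, "A", expected="D"),
}
for name, seq in peptides.items():
    print(name, seq)
# native VKVGDIV
# D63N VKVGNIV
# D63A VKVGAIV
```

The `expected` check guards against off-by-one errors between protein numbering and peptide indices.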

We compared the φ and ψ values at the different residue positions of the ß-turn over the 500 ps simulations. The φ and ψ values of the i + 1 residue remain similar in all three peptides (data not shown). However, the φ/ψ values of the i + 2 residue, glycine, deviate markedly when the i + 3 residue is mutated from aspartate to asparagine or alanine. In the native peptide, φ and ψ at the i + 2 position fluctuate around +100° and –40°, respectively, whereas in the mutant peptides φ(i+2) shifts to about +150°. In both mutants, ψ(i+2) also drops from –2° to about –100°, after an initial sudden rise from –2° to +70°. A large deviation is also seen at the fourth residue position of the ß-turn in the mutant peptides: in the aspartate-to-asparagine mutant, φ(i+3) drops from about –60° to –141°, whereas it remains stable in the native peptide. The distance between the Cß atoms of residues i and i + 3 further supports the disruption of the type II turn upon mutation of its fourth residue (Figure 5). This distance is maintained at around 5.6 Å in the native peptide but increases to about 8 Å in the aspartate-to-asparagine mutant and to about 10 Å in the aspartate-to-alanine mutant. These results indicate the importance of the side chain–main chain hydrogen bond in maintaining the type II turn; even replacing aspartate by asparagine is sufficient to disrupt the turn conformation. The high conservation of the turn, and of the residues forming it, may thus be of evolutionary importance in maintaining the structure of the ß-barrel.
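
The two quantities monitored in these simulations, backbone dihedral angles and the Cß(i)–Cß(i + 3) distance, can be computed from the atomic coordinates of each trajectory frame. A minimal plain-Python sketch follows: `dihedral` uses a standard atan2 formulation with the IUPAC sign convention, and `circular_mean` averages angles without being misled by the –180°/+180° wrap-around. The helper names are illustrative, and reading coordinates from a trajectory file is assumed to happen elsewhere.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees (IUPAC sign convention).

    For residue k: phi = dihedral(C[k-1], N[k], CA[k], C[k]),
                   psi = dihedral(N[k], CA[k], C[k], N[k+1]).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    u1 = tuple(t / length for t in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, u1) * t for t in u1))
    w = _sub(b2, tuple(_dot(b2, u1) * t for t in u1))
    x = _dot(v, w)
    y = _dot(_cross(u1, v), w)
    return math.degrees(math.atan2(y, x))


def circular_mean(angles_deg):
    """Mean of angles in degrees, robust to the -180/+180 wrap-around."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def cb_distance(cb_i, cb_i3):
    """Cß(i)-Cß(i + 3) distance for one frame, in the units of the input (Å)."""
    return math.dist(cb_i, cb_i3)
```

Applied frame by frame (for example, ψ of Gly62 from the N and Cα of residue 62, the carbonyl C of 62 and the N of 63), these functions give time series comparable to those summarized in the text and in Figure 5.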



 
Fig. 5. Cß–Cß distance between residues i and i + 3 of the type II turn in the native and mutant E.coli GroES peptides during the molecular dynamics simulations. The native peptide, with aspartate at the i + 3 position, maintains a stable Cß–Cß distance, whereas the distance increases rapidly in the aspartate-to-alanine mutant. See text for details.

 

    Discussion
 
Within various protein families, such as serine proteases, cysteine proteases and globins, the three-dimensional structure is remarkably similar despite considerable variation in the amino acid sequences. To a certain extent, conserved residues or conservative changes account for this structural conservation. In addition, correlated pairs of residues play an important role in stabilizing protein structure. Identifying these conserved features, together with compensatory substitution patterns, improves our understanding of the features that may determine the three-dimensional structure of a protein.

In this study, we have attempted to identify folding determinants in small ß-barrel proteins. Conservation patterns across these ß-barrel folds reveal interesting similarities among the residues at the core of the barrels. Irrespective of topology, these proteins show high conservation of small non-polar residues at the core positions, which are predominantly occupied by valine, leucine and isoleucine. Interestingly, valines at the core positions are seen to be replaced by isoleucine but not by leucine, as reported earlier (Taneja and Mande, 1999). The higher frequency of valine–isoleucine substitutions has been attributed to the higher ß-sheet propensity of isoleucine and valine compared with leucine (Wilmot and Thornton, 1988). Branching of the side chain at the Cß atom in valine and isoleucine, but not in leucine, has previously been suggested as a possible reason for this substitution pattern (Taneja and Mande, 1999). The observed substitution pattern and the high conservation of non-polar side chains suggest that the overall hydrophobic pattern of the sequence may drive the polypeptide to collapse into the ß-barrel conformation.
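
The valine–isoleucine versus valine–leucine preference can be checked by tallying residue pairs within each core alignment column. A minimal sketch, assuming substitutions are approximated as unordered pairs of different residues within a column (each pair of sequences counted once, no phylogenetic weighting); this counting scheme is an illustrative assumption rather than the procedure used in the earlier analysis, and the columns are hypothetical.

```python
from collections import Counter
from itertools import combinations


def substitution_counts(columns, gap="-"):
    """Count unordered pairs of differing residues within each alignment column."""
    counts = Counter()
    for col in columns:
        residues = [r for r in col.upper() if r != gap]
        for a, b in combinations(residues, 2):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


core_columns = ["VVIVV", "IVVIV", "LLLIL"]  # hypothetical core columns
counts = substitution_counts(core_columns)
print("V<->I:", counts[frozenset("VI")], "V<->L:", counts[frozenset("VL")])
# V<->I: 10 V<->L: 0
```

Using `frozenset` keys makes V→I and I→V the same substitution, which matches the symmetric way the pattern is described in the text.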

Correlation analysis of the SH3-like barrel fold suggests that the total core volume is maintained within the SH3 domain family of proteins. Substitutions that increase or decrease the volume of the core are compensated by replacement of another residue that may be distant in sequence but is close in space to the mutated residue, so that the total core volume, and hence the overall folded structure, is conserved. Interestingly, an earlier analysis of 266 SH3 sequences found no evidence for correlated substitutions (Larson and Davidson, 2000). We suggest that our criterion of selecting sequences with identities between 25% and 90% generates a more accurate multiple alignment for robust statistical analysis, and that this accuracy is reflected in the detection of covarying substitutions.
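
The volume-compensation argument can be illustrated with a simple covariation test: map the residues at two core positions to side-chain volumes and compute the Pearson correlation across sequences, so that compensating substitutions give a negative correlation. This is a sketch under stated assumptions, not the correlation method used in this study (see Materials and methods). The residue volumes are approximate literature values for a subset of amino acids, used here as assumptions (the study's own volume scale may differ), and the columns are hypothetical.

```python
import math

# Approximate residue volumes in Å^3 (assumed values, subset of amino acids).
VOLUME = {"G": 60.1, "A": 88.6, "V": 140.0, "I": 166.7, "L": 166.7,
          "M": 162.9, "F": 189.9}


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # an invariant column carries no covariation signal
    return sxy / math.sqrt(sxx * syy)


def volume_covariation(col_a, col_b):
    """Correlation of residue volumes at two aligned positions.

    Sequences with a gap or a residue missing from VOLUME at either
    position are skipped.
    """
    pairs = [(VOLUME[a], VOLUME[b]) for a, b in zip(col_a, col_b)
             if a in VOLUME and b in VOLUME]
    if len(pairs) < 2:
        raise ValueError("need at least two usable sequences")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical columns: a larger residue at one position pairs with a smaller one at the other.
print(round(volume_covariation("VIVI", "IVIV"), 3))
```

A strongly negative value is consistent with compensation, but on its own it does not establish it; in practice such scores are compared against a background distribution from other column pairs.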

Of the common features, the presence of a type II ß-turn is the most intriguing, as this turn seems to be important in the formation of the ß-barrel. Earlier studies have reported the importance of this ß-turn in the SH3 domain (Riddle et al., 1999; Larson and Davidson, 2000). Its conservation, not only among proteins of one ß-barrel fold but also across the different ß-barrel folds considered in this study, suggests that this region might be an important nucleation site in the folding pathway of ß-barrel proteins (Riddle et al., 1999; Larson and Davidson, 2000). This nucleation appears to be guided by the residues within the turn, which show high conservation at the i + 2 and i + 3 positions: glycine at i + 2 guides the sequence into a turn, whereas aspartate at i + 3 forms a distinctive side chain–main chain interaction. Our simulations corroborate the conclusions of independent work carried out on an SH3 peptide (Krueger and Kollman, 2001), and further suggest that replacing the aspartate by asparagine or alanine destabilizes the ß-turn conformation. The glycyl-aspartyl dipeptide hence appears to be a major factor in maintaining the integrity of the barrel.

Our study shows interesting similarities among proteins of different ß-barrel folds. Despite large differences in sequence and function, the occurrence of a conserved glycyl-aspartyl dipeptide at a conserved type II turn suggests that the turn, and the residues forming it, are important in the formation of the ß-barrel. In addition, the conserved hydrophobic core suggests a role in maintaining the barrel structure. Further studies, such as site-directed mutagenesis, would help to confirm the importance of these conserved features in the formation and maintenance of protein structure.


    Notes
 
1 Present address: Northwestern University, Chicago, IL, USA

2 To whom correspondence should be addressed. E-mail: shekhar@cdfd.org.in


    Acknowledgments
 
We thank Debasis Mohanty and Sharmila Mande for useful comments and suggestions. Coordinates for Thermoplasma acidophilum glucose dehydrogenase were kindly supplied by Garry Taylor. B.T. and R.Q. are CSIR Senior and Junior Research Fellows, respectively. Financial support for the work was provided by the Department of Biotechnology and by the Council of Scientific and Industrial Research.


    References
 
Altschuh,D., Vernet,T., Berti,P., Moras,D. and Nagai,K. (1988) Protein Eng., 2, 193–199.

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Anfinsen,C.B. (1973) Science, 181, 223–230.

Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Bowie,J.U., Reidhaar-Olson,J.F., Lim,W.A. and Sauer,R.T. (1990) Science, 247, 1306–1310.

Eisenberg,D., Schwarz,E., Komaromy,M. and Wall,R. (1984) J. Mol. Biol., 179, 125–142.

Harpaz,Y., Gerstein,M. and Chothia,C. (1994) Structure, 2, 641–649.

Hubbard,S.J., Campbell,S.F. and Thornton,J.M. (1991) J. Mol. Biol., 220, 507–530.

Jones,T.A., Zou,J.Y., Cowan,S.W. and Kjeldgaard,M. (1991) Acta Crystallogr., A47, 110–119.

Kamtekar,S., Schiffer,J.M., Xiong,H., Babik,J.M. and Hecht,M.H. (1993) Science, 262, 1680–1685.

Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.

Krueger,B.P. and Kollman,P.A. (2001) Proteins: Struct. Funct. Genet., 45, 4–15.

Larson,S.M. and Davidson,A.R. (2000) Protein Sci., 9, 2170–2180.

Mandel-Gutfreund,Y., Zaremba,S.M. and Gregoret,L.M. (2001) J. Mol. Biol., 305, 1145–1159.

Martinez,J.C. and Serrano,L. (1999) Nature Struct. Biol., 6, 1010–1016.

Merkel,J.S., Sturtevant,J.M. and Regan,L. (1999) Structure, 7, 1333–1343.

Murzin,A.G., Lesk,A.M. and Chothia,C. (1994) J. Mol. Biol., 236, 1382–1400.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Pollock,D.D. and Taylor,W.R. (1997) Protein Eng., 10, 647–657.

Riddle,D.S., Grantcharova,V.P., Santiago,J.V., Alm,E., Ruczinski,I. and Baker,D. (1999) Nature Struct. Biol., 6, 1016–1024.

Serrano,L. (2000) Adv. Protein Chem., 53, 49–85.

Shenkin,P.S., Erman,B. and Mastrandrea,L.D. (1991) Proteins, 11, 297–313.

Shindyalov,I.N., Kolchanov,N.A. and Sander,C. (1994) Protein Eng., 7, 349–358.

Taneja,B. and Mande,S.C. (1999) Protein Eng., 12, 815–818.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.

Wilmot,C.M. and Thornton,J.M. (1988) J. Mol. Biol., 203, 221–232.

Yi,Q., Bystroff,C., Rajagopal,P., Klevit,R.E. and Baker,D. (1998) J. Mol. Biol., 283, 293–300.

Received April 26, 2002; revised October 3, 2002; accepted October 10, 2002.




