A conformational analysis of Walker motif A [GXXXXGKT (S)] in nucleotide-binding and other proteins

C. Ramakrishnan1,2, V.S. Dani1 and T. Ramasarma3

1 Molecular Biophysics Unit and 3 Department of Biochemistry, Indian Institute of Science, Bangalore 560012, India


    Abstract
 
The sequence GXXXXGKT/S, popularly known as Walker motif A, is widely believed to be the site for binding nucleotides in many proteins. Examination of the crystal structures in the Protein Data Bank showed that about half of the examples having these sequences do not bind or use nucleotides. Data analyses showed 92 different Walker sequences of the variable quartet (XXXX). Ramachandran angles in this segment revealed conformational similarity in a group of 45 proteins known to bind or utilize nucleotides. The conformations of this segment in other proteins differ widely and it is not known whether they play any role in their functions. A flip of a peptide unit at different locations, with little change in the backbone conformation, was noted in nine pairs of these proteins having the same Walker sequence. An examination of the immediate neighborhood of the Walker sequence indicates that this region is preceded by a β-strand and followed by an α-helix, resulting in the motif β–W–α, an invariant feature amongst nucleotide-binding proteins.

Keywords: peptide flip/Ramachandran angles/β-turn/Walker motif


    Introduction
 
The motif GXXXXGKT (X, any residue) was first recognized in 1982 by Walker and colleagues as a common nucleotide-binding fold in the α- and β-subunits of F1-ATPase, myosin and other ATP-requiring enzymes (Walker et al., 1982). Since then, this sequence has been found in many proteins that bind nucleotides and has thereby gained predictive value for nucleotide-binding sites in proteins. Crystal structure data of such proteins (Berchtold et al., 1993; Abrahams et al., 1994; Chattopadhyay et al., 2000) indicated that this motif is present in the shape of a loop around nucleotides and utilizes its highly conserved lysine and threonine residues to bind to their phosphate oxygen atoms. This consensus sequence of GXXXXGKT (S), with serine substituting for threonine in some cases, is more popularly known as the Walker loop or P-loop (phosphate-binding loop).

In view of the growing interest in proteins containing a segment with the Walker sequence, the Brookhaven Protein Data Bank (Berman et al., 2000) was searched and 649 polypeptide chains were found to have such a sequence. Many of these proteins do not bind or use nucleotides in their reactions. Therefore, it appeared that the sequence of the variable quartet and the specific loop structure might have a role in nucleotide binding. To fill this gap in information, the backbone conformations of the peptide fragments GXXXXGKT (S) were examined using Ramachandran angles. The data analysis in this paper indicates that different foldings are possible for the Walker sequences and that only in the nucleotide-binding proteins do they have a distinctive loop structure.


    Materials and methods
 
The Ramachandran angles (φ, ψ) (Ramachandran et al., 1963; Ramachandran and Sasisekharan, 1968) were computed from the coordinates of atoms available in the Brookhaven Protein Data Bank (Berman et al., 2000). The segment structure similarity was obtained by evaluating the root mean square (r.m.s.) values of the Ramachandran angles. The RASMOL package (Sayle and Milner-White, 1995) was used to draw the figures.
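
As an illustration of how (φ, ψ) can be obtained from atomic coordinates, the following is a minimal sketch and not the program used in this study. It assumes that the backbone atom positions are already available as (x, y, z) tuples; the function names `dihedral` and `phi_psi` are hypothetical. The sign follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(c_prev, n, ca, c, n_next):
    """phi = C(i-1)-N(i)-CA(i)-C(i); psi = N(i)-CA(i)-C(i)-N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

Calling `phi_psi` for each of the eight residues of a segment gives the sixteen angles on which the comparisons in this paper are based.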


    Results of data analysis
 
Proteins containing Walker sequences

A search for the sequence GXXXXGKT (S) in the Protein Data Bank (April 2001 release) revealed 649 entries having this sequence, occurring in 395 protein structures with a resolution of 4 Å or better. Out of the 20^4 (160 000) sequence combinations possible for the variable region XXXX, only 92 were found to occur, of which 18 had only one entry. The present analysis is limited to these data.
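
The sequence search described above can be expressed as a simple pattern match. The sketch below is illustrative only and assumes one-letter amino acid sequences as input; the function name `find_walker_a` and the example sequence in the usage note are hypothetical. A lookahead is used so that overlapping occurrences are not missed.

```python
import re

# G, any four residues, G, K, then T or S; the lookahead permits overlaps.
WALKER_A = re.compile(r"(?=(G[A-Z]{4}GK[TS]))")

# Number of possible variable quartets: 20 residue types at 4 positions.
N_POSSIBLE_QUARTETS = 20 ** 4

def find_walker_a(sequence):
    """Return (1-based start, octapeptide, variable quartet) for each hit."""
    hits = []
    for match in WALKER_A.finditer(sequence):
        segment = match.group(1)
        hits.append((match.start() + 1, segment, segment[1:5]))
    return hits
```

For example, `find_walker_a("MKTAGGAGVGKTVLIQ")` reports the segment GGAGVGKT starting at residue 5, with GAGV as the variable quartet.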

The Ramachandran angles of Walker sequence

Groups having more than one entry were examined from the structural viewpoint. The mean and r.m.s. values of the Ramachandran angles φ and ψ were computed at the eight residues of the segment. Should the same sequence give the same structure, as is widely believed, the r.m.s. values for a group would be small. Using a liberal upper limit of 40°, dissimilar structures were found to be present in 10 of these groups, as revealed by the high r.m.s. values for some of the Ramachandran angles. Using similarity of the Ramachandran angles as the criterion, these were divided into further sub-groups. The various sequences and the location of the segment in the protein of the group thus obtained are given in Table I, along with the PDB code, chain identifier, resolution of the structure and r.m.s. for those groupings with more than one entry (the protein names are not included in Table I owing to the large number of examples; however, they are included in Table II, which gives the selected set). The sub-groups with the same sequence are indicated by suffixes A, B and C to the group number. It can be seen that the r.m.s. values are now reasonably small. Some sequences assume more than one conformation: two for six sequences (005 – GAGALGKT, 012 – GLRSDGKT, 016 – GLPAIGKT, 030 – GATGTGKT, 058 – GTAFEGKS and 077 – GLYRTGKS); three for three sequences (006 – GHVDHGKT, 033 – GPTGVGKT and 059 – GKGGIGKS); and four for one sequence (003 – GGAGVGKT). These data implied that highly localized conformational variants are possible in these segments while retaining overall structural similarity.
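
The r.m.s. test on a group of entries can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: the text does not state how the periodicity of the angles was handled, so the sketch uses a circular mean and wraps deviations into [-180°, 180°). The names `rms_about_mean` and `is_homogeneous` are hypothetical; only the 40° limit comes from the text.

```python
import math

def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def circular_mean(angles):
    """Mean direction of a list of angles in degrees (assumed convention)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))

def rms_about_mean(angles):
    """R.m.s. of the wrapped deviations from the circular mean."""
    mean = circular_mean(angles)
    return math.sqrt(sum(wrap(a - mean) ** 2 for a in angles) / len(angles))

def is_homogeneous(group, limit=40.0):
    """group: list of conformations, each a list of (phi, psi) tuples."""
    for pos in range(len(group[0])):
        for k in (0, 1):  # 0 = phi, 1 = psi
            values = [conf[pos][k] for conf in group]
            if rms_about_mean(values) > limit:
                return False
    return True
```

A group for which `is_homogeneous` returns `False` would be a candidate for division into sub-groups of the kind labelled A, B and C.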


 
Table I. Proteins containing the consensus sequence of GXXXXGKT(S): the location of the segment in the chain, PDB code and resolution of the crystal structure are given
 

 
Table II. The entries selected from Table I regrouped based on their structural similarity [the examples in set VII do not possess any structural similarity, as analyzed using the Ramachandran angles (φ, ψ)]
 
Conformational variants of segments of Walker sequences

The next step was the grouping of the conformations irrespective of the sequence of the variable region of the Walker segment. This was done as follows: (1) for those groups in Table I which had only one entry, the choice was unambiguous; (2) for those groups having more than one entry, the one with the best resolution (shown in bold face in Table I) was picked as the representative of the group/sub-group. These collectively gave 107 examples, which were regrouped solely on the basis of similarity of the Ramachandran angles (φ, ψ). Of the new sets thus obtained, 53 (out of a total of 107) entries constituted the major set. Another set had five entries, while seven others had two entries each. The last set comprised 35 entries, without any structural similarity among them. In any particular set, proteins with high overall sequence homology could be found, although the sequences of the variable region were different. These are as follows: (1AYL, 1OEN); (1A4R, 1MH1); (1DPF, 1TX4); (1CIP, 1AS0); (1AGP, 1CTQ, 1RVD, 421P, 821P); and the pairs (2CYP, 1CCG), [1MHY (D), 1MTY (D)] and [1DT0 (A), 1ISA (B)]. Since the structures in such cases are expected to be similar, only the entries that had the best resolution were retained. These were 1AYL, 1MH1, 1TX4, 1CIP, 1CTQ, 2CYP, 1MTY (D) and 1ISA (B). The final grouping thus obtained is given in Table II, which has 45 proteins in set I, five in set II, two each in sets III, IV, V and VI and the remaining 38 in set VII. The mean and r.m.s. (φ, ψ) values of sets I–VI are given in Table III and these are small enough to warrant structural similarity among the members. The r.m.s. has no relevance for the last set (set VII).
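
The text does not specify the algorithm by which the representatives were regrouped. Purely as an illustration of grouping by similarity of the Ramachandran angles, the sketch below assigns each entry to the first existing set whose founding member lies within a chosen r.m.s. angular difference; the greedy strategy, the 40° limit used here and the names `rms_difference` and `greedy_sets` are all assumptions.

```python
import math

def angular_diff(a, b):
    """Difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rms_difference(conf_a, conf_b):
    """R.m.s. angular difference between two flat lists of (phi, psi) values."""
    squares = [angular_diff(x, y) ** 2 for x, y in zip(conf_a, conf_b)]
    return math.sqrt(sum(squares) / len(squares))

def greedy_sets(entries, limit=40.0):
    """entries: dict mapping an entry name to its flat list of angles.

    Each entry joins the first set whose founder is within `limit`;
    otherwise it founds a new set. The result depends on input order.
    """
    sets = []
    for name, conf in entries.items():
        for members in sets:
            if rms_difference(conf, entries[members[0]]) <= limit:
                members.append(name)
                break
        else:
            sets.append([name])
    return sets
```

Entries left in sets of size one after such a pass would correspond to the kind of miscellaneous collection described for the last set.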


 
Table III. Mean and r.m.s. (φ, ψ) values (°) for the first six sets given in Table II
 
For easy comprehension of the structural grouping, line diagrams of the backbone of the GXXXXGKT (S) segments of the proteins in sets I–VI, drawn with the peptide unit spanning residues 5–(X) and 6–(G) as the common internal frame of reference, are shown in Figure 1. The sickle-like folding with overlap of the atoms of the backbone is seen with members of set I (Figure 1). Nearly the same structure of segment 5–8 is found in set II, but that of segment 1–4 is different (Table III). Set VII comprises structures of differing conformations, indicating the flexibility of Walker sequences to acquire random folding. Out of the 38 examples in this set, only those which have a resolution of 1.8 Å or better are shown in Figure 2.



 
Fig. 1. Wire-frame diagrams of the backbone atoms in the segment GXXXXGKT (S) in the proteins given in sets I–VI of Table II.

 


 
Fig. 2. Wire-frame diagrams of the backbone atoms in the segment GXXXXGKT (S) in the proteins belonging to set VII of Table II. Only those examples occurring in protein structures which have a resolution of 1.8 Å or better are shown.

 
Differing structures with same Walker sequence

Structural differences between segments having the same Walker sequence are also perceivable from the foregoing data. There are nine such examples in the present data set. The PDB codes along with the Ramachandran angles at the eight positions of the Walker sequence of these nine pairs are given in Table IV. Large differences in Ramachandran angles are observed at four different locations within the segment (shown in bold face in the table). These are as follows: (i) 1VOM–2MYS (A), (ii) 1F5N (A)–1DG3 (A), (iii) 1FP6 (A)–1G20 (E) and (iv) 1EFT–1EFU (A), all at locations 3/4; (v) 1MMO (D)–1MMO (E) at locations 4/5; (vi) 1ISA (A)–1ISA (B), (vii) 1D9X (A)–1D9Z (A) and (viii) 1BMF (D)–1BMF (E) at locations 5/6; and (ix) 1G3I (A)–1G3I (S) at locations 6/7. These large changes arise owing to a flip of the peptide unit spanning the two residues.
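
A flip of the peptide unit between residues i and i+1 shows up mainly as large, concerted changes in ψ at residue i and φ at residue i+1, with the remaining angles little affected. The sketch below locates such positions in a pair of conformations; it is an illustration, not the authors' procedure, and the 90° threshold and the name `find_flips` are assumptions.

```python
def angular_diff(a, b):
    """Difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def find_flips(conf_a, conf_b, threshold=90.0):
    """Return 1-based location pairs (i, i+1) with a candidate peptide flip.

    conf_a, conf_b: lists of (phi, psi) tuples for the same segment.
    A flip is flagged when both psi(i) and phi(i+1) differ by more than
    `threshold` degrees between the two conformations.
    """
    flips = []
    for i in range(len(conf_a) - 1):
        d_psi = abs(angular_diff(conf_a[i][1], conf_b[i][1]))
        d_phi = abs(angular_diff(conf_a[i + 1][0], conf_b[i + 1][0]))
        if d_psi > threshold and d_phi > threshold:
            flips.append((i + 1, i + 2))
    return flips
```

Applied to the eight (φ, ψ) pairs of two entries with the same Walker sequence, the function returns location pairs in the same 3/4, 4/5, 5/6 or 6/7 notation used above.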


 
Table IV. Ramachandran angles (φ, ψ) (°) in the segment GXXXXGKT for those examples with the same sequence for XXXX but with different conformations
 
There are more than two examples with differing conformations for two of the sequences. The first example consists of 1EFT, 1EFU (A) and 1ETU, which have the same sequence, GHVDHGKT (iv in Table IV), but the peptide unit between residues 3 and 4 in 1EFT and 1EFU (A) is flipped. The conformation of the third member of this group, 1ETU, does not match in its entirety with the other two examples. A close examination reveals that the conformations of 1EFT and 1ETU differ only in the segment 18–20. The second example is that of 1BMF (D, E), 1MAB (B) and 1E1R (E), which have the same sequence GGAGVGKT. The peptide unit between locations 5 and 6 in 1BMF (E) and 1BMF (D) is flipped, as is the one between locations 6 and 7 in 1BMF (D) and 1MAB (B) (viii in Table IV, shown as overlapping boxes). The fourth entry, 1E1R (E), has an altogether different conformation.

The last entry in Table IV corresponds to the sequence GPTGVGKT occurring in the two chains A and S of the protein 1G3I, whose conformations are different. In this case the peptide unit between locations 6 and 7 shows a rotation of ≈90° about the virtual Cα–Cα bond, instead of a flip, as is found in the other examples.

The ball and stick diagrams of these nine examples, with the flipped peptide unit shown within a box, are given in Figure 3. The overlap of the polypeptide backbones appears good. The examples of pairs i–vi correspond to the flip occurring at the middle peptide unit of the well-known 4→1 hydrogen-bonded β-turns of types I and II (Venkatachalam, 1968; Gunasekaran et al., 1998). However, the flip of the peptide unit observable in pairs vii–ix does not correspond to the β-turn flip, as the values of (φ, ψ) are far different from those characteristic of β-turn ranges. Further, the 4→1 hydrogen bond is also absent. Notwithstanding the flip, the same overall backbone structure is retained.





 
Fig. 3. Ball and stick diagrams of the backbone atoms in the segment GXXXXGKT (S) in the examples given in Table II. The flip of the peptide unit can be seen in the box, shown at the stated positions: 3 and 4 for i–iv; 4 and 5 for v; 5 and 6 for vi–viii; 6 and 7 for viii-a and ix. The hydrogen atom in this peptide unit, shown as a white ball, has been geometrically fixed.

 
The examples of nucleotide-binding proteins are arranged in Figure 3 with the nucleotide-bound forms on the left and the free forms on the right. These are as follows: myosin ATPase [1VOM–2MYS (A)], guanylate binding protein [1F5N (A)–1DG3 (A)], nitrogenase [1FP6 (A)–1G20 (E)], elongation factor Tu [1EFT–1EFU (A)], uvrB protein (A)–DNA helicase [1D9Z (A)–1D9X (A)], F1-ATPase [1BMF (D)–1BMF (E) and 1MAB (B)] and the HSLUV protease chaperone complex [1G3I (A)–1G3I (S)]. Wherever the nucleotide is bound, the N–H of the flipped peptide unit projects inwards of the loop. In the case of F1-ATPase (Abrahams et al., 1994), this N–H forms a hydrogen bond with P=O of the β-phosphate. It appears that the presence or absence of the nucleotide makes the difference between the two structural forms. The residues of the Walker sequence in such proteins not only bind to the nucleotide phosphates but also show consequent localized structural changes. This feature has important implications for the biochemical events that occur at this site.

In the case of proteins with oxygen-related reactions, the difference appears to be present in the polypeptides as isolated. The two proteins of methane monooxygenase, showing a flip at position 4/5, are derived from two organisms. The Walker sequence is present only in the Fe-form of superoxide dismutase and the two identical subunits of this enzyme protein exhibit this flip of a peptide unit. It is possible that the O=O group may act as the P=O in nucleotide phosphate in protein–substrate interactions. No relationship has so far been found between the peptide flips in Walker sequences and the activities of these proteins.

The secondary structures flanking the Walker sequence

The foregoing analysis indicated that the variable region is unlikely to determine the conformation of the Walker sequence A found in many nucleotide-binding proteins. The characteristic loop structure of the Walker sequence in these proteins is known to be preceded by a β-strand and followed by an α-helix (see, for example, Abrahams et al., 1994). It was therefore of interest to examine the occurrence of the flanking secondary structure of Walker sequences in the proteins listed in Tables I and II. For this purpose, segments of eight residues on either side of the Walker sequences were examined for the presence of secondary structures (α = α-helix; β = β-strand; X = neither α nor β; W = Walker sequence A). All nine possible combinations do occur and their distribution is given in Table V. The majority of the examples fall into the category of β–W–α. This structural motif is present in all cases in sets I, II and VI and in some in set VII of Table II. Interestingly, each of these proteins can bind nucleotides, leading in a large number of cases to hydrolysis of the terminal phosphate to provide energy for accompanying reactions (e.g. ATPases) and in some cases to transfer of the phosphate to acceptors (kinases). This is also true of the examples of proteins in the miscellaneous set VII. Hence it appears that the structural motif β–W–α, but not W alone, is the determining factor for nucleotide binding. The examples in sets III, IV and V of Table II, although small in number (only two each), show distinctive motifs of X–W–β, α–W–β and α–W–α, respectively.
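
The classification of flanking secondary structure can be sketched as below. The criterion used in this study for deciding that a flank "contains" a helix or strand is not stated, so the rule here is an assumption: a flank is labelled from a per-residue secondary-structure string in which `H` denotes helix and `E` denotes strand, and at least four of the eight residues must carry the label. The function names are hypothetical.

```python
from itertools import product

def flank_label(ss, min_count=4):
    """Label a flank as 'alpha', 'beta' or 'X' (assumed majority rule)."""
    helix, strand = ss.count("H"), ss.count("E")
    if helix >= min_count and helix > strand:
        return "alpha"
    if strand >= min_count and strand > helix:
        return "beta"
    return "X"

def classify_flanks(ss_string, start, motif_len=8, flank=8):
    """Classify the flanks of a Walker segment starting at 0-based `start`."""
    before = ss_string[max(0, start - flank):start]
    after = ss_string[start + motif_len:start + motif_len + flank]
    return flank_label(before) + "-W-" + flank_label(after)

# The nine possible categories (3 labels on each side of W).
ALL_CATEGORIES = [a + "-W-" + b
                  for a, b in product(("alpha", "beta", "X"), repeat=2)]
```

With a strand in the eight residues preceding the segment and a helix in the eight residues following it, `classify_flanks` returns `beta-W-alpha`, the category discussed above.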


 
Table V. Distribution of the different types of secondary structures flanking the Walker sequence GXXXXGKT (S) in the examples given in Tables I and II
 

    Discussion
 
The noteworthy observation in this study is that the Walker sequence is present in many proteins and is not limited to those that bind and/or use nucleotides in their actions. Because of the belief that it provides the loop for phosphate binding, the so-called P-loop, this sequence was looked for only in such proteins and was invariably found. A search in the PDB files for its general occurrence, undertaken in this study, revealed its broad distribution (Tables I and II). The diversity of these proteins is striking. They include peroxidases (of cytochrome, lignin), proteases (cathepsin, collagenase, serine protease), enzymes (methane monooxygenase, superoxide dismutase, α-amylase, glutamate dehydrogenase, carbonic anhydrase, Taq polymerase, etc.), binding proteins (lectin, trypsin inhibitor) and miscellaneous proteins (α-toxin, cyclophilin B, enterotoxin). An examination of the structures of cytochrome peroxidase and superoxide dismutase indicated that these sequences are present at some distance from the active metal centers. It remains to be ascertained whether the Walker sequences in these proteins are utilized in their actions or whether their presence is incidental. It is evident that the Walker sequence is widely distributed, whereas the presence of the P-loop seems to be restricted to the nucleotide-binding proteins.

The second observation in this study is the sharing of a common loop structure by the proteins of the major group, which use and bind nucleotide phosphates. These include kinases, phosphatases, ATPases, heat shock proteins, transfer/transport ATPases, permeases, the myosin motor domain and elongation factor. The variable quartet (XXXX) has little influence on the bend, as seen from the minor variation of overlap in this region (Figure 1). Indeed, the variable quartet is so highly random in sequence that it gives no clue to the looping. Among the residues of the quartet, G (13.3%), A (11.9%), S (9.8%), V (8.4%) and T (5.9%) occur more commonly than other amino acids, but no sequence can be identified with a set or a sub-set of proteins. Thus the formation of the β-turn loop seems to depend less on this sequence and more on the polypeptide chains on either side of the P-loop, characteristically a β-sheet at the N-terminus and an α-helix at the C-terminus. The absence of the classical 4→1 hydrogen bond in these loop structures appears to provide more room to surround and manipulate the phosphate chain of nucleotides for exchanging the terminal phosphate.
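
Residue frequencies of the kind quoted above can be tallied from a list of Walker segments as in the following sketch. It is illustrative only: the function name `quartet_composition` is hypothetical, and the percentages it returns depend entirely on the list supplied, so it should not be expected to reproduce the values above unless given the full data set.

```python
from collections import Counter

def quartet_composition(walker_segments):
    """Percentage of each residue type in the variable quartets.

    walker_segments: iterable of eight-residue strings (GXXXXGKT or
    GXXXXGKS); positions 2-5 form the variable quartet.
    """
    counts = Counter()
    for segment in walker_segments:
        counts.update(segment[1:5])
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}
```

For the single segment GGAGVGKT, for instance, the quartet GAGV gives 50% G, 25% A and 25% V.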

Finally, the minor, local differences in the structures with the same Walker sequence are, in our opinion, of importance as they offer possibilities of participation in the functions of these proteins. These relate to the flip of peptide units at four positions (3–4, 4–5, 5–6, 6–7 in Table IV) in these sequences. The large differences in Ramachandran angles bring these structural variants to light. Three kinds of examples are noted among the pairs that show these flips: the same enzyme protein from two different organisms (methane monooxygenase), the two subunits of a homodimeric protein (Fe-superoxide dismutase) and the binding of a nucleotide to one of the two subunits (F1-ATPase, β-subunit). The last example is a case with possible interaction of the substrate with the backbone structure of the enzyme active site and offers interesting mechanistic possibilities. Details of this have been reported elsewhere (Ramasarma and Ramakrishnan, 2002).


    Notes
 
2 To whom correspondence should be addressed. E-mail: ramki@crmbu2.mbu.iisc.ernet.in


    Acknowledgments
 
T.R. is a senior scientist of the Indian National Science Academy, New Delhi. C.R. and V.S.D. acknowledge financial assistance from the Council of Scientific and Industrial Research, New Delhi, India.


    References
 
Abrahams,J.P., Leslie,A.G., Lutter,R. and Walker,J.E. (1994) Nature, 370, 621–628.

Berchtold,H., Reshetnikova,L., Reiser,C.O., Schirmer,N.K., Sprinzl,M. and Hilgenfeld,R. (1993) Nature, 365, 126–132.

Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Chattopadhyay,D., Langsley,G., Carson,M., Recacha,R., DeLucas,L. and Smith,C. (2000) Acta Crystallogr., D56, 937–944.

Gunasekaran,K., Gomathi,L., Ramakrishnan,C., Chandrasekhar,J. and Balaram,P. (1998) J. Mol. Biol., 284, 1505–1516.

Ramachandran,G.N., Ramakrishnan,C. and Sasisekharan,V. (1963) J. Mol. Biol., 7, 95–99.

Ramachandran,G.N. and Sasisekharan,V. (1968) Adv. Protein Chem., 23, 283–437.

Ramasarma,T. and Ramakrishnan,C. (2002) Indian J. Biochem. Biophys., 39, 5–15.

Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 20, 374–376.

Venkatachalam,C.M. (1968) Biopolymers, 6, 1425–1436.

Walker,J.E., Saraste,M., Runswick,M.J. and Gay,N.J. (1982) EMBO J., 1, 945–951.

Received August 31, 2001;