Division of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
Abstract |
---|
Keywords: convergent evolution/phosphate-binding site/structural motif/three-dimensional structure comparison
Introduction |
---|
Comparative structural studies on mononucleotide-binding proteins have identified various sequence motifs (Traut, 1994) and a variety of protein folds (Schulz, 1992). Recently, structural analyses of atomic interactions between the base part of ATP or GTP and proteins revealed the structural variety of purine base recognition (Kobayashi and Go, 1997) and showed that proteins with totally different folds may adopt similar recognition schemes (Kobayashi and Go, 1996). These studies indicate that sequence or protein fold information alone is not sufficient to account for the variations in binding schemes; rather, comparison at the atomic structure level is required.
Keeping this view in mind, we have extended the previous study (Kobayashi and Go, 1997) and analyzed the phosphate-binding sites taken from 491 coordinate sets in the Protein Data Bank (October 1997 release), using an all-against-all comparison of the local atomic environments around the recognition site.
Materials and methods |
---|
Since the number and the species of corresponding atoms in a common local environment are different, a cluster analysis, based on a single similarity scale, is not suitable for the purpose of classifying the common local environments. Instead, we represent the results in a correlation map shown in Figure 1a. This is a 491 × 491 symmetric matrix, where a dot indicates a pair of binding sites, within the 491 structures, that have similar configurations of protein atoms interacting with a phosphate group. The 491 structures are aligned on the axis so that the dots (binding site pairs) pack as tightly as possible on the diagonal line. This procedure places structures sharing similar local environments close to each other on the axis and clusters of dots are formed on the diagonal line as in Figure 1a.
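The ordering step can be illustrated with a small sketch. The procedure actually used to align the structures on the axis is not given in this section, so the code below swaps in a simple breadth-first (Cuthill-McKee style) heuristic as a stand-in; the function names `diagonal_order` and `bandwidth` and the toy pair list are hypothetical.

```python
from collections import deque


def diagonal_order(n, similar_pairs):
    """Return an ordering of n binding sites that keeps similar pairs close.

    Assumption: a breadth-first traversal is used as a stand-in heuristic;
    it is not the ordering procedure of the original study.
    """
    neighbours = {i: set() for i in range(n)}
    for i, j in similar_pairs:
        neighbours[i].add(j)
        neighbours[j].add(i)

    def by_degree(site):
        return (len(neighbours[site]), site)

    order, seen = [], set()
    # Visit each connected group of sites, starting from low-degree sites.
    for start in sorted(range(n), key=by_degree):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            site = queue.popleft()
            order.append(site)
            for nxt in sorted(neighbours[site] - seen, key=by_degree):
                seen.add(nxt)
                queue.append(nxt)
    return order


def bandwidth(order, similar_pairs):
    """Largest distance from the diagonal of any dot under the given order."""
    position = {site: k for k, site in enumerate(order)}
    return max((abs(position[i] - position[j]) for i, j in similar_pairs),
               default=0)


# Toy data: seven sites, five similar pairs (hypothetical values).
pairs = [(0, 3), (3, 5), (0, 5), (1, 4), (2, 6)]
order = diagonal_order(7, pairs)
print(order, bandwidth(order, pairs))
```

In this toy case the reordering brings every dot closer to the diagonal than the original numbering does, which is the property the correlation map relies on to make clusters visible.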
Results and discussion |
---|
The results of the all-against-all comparison among the 491 mononucleotide-binding sites are summarized in the 491 × 491 correlation map in Figure 1a. Dots are colored according to the level of similarity of the pair of binding sites: red dots are for highly similar pairs (Nc ≥ 30) and green dots are for less similar ones (23 ≤ Nc ≤ 29). This lowest bound, 23, was chosen so that most of the common local environments defined between two superfamilies have a unique set of corresponding atoms. Below this threshold level, the common local environments for two superfamilies tend to have variations in the corresponding atoms. An example of the higher level similarity denoted by a red dot is found between entries (1) and (2) in Figure 1b, where atoms in a similar configuration exist in main or side chains indicated by red or green traces. Metal cations, indicated by gray spheres, also occupy similar positions. Such similarity is found in square-shaped clusters of red dots in Figure 1a. A detailed analysis of the clusters showed that each of these clusters corresponds to a superfamily defined in the SCOP database (Murzin et al., 1995). In other words, each superfamily appears to assume a different phosphate-binding scheme.
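The coloring rule for the correlation map can be written down as a short sketch. The two thresholds are taken from the text; the function name `dot_colour` and the constant names are hypothetical, and Nc is treated simply as an integer score for a pair of binding sites.

```python
RED_MIN = 30    # lower bound for highly similar pairs (from the text)
GREEN_MIN = 23  # lowest bound for a dot to be drawn at all (from the text)


def dot_colour(nc):
    """Return the colour of the dot for a pair with score nc, or None."""
    if nc >= RED_MIN:
        return "red"
    if nc >= GREEN_MIN:
        return "green"
    return None  # below the lowest bound: no dot is drawn


for nc in (32, 28, 22):
    print(nc, dot_colour(nc))
```

Under this rule a pair with Nc = 32 is drawn as a red dot, a pair with Nc = 28 as a green dot, and a pair with Nc = 22 is not drawn.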
On the other hand, a lower level of similarity, denoted by green off-diagonal dots, is found in a more limited portion of the binding site, such as between entries (1) and (3) in Figure 1b, where the atoms with green traces assume similar configurations. We call such a fragment, having an inter-cluster or inter-superfamily similarity, a structural motif.
Many structural motifs, represented by green off-diagonal dots, are found in a square enclosed by the solid black line containing 13 superfamilies. The structural motifs within this square have a common spatial arrangement of the backbone and phosphate atoms with some variations in their side chain structures. We call this common structure, characterizing the phosphate-binding sites of the 13 superfamilies, a structural P-loop. The other three sets of structural motifs enclosed by circles are shared by pairs of superfamilies: (1) protein kinase and glutathione synthetase ATP-binding domain like (Figure 3a); (2) nucleotidyltransferase and actin-like ATPase domain (Figure 3b); and (3) FMN-linked oxidoreductase and PRTase catalytic (C) site (Figure 3c). The other green dots in Figure 1a, those above circle 3 and on the left of circle 2, are found in only a limited number of the superfamilies. These do not represent any structural motif, i.e. a unique set of corresponding atoms for a pair of superfamilies.
The common structure shared by 13 superfamilies, the structural P-loop, contains a four-residue backbone fragment whose conformation is illustrated in Figure 1b. The fragment forms hydrogen bonds between its backbone atoms and a phosphate group. The structural alignments, shown schematically in Figure 2, indicate that the first residues of the structural P-loops are always glycine and the other three residues are not conserved at all. Since side chain atoms do not participate in binding of the common phosphate, it is reasonable that the amino acid sequences are not conserved. The importance of the main chain atoms in nucleotide binding has already been pointed out by Swindells (1993) in loop search studies of nucleotide-binding sites in doubly wound α/β proteins. However, the reason for the strict conservation of the glycine residue is not clear; the glycine (φ, ψ) angles are scattered all over the Ramachandran (φ, ψ) map and their supposed Cβ positions are not necessarily occupied by one of the phosphate atoms. Figure 2 also shows that the structural P-loop is mostly on a loop connecting a β-strand to an α-helix, with some exceptions. This observation is consistent with the tendency that a negatively charged group frequently binds to the N-terminus of an α-helix (Hol et al., 1978).
Protein kinase versus glutathione synthetase ATP-binding domain like
Figure 3a shows a representative case of a common structural motif found in two superfamilies: protein kinase and glutathione synthetase ATP-binding domain like.
The divalent cations are coordinated by the two phosphate groups of ATP and Asp184 in cAPK. In DD-ligase, the divalent cations are coordinated by a phosphate group of ADP, Glu270 and a phosphino-phosphate 3 in PHY (an inhibitor). In addition, the phosphate groups are hydrogen-bonded with the backbone amides of Ser150 in cAPK and Ser53 in DD-ligase. The r.m.s.d. value is 0.85 Å and Nc = 32.
In these proteins, the negative charges on the phosphate groups are neutralized by, among others, two divalent cations and an ε-NH3+ group of lysine, Lys72 in cAPK and Lys97 in DD-ligase. The lysine side chains in the two proteins have totally different orientations, while the ε-NH3+ groups occupy very similar positions, as illustrated in Figure 3a. It should be noted that these lysine residues do not have similar positions along the sequence.
Although the local environments around the phosphate groups in cAPK and DD-ligase are strikingly similar to each other, the atomic configurations, other than the common phosphate-binding site, are totally different. This structural difference may reflect the difference in the target molecule to which a γ-phosphate is transferred, i.e. phosphorylase kinase in cAPK and a D-alanine in DD-ligase.
Nucleotidyltransferase versus actin-like ATPase domain
An example of the structural motif found both in nucleotidyltransferase and actin-like ATPase domain is shown in Figure 3b. The r.m.s.d. value after superposition is 0.92 Å and Nc = 28.
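For reference, the r.m.s.d. quoted for each motif can be computed as in the following minimal sketch. It assumes that the two sets of corresponding atoms have already been superposed and are listed in matching order; the superposition step itself is not shown, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) tuples.

    Assumption: atom k in coords_a corresponds to atom k in coords_b and
    both sets are already in the same (superposed) coordinate frame.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å (hypothetical): every atom is displaced by 1 Å along z.
site_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
site_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(site_a, site_b))
```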
This structural motif consists of three components which are not consecutive along the sequence, i.e. two backbone fragments (179A–181A, 187A–189A in D pol B and 12–14, 201–203 in HSP70) and an aspartic acid (D192 in D pol B and D10 in HSP70) coordinating a divalent cation (Mn2+ for D pol B and Mg2+ for HSP70). The two backbone fragments are located in one case on a loop and in the other on two loops forming hydrogen bonds to the phosphate group through backbone atoms. The divalent cation in D pol B interacts with α, β and γ phosphate groups in thymidine-5'-triphosphate (TTP), while the cation in HSP70 interacts with phosphoric acid and β phosphate group in ATP. The positions of the sugar moieties in mononucleotides are totally different from each other. Such a situation is also observed in all the other structural motifs found in our study.
FMN-linked oxidoreductase versus PRTase C site
The structural motif shown in Figure 3c is found in the two superfamilies, FMN-linked oxidoreductase and PRTase C site. The r.m.s.d. value is 0.57 Å and Nc = 28.
The key features of similarity are two backbone fragments (325–326 and 345–348 in OYE and 353–354 and 347–350 in PRTase), one of which contains an arginine residue at the end (R348 in OYE and R350 in PRTase) making hydrogen bonds to the phosphate groups. In addition to the arginine residue, the backbone atoms also interact with the phosphate group. It should be noted that the two corresponding loops are in the opposite order along the sequence.
Structural comparison versus sequence comparison
The structural comparisons have revealed striking similarities in the phosphate-binding sites beyond the level of superfamily. Sequence comparison alone cannot detect such similarities. The main chain atoms in the structural motifs are found to be mostly responsible for the phosphate binding, which explains the large variation in the local sequences (Figures 1 and 3). Furthermore, Figure 3 shows that the corresponding atoms in the structural motifs are not necessarily aligned along the sequence, and sometimes are even in the opposite order. All of these structural motifs found to be common among different superfamilies and folds are likely to be the result of convergent evolution.
Comparison of the structural motifs with all PDB structures
As with sequence motifs, it should be possible to identify phosphate-binding sites by searching the Protein Data Bank with a query on the structural motifs. To examine this possibility, we compared 651 structures in the PDB-select dataset (25% list; October 1997 release) with the structural motifs, considering only the protein atoms. The result of the comparison was assessed by a Z-score, a normalized difference between two average values of Nc, one for proteins containing the structural motif and the other for all proteins. The normalization was done in terms of the standard deviation of Nc for all proteins.
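The Z-score defined above can be sketched as follows. The function name `motif_z_score` and the toy Nc values are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev


def motif_z_score(nc_with_motif, nc_all):
    """Z-score of Nc for motif-containing proteins against all proteins.

    Z = (mean Nc of proteins containing the motif - mean Nc of all proteins)
        / standard deviation of Nc over all proteins.
    Assumption: population standard deviation (pstdev) is used.
    """
    spread = pstdev(nc_all)
    if spread == 0:
        raise ValueError("Nc has zero spread; the Z-score is undefined")
    return (mean(nc_with_motif) - mean(nc_all)) / spread


# Toy Nc values (hypothetical): mean 5 and standard deviation 2 overall.
nc_all = [2, 4, 4, 4, 5, 5, 7, 9]
nc_with_motif = [7, 9]
z = motif_z_score(nc_with_motif, nc_all)
print(z)
```

A large Z-score means that proteins containing the motif score well above the background distribution, so the motif can serve as a selective query; a Z-score near zero means that the motif alone does not separate the two distributions.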
The Z-scores were calculated to be 0.19 for the structural P-loop (the backbone structure in Figure 1b; 16 protein atoms), 8.4 for protein kinase and glutathione synthetase ATP-binding domain like (Figure 3a; 31 protein atoms), 7.6 for nucleotidyltransferase and actin-like ATPase domain (Figure 3b; 38 protein atoms) and 3.9 for FMN-linked oxidoreductase and PRTase catalytic site (Figure 3c; 25 protein atoms). Each distribution for proteins containing one of the structural motifs, except the structural P-loop, is well separated from the distribution for all other proteins. The structural P-loop is only one of the necessary conditions of phosphate-binding, but not a sufficient condition. When non-protein atoms are ignored, the main chain trace of a ß-turn of four residues becomes widespread, giving rise to the observed small Z-score for the structural P-loop. Therefore, to recognize the structural P-loop properly, it should ideally be defined in a similar manner as a template of a sequence motif, which contains not only the common part (or a product-set of various structures, structural P-loop in this case) but also some of the frequently occurring peripheral parts (or a sum-set). Actually, we found a high Z-score in an example containing both the structural P-loop and some peripheral parts; Z-score = 5.58 for the corresponding atoms between 6q21c and 1ayl shown in Figure 1b (drawn as red or green traces, 56 protein atoms). We are currently working on such a template representation, together with the appropriate search algorithm using the template.
Acknowledgments |
---|
Notes |
---|
References |
---|
Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Bron,C. and Kerbosch,J. (1973) Commun. A.C.M., 16, 575–577.
Hanks,S.K., Quinn,A. and Hunter,T. (1988) Science, 241, 42–52.
Hol,W.G.J., van Duijnen,P.T. and Berendsen,H.J.C. (1978) Nature, 273, 443–446.
Kobayashi,N. and Go,N. (1996) Nature Struct. Biol., 4, 6–7.
Kobayashi,N. and Go,N. (1997) Eur. Biophys. J., 26, 135–144.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Saraste,M., Sibbald,P.R. and Wittinghofer,A. (1990) Trends Biochem. Sci., 15, 430–434.
Schulz,G.E. (1992) Curr. Opin. Struct. Biol., 2, 61–67.
Swindells,M.B. (1993) Protein Sci., 2, 2146–2153.
Traut,W.T. (1994) Eur. J. Biochem., 222, 9–19.
Walker,J.E., Saraste,M., Runswick,M.J. and Gay,N.J. (1982) EMBO J., 1, 945–951.
Warme,P.K. and Morgan,R.S. (1978) J. Mol. Biol., 118, 273–287.
Received April 30, 1998; revised July 15, 1998; accepted September 29, 1998.