Glycine Residues Provide Flexibility for Enzyme Active Sites*

(Received for publication, February 5, 1996, and in revised form, October 23, 1996)

Bo Xu Yan‡§ and Ying Qing Sun

From the ‡Institute of Microbiology, National Laboratory of Microbial Technology, Shandong University, Jinan 250100, China and the Center for Life Sciences, National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China


ABSTRACT

The high resolution refined structures of 23 enzymes were analyzed to determine the properties of the amino acids in active site regions. These regions were found to be rich in G-X-Y or Y-X-G oligopeptides, where X is usually a polar residue and Y a non-polar residue, and both are small and of low polarity. Other regions of the enzyme molecules contain significantly fewer of these sequences. These features suggest that glycine residues may provide the flexibility necessary for enzyme active sites to change conformation and that the G-X-Y or Y-X-G oligopeptides may be a motif for the formation of enzyme active sites.


INTRODUCTION

In 1958, Koshland (1) proposed the "induced fit" model of enzyme action, in which conformational changes induced by substrate binding orient functional groups on an enzyme so as to enhance the efficiency of the subsequent chemical process. Since then, the number of examples in which ligand binding and solvation alter three-dimensional structure has grown in step with the information available from structural biology. The conformational flexibility of enzymes required for this effect is well recognized (2-4). Recent evidence from the folding and unfolding of enzymes indicates that loss of enzyme activity can precede marked changes in protein conformation. Tsou (5, 6) has demonstrated that enzyme active sites may be conformationally more flexible than the intact enzymes.

On the other hand, the finding that ribonuclease can be denatured reversibly led to the contemporary view that protein conformation is determined by the amino acid sequence (7). Methodologies that compare amino acid sequences to investigate enzyme structure and function have therefore been developed (8-11). The information obtained by comparative analysis of known protein structures is of potential value in understanding their architecture and the principles that govern polypeptide chain folding. Structural comparisons of unrelated proteins are the most important, since structural similarities between such proteins suggest that basic principles, rather than evolutionary divergence or functional convergence, underlie the similarity. These methodologies have provided many significant insights into enzyme structure and function. However, the properties of enzyme active site structures have not been investigated in detail. Functional properties, such as ligand binding and catalysis, require precise distributions of appropriate groups. Assessing these properties may therefore help explain how enzyme active sites are formed and may provide guidance for protein design.

The Gly residue is unique among the amino acids in that its side chain is a single hydrogen atom. Its conformation therefore has greater freedom, so it can provide flexibility for adjacent residues. Because of this, it is not surprising that Gly plays a special role in enzyme structure and function. In this paper, we present a method to investigate the properties of enzyme active sites by comparing amino acid sequences. This analysis suggests that Gly residues may provide flexibility for enzyme active sites.


MATERIALS AND METHODS

X-ray crystallographic data for various proteins were used in the analysis as indicated in the tables. Only non-homologous structures were used in the study. The amino acid sequences of the enzymes were divided into several parts according to the number of essential amino acids involved in the enzyme active sites. The number of amino acid residues investigated for each enzyme was less than 30% of the total amino acids of the enzyme except copper-zinc superoxide dismutase, for which 73 of 151 residues were examined. The hydrophobicity of amino acid residues in proteins was estimated according to the method of Rose et al. (12). Hydrophobic residues include Cys, Val, Ile, Leu, Met, Phe, and Trp; amphipathic residues include Ala, His, Thr, and Tyr; hydrophilic residues include Ser, Pro, Asp, Asn, Glu, Gln, Lys, and Arg. Specifically, continuous 3-residue segments containing Gly (G-X-Y and Y-X-G) in active site regions were extracted to investigate their properties. Both Y-X-G and G-X-Y oligopeptides are designated G-X-Y oligopeptides under "Results and Discussion."
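The extraction step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the study: the function names `classify` and `gly_tripeptides` are hypothetical, the residue classes are copied from the list above, and Gly itself, which that list does not assign to a class, is labeled separately here as an assumption.

```python
from typing import List, Tuple

# Residue classes as listed in the text (after Rose et al.), one-letter codes.
HYDROPHOBIC = set("CVILMFW")
AMPHIPATHIC = set("AHTY")
HYDROPHILIC = set("SPDNEQKR")


def classify(residue: str) -> str:
    """Return the hydrophobicity class of a one-letter residue code."""
    if residue == "G":
        return "glycine"  # assumption: Gly is kept apart from the three classes
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    if residue in AMPHIPATHIC:
        return "amphipathic"
    if residue in HYDROPHILIC:
        return "hydrophilic"
    raise ValueError(f"unknown residue code: {residue!r}")


def gly_tripeptides(seq: str) -> List[Tuple[int, str, str]]:
    """List every continuous 3-residue segment that starts or ends with Gly.

    Each hit is (1-based start position, orientation, tripeptide).
    """
    hits = []
    for i, aa in enumerate(seq):
        if aa != "G":
            continue
        if i + 3 <= len(seq):  # G-X-Y: Gly followed by two residues
            hits.append((i + 1, "G-X-Y", seq[i:i + 3]))
        if i >= 2:             # Y-X-G: Gly preceded by two residues
            hits.append((i - 1, "Y-X-G", seq[i - 2:i + 1]))
    return hits


if __name__ == "__main__":
    # Short PTPase fragment taken from Table I.
    for start, kind, tri in gly_tripeptides("CLGNICR"):
        print(start, kind, tri, [classify(r) for r in tri])
```

Positions are reported relative to the fragment passed in, so a fragment cut from a longer sequence would need an offset added to recover the residue numbering used in the tables.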


RESULTS AND DISCUSSION

In the majority of enzymes, the residues involved in binding and catalysis are dispersed along the amino acid sequence but come close together in the tertiary structure to form the enzyme active site "pocket" (Fig. 1). Binding and catalysis take place within the pocket (some workers call it a tunnel or cleft). In the case of cellobiohydrolase I from Trichoderma reesei, small angle x-ray scattering indicates a maximum length of 18 nm and a largest diameter of 4.4 nm (13). The crystal structure of the catalytic core of the enzyme showed that it consists of a large single domain with dimensions of approximately 6 × 5 × 4 nm and contains a 4-nm-long active site tunnel (14). On the other hand, enzymes are usually built from domains. When ligands are present in the pocket, the enzyme molecule is said to be in the "closed form." Without bound ligands, the domains are farther apart and the pocket is accessible; this is the "open form." The solvent-accessible area is larger in the open form than in the closed form. The main function of the open form is to allow access to the active site. The major force driving the closing of domains is probably the exclusion of water from the binding site, which maximizes interdomain salt links and electrostatic and hydrophobic interactions. Domain motion is critical for a variety of enzyme activities. In this work, we first evaluate the intrinsic flexibility of enzyme active sites and then discuss the constraints of packing and the limited flexibility of the main chain of enzymes.


Fig. 1. A model for an enzyme molecule structure. The enzyme is built from domains. The active site pocket is situated in a limited region of the enzyme conformation. R, essential residues for the enzyme activity.


Table I summarizes the number and location of G-X-Y oligopeptides in the active site regions of the 23 enzymes examined (15-37). Fig. 2 shows the frequency of the residues occurring at the X and Y positions of the G-X-Y oligopeptides. In general, the X and Y residues in the G-X-Y oligopeptides of these enzymes have the following properties: 1) X and Y are often polar and non-polar residues, respectively; 2) X and Y are usually small and of low polarity; and 3) the frequency of such G-X-Y oligopeptides is significantly higher in active site regions than in other parts of the enzyme molecule.

Table I.

The amounts and location of the G-X-Y oligopeptides in enzymes

The essential amino acid residues are indicated by numbers, and the G-X-Y oligopeptides are underlined (marked here by <UNL>...</UNL> tags). Total, total oligopeptides containing a Gly residue in an enzyme molecule; %, percentage of the G-X-Y oligopeptides in the total oligopeptides containing a Gly residue; CBH II, catalytic core of cellobiohydrolase II from T. reesei; GPDH, D-glyceraldehyde 3-phosphate dehydrogenase from lobster; PTPase, phosphotyrosine protein phosphatase; PFK, phosphofructokinase from Bacillus stearothermophilus; CGTase, cyclodextrin glycosyltransferase from Bacillus circulans; Rubisco, ribulose 1,5-bisphosphate carboxylase/oxygenase from spinach; GDH, D-glycerate dehydrogenase from Hyphomicrobium methylovorum; Cu, Zn-SOD, copper-zinc superoxide dismutase; adenylate kinase from Escherichia coli.
Enzymes    No. of amino acids    Total    %    Location
177           240
1. GDH 321 23 52  <UNL>TLGIYGFGSIGQA</UNL>LAKR<UNL>AQGFD</UNL>MDIDYFD---<UNL>PQGAI<UNL>RGDL</UNL></UNL>---<UNL>EAGRL</UNL>A<UNL>YAGFD</UNL>V<UNL>FAG</UNL>-
  269               287
  EPNI<UNL>NEGYY</UNL>DLPNTFLFPHI<UNL>GSA</UNL>
        176179                                              231
GPDH 333 29 31  <UNL>VEGLM</UNL>TTVHAVTATQKT<UNL>VDGPS</UNL>AKD<UNL>WRGGRGAA</UNL>---<UNL>STGAA</UNL>K<UNL>AVGKV</UNL>IPE<UNL>LDGKLTGMA</UNL>FR---
               4446              61  69       78 81
Cu, Zn-SOD 151 23 61  <UNL>VTGSITGLTEGDHGF</UNL>HVH<UNL>EFGDNTNG</UNL>---<UNL>SAGP</UNL>H---H<UNL>GGPK</UNL>DEERH<UNL>VG</UNL>D<UNL>LGNV</UNL>TAD<UNL>KNGVA</UNL>---
          118
  <UNL>IIGRT</UNL>MVVHEKPD<UNL>DLGRGGNE</UNL>
                 58   63 66  114
Glutathione reductase 478 38 50  <UNL>ELGAR</UNL>AAVVESH<UNL>KLGGT</UNL>CV<UNL>NVG</UNL>CVPK---YENNLTKSHIEI<UNL>IRGHA</UNL>---<UNL>IPGASLGITSDGFF</UNL>N-
                197 201               291
  LEE<UNL>LPGRS</UNL>V<UNL>IVGAG</UNL>YIANE<UNL>MAGIL</UNL>S<UNL>ALGSK</UNL>---<UNL>AIG</UNL>RVPNTKDLSLNKLGINTD<UNL>DLGHI</UNL>---
          331                               467  472
  <UNL>VKGIYAVG</UNL>D<UNL>VCGKA</UNL>LLTPVAI<UNL>AAGRK</UNL>---<UNL>GAT</UNL>KADFDNTVAIHPTSSE
                   72                 162
2. PFK 319 40 50  <UNL>EVGDVGDI</UNL>I<UNL>HRGGTI</UNL>LYTARCPEFKT<UNL>EEGOKKGIE</UNL>---RTYVIE<UNL>VMGRHAGDI</UNL>AL<UNL>WSGLAGGA</UNL>-
                    222                   243     252
  <UNL>E</UNL>---<UNL>GHERGK</UNL>KHSIIIVAE<UNL>GVGSGVDFGRQ</UNL>IQE<UNL>ATGFE</UNL>TRVT<UNL>VLGHV</UNL>QR<UNL>GGS</UNL>
2729 3233      5153                   139
CGTase 684 53 34 D<UNL>G</UNL>NPSNN<UNL>PTGAA</UNL>---G<UNL>G</UNL>D<UNL>WQGLI</UNL>---<UNL>AKGIK</UNL>TVIDFAPNHTSPAMETDTSFA<UNL>ENGRL</UNL>Y<UNL>DNGTLV</UNL>-
                           190       199          229  233
  <UNL>GGYT</UNL>ND<UNL>TNGYF</UNL>H<UNL>HNGGSD</UNL>FSSL<UNL>ENG</UNL>IYKNLYDLADD---<UNL>DMGVDGIR</UNL>VDAVKHM<UNL>PLGWQ</UNL>---
     257       328
  <UNL>TFG</UNL>EW<UNL>FLGSA</UNL>---D---
       13                   123                        156 167
Adenylate kinase 214 20 45  <UNL>LLGAPGA</UNL>K<UNL>GTQ</UNL>AQFIME<UNL>KYGIP</UNL>---<UNL>IVG</UNL>RRVHA<UNL>PSGRV</UNL>---<UNL>VEGKDDVTGEE</UNL>LTTR---RLVEY-
  HQMTAP<UNL>LIGYY</UNL>
    69 72                145                     196
3. Carboxypeptidase A 307 21 67  <UNL>DLGI</UNL>HSPEWITN<UNL>ATGVY</UNL>---<UNL>GVD</UNL>ANRNW<UNL>DAGFGAGAS---GNF</UNL>KAFLSIHSYSNLLLY<UNL>PYG</UNL>---
         248               270
  <UNL>GSI</UNL>ITTIYN<UNL>SAGGSI</UNL>---<UNL>GIK</UNL>YSFTFGLR<UNL>DTGRYGEL</UNL>
      12              119
RNase A 124 3 33 ------H------<UNL>CEGDP</UNL>YVPVH---
              35     52       62 63              108
Lysozyme 129 11 64  <UNL>GLYSLGNY</UNL>VCAAKFE---<UNL>GST</UNL>D<UNL>YGIL</UNL>N---WWC<UNL>DNGRTPGSR</UNL>---<UNL>GMN</UNL>AW---
       25              106                   158
Papain 212 26 38  <UNL>NAGSCGS</UNL>C---<UNL>GGIFVGPCGNK</UNL>VDHAVA<UNL>AVGYNPGYT</UNL>---<UNL>SWGPG</UNL>Y<UNL>DCGYS</UNL>---
     16      57      102        194195     214
Chymotrypsin A 245 21 38  <UNL>GLS</UNL>RIV<UNL>NGEE</UNL>---H<UNL>CGVT</UNL>---D---<UNL>GVSSCMG</UNL>DS<UNL>GGP</UNL>---<UNL>GIV</UNL>SW<UNL>GSS</UNL>---
175                   221                      272       364367
CBH II 447 27 41 DCAALA<UNL>SNGEY</UNL>SI<UNL>ADGGVA</UNL>---DSLANLVT<UNL>NLGTP</UNL>---<UNL>PAGHAGWLG</UNL>W---<UNL>PTG</UNL>NNNW<UNL>GD</UNL>WCN<UNL>V</UNL>-
  <UNL>IGTGFGIR</UNL>---
12    18              129
PTPase 157 6 50 C<UNL>LGNI</UNL>CR---<UNL>GSY</UNL>DPQKQLIIEDP<UNL>YYG</UNL>
         179       204
 alpha -Amylase 403 40 35  <UNL>DIGFDGWRF</UNL>DF<UNL>AKGYS</UNL>---EIWTSL<UNL>AYGG</UNL>D<UNL>GKP</UNL>---<UNL>TKGIL</UNL>NVA<UNL>VEGEL</UNL>WR<UNL>LRGTDGKAPGMIG</UNL>-
  276277    289
  WWPAKAVTFVDNHD<UNL>TGST</UNL>QHMWPFPSDRV<UNL>MOGYA</UNL>ILT<UNL>HPGTP</UNL>
           9293 96    102              149
Astacin 200 14 43  <UNL>ANGCVYHGTI</UNL>IHELLMH<UNL>AIGPY</UNL>H---<UNL>YVGED</UNL>YQYYSIMHY<UNL>GKY</UNL>SPSI<UNL>QWGVL</UNL>
                 70 73                130
Lactamase 291 15 53  <UNL>KSGKE</UNL>VKFNSDKRFAYASTSK---<UNL>YVGKD</UNL>ITLKALIEASMTYSDNTANNKIIK<UNL>EIGGIK</UNL>KVKQR-
                166                      234 236
  LK<UNL>ELGDK</UNL>VTNPVRYE---<UNL>KSGDT</UNL>LI<UNL>KDGVP</UNL>KPKDYKVADKSGQA
124125128         175        223226      260262       289
Dehalogenase 310 15 53 DW<UNL>GG</UNL>F<UNL>LGLT</UNL>---<UNL>ADGFT</UNL>AW---<UNL>NAGVR</UNL>KFPKMV---<UNL>AIGMK</UNL>DKL<UNL>LGPD</UNL>---<UNL>DAG</UNL>HFN<UNL>EFGEN</UNL>-
                 60  6566                       123
4. Rubisco 476 41 56  <UNL>QPGVP</UNL>PE<UNL>EAGAA</UNL>VAAES<UNL>STG</UNL>TWTTVW<UNL>TDGLT</UNL>---<UNL>EEGSV</UNL>TNMFTS<UNL>IVG</UNL>N<UNL>VFGFK</UNL>---<UNL>KYGRPL</UNL>-
       175177                           334                  381
  <UNL>LGCT</UNL>IKPK<UNL>LGLS</UNL>AK<UNL>NYGRA</UNL>---<UNL>LSGGDH</UNL>I<UNL>HSGTVVG</UNL>KLERDI<UNL>TLGFV</UNL>---<UNL>TPGVL</UNL>PVASG<UNL>GI</UNL>-
                403404
  <UNL>H</UNL>---<UNL>IFGDD</UNL>SVLQFGG<UNL>GTLGHPWGNAPGAV</UNL>
238                         274
Citrate synthase 437 34 56 H<UNL>EGGNV</UNL>SAHTSH<UNL>LVGSA</UNL>---<UNL>MNGLAGPL</UNL>H<UNL>GLA</UNL>NE<UNL>VLGWL</UNL>---<UNL>AGAD</UNL>ASLRDYIWNTL<UNL>NSGRVVP</UNL>-
    320      329                             375
  <UNL>GYG</UNL>HAVLRKTDPRYTVNREFALKH<UNL>LPGDP</UNL>---<UNL>ENGAA</UNL>ANPWPNVDA<UNL>HSGVL</UNL>LN<UNL>YYGMT</UNL>---
       401             421
  <UNL>LFGVS</UNL>R<UNL>ALGVL</UNL>---<UNL>ALGFP</UNL>LERPKSMS<UNL>TDGLI</UNL>
      13               95          165
5. Phosphate isomerase 248 23 35  <UNL>FVGGNW</UNL>K<UNL>MNG</UNL>---<UNL>DIGAA</UNL>WV<UNL>ILG</UNL>HSERRH<UNL>VFG</UNL>---EPVW<UNL>AIGTGKT</UNL>---
158                          195   201 204 210
6. Glutathione synthetase 316 22 50 ILKP<UNL>LDGMGGAS</UNL>---<UNL>GVL</UNL>AETLT<UNL>EHGTR</UNL>YCMAQNYLPAI<UNL>KDGD</UNL>KVLV<UNL>VDGEP</UNL>---<UNL>QIGPT</UNL>LK<UNL>EK</UNL>-
          275           289
  <UNL>GLIFVGLD</UNL>I<UNL>GDR</UNL>LTEINVTSPTCIREIEAEFPVS<UNL>ITGML</UNL>
   47                         269                      397
Glutamine synthetase 468 31 32  <UNL>EEG</UNL>KM<UNL>FDGSSIGGWKGIN</UNL>---<UNL>MFGDNGSGM</UNL>HCHMSLS<UNL>KNGVN</UNL>---<UNL>HPGEA</UNL>MDKNLYDLPPEIPN-
  <UNL>VAG</UNL>


Fig. 2. Frequency of amino acid residues observed at the X and Y positions of the G-X-Y or Y-X-G oligopeptides. Taken from the data set of 381 G-X-Y or Y-X-G oligopeptides in the 23 enzymes examined. Data were from non-homologous structures.


These observations suggest that such G-X-Y oligopeptides cannot interact strongly with other parts of the enzyme that contain large, bulky charged or hydrophobic residues. In other words, Gly residues provide more flexibility for enzyme active sites than for other regions. In the active site regions, the most probable amino acids at either the X or the Y position of the G-X-Y oligopeptides are Val, Leu, Ile, Ala, Ser, Thr, and Asp (frequency, >6%) (Fig. 2). By contrast, Cys is rare in such G-X-Y oligopeptides, for reasons that are not clear; the frequency of His, Gln, Met, and Trp is also low (<2%). It is worth noting that the overall frequency of Pro in G-X-Y oligopeptides is moderate (4.2% at the X position and 5.8% at the Y position). With the increasing size of the protein sequence database, it is becoming apparent that Pro residues are found at a much higher frequency than average in many proteins. MacArthur and Thornton (38) reported that the overall frequency of Gly residues in X-Pro pairs of proteins was 8.8% and that Pro influenced the conformation of both the preceding and following residues when the X-Pro and Pro-X pairs are in the β-conformation. Gly appears to be restricted in its conformational freedom when followed by a Pro residue; thus the unit Gly-Pro nearly always adopts the extended conformation. In the active site regions, the relatively low frequency of Pro in G-X-Y oligopeptides may be favored because it reduces the influence of this rigid dihedral conformation. In addition, enzyme active sites were not always observed in helical structure, possibly because the G-X-Y oligopeptide often forms a bend or coil.
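The kind of tally that underlies the position frequencies quoted above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `xy_frequencies` is hypothetical, and treating a G-X-G segment as G-X-Y is an arbitrary choice made only so that every input is counted once.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def xy_frequencies(tripeptides: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return percentage frequencies of residues at the X and Y positions.

    Each tripeptide must be a 3-letter string in G-X-Y or Y-X-G orientation.
    """
    x_counts: Counter = Counter()
    y_counts: Counter = Counter()
    total = 0
    for tri in tripeptides:
        if len(tri) != 3:
            raise ValueError(f"expected a tripeptide, got {tri!r}")
        if tri[0] == "G":      # G-X-Y (also used for G-X-G; arbitrary choice)
            x, y = tri[1], tri[2]
        elif tri[2] == "G":    # Y-X-G
            x, y = tri[1], tri[0]
        else:
            raise ValueError(f"no terminal Gly in {tri!r}")
        x_counts[x] += 1
        y_counts[y] += 1
        total += 1
    if total == 0:
        return {}, {}
    x_pct = {aa: 100.0 * n / total for aa, n in x_counts.items()}
    y_pct = {aa: 100.0 * n / total for aa, n in y_counts.items()}
    return x_pct, y_pct


if __name__ == "__main__":
    # Four tripeptides read from the PTPase row of Table I.
    x_pct, y_pct = xy_frequencies(["GNI", "CLG", "GSY", "YYG"])
    print("X:", x_pct)
    print("Y:", y_pct)
```

With only four tripeptides the percentages are meaningless statistically; the example shows the bookkeeping, which would be applied to the full set of 381 oligopeptides.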

In principle, strongly polar residues at both the X and Y positions would result in a 41% exposure of the G-X-Y oligopeptide to the protein surface, whereas large bulky hydrophobic residues at the X and Y positions would lead to a G-X-Y oligopeptide that might be buried within the enzyme molecule (12). This rule is generally obeyed by enzyme active site structures (Fig. 2). The exceptions are in phosphofructokinase, where two GKKs were found, and in glutathione reductase, D-glyceraldehyde 3-phosphate dehydrogenase, α-amylase, and glutamine synthetase, where one GRK, one GRW, one GWR, and one GWK were found, respectively. However, these oligopeptides are not directly associated with the active sites, and their functional role is not well understood.

It should be noted that in some enzymes the G-X-Y oligopeptides make up a low percentage (<25%) of the oligopeptides containing Gly residues, especially in large enzymes such as DNA topoisomerase I (864 residues), ribonucleotide reductase protein R1 (1522 residues), prostaglandin H2 synthase-1 (576 residues), myeloperoxidase (578 residues), and bile salt-activated lipase (722 residues). One possible explanation for this discrepancy is that the oligopeptides containing a Gly residue in non-active site regions often have strongly polar residues and large basic residues; at the same time, because proteins are folded, the residues of an enzyme active site are not sequentially contiguous and may be far apart along the sequence, being brought into physical proximity by the fold of the enzyme. As a result, such Gly-containing oligopeptides would be likely to interact strongly with some other portion of the enzyme or to play a structural role. The enzyme conformation requires coil or bend structures, and turn or coil propensity is expected with Gly. In the case of lysozyme, the oligopeptide segments containing Gly residues that lie outside the active site regions (4-6, 13-16, 114-117, and 125-128) are Gly-Arg-Cys, Lys-Arg-His-Gly, Arg-Cys-Cys-Gly, and Arg-Gly-Cys-Arg, respectively. The properties of these oligopeptides are clearly different from those of the G-X-Y oligopeptides discussed here.

To further investigate the properties of the G-X-Y oligopeptides in enzymes, the main chain conformation of residues in the G-X-Y oligopeptides of copper-zinc superoxide dismutase was studied in more detail. The secondary structure and dihedral angles of each residue are shown in Table II. The most striking feature is the β and loop structures adopted by the G-X-Y oligopeptides. In addition, 9 of 14 Gly residues in the G-X-Y oligopeptides have conformations outside the energetically favorable regions. Three non-glycine residues in the G-X-Y oligopeptides fall outside the normally allowed regions: Asn-90, Leu-124, and Asn-129. These unusual dihedral angles suggest that the residues have sterically strained backbones, as described by Herzberg and Moult (39), who examined 24 high resolution refined proteins and found that 10 of them have a strained backbone. A residue with a strained backbone is rare in proteins, and where it does occur, it is always in regions of the structure intimately involved in function. In the present work, residues with strained backbones were also found in some enzymes, as shown in Table III. A Gly residue has little influence on the conformation of the residue preceding or following it; thus a strained backbone in active site regions may be understood in terms of more stringent structural requirements for enzyme function.

Table II.

The conformations of residues involved in the active site regions of copper-zinc superoxide dismutase


No.  Residue      φ      ψ  Structure    No.  Residue      φ      ψ  Structure

 29  V         -112    123  β             78  H          -89    125  Loop
 30  T         -133    165  β             79  V          -40    -38  Loop
 31  G          148   -179  β             80  G          -73     -3  Loop
 32  S         -125    152  β             81  D         -113    146  Loop
 33  I         -140    135  β             82  L         -135      8  β
 34  T         -137    159  β             83  G           79   -130  β
 35  G           81     39  β             84  N         -123    136  β
 36  L         -102    161  β             85  V          -92    151  β
 37  T         -103    112  β             86  T         -114    152  β
 38  E          -99    147  β             87  A         -137    140  β
 39  G          124   -170  β             88  D         -139   -175  β
 40  D          -85    141  β             89  K          -86     95  β
 41  H         -137    150  β             90  N         -122    -97  β
 42  G          -47    128  β             91  G         -149     78  β
 43  F         -143    101  β             92  V         -120    134  β
 44  H         -142    159  β             93  A         -126    108  β
 45  V         -111    129  β
 46  H          -99    147  β            110  I         -119      1  Loop
 47  Q          -49    -56  Loop         111  I          -71    113  Loop
 48  F         -105    155  Loop         112  G          109      3  Loop
 49  G          -94     26  Loop         113  R         -107    166  β
 50  D          -83    109  Loop         114  T         -105    132  β
 51  N         -101     20  Loop         115  M          -85    131  β
                                         116  V         -137    142  β
 57  S          -79      1  Loop         117  V         -115    105  β
 58  A          -75    -34  Loop         118  H          -79    176  β
 59  G           70   -163  Loop         119  E          -81    -25  Loop
 60  H          -57    136  Loop         120  K         -132    162  Loop
                                         121  P          -61    140  Loop
 69  H          -87    132  Loop         122  D          -76   -149  Loop
 70  G         -145   -158  Loop         123  D          -75    -22  Loop
 71  G          -98    173  Loop         124  L          -40     11  Loop
 72  P          -50    -37  Loop         125  G          117    -13  Loop
 73  K          -78    -30  Loop         126  R          -78    -28  Loop
 74  D          -79    155  Loop         127  G         -114     91  Loop
 75  E          -75    -31  Loop         128  G          122    -73  Loop
 76  E          -93     89  Loop         129  N         -140   -142  Loop
 77  R         -155    155  Loop         130  E          -90    -35  Loop

Table III.

Residues with the strained backbone in enzyme active site regions


Enzyme                                 Residue       φ      ψ  Location

1. Copper-zinc superoxide dismutase    Asn-90     -122    -97  In the G-X-Y
                                       Leu-124     -40    -11  In the G-X-Y
                                       Asn-129    -140   -142  In the G-X-Y
2. Glutathione reductase               His-52     -123   -118  Not in the G-X-Y
                                       His-219    -119   -146  Not in the G-X-Y
3. CGTase                              Ala-152      44   -127  Not in the G-X-Y
                                       Tyr-195     -62   -114  Not in the G-X-Y
4. α-Amylase                           Asp-214    -170    -20  In the G-X-Y
                                       Ser-292     -50   -120  In the G-X-Y
5. Astacin                             Ser-72      -82   -143  In the G-X-Y
6. Dehalogenase                        Asp-124     -49   -135  At active site
7. Lactamase                           Ala-69      -42   -140  Not in the G-X-Y
                                       Leu-220    -108   -125  Not in the G-X-Y
8. Carboxypeptidase A                  Ser-199    -150      0  Not in the G-X-Y
                                       Ile-247    -103    -89  Not in the G-X-Y
                                       Asp-273     -99   -150  In the G-X-Y
9. Citrate synthase                    His-274    -124   -127  At active site
10. Glutathione synthetase             Ser-155     -76    -58  Not in the G-X-Y
                                       Cys-289     -86     -6  At active site

Gerstein et al. (40) classified the structural mechanisms for domain movements of proteins into "shear motion" and "hinge motion." One of the main features of domain closure is that the main chain is constrained by close packing in shear motion, whereas in hinge motion it is free to kink. Citrate synthase is one of the clearest examples of domain closure occurring through shear motions, and a sterically strained backbone was found in this enzyme (Table III). By contrast, there is evidence that adenylate kinase and lysozyme possess hinge motions, but no strained backbone was found in these two enzymes (26, 39). Indeed, many enzymes have domain linkages. The advantage of a single linkage is a wide degree of flexibility, allowing the two domains considerable conformational freedom. However, for many large proteins, too much flexibility may be a considerable disadvantage. The addition of a second or third connecting segment could reduce the flexibility and restrict the possible modes of domain movement. This situation is often found in enzymes with hinge motions (40). The observations above may therefore suggest that the shear motions controlling domain movements in enzymes are associated with a sterically strained backbone. More detailed studies of the relationships in these examples would be informative.

The analysis suggests that, with further evaluation, it may become possible to predict the regions that contain enzyme active sites. For an enzyme of known amino acid sequence but unknown structure, the regions rich in G-X-Y oligopeptides might contain the active sites, whereas residues located outside these regions are unlikely to be involved in them. This may be useful for predicting the active sites of some glycosyl hydrolases such as cellulase, which has been shown to act by the same mechanism as lysozyme. For endoglucanase A (41) from Cellulomonas fimi, Glu-90, Asp-93, Asp-126, Trp-233, Glu-250, Asp-252, Trp-293, Asp-297, Glu-309, and Trp-384 were found to be in the G-X-Y oligopeptide-rich regions; some of them may therefore be involved in the enzyme active site. An obvious limitation of the method is that, for many enzymes, it can locate only the active site regions rather than the particular amino acids involved in the active sites. However, our analysis indicates that the high frequency of G-X-Y oligopeptides in enzyme active site regions may be relevant to enzyme function. Collectively, the present analysis indicates that enzyme active sites are formed by relatively weak molecular interactions. The G-X-Y oligopeptides defined here may be a structural motif for the formation of enzyme active sites. The evidence available from crystal analysis of some proteins suggests that the open and closed forms differ only slightly in energy and are in dynamic equilibrium at room temperature (42, 43). Therefore domain closure must be fast, and the transition between open and closed forms cannot involve a high energy barrier. Indeed, rapid diffusion of substrate in and product out requires a more open binding site. In catalysis, domain closure often excludes water from the active site, helps position catalytic groups around the substrate, traps substrates, and prevents the escape of reaction intermediates. Therefore, efficient catalysis and substrate recognition and alignment may be expected to require flexible enzyme active sites with local strain.
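The idea of locating candidate active site regions from sequence alone can be illustrated with a simple sliding-window scan. The sketch below is an assumption-laden illustration rather than the method used in this work: the function names, the window length, the hit threshold, and in particular the exclusion set `EXCLUDED` (chosen here only because Lys, Arg, and Trp appear in the exceptional GKK, GRK, GRW, GWR, and GWK segments mentioned above) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical exclusion set: residues treated as too bulky or too strongly
# charged to count toward a G-X-Y hit. This choice is an assumption.
EXCLUDED = frozenset("KRW")


def qualifies(seq: str, i: int, excluded=EXCLUDED) -> bool:
    """True if seq[i] is Gly and the two residues on at least one side
    (G-X-Y or Y-X-G orientation) both avoid the excluded set."""
    if seq[i] != "G":
        return False
    right = seq[i + 1:i + 3]                 # X and Y of G-X-Y
    left = seq[i - 2:i] if i >= 2 else ""    # Y and X of Y-X-G
    flanks = [pair for pair in (right, left) if len(pair) == 2]
    return any(not (set(pair) & excluded) for pair in flanks)


def rich_windows(seq: str, window: int = 15, min_hits: int = 3) -> List[Tuple[int, int, int]]:
    """Return (1-based start, 1-based end, hits) for every window that
    contains at least `min_hits` qualifying Gly residues."""
    flags = [qualifies(seq, i) for i in range(len(seq))]
    found = []
    for start in range(len(seq) - window + 1):
        hits = sum(flags[start:start + window])
        if hits >= min_hits:
            found.append((start + 1, start + window, hits))
    return found


if __name__ == "__main__":
    # Contiguous adenylate kinase fragment read from Table I.
    fragment = "LLGAPGAKGTQAQFIMEKYGIP"
    for first, last, hits in rich_windows(fragment, window=10, min_hits=3):
        print(f"window {first}-{last}: {hits} qualifying Gly residues")
```

Overlapping windows that pass the threshold would normally be merged into a single candidate region, and any such region is a hypothesis to be checked against structural or mutational data, in line with the limitation noted above that only regions, not individual residues, can be located.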

Our findings give a new clue to the design of functional proteins, especially the de novo design of enzymes. Although several designed peptides and proteins have been reported to have substantial enzymatic activity (44, 45), the structural basis for this activity has been established for only one peptide. Protein design requires consideration of the various factors involved in protein folding and stability. A rational strategy for the design of enzymes should be based on the recognition that the formation of enzyme active site structures needs only one or, at most, a few structural motifs. If this reasoning is accepted, it may provide an intuitive basis for expected motifs. The fact that structural motifs recur in many proteins even though the sequences vary widely led us to hypothesize that proteins are assembled by the combination of individual folding motifs and that these motifs represent specific functional modules of the protein. Consequently, the construction of enzyme active site motifs from G-X-Y oligopeptides is an attractive strategy for the design of enzymes. On the one hand, the strategy should be based on geometric considerations to ensure the formation of the active site pocket, with additional checks to avoid unfavorable steric interactions. On the other hand, a rational strategy for enzyme design must reflect the hierarchy of forces required for stabilizing tertiary structure, beginning with hydrophobic forces and adding more specific interactions as required to achieve a unique functional enzyme. Recently, some proteins with defined conformational properties have been designed (46-50), indicating that the design of proteins is indeed a feasible enterprise; similar progress on the design of enzymes will certainly follow. The ultimate goal in protein design is to elucidate the fundamental principles that determine structure. With increased understanding of the molecular basis underlying the sequence-structure relationship may come the ability to control it and generate proteins with desired specifications. Efforts toward such designs will undoubtedly foster our understanding of how enzymes work at a molecular level.


FOOTNOTES

*   The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§   To whom correspondence should be addressed. Tel.: 86-0531-8902610; Fax: 86-0531-8915234.

Acknowledgments

We thank Professor C. L. Tsou for helpful comments. We are grateful to Drs. D. W. Lin and Y. Guo for valuable discussions. We also thank Drs. W. F. Liu, C. X. Wang, Y. G. Mu, and X. L. Cheng for their critical reading of the manuscript.


REFERENCES

  1. Koshland, D. E., Jr. (1958) Proc. Natl. Acad. Sci. U. S. A. 44, 98-104
  2. Kraut, J. (1988) Science 242, 533-540
  3. Johnson, K. A. (1993) Annu. Rev. Biochem. 62, 685-713
  4. Bone, R., Silen, J. L., and Agard, D. A. (1989) Nature 339, 191-195
  5. Tsou, C. L. (1986) Trends Biochem. Sci. 11, 427-429
  6. Tsou, C. L. (1993) Science 262, 380-381
  7. Anfinsen, C. B. (1973) Science 181, 223-230
  8. Sali, A., Shakhnovich, E. I., and Karplus, M. (1994) Nature 369, 248-251
  9. Crippen, G. M. (1991) Biochemistry 30, 4232-4237
  10. Sippl, M. J. (1990) J. Mol. Biol. 213, 859-883
  11. Monge, A., Friesner, R. A., and Honig, B. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 5027-5029
  12. Rose, G. D., Geselowitz, A. R., Lesser, G. J., Lee, R. H., and Zehfus, M. H. (1985) Science 229, 834-837
  13. Schumuck, M., and Pilz, I. (1986) Biotechnol. Lett. 8, 397-402
  14. Divne, C., Stahlberg, J., Reinikainen, T., Rouhonen, L., Pettersson, G., Knowles, J. K. C., Teeri, T., and Jones, T. A. (1994) Science 265, 524-528
  15. Tao, W. S., Lee, W., Jiang, Y. M., Luo, G. M., and Lin, Y. Q. (1982) Molecular Basis of Proteins, pp. 203-205, People Education Press (in Chinese), Beijing
  16. Fersht, A. (1985) Enzyme Structure and Mechanism, pp. 4-5, W. H. Freeman and Co., London
  17. Smyth, D. G., Stein, W. H., and Moore, S. (1963) J. Biol. Chem. 238, 227-234
  18. Rouvinen, J., Bergfors, T., Teeri, T., Knowles, J. K. C., and Jones, T. A. (1990) Science 249, 380-386
  19. Drenth, J., Jansonius, J. N., Koekoek, R., Swen, H. M., and Woltlers, B. G. (1968) Nature 218, 929-932
  20. Tulinsky, A., and Wright, L. H. (1973) J. Mol. Biol. 81, 47-59
  21. Biesecker, G., Harris, J. I., Thierry, J. C., Walker, J. E., and Wonacott, A. J. (1977) Nature 266, 328-333
  22. Corran, P. H., Furth, A. J., Milman, J. D., Offord, R. E., Priddle, J. D., and Waley, S. G. (1975) Nature 255, 609-614
  23. Camici, G., Manao, G., Cappugi, G., Modesti, A., Stefani, M., and Ramponi, G. (1989) J. Biol. Chem. 264, 2560-2567
  24. Shirakihara, Y., and Evans, P. R. (1988) J. Mol. Biol. 204, 973-994
  25. Colombo, G., and Villafranca, J. J. (1986) J. Biol. Chem. 261, 10587-10591
  26. Muller, C. W., and Schulz, G. E. (1992) J. Mol. Biol. 224, 159-177
  27. Yamaguchi, H., Hiroaki, K., Hata, Y., Nishioka, T., Kimura, A., Oda, J., and Katsube, Y. (1993) J. Mol. Biol. 229, 1083-1100
  28. Goldberg, J. D., Yoshida, T., and Brick, P. (1994) J. Mol. Biol. 236, 1123-1140
  29. Knight, S., Andersson, I., and Branden, C.-I. (1990) J. Mol. Biol. 215, 113-160
  30. Klein, C., and Schulz, G. E. (1991) J. Mol. Biol. 217, 737-750
  31. Blosham, D., Parmelee, D. C., Kumer, S., Wade, P. D., Erisson, L. H., Neuragh, H., Walsh, K. A., and Titani, K. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 5381-5385
  32. Herzberg, O. (1991) J. Mol. Biol. 217, 701-719
  33. Janssen, D. B., Pris, F., Ploeg, J. V., Kazemier, B., Terpstra, P., and Witholt, B. (1989) J. Bacteriol. 171, 6791-6799
  34. Gomis-Ruth, F. X., Stocker, W., Huber, R., Zwilling, R., and Bode, W. (1993) J. Mol. Biol. 229, 945-968
  35. Kadziola, A., Abe, J. I., Svensson, B., and Haser, R. (1994) J. Mol. Biol. 239, 104-121
  36. Tainer, J. A., Getzoff, E. D., Beem, K. M., Richardson, J. S., and Richardson, D. C. (1982) J. Mol. Biol. 160, 181-217
  37. Karplus, P. A., and Schulz, G. E. (1987) J. Mol. Biol. 195, 701-729
  38. MacArthur, M. W., and Thornton, J. M. (1991) J. Mol. Biol. 218, 397-412
  39. Herzberg, O., and Moult, J. (1991) Proteins Struct. Funct. Genet. 11, 223-229
  40. Gerstein, M., Lesk, A. M., and Chothia, C. (1994) Biochemistry 33, 6739-6749
  41. Wong, W. K. R., Gerhard, B., Guo, Z. M., Kilburn, D. G., Warren, R. A. J., and Miller, R. C., Jr. (1986) Gene (Amst.) 44, 315-324
  42. Sharff, A. T., Rodseth, L. E., Spurlino, J. C., and Quiocho, F. A. (1992) Biochemistry 31, 10657-10663
  43. Baker, E. N., Rumball, S. V., and Anderson, B. F. (1987) Trends Biochem. Sci. 12, 350-355
  44. DeGrado, W. F. (1993) Trends Biochem. Sci. 365, 488-490
  45. Johnsson, K., Allemann, R. K., Wildmer, H., and Benner, S. A. (1993) Trends Biochem. Sci. 365, 530
  46. Hecht, M. H., Richardson, J. S., Richardson, D. C., and Ogden, R. C. (1990) Science 249, 884-891
  47. Ho, S. P., and DeGrado, W. F (1987) J. Am. Chem. Soc. 109, 6751-6758
  48. Ghadiri, M. R., Granja, J. R., and Buelhle, L. K. (1994) Nature 369, 301-305
  49. Regan, L. (1993) Annu. Rev. Biophys. Biomol. Struct. 22, 257-281
  50. Montal, M. (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 31-57

©1997 by The American Society for Biochemistry and Molecular Biology, Inc.