(Received for publication, February 5, 1996, and in revised form, October 23, 1996)
From the Institute of Microbiology, National Laboratory of Microbial Technology, Shandong University, Jinan 250100, China and the Center for Life Sciences, National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China
The high resolution refined structures of 23 enzymes were analyzed to determine the properties of amino acids involved in active site regions. These regions were found to be rich in G-X-Y or Y-X-G oligopeptides, where X and Y are often polar and non-polar residues, respectively, and are usually small and of low polarity. Other regions of the enzyme molecules have significantly fewer of these sequences. These features suggest that glycine residues may provide the flexibility necessary for enzyme active sites to change conformation, and that the G-X-Y or Y-X-G oligopeptides may be a motif for the formation of enzyme active sites.
In 1958, Koshland (1) proposed the "induced fit" model of enzyme action in which conformational changes induced by substrate binding could orient functional groups on an enzyme so as to enhance the efficiency of the subsequent chemical process. Since then, the number of examples where ligand binding and solvation alter three-dimensional structures seems to increase proportionally with the information available from structural biology. The conformational flexibility of enzymes required for this effect is well recognized (2-4). Recent evidence from the folding and unfolding of enzymes indicates that loss of enzyme activity can precede marked changes in protein conformation. Tsou (5, 6) has demonstrated that enzyme active sites may be conformationally more flexible than the intact enzymes.
On the other hand, the result that ribonuclease can be denatured reversibly leads to the contemporary view that protein conformation is determined by the amino acid sequence (7). Therefore, methodologies allowing comparison of amino acid sequences to investigate enzyme structure and function have been developed (8-11). The information obtained by comparative analysis of the known protein structures is of potential value in understanding their architecture and the principles that govern the polypeptide chain folding. Structural comparisons of unrelated proteins are most important since structural similarities between proteins suggest that some basic principles, rather than the evolutionary divergence or functional convergence of proteins, are the basis of the similarity. Many significant insights into enzyme structure and function have been provided from the methodologies above. However, the properties of enzyme active site structures have not been investigated in detail. Functional properties, for example binding to ligands and catalysis, require precise distributions of appropriate groups. Therefore assessing these properties may allow an understanding of the formation of enzyme active sites and may provide guidance for protein design.
The Gly residue is unique among the amino acids in that its side chain is a single hydrogen atom. Its conformation has greater freedom, so that it can provide flexibility for adjacent residues. Because of this, it is not surprising that Gly plays a special role in enzyme structure and function. In this paper, we present a method to investigate the properties of enzyme active sites by comparing amino acid sequences. This analysis suggests that Gly residues may provide flexibility for enzyme active sites.
X-ray crystallographic data for various proteins were used in the analysis as indicated in the tables. Only non-homologous structures were used in the study. The amino acid sequences of the enzymes were divided into several parts according to the number of essential amino acids involved in the enzyme active sites. The number of amino acid residues investigated for each enzyme was less than 30% of the total amino acids of the enzyme except copper-zinc superoxide dismutase, for which 73 of 151 residues were examined. The hydrophobicity of amino acid residues in proteins was estimated according to the method of Rose et al. (12). Hydrophobic residues include Cys, Val, Ile, Leu, Met, Phe, and Trp; amphipathic residues include Ala, His, Thr, and Tyr; hydrophilic residues include Ser, Pro, Asp, Asn, Glu, Gln, Lys, and Arg. Specifically, continuous 3-residue segments containing Gly (G-X-Y and Y-X-G) in active site regions were extracted to investigate their properties. Both Y-X-G and G-X-Y oligopeptides are designated G-X-Y oligopeptides under "Results and Discussion."
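The extraction step described above can be illustrated with a minimal sketch. It assumes one-letter residue codes and treats any 3-residue window with Gly at either end as a G-X-Y or Y-X-G segment. The function names and the "unclassified" label for residues outside the three listed classes (including Gly itself) are hypothetical choices for illustration, not part of the original method.

```python
# Residue classes as listed in the text (one-letter codes).
HYDROPHOBIC = set("CVILMFW")
AMPHIPATHIC = set("AHTY")
HYDROPHILIC = set("SPDNEQKR")


def residue_class(aa):
    """Return the hydrophobicity class of a residue per the three lists above."""
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in AMPHIPATHIC:
        return "amphipathic"
    if aa in HYDROPHILIC:
        return "hydrophilic"
    # Assumption: Gly (and any unknown code) is not assigned to a class.
    return "unclassified"


def gly_tripeptides(region):
    """Return (start_index, tripeptide) for each 3-residue window of `region`
    that begins with Gly (G-X-Y) or ends with Gly (Y-X-G)."""
    hits = []
    for i in range(len(region) - 2):
        tri = region[i:i + 3]
        if tri[0] == "G" or tri[2] == "G":
            hits.append((i, tri))
    return hits


if __name__ == "__main__":
    # Hypothetical active site region, for illustration only.
    for start, tri in gly_tripeptides("AGSVLG"):
        print(start, tri, [residue_class(aa) for aa in tri])
```

A window such as G-A-G satisfies both patterns and is reported once here; whether the original analysis counted such segments once or twice is not stated.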
In the majority of enzymes, the residues involved in binding and
catalysis come close together in the tertiary structure to form the
enzyme active site "pocket" (Fig. 1) but are
dispersed along the amino acid sequence. Binding and catalysis take
place within the pocket (some workers term it a tunnel or cleft). In the case of cellobiohydrolase I from Trichoderma reesei,
small angle x-ray scattering techniques indicate a maximum length of 18 nm and a largest diameter of 4.4 nm (13). The crystal structure for the
catalytic core of the enzyme showed that it consists of a large single
domain with dimensions of approximately 6 × 5 × 4 nm and
contains a 4-nm-long active site tunnel (14). On the other hand,
enzymes are usually built from domains. When ligands are present in the
pocket this is designated as the "closed form" of the enzyme
molecule. Without bound ligands, the domains are farther apart and the
pocket is accessible. This is the "open form." The
solvent-accessible area is increased in the open form compared with the
closed form. The main function of the open form is to allow access to
the active site. The major force driving the closing of domains is
probably the exclusion of water from the binding site, maximizing
interdomain salt links and electrostatic and hydrophobic interactions.
Domain motion is critical for a variety of enzyme activities. In
this work, first we evaluate the intrinsic flexibility of enzyme active
sites, and then discuss the constraints of packing and the limited
flexibility of the main amino acid chain of enzymes.
Table I summarizes the number and location of G-X-Y oligopeptides in the active site regions of the 23 enzymes examined (15-37). Fig. 2 shows the frequency of the X and Y residues occurring in the G-X-Y oligopeptides. In general, the X and Y residues in the G-X-Y oligopeptides of these enzymes possess the following properties: 1) X and Y are often polar and non-polar residues, respectively; 2) X and Y are usually small and of low polarity; and 3) the frequency of such G-X-Y oligopeptides is significantly higher in active site regions than in other parts of the enzyme molecule.
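A position-wise tally of the kind plotted in Fig. 2 could be computed along the following lines. This is a sketch under stated assumptions: the input is a list of 3-letter strings in one-letter code, X is taken as the middle residue, and Y is the residue at the end opposite Gly. The function name `xy_frequencies` is hypothetical, and a segment with Gly at both ends is arbitrarily read as G-X-Y.

```python
from collections import Counter


def xy_frequencies(tripeptides):
    """Return two dicts (X position, Y position) mapping residue -> percent
    frequency over all G-X-Y and Y-X-G tripeptides supplied."""
    x_counts, y_counts = Counter(), Counter()
    for tri in tripeptides:
        # Skip anything that is not a 3-residue segment with Gly at one end.
        if len(tri) != 3 or "G" not in (tri[0], tri[2]):
            continue
        x_counts[tri[1]] += 1
        # Assumption: Gly at both ends is read as G-X-Y, so Y is the last residue.
        y = tri[2] if tri[0] == "G" else tri[0]
        y_counts[y] += 1

    def to_percent(counts):
        total = sum(counts.values())
        return {aa: 100.0 * n / total for aa, n in counts.items()} if total else {}

    return to_percent(x_counts), to_percent(y_counts)


if __name__ == "__main__":
    # Hypothetical tripeptides, for illustration only.
    x_freq, y_freq = xy_frequencies(["GSV", "VLG", "GAV"])
    print("X:", x_freq)
    print("Y:", y_freq)
```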
These observations suggest that such G-X-Y
oligopeptides cannot interact strongly with other parts of the enzymes
that contain bulky and large charged or hydrophobic residues. In other
words, Gly residues provide more flexibility for enzyme active sites than for other regions. In the active site regions, the most probable amino acids in either X or Y position in the
G-X-Y oligopeptides include Val, Leu, Ile, Ala,
Ser, Thr, and Asp (frequency, >6%) (Fig. 2). By contrast, Cys is rare
in such G-X-Y oligopeptides. The reason for this
is not clear; the frequency of His, Gln, Met, and Trp is also low
(frequency, <2%). It is worth noting that the overall frequency of
Pro in G-X-Y oligopeptides is moderate (4.2% for
the X position and 5.8% for the Y position,
respectively). With the increasing size of the protein sequence
database, it is becoming apparent that Pro residues are found at a much
higher frequency than average in many proteins. MacArthur and Thornton (38) indicated that the overall frequency of Gly residues in X-Pro pairs of proteins was 8.8%, and that Pro influenced
the conformation of both the preceding and following residues when the
X-Pro and Pro-X pairs are in the
-conformation. Gly appears to be restricted in its conformational
freedom when followed by a Pro residue. Thus the unit Gly-Pro nearly
always adopts the extended conformation. In the active site regions,
the relatively low frequency of Pro in G-X-Y
oligopeptides may be favored to reduce the influence of the rigid
dihedral conformation. In addition, enzyme active sites were not always
observed in the helix structure. This may be because the
G-X-Y oligopeptide often makes a bend or coil
structure.
In principle, both X and Y residues with strong
polarity would result in a 41% exposure of the
G-X-Y oligopeptide to the protein surface, and
large bulky hydrophobic residues in the X and Y
position would lead to the G-X-Y that might be
buried within enzyme molecules (12). This rule is generally obeyed by
enzyme active site structures (Fig. 2). The exceptions are in
phosphofructokinase, where two GKKs were found, and in glutathione
reductase, D-glyceraldehyde 3-phosphate dehydrogenase,
-amylase, and glutamine synthetase, where one GRK, one GRW, one GWR
and one GWK were found, respectively. However, these oligopeptides are
not associated with the active sites directly, and their functional
role is not well understood.
It should be noted that in some enzymes, G-X-Y oligopeptides account for only a low percentage (<25%) of the oligopeptides containing Gly residues, especially in large enzymes such as DNA topoisomerase I (864 residues), ribonucleotide reductase protein R1 (1522 residues), prostaglandin H2 synthase-1 (576 residues), myeloperoxidase (578 residues), and bile salt-activated lipase (722 residues). One possible explanation for this discrepancy is that the oligopeptides containing Gly residues in non-active site regions often have strongly polar residues and large basic residues. In addition, because proteins are folded, the residues of an enzyme active site are not sequentially contiguous; they may be far apart along the sequence and brought into physical proximity by the fold of the enzyme. As a result, such Gly-containing oligopeptides in non-active site regions would be likely to interact strongly with some other portion of the enzyme or play a structural role. The enzyme conformation requires a coil or bend structure, and turn or coil propensity is expected with Gly. In the case of lysozyme, the oligopeptide segments containing Gly residues (4-6, 13-16, 114-117, and 125-128) that lie outside the active site regions are Gly-Arg-Cys, Lys-Arg-His-Gly, Arg-Cys-Cys-Gly, and Arg-Gly-Cys-Arg, respectively. The properties of these oligopeptides are obviously different from those of the G-X-Y oligopeptides we discuss here.
To further investigate the properties of the
G-X-Y oligopeptides in enzymes, the main chain
conformation of residues in the G-X-Y
oligopeptides of copper-zinc superoxide dismutase was studied in more
detail. The secondary structure and dihedral angles of each residue are
shown in Table II. The most striking feature here is the
and loop structures adopted by the G-X-Y
oligopeptides. On the other hand, 9 of 14 Gly residues in the
G-X-Y oligopeptides have conformations outside
the energetically favorable regions. Three non-glycine residues in the
G-X-Y oligopeptides that fall outside the
normally allowed regions are Asn-90, Leu-124, and Asn-129,
respectively. These unusual dihedral angles suggest that they have
sterically strained backbones as demonstrated by Herzberg and Moult (39),
who examined 24 high resolution refined proteins and indicated that 10 of the 24 proteins have a strained backbone. A residue with a strained
backbone is rare in proteins, and where it does occur, it is always in
regions of the structures intimately involved in function. In the
present work, residues with strained backbones were also found in some
enzymes as shown in Table III. A Gly residue has little
influence on the conformation of a residue preceding or following it,
thus a strained backbone in active site regions may be understood in
terms of more stringent structural requirements for enzyme
function.
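For readers who wish to reproduce backbone dihedral angles of the kind discussed above from crystal coordinates, the following is a minimal sketch of a standard four-point torsion calculation. It is not taken from the original analysis, which does not state how the angles were obtained. By the usual convention, phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The function name and the test coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for four points in 3D."""
    b0 = _sub(p0, p1)            # vector from p1 back to p0
    b1 = _sub(p2, p1)            # central bond
    b2 = _sub(p3, p2)            # vector from p2 on to p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Idealized test geometry (not real atomic coordinates).
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Angles computed this way can then be compared against the allowed regions of a Ramachandran plot to flag residues such as those listed in Table II.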
Gerstein et al. (40) classified the structural mechanisms for domain movements of proteins into "shear motion" and "hinge motion." One of the main features of domain closure is that the main chain packing for shear motion is constrained by close packing, while for hinge motion it is free to kink. Citrate synthase is one of the clearest examples of a domain closure occurring through shear motions, and a sterically strained backbone was found in this enzyme (Table II). However, there is also evidence that adenylate kinase and lysozyme possess hinge motions, but no strained backbone was found in these two enzymes (26, 39). Indeed, many enzymes have domain linkages. The advantage of a single linkage is a wide degree of flexibility, allowing the two domains considerable conformational freedom. However, for many large proteins, too much flexibility may be a considerable disadvantage. The addition of a second or third connecting segment could reduce the flexibility and restrict the possible modes of domain movements. This situation is often found in some enzymes with hinge motions (40). Therefore the observations above may suggest that the shear motions controlling domain movements in enzymes may be due to a sterically strained backbone. More detailed studies of the relationships in these examples would be informative.
The analysis suggests that, with more evaluation, prediction of some regions that contain the enzyme active sites may be possible. For an enzyme of unknown structure whose amino acid sequence has been obtained, the regions rich in the G-X-Y oligopeptides might contain the active sites of the enzyme. By contrast, amino acid residues not located in these regions are unlikely to be involved in enzyme active sites. This may be useful for predicting the active sites of some glycosyl hydrolytic enzymes such as cellulase, which was determined to possess the same mechanism as lysozyme. For endoglucanase A (41) from Cellulomonas fimi, Glu-90, Asp-93, Asp-126, Trp-233, Glu-250, Asp-252, Trp-293, Asp-297, Glu-309, and Trp-384 were found to be in the G-X-Y oligopeptide-rich regions. Therefore some of them may be involved in the enzyme active sites. An obvious limitation of the method for many enzymes is that we can locate only the enzyme active site regions rather than the particular amino acids involved in the active sites. However, our analysis indicated that the high frequency of occurrence of the G-X-Y oligopeptides in enzyme active site regions may be of some relevance to enzyme function. Collectively, the present analysis demonstrated that enzyme active sites are formed by relatively weak molecular interactions. The G-X-Y oligopeptides we defined here may be a structural motif for the formation of enzyme active sites. The evidence available from crystal analysis of some proteins suggests that the open and closed forms are only slightly different in energy and at room temperature are in dynamic equilibrium (42, 43). Therefore domain closure must be fast, and the transition between open and closed forms cannot involve a high energy barrier. Indeed, rapid diffusion in and out by the substrate and product requires a more open binding site.
In catalysis, domain closure often excludes water from the active site, helps position catalytic groups around the substrate, and also traps substrates and prevents the escape of reaction intermediates. Therefore, efficient catalysis and substrate recognition and alignment may be expected to require flexible enzyme active sites with local strain.
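The suggestion above that regions rich in G-X-Y oligopeptides might contain the active sites can be illustrated with a simple sliding-window count. This is only a sketch: the text gives no window length or threshold for "rich", so the `window` parameter, the function names, and the example sequence are all assumptions, and no filter on the size or polarity of X and Y is applied here.

```python
def gxy_density(sequence, window=30):
    """For each window start, count the 3-residue segments lying fully inside
    the window that have Gly at the first (G-X-Y) or last (Y-X-G) position."""
    n = len(sequence)
    # is_hit[i] is True when the tripeptide starting at i has Gly at either end.
    is_hit = [sequence[i] == "G" or sequence[i + 2] == "G" for i in range(n - 2)]
    counts = []
    for start in range(max(n - window + 1, 1)):
        end = min(start + window, n)
        counts.append(sum(is_hit[start:max(end - 2, start)]))
    return counts


def richest_window(sequence, window=30):
    """Return (start_index, count) of the first window with the highest count."""
    counts = gxy_density(sequence, window)
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts[best]


if __name__ == "__main__":
    # Hypothetical sequence with a Gly-rich stretch flanked by Lys residues.
    seq = "KKKKKKKKKK" + "GSVGALGTV" + "KKKKKKKKKK"
    print(richest_window(seq, window=9))
```

As the text notes, such a scan can at best point to candidate regions; it does not identify the particular residues involved in an active site.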
Our findings offer a new clue for the design of functional proteins, especially the de novo design of enzymes. Although several designed peptides and proteins have been reported to have substantial enzymatic activity (44, 45), the structural basis for this activity has been established for only one peptide. Protein design requires consideration of various aspects involved in protein folding and stability. A rational strategy for the design of enzymes should be based on the recognition that the formation of enzyme active site structures needs only one or, at most, a few structural motifs. If this reasoning is accepted, it may provide an intuitive basis for expected motifs. The fact that structural motifs recur in many proteins even though the sequences vary widely led us to hypothesize that proteins are assembled by the combination of individual folding motifs and that these motifs represent specific functional modules of the protein. Consequently, the construction of such enzyme active site motifs from the G-X-Y oligopeptides is an attractive strategy for the design of enzymes. On the one hand, the strategy should be based on geometric considerations to ensure the formation of the active site pocket, with additional checks to avoid unfavorable steric interactions. On the other hand, a rational strategy for enzyme design must reflect the hierarchy of forces required for stabilizing tertiary structure, beginning with hydrophobic forces and adding more specific interactions as required to achieve a unique functional enzyme. Recently, some proteins with defined conformational properties have been designed (46-50), indicating that the design of proteins is indeed a feasible enterprise; similar progress on the design of enzymes will certainly follow. The ultimate goal in protein design is to elucidate the fundamental principles that determine structure.
With increased understanding of the molecular basis underlying the sequence-structure relationship may come the ability to control it and generate proteins with desired specifications. Efforts toward such exquisite designs will undoubtedly foster our understanding of how enzymes work at a molecular level.
We thank Professor C. L. Tsou for helpful comments. We are grateful to Drs. D. W. Lin and Y. Guo for valuable discussions. Drs. W. F. Liu, C. X. Wang, Y. G. Mu, and X. L. Cheng are thanked for their critical reading of the manuscript.