©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
Substrate Recognition by Recombinant Serine Collagenase 1 from Uca pugilator(*)

(Received for publication, October 3, 1995; and in revised form, February 9, 1996)

Christopher A. Tsu (§) Charles S. Craik (¶)

From the Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143-0446

ABSTRACT

Uca pugilator serine collagenase 1 was cloned and sequenced from a fiddler crab hepatopancreas cDNA library. A full-length sequence encodes a 270-amino acid pre-pro-enzyme highly identical in structure to the chymotrypsin family of serine proteases. The zymogen form of the enzyme was expressed in Saccharomyces cerevisiae as a fusion with the alpha-factor signal sequence under control of the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter. Upon activation with trypsin, the recombinant collagenase possesses collagenolytic properties identical to those of the enzyme isolated from the crab hepatopancreas. The collagenase substrate binding pocket recognizes a wide range of basic, hydrophobic, and neutral polar residues. beta-Branched and acidic amino acids are poor substrates. Acylation is rate-limiting for collagenase versus peptidyl amides, rather than deacylation, as for trypsin and chymotrypsin. Correlations relating substrate volume and hydrophobicity to catalysis were found for collagenase and compared to those for chymotrypsin and elastase. Relative enzyme efficiencies on single amino acid versus tetrapeptide amide substrates show that collagenase derives less catalytic efficiency from binding of the primary substrate residue than trypsin or chymotrypsin, but compensates in binding of the extended peptidyl residues. Serine collagenase 1 is a novel member of the chymotrypsin protease family, by virtue of its amino acid sequence and multifunctional active site.


INTRODUCTION

The chymotrypsin family of serine proteases is a paradigm for enzymic substrate recognition. The family is subdivided on the basis of four major classes of P1 (^1) residue substrate specificity: basic, aromatic, aliphatic, and acidic. These specificities are usually mutually exclusive; substrate discrimination is on the order of 10^4 to 10^5 in k(cat)/K(m) for trypsin (Lys > Phe), chymotrypsin (Phe > Lys, Phe > Ala), elastase (Ala > Tyr), and V8 protease (Glu > Ala) (2, 3). These distinct specificities arise from subtle modification of surface loops surrounding a conserved double beta barrel core structure (3). Sequence and structural similarity suggested a classical model in which only a few critical residues determine substrate specificity (4). However, recent studies demonstrate that the conversion of one protease into another is complex, requiring the transplantation of several active site loops (5, 6). Thus, the evolutionary optimization of this enzyme family may obscure important mechanistic and structural commonalities regarding substrate specificity.

Serine collagenase 1 (EC 3.4.21.32) isolated from the hepatopancreas of the fiddler crab, Uca pugilator, is a serine protease capable of cleaving native triple helical collagen (7). The serine collagenases comprise a large family of homologous, yet nonidentical enzymes of mostly invertebrate origin (8). These collagenases appear to serve primarily a digestive function. Other serine collagenases have been implicated in pulmonary, parasitic, and bacterial diseases (9, 10, 11). The enzymology of crab collagenase is unusual, as it possesses activities similar not only to the matrix metallocollagenases, but also to the serine proteases trypsin, chymotrypsin, and elastase (12, 13, 14). The collagen cleavage sites of crab collagenase have recently been identified and are located in the protease-sensitive region 3/4 of the length of the collagen chain from the amino terminus (14). Because crab collagenase and the metallocollagenases attack collagen at similar locations, crab collagenase is an alternative model system for the elucidation of protease-collagen interactions. Crab collagenase also presents the opportunity to study, in a unified manner, the nature of hydrophobic and basic substrate specificity in the chymotrypsin family of serine proteases.

We present here the cloning, expression, and characterization of crab serine collagenase 1. The collagenolytic activity of the recombinant enzyme is identical to that isolated from crab hepatopancreas. Quantitative structure activity relationships are determined for collagenase and compared to the serine protease homologs trypsin, chymotrypsin, and elastase. These criteria show serine collagenase 1 to be a novel member of the chymotrypsin protease family.


EXPERIMENTAL PROCEDURES

RNA Isolation and cDNA Library Construction

Live fiddler crabs (U. pugilator) were obtained from Gulf Specimen Marine Laboratory (Panacea, FL). The hepatopancreas was dissected, immediately frozen in liquid nitrogen, and stored at -80 °C. Total RNA was extracted from the frozen hepatopancreas using guanidine thiocyanate and partially purified by ultracentrifugation through a cesium trifluoroacetate gradient (15). Poly(A) RNA was isolated from total RNA by hybridization to biotinylated oligo(dT), which was recovered from solution using streptavidin-coated paramagnetic beads (Poly(A)Tract, Promega). All RNA was stored under ethanol at -80 °C.

A Lambda Zap II crab hepatopancreas cDNA library was constructed and amplified by Clontech Laboratories (Palo Alto, CA). The library contains 1.8 × 10^6 independent clones, with a cDNA insert size range of 1.0-5 kilobase pairs.

Isolation of the Crab Collagenase cDNA

The polymerase chain reaction (PCR) (^2) was used to amplify a fragment of the crab collagenase cDNA from the U. pugilator hepatopancreas library. Two degenerate PCR primers denoted FCN1 and FCC1 were synthesized based on the amino and carboxyl termini of the mature protease amino acid sequence (16) (FCN1, 5'-TGCTCTAGA-GTI-GA(A/G)-GCI-GTI-CCI-AA(T/C)-TCI-TGG-3'; FCC1, 5'-GATAAGCTTGA-TTA-IGG-IGT-IAT-ICC-IGT-(T/C)TG-IGT-(T/C)TG-IAT-CCA-3'). Inosine was used to reduce the degeneracy of the oligonucleotide pool by broadening the base pairing potential at these positions. A 5-µl aliquot of library stock containing 3.5 × 10^8 phage was subjected to PCR with the FCN1 and FCC1 oligonucleotides using standard conditions (17). The PCR consisted of five cycles of 1 min of annealing at 44 °C, 2 min of polymerization at 72 °C, and 1 min of denaturation at 95 °C, followed by 30 cycles with an elevated annealing temperature of 50 °C. The single-band PCR product was purified by agarose gel electrophoresis and Geneclean (Bio 101). The PCR product was sequenced by the dideoxy method, using Sequenase T7 DNA polymerase (U. S. Biochemical Corp.) and the FCN1 and FCC1 primers.

The library was plated with Escherichia coli strain XL1-Blue, adsorbed in duplicate to nitrocellulose filters, denatured, and fixed according to the manufacturers' standard instructions (Stratagene, Clontech). The probe 5'-CA-(G/A)AA-(G/A)TA-CAT-(G/A)TC-(G/A)TC-(G/A/T)AT-(G/A)AA-3' was a degenerate oligodeoxynucleotide based on the FIDDMYFC (residues 34-42) motif of the crab collagenase protein sequence (16). The 5' end of the degenerate probe was radiolabeled using T4 polynucleotide kinase and [gamma-32P]ATP and hybridized to the plaque lifts overnight at 42 °C as described (18). The filters were washed at 47 °C and autoradiographed (18). Excision and rescue of the Bluescript plasmid containing the cDNA insert was carried out according to the manufacturer's instructions (Stratagene). Both strands of the cDNA clones comprising the composite map were sequenced by the dideoxy method using Sequenase.
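The degeneracy of a mixed oligonucleotide is the product of the number of alternative bases at each mixed position. The following minimal sketch counts it for the screening probe and the two PCR primers as written above; the function name is illustrative, and inosine (I) is assumed to count as a single fixed base.

```python
import re
from math import prod


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide.

    Mixed positions are written as (A/G) or (G/A/T). Every other character,
    including inosine (I), is treated as one fixed base (an assumption).
    """
    mixed_sites = re.findall(r"\(([^)]*)\)", oligo)
    return prod(len(site.split("/")) for site in mixed_sites)


# Sequences copied from the text (5' to 3', restriction-site tails omitted).
PROBE = "CA-(G/A)AA-(G/A)TA-CAT-(G/A)TC-(G/A)TC-(G/A/T)AT-(G/A)AA"
FCN1 = "GTI-GA(A/G)-GCI-GTI-CCI-AA(T/C)-TCI-TGG"
FCC1 = "TTA-IGG-IGT-IAT-ICC-IGT-(T/C)TG-IGT-(T/C)TG-IAT-CCA"

for name, seq in (("probe", PROBE), ("FCN1", FCN1), ("FCC1", FCC1)):
    print(f"{name}: {degeneracy(seq)}-fold degenerate")
```

Five two-base positions and one three-base position give 2^5 × 3 = 96 for the probe, whereas the inosine substitutions leave each PCR primer only 4-fold degenerate.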

Subsequent screens of the library were carried out using homologous probes generated by [alpha-32P]dCTP PCR from the collagenase clone denoted FC1 (see below) (19). Either an EcoRI fragment containing the entire FC1 cDNA or a 200-base pair EcoRI-NheI fragment of the 5' end of the cDNA was used as template. Under the conditions of limiting dCTP and high template concentration, the reaction products resembled those of primer extension rather than fragment amplification. These homologous probes were hybridized overnight at 50 °C (18). The filters were then washed at 65 °C and autoradiographed as described (18).

Amino Acid Alignment and Secondary Structure Modeling of Crab Collagenase

The putative signal peptide of crab collagenase was determined by the hydrophobic nature of the amino acids (20). The amino acid sequences of crab procollagenase, shrimp chymotrypsinogen (EMBL accession no. X66415), rat anionic trypsinogen 2 (Protein Identification Resource (PIR) code, TRRT2; Protein Data Bank (PDB) code, 1BRA), bovine chymotrypsinogen A (PIR code, KYBOA; PDB code, 7GCH), and porcine proelastase 1 (PIR code, ELPG; PDB code, 3EST) were aligned using the PILEUP program of the GCG software package (Genetics Computer Group, Madison, Wisconsin) together with consensus structural constraints derived from alignment of proteases of known three-dimensional structure (21, 22).

Expression and Purification of the Recombinant Crab Procollagenase in Yeast

The zymogen form of crab collagenase (procollagenase) was cloned in frame with the alpha-factor leader of the PsT vector (5). PCR with Pfu DNA polymerase (Stratagene) was used to generate the necessary HindIII and SalI restriction endonuclease cleavage sites. This construct was named PsFC. The full expression vector was created by subcloning the PsFC SstI/SalI fragment containing the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter, alpha-factor leader, and procollagenase into the PyT 2 µm circle yeast/E. coli shuttle vector (5), yielding PyFC.

The PyFC construct was electroporated into the AB110 or DM101alpha strain of Saccharomyces cerevisiae, and transformants were selected by growth at 30 °C on SD (8% glucose) plates lacking either uracil or leucine (23). A small culture was grown in SD-Leu (8% glucose) for 36 h at 30 °C with gentle shaking. This culture was diluted 1:20 into YPD (2% glucose) and grown for 60-72 h at 30 °C with gentle shaking. The yeast cells were removed by centrifugation, and the supernatant was adjusted to pH 7.4 by addition of Tris base to a final concentration of 10 mM. DEAE chromatography was performed as described for the enzyme isolated from the crab hepatopancreas (14). Fractions were assayed for procollagenase either by Western blot analysis or by activation with trypsin. The activation assay contained 20 µl of sample, 5 µl of 1 µM TPCK-treated bovine trypsin (Sigma), and 200 µl of 400 µM Suc-AAP-Leu-pNA in 50 mM Tris, 100 mM NaCl, 20 mM CaCl(2), pH 8.0. The reaction course was monitored at 405 nm at room temperature using a UV(max) microtiter plate reader (Molecular Devices). The fractions containing procollagenase were pooled and adjusted to 50 mM Tris, 100 mM NaCl, 20 mM CaCl(2), pH 8.0. Addition of a 0.5% volume of TPCK-treated, agarose-immobilized bovine trypsin (Sigma) resulted in complete activation of the zymogen after 2 h of gentle shaking at room temperature, as monitored by increase in activity toward Suc-AAP-Leu-pNA. The activated collagenase was further purified by bovine pancreatic trypsin inhibitor affinity chromatography (14). An overall yield of 1 mg of recombinant collagenase/liter of yeast culture was achieved.
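For orientation, the final concentrations in the activation assay follow from simple dilution. The sketch below assumes that the volumes are additive and that the stated 1 µM and 400 µM values are stock concentrations, as the wording suggests; the helper name is illustrative.

```python
def final_concentration(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * stock_volume / total_volume


# Volumes in µl, taken from the activation assay described above.
sample_ul, trypsin_ul, substrate_ul = 20.0, 5.0, 200.0
total_ul = sample_ul + trypsin_ul + substrate_ul  # assumes additive volumes

trypsin_nM = final_concentration(1000.0, trypsin_ul, total_ul)    # 1 µM stock = 1000 nM
substrate_uM = final_concentration(400.0, substrate_ul, total_ul)  # 400 µM stock

print(f"total volume: {total_ul:.0f} µl")
print(f"trypsin: {trypsin_nM:.1f} nM, Suc-AAP-Leu-pNA: {substrate_uM:.1f} µM")
```

Under these assumptions each well holds 225 µl containing roughly 22 nM trypsin and 356 µM substrate.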

Kinetic Analysis of Recombinant Collagenase, Trypsin, Chymotrypsin, and Elastase

Collagenase was prepared from crab hepatopancreas as described (14). Recombinant rat trypsin was purified as described (24). Other reagents were purchased from the following sources: p-tosyl-L-lysine chloromethyl ketone-treated bovine chymotrypsin (Sigma), porcine elastase (Calbiochem), bovine calf skin collagen (U. S. Biochemical Corp.), Suc-AAP-Abu-pNA (Bachem, Torrance, CA), and Z-GPR-Sbzl (Enzyme Systems Products). All other substrates were from Bachem Bioscience. All enzyme active site titrations, substrate calibrations, kinetic assays, and collagen digestions were carried out as described (14, 25). Briefly, pNA kinetic assays were monitored at 410 nm (ε = 8,480 M^-1 cm^-1) in 50 mM Tris, 100 mM NaCl, 20 mM CaCl(2), pH 8.0, at 25 °C. A total of 1-4% N,N-dimethylformamide or 2% Me(2)SO was present in the final reaction buffer. Benzylthioester kinetic assays were monitored at 324 nm (ε = 19,800 M^-1 cm^-1) in the above buffer at 25 °C with the inclusion of 250 µM dithiodipyridine (Chemical Dynamics) and 2% N,N-dimethylformamide. 7-Amino-4-methylcoumarin spectrofluorimetric assays were monitored at an excitation wavelength of 380 nm and an emission wavelength of 460 nm, under conditions identical to those for pNA. Assays were done in duplicate for five substrate concentrations, except for Suc-AAP-Asp-pNA, for which k(cat)/K(m) was determined using three substrate concentrations in duplicate. The steady state kinetic parameters were determined by non-linear regression fit to the Michaelis-Menten equation. Standard deviation in k(cat)/K(m) was generally less than 10%, though individual rate and binding constants varied to a greater extent. In particular, error for elastase was 15% in k(cat) versus Suc-AAP-Val-pNA and 25% in K(m) versus Suc-AAP-Ile-pNA. Kinetic parameters were plotted versus P1 residue volume (26) and the hydrophobicity constant, π (27).
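The analysis described above (conversion of absorbance slopes to rates, then a non-linear least-squares fit to v = V(max)[S]/(K(m) + [S])) can be sketched in a few lines. This is an illustration only, not the software used in this study: the fitting routine (a grid search on K(m) refined by golden-section search, with V(max) solved linearly at each step), the function names, the 1-cm path length, and the synthetic data are all assumptions.

```python
import math


def rate_from_absorbance(dA_per_min, epsilon=8480.0, path_cm=1.0):
    """Convert an absorbance slope (A/min) to µM/min via the Beer-Lambert law.

    epsilon is in M^-1 cm^-1 (8,480 for pNA at 410 nm); path_cm is assumed.
    """
    return dA_per_min / (epsilon * path_cm) * 1e6


def _profile(km, s, v):
    """Best-fit Vmax and residual sum of squares for a fixed Km."""
    f = [si / (km + si) for si in s]
    vmax = sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)
    sse = sum((vi - vmax * fi) ** 2 for vi, fi in zip(v, f))
    return vmax, sse


def fit_michaelis_menten(s, v, grid=200, iters=80):
    """Least-squares fit of v = Vmax*S/(Km + S); returns (Vmax, Km)."""
    lo, hi = math.log(min(s) / 100.0), math.log(max(s) * 100.0)
    step = (hi - lo) / (grid - 1)
    logs = [lo + i * step for i in range(grid)]
    # Coarse search on a logarithmic Km grid.
    best = min(range(grid), key=lambda i: _profile(math.exp(logs[i]), s, v)[1])
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    # Golden-section refinement between the neighbours of the best grid point.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if _profile(math.exp(c), s, v)[1] < _profile(math.exp(d), s, v)[1]:
            b = d
        else:
            a = c
    km = math.exp((a + b) / 2.0)
    return _profile(km, s, v)[0], km


# Synthetic, noise-free example (NOT data from this study): Vmax = 12, Km = 80 µM.
substrate_uM = [25.0, 50.0, 100.0, 200.0, 400.0]
rates = [12.0 * x / (80.0 + x) for x in substrate_uM]
vmax_fit, km_fit = fit_michaelis_menten(substrate_uM, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f} µM")
```

Dividing the fitted V(max) by the titrated active-site concentration gives k(cat), from which k(cat)/K(m) follows.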


RESULTS

Detection and Isolation of Crab Collagenase Clones from the Hepatopancreas cDNA Library

Crab collagenase clones were detected in the cDNA library by two methods utilizing degenerate oligonucleotides based on the amino acid sequence of the protease (16). In the first method, a set of oligonucleotides, FCN1 and FCC1, complementary to the amino and carboxyl termini of mature collagenase were used in the polymerase chain reaction to amplify a DNA fragment from the cDNA library. A single, intense band of approximately the size of the mature protease (670 base pairs) was produced. (^3) Direct sequencing of the PCR DNA yielded sequence around His, Gly, and Phe (chymotrypsinogen numbering) of the collagenase. The cDNA library was also screened with a degenerate oligonucleotide complementary to the FIDDMYFC sequence of the collagenase (residues 34-42). This sequence was chosen for three reasons: 1) minimal sequence identity to other serine proteases, 2) proximity to the 5' end of the gene permitting isolation of more full-length clones from the oligo(dT)-primed cDNA library, and 3) low amino acid coding degeneracy (96-fold degenerate). A total of 40,000 plaques were screened, yielding 10 primary, 7 secondary, and 3 tertiary isolates. The most complete clone, denoted FC1, contains a 15-amino acid signal sequence, a 29-amino acid zymogen peptide, and the entire 226-amino acid mature form of the collagenase, as well as 143 bases of 5'- and 153 bases of 3'-untranslated sequence (see Fig. 1 and below). The likely start codon of clone FC1 is a non-optimal AGG (Arg), rather than the expected ATG (Met) (28). Further screening of the library was indicated, as no ATG start codon could be located in any reading frame near the expected start site. Screening of an additional 30,000 plaques with PCR fragments generated from the FC1 template yielded 15 primary, 9 secondary, and 6 tertiary isolates. Two clones, FC2 and FC3, yielded necessary sequence data. Clone FC2 provided the requisite ATG start codon, though uncharacterized recombination events rendered the 5'-untranslated region and the 3' third of the cDNA unusable. Clone FC3 encoded the complete collagenase zymogen minus the signal sequence and 5'-untranslated region, while the 3'-untranslated region extends into the poly(A) tail. The cDNA presented in Fig. 1 is a composite of FC1, the ATG start of FC2, and the poly(A) tail of FC3. The coding sequences of all clones were identical.


Figure 1: Composite U. pugilator serine collagenase 1 cDNA. Nucleotides 1-144 and 146-1042 are of clone FC1, 144-146 are of clone FC2, and 1043-1109 are of clone FC3. The 1.1-kilobase pair cDNA is underwritten by the open reading frame corresponding to the putative coding sequence. The predicted zymogen peptide begins at nucleotide 189 (Ser) and the mature collagenase begins at nucleotide 276 (Ile), as indicated in bold.



Sequence Analysis of Recombinant Collagenase

The published amino acid sequence (16) contained six changes relative to the sequence predicted from the cDNA. These changes appear to reflect errors in the original amino acid sequence determination, rather than amino acid variation due to the cloning of an isozyme of crab collagenase. (^4) The discrepancies and the possible causes are: I106V, carryover of Val; S110V, weak detection of Ser; S164N/N165S, acid-induced N→O acyl shift, weak detection of Ser and Asn; N192D and N202D, acid-induced deamidation (chymotrypsinogen numbering, where the first letter denotes the amino acid predicted from the cDNA sequence and the second letter denotes the amino acid from the original sequence determination). One of the errors in the protein sequence, N192D, maps to the rim of the S1 site, and must be considered regarding the possible effect of the negative charge on substrate recognition. The other errors appear to map to the surface of the enzyme and are most likely functionally inconsequential.

The amino acid sequence of mature crab collagenase is homologous to the mammalian serine proteases trypsin, chymotrypsin, and elastase (35% identity) and to shrimp chymotrypsin (75% identity), another serine collagenase (Fig. 2) (16, 29). Virtually all major structural features of a chymotrypsin-like serine protease are found in crab collagenase. Three disulfide bonds (residues 42:58, 168:182, and 191:220) are conserved. Conservation of the double beta barrel core is strict, and the surface loops are similar in size to those of the vertebrate paradigms. Some are of unique sequence and may play a role in determining the broad substrate specificity of crab collagenase. An unusual crab collagenase active site geometry of Gly and Asp, as compared to Asp and Gly in trypsin, is maintained in the cDNA (16).


Figure 2: Amino acid sequence alignment of crab collagenase (FC), shrimp chymotrypsinogen (SK), rat anionic trypsinogen 2 (TN), bovine chymotrypsinogen A (CT), and porcine elastase 1 (EL). CT#, chymotrypsinogen numbering; SS, secondary structure. b, beta sheet; a, alpha helix; t, turn (21, 22). Catalytic residues and cysteines are highlighted in bold.



Comparison of the zymogen peptides of these enzymes serves to further delineate the group, as they are of variable length and share little identity (Fig. 2). Crab collagenase and shrimp chymotrypsin possess zymogen peptides that are 2-3 times longer than those of the vertebrate proteases. The purpose of these large activation domains is unclear, as they are not required for heterologous expression of vertebrate proteases such as trypsin (30). The activation site of procollagenase, VKSSR-IVGG, is more similar to those of chymotrypsinogen, SGLSR-IVVG, and proelastase, ETNAR-VVGG, which are activated by trypsin, than that of trypsinogen, DDDDK-IVGG, which is activated by enterokinase (31). Crab collagenase may self-activate, or another trypsin-like protease in the crab hepatopancreas may perform this function (32). The primary sequence alignment suggests that crab collagenase and shrimp chymotrypsin are members of a novel serine protease subfamily.

Expression and Purification of Crab Collagenase in S. cerevisiae

Crab procollagenase was cloned into the PyT S. cerevisiae expression vector (5) as a fusion with the alpha-factor signal sequence under the transcriptional control of the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter and alcohol dehydrogenase terminator, yielding the PyFC construct. Yeast containing PyFC secrete a 30-kDa protein into the medium, which cross-reacts with anti-crab collagenase antibodies on Western blots. (^3)

The recombinant procollagenase is purified from the yeast medium in much the same manner as the native collagenase from crab hepatopancreas (14). DEAE chromatography, trypsin activation, and subsequent bovine pancreatic trypsin inhibitor affinity chromatography are used to purify the recombinant enzyme to homogeneity. The mature recombinant collagenase is identical in size to that isolated from the hepatopancreas (Fig. 3a).


Figure 3: Comparison of recombinant and hepatopancreas crab collagenase. Panel a, molecular weight determination. Lane M, molecular weight markers; lane H, 10 µg of hepatopancreas collagenase; lane R, 10 µg of recombinant collagenase. Panel b, collagen cleavage assays. Reactions included bovine skin collagen in 50 mM Tris, 300 mM NaCl, 20 mM CaCl(2), pH 8.0 at 25 °C, enzyme added in a 1:24 weight ratio as indicated. Lane C, no enzyme; lanes 30', 60', and 120', hepatopancreas or recombinant collagenase incubated for the indicated time; lane 120'+I, recombinant collagenase + 1 mM 4-(2-aminoethyl)benzenesulfonyl fluoride, incubated for 120'; lane E, 5 µg of recombinant collagenase alone; lane M, molecular weight markers.



Activity of Recombinant Collagenase Versus Type I Collagen

The collagenolytic activity of the recombinant collagenase was compared directly to that of the enzyme isolated from the crab hepatopancreas (Fig. 3b). The specificity and rate of collagen cleavage are similar. The signature 3/4- and 1/4-length fragments are identical in morphology, including the 1/4-length triplet. Furthermore, the collagenolytic activity of the recombinant enzyme is completely inhibited by the serine protease inhibitor 4-(2-aminoethyl)benzenesulfonyl fluoride, as previously demonstrated for the hepatopancreas collagenase (14).

Activity of Recombinant Collagenase Versus Peptidyl pNA Substrates

The Michaelis constants of the recombinant collagenase were determined for a matched set of 15 Suc-AAP-Xaa-pNA substrates, varying only in the P1 residue (Table 1). The relative balance of specificities (k(cat)/K(m)) of the recombinant enzyme is similar to that reported previously for the hepatopancreas enzyme versus the Arg, Lys, Gln, Leu, and Phe substrates, within an error of 15-30% (14). The remaining 10 substrates (Ala, Abu, Nva, Val, Nle, Ile, Met, Orn, Asp, and Glu) were selected to more fully map the specificity of crab collagenase for hydrophobic, basic, and acidic residues. The substrate preference of the collagenase is quite broad. The most striking aspect of the specificity of the enzyme concerns the amino acid residues it rejects (Fig. 4). beta-Branched and acidic side chains are extremely poor substrates. Although the apparent binding constants (K(m)) for Val and Ile are similar to those of the other hydrophobic substrates, k(cat) is as much as 10^3-fold lower. Acidic residues are generally poor substrates. There is no correlation in K(m) for the various substrates (r = 0.57; see Equation 3 in Table 2), suggesting that there are several modes of ground state binding. This implies the existence of several distinct S1 sites or a single flexible site (14). A correlation (r = 0.76; Equation 1 in Table 2) for log k(cat) versus P1 residue volume (Å^3) is observed, irrespective of hydrophobicity (26). The correlation is improved and slope essentially unchanged (r = 0.95; Equation 2 in Table 2) if only the hydrophobic residues Ala, Abu, Nva, Nle, Leu, Met, and Phe are included. A weaker correlation of log(k(cat)/K(m)) versus residue volume (r = 0.89; Equation 4 in Table 2) for this hydrophobic subset is found (Fig. 5). These results suggest that the transition state may be stabilized in part by hydrophobic interactions. It is unclear how the enzyme binds the neutral hydrophilic and basic residues so as to minimize the effects of charge or polarity in the transition state. Bias or insensitivity in the data set may also affect the interpretation of the correlations.
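Correlations of this kind are ordinary least-squares lines of a logarithmic kinetic parameter against P1 residue volume, reported with the Pearson coefficient r. A minimal sketch of that calculation follows; the function name is illustrative and the numbers are placeholders chosen only to exercise the code, not values from Table 1.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = m*x + b; returns (m, b, Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Placeholder inputs (NOT measured data): side-chain volumes in Å^3 and
# corresponding log k(cat) values.
volumes = [70.0, 90.0, 110.0, 130.0, 150.0]
log_kcat = [0.1, 0.5, 0.7, 1.2, 1.4]

slope, intercept, r = linear_fit(volumes, log_kcat)
print(f"slope = {slope:.4f} per Å^3, intercept = {intercept:.2f}, r = {r:.2f}")
```

Restricting the input lists to a subset of residues, as done for the hydrophobic series, changes r without necessarily changing the slope.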




Figure 4: Substrate specificity of crab collagenase versus Suc-AAP-Xaa-pNA. Data are from Table 1. Black bars, k(cat) (/min); striped bars, K(m) (µM); gray bars, k(cat)/K(m) (/min/µM).






Figure 5: Quantitative structure-activity relationships of serine protease substrate specificity. Log(k(cat)/K(m)) of collagenase (FC, square), chymotrypsin (CT, circle), and elastase (EL, diamond) for the Suc-AAP-Xaa-pNA series, where Xaa = Ala (A), Abu (O), Val (V), Nva (U), Nle (J), Ile (I), Leu (L), Met (M), and Phe (F), are plotted versus P1 residue volume. Data are from Table 1, omitting Ile and Val for collagenase and chymotrypsin, and Nva and Leu for elastase. Correlations are from Table 2 (collagenase: y = 0.016(Å^3) - 2.2, r = 0.89; chymotrypsin: y = 0.038(Å^3) - 6.2, r = 0.99; elastase: y = -0.039(Å^3) + 5.1, r = 0.95).



Correlations of Serine Protease Specificity

The steady state kinetic parameters of chymotrypsin and elastase versus the Suc-AAP-Xaa-pNA substrate set were determined under conditions identical to those for crab collagenase (Table 1). This was necessary in order to accurately compare the activities of these different enzymes. Strong positive (chymotrypsin) and negative (elastase) correlations were found for log k(cat) or log(k(cat)/K(m)) versus P1 residue volume (r ≥ 0.95; Equations 6, 8, 10, and 12 in Table 2; Fig. 5). Val and Ile were omitted for chymotrypsin, while Nva and Leu were deleted for elastase, as these points deviated significantly from the rest of the data (see "Discussion"). A tight negative correlation of K(m) versus volume was found for chymotrypsin (r = 0.95; Equation 7 in Table 2), while a much weaker positive correlation was seen for elastase (r = 0.68; Equation 11 in Table 2). The sensitivities of chymotrypsin and elastase log(k(cat)/K(m)) to residue volume are identical and twice that of collagenase (Fig. 5). Chymotrypsin log(k(cat)/K(m)) also correlated with π, the log of the octanol:water partition coefficient of the residue minus the log of the coefficient for Gly (27) (m = 2.0, r = 0.98; Equation 9 in Table 2). This result with tetrapeptide amides is consistent with the correlation of log(k(2)/K(S)) for single-residue esters with π, where a slope of 2.2 was found (33). Collagenase log(k(cat)/K(m)) is less sensitive to π (m = 0.80, r = 0.89; Equation 5 in Table 2), while elastase log(k(cat)/K(m)) correlated well, with a slope equal and opposite that for chymotrypsin (m = -2.0, r = 0.94; Equation 13 in Table 2).
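Because the correlations are logarithmic, a slope m translates into a 10^(m·ΔV)-fold change in k(cat)/K(m) for a change ΔV in side-chain volume. The sketch below applies this to the slopes quoted in the Fig. 5 legend; the 25 Å^3 increment is an arbitrary illustrative choice, and the names are hypothetical.

```python
# Slopes of log(k(cat)/K(m)) versus P1 residue volume (per Å^3), Fig. 5 legend.
FIG5_SLOPES = {
    "collagenase": 0.016,
    "chymotrypsin": 0.038,
    "elastase": -0.039,
}


def fold_change(slope: float, delta_volume: float) -> float:
    """Fold change in k(cat)/K(m) predicted for a volume change (Å^3)."""
    return 10.0 ** (slope * delta_volume)


DELTA_V = 25.0  # Å^3, arbitrary increment chosen for illustration

for enzyme, slope in FIG5_SLOPES.items():
    print(f"{enzyme}: {fold_change(slope, DELTA_V):.2f}-fold per {DELTA_V:.0f} Å^3")

slope_ratio = FIG5_SLOPES["collagenase"] / FIG5_SLOPES["chymotrypsin"]
print(f"collagenase/chymotrypsin slope ratio: {slope_ratio:.2f}")
```

The slope ratio of about 0.4 is what is summarized above as collagenase being roughly half as sensitive to volume as chymotrypsin.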

Contribution of the P1 Residue to Catalytic Efficiency

The relative contribution of the P1 residue to the cleavage of peptidyl substrates was estimated by comparing the catalytic efficiencies of collagenase, trypsin, chymotrypsin, and elastase versus single-residue and tetrapeptide P1-Arg, Phe, or Ala substrates (Fig. 6). While the k(cat)/K(m) values of all enzymes for the peptidyl substrates are similar, within 2-20-fold, there is a 10- to 10^4-fold difference in k(cat)/K(m) for the single-residue substrates. Trypsin derives the highest k(cat)/K(m) from its single-residue Arg substrate, manifesting a 100-fold differential as compared to the peptidyl Arg cognate. Chymotrypsin shows a 10,000-fold differential in efficiency for single-residue Phe versus peptidyl Phe substrates, while elastase k(cat)/K(m) versus single-residue Ala is 100,000-fold less than that for peptidyl Ala. Interestingly, collagenase demonstrates identical 100,000-fold differences in k(cat)/K(m) for both single-residue Arg and Phe substrates, 10-1,000-fold greater than chymotrypsin or trypsin and similar to elastase. Collagenase and elastase show the most dependence on the P2-P4 residues for catalytic efficiency, with the low activity on single-residue substrates being a consequence of small P1 residue size or non-optimal P1 residue binding.
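These differentials can be expressed as apparent differences in transition-state stabilization through the standard relation ΔΔG = RT ln[(k(cat)/K(m))(peptide) / (k(cat)/K(m))(single residue)]. The sketch below applies it at 25 °C to the fold differences quoted above; treating the ratios this way is a conventional approximation added here for illustration, not an analysis reported in this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def ddg_kcal(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy equivalent (kcal/mol) of a fold difference in k(cat)/K(m)."""
    return R_KCAL * temp_k * math.log(fold)


# Fold differences (tetrapeptide versus single-residue substrate) from the text.
differentials = {
    "trypsin, Arg": 1e2,
    "chymotrypsin, Phe": 1e4,
    "elastase, Ala": 1e5,
    "collagenase, Arg or Phe": 1e5,
}

for label, fold in differentials.items():
    print(f"{label}: {fold:.0e}-fold ~ {ddg_kcal(fold):.1f} kcal/mol")
```

On this scale, each factor of 10 in k(cat)/K(m) corresponds to about 1.4 kcal/mol at 25 °C.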


Figure 6: k(cat)/K(m) of single-residue and tetrapeptide substrates. Tetrapeptide data (Suc-AAP-Xaa-pNA) are from Table 1, except for trypsin, which are from Ref. 14. Single-residue substrates are Ac-Arg-pNA, Suc-Phe-pNA, and Ac-Ala-pNA. Enzymes are grouped according to P1 residue. Conditions were 50 mM Tris, 100 mM NaCl, 20 mM CaCl(2), pH 8.0 at 25 °C, as described under "Experimental Procedures." Gray bars, single residue; striped bars, tetrapeptide.



Structurally, the degree of P2-P4 binding correlates with the length of the residue 215-220 domain (Fig. 2). This loop forms the lip of the binding pocket and forms a beta sheet with the P2-Pn substrate residues (3). Elastase and collagenase have the longest loops, while chymotrypsin and trypsin are 1 and 2 residues shorter, respectively.

Acylation Is Rate-limiting for Crab Collagenase, Versus Deacylation for Trypsin and Chymotrypsin

The relationship between broad specificity and catalysis was further investigated by determining the steady-state Michaelis constants for collagenase, trypsin, and chymotrypsin versus two series (P1-Arg or Phe) of peptidyl amides and esters, varying only in leaving group (Table 3). The highly specific enzymes trypsin and chymotrypsin maintain high levels of k(cat) independent of either the activated amide 7-amino-4-methylcoumarin and pNA or the benzylthioester leaving groups. Either deacylation (^5) or product dissociation is rate-limiting for these enzymes (34). In contrast, collagenase reacts with both sets of substrates and shows an increase of up to 1,000-fold in k(cat) as the leaving group is changed from 7-amino-4-methylcoumarin to the more labile pNA and Sbzl moieties. Acylation is therefore the likely rate-limiting step for collagenase-catalyzed cleavage of both the P1-Arg and P1-Phe peptidyl amide substrates (34).
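The reasoning rests on the textbook acyl-enzyme scheme, in which k(cat) = k(2)k(3)/(k(2) + k(3)), with k(2) the acylation rate constant (leaving group-dependent) and k(3) the deacylation rate constant (leaving group-independent). The sketch below uses hypothetical rate constants, not values from Table 3, to show why k(cat) is insensitive to the leaving group when deacylation is limiting and tracks it when acylation is limiting.

```python
def kcat(k2: float, k3: float) -> float:
    """Turnover number for the acyl-enzyme mechanism: k2*k3/(k2 + k3)."""
    return k2 * k3 / (k2 + k3)


# Hypothetical acylation rate constants (s^-1) for three leaving groups of
# increasing lability; illustrative numbers only.
leaving_groups = ("poor", "intermediate", "good")

cases = {
    # Deacylation-limited: k2 >> k3, so kcat stays close to k3.
    "deacylation-limited (k3 = 50)": (50.0, (500.0, 5000.0, 50000.0)),
    # Acylation-limited: k2 << k3, so kcat stays close to k2.
    "acylation-limited (k3 = 50000)": (50000.0, (0.05, 5.0, 50.0)),
}

for label, (k3, k2_values) in cases.items():
    values = ", ".join(
        f"{name}: {kcat(k2, k3):.3g}" for name, k2 in zip(leaving_groups, k2_values)
    )
    print(f"{label} -> kcat by leaving group: {values}")
```

In the first case a 100-fold change in k(2) alters k(cat) by about 10%, whereas in the second case k(cat) rises essentially in proportion to k(2), the pattern described above for collagenase.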




DISCUSSION

The cloning and expression of the crab serine collagenase 1 has resolved several issues regarding the molecular biology and enzymology of this unusual enzyme. 1) The sequence was verified, and minor errors were corrected. 2) Heterologous expression verified that collagenolytic activity was intrinsic to this serine protease and provided a source of reagent quantities of the enzyme. Serine proteases, along with the matrix metalloproteases, can now be considered true collagenases. The unique nature of the collagenase active site justifies its classification as a major new branch of the chymotrypsin family of serine proteases.

Crab Collagenase and Shrimp Chymotrypsin: Implications for Collagen Recognition and Cleavage

High levels of identity between the pre-pro forms of crab collagenase and shrimp chymotrypsin, another serine collagenase (29), suggest that a region responsible for collagen recognition and cleavage may include the S4-S'2 substrate binding sites of the enzyme. Most of these sites are conserved between the crab collagenase and shrimp chymotrypsin, including the acidic residues thought to be important in the recognition of Arg in the P'1 position by the crab enzyme (14). This suggests that the two enzymes bind collagen by a similar mechanism. A notable structural dissimilarity between the two enzymes occurs in the primary substrate binding (S1) site. A major determinant of the trypsin-like (Arg, Lys) P1 specificity of the crab collagenase is likely to be Asp (13, 14, 16). Shrimp chymotrypsin lacks an Asp at this position, possessing an Ala instead. Several other conservative substitutions at positions 189, 217a, and 218 may further perturb the P4-P1 specificity of the shrimp enzyme. This suggests that shrimp chymotrypsin may cleave collagen at a subset of the sites (Gln and Leu, but not Arg) recognized by crab collagenase (14).

The Active Site of Collagenase Is Less Hydrophobic than That of Chymotrypsin and Larger than That of Elastase

Extensive quantitative analysis of serine protease specificity has provided the foundation for general theories concerning the interaction of enzymes and substrates (see Refs. 27 and 35 for early reviews). However, much of the groundbreaking work regarding the specificity of the S1 site was carried out utilizing single-residue esters (27). As these compounds bear little structural or chemical resemblance to the presumed physiological peptide substrates, one might question their use in examining biological function. Partial data sets for chymotrypsin and elastase versus the peptidyl amides Suc-AAP-Xaa-pNA demonstrated the utility of this substrate series in mapping specificity (36, 37, 38). Our results agreed well with those reported previously for single-residue esters (33) and confirmed the assumption that, at least for hydrophobic P1 substrates, S1 site specificity is largely independent of the nature of the scissile bond, as well as NH(2)-terminal groups (27). This allowed the accurate comparative analysis of the recombinant crab collagenase.

Correlations of P1 residue volume and log k(cat) or log(k(cat)/K(m)) were found for the serine protease paradigms chymotrypsin and elastase. Although these enzymes are commonly considered to be specific for aromatic or small hydrophobic residues, respectively, these specificities represent only the upper range of linear continuums that span more than 4 orders of magnitude in k(cat)/K(m). The sensitivities of chymotrypsin and elastase to P1 side chain volume, as reflected in the slopes of the correlations, are equal and opposite. This is also the case for the hydrophobicity constant π, a measure of the free energy of transfer of an amino acid side chain from octanol to water (^6). The slope of +2.0 found for chymotrypsin log(k(cat)/K(m)) versus π suggests that the free energy of transfer of a hydrophobic amino acid side chain from the active site of chymotrypsin to water is double the free energy of transfer from octanol to water (-40 to -50 cal/Å^2/mol versus -20 to -25 cal/Å^2/mol, where Å^2 refers to the solvent-accessible surface area of the side chain) (39, 40, 41, 42). This behavior is attributed to the favorable desolvation of both free enzyme and free substrate in forming the hydrophobic enzyme-substrate complex, equivalent to two transfers from water to octanol (42). Full desolvation of the complex occurs when the hydrophobic surfaces of enzyme and substrate are complementary. The relative slopes of the π and P1 residue volume correlations are identical, suggesting that the interactions observed are either purely hydrophobic or that steric and hydrophobic effects contribute equally in this system. The inverse correlation of elastase log(k(cat)/K(m)) with π may represent increasing solvation of the complex as larger substrates are bound to the enzyme, but is likely to also include unfavorable steric effects.
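
The factor of 2 in the desolvation argument can be made explicit with a short worked relation. This is a sketch under two assumptions that the text does not spell out: that k(cat)/K(m) is related to an activation free energy through transition state theory, and that the hydrophobicity constant (written π here) is the Hansch substituent constant, π(X) = log P(X) - log P(H), where P is the octanol/water partition coefficient. The symbols a (fitted slope) and c (intercept) are introduced here only for illustration.

```latex
\begin{align*}
\log\!\left(k_{\mathrm{cat}}/K_m\right) &= a\,\pi + c \\
\Delta\Delta G^{\ddagger}_{X}
  &= -2.303\,RT\left[\log\!\left(k_{\mathrm{cat}}/K_m\right)_{X}
     - \log\!\left(k_{\mathrm{cat}}/K_m\right)_{H}\right]
   = -2.303\,RT\,a\,\pi_{X} \\
\Delta\Delta G^{\mathrm{water}\to\mathrm{octanol}}_{X} &= -2.303\,RT\,\pi_{X} \\
\Delta\Delta G^{\ddagger}_{X} &= a\,\Delta\Delta G^{\mathrm{water}\to\mathrm{octanol}}_{X}
\end{align*}
```

With a = +2.0, each increment in side chain hydrophobicity lowers the activation free energy by twice the corresponding water-to-octanol transfer free energy, which is the "two transfers" picture described above. A smaller slope would correspond, in this simple model, to less complete desolvation.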

Collagenase log(k(cat)/K(m)) is half as sensitive to P1 residue volume and π as chymotrypsin and elastase, which possess strongly hydrophobic S1 sites. According to the desolvation model, the collagenase S1 site is less hydrophobic than those of the other two enzymes. The positive slope of the correlation also suggests an active site that is larger than that of elastase. The collagenase S1 site increasingly, but never completely, desolvates larger substrates. The S1 site may also be partially exposed to bulk solvent. Hydrophilic residues, such as Asp, involved in binding Arg, Lys, Orn, and Gln substrates, likely compromise the hydrophobicity of the region.
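
A comparison of sensitivities of this kind reduces to comparing the slopes of two linear fits. The sketch below shows one way to do that with an ordinary least-squares fit in plain Python. All numbers in it are hypothetical placeholders chosen so that one series has twice the slope of the other; they are not data from this study, and the function and variable names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL values for illustration only (not measurements from this work).
hydrophobicity = [0.0, 0.5, 1.0, 1.5, 2.0]        # stand-in for the P1 descriptor
log_eff_steep = [1.0, 2.0, 3.0, 4.0, 5.0]         # strongly hydrophobic S1 site
log_eff_shallow = [2.0, 2.5, 3.0, 3.5, 4.0]       # less hydrophobic S1 site

steep_slope, _ = fit_line(hydrophobicity, log_eff_steep)
shallow_slope, _ = fit_line(hydrophobicity, log_eff_shallow)

# Ratio of slopes = relative sensitivity of the two enzymes to the descriptor.
sensitivity_ratio = shallow_slope / steep_slope
print(f"steep {steep_slope:.2f}, shallow {shallow_slope:.2f}, "
      f"ratio {sensitivity_ratio:.2f}")
```

The same function can be applied with residue volume in place of the hydrophobicity descriptor; residues that deviate consistently from the fitted line would be candidates for the outlier analysis described in the next paragraph.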

Several amino acid residues were consistent outliers in the correlations. The beta-branched amino acids Val and Ile are unexpectedly poor substrates for chymotrypsin and collagenase, indicating a constriction in the S1 sites of these enzymes around the beta carbon. In contrast, Nva and Leu (and, to some extent, Abu) are exceptionally good substrates for elastase, suggesting that they may bind productively in a hydrophobic region not accessible to other residues. A detailed analysis must await three-dimensional structural verification.

Ground-state Substrate Binding Does Not Correlate with Transition State Catalysis

Although the serine protease kinetic mechanism (^5) (34, 43) describes the formation of a ground-state Michaelis complex (K(S)) prior to several steps of transition state catalysis (rate-determining step ≈ k(cat)), the tightness of the complex may not in itself predict the rate of catalysis. Collagenase illustrates the generality of this hypothesis, given its broad specificity for basic, neutral hydrophilic, and hydrophobic residues. The value of k(cat) correlates well with P1 residue volume, irrespective of chemical nature, suggesting size is a component of transition state stabilization. In contrast, there is no correlation of K(m) with residue volume or k(cat) (assuming that acylation is rate-limiting for most substrates, K(m) ≈ K(S)). Similar k(cat) values are achieved for Gln, Arg, and Phe with K(m) values ranging 100-fold. This indicates that ground-state binding is independent of transition state catalysis. Elastase and chymotrypsin also show better correlations in k(cat) than K(m) with P1 residue volume, again suggesting that these enzymes are designed for transition state catalysis rather than ground-state binding. Site-directed mutagenesis studies of trypsin further support the hypothesis that ground-state binding does not correlate with transition state catalysis (44).

The Coupling of Primary and Subsite Binding in Serine Protease Catalysis

One striking observation of this study is the similar rate of catalysis and level of catalytic efficiency for all enzymes versus their preferred tetrapeptide substrates, despite the large differences in enzyme and substrate structure. Trypsin, chymotrypsin, elastase, and collagenase cleave their preferred tetrapeptide substrates with k(cat) values within 2-fold of one another. This suggests that all serine proteases of the chymotrypsin family reach a common maximal level of transition state stabilization in the limit of full subsite-induced activation, given the shared chemical mechanism and the similar nature of their physiological oligopeptide substrates. A key component of high level catalysis is the coupling of the S1 and S2-S4 ... Sn sites (5, 45). The structural basis of this productive substrate recognition is different for each enzyme, and is a major contributor to substrate discrimination (5, 46). This is illustrated by the 35,000-fold variation in k(cat)/K(m) for single-residue substrates versus the 20-fold variation for the cognate tetrapeptides. Clearly, there are several different compensatory mechanisms of substrate binding for the chymotrypsin class of serine proteases. The degree of productive P2-P4 binding correlates inversely with the selectivity of the S1 site or the size of the preferred P1 residue. Collagenase, possessing the P1 specificities of both chymotrypsin and trypsin, relies to a greater extent, up to 1,000-fold in k(cat)/K(m), on the S2-S4 sites than the more specific enzymes. Collagenase P1-Phe and P1-Arg k(cat)/K(m) are equally sensitive to peptide binding, suggesting that nondiscriminating P2-P4 interactions are a critical component of its broad specificity.
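
The fold differences quoted above can be put on a free energy scale with the standard relation ΔΔG = RT ln(ratio). The sketch below does the conversion in Python; the temperature of 298.15 K is an assumption made for illustration (the assay temperature is not restated here), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold, temp_k=298.15):
    """Free energy difference (kcal/mol) equivalent to a fold change
    in the specificity constant; temp_k = 298.15 K is an assumed default."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)


# Fold differences taken from the paragraph above.
for fold in (20, 1_000, 35_000):
    print(f"{fold:>6}-fold -> {fold_to_ddg(fold):.1f} kcal/mol")
```

On this scale, and under the stated temperature assumption, the 35,000-fold spread among single-residue substrates corresponds to roughly 6 kcal/mol, the 20-fold spread among tetrapeptides to a little under 2 kcal/mol, and a 1,000-fold contribution from the S2-S4 sites to about 4 kcal/mol.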

Mechanistic Consequences of Broad Specificity

The optimization of enzyme specificity can also be assessed mechanistically. The serine proteases hydrolyze substrates by two chemical steps after the formation of the Michaelis complex (^5) (34, 43). The carbonyl carbon of the amide or ester substrate is attacked (k(2)) by Ser, forming the acyl enzyme and free amine or alcohol. This covalent intermediate is deacylated (k(3)) by water, generating the carboxylic acid product and free enzyme. Acylation is generally rate-limiting for amides, and deacylation is rate-limiting for esters, in part due to the higher pK(a) of the leaving group amine versus the alcohol (43). Although this is almost invariably true for single-residue substrates, deacylation can be rate-limiting for longer peptidyl amide substrates containing more potential binding energy (5). Trypsin and chymotrypsin are highly efficient, specific proteases, and deacylation (or product dissociation preceding or following deacylation) is likely rate-limiting for their preferred peptide substrates (5, 6, 25, 47). In contrast, acylation remains the rate-limiting step for collagenase versus peptide substrates, apparently as a consequence of its much broader activity. The fact that acylation is rate-limiting for collagenase is advantageous for future work, especially in the area of protein engineering. A key issue in mutagenesis studies is the shift in rate-limiting step of variants relative to the wild-type enzyme. For example, variant trypsins are often severely deficient in catalysis (48, 49, 50). Acylation rather than deacylation is then rate-limiting (5, 24, 47). This in turn alters mechanistic definitions of k(cat) and K(m) (^5), preventing accurate structure/function correlations. Corrective measures include the use of single-residue substrates, the estimation of mechanistic constants from steady-state parameters, or ultimately, presteady-state kinetics (5, 6, 24).
These results are specific to the substrates examined here and should not be extrapolated, as other mechanistic steps may be rate-limiting for longer oligopeptide or natural substrates. In this regard, collagenase may prove especially useful in exploring the interplay between substrate binding and catalysis at a macromolecular level.
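
The dependence of the steady-state parameters on the rate-limiting step can be illustrated numerically. The sketch below uses the standard steady-state expressions for the acyl-enzyme mechanism, k(cat) = k(2)k(3)/(k(2) + k(3)) and K(m) = K(S)k(3)/(k(2) + k(3)), which reduce to the limiting forms given in footnote 5. The rate constants are hypothetical values chosen only to show the two limits; they are not measurements from this work.

```python
def steady_state(k2, k3, ks):
    """Return (kcat, Km) for E + S <=> ES -> acyl-E -> E + product.

    k2: acylation rate constant, k3: deacylation rate constant,
    ks: dissociation constant of the Michaelis complex.
    """
    if k2 <= 0 or k3 <= 0 or ks <= 0:
        raise ValueError("k2, k3 and ks must all be positive")
    kcat = k2 * k3 / (k2 + k3)
    km = ks * k3 / (k2 + k3)
    return kcat, km


# HYPOTHETICAL constants (arbitrary units) illustrating the two limits.
acyl_limited = steady_state(k2=1.0, k3=1000.0, ks=1.0)    # k2 << k3
deacyl_limited = steady_state(k2=1000.0, k3=1.0, ks=1.0)  # k3 << k2

print("acylation limiting:   kcat=%.3f  Km=%.4f" % acyl_limited)
print("deacylation limiting: kcat=%.3f  Km=%.4f" % deacyl_limited)
```

When acylation is limiting, k(cat) approaches k(2) and K(m) approaches K(S), so the steady-state parameters report directly on the chemical step and on ground-state binding. When deacylation is limiting, k(cat) approaches k(3) and K(m) falls well below K(S), which is why a change in rate-limiting step between a wild-type enzyme and a variant complicates the interpretation of both parameters. In either limit the ratio k(cat)/K(m) equals k(2)/K(S).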


FOOTNOTES

*
This work was supported by National Science Foundation Grant DMB-8904956 (to C. S. C.) and National Institutes of Health Predoctoral Training Grant GM07175 (to C. A. T.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U49931.

§
Present address: Dept. of Chemistry and Biochemistry, University of Colorado, Boulder, CO 80309-0215.

¶
To whom correspondence should be addressed. Tel.: 415-476-8146; Fax: 415-476-0688; E-mail: craik@cgl.ucsf.edu.

(^1)
The nomenclature for the substrate amino acid residues is (Pn, ..., P2, P1, P′1, P′2, ..., P′n), where P1-P′1 denotes the hydrolyzed bond. (Sn, ..., S2, S1, S′1, S′2, ..., S′n) denote the corresponding enzyme binding sites (1).

(^2)
The abbreviations used are: PCR, polymerase chain reaction; SD, synthetic dextrose media; YPD, yeast extract peptone dextrose media; Suc-AAP-Xaa-pNA, succinyl-Ala-Ala-Pro-Xaa-pNA; pNA, p-nitroanilide; TPCK, p-tosyl-L-phenylalanine chloromethyl ketone; Sbzl, benzylthioester; Suc-AAPF-Sbzl, succinyl-Ala-Ala-Pro-Phe-Sbzl; Z, carbobenzoxy; Z-GPR-Sbzl, Z-Gly-Pro-Arg-Sbzl.

(^3)
C. A. Tsu and C. S. Craik, unpublished observations.

(^4)
R. A. Bradshaw, personal communication.

(^5)
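The serine protease mechanism can be depicted as shown below.

E + S ⇌ ES → acyl-E (+ amine or alcohol) → E + carboxylic acid

K(S) is the dissociation constant of the Michaelis complex ES, k(2) is the rate constant for acylation (ES → acyl-E), and k(3) is the rate constant for deacylation (acyl-E → E).

Under conditions where acylation is rate-limiting, k(cat) = k(2) and K(m) = K(S). Under conditions where deacylation is rate-limiting, k(cat) = k(3) and K(m) = K(S)[k(3)/(k(2)+k(3))].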

(^6)
DeltaG - DeltaG = 2.303RT (27) .


ACKNOWLEDGEMENTS

We thank Prof. Ralph Bradshaw and Drs. Christopher Carreras, Ann Eakin, and John Perona for helpful discussions. We also thank Prof. Richard Schowen and Jennifer Harris for a careful reading of the manuscript. We acknowledge Shane Atwell, Dr. W. Scott Willett, and Dr. Sergio Pichuantes for their advice and efforts concerning yeast expression.


REFERENCES

  1. Schechter, I., and Berger, A. (1968) Biochem. Biophys. Res. Commun. 27, 157-162
  2. Breddam, K., and Meldal, M. (1992) Eur. J. Biochem. 206, 103-107
  3. Perona, J. J., and Craik, C. S. (1995) Protein Sci. 4, 337-360
  4. Stroud, R. (1974) Sci. Am. 231, 74-88
  5. Hedstrom, L., Szilágyi, L., and Rutter, W. J. (1992) Science 255, 1249-1253
  6. Hedstrom, L., Perona, J. J., and Rutter, W. J. (1994) Biochemistry 33, 8757-8763
  7. Eisen, A. Z., Henderson, K. O., Jeffrey, J. J., and Bradshaw, R. A. (1973) Biochemistry 12, 1814-1822
  8. Van Wormhoudt, A., Le Chevalier, P., and Sellos, D. (1992) Comp. Biochem. Physiol. 103B, 675-680
  9. Mainardi, C. L., Hasty, D. L., Seyer, J. M., and Kang, A. H. (1980) J. Biol. Chem. 255, 12006-12010
  10. Lecroisey, A., Boulard, C., and Keil, B. (1979) Eur. J. Biochem. 101, 385-393
  11. Kortt, A. A., Caldwell, J. B., Lilley, G. G., Edwards, R., Vaughan, J., and Stewart, D. J. (1994) Biochem. J. 299, 521-525
  12. Welgus, H., Jeffrey, J., and Eisen, A. (1981) J. Biol. Chem. 256, 9511-9515
  13. Grant, G. A., and Eisen, A. Z. (1980) Biochemistry 19, 6089-6095
  14. Tsu, C. A., Perona, J. J., Schellenberger, V., Turck, C. W., and Craik, C. S. (1994) J. Biol. Chem. 269, 19565-19572
  15. Okayama, H., Kawaichi, M., Brownstein, F. L., Yokota, T., and Arai, K. (1987) Methods Enzymol. 154, 3-29
  16. Grant, G. A., Henderson, K. O., Eisen, A. Z., and Bradshaw, R. A. (1980) Biochemistry 19, 4653-4659
  17. Friedman, K. D., Rosen, N. L., Newman, P. J., and Montgomery, R. R. (1990) in PCR Protocols: A Guide to Methods and Applications (Innis, M. A., Gelfand, D. H., Sninsky, J. J., and White, T. J., eds) pp. 253-258, Academic Press, San Diego
  18. Craig, S., McKerrow, J., Newport, G., and Wang, C. (1988) Nucleic Acids Res. 16, 7087-7101
  19. Schowalter, D. B., and Sommer, S. S. (1989) Anal. Biochem. 177, 90-94
  20. von Heijne, G. (1985) J. Mol. Biol. 184, 99-105
  21. Craik, C., Rutter, W., and Fletterick, R. (1983) Science 220, 1125-1129
  22. Greer, J. (1981) J. Mol. Biol. 153, 1027-1042
  23. Ausubel, F., Brent, R., Kingston, R., Moore, D., Seidman, J., Smith, J., and Struhl, K. (eds) (1988) Current Protocols in Molecular Biology, John Wiley & Sons, New York
  24. Perona, J. J., Tsu, C. A., McGrath, M. E., Craik, C. S., and Fletterick, R. J. (1993) J. Mol. Biol. 230, 934-949
  25. Corey, D. R., and Craik, C. S. (1992) J. Am. Chem. Soc. 114, 1784-1790
  26. Chothia, C. (1984) Annu. Rev. Biochem. 53, 537-572
  27. Hansch, C., and Coates, E. (1970) J. Pharm. Sci. 59, 731-743
  28. Peabody, D. S. (1989) J. Biol. Chem. 264, 5031-5035
  29. Sellos, D., and Van Wormhoudt, A. (1992) FEBS Lett. 309, 219-224
  30. Vásquez, J., Evnin, L., Higaki, J., and Craik, C. (1989) J. Cell. Biochem. 39, 265-276
  31. Light, A., and Janska, H. (1989) Trends Biochem. Sci. 14, 110-112
  32. Welgus, H. G., and Grant, G. A. (1983) Biochemistry 22, 2228-2233
  33. Dorovskaya, V. N., Varfolomeyev, S. D., Kazanskaya, N. F., Klyosov, A. A., and Martinek, K. (1972) FEBS Lett. 23, 122-124
  34. Zerner, B., and Bender, M. L. (1964) J. Am. Chem. Soc. 86, 3669-3674
  35. Bender, M., and Kézdy, F. (1965) Annu. Rev. Biochem. 34, 49-76
  36. DelMar, E. G., Largman, C., Brodrick, J. W., Fassett, M., and Geokas, M. C. (1980) Biochemistry 19, 468-472
  37. Largman, C. (1983) Biochemistry 22, 3763-3770
  38. DelMar, E., Largman, C., Brodrick, J., and Geokas, M. (1979) Anal. Biochem. 99, 316-320
  39. Eriksson, A. E., Baase, W. A., Zhang, X.-J., Heinz, D. W., Blaber, M., Baldwin, E. P., and Matthews, B. W. (1992) Science 255, 178-183
  40. Richards, F. M. (1977) Annu. Rev. Biophys. Bioeng. 6, 151-176
  41. Tanford, C. (1980) The Hydrophobic Effect: Formation of Micelles and Biological Membranes, 2nd Ed., John Wiley & Sons, New York
  42. Fersht, A. (1985) Enzyme Structure and Mechanism, 2nd Ed., pp. 299-301, W. H. Freeman & Co., New York
  43. Fink, A. L. (1987) in Enzyme Mechanisms (Page, M. I., and Williams, A., eds) pp. 159-177, Burlington House, London
  44. Hedstrom, L., Farr-Jones, S., Kettner, C., and Rutter, W. (1994) Biochemistry 33, 8764-8769
  45. Thompson, R. (1974) Biochemistry 13, 5495-5501
  46. Perona, J., Hedstrom, L., Rutter, W., and Fletterick, R. (1995) Biochemistry 34, 1489-1499
  47. Perona, J. J., Hedstrom, L., Wagner, R. L., Rutter, W. J., Craik, C. S., and Fletterick, R. J. (1994) Biochemistry 33, 3252-3259
  48. Craik, C. S., Largman, C., Fletcher, T., Roczniak, S., Barr, P. J., Fletterick, R., and Rutter, W. J. (1985) Science 228, 291-297
  49. Evnin, L. B., Vásquez, J. R., and Craik, C. S. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 6659-6663
  50. Gráf, L., Jancsó, A., Szilágyi, L., Hegyi, G., Pintér, K., Náray-Szabó, G., Hepp, J., Medzihradszky, K., and Rutter, W. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 4961-4965
