(Received for publication, November 9, 1994)
The lysosomal cysteine proteinase cathepsin B (EC 3.4.22.1)
plays an important role in protein catabolism and has also been
implicated in various disease states. The crystal structures of two
forms of native recombinant rat cathepsin B have been determined. The
overall fold of rat cathepsin B was shown to be very similar to that
of the human liver enzyme. The structure of the native enzyme
containing an underivatized active site cysteine (Cys)
showed the active enzyme conformation to be similar to that determined
previously for the oxidized form. In a second structure, Cys
was derivatized with the reversible blocking reagent 2-pyridyl
disulfide. In this structure, large side chain conformational changes
were observed for the two key catalytic residues, Cys and His,
demonstrating the potential flexibility of these
side chains. In addition, the structure of the complex between rat
cathepsin B and the inhibitor benzyloxycarbonyl-Arg-Ser(O-Bzl)-chloromethylketone
was determined. The complex structure showed that
very little conformational change occurs in the enzyme upon inhibitor
binding. It also allowed visualization of the interaction between the
enzyme and inhibitor. In particular, the interaction between Glu
and the P2 Arg residue was clearly demonstrated, and
it was found that the benzyl group of the P1 substrate
residue occupies a large hydrophobic pocket thought to represent the
S1' subsite. This may have important implications for
structure-based design of cathepsin B inhibitors.
Cathepsin B (EC 3.4.22.1) is a lysosomal cysteine proteinase
belonging to the papain superfamily (1) but is unique in its
ability to act as both an endo- and an exopeptidase. It is synthesized
as an inactive zymogen (2, 3), which, in the case of
the rat enzyme, carries two N-linked
oligosaccharide units, one in the proregion and a second at
Asn. Following the modification of mannose residues to
mannose 6-phosphate, the proenzyme is targeted to the lysosome through
the mannose 6-phosphate receptor-mediated transport
pathway (4).
Recently, procathepsin B has been expressed in the yeast Saccharomyces cerevisiae (5) and used to study the mechanism of proenzyme processing. Autoprocessing of the recombinant proenzyme occurs in acidic environments, yielding a single chain form of cathepsin B with a 6-residue N-terminal extension relative to the fully processed lysosomal form. Since the N-linked oligosaccharides synthesized by S. cerevisiae are extremely heterogeneous, the consensus sequence for oligosaccharide substitution in cathepsin B was mutated, yielding a homogeneous, non-glycosylated protein that was shown to be functionally equivalent to cathepsin B purified from rat liver (6).
Activation of procathepsin B in the mammalian cell, with concomitant propeptide removal, occurs on acidification of the transport vesicles and inside the lysosome. Additional processing steps include N-terminal trimming and the removal of 6 residues from the C terminus to yield the mature, single chain, 254-residue protein. In a subsequent but much slower processing step, the single chain form of cathepsin B is cleaved internally with excision of residues 48 and 49 to generate a two-chain form of the enzyme (3).
The three-dimensional structure of the two-chain
form of human liver cathepsin B, determined recently by two different
groups (7), reveals the overall similarity
between this enzyme and the stereotypical cysteine proteinase, papain.
Several important differences are, however, apparent. In particular, a
large insertion consisting of an 18-residue surface loop that occludes
the active site cleft was found in the primed substrate
binding region of cathepsin B. The region expected to define the
S2 pocket was seen to contain a negatively charged residue,
Glu, which has been shown to be involved in the binding
of Arg residues (8).
In addition to its role in normal
intracellular protein breakdown, cathepsin B has also
been implicated in several pathological conditions, in particular
arthritis (9, 10), muscular dystrophy (11),
and tumor invasion and metastasis (12). This has stimulated the
search for specific cathepsin B inhibitors for therapeutic use. While
rational, structure-based design of specific inhibitors requires a
detailed description of the native enzyme, it also depends on the
information available from the structures of inhibitor complexes, which
reveal the different binding sites in the enzyme (13).
Here, in addition to the structure of the single chain form of rat
cathepsin B, both as the free enzyme and as the 2-pyridyl disulfide
adduct, we present the structure of a complex between the enzyme and
the peptidyl inhibitor
benzyloxycarbonyl-Arg-Ser(O-Bzl)-chloromethylketone. This
inhibitor was chosen to allow investigation of the structural basis for
the unique ability of cathepsin B, relative to other cysteine
proteinases, to accept an Arg at P2 and for the
previously reported high affinity of cathepsin B for
Ser(O-Bzl) in the P1 position (14, 15).
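The position labels used throughout (P1, P2 and the matching S1, S2 subsites) follow the Schechter and Berger convention, which numbers inhibitor or substrate residues outward from the scissile bond. As a minimal sketch, assuming the inhibitor is written N to C and that its C-terminal residue (the one carrying the chloromethylketone) occupies P1, the labels can be assigned mechanically. The function name `schechter_berger_labels` is hypothetical, and the N-terminal CBZ cap is treated as a pseudo-residue.

```python
def schechter_berger_labels(residues):
    """Assign Schechter-Berger P/S labels to a peptidyl inhibitor.

    `residues` is ordered N -> C. The last entry is taken to be P1, the
    residue attached to the reactive group; labels increase toward the
    N terminus. Primed positions (P1', P2', ...) would lie on the other
    side of the scissile bond and are absent from such an inhibitor.
    """
    n = len(residues)
    return [(res, f"P{n - i}", f"S{n - i}") for i, res in enumerate(residues)]


# CBZ-Arg-Ser(O-Bzl)-CH2Cl, with the CBZ cap counted as a pseudo-residue.
inhibitor = ["CBZ", "Arg", "Ser(O-Bzl)"]
for res, p_site, s_site in schechter_berger_labels(inhibitor):
    print(f"{res:<11} {p_site} binds {s_site}")
```

Under this convention the Arg side chain is the P2 residue that meets the S2 subsite, and the CBZ group sits where a P3 residue would be.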
The propeptide of cathepsin B (residues -57 to -7) was prepared using an Applied Biosystems 431A synthesizer and purified as described previously (18). The inhibitor CBZ-Arg-Ser(O-Bzl)-chloromethylketone was custom synthesized by Enzyme Systems Products Ltd. (Dublin, CA).
A second native enzyme crystal was obtained at pH 4.6. Our intention was
to obtain a complex of rat cathepsin B with the propeptide segment of
procathepsin B. Before crystallization, dithiothreitol was added to the
inactivated cathepsin B to remove the pyridyl sulfide group,
regenerating active free enzyme. Droplets containing 5 µl of a
mixture of 5.25 mg/ml cathepsin B and 3.27 mg/ml synthetic propeptide, plus
5 µl of reservoir solution, were equilibrated against a reservoir
solution of 20.5% ammonium sulfate, 0.1 M sodium acetate (pH
4.6), 1% PEG 4000. A washed seed was introduced into the drop 1
day later. Crystals usually grew in a few days to a maximum size of
0.4 × 0.1 × 0.1 mm and proved to contain the unoxidized native
enzyme uncomplexed with the propeptide, which was most probably
hydrolyzed because of the low pH (18). This form is designated
Native Enzyme 2.
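The drop setup above implies some simple arithmetic about what the protein actually experiences. A minimal sketch, assuming the quoted concentrations refer to the 5-µl sample before mixing, that volumes are additive, and an idealized vapor-diffusion endpoint in which only water leaves the drop until its ammonium sulfate concentration matches the reservoir; the helper names `mixed_concentration` and `equilibrated_volume` are hypothetical.

```python
def mixed_concentration(stock, v_component_ul, v_other_ul):
    """Concentration after mixing v_component_ul of stock with v_other_ul of a solution lacking it."""
    return stock * v_component_ul / (v_component_ul + v_other_ul)


def equilibrated_volume(v_drop_ul, precipitant_drop, precipitant_reservoir):
    """Idealized drop volume once its non-volatile precipitant matches the reservoir.

    Assumes water is the only species exchanged through the vapor phase.
    """
    return v_drop_ul * precipitant_drop / precipitant_reservoir


v_sample, v_reservoir = 5.0, 5.0  # µl, as in the text
v_drop = v_sample + v_reservoir
enzyme_0 = mixed_concentration(5.25, v_sample, v_reservoir)      # mg/ml cathepsin B
propeptide_0 = mixed_concentration(3.27, v_sample, v_reservoir)  # mg/ml propeptide
sulfate_0 = mixed_concentration(20.5, v_reservoir, v_sample)     # % ammonium sulfate

v_eq = equilibrated_volume(v_drop, sulfate_0, 20.5)
enzyme_eq = enzyme_0 * v_drop / v_eq  # protein is concentrated as water leaves

print(f"initial: {enzyme_0:.3f} mg/ml enzyme, {propeptide_0:.3f} mg/ml propeptide, "
      f"{sulfate_0:.2f}% ammonium sulfate")
print(f"idealized endpoint: {v_eq:.1f} µl drop, {enzyme_eq:.2f} mg/ml enzyme")
```

Under these idealized assumptions a 1:1 drop roughly halves in volume, bringing the protein back toward its pre-mixing concentration; a real drop need not reach this endpoint exactly.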
The cathepsin B-inhibitor complex was formed
by reacting cathepsin B with the peptidyl inhibitor as
follows. Dithiothreitol was added to freshly purified enzyme in 50
mM sodium phosphate, 1 mM EDTA, 0.001% Brij-35 (pH
6.0) to a final concentration of 0.2 mM to activate the
enzyme, and the solution was allowed to stand for 30 min at 4 °C. A
3-fold excess of
benzyloxycarbonyl-Arg-Ser(O-Bzl)-chloromethylketone dissolved
in water was added to the activated enzyme solution, and the mixture
was incubated at 4 °C for 30 min. Residual enzyme activity
was monitored spectrophotometrically and found to be no more than
0.2% of the initial active enzyme concentration. The enzyme-inhibitor
covalent complex was crystallized by the hanging drop method in 50
mM phosphate buffer (pH 6.0), 0.001% Brij-35, and 0.05%
NaN3 at a protein concentration of 7.7 mg/ml. A mixture of 5
µl of the complex solution and 5 µl of the reservoir solution
was equilibrated against a reservoir solution of 10% PEG 8000, 0.2 M ammonium sulfate, and 0.1 M sodium citrate buffer (pH 4.0).
Crystals usually appeared within 24 h; this form is designated Complex.
The structure of the
inhibitor complex was solved first by molecular replacement. Program
packages used included the CCP4 suite (20), BRUTE (21), and X-PLOR (22, 23). These programs
were used either to obtain a molecular replacement solution or to confirm
the results from other programs. The tetragonal form of human liver
cathepsin B was used as the search model. This structure
represents the two-chain form of human liver cathepsin B and has about
83.5% sequence identity with rat cathepsin B. Calculations with CCP4
produced two sharp peaks in the rotation search, which correspond to
the two molecules in the asymmetric unit. Independent calculations
using an X-PLOR conventional rotation search followed by Patterson
correlation (PC) refinement gave equivalent orientations of the
molecule. Similar results were also obtained using a
"direct search", or brute-force Patterson correlation
refinement, as implemented in X-PLOR. X-PLOR and BRUTE were used
independently for translation searching. An individual translation
search was first carried out for each of the two molecules in the
asymmetric unit, and the same clear translation peaks were obtained
with both programs. A combined translation search was then performed
to determine the relative translation between the two molecules; the
translation of one molecule along the y axis is arbitrary in
the polar space group P2₁. The positioned molecules were
then refined using X-PLOR. After one round of X-PLOR
refinement, the resulting (2Fo - Fc) and (Fo - Fc) maps showed clear
density for the inhibitor. The native enzyme structures were
isomorphous with the complex, and a refined set of complex coordinates
was used as the starting point for the native enzyme structures.
Simulated annealing was used in the initial refinement, but
in the later stages only conventional positional refinement was carried
out. The refinement was gradually extended to higher resolution, and
many cycles of alternating manual fitting with the graphics program
FRODO (24) and refinement calculations were carried out.
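Two quantities in this paragraph can be made concrete. Patterson correlation (PC) refinement, as introduced by Brünger for X-PLOR, scores a model orientation by the correlation between observed and calculated squared normalized structure factors, and the (2Fo - Fc) and (Fo - Fc) maps are Fourier syntheses whose coefficients combine observed amplitudes with model phases. The sketch below is a simplified illustration under stated assumptions: normalized amplitudes are taken as already computed, resolution-dependent normalization and the Fourier transform itself are omitted, and the function names are hypothetical rather than X-PLOR or CCP4 commands.

```python
import cmath
import math
from statistics import fmean


def patterson_correlation(e_obs, e_calc):
    """Pearson correlation between |E_obs|^2 and |E_calc|^2 over matched reflections.

    e_obs: observed normalized amplitudes (real, >= 0).
    e_calc: calculated normalized structure factors (complex; only |E| is used).
    """
    x = [e ** 2 for e in e_obs]
    y = [abs(e) ** 2 for e in e_calc]
    mx, my = fmean(x), fmean(y)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    var_x = fmean([(a - mx) ** 2 for a in x])
    var_y = fmean([(b - my) ** 2 for b in y])
    return cov / math.sqrt(var_x * var_y)


def map_coefficient(f_obs, f_calc, n_obs=2):
    """Fourier coefficient (n*|Fo| - |Fc|) * exp(i*phi_calc).

    n_obs=2 gives a (2Fo - Fc) map term, n_obs=1 an (Fo - Fc) difference term.
    For an omit map, f_calc comes from a model with the ligand left out.
    """
    return (n_obs * f_obs - abs(f_calc)) * cmath.exp(1j * cmath.phase(f_calc))


# Toy values only: three reflections, phases taken from the model.
e_obs = [1.2, 0.4, 2.0]
e_calc = [1.1 + 0.3j, 0.5j, -1.8 + 0.2j]
print(round(patterson_correlation(e_obs, e_calc), 3))
print(map_coefficient(10.0, 8.0 + 0j))           # (12+0j): 2*10 - 8, phase 0
print(map_coefficient(10.0, 8.0 + 0j, n_obs=1))  # (2+0j): difference term
```

Because PC depends only on amplitudes, it is insensitive to the overall position of the model, which is why it suits orientation refinement before the translation search.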
Unless otherwise noted, figures were prepared using the program
SETOR (25).
Figure 1:
Overall fold of the native
structure, showing the two cathepsin B molecules in the asymmetric unit
related by a non-crystallographic 2-fold axis. α-Helices are shown
as blue cylinders,
β-strands as arrowed
ribbons, and non-secondary structures as purple rope;
disulfide bridges are shown in yellow, and the side chains of
the active site residues are shown in green.
The overall structure of the recombinant rat
cathepsin B is very similar to that of human liver cathepsin B in its
tetragonal form, with a backbone root-mean-square deviation
of only approximately 0.5 Å. It also closely resembles
the human liver cathepsin B structure previously reported by Musil et al. (7). Although the latter also belongs to space
group P2₁, it has very different unit cell dimensions and
thus is not isomorphous with the present structures.
In each of the three crystal structures reported here, the two molecules in the asymmetric unit are very similar, with backbone root-mean-square deviations of no more than 0.35 Å. The backbone root-mean-square deviations between the three crystal structures are also very low (about 0.4 Å). As shown previously (7), cathepsin B has an overall structure very similar to that of papain and can be divided into two domains, arbitrarily designated left and right. The long active site cleft is situated at the interface between these two domains. Well-defined major secondary structural elements are illustrated in Fig. 1. At the protein sequence level, rat cathepsin B is 83.5% identical to the human enzyme. Comparison of the three-dimensional structures shows that the non-identical residues are situated at the surface of the molecule and do not introduce significant changes in main chain folding.
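The backbone root-mean-square deviations quoted above are computed over paired atoms after the structures have been superimposed. A minimal sketch, assuming the superposition (for example a least-squares Kabsch fit) has already been done and that atoms are keyed by a hypothetical (residue number, atom name) tuple. Which atoms count as "backbone" (here N, CA, C and O) is also an assumption, since some analyses use Cα atoms only.

```python
import math

BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # assumption; some analyses use CA only


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates, paired in order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def backbone_rmsd(atoms_a, atoms_b):
    """RMSD over backbone atoms present in both structures.

    atoms_a, atoms_b: dicts mapping (residue_number, atom_name) -> (x, y, z),
    assumed to be already superimposed. Atoms missing from either model
    (e.g. a disordered N-terminal extension) are skipped automatically.
    """
    shared = sorted(k for k in atoms_a.keys() & atoms_b.keys() if k[1] in BACKBONE_ATOMS)
    return rmsd([atoms_a[k] for k in shared], [atoms_b[k] for k in shared])


# Toy coordinates: model B is model A shifted 0.25 Å along x; the CB atom is ignored.
model_a = {(1, "CA"): (0.0, 0.0, 0.0), (2, "CA"): (3.8, 0.0, 0.0), (2, "CB"): (4.5, 1.2, 0.0)}
model_b = {(1, "CA"): (0.25, 0.0, 0.0), (2, "CA"): (4.05, 0.0, 0.0), (2, "CB"): (9.0, 9.0, 9.0)}
print(f"{backbone_rmsd(model_a, model_b):.2f} Å")
```

Matching atoms by key rather than by list position keeps unmodeled residues, such as the flexible N-terminal extension, from silently misaligning the comparison.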
The recombinant cathepsin
B used in the present work differs from the previously studied human
liver cathepsin B, which is a completely processed lysosomal product.
Autoprocessing of recombinant procathepsin B results in the single
chain form containing a 6-residue N-terminal
extension (5). The electron density was insufficient to
determine the positions of these N-terminal residues, indicating
high flexibility of this region. In contrast, residues 48 and 49, which
are lacking in the lysosomal two-chain form, could be built into the
recombinant cathepsin B structure on the basis of difference Fourier density
and are seen to form part of a turn linking the major α-helix of the left
domain to a segment containing no regular secondary structure (Fig. 2). The positions of these 2 residues are consistent with
their accessibility to cleavage and dipeptide excision during
processing to yield the two-chain form. The recombinant enzyme also
differs from the natural enzyme in the elimination of the glycosylation
consensus sequence at Asn-Ser, and it is
noteworthy that the removal of the N-linked carbohydrate at
Asn has no effect on the protein fold relative to
the natural enzyme.
Figure 2: Stereo diagram showing the Cα trace of rat cathepsin B. The side chains are shown for the active site residues and for other residues discussed in the text.
Figure 3:
Part of the structure around the active site of the underivatized native rat
cathepsin B. Electron density from a (2Fo - Fc) map
is shown in blue, and from an (Fo - Fc) map
is shown in red. The contour levels are 1.5σ and 3.0σ.
The residual positive density nearest Cys is in the wrong
location to represent oxygen atoms bonded to the sulfur
atom.
Figure 4: Comparison of the active site in native cathepsin B and the 2-pyridyl disulfide complex. The crystal structure without blocking reagent is shown in blue and the structure with blocking reagent bound shown in orange. The blocking reagent itself is shown in light purple.
However, the
possibility that the pKa of the pyridyl group is
significantly perturbed, as are those of Cys and His,
cannot be excluded. In any case, the number of
hydrogen bonding interactions is equal for the two conformations of
His, so this factor does not appear to favor either
conformation. Furthermore, both rotamers of His are
favored conformations. After the torsion angle change, the
His side chain is more solvent exposed in the derivatized
cathepsin B. Modeling shows that if His were to maintain
its original position, it would be difficult for the pyridyl ring
to find a location that is not solvent exposed. As the imidazole ring
is more hydrophilic than the pyridyl ring, it is energetically less
costly for the imidazole ring to occupy the more solvent exposed
location.
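The argument above turns on whether ionizable groups near the active site have shifted pKa values. As a reminder of why such a shift matters, the Henderson-Hasselbalch relation gives the protonated fraction of a single, independent titratable group. The pKa values in the example are illustrative placeholders, not measurements for the pyridyl group, Cys or His in cathepsin B, and treating each group independently is itself a simplification, since interacting groups such as the active site Cys-His pair do not titrate independently.

```python
def fraction_protonated(ph, pka):
    """Protonated fraction of one independent titratable group (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative only: how a +/-1 unit pKa shift changes protonation at a fixed pH.
ph = 6.0
for pka in (5.0, 6.0, 7.0):
    print(f"pKa {pka:.1f}: {fraction_protonated(ph, pka):.2f} protonated at pH {ph}")
```

A one-unit shift in either direction moves such a group from half protonated to roughly 9% or 91% protonated, which is enough to change its hydrogen bonding and its preference for a buried or solvent exposed location.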
Figure 5:
An omit map (Fo - Fc) of
the inhibitor. The final structure of the active site and the
superimposed inhibitor are also shown. The inhibitor was omitted from
both the refinement and the map calculation; the map is contoured at the
3σ level.
Figure 6: Surface representation of the structure of the CBZ-Arg-Ser(O-Bzl)-cathepsin B complex. The inhibitor is represented in green and the enzyme in red. The surface was calculated and illustrated using the program QUANTA (40).
Figure 7: Scheme illustrating the interactions of the CBZ-Arg-Ser(O-Bzl) group with cathepsin B. Thick lines represent the enzyme, and thin lines represent the inhibitor. Hydrogen bonds and electrostatic interactions are illustrated as dashed lines.
A guanidinium N of the P2 Arg side chain
is salt bridged to the Glu carboxylate at an average
distance of 2.90 Å. This salt bridge had been widely
predicted (7, 8). There are no other interactions
between the P2 Arg residue and the S2 subsite.
However, judging from the crystal structure of the complex, a bulky
P2 residue such as Phe might interact with some or all of
residues Pro, Ala, Ala, and the side chain of Glu.
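A salt bridge such as this one is identified from interatomic distances. A minimal sketch, assuming atoms are given as hypothetical name-to-coordinate dictionaries and using a 4.0 Å N-O cutoff, a common convention for calling a salt bridge but not necessarily the criterion used in this study; the coordinates in the example are toy values, not the refined model.

```python
import math
from itertools import product


def closest_pair(atoms_a, atoms_b):
    """Return (distance, name_a, name_b) for the closest pair of atoms.

    atoms_a, atoms_b: dicts mapping atom name -> (x, y, z) in Å.
    """
    return min(
        (math.dist(xyz_a, xyz_b), name_a, name_b)
        for (name_a, xyz_a), (name_b, xyz_b) in product(atoms_a.items(), atoms_b.items())
    )


def is_salt_bridge(arg_nitrogens, glu_oxygens, cutoff=4.0):
    """True if any guanidinium N lies within `cutoff` Å of any carboxylate O."""
    distance, _, _ = closest_pair(arg_nitrogens, glu_oxygens)
    return distance <= cutoff


# Toy coordinates (Å), not taken from the deposited structure.
arg_n = {"NE": (0.0, 0.0, 0.0), "NH1": (1.2, 0.7, 0.0), "NH2": (1.2, -0.7, 0.0)}
glu_o = {"OE1": (1.2, 3.8, 0.0), "OE2": (3.0, 4.5, 0.0)}
print(closest_pair(arg_n, glu_o), is_salt_bridge(arg_n, glu_o))
```

Reporting the closest atom pair along with the yes/no answer makes it easy to check which guanidinium nitrogen and which carboxylate oxygen form the contact.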
It is also interesting to note
that the carbonyl portion of the benzyloxycarbonyl (CBZ) group, which
could be considered a pseudo-P3 carbonyl, makes no direct
interaction with the enzyme. However, the benzyl ring of the CBZ moiety
makes a vertical, or edge-on, aromatic-aromatic interaction with
Tyr, with a shortest aromatic atom-to-aromatic atom
distance of 3.71 Å. This type of interaction, in which the
positively polarized hydrogen atoms of one ring interact with the
π-electron cloud of a second ring, has
been described by Burley and Petsko (35, 36).
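Edge-on and stacked aromatic contacts are distinguished by the angle between the two ring planes as well as by the closest approach distance. A minimal sketch, assuming each ring is supplied as six (x, y, z) positions in ring order and estimating each plane normal from three alternate ring atoms rather than from a least-squares plane fit (a simplification); the ring coordinates in the example are idealized toy hexagons, and the helper names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ring_normal(ring):
    """Approximate plane normal from ring atoms 0, 2 and 4 of a planar six-membered ring."""
    a, b, c = ring[0], ring[2], ring[4]
    return _cross(_sub(b, a), _sub(c, a))


def interplanar_angle(ring1, ring2):
    """Angle between ring planes in degrees: ~0 for stacked, ~90 for edge-on contacts."""
    n1, n2 = ring_normal(ring1), ring_normal(ring2)
    cos_angle = abs(_dot(n1, n2)) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
    return math.degrees(math.acos(min(1.0, cos_angle)))


def closest_contact(ring1, ring2):
    """Shortest ring-atom to ring-atom distance in Å."""
    return min(math.dist(a, b) for a in ring1 for b in ring2)


# Idealized benzene-like rings (C-C 1.39 Å): one in the xy plane, one in the plane y = 5.
hexagon = [(1.39 * math.cos(math.radians(60 * k)), 1.39 * math.sin(math.radians(60 * k)))
           for k in range(6)]
ring_xy = [(x, y, 0.0) for x, y in hexagon]
ring_edge_on = [(x, 5.0, y + 1.0) for x, y in hexagon]  # toy geometry
print(round(interplanar_angle(ring_xy, ring_edge_on), 1),
      round(closest_contact(ring_xy, ring_edge_on), 2))
```

Taking the absolute value of the normal dot product folds the angle into 0-90°, since the sign of a ring normal is arbitrary.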
As outlined above, the main chain of the P1 Ser(O-Bzl)
residue interacts as in previous inhibitor complexes of papain;
however, the side chain O-benzyl group is located in a
hydrophobic pocket bounded by Trp, His, Leu, Phe, Met, Val, His, Phe, and
Gly. Among these residues, Met partially
covers the hydrophobic pocket and shields it from external solvent (Fig. 6).
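Residues lining a ligand pocket are often listed by a simple distance criterion. A minimal sketch, assuming protein atoms are given as hypothetical (residue label, (x, y, z)) pairs and using a 4.5 Å cutoff, an arbitrary but commonly used value. The study does not state that its list of pocket residues was derived this way, and the labels and coordinates in the example are toy values.

```python
import math


def residues_near_ligand(protein_atoms, ligand_atoms, cutoff=4.5):
    """Sorted labels of residues with any atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z)).
    ligand_atoms: iterable of (x, y, z), e.g. the O-benzyl group atoms.
    """
    ligand = list(ligand_atoms)
    lining = {
        label
        for label, xyz in protein_atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand)
    }
    return sorted(lining)


# Toy example: two residues contact the ligand, one is too far away.
protein = [("RES_A", (0.0, 0.0, 3.8)), ("RES_B", (0.0, 0.0, 12.0)), ("RES_C", (4.0, 0.0, 0.0))]
ligand = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
print(residues_near_ligand(protein, ligand))  # ['RES_A', 'RES_C']
```

A distance cutoff only says which residues are close; judging whether a residue such as Met shields the pocket from solvent additionally needs a solvent accessibility calculation.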
When the structural data on the
cathepsin B-inhibitor complex are considered in light of the results of
previous kinetic studies using different synthetic substrates and
inhibitors (6, 8, 39), it is clear that two
relatively well defined substrate-binding subsites can be described in
this enzyme. The S2 subsite is a wide pocket extending from
the active site cleft toward Glu and is largely open to
solvent. A large P2 side chain, such as that of Phe, could be
accommodated comfortably in the space available. Kinetic studies have
shown that cathepsin B has a broad S2 specificity, accepting
both Phe and Arg at the P2 position in substrates, although
Phe is preferred 7-fold over Arg. In the case of a P2 Arg
residue, site-directed mutagenesis studies have indicated that the
electrostatic interaction with Glu is important for
substrate binding and contributes to stabilization of the
transition-state complex (8).
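The 7-fold preference for Phe over Arg at P2 can be expressed as a free-energy difference. Assuming the ratio refers to the specificity constant kcat/KM and a temperature of 25 °C (neither is stated here), the corresponding difference in transition-state stabilization is roughly:

```latex
\Delta\Delta G^{\ddagger}
  = RT \ln\frac{(k_{\mathrm{cat}}/K_M)_{\mathrm{Phe}}}{(k_{\mathrm{cat}}/K_M)_{\mathrm{Arg}}}
  = (1.987\times10^{-3}\,\mathrm{kcal\,mol^{-1}\,K^{-1}})(298\,\mathrm{K})\,\ln 7
  \approx 0.592 \times 1.946\,\mathrm{kcal\,mol^{-1}}
  \approx 1.15\,\mathrm{kcal\,mol^{-1}}\;(\approx 4.8\,\mathrm{kJ\,mol^{-1}})
```

An energy difference of this size is modest, in keeping with the broad S2 specificity described above.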
There is also another pocket on the left
of the cleft, defined by the segment Asn to Tyr
and residue Asp. If the main chain of a longer
peptidyl inhibitor (with residues extending beyond P2)
occupied the active site cleft, the P3 side chain could fit
into the pocket on the left side, as suggested by the orientation of
the CBZ group, which simulates a P3 residue in the present
complex. This arrangement would be better for the binding of a longer
peptide or protein substrate, since the lower part of the active site
cleft floor would then be available to bind the main chain of the
peptide or protein to achieve S/P main chain
interaction, thus maximizing interaction between cathepsin B and its
substrate. One way of achieving the maximum S/P
backbone interaction might be to change the P residue
to the D-configuration, thus forcing the P side
chain to take an alternative position, which might occupy the left-hand
pocket (Asn, Tyr, and Asp) and
thereby leave space for the P main chain atoms in the lower
active site cleft.
While there appears to be no strong preference in
the S1 subsite, kinetic evidence (39) indicates that
the S1' subsite is relatively large and very
hydrophobic. The crystal structure of the inhibitor complex shows the
P1 O-benzyl group to be located at the entrance to
a large hydrophobic pocket. This pocket is large enough to accommodate
a bulky hydrophobic side chain, such as that of Phe or Trp, comfortably
and could therefore be considered to represent the
S1' subsite suggested by Ménard et
al. (39). Because the main chain in the region 191-198 of cathepsin B
follows a different path from that in papain and other
similar cysteine proteinases, the S1' subsite in cathepsin B
is more hydrophobic and less solvent exposed. Thus, in the absence of a
P1' residue, a long, hydrophobic P1 side chain could reach into the
hydrophobic S1' pocket and make some van der Waals contacts. In the case of a normal
polypeptide substrate, however, the P1 side chain would be
expected to project out from the active site cleft and might interact
with the flexible high wall on the left.