(Received for publication, December 1, 1994; and in revised form, January 9, 1995)
The discovery of useful peptide substrates for proteases that
recognize many amino acids in their active sites is often a slow
process due to the lack of initial substrate data and the expense of
analyzing large numbers of peptide substrates. To overcome these
obstacles, we have made use of bacteriophage peptide display libraries.
We prepared a random hexamer library in the fd-derived vector fAFF-1
and included a ``tether'' sequence that could be recognized
by monoclonal antibodies. We chose the matrix metalloproteinases
stromelysin and matrilysin as the targets for our studies, as they are
known to require at least 6 amino acids in a peptide substrate for
cleavage. The phage library was treated in solution with protease and
cleaved phage separated from uncleaved phage using a mixture of
tether-binding monoclonal antibodies and Protein A-bearing cells
followed by precipitation. Clones were screened by the use of a rapid
screening assay that identified phage encoding peptide sequences
susceptible to cleavage by the enzymes. The nucleotide sequence of the
random hexamer region of 43 such clones was determined for stromelysin
and 23 for matrilysin. Synthetic peptides were prepared whose sequences
were based on some of the positive clones, as well as consensus
sequences built from the positive clones. Many of the peptides have kcat/Km
values as good as or better than those of previously reported substrates, and in fact, we
were able to produce stromelysin and matrilysin substrates that are
both the most active and smallest reported to date. In
addition, the phage data predicted selectivity in the P2 and P′1
positions of the two enzymes that were supported by the
kinetic analysis of the peptides. This work demonstrates that the phage
selection techniques enable the rapid identification of highly active
and selective protease substrates without making any a priori assumptions about the specificity or the ``physiological
substrate'' of the protease under study.
The selectivity of a protease is dictated, in part, by the
sequence of amino acids it recognizes in its active site before
cleaving the substrate. From a kinetic point of view, the higher the
value of the specificity constant kcat/Km (1),
the better a peptide is as a substrate for that protease. Highly
selective proteases will have high kcat/Km values for
only a small number of peptides, whereas a larger number of peptides
will have similar kcat/Km values for broad specificity proteases. The identification
of the optimized peptide substrates is thus an important part of
protease characterization. A time consuming step in protease
characterization, however, is finding an optimal substrate. In the case
of enzymes like the matrix metalloprotease (MMP) fibroblast collagenase, or the protease of the human
immunodeficiency virus (HIV protease), a peptide substrate can be
prepared based on the cleavage site of the physiological substrate
collagen (2) or the HIV polyprotein (3, 4) ,
respectively. These peptides can be dramatically improved by
substitution of individual residues with other amino
acids(5, 6, 7) , so that the original
substrate-derived peptide may not represent the optimal substrate. In
some cases a true physiological substrate is unknown. This was true in
the case of the MMP stromelysin. Investigators (8) found
obtaining good peptide substrates so frustrating that they randomly
screened commercially available peptides for active compounds. Even
when substrate information is available, investigators (9, 10) have used proteins as substrates to further
assess the sequence specificity of the protease under study.
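For reference, the role of the specificity constant in the comparison above can be written out with standard Michaelis-Menten kinetics. This is textbook material rather than a derivation taken from this work; the symbols [E]_0 (total enzyme), [S] (substrate), and the labels A and B for two competing substrates are introduced here only for illustration.

```latex
\begin{align}
v &= \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]} \\
v &\approx \frac{k_{cat}}{K_m}\,[E]_0\,[S] \qquad ([S] \ll K_m) \\
\frac{v_A}{v_B} &= \frac{(k_{cat}/K_m)_A\,[A]}{(k_{cat}/K_m)_B\,[B]}
\end{align}
```

The last line is why kcat/Km, rather than kcat or Km alone, ranks substrates that compete for the same enzyme.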
To overcome these problems, we have made use of filamentous bacteriophage-based peptide display libraries to find optimal substrates. Phage display libraries have been used with great success to find epitopes for monoclonal antibodies and to improve the affinity of peptides for receptors (11, 12). Recently Matthews and Wells (13) have presented a method for the use of monovalent ``substrate phage'' libraries for discovering peptide substrates for proteases. These investigators screened a random pentamer library and isolated clones carrying substrates for Factor X and a mutant form of subtilisin. However, these sequences were not tested as solution phase peptide substrates, and hence the predictive nature of the method was not evaluated. We have been developing an analogous method using polyvalent phage. The approaches presented here have enabled us to screen a greater number of phage (and thus larger libraries), to characterize putative substrate clones more quickly, and to generate consensus sequences from these hits. The peptides prepared based on our screen are as good as or better substrates than literature standards for the protease being examined.
To critically assess the
method, we have chosen the MMPs stromelysin and matrilysin as the focus
of our studies. The MMPs represent a family of enzymes that recognize
at least 6 amino acids in their subsites, as shown by the sensitivity
of kcat/Km to the substitution of amino acids in positions P3 to P′3 (2, 5, 14, 15, 16, 17).
In addition, when this work was initiated, only a limited amount of
information was available for stromelysin (8, 18) or matrilysin (16). Using recombinant
forms of these enzymes, we have screened a random hexamer library and
have used the sequence information from positive clones to prepare new,
highly active peptide substrates. The best of these peptides are the
most active stromelysin and matrilysin substrates reported to date. In
addition, previous investigators have always used peptides no smaller
than heptamers out of concern that shorter compounds would not be
sufficiently active. With the availability of optimized amino acids at
each position, we have demonstrated the opposite: that hexapeptides are
superior MMP substrates, a finding not expected at the outset of this
study. Finally, the phage data successfully predicted ways in which
these substrates could be made selective toward matrilysin or
stromelysin.
Most importantly, we have demonstrated that we can identify protease substrates without making any a priori assumptions about the specificity or the ``physiological substrate'' of the enzyme. This approach will be valuable for the study of proteases where peptide substrates are unavailable and sites of cleavage in vivo are unknown.
Figure 2:
Construction of fTC and its derivatives. A, fAFF-tether C: the oligo pair 1200/1201 (see
``Materials and Methods'') was hybridized and ligated into BstXI fAFF-1 (19). The lower line shows the
translated peptide sequence of the pIII protein starting from the
predicted site of signal peptide cleavage, leaving YGGFL at the
NH2 terminus of the phage (19). The underlined peptide sequence (ACLEPYTACD) is the epitope for mAb 179.
B, sequence of inserts of fTC-Good and Bad. These clones
were derived from fTC as described under ``Materials and
Methods.'' The sequences shown start at the end of the mAb 179
epitope (ACLEPYTACD). C, construction of the library. fTC-LIB
(see ``Materials and Methods'') was cleaved and ligated with
the degenerate oligo pair 1439/1440 (I indicates inosines,
which are expected to hybridize to all bases (24), and M = A or C).
The resultant library is shown, where X indicates any amino acid or stop codon. The sequences shown start
at the end of the mAb 179 epitope (ACLEPYTACD) as in B.
Phage peptide display vectors have been used successfully to
identify peptides that bind to proteins such as antibodies. In
contrast, the goal of this work is to use this methodology to identify
peptides that are efficiently cleaved by a specific protease.
In the most commonly used phage display format, the random peptide
sequence is placed at or near the NH2 terminus of the
display protein, generally pIII (19, 22, 23, 24, 25).
In the method described here, we have added an additional functional
group to the NH2 terminus of pIII, a peptide
``tether.'' The tether is a peptide sequence that enables
attachment of the phage through binding of the tether sequence to an
immobile phase: the simplest example being a peptide epitope tether
bound to a monoclonal antibody on an agarose bead. The strategy of the
selection is to subject a population of tether phage to the protease in
solution, and then separate the cleaved (substrate) from the uncleaved
(non-substrate) phage by capturing the undigested phage with a
tether-binding resin. A schematic of a variation of this approach, used
in this work, is shown in Fig. 1.
Figure 1:
Outline of the tether phage
selection. Shown are diagrams of the fTC phage (not drawn to scale for
clarity). The gene III protein extends from the phage body from
the COOH to the NH2 terminus. At the NH2 termini are
the peptide tether and the protease target domain.
The phage are treated in solution with the protease. Phage carrying
substrate sequences (left side) are cleaved in the target
domain, whereas others (right) are not. The entire digest is
treated with antibodies to the tether(s) and captured using a resin that carries protein A. The protein A-antibody-phage complexes are
precipitated by centrifugation, and the phage that are not bound by the
antibodies remain in solution and can be recovered for
amplification.
We have designed and prepared the vector fAFF-tether C (fTC; see ``Materials and Methods'') from the fd-derived plasmid fAFF-1 (19). The salient features of this phage, shown in Figs. 1 and 2, include (i) the target region, which can consist of random amino acids or predetermined sequences (i.e. GOOD and BAD, Fig. 2, B and C) for use as positive or negative controls (the design of the control sequences is described below), and (ii) the tether region. We have employed a dual tether design, in which the tether consists of the epitopes for the anti-dynorphin mAb 3-E7 (YGGFL) (19, 20) and the mAb 179 epitope ACLEPYTACD (see Fig. 2, A and C).
A requirement of this
approach is that the phage molecule be cleaved only in the
``target'' random peptide region. Although this is
anticipated, as filamentous phage are generally viewed as being
protease resistant(26) , we chose to test this proposal. We
prepared two phage clones in fTC, one carrying a good substrate
(fTC-Good) sequence for the MMP stromelysin, and one carrying a poor
substrate (fTC-Bad) (see ``Materials and Methods'' and Fig. 1). We were faced with two choices of known substrate
sequences for stromelysin: those based on substance P (8, 17) or a sequence based on collagenase cleavage
sites(18) . We chose to build a generic MMP good substrate,
Pro-Leu-Ala-Leu-Trp-Ala, based on the results of Netzel-Arnett et
al. (16), that was suitable not only for stromelysin but for
matrilysin as well. As shown later, this peptide has a kcat/Km value for sfSTR
comparable with those of substance P (8) and a
2,4-dinitrophenyloctapeptide fluorogenic substrate(18) . The
control phage were incubated with sfSTR for various times up to 1 h.
The digests were then analyzed by immunoblot analysis using the
anti-tether mAb 179. As shown in Fig. 3A, after
exposure to sfSTR, the pIII protein of fTC-Bad was essentially
undisturbed, whereas the pIII of fTC-Good lost its tether. This
indicates that only the target sequence of the pIII protein was
specifically cleaved by sfSTR, and other regions of the protein were
untouched. Similar results were observed for matrilysin (Fig. 3B), tissue plasminogen activator, and HIV
protease (using appropriate positive controls; data not shown). In
addition, titering of phage (fTC-Good and fTC-Bad) before and after
digest showed no effect of proteolysis (with any of the proteases
tested) on infectivity (data not shown).
Figure 3:
Immunoblot analysis of phage clones
treated with protease. sfSTR (at 5 µg/ml, panel A) or
matrilysin (at 5 ng/ml, panel B) were incubated with fTC-Good
or fTC-Bad (Good or Bad) at 5 10
tu/ml (approximately 10 pM phage) at 37 °C, and
aliquots removed at 0, 20, 40, and 60 min and added to 5 µl of SDS
gel buffer and boiled to quench the reaction. The samples were
subjected to electrophoresis through 12% polyacrylamide gels containing
SDS and transferred to nitrocellulose, which was probed with mAb 179
and detected as described under ``Materials and Methods.''
The positions of the protein standards on the gels are indicated by dots and are Mr 106, 80, 49.5, and 32.5 (panel A only) × 10^-3.
An important concept in phage display is affinity selection. This involves
using limiting levels of receptor or phage proteins in order to isolate
phage carrying peptides of high affinity (27, 28). The
analogous approach for a protease selection would be the use of less
protease in order to drive the selection toward those clones carrying
better substrates (practically speaking, those cleaved at lower or
limiting protease concentrations). As a first test of this proposal, we
examined recovery of good phage from the mock library at decreasing
protease concentration. As shown in Fig. 4, when the mock
library was treated with less sfSTR, the amount of good phage recovered
decreased in an essentially linear fashion. This result indicates
that lower concentrations of protease could be employed to
gain more selectivity (that is, selection of clones that require lower
protease concentrations to be cleaved). This is also a useful test for
determining starting protease concentrations. Working at protease
levels above that which gives 100% recovery uses more protease
than is needed. In contrast, use of too low a protease
concentration could result in the recovery of too few clones.
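The reasoning behind lowering the protease concentration can be illustrated with a minimal kinetic sketch. It assumes that the displayed substrate is far below Km, so that cleavage follows pseudo-first-order decay with rate constant (kcat/Km)[E]; that model, the function name, and all numerical values below are assumptions made for illustration and are not taken from this study.

```python
import math

def fraction_cleaved(kcat_over_km, enzyme_molar, seconds):
    """Fraction of substrate cleaved, assuming [S] << Km (pseudo-first-order decay)."""
    return 1.0 - math.exp(-kcat_over_km * enzyme_molar * seconds)

# Hypothetical specificity constants (M^-1 s^-1); not values measured in this work.
substrates = {"good": 1.0e4, "poor": 1.0e2}
one_hour = 3600.0

for enzyme_nm in (100.0, 10.0, 1.0):
    enzyme_molar = enzyme_nm * 1e-9
    row = {name: fraction_cleaved(k, enzyme_molar, one_hour)
           for name, k in substrates.items()}
    print(f"{enzyme_nm:6.1f} nM  good={row['good']:.3f}  poor={row['poor']:.3f}")
```

Under this model the fraction cleaved is approximately linear in enzyme concentration whenever (kcat/Km)[E]t is small, and only substrates with high kcat/Km are cleaved appreciably once the enzyme is limiting.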
Figure 4:
Recovery of good phage with decreasing
concentrations of sfSTR. A preparation of 10 fTC-Bad phage
(in 100 µl) was spiked with 10
fTC-Good-Kan
phage and treated with varying concentrations of sfSTR for 1 h at
37 °C. The reaction was quenched with EDTA and subjected to one
round of the selection procedure described under ``Materials and
Methods.'' Various dilutions of the final supernatant solution
were titered on both tetracycline and kanamycin. The percentage of
recovery of fTC-Bad and fTC-Good-Kan is shown.
Figure 5: Proteolytic analysis of phage clones. Individual clones from the initial library or round 3 of screening were prepared, digested with sfSTR for 0, 10, or 60 min, dotted onto nitrocellulose, and detected as described under ``Materials and Methods.'' Also shown are identical treatments with fTC-Good and fTC-Bad, showing digestion of the former, but not the latter. Six clones from round three are hits based on this assay.
The nucleotide sequences of a number of positive clones
from round 3 were determined, and the deduced amino acids corresponding
to the random target sequences are shown in Table 2. Also shown
for comparison are sequences from clones chosen randomly from the
library. No pattern is apparent for the negative clones, but some
patterns did appear for the positive sequences. To further characterize
these sequence trends, the selection with sfSTR was pursued for three
more rounds. In addition, since the majority of the clones screened
were judged to be hits at this point, the concentration of sfSTR used
in the selection was dropped 5-fold for round A4 and again at round A6.
In addition, a new selection series (series C) was initiated, this time
starting with only 2 10
tu from the library, and
using an initial sfSTR concentration of 1 µg/ml (see Table 1). As can be
seen in Table 2A, in both the A and C series, the number of
non-reactive clones increases as the selective pressure (decreasing
protease activity) is increased. Positive sequences from the rounds of
series A, as well as those from the most recent selection series (C),
are also shown in Table 2A.
The trends from the 43 sequences
in Table 2 are summarized in Fig. 6. Most invariant was
the B position, where the bulk of clones carried a Pro. In all MMP
substrates examined thus far, Pro appears to be favored in the P3 position (2, 5, 14, 15, 16, 17),
suggesting that we should ``lock in'' the B position as being
equivalent to P3 (this was confirmed by later analysis of
the cleaved peptides; see Table 3). Looking at trends in other
positions, we found stromelysin favored large hydrophobic groups (Leu,
Met, Phe, and Tyr) in position C (P2), Glu or Ala in D
(P1), and Leu or Met in E (P′1). Positions A and
F were not as selective: Ala and Val predominated at A (P4),
whereas hydroxy (Ser and Thr) and small aliphatic residues were favored
at F (P′2). Since the protease seemed to recognize the
N target site of the library as P4-P′2 sites, we have little
information on trends in the P′3 site.
Figure 6:
Frequency analysis of substrate phage
clones for stromelysin (left) and matrilysin (right) listed in Table 3. For each position in the
N library protease target sequence (A-F), the
frequency of occurrence for each of the 20 natural amino acids is
shown. The y axis indicates the number of times a particular
residue occurred in that position. Note that the scale for each
position ranges from 0 to 20 clones, except for position B (under stromelysin), where it is compressed to
0-40 clones, and positions A, C, D,
and F (under matrilysin), where it is expanded
to 0-10 clones.
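The per-position tally summarized in Fig. 6 amounts to counting residues column by column across aligned hexamers. The following is a minimal sketch of that bookkeeping; the function name is hypothetical and the example sequences are made up for illustration, not clones from Table 2.

```python
from collections import Counter

POSITIONS = "ABCDEF"

def position_frequencies(hexamers):
    """Return {position letter: Counter of residues} for six-residue strings."""
    tallies = {pos: Counter() for pos in POSITIONS}
    for seq in hexamers:
        if len(seq) != len(POSITIONS):
            raise ValueError(f"expected 6 residues, got {seq!r}")
        for pos, residue in zip(POSITIONS, seq):
            tallies[pos][residue] += 1
    return tallies

# Made-up sequences for illustration only; these are not clones from this study.
example_clones = ["APLEMS", "VPFELT", "APYAMS", "GAMELT"]
freqs = position_frequencies(example_clones)
for pos in POSITIONS:
    print(pos, freqs[pos].most_common(2))
```

A tally of this kind gives only averaged, per-position information; it does not record which residues occurred together in the same clone.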
The first test peptides prepared were
designed directly from phage clones. The kcat/Km values for these
peptides (3-5) are comparable with, and in some cases better
than, the values for the good control, indicating that the selection
scheme yields high quality substrates. In addition, peptide 6 was prepared based on a consensus sequence of the original 12
hits and was also a good substrate for stromelysin. When an additional
31 positive clones were sequenced, we noted the residues favored in
each position (Fig. 6), and made appropriate substitutions in
preparing synthetic peptides 7-10. Converting P
to Ala from Arg (peptide 7) gave a 5-fold increase in kcat/Km, whereas replacing Leu
with Met in P′1 (peptide 10) gave only a modest
(about 30%) increase in activity. In the P1 position, Glu
dominated over Ala. When we substituted the P1 Ala with Glu
(peptide 9), we saw a 3-fold increase in activity.
Interestingly, Niedzwiecki et al. (17) found Ala
favored over Gln at P1; they did not test Glu, however. As
noted earlier, the P3 position was dominated by Pro. The
only other residue found repeatedly in the B position was Ala, although
at much lower frequency. When Ala was substituted for Pro at P3
(peptide 8), cleavage rates dropped about 3-fold, similar
to the result seen previously (17). The results in Table 3 show that not only can we obtain new peptide substrate
sequences from the positive clones, but that the consensus data
obtained (Fig. 6A) can also yield valuable information
in the design of substrates.
The data for three clones, as well as fTC-Good and Bad, are shown
in Fig. 7. The decrease of log {P}
with time is plotted. The slope for each curve was determined by linear
regression and plotted versus the kcat/Km values (Fig. 7, inset) for the known synthetic peptides (from Table 3).
The relationship is not an ideal line, suggesting that the structure
of the protein flanking the 6 amino acids in the target region
influences the rate of cleavage on the phage relative to the kcat/Km
values for the peptides. Thus, while the assay may
not be able to discriminate between clones with
similar kcat/Km values (fTC-Good, clones A3-3 and A3-9),
the data do indicate that the assay is fairly predictive in
distinguishing poor (fTC-Bad) from highly active substrates (clone
A5-2).
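The slope determination described above is an ordinary least-squares fit of the logarithm of relative phage concentration against time. A minimal sketch follows, using plain Python rather than any particular statistics package; the function name and the densitometry readings are hypothetical and are not the data of Fig. 7.

```python
import math

def decay_slope(times_min, relative_conc):
    """Least-squares slope of log10(relative concentration) versus time."""
    ys = [math.log10(c) for c in relative_conc]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    return sxy / sxx

# Hypothetical readings (fraction of untreated phage remaining); illustration only.
times = [0, 10, 20, 40, 60]
clone_fast = [1.0, 0.5, 0.25, 0.0625, 0.015625]  # halves every 10 min
clone_slow = [1.0, 0.95, 0.90, 0.82, 0.74]

print(decay_slope(times, clone_fast), decay_slope(times, clone_slow))
```

A more negative slope corresponds to faster loss of the tether, so clones can be ranked by slope before any synthetic peptide is made.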
Figure 7:
Quantitative proteolytic
analysis of phage clones. Preparations of phage clones A3-3,
A3-9, A5-2, and fTC-Good and Bad were made and titered.
Equivalent titers of each clone were treated with sfSTR at 5 µg/ml
at 37 °C, and the reaction stopped at the time points indicated,
dotted onto nitrocellulose, and detected as in Fig. 5. The blot
was then scanned by laser densitometry, and the relative phage
concentration at each time point was determined by comparison to a
serial 2-fold dilution of untreated phage. The log of
{P} for each clone is shown at the different time
points. Inset, the slope of each decay curve is plotted
against the known kcat/Km
values of the synthetic peptides (from Table 3).
The discovery of peptide substrates for proteases has
traditionally been a slow and expensive exercise, requiring the
synthesis and testing of large numbers of synthetic peptides. Recently,
several groups have developed innovative methods for preparing and
analyzing large numbers of peptides as protease substrates. A number of
these methods utilize a pool of chemically synthesized peptides and are
used to screen substitutions in one or two positions at a time (30, 31, 32, 33) . A few methods use
recombinant techniques and offer the opportunity to screen large numbers
of peptides (34, 35, 36). None of these
techniques offered a practical approach to screening very large numbers
of substrates until the efforts of Matthews and Wells (13) made
it possible to screen >10 pentameric substrates at once.
We have here presented an analogous system that we have used to screen
>10
hexameric sequences.
In addition, we have introduced a method for assaying putative substrate clones. This assay is simple, rapid, and requires only very small amounts (100 µl) of culture supernatant. With the phage proteolysis assay, the degree of enrichment achieved in each round of selection is easily monitored, allowing adjustment of the selection conditions (i.e. protease concentration) for each subsequent round without delaying the selection process. Moreover, the use of solution phase digests allows us to have precise control over enzyme and substrate concentrations, which is not possible in assay systems where one or more of the assay components is immobilized on beads or microtiter plates.
We have tested our system
with recombinant forms of the MMPs stromelysin and matrilysin, as we
know that substitutions in any of the six positions from P3 to P′3
will affect the specificity constant of the
peptide substrates. This is the only information we used to start our
screen. Our goal was to find substrates as potent as the current
literature standards (8, 16, 17, 18). We were
able to find clones that were as good as or better than these standards
(clones A5-2 and A3-3, Table 2 and Table 3).
Due to the large number of sequences obtained, we could identify trends
toward certain amino acids in some positions. Using the consensus
sequence suggested by the positive phage clones available at the time,
we also designed and tested five consensus peptides
(6-10). We were pleased to find that these consensus
peptides were better than the literature standards, and that after
having prepared only four synthetic peptides based on consensus
results, we had achieved a kcat/Km
value nearly 20-fold better than our positive control. In fact,
peptides 14 and 15 represent both the smallest and the most active peptide substrates of stromelysin and matrilysin,
respectively, described to date.
The use of the phage system also
gave us an unexpected insight into sites of selectivity between the two
enzymes. While initial examination of the data reveals that these
proteases have overlapping substrate specificity, closer inspection of
the phage and peptide results indicates that certain subsites show
distinct preferences: in particular, the opposing preferences for Phe
and Leu at P2 (peptides 14 and 15) and Leu and
Met at P′1 (peptides 6 and 10). Thus, while we
learned of the similarity in substrate preferences for the two enzymes,
there was sufficient data generated to allow the construction of
substrates with differential sensitivities (see for example, peptide
3).
One important piece of information that cannot be derived
from the peptides while they are on phage is the location of the site
of peptide cleavage. We were only able to determine this once synthetic
peptides were prepared and analyzed. It is thus interesting that the
sequences obtained ``lined up'' so well in that the vast
majority of the clones had Pro in their B position. Without this
invariant residue, lining up the hits to build a consensus sequence
might have been more difficult. It is unclear why the predicted P3 position was nearly always located in B, and only rarely in A.
Possibilities include biases due to the sequence flanking the random
N region, as well as those due to steric hindrance from the
tethers and pIII protein.
Another advantage of the phage systems over the screening of pools of synthetic peptides is that we acquire discrete rather than averaged data. The synthetic screening methods described above yield only averaged results for each position, as summarized for the phage data in Fig. 6. The discrete results, shown in Table 2, allow us to search for trends which might indicate interactions between one or more subsites. For example, the most abundant amino acids in the D position (P1) were Glu, followed by Ala. Examination of Table 2 shows that when D = Glu, the residue in the C (P2) position is generally a bulky hydrophobic residue (Tyr, Leu, Met, and Phe). In contrast, when D = Ala, there is no marked preference. Another example can be shown for the eight cases where C = Met, where in six of these seven, E (P′2) is also Met. Yet when C instead is the closely related residue Leu (in nine of the hits), there is no such correlation: the most common residue in the E position in these cases is the hydrophilic Ser (three times) followed by Leu (twice) and then other residues. Although the significance of such correlations remains to be tested, they are intriguing and can lead to models that can be further tested using newly designed substrates. However, these correlations can be made only with techniques that yield discrete results.
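The kind of subsite correlation described above can only be examined when whole sequences are kept together. The following minimal sketch shows one way to tabulate it; the function names are hypothetical and the hexamers are made up for illustration, not the clones of Table 2.

```python
from collections import Counter

def cooccurrence(hexamers, pos_i, pos_j):
    """Count residue pairs seen together at two zero-based positions."""
    return Counter((seq[pos_i], seq[pos_j]) for seq in hexamers)

def conditional_counts(hexamers, pos_i, residue, pos_j):
    """Residue counts at pos_j among sequences carrying `residue` at pos_i."""
    return Counter(seq[pos_j] for seq in hexamers if seq[pos_i] == residue)

# Made-up hexamers for illustration only; not sequences from this study.
clones = ["APMEMS", "VPMAMT", "APLESA", "GPLALT", "APMELS"]
C, E = 2, 4  # library positions C and E as zero-based string indices

print(conditional_counts(clones, C, "M", E))
print(conditional_counts(clones, C, "L", E))
```

Pooled synthetic-peptide screens would supply only the column totals, so a table like the one returned by `cooccurrence` could not be built from them.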
The differences between the monovalent and polyvalent phage
approaches to protease substrate discovery invite comparison of the two
systems. While the monovalent system has been shown to be quite useful,
the polyvalent system possesses certain advantages: 1) all phage can
act as substrate phage in the polyvalent system. In monovalent systems,
it is estimated that only 10% of the phage particles carry one copy of
the recombinant pIII protein(25) . This increases the effective
substrate concentration in the polyvalent system by at least 50-fold,
thus increasing the sensitivity of the system (due to higher
concentration of substrate at a given level of phage). Polyvalent phage
preps give stronger signal on Western blots, making the phage
proteolysis assay possible. 2) Since 90% of the monovalent phage do not
carry pIII fusions, the non-recombinant phage lacking the tether must
be removed prior to selection. This is accomplished by immobilizing the
recombinant phage in microtiter plates coated with tether-binding
protein and treating the phage with protease while immobilized.
Polyvalent phage, being 100% recombinant, are digested in solution
rather than immobilized on a solid surface. The advantages of this are:
(i) there is little restriction on the number of phage that can be
screened in solution, but the surface system limits the number of phage
that can be routinely immobilized on microtiter plates. Scale up of the
solution phase system is thus very convenient when significantly larger
libraries are prepared (i.e. in our first experiment,
10 phage were treated in a single reaction); (ii)
protease resistance of tether binding protein (i.e. mAb) is
not an issue; (iii) solution proteolysis offers more precise
control of cleavage conditions. This has proven especially useful in
the quantitative dot-blot assay.
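The 50-fold figure quoted above can be reproduced by counting displayed peptides per phage particle. The sketch below assumes five copies of the pIII fusion on every polyvalent virion (an assumed value that is not stated in the text) and uses the estimate from the text that 10% of monovalent particles carry a single copy; the symbols n_poly, n_mono, and f_mono are introduced here for illustration.

```latex
\[
\frac{\text{substrate per polyvalent phage}}{\text{substrate per monovalent phage}}
= \frac{n_{poly} \times 1.0}{n_{mono} \times f_{mono}}
= \frac{5 \times 1.0}{1 \times 0.10} = 50
\]
```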
The major disadvantage of the polyvalent system described here is the appearance of non-reactive phage clones, which does not occur in the monovalent system because of the pre-binding step which essentially eliminates clones with defective epitopes. Nothing about the polyvalent method, however, precludes our use of a binding step in later rounds to eliminate non-reactive clones.
In summary, we describe a system that can be used for the routine
isolation of new substrates for poorly characterized endoproteases. The
system is simple and rapid. Few assumptions about the nature of the
selected protease need be made (what is the true physiological
substrate; is it a serine or cysteine protease, etc). In fact, the
protease need not be pure; it should only be free of other proteolytic
(or protease inhibiting) activities. Filamentous phage are valuable
tools for studying proteases as they are generally protease resistant.
We have found no nonspecific degradation due to stromelysin,
matrilysin, HIV protease, or tissue type plasminogen activator. If a protease is found that degrades the phage, one should
be able to treat the phage vector with an excess of protease and select
for mutant phage that have become resistant to proteolysis and use this
modified vector to prepare a new substrate library.
Note Added in Proof-A highly active peptide substrate for stromelysin with
a kcat/Km value of 218,000 was recently identified (Nagase, H.,
Fields, C. G., and Fields, G. B. (1994) J. Biol. Chem. 269, 20952-20957). Although this substrate is more active than our
best substrate, it does contain 11 amino acids (some of which are
unnatural) and is thus much larger than our substrates.