(Received for publication, November 3, 1994; and in revised form, January 9, 1995)
This report describes the cloning and sequencing of a novel
protease gene derived from Streptomyces griseus. Also
described is the heterologous expression of the gene in Bacillus
subtilis and characterization of the gene product. The sprD gene encodes a prepro-mature protease of 392 amino acids
tentatively named S. griseus protease D (SGPD). A significant
component of the enzyme preregion was found to be homologous with the
mitochondrial import signal of hsp60. The sprD gene was
subcloned into an Escherichia coli/B. subtilis shuttle vector
system such that the pro-mature portion of SGPD was fused in frame with
the promoter, ribosome binding site, and signal sequences of
subtilisin. The gene fusion was subsequently expressed in B.
subtilis DB104, and active protease was purified. SGPD has a high
degree of sequence homology to previously described S. griseus proteases A, B, C, and E and the α-lytic protease of Lysobacter enzymogenes, but unlike all previously
characterized members of the chymotrypsin superfamily, the recombinant
SGPD forms a stable dimer. The amino acid sequence of
the protein in the region of the specificity pocket is similar to that
of S. griseus proteases A, B, and C. The purified enzyme was
found to have a primary specificity for large aliphatic or aromatic
amino acids. Nucleotide sequence data were used to construct a
phylogenetic tree using a method of maximum parsimony which reflects
the relationships and potentially the lineage of the chymotrypsin-like
proteases of S. griseus.
Serine proteases catalyze the hydrolysis of amides and esters by a common catalytic mechanism involving a triad of the residues serine, histidine, and aspartic acid. Beyond mechanism, this family of enzymes has two branches that are differentiated from one another by the type of protein fold. One branch comprises enzymes that have a subtilisin-like tertiary structure; the other branch has a chymotrypsin-like tertiary structure. It is commonly believed that the two branches of the family evolved independently and converged upon the same catalytic mechanism (1).
In terms of function,
proteases of the chymotrypsin superfamily are an extraordinarily
divergent group of enzymes. The group encompasses enzymes involved in
mammalian blood clotting cascades, digestive enzymes of the
pancreas (2), enzymes involved in the regulation of the cell
cycle (3), and enzymes involved in the maturation and secretion
of other proteins (4). In a previous study (5), we
isolated two genes of the organism Streptomyces griseus by
virtue of their genetic homology to the chymotrypsin-like S.
griseus protease B (SGPB). In that study the sequence
and preliminary characterization of one of two enzymes, designated S. griseus protease C (SGPC), was presented. SGPC was found to
have a primary specificity for large aliphatic or aromatic amino acids
and, remarkably, possessed a carboxyl-terminal domain with homology to
chitin-binding domains of certain chitinases. We now present the
sequence and preliminary characterization of the gene encoding a second
enzyme, tentatively named S. griseus protease D (SGPD). This
enzyme also has a primary specificity for substrates with large
aliphatic and aromatic side chains, but has an exceptional quaternary
structure. Unlike any known protease of the chymotrypsin superfamily,
SGPD is a stable dimer and, unlike the known homologues found in S.
griseus and Lysobacter enzymogenes, SGPD has an acidic
isoelectric point.
Restriction endonucleases and DNA modifying enzymes were purchased from either New England Biolabs or Life Technologies, Inc., with the exception of T7 DNA polymerase from Pharmacia Biotech Inc. and calf intestinal phosphatase (CIP) from Boehringer Mannheim. All chemicals and reagents were of the highest grade commercially available.
Acetone was added to the
retentate with stirring to a final concentration of 60% (v/v). After
stirring for 10 min, the mixture was centrifuged at 4,000 × g for 15 min, and the pellet was discarded. Acetone was added to the
supernatant to a final concentration of 75% (v/v), and the mixture was
again stirred and centrifuged as above. The pellet from this second
fractionation was resuspended in 150 ml of 100 mM sodium
phosphate (pH 7.0). Proteolytic activity was monitored during all
fractionations.
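The two acetone cuts described above can be checked with a simple dilution calculation. The sketch below assumes that volumes are additive (acetone-water mixtures actually contract slightly on mixing) and uses a hypothetical 1.0-liter starting volume, since the retentate volume is not given here. The function name `acetone_to_add` is illustrative only.

```python
def acetone_to_add(volume, current_fraction, target_fraction):
    """Volume of pure acetone needed to raise a solution from
    current_fraction to target_fraction (v/v), assuming additive volumes."""
    if not 0.0 <= current_fraction < target_fraction < 1.0:
        raise ValueError("fractions must satisfy 0 <= current < target < 1")
    # Solve (current*V + x) / (V + x) = target for x.
    return volume * (target_fraction - current_fraction) / (1.0 - target_fraction)


# Hypothetical 1.0-liter retentate taken to 60% (v/v) acetone.
first_cut = acetone_to_add(1.0, 0.0, 0.60)
# The resulting mixture (ignoring the small pellet volume) taken from 60% to 75%.
second_cut = acetone_to_add(1.0 + first_cut, 0.60, 0.75)
print(f"first addition: {first_cut:.2f} L, second addition: {second_cut:.2f} L")
```

Under these assumptions each step needs about 1.5 volumes of acetone per starting volume of retentate.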
The sample was applied to a 60 × 3-cm
S-Sepharose cation exchange column (Pharmacia) equilibrated with 10
mM sodium phosphate, pH 7.0 (buffer A), in order to remove
cationic contaminants. The column was washed with the same buffer, and
the flow-through was collected in 25-ml fractions. Fractions with
activity toward N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide were pooled
and dialyzed against 10 mM Tris, pH 8.0 (buffer B), overnight
at 4 °C.
The dialyzed sample was applied to a Pharmacia Mono-Q
anion exchange column using a Pharmacia fast protein liquid
chromatography system. The column was washed with buffer B until the absorbance (A) returned to baseline. The enzyme was then eluted in a
salt gradient from 0 to 0.25 M NaCl in buffer B over 60 min. The
proteolytic activity of the recombinant enzyme was monitored during all
purification steps with the chromogenic substrate N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (12) as described previously (5). Fractions with
activity toward N-succinyl-Ala-Ala-Pro-Phe-p-nitroanilide were
analyzed by SDS-PAGE in 12% gels (13), and those fractions
exhibiting a single 34-kDa band were pooled. Protein concentrations
were determined using the method of Lowry (14). The
amino-terminal sequence of the purified protein was determined using an
Applied Biosystems model 473 protein sequencer at the Microsequencing
Center of the University of Victoria, British Columbia, Canada.
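For relating an elution time to a salt concentration, the gradient end points given above are sufficient. The sketch assumes the gradient is linear in time, which the text does not state explicitly; `nacl_molarity_at` is an illustrative name.

```python
def nacl_molarity_at(minutes, start_m=0.0, end_m=0.25, duration_min=60.0):
    """NaCl concentration (M) at a given time, assuming a linear gradient."""
    # Clamp to the gradient window so times outside it return the end points.
    t = min(max(minutes, 0.0), duration_min)
    return start_m + (end_m - start_m) * t / duration_min


for minute in (0, 15, 30, 60):
    print(minute, "min:", round(nacl_molarity_at(minute), 4), "M NaCl")
```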
The 2.3-kbp insert of pDS-D contains a gene,
designated sprD, which encodes a polypeptide of 392 amino
acids (Fig. 1). The organization of the open reading frame is
analogous to that of the previously characterized S. griseus proteases A, B (6), E (7), and C (5) and
the α-lytic protease of L. enzymogenes (20). On
the basis of sequence alignments with the open reading frames of these
protease genes, we concluded that sprD encodes a prepro-mature
form of an uncharacterized serine protease which we designated S.
griseus protease D (SGPD).
Figure 1: Nucleotide sequence of the sprD gene and the deduced amino acid sequence of SGPD. The numbering to the right of the sequences is relative to the first nucleotide in the known sequence and to the first amino acid coded by the gene. The numbering that appears above the sequence is relative to the first amino acid in the mature protease. A putative ribosome binding site is indicated by a series of dots preceding the initiation codon. Inverted repeated regions which follow the termination codon are underlined. Junctions between the pre- and proregions and pro and mature regions are indicated by a closed and an open triangle, respectively.
The pre-pro and pro-mature junctions shown in Fig. 1 were initially assigned on the basis of sequence alignments with the other S. griseus and L. enzymogenes serine proteases (amino-terminal analysis of the mature enzyme confirmed the location of the junction between the pro and mature portions of the polypeptide; see below). Mature SGPD, encompassing the final 188 amino acids of the open reading frame, is preceded by a leader peptide of 204 amino acids. The amino-terminal 64 residues of this leader peptide constitute a pre-peptide while the remaining 140 residues form the propeptide.
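The region lengths quoted above can be cross-checked against the 392-residue open reading frame. The following is a small bookkeeping sketch; the residue ranges it prints are derived only from the stated lengths, not read from Fig. 1.

```python
# Region lengths (amino acids) as stated in the text.
PRE, PRO, MATURE = 64, 140, 188

leader = PRE + PRO          # leader peptide preceding the mature enzyme
total = leader + MATURE     # full open reading frame

# 1-based residue ranges for each region within the open reading frame.
regions = {
    "pre": (1, PRE),
    "pro": (PRE + 1, leader),
    "mature": (leader + 1, total),
}
print(leader, total, regions)
```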
The 64-residue preregion of SGPD is
significantly longer than the preregions of the other proteases which
range in length from 29 to 40 amino acids. The 40-residue
carboxyl-terminal segment of the SGPD preregion is characteristic of
bacterial secretion signals (21) and shares significant
homology with the preregion of SGPB (Fig. 2). The 24
amino-terminal residues form an amino-terminal extension not present in
the other proteases. Interestingly, a computer search of the complete
nonredundant DNA/protein data base revealed that this region shares
significant homology with the mitochondrial signal sequence of hsp60 (22, 23) (Fig. 2). Moreover, the predictive
method of Gavel and von Heijne (24) revealed that residues
1-44 comprise a sequence which is consistent with mitochondrial
import signals, and residues 24-40 have the potential of forming
an amphipathic α-helix with one face highly positively charged, a
motif considered essential for translocation of proteins into
mitochondria (25, 26).
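The idea of a helix with one positively charged face can be illustrated with a helical-wheel projection, in which successive residues of an ideal α-helix are placed 100° apart. The sketch below is not the method of Gavel and von Heijne; it is a simplified score that measures how tightly lysine and arginine residues cluster on one side of the wheel. The peptides are hypothetical and are not the SGPD sequence.

```python
import math

POSITIVE = set("KR")          # residues treated as positively charged (assumption)
DEGREES_PER_RESIDUE = 100.0   # ideal alpha-helix: 3.6 residues per turn


def positive_face_score(peptide):
    """Resultant length (0..1) of the helical-wheel directions of K/R residues.

    Values near 1 mean the positive charges cluster on one face of an ideal
    alpha-helix; values near 0 mean they are spread around the helix.
    """
    angles = [math.radians(i * DEGREES_PER_RESIDUE)
              for i, residue in enumerate(peptide.upper())
              if residue in POSITIVE]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    return math.hypot(x, y) / len(angles)


# Hypothetical peptides, not the SGPD sequence.
same_face = "K" + "A" * 17 + "K"      # lysines 18 residues (five full turns) apart
opposite_faces = "K" + "A" * 8 + "K"  # lysines 9 residues (2.5 turns) apart
print(round(positive_face_score(same_face), 3),
      round(positive_face_score(opposite_faces), 3))
```

A fuller treatment would also weight hydrophobic residues on the opposite face, for example with a hydrophobic moment calculation.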
Figure 2:
Homology of the SGPD preregion with
bacterial and mitochondrial signal sequences. The preregion of SGPD is
shown in alignment with the presequence of the protease SGPB and the
mitochondrial import sequence of human hsp60 (22, 23).
The unusually long preregion of SGPD can be divided into two domains on
the basis of homologies with the prokaryotic secretion and
mitochondrial import signals. The predictive method of Gavel and von
Heijne (24) revealed that residues 1-44 comprise a
sequence that is consistent with mitochondrial import signals. Residues
24-40 have the potential of forming an amphipathic α-helix
with one face highly positively charged, a motif considered essential
for translocation of proteins into
mitochondria (25, 26). The carboxyl-terminal portion
of the preregion (amino acids 45-64) is homologous with the
preregion of SGPB.
The 5′-untranslated region of sprD contains a putative ribosome binding site that was identified by comparison with other Streptomyces gene sequences (6, 27). The translation stop codon is followed by an inverted repeated sequence capable of forming a stable hairpin loop (Fig. 1). Such structures, believed to be involved in transcription termination, have been identified in other Streptomyces genes (6, 27).
Figure 3: Substrate specificity of SGPD. The specific activity of SGPD is shown relative to the protease SGPB, which is known to be chymotrypsin-like in activity. Specific activities were examined for a series of substrates having the general sequence N-succinyl-Ala-Ala-Pro-X-p-nitroanilide; X, the amino acid at the P1 site of the substrate, was varied. The P1 amino acid is indicated to the right of each data point. The data points fall on a line with slope = 1, indicating that the two enzymes have approximately the same substrate specificities.
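The slope criterion used in this comparison can be illustrated with an ordinary least-squares fit. The sketch assumes the comparison is made on log-transformed specific activities, which the legend does not state, and the activity values are invented for illustration; they are not data from this study.

```python
import math


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through paired points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical specific activities (arbitrary units), NOT data from the paper.
sgpb = [0.02, 0.5, 3.0, 40.0]
sgpd = [2.0 * v for v in sgpb]   # same specificity profile, twice the activity

slope = least_squares_slope([math.log10(v) for v in sgpb],
                            [math.log10(v) for v in sgpd])
print(round(slope, 6))
```

On a log-log scale a slope of 1 means the two enzymes rank and space the substrates identically, even if their absolute activities differ by a constant factor.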
SDS-polyacrylamide gel electrophoresis of
SGPC and SGPD provided further evidence that SGPD exists as a very
stable homodimer. Under standard denaturing conditions (see
"Materials and Methods"), SGPC showed a single band
corresponding to a molecular mass of 26 kDa, in good agreement with the
monomeric molecular mass predicted from the DNA sequence of sprC (5). Under the same conditions, SGPD (which
exhibited a single peak by gel filtration chromatography) resolved into
two distinct bands with apparent molecular masses of approximately 17
and 34 kDa, approximating monomeric and dimeric
molecular masses for the protease. Indeed, under
these conditions, SGPD exists mainly in dimeric form (Fig. 4). A
sample prepared under nondenaturing conditions was also included in the
analysis to establish the position of the native form of the enzyme
after electrophoresis. The predominantly negatively charged SGPD (Fig. 5) had a high electrophoretic mobility in its undenatured
form but migrated to a distinct position relative to the monomeric or
dimeric forms of the denatured enzyme. Amino-terminal analysis of the
34-kDa band confirmed the position of the pro-mature junction, ruling
out the possibility that this band represents an unprocessed pro-mature
form of the enzyme.
Figure 4:
SDS-PAGE analysis of SGPD. Polyacrylamide
gel electrophoresis of the proteases SGPC and SGPD. Lane 1,
denatured molecular weight standards; lane 2, SGPC prepared
under "standard denaturing conditions" (see
"Materials and Methods"); lane 3, SGPD prepared
under standard denaturing conditions; lane 4, SGPD prepared
under nondenaturing conditions. SGPC has an apparent molecular mass of
26 kDa (5), whereas, under standard denaturing conditions,
SGPD resolves into two bands with apparent molecular masses of 17 and
34 kDa, approximating monomeric and dimeric
masses for the protease. Electrophoresis of SGPD prepared in
nondenaturing conditions produced a single band clearly distinguishable
from SGPD prepared under standard denaturing
conditions.
Figure 5: Summary of the properties of bacterial chymotrypsin-like serine proteases. The prepro mature organization of the six homologous proteases is illustrated. The homologous preregions are shown in solid boxes, proregions are in open boxes, and mature regions are shaded. The carboxyl- and amino-terminal extensions of the enzymes SGPC and SGPD are cross-hatched. To the right of each illustration is the quaternary structure of the mature protease, the isoelectric point (pI) deduced from sequence information using the computer program PC/Gene (IntelliGenetics, Inc., Mountain View, CA), and the primary amino acid specificity of the enzymes.
Figure 6:
Alignment of protease amino acid
sequences. The best alignments of A, the pro, and B,
the mature regions of the proteases SGPA, B, C, D, E, and α-lytic
protease are shown. Regions of significant homology are indicated in bold and correspond to identities in at least 2 of 2, 3 of 4,
4 of 5, or 4 of 6 of the aligned sequences.
Figure 7:
Phylogenetic tree of the bacterial
proteases SGPA, B, C, D, and E and the α-lytic protease. The tree
was constructed using the nucleotide sequences of the mature regions of
the respective proteases. Nucleotide sequence alignments correspond to
the amino acid alignment shown in Fig. 6B. Pre- and
proregion sequences and the sequence of the carboxyl-terminal domain of
SGPC were not included in the analysis. The number at each of the forks
represents the number of times that the particular grouping (consisting
of the species to the right of the fork) was generated during the 100
bootstrap replicates.
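The bootstrap counts in this legend rest on resampling alignment columns with replacement and rebuilding the tree for each replicate. The sketch below shows only the resampling step, under the assumption of a simple dictionary representation of the alignment; it is not the DNAboot implementation, and the sequences are made up.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps a sequence name to its aligned sequence; all sequences
    must have the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_sites = lengths.pop()
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


# Toy alignment with made-up sequences; names are placeholders only.
toy = {"seq1": "ACGTAC", "seq2": "ACGTTC", "seq3": "ATGTAC"}
rng = random.Random(0)
replicates = [bootstrap_replicate(toy, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Each replicate would then be passed to the tree-building step, and the support for a grouping is the number of replicates in which it reappears.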
In a probe of S. griseus genomic DNA we detected five genes with significant homology to S. griseus protease B (5). We now know that these genes correspond to three well characterized S. griseus proteases (namely SGPA, SGPB itself (6), and SGPE (7)) and two novel proteases (SGPC (5) and SGPD). Hybridization studies and sequence analyses indicated a very close relationship between the mature regions of the five enzymes. Nevertheless, the two newly discovered enzymes were found to be remarkably distinct in aspects of their structure. For example, SGPC has a carboxyl-terminal addition with a high degree of homology to chitin-binding domains and, as discussed below, the recombinant SGPD forms an extraordinarily stable dimer.
In prokaryotes, the preregion acts to signal translational secretion of extracellular enzymes. The preregion sequence of SGPD (Fig. 1), at 64 amino acids, is significantly longer than that of other bacterial proteases, and it can be divided into two parts, an amino-terminal segment with the characteristics of a mitochondrial import signal and a carboxyl-terminal segment characteristic of bacterial secretion signals. These characteristics are most evident when the preregion is aligned with the preregions of SGPB (6) and the mitochondrial heat shock protein hsp60 (22, 23) (Fig. 2). We are aware of no other prokaryotic enzyme with this type of signal sequence. The unusual organization of the preregion in SGPD suggests that the protease has a function distinct from that of other S. griseus proteases.
A recombinant sprD gene was
constructed and expressed in B. subtilis in a system that we
have used successfully to express the proteases SGPB, SGPC (5),
SGPE (7), SGPA, and the α-lytic protease.
The
purified enzyme showed a primary specificity toward large aliphatic and
aromatic amino acids that was virtually identical to that of SGPB (Fig. 3). However, SGPD isolated from B. subtilis culture supernatants proved to be much larger than anticipated
from the nucleotide sequence of sprD. SDS-PAGE gels gave two
bands corresponding to proteins with molecular masses of approximately
34 and 17 kDa (Fig. 4). Gel filtration (size exclusion)
chromatography subsequently established that the enzyme exists in the
form of a stable homodimer. The molecular mass of a monomeric SGPD
should be 18.7 kDa according to sequence data, and consequently, a
homodimer should have a mass of roughly 36 kDa. The high mobility of
the protein in SDS-PAGE is most likely due to the high negative charge
on the protein, although it is possible that SGPD experiences some
limited proteolysis during maturation causing a reduction in its mass.
Amino-terminal analysis of the protein ruled out the possibility that
the 34-kDa band corresponded to an unprocessed pro-mature form of the
protein.
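The mass comparison in this paragraph is simple arithmetic and can be laid out explicitly. The sketch uses only the figures quoted in the text; note that two monomers of 18.7 kDa sum to 37.4 kDa, slightly above the rounded dimer value given above.

```python
MONOMER_KDA = 18.7                 # predicted from the sprD sequence (from the text)
OBSERVED_BANDS_KDA = (17.0, 34.0)  # apparent masses on SDS-PAGE (from the text)

predicted_dimer = 2 * MONOMER_KDA
band_ratio = OBSERVED_BANDS_KDA[1] / OBSERVED_BANDS_KDA[0]
# Fractional shortfall of each apparent mass relative to prediction.
monomer_shortfall = 1 - OBSERVED_BANDS_KDA[0] / MONOMER_KDA
dimer_shortfall = 1 - OBSERVED_BANDS_KDA[1] / predicted_dimer
print(f"predicted dimer: {predicted_dimer:.1f} kDa, band ratio: {band_ratio:.1f}")
print(f"apparent masses run {monomer_shortfall:.0%} and {dimer_shortfall:.0%} low")
```

Both bands fall short of prediction by the same fraction, and the upper band is twice the lower one, which is what a monomer and its homodimer with uniformly high mobility would give.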
It is remarkable that SGPD should have such a high degree of homology to its monomeric cousins and yet form such a stable dimer. Hence, the transition from monomer to dimer (or vice versa) involves relatively few residues in the protein. Given the high negative charge on SGPD (Fig. 5) one might expect the monomers to repel one another. Therefore, dimerization may involve metal chelation, the formation of intermolecular salt bridges, or both. We are currently examining the physical basis for the extraordinary stability of the SGPD dimer in denaturing conditions.
The unusual quaternary structure of SGPD combined with the unusual signal sequence suggests that the enzyme is targeted to a subcellular location. This notion is supported by the fact that SGPD has never been observed in S. griseus secretions even though the substrate specificity of the enzyme is similar to that of the well characterized proteases SGPA and SGPB (Fig. 3). Prokaryotes are not known to contain specific organelles; however, subcellular compartments, or mesosomes, have been observed in Streptomyces (30) and other genera of bacteria (31, 32, 33). Although the significance of mesosomes is controversial, they may have functions similar to the periplasmic spaces of Gram-negative bacteria or even the organelles of eukaryotic cells (34). It is tempting to speculate that SGPD is directed to one of these structures.
The unusual signal sequence of SGPD has implications for the endosymbiont hypothesis, which proposes that mitochondria are derived from bacteria. According to this hypothesis the "mitochondrial" genes of a proto-eukaryotic cell were moved to the nucleus (35) where they had targeting sequences attached to them. Therefore, similarities between bacterial and mitochondrial targeting are to be expected. The fact that the preregion of SGPD contains features of both mitochondrial and prokaryotic signal sequences lends support to the endosymbiont hypothesis, but it also implies that so-called "mitochondrial" targeting sequences predate the existence of mitochondria.
Serine proteases are divided into two groups
according to their structural (tertiary) similarity to either the
enzyme chymotrypsin or subtilisin, and the enzymes that are the subject
of this study are all chymotrypsin-like in structure. Membership in the
chymotrypsin branch of the family can be further divided according to
the dimensions of the enzymes' proregions. With one exception,
chymotrypsin-like enzymes derived from bacteria possess large
propeptide regions. The related mammalian enzymes possess small
propeptides, amounting in some cases to a few amino acids. Studies of
the α-lytic protease have demonstrated the importance of bacterial
proregions in catalyzing the proper folding and maturation of bacterial
enzymes (36, 37). In contrast to the situation in
bacteria, the function of the proregion in mammalian enzymes is to
block the amino terminus of the protease, holding the enzyme in an
inactive state until the propeptide is cleaved from the zymogen. Hence,
the mammalian propeptides appear not to be involved in
"catalyzing" the folding process.
There is considerable
variation in the lengths of proregions even within the group of
bacterial enzymes compared in Fig. 5 and Fig. 6. SGPA and
SGPB appear to fall into one group according to the length of their
proregions, SGPC and the α-lytic protease in another group, and
SGPE and SGPD in a third. This arrangement is also reflected in the
phylogeny produced using the parsimony (DNApars) and bootstrap
(DNAboot) analyses of the mature regions of each protein (Fig. 7).
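The parsimony criterion behind this phylogeny can be illustrated with Fitch's algorithm, which counts the minimum number of substitutions a fixed tree requires. The sketch scores one given topology only; it does not search over topologies and is not the DNApars implementation. Taxon names and sequences are placeholders.

```python
def fitch(tree, states):
    """Return (state_set, changes) for one site on a rooted binary tree.

    tree is a leaf name (str) or a pair (left, right); states maps each
    leaf name to its nucleotide at the site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total minimum number of changes over all sites for a fixed topology."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_sites))


# Toy data: placeholder taxa and made-up sequences.
toy_alignment = {"w": "AAG", "x": "AGG", "y": "GAT", "z": "GGT"}
print(parsimony_length((("w", "x"), ("y", "z")), toy_alignment),
      parsimony_length((("w", "y"), ("x", "z")), toy_alignment))
```

The topology requiring fewer changes is preferred under maximum parsimony; a full analysis repeats this scoring across candidate trees.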
S. griseus trypsin (SGT) is the one exception to the distinction between bacterial and mammalian enzymes. It has a bacterial origin, but in terms of sequence and structure it is more closely related to mammalian trypsin than to other Streptomyces enzymes. The propeptide of SGT is 4 residues in length (38), similar to the proregions of mammalian trypsins (4 residues) (39), and notably, SGT was not detected by hybridization with SGPB. The anomalous relationship of SGT to bacterial and mammalian enzymes has even led some authors to speculate that SGT was acquired from a mammalian source only recently (40, 41). Perhaps a more satisfactory explanation is that the proregions of the bacterial proteases are becoming shorter through the course of evolution and SGT is simply the furthest of the S. griseus enzymes from the ancestral form.
The phylogenetic
tree shown in Fig. 7 indicates that the enzymes SGPC and
α-lytic protease have diverged the most from the other proteases in
the analysis. The six proteases form a monophyletic group beginning
with α-lytic and followed by SGPC, SGPB, SGPA, and finally SGPD and
SGPE. DNA bootstrap analysis strongly supports the relationships,
placing excellent confidence limits on the branches between
α-lytic, SGPC, SGPB, and the trichotomy formed between SGPB and the
remaining three proteases (SGPA, D, and E).
We believe that
α-lytic protease and SGPC are the two most ancient proteases in our
study, primarily because the two enzymes have the most extensive
proregions. Notably, these are also the only two enzymes with three
disulfide bonds instead of two (Fig. 5). It can be argued that
the presence of the two homologous proteases in different genera of
bacteria is evidence that they arose from a common ancestor before the
organisms diverged.
The nucleotide sequence reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number L29019.