(Received for publication, October 23, 1995)
The replicase of equine arteritis virus, an arterivirus, is
processed by at least three viral proteases. Comparative sequence
analysis suggested that nonstructural protein 4 (Nsp4) is a serine
protease (SP) that shares properties with chymotrypsin-like enzymes
belonging to two different groups. The SP was predicted to utilize the
canonical His-Asp-Ser catalytic triad found in classical
chymotrypsin-like proteases. On the other hand, its putative
substrate-binding region contains Thr and His residues, which are
conserved in viral 3C-like cysteine proteases and determine their
specificity for (Gln/Glu)↓(Gly/Ala/Ser) cleavage sites. The
replacement of the members of the predicted catalytic triad (His-1103,
Asp-1129, and Ser-1184) confirmed their indispensability. The putative
role of Thr-1179 and His-1198 in substrate recognition was also
supported by the results of mutagenesis. A set of conserved candidate
cleavage sites, strikingly similar to junctions cleaved by 3C-like
cysteine proteases, was identified. These were tested by mutagenesis
and expression of truncated replicase proteins. The results support a
replicase processing model in which the SP cleaves multiple
Glu↓(Gly/Ser/Ala) sites. Collectively, our data characterize the
arterivirus SP as a representative of a novel group of
chymotrypsin-like enzymes, the 3C-like serine proteases.
The proteolytic processing of viral precursor proteins, composed
of structural and/or replicative subunits, is a crucial step in the
life cycle of the majority of positive-stranded RNA
viruses(1, 2, 3) . The processing of
replicase protein precursors is almost exclusively conducted by
specific virus-encoded proteases, most of which are assumed to be
distantly related to papain-like or chymotrypsin-like (CHL) cellular proteases (4). The only common property
of viral and cellular CHL enzymes appears to be the distinctive double
β-barrel fold. Viral CHL proteases do not contain the set of
conserved disulfide bonds found in their cellular counterparts, which
may be due to the fact that they have to function in the reducing
environment of the cytosol. Only a few established viral CHL enzymes,
like the alphavirus capsid protease and the flavivirus NS3 proteases,
contain the His-Asp-Ser catalytic triad that is found in the cellular
world(2, 5, 6, 7, 8) . In a
large number of viral CHL proteases, the picornavirus 3C protease being
the prototype, Cys replaces Ser as the catalytic nucleophile. In
addition to this substitution, a number of these "3C-like"
enzymes utilize Glu instead of the active site
Asp(5, 9, 10, 11) .
Unlike most of their cellular counterparts, viral CHL enzymes possess high substrate specificity, which is sometimes conserved among related viruses. The sites cleaved by 3C-like Cys proteases, for example, are generally similar. They usually contain Gln or Glu at the P1 position and a small amino acid residue (Gly, Ala, or Ser) at the P1′ position (3, 12). Two domains in particular are thought to determine this conserved substrate specificity. These regions are separated in the primary structure of the enzyme (Fig. 1B) but are spatially juxtaposed. They form part of a wall and the bottom of the substrate-binding pocket (10, 11) and contain a highly conserved His and Thr residue, respectively (5, 9). The importance of these residues for substrate recognition is underlined by the fact that other amino acids are found at these positions in the NS3 Ser proteases from the family Flaviviridae (Fig. 1B). The NS3 proteases of the three different genera of this family (flaviviruses, pestiviruses, and hepatitis C virus) cleave at sites that are genus-specific, and the P1 residues of these sites do not resemble those recognized by 3C-like Cys proteases.
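The P1 and P1′ designations used here follow the common Schechter and Berger convention: residues on the N-terminal side of the scissile bond are numbered P1, P2, and so on moving away from the bond, and those on the C-terminal side are numbered P1′, P2′, and so on. The short Python sketch below illustrates that labeling only. The function name and the toy peptide are assumptions made for illustration and are not taken from any sequence in Fig. 1B.

```python
def label_subsites(peptide, p1_position, flank=2):
    """Label residues around a scissile bond.

    peptide      -- one-letter amino acid string (toy example, not a real sequence)
    p1_position  -- 1-based position of the P1 residue; the bond lies just after it
    flank        -- number of residues to label on each side of the bond
    """
    labels = {}
    for i in range(1, flank + 1):
        n_side = p1_position - i        # 0-based index of residue P(i)
        c_side = p1_position + i - 1    # 0-based index of residue P(i)'
        if n_side >= 0:
            labels[f"P{i}"] = peptide[n_side]
        if c_side < len(peptide):
            labels[f"P{i}'"] = peptide[c_side]
    return labels


# Hypothetical peptide with a Gln-Gly bond between positions 3 and 4.
print(label_subsites("LVQGPA", p1_position=3))
```

For the toy peptide, Gln is reported as P1 and Gly as P1′, which is the pattern described above for sites cleaved by 3C-like Cys proteases.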
Figure 1: A, preliminary processing scheme of the EAV replicase ORF1a protein(20) . The three EAV protease domains(13, 19, 22) , corresponding cleavage sites, and the estimated sizes (SDS-PAGE) of the previously identified cleavage products Nsp1 to Nsp6 are depicted. B, sequence alignment of the most important regions of viral and cellular CHL enzymes. The upper block of sequences contains a representative set of 3C-like Cys proteases. The middle block is comprised of (putative) 3C-like Ser proteases and includes the Nsp4 SP sequences of three arteriviruses, EAV, lactate dehydrogenase-elevating virus, and porcine reproductive and respiratory syndrome virus. The proteases in this block were selected from sequence data bases in the course of comprehensive searches that were initiated with three arterivirus proteases and then extended to other proteases containing the characteristic Thr and His residues in their substrate-binding pocket (see text). The lower block contains chymotrypsin, the prototype enzyme, and a set of related (not 3C-like) viral Ser proteases. The numbers on either side of the sequences indicate the distance to the (predicted) N- and C-terminal residue of the protease or protease-containing cleavage product. The EMBL/GenBank (GB) or SwissProt (SP) data base accession numbers of the sequences used in this figure are indicated. Active site residues are highlighted in black (reverse type). The conserved Thr and His residues discussed in this paper are shown in bold and indicated with @. Other important residues from the putative substrate-binding region are indicated with #. In the alignment of the BLIC BLASE protease, a stretch of four amino acids (Arg-Thr-Asn-Cys) has been deleted between two Ser residues shown in lower case. 
HAV, hepatitis A virus; HRV14, human rhinovirus 14; TEV, tobacco etch virus; RHDV, rabbit hemorrhagic disease virus; CPMV, cowpea mosaic virus; TBRV, tomato black ring virus; H-AstV, human astrovirus; PEMV, pea enation mosaic virus; BYDV, barley yellow dwarf virus; BWYV, beet Western yellows virus; CAYV, cucurbit aphid-transmitted yellow virus; PLRV, potato leaf roll virus; SBMV, Southern bean mosaic virus; CMV, cocksfoot mottle virus; RYMV, rice yellow mottle virus; MBV, mushroom bacilliform virus (Australian isolate); SAUR ETA, ETB, and V8, Staphylococcus aureus exotoxins A and B and V8 protease, respectively; EFAE, Enterococcus faecalis; sprE, serine proteinase E; BSUB, Bacillus subtilis; MPR, metalloprotease; BLIC, Bacillus licheniformis; BLASE, B. licheniformis protease; SGRI, S. griseus; PrE, protease E; SFRA, Streptomyces fradiae; SP1, serine protease 1; HeCV, hepatitis C virus; YFV, yellow fever virus; SNBV, Sindbis virus; BOVI CHT, bovine chymotrypsin.
Equine arteritis virus (EAV) (13) is the prototype and best studied member of the arteriviruses, a recently recognized group of positive-stranded RNA viruses with a polycistronic genome of between 12 and 16 kilobases(13, 14, 15, 16) . The organization and expression of the arterivirus genome are strikingly similar to those of coronaviruses, and evidence for the common ancestry of the replicases of both virus families has been reported previously(13, 14, 17, 18) . The arterivirus replicase gene is comprised of two open reading frames (ORFs), ORF1a and ORF1b, which are connected by a ribosomal frameshift site(13) . Translation of the EAV genome yields an ORF1a polyprotein and a C-terminally extended ORF1ab protein consisting of 1727 and 3175 amino acids, respectively. These products are proteolytically processed (19, 20, 21, 22) , and we have recently published a preliminary processing scheme for the ORF1a polyprotein (Fig. 1A, (20) ). In infected cells, this 187-kDa precursor was found to be cleaved at least five times, resulting in the generation of nonstructural proteins (Nsp) 1-6. We have reported the characterization of two ORF1a-encoded Cys autoproteases. A papain-like protease in Nsp1 cleaves the Nsp1/2 junction(19) . Likewise, Nsp2 is generated by autoproteolysis, and an unusual Cys protease was recently identified in the conserved N-terminal domain of Nsp2(22) . The presence of a third EAV protease domain could be predicted from comparative sequence analysis(13, 14) . The arterivirus Nsp4 region contains sequence motifs characteristic of cellular and viral CHL Ser and Cys proteases (Fig. 1B). For EAV, a putative catalytic triad consisting of His-1103, Asp-1129, and Ser-1184 was identified (Fig. 1B, (13) ). These residues were later shown to be conserved in the sequences of two other arteriviruses, porcine reproductive and respiratory syndrome virus (15) and lactate dehydrogenase-elevating virus(16) .
Interestingly, the predicted arterivirus Ser protease (SP) also contains sequence elements typical of the substrate-binding pocket of 3C-like Cys proteases (Fig. 1B) and was accordingly postulated to be a 3C-like Ser protease(13, 14) . Our present analysis of the EAV Nsp4 SP constitutes the first experimental characterization of such a protease. We show that the arterivirus SP indeed combines the catalytic system of a CHL Ser protease with the substrate specificity of a 3C-like Cys protease and should therefore be considered as the prototype of a novel group of proteolytic enzymes.
Figure 5: SDS-PAGE of native and synthetic Nsp2, Nsp3, Nsp4, and Nsp34. A, scheme of the full-length ORF1a protein and the set of four truncated expression products. The location of putative cleavage sites and the calculated sizes of the interlying sequences are indicated. B, native and synthetic ORF1a cleavage products were immunoprecipitated from infected and transfected cell lysates, respectively. They were electrophoresed, side by side, in the same gel. Note that the vaccinia virus infection in the expression system induced a set of background bands (especially in the 30-50-kDa region), which were absent from the lanes derived from EAV-infected cells. For the left panel, the immunoprecipitation was carried out with the Nsp2-antiserum, which coprecipitates Nsp3(20) . For the right panel, a mixture of two Nsp4-antisera was used.
Figure 2:
Mutagenesis of the predicted active site
residues of the EAV Nsp4 SP. A, schematic representation of
the processing of the Nsp3456 region of the wild-type ORF1a protein
(expressed from construct pL1a). The residues of the predicted
catalytic triad are depicted in bold. The autoradiograph shows
the results obtained with pL1a (wt) and a set of mutant
pL1a-derived constructs encoding proteins with a single amino acid
substitution. The constructs were expressed using the vaccinia virus/T7
system (34), and expression products were immunoprecipitated
with an antiserum directed against the C-terminal 220 amino acids of
the ORF1a protein (α5 serum (20)). The result from a
control transfection without DNA (mock) is also shown. B,
immunoprecipitation analysis of the same cell lysates using two
Nsp4-specific peptide antisera (α4M and α4C (20)).
The replacement
of His-1103 (by Gly or Arg) and the substitution of Ser-1184 (by Cys,
Phe, Ile, or Tyr) completely inactivated Nsp3456 processing.
Substitution of Asp-1117 (by either Asn or Thr) had no effect. When
Asp-1129 was replaced by Lys or Val, the Nsp3456 precursor was not
cleaved at all. Interestingly, the mutant carrying the conservative
Asp-1129 → Glu substitution was able to cleave the Nsp4/5 site
with wild-type efficiency but did not process the Nsp3/4 and Nsp5/6
junctions at a detectable level (Fig. 2; see also
"Discussion"). These results formed the first experimental
proof for a proteolytic function of EAV Nsp4 in the processing of
Nsp3456 into the Nsp34 and Nsp56 intermediates and, subsequently, into
the previously described end products Nsp3 to Nsp6(20) . The
mutagenesis results were in agreement with the previously published
active site predictions(13, 14, 15) .
Figure 3:
Mutagenesis of the most conserved residues
in the predicted substrate-binding pocket of the EAV Nsp4 SP. A number
of substitutions were introduced at the position of Thr-1179 and
His-1198. Expression and immunoprecipitation using the α5 antiserum
were carried out as described for Fig. 2A.
The replacement of His-1198 by
Leu, Arg, or Tyr completely abolished processing of the two sites
monitored (Nsp4/5 and Nsp5/6). This was also the case when Thr-1179 was replaced by
Asp. The Thr-1179 → Ser and Thr-1179 → Gly mutants cleaved
the Nsp4/5 site with reduced efficiency (approximately 80 and 10% of
wild-type efficiency, respectively) but were unable to cleave the
Nsp5/6 site at all. The protease containing a Thr-1179 → Asn
mutation retained wild-type activity toward the Nsp4/5 junction but
cleaved the Nsp5/6 site with increased (approximately double)
efficiency. These results demonstrated that Nsp4 SP activity is highly
sensitive to even subtle replacements at the positions of Thr-1179 and
His-1198, an observation that is fully compatible with the predicted
role of these residues in substrate recognition.
On the basis of cleavage site preferences of other 3C-like
proteases, Godeny et al.(14) had previously predicted
three SP cleavage sites in the C-terminal half of the arterivirus ORF1a
protein. These proposed sites (downstream of residues 1064, 1268, and
1430 in EAV; Fig. 4A) are conserved in the three
arterivirus sequences known to date. They contain a Glu residue at the
P1 position and a Gly or Ser at the P1′ position (see also Fig. 6). However, the estimated sizes (from
SDS-PAGE (20)) of Nsp4 and Nsp5 in particular were difficult to
reconcile with the predicted cleavages downstream of amino acids 1268
and 1430 (Fig. 4A). On the other hand, a conserved
candidate Nsp5/6 cleavage site could be identified: a Glu-Gly dipeptide
at amino acids 1677-1678 (see also Fig. 6). Therefore, we
tested the effect of a Glu → Pro substitution at the conserved P1
position of each of the four candidate cleavage sites. This mutation
was expected to interfere strongly with processing by a 3C-like
protease.
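The sequence-based search for candidate sites described above can be pictured as a simple motif scan. The Python sketch below is illustrative only: it assumes the profile stated in the text (Glu at P1 followed by Gly, Ser, or Ala at P1′), the function name is hypothetical, and the input is a made-up toy peptide rather than the EAV ORF1a sequence. A match marks a candidate only and says nothing about whether the site is conserved or actually cleaved.

```python
import re

# Glu (E) at P1, with Gly, Ser, or Ala required at P1' via a lookahead,
# so that the reported match is the P1 residue itself.
SITE_PATTERN = re.compile(r"E(?=[GSA])")


def candidate_p1_positions(protein, offset=0):
    """Return 1-based positions of Glu residues followed by Gly, Ser, or Ala.

    offset -- added to each position, for fragments that do not start at residue 1
    """
    return [m.start() + 1 + offset for m in SITE_PATTERN.finditer(protein)]


toy = "MKEGLLEAPWESQEK"  # hypothetical sequence, not from EAV
print(candidate_p1_positions(toy))  # [3, 7, 11]
```

In the toy sequence the final Glu is followed by Lys and is therefore not reported. In practice such hits would still have to be filtered by conservation among related sequences, as was done for the sites discussed here.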
Figure 4:
Mutagenesis of putative cleavage sites for
the EAV Nsp4 SP. A, schematic representation of ORF1a protein
processing. The previously observed sizes of the six cleavage products
(SDS-PAGE estimates, (20) ) are shown at the top of
the figure. The putative Nsp2/3 cleavage site and four candidate SP
cleavage sites within Nsp3456, all derived from comparative sequence
analysis, are indicated, and the amino acid number of the P1 residue of
each site is shown. B, expression of ORF1a protein mutants
carrying a Pro substitution at the P1 position of the five candidate
cleavage sites. See Fig. 2 for experimental procedures.
Immunoprecipitation was carried out using the α5 antiserum and a
peptide antiserum directed against Nsp2 (α2 (20)). C, immunoprecipitation analysis of a subset of the samples
using two Nsp4 antisera.
Figure 6: Alignment of (putative) cleavage sites for the arterivirus Nsp4 SP in the ORF1a polyprotein. Two possible additional cleavage sites located within Nsp5 have also been included. The P1 and P1′ residues are highlighted in black (reverse type).
Three of the four mutations indeed completely inhibited
one of the previously described cleavages (Fig. 4, B and C). Specifically, the Glu-1064 → Pro mutant did
not produce Nsp4 but did generate Nsp34, Nsp56, and Nsp5, indicating
that only Nsp3/4 cleavage was blocked. After expression of the Glu-1268 → Pro
mutant, Nsp4, Nsp56, and Nsp5 were not detected, suggesting
that the Nsp4/5 junction was not processed. The Glu-1677 → Pro
substitution resulted in the generation of Nsp4 and Nsp56, but the
latter was not processed at the Nsp5/6 site as judged by the absence of
fully cleaved Nsp5. Thus, the processing of the Nsp3/4, Nsp4/5, and
Nsp5/6 junctions was completely inhibited by substitution of conserved
Glu residues at positions 1064, 1268, and 1677, respectively,
suggesting that these residues were indeed located at (or close to)
these cleavage sites. The Glu-1430 → Pro substitution at the
fourth candidate cleavage site did not influence the generation of
Nsp4, Nsp56, or Nsp5 (Fig. 4B).
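The logic of these P1 substitution experiments can be sketched as a small bookkeeping exercise: if one junction is blocked, the two flanking products should remain fused. The Python sketch below is a deliberately simplified model and not a description of the actual processing pathway. It assumes that every unblocked junction is cleaved independently and to completion, and it takes residue 832 as the start of the Nsp3456 region on the basis of the putative Nsp2/3 site after Gly-831. The function name is hypothetical.

```python
def end_products(first, last, p1_sites, blocked=()):
    """Split residues first..last after every P1 site that is not blocked.

    Returns a list of (start, end) residue ranges. Assumes each remaining
    junction is cleaved independently and to completion (a simplification).
    """
    products = []
    start = first
    for p1 in sorted(p1_sites):
        if p1 in blocked:
            continue                      # junction not cleaved; keep extending
        products.append((start, p1))
        start = p1 + 1
    products.append((start, last))
    return products


# P1 residues from the text: 1064 (Nsp3/4), 1268 (Nsp4/5), 1677 (Nsp5/6).
# ORF1a ends at residue 1727; 832 is the assumed first residue of Nsp3.
SITES = (1064, 1268, 1677)
print(end_products(832, 1727, SITES))                  # all junctions cleaved
print(end_products(832, 1727, SITES, blocked={1064}))  # Glu-1064 -> Pro
```

With residue 1064 blocked, the model leaves a fused 832-1268 fragment, which corresponds to the Nsp34 species observed for the Glu-1064 → Pro mutant. With residue 1677 blocked, it leaves a fused 1269-1727 fragment, matching the unprocessed Nsp56 seen for the Glu-1677 → Pro mutant.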
Previously, the Nsp2/3 junction had
been estimated to be located close to amino acid 825 in the EAV ORF1a
protein(22) . Residues 834-852 are extremely hydrophobic
and, therefore, are unlikely to be accessible to the Nsp2
autoprotease(20) . This protease is most similar to viral
papain-like proteases(22) , which often cleave between two
small amino acid residues (3). Taken together, these
considerations pointed toward a Gly-Gly dipeptide at position
831-832, which is conserved in other arteriviruses, as the most
likely EAV Nsp2/3 cleavage site. To test this hypothesis, a Gly → Pro
substitution was introduced at the putative P1 position of this
site (residue 831). This mutation indeed completely inhibited the
generation of Nsp2 (Fig. 4B), but surprisingly all
downstream cleavages were also abolished, suggesting that cleavage of
the Nsp2/3 junction is a prerequisite for processing of the downstream
Nsp3456 region.
Using the putative Nsp2/3, Nsp3/4, and Nsp4/5 cleavage sites (downstream of residues 831, 1064, and 1268, respectively), expression vectors were created that encoded "synthetic Nsps" with N and C termini that closely resembled those of native Nsps (Fig. 5A). These expression products were immunoprecipitated and compared in SDS-PAGE with proteins from infected cell lysates. Samples were run side by side in the same gel (Fig. 5B).
To produce synthetic Nsp2
and Nsp3, we employed the autoproteolytic properties of Nsp1 and Nsp2.
As a result, the products expressed from pL(1-831) and
pL(1-1064) contained natural N termini and only differed from the
native Nsp2 and Nsp3 in having a three-residue C-terminal extension.
Since an anti-Nsp3 serum is not available, Nsp2 and Nsp3 were
precipitated and coprecipitated(20) , respectively, using the
anti-Nsp2 serum. The expression products comigrated with the native
Nsp2 and Nsp3 derived from infected cells (Fig. 5B, α2 panel), thereby suggesting that the putative Nsp2/3 and
Nsp3/4 borders are correct.
Subsequently, synthetic Nsp34 and Nsp4
were expressed from plasmids pL(832-1268) and
pL(1065-1268), respectively. Surprisingly, the synthetic Nsp4 of
21 kDa and the native Nsp4 molecule comigrated perfectly at a rate
corresponding to a molecular mass of 31 kDa (Fig. 5B, α4 panel). This result indicated that the putative N- and
C-terminal boundaries of Nsp4, which had been inferred from sequence
comparison and mutagenesis, were correct and that the migration of Nsp4
during SDS-PAGE is extremely aberrant. This conclusion was also
supported by the fact that the synthetic Nsp34 comigrated with its
native equivalent and cleaved itself, apparently at the correct Nsp3/4
junction since a product comigrating with Nsp4 was observed (Fig. 5B,
α4 panel).
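The discrepancy between the calculated and apparent sizes of Nsp4 can be checked with a back-of-the-envelope estimate. The Python sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not a sequence-based calculation. The residue boundaries are those of pL(1065-1268), and the function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb value, not a measured mass


def approx_mass_kda(first, last):
    """Rough mass estimate for residues first..last (inclusive) of a polyprotein."""
    n_residues = last - first + 1
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


nsp4 = approx_mass_kda(1065, 1268)  # boundaries of the synthetic Nsp4 construct
print(f"Nsp4: {1268 - 1065 + 1} residues, roughly {nsp4:.1f} kDa")
print(f"apparent/calculated ratio: {31 / 21:.2f}")
```

The estimate for the 204-residue product is roughly 22 kDa, close to the calculated 21 kDa and well below the 31 kDa apparent mass, so the protein migrates as if it were almost 1.5 times larger than its calculated size.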
Compared to the viral CHL proteases mentioned above, the arterivirus Nsp4 SP is unusual. On the one hand, it employs the canonical triad of catalytic residues found in most cellular and some viral CHL enzymes (His-1103/Asp-1129/Ser-1184). The indispensability of these three residues for SP activity was confirmed by extensive site-directed mutagenesis (Fig. 2). On the other hand, the sequence analysis and mutagenesis of residues from the presumed substrate-binding pocket (Fig. 3) revealed that the arterivirus SP also possesses properties found in 3C-like Cys proteases. These were underlined by the identification of three putative SP cleavage sites that fit the 3C-like profile (Figs. 4 and 6). The fact that a Glu substitution was tolerated (to a certain extent) at the position of catalytic Asp-1129 (Fig. 2) can be interpreted as an additional similarity between the arterivirus SP and 3C-like Cys proteases. A fraction of the latter group employs Glu instead of Asp, and for some 3C-like enzymes the interchangeability of Glu and Asp has been documented (5, 36, 37, 38, 39, 40, 41, 42).
Recent x-ray data for the human rhinovirus-14 3C protease revealed
that Thr-141 and His-160, the counterparts of EAV Thr-1179 and
His-1198, do indeed reside in the substrate-binding pocket and are
likely to be responsible for the specificity of this enzyme for
Gln(Glu)↓Gly(Ala/Thr) cleavage sites (11). Thr-1179 and
His-1198 of the EAV Nsp4 SP (Fig. 1B) were probed by
site-directed mutagenesis (Fig. 3). Their replacement showed a
pronounced, albeit variable effect depending on the position mutated
and the cleavage site analyzed. Three substitutions at the position of
His-1198 all completely abrogated processing of both the Nsp4/5 and
Nsp5/6 sites. This is compatible with the drastic effect of mutagenesis
of the equivalent His residue in 3C-like Cys
proteases(43, 44, 45, 46, 47) ,
although a certain tolerance for a His → Leu substitution was
reported for the poliovirus 3C protease (48). Processing of the
Nsp5/6 junction was affected by all seven mutations tested at the
positions of Thr-1179 and His-1198. Whereas six of these replacements
fully disabled cleavage, it was remarkable that the Thr-1179 → Asn
substitution enhanced processing of this site. Three out of four
mutants at the Thr-1179 position partially (Gly and Ser) or fully (Asn)
retained their Nsp4/5 cleavage efficiency. Only one comparable
mutation, a Thr → Ser substitution that partially inactivated the
human rhinovirus-14 3C Cys protease, has been described
previously(42) . Thus, our data considerably extend the
information on the role of this residue in 3C-like proteases and
suggest a relationship between the size of the amino acid introduced
and substrate recognition, provided that the replacing residue does not
carry an acidic side chain.
The EAV SP cleavage sites could not yet be determined by direct sequence analysis of cleavage products. However, three independent, albeit indirect methods were employed, and all results point toward the same conclusion. First, conserved Nsp3/4, Nsp4/5, and Nsp5/6 cleavage sites were identified by sequence comparison (Fig. 6). Second, mutagenesis of the P1 Glu residues completely abolished processing of these junctions (Fig. 4). Third, the results of the comigration assay (Fig. 5) supported the tentative identification of the Nsp3/4 and Nsp4/5 boundaries and revealed that the true molecular mass of the arterivirus SP is 21 kDa instead of the 31 kDa observed upon SDS-PAGE. Taken separately, none of these results unequivocally proves the 3C-like substrate specificity of the arterivirus SP. Collectively, however, these data justify the classification of the arterivirus SP as a 3C-like Ser protease, especially since they are supplemented by the results from comparative sequence analysis and mutagenesis of the substrate-binding region (Fig. 3).
The Thr and His residues implicated in the substrate recognition of viral 3C-like proteases are also conserved in a group of CHL proteases of eubacterial origin (in two of these proteases the Thr residue is replaced by the similar Ser residue; Fig. 1B). The x-ray analysis of the PrE protease of Streptomyces griseus (Fig. 1B) directly implicated the counterparts of EAV Thr-1179 and His-1198 in the determination of substrate specificity(54) . This enzyme, and others that were characterized in terms of their cleavage site preference, cleaves most efficiently after a Glu residue(52, 55, 56, 57) . Although these enzymes can be distinguished from their viral relatives by having no detectable P1` preference and by cleaving after Gln with low efficiency, their other features still support the idea that a cellular branch of 3C-like proteases may exist.
The evolutionary
relationships between the various 3C-like Ser proteases and other CHL
Ser and Cys proteases are yet to be inferred. On the basis of
theoretical considerations, it has been suggested that in evolution an
ancestral Cys protease may have preceded the CHL Ser
proteases(58) . In this respect, it is very interesting that
the arteriviruses share a set of conserved replicase domains and a
number of other properties with coronaviruses, a virus family that
encodes a 3C-like Cys enzyme rather than the Ser protease found in
arteriviruses (13, 14, 17, 18, 59, 60) .
Both proteases occupy the same relative position in the replicase
polyprotein. Recently, a number of active site predictions for
coronavirus 3C-like proteases have been experimentally
confirmed(61, 62, 63) , and these proteases
were shown to be able to cleave at Gln↓(Ser/Ala/Gly) dipeptides.
In the light of these and other similarities, it seems likely that the
arterivirus and coronavirus 3C-like proteases are related by common
ancestry, even though no significant primary structure similarity can
be detected (data not shown). Therefore, the Ser and Cys branches of
the CHL protease family may have a common root in RNA virus evolution.
It was remarkable that, compared to the Nsp4/5 cleavage, processing of the Nsp5/6 site was much more sensitive to replacements in the SP (Asp-1129 and Thr-1179; Figs. 2A and 3). This indicates that additional factors may influence polyprotein processing, such as the ability of the SP to act in cis or in trans, the structural properties of the cleavage sites, and the association of protease and/or substrates with cofactors. EAV-infected cells contain both Nsp4 and an SP-containing Nsp34 processing intermediate (20). Previously, we have shown that Nsp2 and Nsp3 remain associated after processing of the Nsp2/3 site (20), indicating that a complex of Nsp2 and Nsp34 will be generated. Since a regulatory role for nonstructural polyprotein processing is emerging for many other viral systems, the Nsp2-34 complex and the fully cleaved Nsp4 may be proteases with different roles in the arterivirus life cycle.