©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
The Arterivirus Nsp4 Protease Is the Prototype of a Novel Group of Chymotrypsin-like Enzymes, the 3C-like Serine Proteases (*)

(Received for publication, October 23, 1995)

Eric J. Snijder (1)(§) Alfred L. M. Wassenaar (1) Leonie C. van Dinten (1) Willy J. M. Spaan (1) Alexander E. Gorbalenya (2)(¶)

From the  (1)Department of Virology, Institute of Medical Microbiology, Leiden University, The Netherlands and the (2)M. P. Chumakov Institute of Poliomyelitis and Viral Encephalitides, Russian Academy of Medical Sciences, 142782 Moscow Region, Russia


ABSTRACT

The replicase of equine arteritis virus, an arterivirus, is processed by at least three viral proteases. Comparative sequence analysis suggested that nonstructural protein 4 (Nsp4) is a serine protease (SP) that shares properties with chymotrypsin-like enzymes belonging to two different groups. The SP was predicted to utilize the canonical His-Asp-Ser catalytic triad found in classical chymotrypsin-like proteases. On the other hand, its putative substrate-binding region contains Thr and His residues, which are conserved in viral 3C-like cysteine proteases and determine their specificity for (Gln/Glu)↓(Gly/Ala/Ser) cleavage sites. The replacement of the members of the predicted catalytic triad (His-1103, Asp-1129, and Ser-1184) confirmed their indispensability. The putative role of Thr-1179 and His-1198 in substrate recognition was also supported by the results of mutagenesis. A set of conserved candidate cleavage sites, strikingly similar to junctions cleaved by 3C-like cysteine proteases, was identified. These were tested by mutagenesis and expression of truncated replicase proteins. The results support a replicase processing model in which the SP cleaves multiple Glu↓(Gly/Ser/Ala) sites. Collectively, our data characterize the arterivirus SP as a representative of a novel group of chymotrypsin-like enzymes, the 3C-like serine proteases.


INTRODUCTION

The proteolytic processing of viral precursor proteins, composed of structural and/or replicative subunits, is a crucial step in the life cycle of the majority of positive-stranded RNA viruses (1, 2, 3). The processing of replicase protein precursors is almost exclusively conducted by specific virus-encoded proteases, most of which are assumed to be distantly related to papain-like or chymotrypsin-like (CHL)(^1) cellular proteases (4). The only common property of viral and cellular CHL enzymes appears to be the distinctive double beta-barrel fold. Viral CHL proteases do not contain the set of conserved disulfide bonds found in their cellular counterparts, which may be due to the fact that they have to function in the reducing environment of the cytosol. Only a few established viral CHL enzymes, like the alphavirus capsid protease and the flavivirus NS3 proteases, contain the His-Asp-Ser catalytic triad that is found in the cellular world (2, 5, 6, 7, 8). In a large number of viral CHL proteases, the picornavirus 3C protease being the prototype, Cys replaces Ser as the catalytic nucleophile. In addition to this substitution, a number of these "3C-like" enzymes utilize Glu instead of the active site Asp (5, 9, 10, 11).

Unlike most of their cellular counterparts, viral CHL enzymes possess high substrate specificity, which is sometimes conserved among related viruses. The sites cleaved by 3C-like Cys proteases, for example, are generally similar. They usually contain Gln or Glu at the P1 position and a small amino acid residue (Gly, Ala, or Ser) at the P1′ position (3, 12). Two domains in particular are thought to determine this conserved substrate specificity. These regions are separated in the primary structure of the enzyme (Fig. 1B) but are spatially juxtaposed. They form part of a wall and the bottom of the substrate-binding pocket (10, 11) and contain a highly conserved His and Thr residue, respectively (5, 9). The importance of these residues for substrate recognition is underlined by the fact that other amino acids are found at these positions in the NS3 Ser proteases from the family Flaviviridae (Fig. 1B). The NS3 proteases of the three different genera of this family (flaviviruses, pestiviruses, and hepatitis C virus) cleave at sites that are genus-specific, and the P1 residues of these sites do not resemble those recognized by 3C-like Cys proteases.
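The dipeptide profile described above, Gln or Glu at P1 followed by Gly, Ala, or Ser at P1′, can be expressed as a simple pattern search. The following is a minimal sketch and not the method used in this study; the function name `candidate_sites` and the toy sequence are hypothetical, and a dipeptide match alone does not establish that a junction is actually cleaved.

```python
import re

# P1 = Gln (Q) or Glu (E); P1' = Gly (G), Ala (A), or Ser (S).
# The lookahead keeps the P1' residue unconsumed so adjacent hits are not missed.
SITE = re.compile(r"[QE](?=[GAS])")

def candidate_sites(protein, offset=1):
    """Return (P1 position, dipeptide) pairs; positions are 1-based by default."""
    return [(m.start() + offset, protein[m.start():m.start() + 2])
            for m in SITE.finditer(protein)]

# Hypothetical toy sequence, not taken from any viral polyprotein.
toy = "MKLEGTTQAVVESPLQQK"
print(candidate_sites(toy))  # [(4, 'EG'), (8, 'QA'), (12, 'ES')]
```

Such a scan only nominates candidates; conservation among related sequences and experimental tests are still needed to single out genuine sites.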


Figure 1: A, preliminary processing scheme of the EAV replicase ORF1a protein(20) . The three EAV protease domains(13, 19, 22) , corresponding cleavage sites, and the estimated sizes (SDS-PAGE) of the previously identified cleavage products Nsp1 to Nsp6 are depicted. B, sequence alignment of the most important regions of viral and cellular CHL enzymes. The upper block of sequences contains a representative set of 3C-like Cys proteases. The middle block is comprised of (putative) 3C-like Ser proteases and includes the Nsp4 SP sequences of three arteriviruses, EAV, lactate dehydrogenase-elevating virus, and porcine reproductive and respiratory syndrome virus. The proteases in this block were selected from sequence data bases in the course of comprehensive searches that were initiated with three arterivirus proteases and then extended to other proteases containing the characteristic Thr and His residues in their substrate-binding pocket (see text). The lower block contains chymotrypsin, the prototype enzyme, and a set of related (not 3C-like) viral Ser proteases. The numbers on either side of the sequences indicate the distance to the (predicted) N- and C-terminal residue of the protease or protease-containing cleavage product. The EMBL/GenBank (GB) or SwissProt (SP) data base accession numbers of the sequences used in this figure are indicated. Active site residues are highlighted in black (reverse type). The conserved Thr and His residues discussed in this paper are shown in bold and indicated with @. Other important residues from the putative substrate-binding region are indicated with #. In the alignment of the BLIC BLASE protease, a stretch of four amino acids (Arg-Thr-Asn-Cys) has been deleted between two Ser residues shown in lower case. 
HAV, hepatitis A virus; HRV14, human rhinovirus 14; TEV, tobacco etch virus; RHDV, rabbit hemorrhagic disease virus; CPMV, cowpea mosaic virus; TBRV, tomato black ring virus; H-AstV, human astrovirus; PEMV, pea enation mosaic virus; BYDV, barley yellow dwarf virus; BWYV, beet Western yellows virus; CAYV, cucurbit aphid-transmitted yellow virus; PLRV, potato leaf roll virus; SBMV, Southern bean mosaic virus; CMV, cocksfoot mottle virus; RYMV, rice yellow mottle virus; MBV, mushroom bacilliform virus (Australian isolate); SAUR ETA, ETB, and V8, Staphylococcus aureus exotoxins A and B and V8 protease, respectively; EFAE, Enterococcus faecalis; sprE, serine proteinase E; BSUB, Bacillus subtilis; MPR, metalloprotease; BLIC, Bacillus licheniformis; BLASE, B. licheniformis protease; SGRI, S. griseus; PrE, protease E; SFRA, Streptomyces fradiae; SP1, serine protease 1; HeCV, hepatitis C virus; YFV, yellow fever virus; SNBV, Sindbis virus; BOVI CHT, bovine chymotrypsin.



Equine arteritis virus (EAV) (13) is the prototype and best studied member of the arteriviruses, a recently recognized group of positive-stranded RNA viruses with a polycistronic genome of between 12 and 16 kilobases(13, 14, 15, 16) . The organization and expression of the arterivirus genome are strikingly similar to those of coronaviruses, and evidence for the common ancestry of the replicases of both virus families has been reported previously(13, 14, 17, 18) . The arterivirus replicase gene is comprised of two open reading frames (ORFs), ORF1a and ORF1b, which are connected by a ribosomal frameshift site(13) . Translation of the EAV genome yields an ORF1a polyprotein and a C-terminally extended ORF1ab protein consisting of 1727 and 3175 amino acids, respectively. These products are proteolytically processed (19, 20, 21, 22) , and we have recently published a preliminary processing scheme for the ORF1a polyprotein (Fig. 1A, (20) ). In infected cells, this 187-kDa precursor was found to be cleaved at least five times, resulting in the generation of nonstructural proteins (Nsp) 1-6. We have reported the characterization of two ORF1a-encoded Cys autoproteases. A papain-like protease in Nsp1 cleaves the Nsp1/2 junction(19) . Likewise, Nsp2 is generated by autoproteolysis, and an unusual Cys protease was recently identified in the conserved N-terminal domain of Nsp2(22) . The presence of a third EAV protease domain could be predicted from comparative sequence analysis(13, 14) . The arterivirus Nsp4 region contains sequence motifs characteristic of cellular and viral CHL Ser and Cys proteases (Fig. 1B). For EAV, a putative catalytic triad consisting of His-1103, Asp-1129, and Ser-1184 was identified (Fig. 1B, (13) ). These residues were later shown to be conserved in the sequences of two other arteriviruses, porcine reproductive and respiratory syndrome virus (15) and lactate dehydrogenase-elevating virus(16) .

Interestingly, the predicted arterivirus Ser protease (SP) also contains sequence elements typical of the substrate-binding pocket of 3C-like Cys proteases (Fig. 1B) and was accordingly postulated to be a 3C-like Ser protease(13, 14) . Our present analysis of the EAV Nsp4 SP constitutes the first experimental characterization of such a protease. We show that the arterivirus SP indeed combines the catalytic system of a CHL Ser protease with the substrate specificity of a 3C-like Cys protease and should therefore be considered as the prototype of a novel group of proteolytic enzymes.


MATERIALS AND METHODS

Arterivirus Sequences and Comparative Sequence Analysis

The genomic sequences of the Bucyrus strain of EAV (13), the Lelystad strain of porcine reproductive and respiratory syndrome virus (15), and the lactate dehydrogenase-elevating virus C strain (14) were extracted from the EMBL/GenBank data base (accession numbers X53549, M96262, and L13298, respectively). The nucleotide (nt) and amino acid sequence numbers in this publication refer to those previously published cDNA and protein sequences. Restriction sites are indicated by the first nt of their recognition sequence. Data base searches were performed with the help of the Blitz program (23) and the family of Blast programs (24). Multiple sequence alignments were produced using the OPTAL program (25) with the PAM250 scoring table (26) and the CLUSTALV program (27) with the PAM250 or different BLOSUM tables (28). Tentative borders of conserved domains to be aligned with the OPTAL and CLUSTALV programs were determined through pairwise sequence comparisons in the dotplot fashion, utilizing the high resolution DotHelix program (29) in conjunction with the PAM250 or BLOSUM62 tables.

Construction of ORF1a Expression Vector pL1a

Recombinant plasmids were constructed using standard techniques and procedures (30). The EAV ORF1a sequence (19) was cloned in pBS (Stratagene) downstream of the T7 promoter and a copy of the encephalomyocarditis virus internal ribosomal entry site, used to enhance translation (31). The T7 terminator sequence was inserted downstream of the ORF1a sequence. To create the wild-type expression vector pL1a, ORF1a was modified to facilitate site-directed and deletion mutagenesis. A number of translationally silent mutations were introduced to create a set of unique restriction sites (see below): T→C, T→A, T→G, T→C, G→A, A→T, G→C, C→A, and C→A. To avoid mutagenesis artifacts, the entire ORF1a region of pL1a was sequenced and compared with the published ORF1a sequence. Only one difference was detected, which was concluded to be an error in the previously published genomic sequence of EAV (13); the sequence at nt positions 4677-4678 should read GC instead of CG, and, as a result, ORF1a residue 1485 is Lys instead of Gly.

Site-directed Mutagenesis of pL1a

The ORF1a sequence of pL1a contained the following unique restriction sites: NcoI (nt 224, containing the ORF1a translation initiation codon), MluI (nt 590), BstEII (nt 1155), HindIII (nt 1501), SphI (nt 1975), ClaI (nt 2296), SalI (nt 2608), NheI (nt 2878), NotI (nt 3316), SpeI (nt 3763), EcoRV (nt 4263), XbaI (nt 4762), BglII (nt 5217), and XhoI (nt 5517, downstream of the ORF1a termination codon at nt 5407-5409). These sites were used to generate shuttle plasmids, each of which contained an EAV-specific insert of 300-500 base pairs flanked by two of the unique restriction sites. Mutations were introduced into appropriate shuttle plasmids using oligonucleotide-directed mutagenesis on single-stranded DNA templates (32) or using polymerase chain reaction mutagenesis(33) . After complete sequence analysis, mutant EAV restriction fragments were transferred back to pL1a. The nucleotide substitutions and corresponding amino acid changes introduced in this manner are listed in Table 1.
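The coordinates listed above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the convention stated under "Arterivirus Sequences and Comparative Sequence Analysis" that a site is numbered by the first nt of its recognition sequence, and using the fact that NcoI recognizes CCATGG, so the ATG starts 2 nt after the site coordinate. The names `SITES`, `adjacent_spacings`, and `orf_codons` are hypothetical.

```python
# Unique restriction sites in the ORF1a region of pL1a (name, first nt of site).
SITES = [
    ("NcoI", 224), ("MluI", 590), ("BstEII", 1155), ("HindIII", 1501),
    ("SphI", 1975), ("ClaI", 2296), ("SalI", 2608), ("NheI", 2878),
    ("NotI", 3316), ("SpeI", 3763), ("EcoRV", 4263), ("XbaI", 4762),
    ("BglII", 5217), ("XhoI", 5517),
]

def adjacent_spacings(sites):
    """Distance in nt between each pair of neighbouring sites."""
    return [(a, b, pb - pa) for (a, pa), (b, pb) in zip(sites, sites[1:])]

def orf_codons(ncoi_pos=224, stop_start=5407):
    """Sense codons between the ATG inside the NcoI site (CCATGG) and the stop codon."""
    atg_start = ncoi_pos + 2  # C C A T G G: the A is the third nt of the site
    return (stop_start - atg_start) // 3

for left, right, gap in adjacent_spacings(SITES):
    print(f"{left:>8} to {right:<8} {gap:4d} nt")
print("ORF1a codons:", orf_codons())  # 1727
```

The codon count agrees with the 1727-residue ORF1a polyprotein mentioned in the Introduction. The spacings are between neighbouring sites only and do not identify which site pairs were actually used for each shuttle plasmid.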



Construction of Vectors Expressing Truncated ORF1a Proteins

A number of the pL1a-derived constructs encoding putative cleavage site mutations (Table 1) were used to generate a set of T7 expression plasmids from which N- and C-terminally truncated ORF1a proteins ("synthetic Nsps") could be expressed (Fig. 5A). In pL1aG831P and pL1aE1064P, the mutation introduced a SmaI restriction site into the plasmid. In the case of pL1aE1268P, a StuI site was generated (Table 1). To obtain reading frames starting at the putative Nsp2/3 and Nsp3/4 cleavage sites, pL1aG831P and pL1aE1064P were digested with NcoI (this site contained the ORF1a initiation codon) and SmaI. Filling of the NcoI sticky end and religation yielded vectors expressing a protein in which Met-1 was followed by the putative P1′ residue of the cleavage site (pL(832-1268) and pL(1065-1268)). To generate C-terminally truncated ORF1a proteins, we again utilized the novel SmaI and StuI restriction sites. pL1a mutants were digested with either SmaI or StuI, and an NheI linker (5′-CTAGCTAGCTAG-3′) was inserted, which provided termination codons in all three reading frames. To exclude expression of sequences downstream of the new translation termination codons, the downstream part of ORF1a was deleted (up to a StuI site at nt 5258). The C terminus of the pL(1-831) and pL(1-1064) proteins consisted of a Pro residue (the mutated P1 residue of the putative cleavage site) followed by the sequence Leu-Ala-Ser, which was specified by the NheI linker. In the pL(832-1268) and pL(1065-1268) products, the same Leu-Ala-Ser extension was present, but the Pro-1268 residue was lacking.
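The statement that the NheI linker supplies a termination codon in every reading frame, and that it specifies a Leu-Ala-Ser extension, can be verified by translating the 12-nt linker in each frame. This is a minimal sketch; the helper name `translate` is hypothetical, and the codon table is deliberately restricted to the four codons that occur in the linker (standard genetic code).

```python
# Only the codons present in the linker are needed (standard genetic code).
CODONS = {"CTA": "Leu", "GCT": "Ala", "AGC": "Ser", "TAG": "Stop"}

LINKER = "CTAGCTAGCTAG"  # NheI linker, written 5' to 3'

def translate(seq, frame):
    """Translate seq from the given frame (0, 1, or 2) up to and including the first stop."""
    residues = []
    for i in range(frame, len(seq) - 2, 3):
        aa = CODONS[seq[i:i + 3]]
        residues.append(aa)
        if aa == "Stop":
            break
    return residues

for frame in range(3):
    print(frame, translate(LINKER, frame))
# 0 ['Leu', 'Ala', 'Ser', 'Stop']
# 1 ['Stop']
# 2 ['Ser', 'Stop']
```

Frame 0 reproduces the Leu-Ala-Ser extension described for the truncated proteins, while the other two frames terminate even earlier, so translation stops within the linker regardless of the frame in which it is read.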


Figure 5: SDS-PAGE of native and synthetic Nsp2, Nsp3, Nsp4, and Nsp34. A, scheme of the full-length ORF1a protein and the set of four truncated expression products. The location of putative cleavage sites and the calculated sizes of the interlying sequences are indicated. B, native and synthetic ORF1a cleavage products were immunoprecipitated from infected and transfected cell lysates, respectively. They were electrophoresed, side by side, in the same gel. Note that the vaccinia virus infection in the expression system induced a set of background bands (especially in the 30-50-kDa region), which were absent from the lanes derived from EAV-infected cells. For the left panel, the immunoprecipitation was carried out with the Nsp2-antiserum, which coprecipitates Nsp3(20) . For the right panel, a mixture of two Nsp4-antisera was used.



In Vivo Expression, Protein Labeling, and Immunoprecipitation

The infection of RK-13 cells with EAV and the labeling (from 4 to 8 h post-infection) of nonstructural proteins have been described (20). EAV ORF1a cDNA constructs were transiently expressed in RK-13 cells using the recombinant vaccinia virus/T7 system (34) as described previously (20). Proteins synthesized in transfected cell cultures were labeled from 4-7 or 5-8 h post-vaccinia virus infection, using methionine-free medium and 100-200 µCi of [35S]methionine per ml. Our EAV ORF1a protein-specific antisera and the methods for cell lysis and immunoprecipitation have been described previously (20). SDS-PAGE was carried out essentially according to Laemmli (35) and was followed by fluorography (19).


RESULTS

Identification of the Proteolytic Activity and Catalytic Triad of the EAV SP

To test the proteolytic function and predicted active site residues for the EAV Nsp4 SP and to establish its role in ORF1a protein processing, the Nsp4 domain was subjected to site-directed mutagenesis. The members of the putative catalytic triad, His-1103, Asp-1129, and Ser-1184 (Fig. 1), as well as an alternative Asp residue at position 1117, were replaced by several other residues (see also Table 1). Full-length ORF1a proteins carrying these mutations were transiently expressed using the recombinant vaccinia virus/T7 system(34) , and their processing was analyzed by immunoprecipitation of radiolabeled expression products (Fig. 2). Processing of the wild-type ORF1a protein yielded the usual cleavage products(20) . Nsp3456, Nsp56, and Nsp5 could be immunoprecipitated by the alpha5 antiserum, which is directed against the C-terminal region of the ORF1a protein (Fig. 2A). Due to a previously described interaction between Nsp2 and Nsp3-containing proteins(20) , this serum also precipitated substantial amounts of Nsp2. A combination of two anti-Nsp4 sera (alpha4) precipitated Nsp34 and Nsp4 (Fig. 2B) and again co-immunoprecipitated Nsp2 (data not shown).


Figure 2: Mutagenesis of the predicted active site residues of the EAV Nsp4 SP. A, schematic representation of the processing of the Nsp3456 region of the wild-type ORF1a protein (expressed from construct pL1a). The residues of the predicted catalytic triad are depicted in bold. The autoradiograph shows the results obtained with pL1a (wt) and a set of mutant pL1a-derived constructs encoding proteins with a single amino acid substitution. The constructs were expressed using the vaccinia virus/T7 system(34) , and expression products were immunoprecipitated with an antiserum directed against the C-terminal 220 amino acids of the ORF1a protein (alpha5 serum, (20) ). The result from a control transfection without DNA (mock) is also shown. B, immunoprecipitation analysis of the same cell lysates using two Nsp4-specific peptide antisera (alpha4M and alpha4C, (20) ).



The replacement of His-1103 (by Gly or Arg) and the substitution of Ser-1184 (by Cys, Phe, Ile, or Tyr) completely inactivated Nsp3456 processing. Substitution of Asp-1117 (by either Asn or Thr) had no effect. When Asp-1129 was replaced by Lys or Val, the Nsp3456 precursor was not cleaved at all. Interestingly, the mutant carrying the conservative Asp-1129→Glu substitution was able to cleave the Nsp4/5 site with wild-type efficiency but did not process the Nsp3/4 and Nsp5/6 junctions at a detectable level (Fig. 2; see also "Discussion"). These results formed the first experimental proof for a proteolytic function of EAV Nsp4 in the processing of Nsp3456 into the Nsp34 and Nsp56 intermediates and, subsequently, into the previously described end products Nsp3 to Nsp6 (20). The mutagenesis results were in agreement with the previously published active site predictions (13, 14, 15).

Mutagenesis of Two Key Residues in the Putative Substrate-Binding Region of the EAV SP

The 3C-like Cys proteases are characterized by their unique substrate specificity, and sequence comparison suggested that this specificity could be shared by the arterivirus SP (see Introduction). Residues Thr-1179 and His-1198 in the putative substrate-binding pocket of the EAV SP (Fig. 1B) were predicted to be major determinants of substrate binding. Both residues were replaced by a number of other amino acids that have similar physicochemical properties and/or are found in the same position of the primary structure of other 3C-like enzymes (Fig. 1B). The proteolytic activity of these mutant proteases was monitored by analyzing processing of the Nsp4/5 and Nsp5/6 junctions (Fig. 3).


Figure 3: Mutagenesis of the most conserved residues in the predicted substrate-binding pocket of the EAV Nsp4 SP. A number of substitutions were introduced at the position of Thr-1179 and His-1198. Expression and immunoprecipitation using the alpha5 antiserum were carried out as described for Fig. 2A.



The replacement of His-1198 by Leu, Arg, or Tyr completely abolished processing of the two sites mentioned above. This was also the case when Thr-1179 was replaced by Asp. The Thr-1179→Ser and Thr-1179→Gly mutants cleaved the Nsp4/5 site with reduced efficiency (approximately 80 and 10% of wild-type efficiency, respectively) but were unable to cleave the Nsp5/6 site at all. The protease containing a Thr-1179→Asn mutation retained wild-type activity toward the Nsp4/5 junction but cleaved the Nsp5/6 site with increased (approximately double) efficiency. These results demonstrated that Nsp4 SP activity is highly sensitive to even subtle replacements at the positions of Thr-1179 and His-1198, an observation that is fully compatible with the predicted role of these residues in substrate recognition.

Identification and Mutagenesis of Putative Cleavage Sites for the SP

With one exception, the termini of the ORF1a protein cleavage products (Fig. 1A) were unknown. Only the Nsp1/2 cleavage site had been determined by direct protein sequence analysis of the Nsp2 N terminus (19). Since neither the Nsp2 cysteine protease nor the Nsp4 SP is active in vitro or in Escherichia coli (19, 22), the direct sequence analysis of the N termini of Nsp3 to Nsp6 would require their purification from infected or transfected cells. In both cases, the amount of cleavage products is low, and we have chosen an alternative, albeit indirect, approach to obtain information about the SP cleavage sites.

On the basis of cleavage site preferences of other 3C-like proteases, Godeny et al. (14) had previously predicted three SP cleavage sites in the C-terminal half of the arterivirus ORF1a protein. These proposed sites (downstream of residues 1064, 1268, and 1430 in EAV; Fig. 4A) are conserved in the three arterivirus sequences known to date. They contain a Glu residue at the P1 position and a Gly or Ser at the P1′ position (see also Fig. 6). However, the estimated sizes (from SDS-PAGE (20)) of especially Nsp4 and Nsp5 were difficult to reconcile with the predicted cleavages downstream of amino acids 1268 and 1430 (Fig. 4A). On the other hand, a conserved candidate Nsp5/6 cleavage site could be identified: a Glu-Gly dipeptide at amino acids 1677-1678 (see also Fig. 6). Therefore, we tested the effect of a Glu→Pro substitution at the conserved P1 position of each of the four candidate cleavage sites. This mutation was expected to interfere strongly with processing by a 3C-like protease.


Figure 4: Mutagenesis of putative cleavage sites for the EAV Nsp4 SP. A, schematic representation of ORF1a protein processing. The previously observed sizes of the six cleavage products (SDS-PAGE estimates (20)) are shown at the top of the figure. The putative Nsp2/3 cleavage site and four candidate SP cleavage sites within Nsp3456, all derived from comparative sequence analysis, are indicated, and the amino acid number of the P1 residue of each site is shown. B, expression of ORF1a protein mutants carrying a Pro substitution at the P1 position of the five candidate cleavage sites. See Fig. 2 for experimental procedures. Immunoprecipitation was carried out using the alpha5 antiserum and a peptide antiserum directed against Nsp2 (alpha2 (20)). C, immunoprecipitation analysis of a subset of the samples using two Nsp4 antisera.




Figure 6: Alignment of (putative) cleavage sites for the arterivirus Nsp4 SP in the ORF1a polyprotein. Two possible additional cleavage sites located within Nsp5 have also been included. The P1 and P1` residues are highlighted in black (reverse type).



Three of the four mutations indeed completely inhibited one of the previously described cleavages (Fig. 4, B and C). Specifically, the Glu-1064→Pro mutant did not produce Nsp4 but did generate Nsp34, Nsp56, and Nsp5, indicating that only Nsp3/4 cleavage was blocked. After expression of the Glu-1268→Pro mutant, Nsp4, Nsp56, and Nsp5 were not detected, suggesting that the Nsp4/5 junction was not processed. The Glu-1677→Pro substitution resulted in the generation of Nsp4 and Nsp56, but the latter was not processed at the Nsp5/6 site as judged by the absence of fully cleaved Nsp5. Thus, the processing of the Nsp3/4, Nsp4/5, and Nsp5/6 junctions was completely inhibited by substitution of conserved Glu residues at positions 1064, 1268, and 1677, respectively, suggesting that these residues were indeed located at (or close to) these cleavage sites. The Glu-1430→Pro substitution at the fourth candidate cleavage site did not influence the generation of Nsp4, Nsp56, or Nsp5 (Fig. 4B).
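The logic of this experiment can be illustrated with an idealized model in which the Nsp3456 region is cut after every unblocked P1 residue. This is a sketch under simplifying assumptions and not a description of the actual processing kinetics: it assumes the region starts at residue 832 (as in the truncated constructs described under "Materials and Methods") and ends at residue 1727, it lists end products only and therefore ignores intermediates such as Nsp34 and Nsp56, and the names `P1_SITES` and `end_products` are hypothetical.

```python
REGION = (832, 1727)  # assumed boundaries of Nsp3456 within the ORF1a protein

# P1 residues of the three junctions supported by the mutagenesis data.
P1_SITES = {"Nsp3/4": 1064, "Nsp4/5": 1268, "Nsp5/6": 1677}

def end_products(blocked=()):
    """Return (first, last) residue pairs after cutting at every unblocked site."""
    cuts = sorted(p1 for name, p1 in P1_SITES.items() if name not in blocked)
    start, end = REGION
    products = []
    for p1 in cuts:
        products.append((start, p1))  # product ends at the P1 residue
        start = p1 + 1                # next product starts at P1'
    products.append((start, end))
    return products

print(end_products())                     # wild type: Nsp3, Nsp4, Nsp5, Nsp6
print(end_products(blocked={"Nsp3/4"}))   # Nsp3 and Nsp4 stay fused (Nsp34)
```

In this model, blocking the Nsp3/4 site leaves a fused 832-1268 product while the downstream products are unchanged, which matches the pattern reported for the Glu-1064 mutant.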

SDS-PAGE of Native and Synthetic Nsp4 and Nsp34

In EAV-infected cells, the rapid cleavage of the Nsp1/2, Nsp2/3, and Nsp4/5 sites leads to the generation of an SP-containing Nsp34 processing intermediate. This product is only slowly cleaved at the Nsp3/4 junction to yield Nsp3 and Nsp4(20) . Using the putative Nsp3/4 and Nsp4/5 cleavage sites described above, we could calculate a size of 21 kDa for mature Nsp4 (from Gly-1065 to Glu-1268). However, during SDS-PAGE, Nsp4 migrated at a position corresponding to a molecular mass of 31 kDa(20) . To test whether this difference was due to aberrant migration during SDS-PAGE, the behavior in gel of native Nsp4 and Nsp34 was compared with that of expression products derived from truncated ORF1a constructs (Fig. 5). To obtain the putative N terminus of Nsp34, we first determined and tested the most probable Nsp2/3 cleavage site.
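The size discrepancy can be illustrated with a rough back-of-the-envelope estimate. This is a minimal sketch, assuming an average residue mass of about 110 Da, a common rule of thumb and not the calculation used in this study (which presumably used the actual amino acid composition); the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110  # rule-of-thumb average; an assumption, not from the paper

def residue_count(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1

def rough_mass_kda(first, last):
    """Approximate polypeptide mass in kDa from its length alone."""
    return residue_count(first, last) * AVG_RESIDUE_MASS_DA / 1000

n = residue_count(1065, 1268)     # Gly-1065 to Glu-1268
mass = rough_mass_kda(1065, 1268)
print(n, round(mass, 1))          # 204 22.4
```

Both the rough estimate (about 22 kDa) and the calculated value of 21 kDa fall well short of the 31 kDa apparent mass observed on SDS-PAGE, which is why aberrant migration had to be tested directly.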

Previously, the Nsp2/3 junction had been estimated to be located close to amino acid 825 in the EAV ORF1a protein (22). Residues 834-852 are extremely hydrophobic and, therefore, are unlikely to be accessible to the Nsp2 autoprotease (20). This protease is most similar to viral papain-like proteases (22), which often cleave between two small amino acid residues (3). Taken together, these considerations pointed toward a Gly-Gly dipeptide at position 831-832, which is conserved in other arteriviruses, as the most likely EAV Nsp2/3 cleavage site. To test this hypothesis, a Gly→Pro substitution was introduced at the putative P1 position of this site (residue 831). This mutation indeed completely inhibited the generation of Nsp2 (Fig. 4B), but surprisingly all downstream cleavages were also abolished, suggesting that cleavage of the Nsp2/3 junction is a prerequisite for processing of the downstream Nsp3456 region.

Using the putative Nsp2/3, Nsp3/4, and Nsp4/5 cleavage sites (downstream of residues 831, 1064, and 1268, respectively), expression vectors were created that encoded "synthetic Nsps" with N and C termini that closely resembled those of native Nsps (Fig. 5A). These expression products were immunoprecipitated and compared in SDS-PAGE with proteins from infected cell lysates. Samples were run side by side in the same gel (Fig. 5B).

To produce synthetic Nsp2 and Nsp3, we employed the autoproteolytic properties of Nsp1 and Nsp2. As a result, the products expressed from pL(1-831) and pL(1-1064) contained natural N termini and only differed from the native Nsp2 and Nsp3 in having a three-residue C-terminal extension. Since an anti-Nsp3 serum is not available, Nsp2 and Nsp3 were precipitated and coprecipitated(20) , respectively, using the anti-Nsp2 serum. The expression products comigrated with the native Nsp2 and Nsp3 derived from infected cells (Fig. 5B, alpha2 panel), thereby suggesting that the putative Nsp2/3 and Nsp3/4 borders are correct.

Subsequently, synthetic Nsp34 and Nsp4 were expressed from plasmids pL(832-1268) and pL(1065-1268), respectively. Surprisingly, the synthetic Nsp4 of 21 kDa and the native Nsp4 molecule comigrated perfectly at a rate corresponding to a molecular mass of 31 kDa (Fig. 5B, alpha4 panel). This result indicated that the putative N- and C-terminal boundaries of Nsp4, which had been inferred from sequence comparison and mutagenesis, were correct and that the migration of Nsp4 during SDS-PAGE is extremely aberrant. This conclusion was also supported by the fact that the synthetic Nsp34 comigrated with its native equivalent and cleaved itself, apparently at the correct Nsp3/4 junction since a product comigrating with Nsp4 was observed (Fig. 5B, alpha4 panel).


DISCUSSION

The 3C-like Proteases and Their Cleavage Sites

A large number of RNA viral polyproteins are processed by internal proteases that belong to the CHL family of proteolytic enzymes (for a recent review, see (3)). Viral CHL proteases are remarkably diverse and several subgroups have been recognized, like the flavivirus NS3 Ser proteases and the picornavirus 3C and related 3C-like enzymes. For the latter group, a unique set of conserved properties has been documented, including (i) the use of Cys instead of Ser as the principal residue in the catalytic triad, (ii) the presence of two regions containing a conserved Thr and His residue, respectively, which are part of the substrate-binding pocket, and (iii) specificity for cleavage sites that preferably contain the dipeptide Gln(Glu)↓Gly(Ala/Ser).

Compared to the viral CHL proteases mentioned above, the arterivirus Nsp4 SP is unusual. On the one hand, it employs the canonical triad of catalytic residues found in most cellular and some viral CHL enzymes (His-1103/Asp-1129/Ser-1184). The indispensability of these three residues for SP activity was confirmed by extensive site-directed mutagenesis (Fig. 2). On the other hand, the sequence analysis and mutagenesis of residues from the presumed substrate-binding pocket (Fig. 3) revealed that the arterivirus SP also possesses properties found in 3C-like Cys proteases. These were underlined by the identification of three putative SP cleavage sites that fit the 3C-like profile (Figs. 4 and 6). The fact that a Glu substitution was tolerated (to a certain extent) at the position of catalytic Asp-1129 (Fig. 2) can be interpreted as an additional similarity between the arterivirus SP and 3C-like Cys proteases. A fraction of the latter group employs Glu instead of Asp, and for some 3C-like enzymes the interchangeability of Glu and Asp has been documented (5, 36, 37, 38, 39, 40, 41, 42).

Recent x-ray data for the human rhinovirus-14 3C protease revealed that Thr-141 and His-160, the counterparts of EAV Thr-1179 and His-1198, do indeed reside in the substrate-binding pocket and are likely to be responsible for the specificity of this enzyme for Gln(Glu)↓Gly(Ala/Thr) cleavage sites (11). Thr-1179 and His-1198 of the EAV Nsp4 SP (Fig. 1B) were probed by site-directed mutagenesis (Fig. 3). Their replacement showed a pronounced, albeit variable, effect depending on the position mutated and the cleavage site analyzed. Three substitutions at the position of His-1198 all completely abrogated processing of both the Nsp4/5 and Nsp5/6 sites. This is compatible with the drastic effect of mutagenesis of the equivalent His residue in 3C-like Cys proteases (43, 44, 45, 46, 47), although a certain tolerance for a His→Leu substitution was reported for the poliovirus 3C protease (48). Processing of the Nsp5/6 junction was affected by all seven mutations tested at the positions of Thr-1179 and His-1198. Whereas six of these replacements fully disabled cleavage, it was remarkable that the Thr-1179→Asn substitution enhanced processing of this site. Three out of four mutants at the Thr-1179 position partially (Gly and Ser) or fully (Asn) retained their Nsp4/5 cleavage efficiency. Only one comparable mutation, a Thr→Ser substitution that partially inactivated the human rhinovirus-14 3C Cys protease, has been described previously (42). Thus, our data considerably extend the information on the role of this residue in 3C-like proteases and suggest a relationship between the size of the amino acid introduced and substrate recognition, provided that the replacing residue does not carry an acidic side chain.

The EAV SP cleavage sites could not yet be determined by direct sequence analysis of cleavage products. However, three independent, albeit indirect methods were employed, and all results point toward the same conclusion. First, conserved Nsp3/4, Nsp4/5, and Nsp5/6 cleavage sites were identified by sequence comparison (Fig. 6). Second, mutagenesis of the P1 Glu residues completely abolished processing of these junctions (Fig. 4). Third, the results of the comigration assay (Fig. 5) supported the tentative identification of the Nsp3/4 and Nsp4/5 boundaries and revealed that the true molecular mass of the arterivirus SP is 21 kDa instead of the 31 kDa observed upon SDS-PAGE. Taken separately, none of these results unequivocally proves the 3C-like substrate specificity of the arterivirus SP. Collectively, however, these data justify the classification of the arterivirus SP as a 3C-like Ser protease, especially since they are supplemented by the results from comparative sequence analysis and mutagenesis of the substrate-binding region (Fig. 3).
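
The first of these three lines of evidence, the search for candidate junctions that fit the 3C-like profile, can be illustrated with a minimal sketch. It assumes that a candidate site is simply a P1 Glu followed by a small P1′ residue (Gly, Ser, or Ala). The function name, the offset parameter, and the toy sequence are hypothetical and are not taken from the EAV replicase. A motif match of this kind is only a starting point; the sites discussed here were selected on the basis of conservation among arteriviruses (Fig. 6) and tested by mutagenesis.

```python
import re

# Assumed profile: P1 Glu (E) followed by a small P1' residue (G, S, or A).
# The lookahead keeps the P1' residue out of the match, so adjacent
# candidates (e.g. "EEG") are still examined one position at a time.
SITE_PATTERN = re.compile(r"E(?=[GSA])")


def find_candidate_sites(sequence, offset=1):
    """Return (P1 position, P1-P1' dipeptide) for every candidate site.

    `offset` is the polyprotein number of the first residue in `sequence`,
    so positions can be reported in polyprotein numbering.
    """
    seq = sequence.upper()
    return [
        (match.start() + offset, seq[match.start():match.start() + 2])
        for match in SITE_PATTERN.finditer(seq)
    ]


# Toy sequence for illustration only (not an EAV sequence).
toy_fragment = "MKLEGAVTQESLLPEAWDEK"
print(find_candidate_sites(toy_fragment))
# [(4, 'EG'), (10, 'ES'), (15, 'EA')]
```

The final Glu in the toy fragment is followed by Lys and is therefore not reported, which mirrors the requirement for a small residue at the P1′ position.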

Evolution of 3C-like Proteases

Although the work described in this paper is the first experimental analysis of a 3C-like Ser protease, sequence analyses have predicted 3C-like Ser proteases for a number of other virus groups (Fig. 1B): astroviruses, luteoviruses, sobemoviruses, and the pea enation mosaic virus group (49, 50, 51, 52). The fact that these viruses clearly belong to a different virus superfamily (4, 53)(^2) implies that 3C-like Ser proteases have been adapted to mediate the processing of (at least) two very different types of replicase polyproteins.

The Thr and His residues implicated in the substrate recognition of viral 3C-like proteases are also conserved in a group of CHL proteases of eubacterial origin (in two of these proteases the Thr residue is replaced by the similar Ser residue; Fig. 1B). The x-ray analysis of the PrE protease of Streptomyces griseus (Fig. 1B) directly implicated the counterparts of EAV Thr-1179 and His-1198 in the determination of substrate specificity (54). This enzyme, and others that were characterized in terms of their cleavage site preference, cleaves most efficiently after a Glu residue (52, 55, 56, 57). Although these enzymes can be distinguished from their viral relatives by having no detectable P1′ preference and by cleaving after Gln with low efficiency, their other features still support the idea that a cellular branch of 3C-like proteases may exist.

The evolutionary relationships between the various 3C-like Ser proteases and other CHL Ser and Cys proteases are yet to be inferred. On the basis of theoretical considerations, it has been suggested that in evolution an ancestral Cys protease may have preceded the CHL Ser proteases (58). In this respect, it is very interesting that the arteriviruses share a set of conserved replicase domains and a number of other properties with coronaviruses, a virus family that encodes a 3C-like Cys enzyme rather than the Ser protease found in arteriviruses (13, 14, 17, 18, 59, 60). Both proteases occupy the same relative position in the replicase polyprotein. Recently, a number of active site predictions for coronavirus 3C-like proteases have been experimentally confirmed (61, 62, 63), and these proteases were shown to be able to cleave at Gln↓(Ser/Ala/Gly) dipeptides. In the light of these and other similarities, it seems likely that the arterivirus and coronavirus 3C-like proteases are related by common ancestry, even though no significant primary structure similarity can be detected (data not shown). Therefore, the Ser and Cys branches of the CHL protease family may have a common root in RNA virus evolution.

Arterivirus Replicase Processing

The results described in this paper will lead to a revision of the previously published EAV ORF1a protein processing scheme (Fig. 1A). Mainly due to the correction of the Nsp4 size (from 31 to 21 kDa), it has become clear that the (calculated) size of the region downstream of the Nsp4/5 junction is 51 kDa. Preliminary data indicate that this product is not identical to the 44-kDa C-terminal product, which has until now been indicated as Nsp56. Our recent data suggest that Nsp4 and this 44-kDa protein are separated by a hydrophobic cleavage product of approximately 7 kDa, which has not yet been detected due to the lack of an appropriate antiserum. Furthermore, sequence comparisons (Fig. 6) strongly suggested that Nsp5 contains two additional cleavage sites for the SP, with Glu-1430 and Glu-1452 being their predicted P1 residues. Thus, the Nsp5 region in the current processing scheme may be subject to three additional cleavages.
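
The size bookkeeping behind this revision can be written out as a small consistency check. This is a sketch only: the function name and the 1-kDa tolerance are assumptions made for illustration, and the 7-kDa and 44-kDa values are approximate sizes quoted in the text, not independently measured masses.

```python
def sizes_consistent(total_kda, parts_kda, tolerance_kda=1.0):
    """Check whether the parts add up to the total within a tolerance.

    The default tolerance is an arbitrary illustrative value, not one
    taken from the paper.
    """
    return abs(total_kda - sum(parts_kda)) <= tolerance_kda


# Sizes quoted in the text (kDa).
region_downstream_of_nsp4_5 = 51.0  # calculated size
hydrophobic_product = 7.0           # approximate, not yet detected
c_terminal_product = 44.0           # previously indicated as Nsp56

print(sizes_consistent(region_downstream_of_nsp4_5,
                       [hydrophobic_product, c_terminal_product]))
# True
```

Because the 44-kDa value is an apparent size, and the apparent size of Nsp4 on SDS-PAGE differed from its true size by 10 kDa, agreement in such a check is suggestive rather than conclusive.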

It was remarkable that, compared to the Nsp4/5 cleavage, processing of the Nsp5/6 site was much more sensitive to replacements in the SP (Asp-1129 and Thr-1179; Figs. 2A and 3). This indicates that additional factors may influence polyprotein processing, such as the ability of the SP to act in cis or in trans, the structural properties of the cleavage sites, and the association of protease and/or substrates with cofactors. EAV-infected cells contain both Nsp4 and an SP-containing Nsp34 processing intermediate (20). Previously, we have shown that Nsp2 and Nsp3 remain associated after processing of the Nsp2/3 site (20), indicating that a complex of Nsp2 and Nsp34 will be generated. Since a regulatory role for nonstructural polyprotein processing is emerging for many other viral systems, the Nsp2-34 complex and the fully cleaved Nsp4 may be proteases with different roles in the arterivirus life cycle.


FOOTNOTES

*
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
To whom correspondence should be addressed: Dept. of Virology, Postbus 320, 2300 AH Leiden, The Netherlands. Tel.: 31-71-5261657; Fax: 31-71-5266761; E.J.Snijder{at}Microbiology.MedFac.LeidenUniv.nl.

¶
Supported in part by a grant from the Netherlands Organization for Scientific Research.

(^1)
The abbreviations used are: CHL, chymotrypsin-like; EAV, equine arteritis virus; Nsp, nonstructural protein; nt, nucleotide; ORF, open reading frame; SP, serine protease; PAGE, polyacrylamide gel electrophoresis.

(^2)
A. E. Gorbalenya, unpublished data.


ACKNOWLEDGEMENTS

We thank Johan den Boon for experimental support, helpful discussions, and critical reading of the manuscript. We also thank René Rijnbrand for advice on polymerase chain reaction mutagenesis.


REFERENCES

  1. Kräusslich, H. G., and Wimmer, E. (1988) Annu. Rev. Biochem. 57, 701-754
  2. Strauss, J. H. (ed) (1990) Seminars in Virology, Vol. 1, pp. 307-384, W. B. Saunders, Philadelphia
  3. Dougherty, W. G., and Semler, B. L. (1993) Microbiol. Rev. 57, 781-822
  4. Gorbalenya, A. E., and Koonin, E. V. (1993) Sov. Sci. Rev. D. Physicochem. Biol. 11, 1-84
  5. Gorbalenya, A. E., Donchenko, A. P., Blinov, V. M., and Koonin, E. V. (1989) FEBS Lett. 243, 103-114
  6. Bazan, J. F., and Fletterick, R. J. (1989) Virology 171, 637-639
  7. Chambers, T. J., Hahn, C. S., Galler, R., and Rice, C. M. (1990) Annu. Rev. Microbiol. 44, 649-688
  8. Choi, H.-K., Tong, L., Minor, W., Dumas, P., Boege, U., Rossmann, M. G., and Wengler, G. (1991) Nature 354, 37-43
  9. Bazan, J. F., and Fletterick, R. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 7872-7876
  10. Allaire, M., Chernaia, M. M., Malcolm, B. A., and James, M. N. G. (1994) Nature 369, 72-76
  11. Matthews, D. A., Smith, W. W., Ferre, R. A., Condon, B., Budahazi, G., Sisson, W., Villafranca, J. E., Janson, C. A., McElroy, H. E., Gribskov, C. L., and Worland, S. (1994) Cell 77, 761-771
  12. Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, J. J., van der Werf, S., Anderson, C. W., and Wimmer, E. (1981) Nature 291, 547-553
  13. Den Boon, J. A., Snijder, E. J., Chirnside, E. D., de Vries, A. A. F., Horzinek, M. C., and Spaan, W. J. M. (1991) J. Virol. 65, 2910-2920
  14. Godeny, E. K., Chen, L., Kumar, S. N., Methven, S. L., Koonin, E. V., and Brinton, M. A. (1993) Virology 194, 585-596
  15. Meulenberg, J. J. M., Hulst, M. M., de Meijer, E. J., Moonen, P. L. J. M., den Besten, A., de Kluyver, E. P., Wensvoort, G., and Moormann, R. J. M. (1993) Virology 192, 62-72
  16. Godeny, E. K., Zeng, L., Smith, L., and Brinton, M. A. (1995) J. Virol. 69, 2679-2683
  17. Snijder, E. J., and Horzinek, M. C. (1993) J. Gen. Virol. 74, 2305-2316
  18. Snijder, E. J., and Spaan, W. J. M. (1995) The Coronaviruses (Siddell, S. G., ed) pp. 239-255, Plenum Press, New York
  19. Snijder, E. J., Wassenaar, A. L. M., and Spaan, W. J. M. (1992) J. Virol. 66, 7040-7048
  20. Snijder, E. J., Wassenaar, A. L. M., and Spaan, W. J. M. (1994) J. Virol. 68, 5755-5764
  21. Den Boon, J. A., Faaberg, K. S., Meulenberg, J. J. M., Wassenaar, A. L. M., Plagemann, P. G. W. M., Gorbalenya, A. E., and Snijder, E. J. (1995) J. Virol. 69, 4500-4505
  22. Snijder, E. J., Wassenaar, A. L. M., Spaan, W. J. M., and Gorbalenya, A. E. (1995) J. Biol. Chem. 270, 16671-16676
  23. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
  24. Sturrock, S. S., and Collins, J. F. (1993) MPsrch version 1.3, Biocomputing Research Unit, University of Edinburgh, UK
  25. Gorbalenya, A. E., Blinov, V. M., Donchenko, A. P., and Koonin, E. V. (1989) J. Mol. Evol. 28, 256-268
  26. Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) Atlas of Protein Sequence and Structure, pp. 345-352, National Biomedical Research Foundation, Washington, D. C.
  27. Higgins, D. G., Bleasby, A. J., and Fuchs, R. (1992) Comput. Appl. Biosci. 8, 189-191
  28. Henikoff, S., and Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 10915-10919
  29. Leontovich, A. M., Brodsky, L. I., and Gorbalenya, A. E. (1993) BioSystems 30, 57-63
  30. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  31. Jang, S. K., Kräusslich, H. G., Nicklin, M. J. H., Duke, G. M., Palmenberg, A. C., and Wimmer, E. (1988) J. Virol. 62, 2636-2643
  32. Kunkel, T. A., Roberts, J. D., and Zakour, R. (1987) Methods Enzymol. 154, 367-382
  33. Landt, O., Grunert, H. P., and Hahn, U. (1990) Gene (Amst.) 96, 125-128
  34. Fuerst, T. R., Niles, E. G., Studier, F. W., and Moss, B. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 8122-8126
  35. Laemmli, U. K. (1970) Nature 227, 680-685
  36. Dougherty, W. G., Parks, T. D., Cary, S. M., Bazan, J. F., and Fletterick, R. J. (1989) Virology 172, 302-310
  37. Garcia, J. A., Lain, S., Cervera, M. T., Riechmann, J. L., and Martin, M. T. (1990) J. Gen. Virol. 71, 2773-2779
  38. Hammerle, T., Hellen, C. U. T., and Wimmer, E. (1991) J. Biol. Chem. 266, 5412-5416
  39. Kean, K. M., Teterina, N. L., Marc, D., and Girard, M. (1991) Virology 181, 609-619
  40. Dessens, J. T., and Lomonossoff, G. P. (1991) Virology 184, 738-746
  41. Margis, R., and Pinck, L. (1992) Virology 190, 884-888
  42. Boniotti, B., Wirblich, C., Sibilia, M., Meyers, G., Thiel, H. J., and Rossi, C. (1994) J. Virol. 68, 6487-6495
  43. Ivanoff, L. A., Towatari, T., Ray, J., Korant, B. D., and Petteway, S. R. (1986) Proc. Natl. Acad. Sci. U. S. A. 83, 5392-5396
  44. Cheah, K.-C., Leong, L. E.-C., and Porter, A. G. (1990) J. Biol. Chem. 265, 7180-7187
  45. Lawson, M. A., and Semler, B. L. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 9919-9923
  46. Andino, R., Rieckhof, G. E., Achacoso, P. L., and Baltimore, D. (1993) EMBO J. 12, 3587-3598
  47. Hemmer, O., Greif, C., Dufourcq, P., Reinbolt, J., and Fritsch, C. (1995) Virology 206, 362-371
  48. Baum, E. Z., Bebernitz, G. A., Palant, O., Mueller, T., and Plotch, S. (1991) Virology 185, 140-150
  49. Gorbalenya, A. E., Koonin, E. V., Blinov, V. M., and Donchenko, A. P. (1988) FEBS Lett. 236, 287-290
  50. Demler, S. A., and De Zoeten, G. A. (1991) J. Gen. Virol. 72, 1819-1834
  51. Jiang, B. M., Monroe, S. S., Koonin, E. V., Stine, S. E., and Glass, R. I. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 10539-10543
  52. Bazan, J. F., and Fletterick, R. J. (1990) Semin. Virol. 1, 311-322
  53. Goldbach, R., Le Gall, O., and Wellink, J. (1991) Semin. Virol. 2, 19-25
  54. Nienaber, V. L., Breddam, K., and Birktoft, J. J. (1993) Biochemistry 32, 11469-11475
  55. Svendsen, I., Jensen, M. R., and Breddam, K. (1991) FEBS Lett. 292, 165-167
  56. Kitadokoro, K., Tsuzuki, H., Okamoto, H., and Sato, T. (1994) Eur. J. Biochem. 224, 735-742
  57. Svendsen, I., and Breddam, K. (1992) Eur. J. Biochem. 204, 165-171
  58. Brenner, S. (1988) Nature 334, 528-530
  59. Gorbalenya, A. E., Koonin, E. V., Donchenko, A. P., and Blinov, V. M. (1989) Nucleic Acids Res. 17, 4847-4861
  60. Lee, H. J., Shieh, C. K., Gorbalenya, A. E., Koonin, E. V., La Monica, N., Tuler, J., Bagdzhadzhyan, A., and Lai, M. M. C. (1991) Virology 180, 567-582
  61. Lu, Y., Lu, X., and Denison, M. R. (1995) J. Virol. 69, 3554-3559
  62. Ziebuhr, J., Herold, J., and Siddell, S. G. (1995) J. Virol. 69, 4331-4338
  63. Liu, D. X., and Brown, T. D. K. (1995) Virology 209, 420-427
