©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Intermolecular Binding Sites of Human Immunodeficiency Virus Type 1 Rev Protein Determined by Protein Footprinting (*)

Torben Heick Jensen (1)(§), Henrik Leffers (2)(¶)(**), J Kjems (1)(¶)(§§)

From the (1) Department of Molecular Biology, University of Aarhus, C. F. Møllers Allé, Building 130, DK-8000 Aarhus C, Denmark and the (2) Department of Medical Biochemistry, University of Aarhus, Ole Worms Allé, Building 170, DK-8000 Aarhus C, Denmark

ABSTRACT

Human immunodeficiency virus encodes the regulatory protein Rev, which is required for expression of viral structural proteins. It binds to an RNA element (RRE) in the viral transcript and up-regulates the cytoplasmic appearance of unspliced and singly spliced viral mRNA. We have studied the structure of Rev alone and complexed with the RRE and two monoclonal antibodies, using a protein footprinting approach. The method involves radioactive labeling at the C-terminal end of Rev fusion protein followed by limited proteolysis under native conditions, using 10 different proteinases. Rev protein was mainly cleaved within the basic domain and in the C-terminal part. The periodicity of the proteolytic cleavages within the basic domain strongly suggests that it forms an α-helical structure with one side facing the solvent. In the presence of RRE, these cleavages became significantly reduced. In addition, strong protection was observed at position 66 outside the basic domain. As a control for the specificity of the footprinting reaction, we confirmed the position of the epitopes for two monoclonal antibodies. This protein footprinting methodology is generally applicable to other proteins for which terminal modifications are acceptable, and provides a useful tool for mapping structure, substrate binding, and conformational changes.


INTRODUCTION

Human immunodeficiency virus type 1 (HIV-1) encodes two regulatory proteins, Tat and Rev, which are absolutely required for virus production. Tat protein up-regulates the synthesis of full-length viral mRNA, whereas the Rev protein acts at a post-transcriptional level promoting the expression of unspliced and singly spliced mRNA in the cytoplasm (for review, see Ref. 1). Rev thereby induces a shift in viral protein expression from the early regulatory proteins (most importantly Tat, Rev, and Nef) to the structural proteins (Gag/Pol and Env). Both Tat and Rev mediate their functions through specific interaction with the viral RNA elements TAR and RRE, respectively.

In vivo and in vitro studies using mutated Rev protein have revealed a domain structure as indicated (see Fig. 1A). A tract of basic amino acids (residues 35-50) is involved in specific binding of RRE RNA and nuclear/nucleolar localization (2, 3, 4, 5). A slightly more extended region is important for assembly of Rev into oligomers (5, 6, 7, 8). A leucine-rich sequence located at positions 78-83 constitutes another functionally important region (2, 6, 9). This region may function as an activation domain interacting with a cellular factor (2) and/or play a role in protein oligomerization (10, 11).


Figure 1: Construction of Rev fusion protein. A, schematic structure of Rev fusion protein expressed in E. coli. The N-terminal GST portion can be removed after purification by cleavage at an internal thrombin site (dotted box). The protein can be specifically labeled at a serine residue within the heart muscle kinase site (cross-hatched box) using the heart muscle kinase enzyme. The functional domains of Rev, based on data from in vitro and in vivo experiments, are indicated. The basic region involved in RNA binding and nuclear/nucleolar localization (solid box) is flanked by regions important for Rev oligomerization (hatched boxes). The activation domain (shaded box) is essential for Rev function in vivo and may interact with a cellular factor and/or play a role in Rev oligomerization. Protein segments are not drawn to scale. B, amino acid sequence of Rev fusion protein. Putative identifications of strong and weak proteinase cleavage sites are denoted by solid and open arrows, respectively, and the name of the corresponding proteinase is indicated. C indicates the position of chemically induced cleavage used as a marker. Positions are numbered according to the N terminus of the Rev amino acid sequence. The basic and activation domains are boxed, and dotted lines indicate regions in which amino acid assignments were uncertain. Abbreviations: HMK, heart muscle kinase target site; T, thrombin; L, Lys-C; A, Arg-C; G, Glu-C; As, Asp-N; Y, trypsin.



Several mechanisms have been suggested for Rev function. It may directly interfere with spliceosome assembly (12, 13, 14, 15), protect the RNA from mRNA degradation (16), activate cytoplasmic transport of incompletely spliced mRNA (16, 17, 18), and/or influence the translatability of mRNA in the cytoplasm (19, 20, 21). The Rev response is mediated through the specific interaction of multiple Rev proteins with RRE. This interaction has been studied in vitro by a number of different techniques including gel retardation assays (6, 22, 23, 24, 25, 26, 27, 28, 29), RNA footprinting and chemical modification interference analysis (14, 30, 31), circular dichroism experiments (32), and systematic evolution of ligands by exponential enrichment experiments (33, 34, 35). Recent NMR studies have resolved structural features of a single high affinity Rev binding site located within the RRE RNA (36, 37). However, in the absence of NMR and x-ray crystallography data for Rev, information about the protein structure remains scarce. A circular dichroism analysis of a peptide spanning the basic domain indicates that this region forms an α-helix in solution (38). Based on results using the same technique, it was more recently proposed that the helical structure of the basic domain forms one arm of a more extended helix-loop-helix motif in Rev (39). We have studied the structure of Rev using protein footprinting and identified single amino acids protected by the RRE upon binding. The approach involves limited proteolysis of Rev fusion protein specifically labeled at the C-terminal end, in the absence and presence of molecular ligands.


EXPERIMENTAL PROCEDURES

Construction of Plasmid

The pRRE plasmid contains a 223-base pair region (position 7768-7991) encoding the RRE of the HXB-3 isolate of HIV-1 (31). The Rev sequence was derived from the pSVH6rev plasmid (40), which contains a functional synthetic Rev gene including codons more efficiently utilized in prokaryotes. This sequence corresponds to Rev from the HIV-1 HXB2 strain but contains a serine and a threonine at positions 61 and 114, respectively. Sequencing of the pSVH6rev plasmid revealed a single base mutation resulting in a Val → Ile substitution in Rev as compared with the original published sequence. However, since this mutation frequently occurs at this amino acid position in wild-type HIV-1 strains, it is not likely to affect the structure and function of Rev. The pSVH6rev plasmid was used as a polymerase chain reaction template and amplified using M13(-40) forward primer and a 5'-GCGAATTCCTTCTTTAGCTCC-3' primer creating an EcoRI restriction site at the 3'-end. The resulting polymerase chain reaction fragment was digested with EcoRI and partially with BamHI, and the full-length Rev fragment was ligated to a BamHI-EcoRI digest of pGEX-GTH. The resulting plasmid encodes a GST-Rev fusion protein containing a C-terminal recognition sequence for the catalytic subunit of cAMP-dependent heart muscle kinase (see Fig. 1). The construct was verified by sequencing.

Expression and Labeling of Protein

Fusion protein was expressed in Escherichia coli strain BL21 and purified essentially as described by Pharmacia Biotech Inc. The bacterial cultures were induced at log phase with 0.1 mM isopropyl-1-thio-β-D-galactopyranoside, and growth was continued for an additional 3-5 h. Bacteria were harvested by centrifugation and resuspended in buffer A (20 mM HEPES, pH 7.9, 200 mM NaCl, 20% glycerol, 10 mM β-mercaptoethanol) containing 0.2 mM phenylmethylsulfonyl fluoride, 0.5 µg/ml leupeptin, 2.0 µg/ml aprotinin, and 0.1 mM EDTA. Resuspended bacteria were sonicated on ice in short bursts and cleared of insoluble material by centrifugation. Fusion protein was collected from the supernatants using glutathione-Sepharose 4B (Pharmacia) at a concentration of 0.5 µl of Sepharose bed volume/ml of bacterial culture. After 30 min with gentle agitation at room temperature, beads were collected by centrifugation and washed 3 times with PBS (140 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.3). At this step, fusion protein bound to beads was normally stored at -70 °C in PBS containing 20% glycerol. For labeling purposes, fusion protein bound to Sepharose beads was washed 3 times in heart muscle kinase buffer (20 mM Tris/HCl, pH 7.5, 100 mM NaCl, 12 mM MgCl2) and subsequently labeled in 300 µl of protein kinase reaction mixture (heart muscle kinase buffer containing 100 units of kinase (Sigma) and 0.33 mCi [γ-32P]ATP (Amersham Corp., 7000 Ci/mmol)) per ml of bed volume of glutathione-Sepharose 4B for 30 min at 4 °C. Unincorporated nucleotides were removed by washing the beads 5 times with PBS. End-labeled fusion protein was eluted from the beads either by gentle shaking 3-4 times in 1 bed volume of PBS containing 10-20 mM reduced glutathione for 30 min at 4 °C or by thrombin cleavage by incubating with 50 units of thrombin (Pharmacia) per ml of Sepharose bed volume for 2 h at 20 °C. Nonfusion Rev protein was prepared as described previously (31).
Protein concentration and purity were determined by hydrolysis of protein samples in 6 M HCl, 0.01% phenol, 5% thioglycolic acid at 110 °C for 18 h followed by quantification of free amino acids (41). The purity of both the fusion and the nonfusion Rev proteins was assessed to be above 70%.

Preparation of RNA and Gel Mobility Shift Analysis

RNA used for protein footprinting was synthesized in 200-µl reaction mixtures containing 10 µg of linearized pRRE plasmid DNA, 40 mM Tris/HCl, pH 7.4, 6 mM MgCl2, 4 mM spermidine, 10 mM dithiothreitol, 50 units of RNasin, 0.5 mM ATP, 0.5 mM UTP, 0.5 mM GTP, 0.5 mM CTP, 1 µCi [α-32P]UTP (Amersham Corp., 900 Ci/mmol), and 200 units of T3 RNA polymerase (Stratagene). The RNA was purified on a 4% polyacrylamide-8 M urea gel, extracted in 0.25 M NaAc, pH 6.0, 1 mM EDTA in the presence of phenol, and ethanol precipitated. The final concentration of the RNA was calculated from the specific activity of incorporated 32P label. The RNA was renatured by incubation in renaturation buffer (10 mM HEPES/KOH, pH 7.5, 50 mM NaCl) for 2 min at 80 °C followed by slow cooling to 37 °C. Transcription of RNA substrates radiolabeled to high specific activity and gel mobility shift analysis were done as described previously (31).
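The calculation of RNA concentration from the specific activity of the incorporated label can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the labeled UTP contributes negligible mass, that the recovered radioactivity is decay-corrected, and that the number of uridines per transcript is known. The function name and the value `u_per_transcript=60` are hypothetical; the UTP concentration, label amount, and reaction volume are taken from the text.

```python
def rna_pmol_from_label(recovered_uci, u_per_transcript,
                        label_uci=1.0, utp_mm=0.5, volume_ul=200.0):
    """Estimate pmol of transcript from radioactivity recovered in purified RNA.

    recovered_uci    -- decay-corrected activity in the purified RNA (µCi)
    u_per_transcript -- uridines per transcript (hypothetical; depends on the template)
    label_uci, utp_mm, volume_ul -- transcription conditions from the text
    """
    utp_nmol = utp_mm * volume_ul              # mM x µl = nmol of unlabeled UTP
    specific_activity = label_uci / utp_nmol   # µCi per nmol UTP in the reaction
    ump_nmol = recovered_uci / specific_activity
    return ump_nmol / u_per_transcript * 1000.0  # nmol -> pmol


# Illustrative numbers only: 0.006 µCi recovered, 60 uridines per transcript.
print(round(rna_pmol_from_label(0.006, u_per_transcript=60), 3))
```

With 100 nmol of UTP and 1 µCi of label in the reaction, the specific activity is 0.01 µCi per nmol of UTP, so the recovered counts convert directly to incorporated UMP and, given the uridine count, to moles of transcript.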

Ligand Binding and Proteinase Digestion

One µl of RNA renaturation buffer containing 500 ng of RRE RNA was added to 100 ng of Rev fusion protein (approximately 2-fold molar excess of RNA) in 9 µl of Rev binding buffer (10 mM HEPES/KOH, pH 7.5, 100 mM KCl, 1 mM MgCl2, 0.5 mM EDTA, 1 mM dithiothreitol, 10% glycerol, 0.5 unit of RNasin, 100 ng/µl bovine serum albumin, 50 ng/µl E. coli tRNA) followed by a 20-min incubation on ice. In the control reaction the RRE was replaced with 500 ng of E. coli tRNA (Boehringer Mannheim). For Rev-mAb binding, approximately 100 ng of Rev fusion protein was incubated in 9 µl of buffer B (5 mM Tris/HCl, pH 7.4, 75 mM NaCl, 1 mM EDTA, 0.025% Nonidet P-40, 100 ng/µl bovine serum albumin) for 10 min at room temperature followed by the addition of 1 µl of mAb (1 µg/µl) and reincubation for 10 min at room temperature. Immediately after the binding of RNA or mAb to Rev, 10 µl of the respective proteinase (diluted in water) was added, and the mixture was incubated for 15 min at 37 °C. Reactions were put on ice and stopped by the addition of 6.7 µl of 4× SDS loading buffer (24% glycerol, 6.8% SDS, 230 mM Tris/HCl, pH 6.8, 0.01% Serva Blue W (Serva), 3.3% β-mercaptoethanol). Concentration ranges in the final reaction mixture of the different proteinases were as follows: 0.05 unit/µl thrombin (Pharmacia unit definition), 1-10 ng/µl Lys-C (Sigma), 0.005-0.05 unit/µl Arg-C (Sigma unit definition), 5-50 pg/µl trypsin, tosyl-phenyl-alanine chloromethylketone-treated (Cooper Biomedicals), 0.5-5 ng/µl Glu-C (Boehringer Mannheim), 0.02-0.5 ng/µl Asp-N (Sigma), 5-50 pg/µl proteinase K (Boehringer Mannheim), 5-50 pg/µl subtilisin Carlsberg, 0.5-5 ng/µl Pronase (Boehringer Mannheim), 0.5-5 ng/µl thermolysin (Sigma), 5-50 pg/µl bromelain (Boehringer Mannheim).
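The volume bookkeeping in this protocol can be checked with a short sketch. It assumes that the "final reaction mixture" is the 20-µl digest (10 µl of binding reaction plus 10 µl of proteinase) before the stop buffer is added; the function names are illustrative and not part of the original method.

```python
BINDING_UL = 10.0      # 9 µl binding buffer with protein + 1 µl RNA or mAb
PROTEINASE_UL = 10.0   # proteinase diluted in water
STOP_UL = 6.7          # SDS loading buffer added to stop the reaction
STOP_STRENGTH = 4.0    # loading buffer is a 4x stock


def stock_for_final(final_conc):
    """Proteinase stock concentration needed to reach final_conc in the digest."""
    digest_ul = BINDING_UL + PROTEINASE_UL
    return final_conc * digest_ul / PROTEINASE_UL


def loading_buffer_strength():
    """Strength (in multiples of 1x) of loading buffer after stopping the reaction."""
    total_ul = BINDING_UL + PROTEINASE_UL + STOP_UL
    return STOP_STRENGTH * STOP_UL / total_ul


# Example: a final Arg-C concentration of 0.015 unit/µl requires a 2-fold stock.
print(stock_for_final(0.015))
print(round(loading_buffer_strength(), 2))
```

Under that assumption each proteinase is diluted 2-fold on addition, and 6.7 µl of 4× loading buffer in a total of 26.7 µl gives very nearly 1× loading buffer.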

SDS-Polyacrylamide Gel Electrophoresis

The cleavage products were resolved using discontinuous Tricine-SDS-polyacrylamide gel electrophoresis (42) to achieve optimal resolution of small peptides. Electrophoresis was done in 0.4-mm thick, 30 × 40-cm slab gels. The acrylamide percentage was either 16 or 20% for the resolving gel and 7% for the stacking gel. Samples were routinely run through the stacking gel at 20 mA and then at constant 40 mA current until the Serva Blue W dye ran out of the gel. Gels were dried and autoradiographed with screens at -80 °C.

Protein Sequence Analysis and Chemical Cleavage

Protein separated by SDS-polyacrylamide gel electrophoresis was electroblotted (43) onto a ProBlott membrane (Applied Biosystems). The band of interest was excised and subjected to sequence analysis on an Applied Biosystems 477A instrument equipped with an on-line 120-A chromatograph. Analysis was performed using approximately 20 pmol of peptide. Specific cleavage at the Rev tryptophanyl residue was done as described by Huang et al. (44).


RESULTS

Preparation of Full-length Radiolabeled Protein

Rev protein containing a glutathione S-transferase (GST) tag at the N terminus and the recognition sequence for the catalytic subunit of cAMP-dependent heart muscle kinase at the C terminus was expressed in E. coli (Fig. 1A). The GST tag allowed rapid affinity purification of the fusion protein on a glutathione-Sepharose matrix (45), and the presence of the heart muscle kinase site (RRASV) facilitated specific labeling at the serine residue in the presence of [γ-32P]ATP and the heart muscle kinase enzyme (46). In a control experiment, we observed no significant labeling of GST-Rev fusion protein lacking the heart muscle kinase site, implying that the kinase is highly specific toward its native site (data not shown). The specifically labeled Rev fusion protein will therefore be referred to as end-labeled Rev protein in the text below. Positioning the GST tag and the heart muscle kinase site at opposite ends of the protein ensured that only full-length Rev was labeled. In an initial construct, in which both the GST and the heart muscle kinase tag were placed at the N terminus (using the pGEX-2TK vector (Pharmacia)), considerable amounts of radioactive protein degradation products were observed that interfered with the footprinting analysis (data not shown). A cleavage site for thrombin endopeptidase located between the GST tag and the Rev protein enabled the removal of the GST part if necessary (Fig. 1A). Since thrombin also cleaved partially after position 66 within the Rev sequence, most experiments were performed on GST-Rev fusion protein.

The affinity and specificity of this fusion protein for RRE RNA were compared with those of nonfusion Rev protein using a gel retardation assay. The molar concentration required to complex half of the radiolabeled input RRE probe was approximately the same for both types of protein (Fig. 2). Furthermore, the fusion Rev protein did not bind the reverse (antisense) RRE probe except at very high concentrations (more than 2 µM). This implies that the modified termini of the Rev protein did not interfere significantly with the folding of the RNA binding domain.


Figure 2: Gel mobility shift analysis of protein-RNA complex formation. Approximately 2 ng of uniformly labeled RRE or antisense RRE (rRRE) was incubated with increasing concentration of wild-type Rev or GST-Rev fusion protein (nM) as indicated. RRE and rRRE indicate the positions of free probes, and Ori. marks the origin of the gel.



Structural Analysis of Rev Using Proteinases

End-labeled Rev fusion protein was digested under native conditions over a wide concentration range with 10 endopeptidases: Lys-C, Arg-C, trypsin, Glu-C, and Asp-N, which are relatively sequence specific, and proteinase K, subtilisin Carlsberg, Pronase, thermolysin, and bromelain, which cleave less specifically (Fig. 3A, and see for specificities). All of these proteinases are active under conditions that are optimal for the stability of Rev-RRE complexes and do not contain detectable RNase activity (results not shown). Partial cleavage of the Rev protein was observed only within a relatively narrow range of proteinase concentrations. To favor single-hit kinetics, conditions for protein footprinting were chosen such that at least 50% of the radioactivity remained in the band containing uncleaved protein. Under these conditions, only a subset of the potential proteolytic target sites was cleaved, which probably reflects the accessibility of the cleavage sites within the protein structure. By comparing the bands produced by proteinases and chemicals of different specificities, it was possible to make a putative identification of most of the bands as described below (Fig. 1B and 3A).
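The rationale for keeping at least 50% of the protein uncleaved can be illustrated with a simple Poisson model. This model is an assumption introduced here for illustration (cuts treated as independent, rare events); it is not stated in the text, and the function name is hypothetical.

```python
import math


def hit_distribution(fraction_uncleaved):
    """Return (mean cuts per molecule, fraction cut once, fraction cut more than once).

    Assumes cuts per molecule follow a Poisson distribution, so the uncleaved
    fraction equals exp(-mean_hits).
    """
    mean_hits = -math.log(fraction_uncleaved)
    p_zero = fraction_uncleaved
    p_one = mean_hits * p_zero          # Poisson probability of exactly one cut
    p_multi = 1.0 - p_zero - p_one      # two or more cuts
    return mean_hits, p_one, p_multi


mean_hits, p_one, p_multi = hit_distribution(0.5)
print(round(mean_hits, 3), round(p_one, 3), round(p_multi, 3))
```

Under this model, with half of the protein uncleaved, roughly 35% of molecules are cut once and roughly 15% more than once. Because only the C-terminal fragment carries the label, multiple cuts bias the visible signal toward sites nearer the C terminus, which is why a lower extent of digestion gives a more faithful pattern.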


Figure 3: Analysis of proteolytic digests of GST-Rev fusion protein. A, autoradiogram of a 20% protein gel showing proteolytic cleavage products. Rev fusion protein was digested with increasing concentrations of the indicated proteinases. C denotes a control lane containing untreated fusion protein, and T denotes a lane containing thrombin-cleaved Rev fusion protein as a marker. GST-Rev and Rev indicate the N terminus of the GST-Rev fusion protein and Rev, respectively. Putative identifications of corresponding amino acids are indicated except for the basic region (residues 35-50), which is more closely investigated in panels C and D. Assignment of most bands occurring in the GST portion is not attempted. Identification of a secondary thrombin cleavage site at Arg within the Rev sequence is based on peptide sequencing (see "Experimental Procedures"). The remaining bands were identified on the basis of their relative positions, compared with products from other specific proteinases, and to bands in marker lanes containing mixtures of unrelated peptides (not shown). An artifactual band of unknown origin, which migrated at different positions (ranging from 2 to 40 kDa), depending on type of proteinase, duration of electrophoresis, and acrylamide percentage of the gel, is indicated by a star. Since all proteinase digests were performed multiple times, using different electrophoresis conditions, these bands were easily identified and omitted from the analysis. Proteinase concentrations in the final reaction mixtures were as follows: 0.05 unit/µl thrombin; 1, 3, and 10 ng/µl Lys-C; 0.005, 0.015, and 0.05 unit/µl Arg-C; 5, 15, and 50 pg/µl trypsin; 0.5, 1.5, and 5 ng/µl Glu-C; 0.05, 0.15, and 0.5 ng/µl Asp-N; 5, 15, and 50 pg/µl proteinase K; 5, 15, and 50 pg/µl subtilisin Carlsberg; 0.5, 1.5, and 5 ng/µl Pronase; 0.5, 1.5, and 5 ng/µl thermolysin; 5, 15, and 50 pg/µl bromelain.
B, diagram showing the relationship between the logarithm of the mass and migration of the bands shown in panel A. The mass was calculated for the radioactive C-terminal fragment produced by proteolytic cleavage, and the migration was measured as the distance between the bottom of the stacking gel and the center of the radioactive band. For peptides above 5 kDa, an almost linear relationship was obtained. In the basic region (positions 35-50) the curve became more horizontal, reflecting larger spacing between the bands. The reverse effect occurred in the 50-80 region, where less resolution was observed. Lower molecular mass peptides (<5 kDa) generally tend to migrate too slowly to conform to the linear relationship. The boxed region is analyzed in more detail in panel D. C, autoradiogram of a 20% protein gel showing Arg-C and trypsin digests in the Rev region coelectrophoresed along with a marker for Trp (W), thrombin-digested protein (T), and a control containing untreated protein (C). At the highest trypsin and Arg-C concentrations, multiple bands became visible, all of which could be accounted for by individual arginines in the Rev sequence. Assuming that the Trp band migrates in between the suggested Arg and Arg bands and knowing the identity of Arg and Arg enabled a putative assignment of the remaining bands. The band labeled Arg may either correspond to Arg or Arg. Final proteinase concentrations were as follows: 0.02 and 0.05 unit/µl for Arg-C and 5 pg/µl and 15 pg/µl for trypsin. The Trp was cleaved by CNBr at the carboxyl side as described under "Experimental Procedures." Electrophoresis conditions were as described in panel A. D, diagram showing the relationship between the calculated peptide mass and the distance migrated for the bands shown in panel C (for details, see legend to panel B).



SDS-polyacrylamide gel electrophoresis generally provides an almost linear correlation between the logarithm of polypeptide mass and gel mobility (47). However, a nonlinear relationship is occasionally observed mainly because amino acids have different molecular weights, bind SDS with different affinities, and are not uniformly charged (48). In particular, peptides containing stretches of acidic or basic amino acids often migrate abnormally, and heterologous protein markers only allow a rough estimate of proteolytic cleavage positions. A plot of the logarithm of the mass of the putatively identified Rev peptides as a function of distances migrated in the SDS gel was almost linear for masses above 5 kDa (Fig. 3B). However, abnormally large spacing between the bands was observed in the basic region (positions 35-50), consistent with the notion that basic proteins generally exhibit a high apparent size on SDS gels (49). Abnormal migration was also observed for the peptide cleaved before Asp (Fig. 3B).
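The log-mass versus migration relationship described above can be sketched as an ordinary least-squares fit. The helper names and the calibration points below are illustrative assumptions, not measurements from this study; in practice only fragments in the near-linear range (above about 5 kDa, outside the basic region) would be used for calibration.

```python
import math


def fit_log_mass(migrations_mm, masses_kda):
    """Least-squares fit of log10(mass) = intercept + slope * migration."""
    ys = [math.log10(m) for m in masses_kda]
    n = len(ys)
    mean_x = sum(migrations_mm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in migrations_mm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(migrations_mm, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_mass_kda(migration_mm, slope, intercept):
    """Apparent mass of a band from its migration distance."""
    return 10 ** (intercept + slope * migration_mm)


# Hypothetical calibration bands (migration in mm, mass in kDa).
calib_mm = [50.0, 100.0, 150.0, 200.0]
calib_kda = [10 ** (1.6 - 0.004 * x) for x in calib_mm]
slope, intercept = fit_log_mass(calib_mm, calib_kda)
print(round(predict_mass_kda(120.0, slope, intercept), 2))
```

Bands whose observed migration deviates markedly from the fitted line, such as the fragments from the basic region, are the ones for which assignment by apparent mass alone is unreliable.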

Some regions were more accessible to proteinases than others. Most cuts were observed in the basic domain, in the transactivation domain, and in the C-terminal region. When using Arg-C, Lys-C, trypsin, proteinase K, subtilisin Carlsberg, and Pronase, we often observed a strong diffuse band (marked with asterisks in Fig. 3A). This band was easily identified by its abnormal mobility using different gel conditions and was omitted from the analysis (see legend to Fig. 3). A strong and a weak Lys-C-specific band probably represent cleavage after Lys and Lys, respectively, which are the only lysine residues in Rev (Fig. 1B and 3A). In contrast, Glu-C only cleaved at two to three of 11 potential Glu-C sites in Rev (Fig. 1B and 3A), and the most accessible site occurred just below the band corresponding to Asp and was identified as Glu. A strong cleavage site was also observed in the C-terminal end of Rev probably corresponding to Glu. This assignment is based on the detection of two weak Glu-C-specific bands, apparent at high enzyme concentrations, immediately below corresponding to Glu and Glu (result not shown). Asp-N cleaved strongly before Asp, which is positioned immediately C-terminal to the activation domain (Fig. 1B and 3A). A weak band appeared on some gels at the same position as Glu, which possibly originates from cleavage before Asp. Since Asp-N also cuts at cysteine residues, albeit with lower efficiency, we cannot exclude that the bands derive from cleavage before Cys or Cys, which are the only cysteine residues in Rev. Cleavage at the other aspartic acids at positions 7 and 9 was not detected. When plotted on the mass/mobility diagram, the band corresponding to the Asp cleavage product migrated abnormally slowly (Fig. 3B). However, the identification is probably correct since no other aspartic acids (or cysteines) occur in the 10-83 region.
Arg-C cleaved at regions flanking the Rev segment, probably corresponding to residues Arg and Arg in the thrombin recognition site and in the heart muscle kinase site, respectively, and at several positions within, and C-terminal to, the basic domain (Fig. 1B and 3C). Interestingly, bands in this region appeared as an evenly spaced pattern representing a subset of the arginine residues. Based on relative mobility of the bands and two specific markers corresponding to cleavage after Trp and Arg, it was possible to assign each of the cleavages to single amino acids (Fig. 3, C and D). Most accessible were residues Arg, Arg or Arg, Arg, and Arg, whereas Arg, Arg, and Arg were cleaved to a minor extent. Trypsin treatment yielded a similar pattern to Arg-C. However, the specificity of Arg-C and trypsin for the arginines differed greatly. Most noticeable were the strong Arg-C cleavages after Arg and Arg, which are not, or are weakly, cleaved by trypsin, and the strong cleavage after Arg by trypsin, which is not cleaved by Arg-C above background (Fig. 3, A and C). In addition, trypsin-digested samples exhibited a weak band below Arg, which probably corresponds to Arg (Fig. 3C). At an increased trypsin concentration, several additional weak bands appeared within the basic region of Rev that could all be accounted for by corresponding arginine residues in the amino acid sequence (Fig. 3C). It is possible that increased cleavage activity at the higher proteinase concentration partially denatures the protein structure and exposes additional arginines.

Plotting the migration (in mm) of all visible bands in Fig. 3C as a function of the mass of the putative peptide gave points that formed a smooth curve (Fig. 3D). This supports the assignments and indicates that the gel resolution in the basic domain is at the level of single amino acids.

Proteinase K, subtilisin Carlsberg, and Pronase exhibited similar cleavage patterns, although the intensity of each individual cleavage varied considerably (Fig. 3A). Since these proteinases are relatively unspecific (cutting preferentially before hydrophobic amino acids), exact assignments of the proteolytic cleavage products are more difficult. Strong cleavage occurred at several positions between the activation domain of Rev and Glu, whereas no cleavage was observed in, or N-terminal to, the basic domain. Bromelain, which is a relatively unspecific proteinase, cleaved strongly at a position near Asp, Glu, and just below Glu. In addition, weak cleavage was observed near Lys and Arg (Fig. 3A). Thermolysin cleaved strongly at a position around the activation domain and in the 95-100 region.

Footprinting the RRE Binding Site

Proteinases attack the surface of a folded protein, and their activity may therefore be sensitive to steric hindrance by intermolecular interactions. Probing a protein in the presence and absence of a substrate may therefore provide information about which amino acids are involved in binding. End-labeled Rev protein was probed with 10 different proteinases in the absence and presence of a 2-fold molar excess of RRE RNA (Fig. 4). In the reaction without RRE, a similar amount of E. coli tRNA was added as control RNA. Strong protection of specific cleavage by Arg-C was observed at Arg, Arg or Arg, Arg, and Arg, whereas cleavage at Arg, Arg, and Arg was unaffected (Fig. 4). Weak bands corresponding to Arg and Arg were reduced to background levels upon RNA binding. The cleavage pattern by Lys-C, Glu-C, Asp-N, proteinase K, subtilisin Carlsberg, Pronase, thermolysin, and bromelain was unaffected by the presence of RRE RNA (results not shown). Minor but consistent protection against trypsin digestion was observed at Arg or Arg, indicating that trypsin may be less sensitive to RNA protection (results not shown). The protection of arginines by the RRE may reflect that these amino acids interact directly with, or are shielded by, the RNA. Alternatively, RNA-induced conformational changes or protein multimerization may render these residues inaccessible to the proteinase.
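One way to express such a footprint numerically is sketched below. The text does not describe how band intensities were quantified, so the normalization to total lane signal and all names and numbers here are assumptions for illustration only.

```python
def fractional_protection(band_minus, total_minus, band_plus, total_plus):
    """Fractional protection of one cleavage site by a ligand.

    band_minus / total_minus -- band and total lane signal without ligand (control RNA)
    band_plus / total_plus   -- band and total lane signal with ligand (RRE)
    Each band is normalized to its lane total to correct for loading differences.
    Returns 0 for no protection and 1 for complete protection.
    """
    rel_minus = band_minus / total_minus
    rel_plus = band_plus / total_plus
    return 1.0 - rel_plus / rel_minus


# Hypothetical densitometry values in arbitrary units.
print(round(fractional_protection(200.0, 1000.0, 50.0, 1000.0), 2))
```

A value near zero corresponds to a cleavage that is unaffected by the ligand, while values approaching 1 correspond to bands reduced to background.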


Figure 4: Autoradiogram of an SDS protein gel showing the footprint obtained with RRE RNA. Rev fusion protein was digested with Arg-C in the presence of RRE RNA or the same amount of tRNA, indicated by + and -, respectively. C denotes control lanes in which Arg-C proteinase was omitted, and T indicates a marker lane where Rev fusion protein was digested with thrombin, which cleaves at Arg and at Arg. Proteolytic cleavages that are specifically sensitive to RRE include Arg, Arg or Arg, Arg, and Arg (Fig. 1B). Proteinase concentrations in the final reaction mixtures were: 0.05 unit/µl thrombin, 0.015 unit/µl Arg-C, and 0.025 unit/µl Arg-C. The SDS gel contained 20% acrylamide.



Mapping Monoclonal Antibody Epitopes

Since mAb recognition sites, natural substrate binding sites, and proteolytic-sensitive sites generally are located on the surface of native proteins, there will often be structural overlap between these sites. Mapping the epitopes of two Rev-specific mAbs therefore served as an appropriate positive control for the protein footprinting approach. We tested the specificity of the technique using two mAbs (mAb1 and mAb2), which have previously been shown to interact with peptides spanning amino acids 75-88 and 91-105, respectively (50). A Tat-specific mAb was used as a negative control. Rev fusion protein labeled at the C terminus was digested with Arg-C, Glu-C, Asp-N, proteinase K, subtilisin Carlsberg, and bromelain in the presence and absence of mAb1 or mAb2. Binding of mAb1 protected Rev against Asp-N-specific cleavage at Asp and against proteinase K- and subtilisin Carlsberg-specific cleavages in a region near Asp (Fig. 5). Binding of mAb2 resulted in protection against proteinase K and subtilisin Carlsberg in the 92-96 region and against subtilisin Carlsberg cleavage near Glu. Cleavage with Arg-C, Glu-C, and bromelain was not inhibited by mAb1 or mAb2 binding. Of particular interest is the unaffected cleavage of Glu, suggesting that this residue is not recognized by mAb2. The results obtained with protein footprinting correlate very well with the epitope mapping data obtained previously (50) and reinforce that this method provides reliable information about domains involved in intermolecular interactions.


Figure 5: Autoradiogram of an SDS protein gel showing the footprint obtained with Rev-specific mAbs. Rev fusion protein was digested with the indicated proteinases in the presence of mAb1, which specifically recognizes residues 75-88, or in the presence of mAb2, which recognizes residues 91-105. As a control, a mAb recognizing residues 49-85 in the Tat protein was included. C denotes a control lane without added proteinase, and T shows thrombin-cleaved Rev fusion protein as a marker. Sites specifically sensitive to mAb1 binding included the Asp-N cleavage at Asp and the proteinase K and subtilisin Carlsberg cleavages slightly N-terminal to this position. Binding of mAb2 protected Rev against cleavage by proteinase K and subtilisin Carlsberg in the 92-96 region and against subtilisin Carlsberg cleavage near Glu. Proteinase concentrations in the final reaction mixtures were: 1.5 ng/µl Glu-C, 0.2 ng/µl Asp-N, 50 pg/µl proteinase K, and 5 pg/µl subtilisin Carlsberg. The SDS gel contained 16% acrylamide.




DISCUSSION

Enzymatic footprinting of nucleic acids is a powerful approach for studying solution structure and molecular interactions of DNA and RNA. We have used a parallel approach to study the structure of a protein and to characterize the amino acids involved in the binding of other macromolecules. The method is analogous to standard nucleic acid footprinting except that radioactively end-labeled proteins are used instead of nucleic acids, and proteinases are used instead of endonucleases. The peptide cleavage products are subsequently resolved on SDS gels and readily identified using appropriate internal size markers. Using this method, the structure of HIV-1 Rev protein was probed with 10 different proteinases under native conditions, providing insight into the overall folding of the fusion protein. Although it is possible that some artifactual bands corresponding to cleavages of incorrectly folded proteins may occur, the observation that the RRE consistently inhibited proteolytic cleavages at specific amino acids by more than 70% under specific binding conditions suggests that most of the protein molecules contain a correctly folded RRE binding domain.

The RNA binding efficiency of GST-Rev fusion protein has been investigated previously. In one report, a 3-fold higher molar concentration of partially purified GST-Rev protein compared with nonfusion Rev protein was needed to bind the same amount of radiolabeled input probe (6). However, a more recent report demonstrated that GST-Rev and nonfusion Rev protein bind RRE with similar affinities (51). We find that the GST-Rev fusion protein used in our study and nonfusion Rev protein exhibit essentially the same binding affinity and specificity, using a similar gel mobility shift assay. Although the fusion of GST and the heart muscle kinase site to the N and C termini of Rev, respectively, may alter the structure locally, it is conceivable that the structure of the RNA binding domain is not affected by the modified termini of the protein.

Some regions of Rev fusion protein are more accessible to proteolytic cleavage than others, which may reflect a location on the surface. The C-terminal domain (residues 75-116) was cleaved strongly at multiple positions by most of the proteinases, whereas the N-terminal domain (residues 1-34) was generally much less accessible to proteinases. The central part (residues 35-66) was cleaved by Arg-C, yielding a number of evenly spaced bands. The strongest cleavage was observed at residues flanking the central part of the basic region (Arg, Arg or Arg, Arg, and Arg), whereas the core of the basic domain (residues 40-48) was more resistant to proteolytic cleavage, yielding only weak bands putatively identified as Arg and Arg. Interestingly, when placed in an α-helical projection, the identified cleavage sites are confined to one face of the helix (Fig. 6). This strongly suggests that the basic region forms an α-helix in the context of the whole protein, exposing one face of the helix to the solvent. Such an interpretation is supported by circular dichroism data, which show that a peptide spanning only the basic domain of Rev forms an α-helix in solution and that the helicity of the peptide is important for specific RRE binding (38). Since the N-terminal region of Rev protein is relatively resistant to proteolytic cleavage, our data do not allow testing of the proposal that residues 8-55 of Rev form an extended helix-loop-helix motif (39).


Figure 6: Helical wheel projection of amino acids threonine 34 to glutamine 51. Circles denote basic residues, tilted boxes denote polar residues, and boxes denote hydrophobic residues. Arginines cleaved strongly by Arg-C (boldface arrows) include Arg, Arg or Arg, and Arg. Weak Arg-C specific cleavages (thin arrows) include Arg and Arg. Affected arginines are all located at one face of an α-helix. No cleavage was observed after Arg, Arg, Arg, Arg, and Glu by Arg-C or Glu-C. Numbers refer to positions in the Rev sequence (see Fig. 1B).
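
The geometry behind a helical wheel can be checked numerically. The sketch below assumes an ideal α-helix with a rotation of 100° per residue (3.6 residues per turn); it maps each position to an angle on the wheel and reports the smallest arc that contains a given set of positions, so that a small arc indicates clustering on one face. The function names and the example positions are illustrative only and are not the cleavage sites reported here.

```python
def wheel_angle(position, start, degrees_per_residue=100):
    """Angle (degrees, 0-359) of a residue on a helical wheel.

    position: residue number; start: residue placed at 0 degrees.
    Assumes an ideal alpha-helix (100 degrees per residue by default).
    """
    return ((position - start) * degrees_per_residue) % 360


def minimal_arc(positions, start, degrees_per_residue=100):
    """Smallest arc (degrees) on the wheel that contains all positions."""
    angles = sorted(wheel_angle(p, start, degrees_per_residue) for p in positions)
    if len(angles) < 2:
        return 0
    # Gaps between neighbouring angles, including the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(360 - angles[-1] + angles[0])
    # The occupied arc is the full circle minus the largest empty gap.
    return 360 - max(gaps)


# Generic i, i+3, i+4, i+7 spacing (hypothetical positions in a generic helix).
example_positions = [1, 4, 5, 8]
for p in example_positions:
    print(p, wheel_angle(p, start=1))
print("arc:", minimal_arc(example_positions, start=1))
```

Positions spaced at i, i+3, i+4, and i+7 fall within a narrow arc, which is the pattern expected for residues sharing one face of a helix; positions spread evenly along the sequence would cover most of the circle.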



Comparison of the proteolytic digestion pattern of protein alone and in complex with RRE shows that the amino acids forming the putative α-helical structure are also affected by RNA binding. Four major Arg-C specific cleavage sites, corresponding to Arg, Arg or Arg, Arg, and Arg, were considerably reduced in the presence of RRE RNA, whereas no effects were observed in the presence of the same amount of tRNA. The protection of Arg, Arg or Arg, and Arg against Arg-C cleavage may reflect direct protection from proteinases by the RRE. This interpretation is supported by binding studies of Rev and related peptides to the RRE. Based on in vitro RNA footprinting and chemical modification interference experiments, it has been shown that a peptide containing amino acids 34-50 of Rev (Rev 34-50) binds specifically to the RRE, forming almost the same contacts to the RNA as the intact protein (14). Moreover, mutating Arg, Arg, Arg or Arg in Rev 34-50 decreases the specificity of the RNA binding significantly, suggesting that these amino acids contact the RNA (38). The importance of these amino acids has also been studied in vivo. Substitution of both Arg and Arg or Arg and Arg strongly reduces RRE binding and Rev activity in vivo (3, 5, 6, 7). However, a recent exhaustive scanning, using single arginine substitutions in Rev, showed that, in contrast to the Rev 34-50 study by Tan et al., no single arginine within the basic domain is essential for Rev function in vivo (51). This suggests that the arginines within the basic domain of intact Rev protein are functionally redundant for RRE binding (51).
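
One simple way to express the reduction of a cleavage in the presence of a ligand is the fractional loss of band intensity relative to the protein-alone lane. The sketch below assumes that densitometric band intensities are available; the band labels and numbers are invented for illustration and are not data from this study, and the 70% default threshold merely echoes the level of inhibition mentioned earlier in this Discussion.

```python
def percent_protection(intensity_free, intensity_bound):
    """Percent reduction of a cleavage band when a ligand is present."""
    if intensity_free <= 0:
        raise ValueError("intensity_free must be positive")
    return 100.0 * (1.0 - intensity_bound / intensity_free)


def protected_bands(free, bound, threshold=70.0):
    """Labels of bands whose protection exceeds the threshold (percent)."""
    return [
        label
        for label in free
        if percent_protection(free[label], bound[label]) > threshold
    ]


# Hypothetical band intensities (arbitrary units), not measured values.
free = {"band_a": 1000, "band_b": 800, "band_c": 500}
with_specific_rna = {"band_a": 200, "band_b": 760, "band_c": 100}
with_control_rna = {"band_a": 980, "band_b": 790, "band_c": 510}

print(protected_bands(free, with_specific_rna))
print(protected_bands(free, with_control_rna))
```

Comparing the specific ligand with a nonspecific control in this way separates sequence-specific protection from general effects of adding RNA to the reaction.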

Our data show that the RRE protection extends outside the basic domain at Arg. This amino acid has not previously been assigned any role in RNA binding, and mutating Arg has only a marginal effect on Rev function (3). Possibly, Arg interacts with the RRE, providing an explanation for the decreased specificity of RNA binding and the stricter sequence requirement observed for Rev 34-50 compared with intact Rev protein (38, 51). Alternatively, protection of Arg upon RRE binding may reflect steric hindrance of the proteinase by the RNA, oligomerization of the protein on the RRE, or induced conformational changes in the protein, resulting in protein structures that are less sensitive to proteinases.

In a similar footprinting analysis of elongation factor Tu, bound either to GTP or GDP, conformational changes in the protein were accompanied by significant changes in the proteinase cleavage pattern. In contrast, when probing the Rev fusion protein, no additional cleavage sites or major enhancements were observed upon RRE binding, suggesting that conformational changes in Rev are minimal. This observation is supported by circular dichroism spectra of Rev, which show only marginal changes in the content of helical structure upon RRE binding (32).

The protein footprinting methodology described in this paper provides a general method for mapping protein domains involved in binding of other proteins, nucleic acids, or other macromolecules. The only requirements are that the fusion protein is stable and that the region of interest is correctly folded when situated in a fusion protein. Alternative methods for selective visualization of terminal protein fragments have been used previously. One method involves immunodetection by antisera, raised toward N- and C-terminal peptides of the protein, in a Western blot analysis (52). However, that method is unsuitable for small proteins like Rev, partly because the antibody epitopes span a significant portion of the protein and partly because blotting of small peptides is relatively inefficient. Another method utilizes chemical linkage of a fluorescent group to the N terminus of a protein, which may then be visualized in a gel by UV radiation (53). The disadvantage of this method is the requirement for irreversible modification of all internal amino groups under denaturing conditions, making it less useful for protein footprinting of native proteins. These problems are avoided using the fusion-protein approach described in this report. Independent of labeling technique, the most laborious process in protein footprinting is the identification of the peptide cleavage products at the amino acid level. An interesting possibility, which we are currently testing, is to combine the protein footprinting method with mass spectrometry technology to obtain a rapid and accurate identification of proteolytic cleavage products.

  
Table: List of proteinases used in this study

Specificities are given according to Ref. 54.



FOOTNOTES

*
This work was supported in part by grants from the Danish Medical Research Council, Novo Nordisk's Fond, and the Danish Cancer Society. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
Supported by Aarhus University.

¶
Supported by the Danish Cancer Society.

**
Present address: Dept. of Growth and Reproduction, National University Hospital, Blegdamsvej 9, 2100 KK, Denmark.

§§
To whom correspondence should be addressed. Tel.: 45 8942 2686; Fax: 45 8619 6500; E-mail: KJEMS@BIOBASE.DK.

The abbreviations used are: HIV-1, human immunodeficiency virus, type 1; PBS, phosphate-buffered saline; GST, glutathione S-transferase; mAb, monoclonal antibody.

Jensen, T. H., and Kjems, J. (1995) Gene (Amst.), in press.

T. H. Jensen and J. Kjems, unpublished observations.


ACKNOWLEDGEMENTS

We thank Anne Marie Szilvay for providing Rev mAbs, Dag E. Helland and Anne Marie Szilvay for providing the Tat mAb, and Lars Sottrup-Jensen and Claus Oxvig for peptide sequencing and amino acid analysis. The pSVH6rev plasmid was kindly provided by Alan W. Cochrane, and the subtilisin Carlsberg proteinase was a gift from Steen Mortensen (Novo Nordisk). We thank Annette H. Andersen for technical assistance and Finn Skou Pedersen, Allan Jensen, Helle Dyhr-Mikkelsen, and Roger A. Garrett for discussions and critical reading of the manuscript.


REFERENCES
  1. Cullen, B. R. (1992) Microbiol. Rev. 56, 375-394
  2. Malim, M. H., Bohnlein, S., Hauber, J., and Cullen, B. R. (1989) Cell 58, 205-214
  3. Hope, T. J., McDonald, D., Huang, X., Low, J., and Parslow, T. G. (1990) J. Virol. 64, 5360-5366
  4. Perkins, A., Cochrane, A. W., Ruben, S. M., and Rosen, C. A. (1989) J. AIDS 2, 256-263
  5. Olsen, H. S., Cochrane, A. W., Dillon, P. J., Nalin, C. M., and Rosen, C. A. (1990) Genes & Dev. 4, 1357-1364
  6. Malim, M. H., and Cullen, B. R. (1991) Cell 65, 241-248
  7. Zapp, M. L., Hope, T. J., Parslow, T. G., and Green, M. R. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 7734-7738
  8. Bohnlein, E., Berger, J., and Hauber, J. (1991) J. Virol. 65, 7051-7055
  9. Venkatesh, L. K., and Chinnadurai, G. (1990) Virology 178, 327-330
  10. Daly, T. J., Rennert, P., Lynch, P., Barry, J. K., Dundas, M., Rusche, J. R., Doten, R. C., Auer, M., and Farrington, G. K. (1993) Biochemistry 32, 8945-8954
  11. Bogerd, H., and Greene, W. C. (1993) J. Virol. 67, 2496-2502
  12. Chang, D. D., and Sharp, P. A. (1990) Science 249, 614-615
  13. Lu, X. B., Heimer, J., Rekosh, D., and Hammarskjold, M. L. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 7598-7602
  14. Kjems, J., Calnan, B. J., Frankel, A. D., and Sharp, P. A. (1992) EMBO J. 11, 1119-1129
  15. Kjems, J., and Sharp, P. A. (1993) J. Virol. 67, 4769-4776
  16. Felber, B. K., Hadzopoulou, C. M., Cladaras, C., Copeland, T., and Pavlakis, G. N. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 1495-1499
  17. Emerman, M., Vazeux, R., and Peden, K. (1989) Cell 57, 1155-1165
  18. Malim, M. H., Hauber, J., Le, S. Y., Maizel, J. V., and Cullen, B. R. (1989) Nature 338, 254-257
  19. Arrigo, S. J., and Chen, I. S. (1991) Genes & Dev. 5, 808-819
  20. Lawrence, J. B., Cochrane, A. W., Johnson, C. V., Perkins, A., and Rosen, C. A. (1991) New Biol. 3, 1220-1232
  21. D'Agostino, D. M., Felber, B. K., Harrison, J. E., and Pavlakis, G. N. (1992) Mol. Cell. Biol. 12, 1375-1386
  22. Zapp, M. L., and Green, M. R. (1989) Nature 342, 714-716
  23. Daly, T. J., Cook, K. S., Gray, G. S., Maione, T. E., and Rusche, J. R. (1989) Nature 342, 816-819
  24. Daefler, S., Klotman, M. E., and Wong, S. F. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 4571-4575
  25. Cochrane, A. W., Chen, C. H., and Rosen, C. A. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 1198-1202
  26. Heaphy, S., Dingwall, C., Ernberg, I., Gait, M. J., Green, S. M., Karn, J., Lowe, A. D., Singh, M., and Skinner, M. A. (1990) Cell 60, 685-693
  27. Malim, M. H., Tiley, L. S., McCarn, D. F., Rusche, J. R., Hauber, J., and Cullen, B. R. (1990) Cell 60, 675-683
  28. Olsen, H. S., Nelbock, P., Cochrane, A. W., and Rosen, C. A. (1990) Science 247, 845-848
  29. Iwai, S., Pritchard, C., Mann, D. A., Karn, J., and Gait, M. J. (1992) Nucleic Acids Res. 20, 6465-6472
  30. Tiley, L. S., Malim, M. H., Tewary, H. K., Stockley, P. G., and Cullen, B. R. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 758-762
  31. Kjems, J., Brown, M., Chang, D. D., and Sharp, P. A. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 683-687
  32. Daly, T. J., Rusche, J. R., Maione, T. E., and Frankel, A. D. (1990) Biochemistry 29, 9791-9795
  33. Tuerk, C., and MacDougal-Waugh, S. (1993) Gene (Amst.) 137, 33-39
  34. Bartel, D. P., Zapp, M. L., Green, M. R., and Szostak, J. W. (1991) Cell 67, 529-536
  35. Giver, L., Bartel, D., Zapp, M., Pawul, A., Green, M., and Ellington, A. D. (1993) Nucleic Acids Res. 21, 5509-5516
  36. Peterson, R. D., Bartel, D. P., Szostak, J. W., Horvath, S. J., and Feigon, J. (1994) Biochemistry 33, 5357-5366
  37. Battiste, J. L., Tan, R., Frankel, A. D., and Williamson, J. R. (1994) Biochemistry 33, 2741-2747
  38. Tan, R., Chen, L., Buettner, J. A., Hudson, D., and Frankel, A. D. (1993) Cell 73, 1031-1040
  39. Auer, M., Gremlich, H. U., Seifert, J. M., Daly, T. J., Parslow, T. G., Casari, G., and Gstach, H. (1994) Biochemistry 33, 2988-2996
  40. Cochrane, A. W., Chen, C. H., Kramer, R., Tomchak, L., and Rosen, C. A. (1989) Virology 173, 335-337
  41. Sottrup-Jensen, L. (1993) Biochem. Mol. Biol. Int. 30, 789-794
  42. Schagger, H., and von Jagow, G. (1987) Anal. Biochem. 166, 368-379
  43. Matsudaira, P. (1987) J. Biol. Chem. 262, 10035-10038
  44. Huang, H. V. (1983) Methods Enzymol. 91, 318-324
  45. Smith, D. B., and Johnson, K. S. (1988) Gene (Amst.) 67, 31-40
  46. Edelman, A. M. (1987) Annu. Rev. Biochem. 56, 567-613
  47. Weber, K., and Osborn, M. (1969) J. Biol. Chem. 244, 4406-4412
  48. Hames, B. D. (1990) in Gel Electrophoresis of Proteins (Hames, B. D., and Richwood, D., eds) 2nd Ed., pp. 1-148, IRL Press, Oxford, United Kingdom
  49. Kaufmann, E., Geisler, N., and Weber, K. (1984) FEBS Lett. 170, 81-85
  50. Kalland, K. H., Szilvay, A. M., Brokstad, K. A., S, W., and Haukenes, G. (1994) Mol. Cell. Biol. 14, 7436-7444
  51. Hammerschmid, M., Palmeri, D., Ruhl, M., Jaksche, H., Weichselbraun, I., Böhnlein, E., Malim, M. H., and Hauber, J. (1994) J. Virol. 68, 7329-7335
  52. Matsudaira, P., Jakes, R., Cameron, L., and Atherton, E. (1985) Proc. Natl. Acad. Sci. U. S. A. 82, 6788-6792
  53. Jue, R. A., and Doolittle, R. F. (1985) Biochemistry 24, 162-170
  54. Bond, J. S. (1989) in Proteolytic Enzymes (Beynon, R. J., and Bond, J. S., eds), IRL Press, Oxford, United Kingdom
