(Received for publication, October 27, 1995; and in revised form, January 17, 1996)
Replication of human immunodeficiency virus type 1 (HIV-1)
requires specific interactions of Tat protein with the trans-activation responsive region (TAR) RNA, a stem-loop
structure containing two helical stem regions separated by a
trinucleotide bulge. The Tat protein contains a basic RNA-binding
region (amino acids 49-57) located in the carboxyl-terminal half
of the protein, and peptides containing this basic domain of Tat
protein can bind TAR RNA with high affinity. We synthesized a
31-amino acid Tat fragment (amino acids 42-72) containing the
basic region and part of the flanking regulatory core domain; this fragment formed a
specific complex with TAR RNA. Upon UV irradiation (254 nm), this Tat
fragment cross-linked covalently with TAR RNA. Sites of cross-links
were determined on both the TAR RNA and Tat protein fragment by RNA and
protein sequencing, respectively. These results revealed that guanosine
26 of TAR RNA was cross-linked with tyrosine 47 of the Tat peptide. Our
results provide the first physical evidence for a direct amino
acid-base contact in the Tat-TAR complex. Recently, the orientation of
Tat-(42-72) in the complex was determined in our laboratory with a
psoralen-Tat-(42-72) conjugate (Wang, Z., and Rana, T.
M. (1995) J. Am. Chem. Soc. 117, 5438-5444). On the basis
of our findings, we suggest a model in which Tat binds to TAR RNA by
inserting the basic recognition sequence into the major groove with an
orientation where lysine 41 in the core domain of Tat contacts the
lower stem and Tyr47 is close to G26 of TAR RNA.
Knowledge of the orientation of Tat and of its other
interactions with TAR RNA in the Tat-TAR complex has significant
implications for understanding gene regulation in HIV-1.
RNA-protein interactions are vital for many
regulatory processes, especially in gene regulation, where proteins
bind specifically to sites within RNA transcripts.
RNA molecules can fold into extensive structures containing regions of
double-stranded duplex, hairpins, internal loops, bulged bases, and
pseudoknots (1, 2). Because of the complexity
of RNA structure, the rules governing sequence-specific RNA-protein
recognition are not well understood. Recent structural studies have
demonstrated that RNA-binding proteins interact with RNA in both the
minor and major grooves. For example, two tRNA synthetases (alanine and
glutamine) interact with the acceptor stems of their cognate tRNAs in
the minor groove (3, 4). Major groove recognition
takes place between aspartyl-tRNA synthetase and its cognate tRNA at a
site of local distortion in the RNA helix (5). Bulge loops or
bulges (unpaired nucleotides on one strand of a duplex) in RNA helices
are potentially important in tertiary folding of RNA and in providing
sites for specific RNA-protein interactions, as illustrated by TFIIIA
of Xenopus (6) and the coat protein of phage
R17 (7). In a recent report, interactions between U1 small
nuclear RNA and the N-terminal domain of the human U1A protein were
mapped by multidimensional heteronuclear NMR (8). These
studies showed that protein-RNA contacts occur at the single-stranded
apical loop of the hairpin and also in the major groove of the helical
stem at neighboring U-G and U-U non-Watson-Crick base
pairs (8). The crystal structure of the RNA-binding domain of the
U1A spliceosomal protein complexed with an RNA hairpin also revealed
that the loop sequence (AUUGCAC) interacts with the surface of the
four-stranded β-sheet (9). NMR data have also shown that
the TAR RNA of HIV-1 changes its
conformation upon arginine binding (10, 11). All of
these studies suggest that the diversity of RNA structures plays a
central role in their specific recognition by proteins.
The promoter of human immunodeficiency virus type 1 (HIV-1), located in the U3 region of the viral long terminal repeat, is an inducible promoter that can be stimulated by the trans-activator protein Tat (12). As in other lentiviruses, Tat is essential for trans-activation of viral gene expression (13, 14, 15, 16). In the absence of Tat, most viral transcripts terminate prematurely, producing short RNA molecules of 60 to 80 nucleotides. Jeang et al. (17) reported that integrated HIV-1 promoters did not show a high rate of abortive transcription; nonetheless, HIV-1 proviruses and integrated long terminal repeats respond efficiently to Tat (17). The Tat protein is a small, cysteine-rich nuclear protein of 86 amino acids composed of three important functional domains. HIV-1 Tat acts by binding to the TAR (trans-activation responsive) RNA element, a 59-nucleotide stem-loop structure located at the 5′-ends of all nascent HIV-1 transcripts (18, 19, 20, 21, 22). Upon binding to TAR RNA, Tat causes a substantial increase in transcript levels (23, 24, 25, 26, 27). This increase in transcriptional efficiency may result from preventing premature termination by the transcriptional elongation complex (28) or from enhanced initiation of transcription (29). TAR RNA was originally localized to nucleotides +1 to +80 within the viral long terminal repeat (18). Subsequent deletion studies established that the region from +19 to +42 is the minimal domain both necessary and sufficient for Tat responsiveness in vivo (21, 30, 31). As shown in Fig. 5, TAR RNA contains a six-nucleotide loop and a three-nucleotide pyrimidine bulge that separates two helical stem regions (18, 21, 22, 25). The trinucleotide bulge is essential for high-affinity, specific binding of the Tat protein (32, 33).
Figure 5:
Mapping
of the cross-linked base in the RNA-protein cross-link by alkaline
hydrolysis. A, analysis of 5′-end-labeled TAR RNA and
cross-link: B. cereus ladder of TAR RNA (lane 1);
hydrolysis ladder of TAR RNA (lane 2); hydrolysis ladder of
cross-linked RNA-peptide complex (lane 3). The sequence of TAR
RNA from C to U
is labeled. A gap in the
ladder is evident after the U25
residue, indicating that
G26
is the cross-linked base. B, sequence and
secondary structure of the wild-type TAR RNA used in this study. This TAR RNA
spans the minimal sequences that are required for Tat responsiveness in vivo (21) and for in vitro binding of
Tat-derived peptides (38). The wild-type TAR used here contains two
non-wild-type base pairs that increase transcription by T7 RNA
polymerase. U25 marks the nucleotide at which the
hydrolysis ladder of the 5′-end-labeled cross-linked RNA-peptide complex
stops. The arrow indicates guanosine 26,
the cross-linked base in TAR RNA (shown in boldface).
The Tat protein
contains a basic RNA-binding region (amino acids 49-57) located
in the carboxyl-terminal half of the
protein (19, 34, 35, 36, 37).
Peptides containing the basic domain (residues 49-57) of Tat
protein can bind TAR RNA with high
affinity (36, 38, 39, 40, 41, 42, 43, 44).
We used a 31-amino acid Tat fragment (amino acids 42-72) to form
a specific complex with TAR RNA. Upon UV irradiation, this Tat fragment
formed a covalent cross-link with TAR RNA. Sites of cross-links were
determined on both the TAR RNA and Tat protein fragment by RNA and
protein sequencing, respectively, which revealed that Tyr47 of Tat is close to G26
of TAR RNA. Our results provide
the first physical evidence for a direct amino acid-base contact in
the Tat-TAR complex.
Figure 1:
Separation of the covalently cross-linked
Tat-(42-72)·TAR complex from TAR RNA on a denaturing 8 M urea-20% acrylamide gel. Lanes 1 and 2, RNA
without (lane 1) and with (lane 2) UV irradiation; lanes 3 and 4, RNA with Tat peptide without (lane
3) and with (lane 4) UV irradiation; lane 5, RNA
and Tat peptide irradiated separately and then combined; lane
6, treatment of UV-irradiated products (lane 4) with
proteinase K. Reaction mixtures contained 0.25 µM
5′-32P-end-labeled TAR RNA, 1.9 µM Tat-(42-72), and 100 mM NaCl in 25 mM Tris-HCl buffer, pH 7.4. XL, cross-linked RNA-peptide
complex.
Figure 2:
Effect of Tat peptide concentration on
UV-induced RNA-peptide cross-link formation. 5′-32P-end-labeled TAR RNA (0.25 µM) was
incubated with Tat peptide at concentrations of 0.13, 0.25, 1.25, 2.5,
and 5.0 µM. The molar ratios of peptide to RNA are
shown in the figure. For details of reaction conditions see
"Experimental Procedures." XL1, cross-linked
RNA-peptide complex, major photoproduct; XL2, cross-linked
RNA-peptide complex, minor photoproduct.
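The peptide:RNA ratios referred to in the Fig. 2 legend follow directly from the stated concentrations. As a minimal Python sketch (the helper name `molar_ratio` is ours, for illustration only, and rounding to one decimal place is an assumption about how the ratios are displayed), they can be recomputed as follows:

```python
# Molar ratios of Tat peptide to TAR RNA, recomputed from the concentrations
# stated in the Fig. 2 legend. molar_ratio is a hypothetical helper name.
RNA_UM = 0.25  # 5'-end-labeled TAR RNA, in µM
PEPTIDE_UM = [0.13, 0.25, 1.25, 2.5, 5.0]  # Tat peptide titration, in µM


def molar_ratio(peptide_um: float, rna_um: float = RNA_UM) -> float:
    """Return the peptide:RNA molar ratio rounded to one decimal place."""
    if rna_um <= 0:
        raise ValueError("RNA concentration must be positive")
    return round(peptide_um / rna_um, 1)


ratios = [molar_ratio(c) for c in PEPTIDE_UM]
print(ratios)  # [0.5, 1.0, 5.0, 10.0, 20.0]

# Standard conditions used in Figs. 1, 3, and 4 (1.9 µM peptide, 0.25 µM RNA)
print(molar_ratio(1.9))  # 7.6
```

The same arithmetic shows that the standard reaction conditions place the peptide in roughly 7.6-fold molar excess over the labeled RNA.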
The photocross-linking reaction between the Tat peptide and TAR RNA also depended on irradiation time. The yield of cross-linked RNA-peptide complex increased with longer irradiation (Fig. 3). As in the experiment shown in Fig. 2, extended irradiation (30 and 40 min) also produced the minor photoproduct XL2. This second, minor photoproduct could result from nonspecific binding of the peptide to RNA (at higher peptide concentrations) or from nonspecific association of photodamaged RNA and peptide after longer irradiation. This minor photoproduct was not characterized further in this study.
Figure 3:
Time course of the cross-linking reaction of the
Tat-(42-72)·TAR complex. The reaction mixture contained 0.25
µM
5′-32P-end-labeled TAR RNA and 1.9
µM Tat peptide and was UV irradiated as described under
"Experimental Procedures." After the indicated irradiation
times, aliquots were withdrawn and analyzed on an 8 M urea-polyacrylamide gel. XL1, cross-linked RNA-peptide
complex, major photoproduct; XL2, cross-linked RNA-peptide
complex, minor photoproduct.
Figure 4:
Specificity of the cross-linking reaction
determined by competition assays. Complexes were formed between 0.25
µM 32P-labeled TAR RNA and 1.9 µM Tat-(42-72) in the presence of unlabeled wild-type TAR
RNA (A) or bulgeless mutant TAR RNA (B).
Concentrations of competitor RNA in lanes 2, 3, 4, 5, 6, and 7 were 0, 0.25, 0.5,
2.5, 5, and 10 µM, respectively. Lane 1 was a
control RNA-peptide complex without UV irradiation. C,
quantitative analysis of competition experiments. The fraction of RNA
in the RNA-peptide cross-link was determined by PhosphorImager analysis as
described under "Experimental Procedures." Symbols distinguish the
wild-type TAR RNA competitor from the bulgeless mutant TAR
RNA competitor.
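The legend states only that the cross-linked fraction was measured by PhosphorImager analysis. The Python sketch below shows one common way to reduce such data, under assumptions that are ours rather than the authors': the fraction cross-linked is taken as the XL band divided by the summed XL and free-RNA bands in each lane, and each lane is normalized to the no-competitor lane (lane 2). The `Lane` class and all helper names are hypothetical, and any intensities passed to them are illustrative inputs, not data from this study.

```python
from dataclasses import dataclass
from typing import List

PROBE_UM = 0.25  # 32P-labeled TAR RNA, µM
# Unlabeled competitor RNA (µM) in lanes 2-7, from the Fig. 4 legend
COMPETITOR_UM = [0, 0.25, 0.5, 2.5, 5, 10]


@dataclass
class Lane:
    """Background-corrected band intensities for one lane (hypothetical input)."""
    crosslink: float  # cross-linked RNA-peptide band (XL)
    free_rna: float   # uncross-linked TAR RNA band

    def fraction_crosslinked(self) -> float:
        # Assumed definition: XL signal divided by total labeled RNA signal in the lane
        total = self.crosslink + self.free_rna
        if total <= 0:
            raise ValueError("lane contains no signal")
        return self.crosslink / total


def fold_excess(competitor_um: float, probe_um: float = PROBE_UM) -> float:
    """Molar excess of unlabeled competitor over labeled probe."""
    return competitor_um / probe_um


def relative_crosslinking(lanes: List[Lane]) -> List[float]:
    """Normalize each lane to the first lane, taken as the no-competitor reference."""
    reference = lanes[0].fraction_crosslinked()
    if reference == 0:
        raise ValueError("reference lane shows no cross-link")
    return [lane.fraction_crosslinked() / reference for lane in lanes]


print([fold_excess(c) for c in COMPETITOR_UM])  # [0.0, 1.0, 2.0, 10.0, 20.0, 40.0]
```

Under this normalization, a competitor that titrates Tat-(42-72) away from the labeled probe drives the relative value below 1 as its molar excess (up to 40-fold here) increases, whereas a competitor that does not bind the peptide leaves it near 1.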
Figure 6: Identification of the amino acid in the Tat-(42-72) sequence that cross-links to TAR RNA. A, amino acid sequence of the cross-linked peptide. A nonstandard amino acid (X) was identified during the 6th cycle of N-terminal sequencing. X in the sequence indicates the cross-linking site, which corresponds to Y in B. B, amino acid sequence of the Tat-(42-72) peptide used in the cross-linking experiments with TAR RNA. C, schematic representation of the functional domains of HIV-1 Tat protein. Numbers refer to amino acid positions, and the arrow marks the boundary between the regions encoded by the two adjacent exons (62).
Ultraviolet-induced cross-linking of RNA to proteins is a widely used technique to study RNA-protein interactions in vitro and in vivo (49, 50, 51). UV irradiation of sufficient intensity generates highly reactive species of RNA, which react with proteins and other molecules that make direct contacts with the RNA (52, 53). To identify specific RNA-protein contacts, we irradiated the complex of TAR RNA and Tat-(42-72) with UV light and observed formation of a covalent bond between RNA and protein. Formation of this covalently cross-linked product depended on the concentration of Tat peptide and on irradiation time (Figs. 2 and 3). Our competition and control experiments showed that formation of a specific complex between TAR RNA and the Tat fragment was necessary for the photocross-linking reaction (Fig. 4).
To locate the
cross-link sites in TAR RNA and the Tat peptide, we prepared the
RNA-protein cross-link on a preparative scale, purified the cross-link,
and analyzed it by RNA and protein sequencing. Alkaline hydrolysis of
5′-end-labeled cross-links indicated that a single nucleotide,
G26, in TAR RNA was involved in the covalent interaction (Fig. 5, A and B). The absence of bands in the
hydrolysis ladder after U25
from the 5′-end of the RNA indicates
that the RNA fragments extending beyond U25
are covalently linked to
the Tat peptide and migrate more slowly, creating a gap in the standard
hydrolysis ladder. These results demonstrate that cross-linking
occurs at G26
in TAR RNA.
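The gap argument reduces to a simple rule: every 5′-end-labeled fragment that extends through the cross-linked nucleotide carries the peptide and leaves the normal ladder, so the first position missing from the cross-link lane is the cross-linked base. A minimal Python sketch of that rule, in which the function name and the ladder positions are hypothetical and positions are assumed to be numbered by each fragment's 3′-terminal nucleotide:

```python
from typing import Iterable, Optional


def first_missing_position(reference: Iterable[int], crosslink_lane: Iterable[int]) -> Optional[int]:
    """Return the first 3'-end position present in the reference ladder but absent
    from the cross-link ladder, or None if no band is missing.

    Fragments ending before the cross-linked nucleotide migrate normally; every
    fragment that includes it carries the peptide and shifts out of the ladder.
    """
    present = set(crosslink_lane)
    for position in sorted(reference):
        if position not in present:
            return position
    return None


# Hypothetical band positions for illustration only: a reference ladder spanning
# nucleotides 18-40 and a cross-link lane whose ladder stops at U25.
reference_ladder = range(18, 41)
crosslink_ladder = range(18, 26)
print(first_missing_position(reference_ladder, crosslink_ladder))  # 26
```

A ladder that ends at U25, as in Fig. 5A, therefore assigns the cross-link to position 26.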
Peptide sequencing on a
tryptic fragment of the cross-link complex was accomplished by Edman
degradation chemistry. The sequencing data indicate that cross-linking
occurred at Tyr47 of the Tat peptide. As shown in Fig. 6, peptide sequencing identified a nonstandard amino acid X at the 6th position of the cross-linked peptide,
Ala-Leu-Gly-Ile-Ser-X-Gly-Arg-Lys-Lys. This sequence
corresponds to amino acids 42-51 of the HIV-1
Tat protein (Fig. 6), which places X at residue 47; the nonstandard amino acid most likely
corresponds to a photomodified tyrosine. Sequencing of proteins by
Edman degradation chemistry requires unmodified amino and carbonyl
groups in the peptide backbone. Because the Edman
sequencing reaction continued through Tyr47,
cross-linking does not occur at these backbone
groups (54). Therefore, we conclude that the aromatic side
chain or the Cα
atom in the peptide backbone of Tyr47
is
involved in covalent cross-link formation with TAR RNA.
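The residue assignment and the Edman read-through argument can be sketched in Python. The function names are hypothetical; the observed sequence is taken from the text, and the native Tat-(42-51) sequence is inferred from the statements that the peptide spans residues 42-51 and that X corresponds to Tyr.

```python
from typing import List, Tuple

# Cross-linked tryptic peptide as read by Edman degradation;
# X marks the nonstandard (presumably photomodified) residue at cycle 6.
OBSERVED = "ALGISXGRKK"
# Unmodified Tat-(42-51) sequence implied by the text (X corresponds to Tyr).
NATIVE = "ALGISYGRKK"
FIRST_RESIDUE = 42  # Tat numbering of the residue read in cycle 1


def modified_sites(observed: str, native: str, first_residue: int) -> List[Tuple[int, str]]:
    """Map each X call to (Tat residue number, native residue), checking all other cycles."""
    if len(observed) != len(native):
        raise ValueError("observed and native sequences differ in length")
    sites = []
    for cycle, (obs, nat) in enumerate(zip(observed, native), start=1):
        if obs == "X":
            sites.append((first_residue + cycle - 1, nat))
        elif obs != nat:
            raise ValueError(f"cycle {cycle}: read {obs}, expected {nat}")
    return sites


def read_past(observed: str, cycle: int) -> bool:
    """True if standard residues were still called after the given cycle,
    i.e. the backbone at that cycle did not block Edman degradation."""
    return any(ch != "X" for ch in observed[cycle:])


print(modified_sites(OBSERVED, NATIVE, FIRST_RESIDUE))  # [(47, 'Y')]
print(read_past(OBSERVED, 6))  # True
```

Cycle 6 maps to residue 42 + 6 - 1 = 47, and the calls at cycles 7 to 10 show that degradation continued past that residue, which is why a backbone amino or carbonyl linkage is excluded.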
It has
been shown by a number of groups that Tat-derived peptides that contain
the basic arginine-rich region of Tat are able to form in vitro complexes with TAR
RNA (36, 38, 39, 40, 41, 42, 43, 44).
Recently, Churcher et al. (44) published a detailed
comparative study arguing that Tat peptides can mimic the binding
affinity and specificity of Tat protein. That study showed
that adding amino acid residues from the core region of the
Tat protein to peptides containing the arginine-rich domain increased
binding specificity (44). To achieve specific RNA binding by
a Tat fragment, we used a Tat peptide, Tat-(42-72), that
contains the RNA-binding domain and six amino acids from the core
domain of the Tat protein. In this report, our cross-linking results
establish that this Tat-(42-72) peptide forms a specific
covalent photocross-link to TAR RNA in which Tyr47 of the
peptide contacts G26
of the RNA.
What is the biological
relevance of these findings? A number of studies have shown that the
stem base pairs immediately flanking the bulge of TAR
RNA are required for Tat binding and trans-activation (44, 55, 56).
In a detailed mutational analysis of TAR RNA, it was reported that
changing the G26-C39
base pair to a
C26-G39
base pair resulted in only 12% trans-activation by HIV-1 Tat (56). These reports
strongly support our finding that G26
is directly involved
in sequence-specific recognition and trans-activation by HIV-1
Tat protein. However, Tat protein mutants in which Tyr47
was
substituted with Ala or His were functional for trans-activation (57, 58). These data raise
the possibility that Tyr47
is not essential for RNA
recognition and that cross-link formation between Tyr47
and G26
could be the result of close proximity and
favorable photochemistry. To address this question, we carried out
cross-linking experiments with a Tat fragment lacking Tyr47,
Tat-(48-72), which binds TAR RNA with high
affinity (38, 39, 44). UV irradiation of
TAR RNA complexed with Tat-(48-72) did not yield any specific
RNA-protein cross-link products (data not shown). These results support
our model of Tat-TAR interactions, in which the basic recognition sequence
of Tat is located in the major groove of TAR RNA, bringing Tyr47
close to G26
(Fig. 7). The
cross-link formation between G26
and Tyr47
is
likely the result of close proximity, favorable orientation, and
the photoreactivity of tyrosine.
Figure 7:
A schematic illustration of a
three-dimensional model of the HIV-1 Tat binding site of TAR RNA, showing
the location of the amino terminus of Tat-(42-72) and the
interaction of Tyr47 of the peptide with G26
of
the RNA. The orientation of Tat-(42-72) was determined in our
laboratory with a psoralen-Tat-(42-72) conjugate (48).
The TAR RNA structure is based on NMR data (63). The ribbon
structure of TAR RNA is shown as five dark lines. The basic
region, Tat-(47-57), is represented as a barrel positioned in the wide major groove, and the N-terminal region
containing Tat-(42-46) is drawn as a line. Tyrosine 47
is shown directly above G26
of TAR RNA
(indicated in black) to illustrate the close proximity between
Tyr47
and G26. As determined by psoralen-Tat
cross-linking experiments, the amino terminus of Tat-(42-72)
contacts, or is close to, uridine 42 in the lower stem of TAR
RNA (48); the amino terminus of the peptide is labeled
NH2, and its proximal base, U42
of TAR RNA, is
indicated in black. Structures of TAR RNA were visualized
using Insight II software on an IRIS
workstation.
How does Tat interact with TAR RNA?
Several lines of evidence suggest that Tat protein contacts TAR RNA in
a widened major groove. In a recent study from our laboratory, we used
a rhodium complex, bis(phenanthroline)(phenanthrenequinone
diimine)rhodium(III) (Rh(phen)2phi3+), to
probe the effect of bulge bases on the major groove width in TAR
RNA (59). This metal complex does not bind double helical RNA
or unstructured single-stranded regions of RNA. Instead, it targets
sites of tertiary interaction that are open in the major groove and accessible
to stacking, and cleaves them upon photoactivation
(60). The sites targeted by the rhodium complex have
been mapped at single-nucleotide resolution on wild-type TAR RNA and on
several TAR RNA mutants containing different numbers of mismatched
bases in the bulge region (59). Strong cleavage at residues
C
and U
was observed on the wild-type TAR RNA
and in mutant TAR RNA containing two mismatch bases in the bulge. No
cleavage at C
and U
was observed in a
bulgeless TAR RNA and in a one-base bulge TAR RNA. Our studies
establish two important factors involved in Tat-TAR recognition. (i)
There is a correlation between major groove opening and Tat binding. At
least a two-base bulge is required for major groove widening and other
conformational changes to facilitate Tat binding. This cannot be
accomplished by a single base bulge. (ii) The Tat fragment
Tat-(42-72) occupies the major groove of TAR RNA and abolishes
access of the rhodium complex. On the basis of chemical modification
and gel mobility studies, a similar model was suggested earlier by
Weeks and Crothers (55). Finally, Hamy et al. (61) carried out site-specific modifications of functional
groups on TAR RNA and showed that Tat forms multiple specific hydrogen
bonds to a series of dispersed sites displayed in the major groove.
To determine the relative orientation of the nucleic acid and protein in the Tat-TAR complex, we devised a new method based on psoralen photochemistry (48). We synthesized a 31-amino acid fragment containing the arginine-rich RNA-binding domain of Tat (residues 42-72) and chemically attached a psoralen at its amino terminus. Upon near-ultraviolet irradiation (360 nm), this synthetic psoralen-peptide conjugate cross-linked to a single site in the TAR RNA sequence. The RNA-protein complex was purified, and the cross-link site on TAR RNA was determined by chemical and primer extension analyses. Our results show that the amino terminus of Tat-(42-72) contacts, or is close to, uridine 42 in the lower stem of TAR RNA (48).
On the basis of the above studies, we suggest a model in which Tat
binds to TAR RNA by inserting the basic recognition sequence into the
enlarged major groove with an orientation where lysine 41 in the core
domain of Tat contacts the lower stem and Tyr47 is close to
G26 of TAR RNA (Fig. 7). These findings are
intriguing and suggest a possible mechanism of RNA recognition by Tat.