(Received for publication, April 13, 1995; and in revised form, July 12, 1995)
Human cDNA clones encoding the UUAG-binding heterogeneous nuclear ribonucleoprotein (hnRNP) D0 protein have been isolated and expressed. The protein has two RNA-binding domains (RBDs) in the middle part of the protein and an RGG box, a region rich in glycine and arginine residues, in the C-terminal part (``2xRBD-Gly'' structure). The hnRNP A1, A2/B1, and D0 proteins all share the 2xRBD-Gly structure and a common binding specificity toward RNA. Together, they form a subfamily of RBD class RNA binding proteins (the 2xRBD-Gly family). One of the structural characteristics shared by these proteins is the presence of several isoforms, presumably resulting from alternative splicing. Filter binding assays using recombinant hnRNP D0 proteins that have only one of the two RBDs indicated that a single RBD specifically binds to the UUAG sequence. However, two isoforms with or without a 19-amino acid insertion in the N-terminal RBD showed different preferences toward mutant RNA substrates. The 19-amino acid insertion is located at the N-terminal end of the first RBD. This result establishes the participation of the N terminus of the RBD in determining the sequence specificity of binding. A similar insertion was also reported for the hnRNP A2/B1 proteins. Thus, this type of insertion in the 2xRBD-Gly type RNA binding proteins may play a role in ``fine tuning'' the specificity of RNA binding. The RBD is thought to bind RNA in both general and sequence-specific manners. These two discernible binding modes are proposed to be performed by different regions of the RBD. A structural model of these two binding sites is presented.
Ribonucleoproteins have been found in many macromolecular complexes that have vital biological roles, such as heterogeneous nuclear ribonucleoprotein (hnRNP) (1), small nuclear ribonucleoprotein (snRNP), ribosomes, and signal recognition particles. These complexes are composed of RNAs and proteins, many of which show RNA binding activities. One of the most common groups of RNA binding proteins is the RBD class proteins (2). They possess a CS-RBD (consensus sequence-RNA binding domain) motif, which is typically 80-90 amino acids long. Two short sequences, the RNP 2 hexamer and the RNP 1 octamer, have been found to be conserved among different RBDs. Several RBDs are commonly found in tandem within one molecule. It is also common to find an auxiliary RNA-binding motif present in addition to RBDs within the same molecule. Thus, RBD class RNA binding proteins typically possess several RNA-binding domains as modules. It has not been well studied, however, how these modular domains participate together in binding to RNA.
hnRNP proteins are a subset of proteinaceous components found in hnRNP, which is a large complex formed by the nascent pre-mRNA and proteins (1, 3). More than 20 proteins have been identified as hnRNP proteins by two-dimensional protein gel electrophoresis. Although the structures of some of these proteins remain unknown, many contain RBDs, which are the regions responsible for interaction with RNA. Some hnRNP proteins have been implicated in the processing of pre-mRNA. Anti-hnRNP C protein antibody inhibited pre-mRNA splicing in vitro (4, 5). Several hnRNP proteins were reported to be associated with spliceosomal complexes (6, 7). Finally, hnRNP A1 and A2/B1 have been shown to influence splice site selection (8, 9, 10). These observations suggest that hnRNP proteins may have a role in specific RNA processing reactions by virtue of sequence-specific RNA binding in addition to nonspecific general RNA binding. In spite of this expectation, only a small number of hnRNP proteins have been shown to bind to RNA in a sequence-specific manner.
In a previous study, we showed that several different proteins from the HeLa cell nuclear extract specifically bind to single-stranded d(TTAGGG) and r(UUAGGG) oligonucleotides (11). These proteins have apparent molecular masses of 26, 28, 37, 39, 41, 50, and 55 kDa. Amino acid sequencing of the purified proteins indicated that the 26-, 28-, and 50-kDa proteins are the hnRNP A1 protein, A2/B1 protein, and nucleolin, respectively. The 39- and 41-kDa proteins were immunoreactive to anti-hnRNP D monoclonal antibodies. On two-dimensional gel electrophoresis, they migrated as spots near, but separate from, the hnRNP D protein. We suggested that the 39- and 41-kDa proteins are identical or closely related to the hnRNP D protein. Similarly, the 37-kDa protein was suggested to be identical or closely related to the hnRNP E protein and was referred to as hnRNP E0. In this study, we will refer to the 39- and 41-kDa hnRNP D-like proteins having UUAGGG-binding activity as hnRNP D0 proteins.
The hnRNP A1, A2/B1, D0, and E0 proteins bound to UUAGGG repeats but not to single base-substituted oligoribonucleotides, such as CUAGGG-, UCAGGG-, UUGGGG-, or UUAAGG repeats. Thus, their binding to these substrates is exceptionally sequence-specific compared with other hnRNP proteins. This feature offers an opportunity to study the molecular interaction between RBD and RNA. In this study, we first examined the cDNA structure of the hnRNP D0 proteins. Results revealed that the hnRNP D0 protein has a modular structure in common with the hnRNP A1 and A2/B1 proteins. We next examined the RNA binding properties of each modular domain of the hnRNP D0 proteins. A model for molecular interaction between the protein and RNA is proposed based upon these structural and functional analyses.
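The relationship between the bound repeat and the four non-bound variants can be made explicit with a short comparison. The following Python sketch is only an illustration; the function name `substitutions` is an assumption introduced here, and the sequences are the repeat units quoted in the text.

```python
# Repeat unit bound by the hnRNP proteins (from the text).
WILD_TYPE = "UUAGGG"
# Single-base-substituted repeat units that were not bound (from the text).
MUTANTS = ["CUAGGG", "UCAGGG", "UUGGGG", "UUAAGG"]


def substitutions(reference, variant):
    """Return (1-based position, reference base, variant base) per mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


for mutant in MUTANTS:
    print(mutant, substitutions(WILD_TYPE, mutant))
```

Each variant differs from UUAGGG at exactly one position, and the four substitutions fall at positions 1 through 4 of the repeat, that is, within the UUAG core.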
A HeLa cDNA library was constructed from 4 µg of HeLa poly(A)+ RNA using the λEXlox vector (Novagen). A total of 2 × 10^5 plaques were screened with the two E2BP cDNA-specific probes. Nine clones, cDx1-9, were identified as being positive with both probes. The clones were sequenced with an Autocycle Sequencing kit and an A.L.F. DNA Sequencer (Pharmacia Biotech Inc.).
The
predicted amino acid sequences deduced from the E2BP cDNA revealed that
this protein has two RBDs(14) . Two sets of PCR primers, S1:S2
and S3:S4 were prepared according to the reported sequences of each
RBD. Accordingly, two DNA fragments derived from E2BP cDNA were
obtained by reverse transcription-PCR from the two primer sets using
the total RNA of HeLa cells. A total of 200,000 clones of the HeLa cell
cDNA library were prepared by oligo(dT)-priming and were screened by
these E2BP-specific probes. Nine different clones were determined to be
double-positive by the probes. The longest clone, cDx7, was sequenced
completely, identifying a 1589-bp cDNA insert (Fig. 1). A long
open reading frame, bound by TAG at 226-228 and TAA at
1204-1206, was identified. A polyadenylation signal sequence,
AATAAA, was noted at 1541-1546. ATG at 286-288 was
tentatively assigned as an initiating codon. It was predicted that the
open reading frame encodes a 306-amino acid protein with a calculated
molecular weight of 32,800. All five amino acid sequences identified in
peptides, obtained from the purified hnRNP D0 proteins, were found in
the predicted amino acid sequence, except that one amino acid
substitution was noted (Fig. 1). As will be described later, the recombinant protein of this cDNA is immunoreactive to an anti-hnRNP D monoclonal antibody and binds specifically to the d(TTAGGG) and r(UUAGGG) oligonucleotides. Therefore, we concluded that the cDNA clones we have isolated are for the hnRNP D0 proteins.
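The reading-frame assignment can be checked from the coordinates alone. The sketch below is a minimal consistency check, not part of the original analysis; the helper name `codons_between` is an assumption, and the positions are those reported for cDx7.

```python
# Coordinates reported for cDx7 (1-based first base of each codon).
UPSTREAM_STOP_START = 226   # TAG at 226-228
START_CODON_START = 286     # ATG at 286-288
STOP_CODON_START = 1204     # TAA at 1204-1206


def codons_between(start, stop):
    """Number of whole codons from `start` up to, but excluding, `stop`."""
    length = stop - start
    if length % 3 != 0:
        raise ValueError("positions are not in the same reading frame")
    return length // 3


# The upstream TAG must be in frame with the ATG for the assignment to hold.
upstream_codons = codons_between(UPSTREAM_STOP_START, START_CODON_START)
# Codons from the ATG up to the TAA give the length of the encoded protein.
protein_length = codons_between(START_CODON_START, STOP_CODON_START)
print(upstream_codons, protein_length)
```

Both differences are multiples of three, so the upstream stop codon is in frame with the proposed initiating ATG, and the open reading frame corresponds to the 306 amino acids stated in the text.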
Figure 1: Nucleotide sequence of the hnRNP D0 cDNA, cDx7, and predicted amino acid sequence. The nucleotide sequence is numbered from the 5′-end of cDx7 cDNA. A putative initiation codon ATG at 286-288 and the stop codon of the open reading frame TAA at 1204-1206 are shown by boldface letters and are underlined. An in-frame, upstream stop codon TAG at 226-228 and polyadenylation signal sequence aataaa at 1541-1546 are underlined. The amino acid sequences of tryptic peptides that were obtained from purified hnRNP D0 proteins (11) are indicated by shading. Serine at position 150 was identified as arginine in the previous study (11) (dotted). The two CS-RBDs, RBD-1 and RBD-2, are boxed. A 57-bp insertion that is missing in some isoforms is shown by a box inside the RBD-1. The RNP 1 and RNP 2 sequences are indicated by boldface italics and are underlined. The 5′-region encodes an amino acid sequence that is rich in alanine and glycine (indicated by N-ter). Two short motifs of GGSA and EGA, found repeatedly in tandem (amino acids 20-29 and 58-66), are indicated by italics and are underlined. The 3′-region encodes an amino acid sequence rich in glycine (indicated by Gly-rich). The three RGG motifs in this region are shown by italics and are underlined. The position at which the 147-bp insertion is found in other isoforms is indicated by an arrowhead (positions 1138-1139). The differences between the reported E2BP cDNA sequence (14) and this sequence are as follows: E2BP cDNA has a ``t'' insertion at positions 1202 and 1203, ``gg'' deletion at positions 1277 and 1278, and a 139-bp deletion at positions 1419-1557.
The nucleotide sequence of cDx7 differs from that of E2BP in several ways. cDx7 has a longer 5′ upstream sequence than E2BP, allowing us to locate the most probable initiating codon. Several nucleotides were missing or replaced by other nucleotides in E2BP, resulting in changes to the open reading frame and the predicted amino acids. The details of the discordance are presented in Fig. 1. These discordant sequences were repeatedly examined with cDx7 and with our other cDNA clones, giving the same results.
The predicted amino acid sequence of cDx7 can be divided into three parts. The N-terminal 69 amino acids form an acidic region that is unique to this protein. Alanine and glycine are abundant in this region (27 and 29%, respectively). Two short motifs of GGSA and EGA are found repeatedly in tandem (amino acids 20-29 and 58-66). The Chou and Fasman algorithm predicts that this region contains four α-helices. The second portion, occupying the central and major part of the protein, consists of two typical RBDs. Two RBDs are arranged in tandem (amino acids 70-173 and 174-256) without any apparent spacer sequence between them. Further analysis of the structure of this portion will be presented later. The third portion, the C-terminal third of the protein, starts after a short repeat of glutamine (amino acids 262-268) and is characterized by a high content of glycine (32% of amino acids 269-306). In this region, three repeats of RGG are noted (amino acids 272-274, 282-284, and 334-336). RGG has been found in several RBD class RNA binding proteins (15). It has been suggested that it is an auxiliary motif responsible for protein-protein interaction or nonspecific nucleic acid binding (16).
Figure 2: Structures of the hnRNP D0 cDNA isoforms. Structures and the predicted coding regions of the four hnRNP D0 cDNAs; cDx7, -4, -9, and -8 are schematically shown. Deduced amino acid sequences comprise four domains: N-ter, RBD-1, RBD-2, and Gly-rich. A 57-bp insertion in RBD-1 that is present in cDx7, -9, and -8, is illustrated by heavily shaded boxes. A 147-bp insertion in the C-terminal Gly-rich region found in cDx4 and -9 is indicated by hatched boxes. Deduced amino acid sequences of these two insertions are also shown. A motif of eight amino acids was repeated in tandem twice (italic with underline). A 107-bp insertion found in the untranslated region of cDx8 is also shown. The nucleotide sequence of 107-bp insertion in the 3`-untranslated region is as follows: 5`-cgggaacttcattgcaggccctgtgtcgcgctgacttcagattctcacaggcccgctcaatgcggacagggtaacgagatgctccacgctctcgaatgctgccgtttg-3`.
Several hnRNP genes have been shown to produce variant mRNAs resulting from alternative splicing. This mechanism expands the complexity of hnRNP proteins. The differences found in our cDNA clones most likely come from alternative splicing as well, although at present we do not have any direct evidence for it. We have not isolated cDNA of the -/- type. Thus, we could expect at least three different isoforms of mRNAs with or without the 57- and 147-bp insertions. The shortest +/- type encodes 306 amino acids with a molecular mass of 32.8 kDa. The intermediate -/+ type encodes 336 amino acids with a molecular mass of 36.2 kDa. Finally, the longest +/+ mRNA predicts 355 amino acids with a molecular mass of 38.4 kDa. A previous SDS-polyacrylamide gel electrophoresis analysis identified proteins of apparent molecular masses of 41 kDa (possibly a doublet) and 39 kDa as anti-hnRNP D monoclonal antibody-immunoreactive proteins in a TTAGGG-binding protein preparation (11). The presence of the isoform mRNAs described above may explain the presence of native proteins with different apparent molecular masses. The mobility of the proteins on SDS-polyacrylamide gel electrophoresis was slower than expected from the calculated molecular mass values. This may be in part due to the basic nature of these proteins (the calculated pI is about 8.8).
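The isoform lengths quoted above follow from the sizes of the two insertions. The following sketch reproduces that arithmetic; the function names and the `(RBD-1 insertion, Gly-rich insertion)` argument order are assumptions made for illustration, mirroring the +/- notation in the text.

```python
CDX7_LENGTH = 306       # amino acids encoded by cDx7, the +/- type
INSERT_RBD1_BP = 57     # insertion in RBD-1 (present in cDx7)
INSERT_GLY_BP = 147     # insertion in the Gly-rich region (absent from cDx7)


def insert_residues(base_pairs):
    """Convert an in-frame insertion length in bp to amino acid residues."""
    if base_pairs % 3 != 0:
        raise ValueError("insertion would shift the reading frame")
    return base_pairs // 3


def isoform_length(has_rbd1_insert, has_gly_insert):
    """Predicted protein length for a given combination of insertions."""
    # Start from cDx7 with its RBD-1 insertion removed, then add back.
    length = CDX7_LENGTH - insert_residues(INSERT_RBD1_BP)
    if has_rbd1_insert:
        length += insert_residues(INSERT_RBD1_BP)
    if has_gly_insert:
        length += insert_residues(INSERT_GLY_BP)
    return length


for label, flags in [("+/-", (True, False)), ("-/+", (False, True)),
                     ("+/+", (True, True))]:
    print(label, isoform_length(*flags))
```

The 57- and 147-bp insertions correspond to 19 and 49 amino acids, respectively, which gives the lengths of 306, 336, and 355 residues stated for the three expected isoforms.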
All of these proteins are characterized as having two RBDs in tandem in the N terminus (hereafter referred to as RBD-1 and RBD-2 from the N terminus) and a Gly-rich region, which typically contains the RGG motif, in the C terminus. The term ``2xRBD-Gly group RNA binding protein'' was coined to designate these proteins on the basis of their common structural organization (1). A compilation of an additional number of proteins, including hnRNP D0, is shown in Fig. 3, and these new members support the idea of the presence of this group of proteins. The RBD generally consists of about 90 amino acids. Two short stretches of sequence, RNP 1 and RNP 2 (eight and six amino acids, respectively), are highly conserved among the different RBD class RNA-binding proteins. Regions other than RNP 1 and 2 are less conserved. Significantly, the proteins listed in Fig. 3 have conserved amino acid sequences, not only in RNP 1 and 2 but throughout the RBD. This long range conservation of amino acid sequences, along with a common structural organization, reinforces the presence of the 2xRBD-Gly group RNA binding proteins.
Figure 3: Alignment of the amino acid sequences of two RBDs of proteins having the 2xRBD-Gly structure. The amino acid sequences of RBD-1 and RBD-2 of human hnRNP D0 (this study), type A/B hnRNP (19), hnRNP A1 (39), hnRNP A2/B1 (29), Drosophila HRP40 (22, 23), and Xenopus hnRNP A2 (40) are aligned manually. Identical and conserved amino acids among these proteins are marked by heavy and light shading, respectively. Positions of secondary structure are deduced from the study of hnRNP A1 (24) and indicated by underlines. RNP 2 hexamer and RNP 1 octamer are shown in boldface letters. Insertions of short peptides found in isoforms of hnRNP D0 and A2/B1 are shown using boxes.
Recently, an NMR study of the N-terminal RBD of the human hnRNP A1 was reported (24). The study indicated that the hnRNP A1 RBD also forms four-stranded anti-parallel β-sheets as reported repeatedly with other RBDs (25, 26). Because 2xRBD-Gly type proteins are so closely related to each other, we are able to tentatively assign the secondary structures determined with the hnRNP A1 to other members of this group of proteins (Fig. 3). According to this assignment, the 19-amino acid insertion of the hnRNP D0 found in RBD-1 is located at the N terminus of β1 of RBD-1.
Figure 4: Binding properties of recombinant hnRNP D0 proteins. A, structures of the recombinant hnRNP D0 proteins are schematically shown. Definition of the domain is the same as described in Fig. 1 and Fig. 2. GST, glutathione S-transferase, which is fused with RBD-1 and -2. GD1L and GD1H are RBD-1 fused to glutathione S-transferase, with (H) or without (L) the 19-amino acid insertion (heavily shaded boxes) at the N terminus of RBD-1. GD2 is RBD-2 fused with glutathione S-transferase. D12L and D12H are RBD-1 and -2, with (H) or without (L) the 19-amino acid insertion at the N terminus of RBD-1. C4 and C7 are different isoforms of the whole hnRNP D0 protein. The 19-amino acid insertion in RBD-1 is present in C7 but not in C4. A 49-amino acid insertion in the Gly-rich region is present in C4 but not in C7 (hatched boxes). B, the filter-binding assays were carried out and evaluated as described under ``Materials and Methods'' with GD1L (part a), GD1H (part b), GD2 (part c), D12L (part d), D12H (part e), C4 (part f), and C7 (part g). Oligoribonucleotide probes were as follows: rH4 (r(UUAGGG)4); rH4X1 (r(UUGGGG)4); and rECGF (r(GCAGCCUUGAUGACCUCGUGAACC)).
Immunoblotting analysis of the recombinant proteins with an anti-hnRNP D monoclonal antibody 5B9 showed that GD2 is immunoreactive but that GD1H and GD1L are not (data not shown). This result supports the conclusion that the clones we isolated are for the hnRNP D0 and suggests that the epitope for the monoclonal antibody 5B9 is present in RBD-2.
Recombinant proteins were subjected to a filter binding assay to analyze their binding activities. Binding experiments were carried out by incubating variable amounts of recombinant proteins with constant amounts of oligonucleotides. Under these conditions, oligonucleotide concentrations (typically 1-10 nM) were always much lower than protein concentrations. The apparent Kd of the binding reactions was estimated as the concentration of protein at which half-maximal binding was obtained. The oligoribonucleotide probes used in these assays were rH4 (r(UUAGGG)4), rH4X1 (r(UUGGGG)4), and rECGF (r(GCAGCCUUGAUGACCUCGUGAACC)). rECGF was used as an unrelated sequence having the same length as rH4. Our previous study indicated that the purified HeLa cell proteins bind to rH4 but not to rH4X1 or rECGF. The following results were also obtained with DNA versions of these oligonucleotides, although the binding affinity was lower than that of RNA oligonucleotides (data not shown).
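The half-maximal-binding estimate described above can be sketched as follows. This is a minimal illustration, assuming a simple 1:1 binding isotherm (reasonable when the oligonucleotide concentration is far below the protein concentration) and linear interpolation between titration points. The titration values are hypothetical, simulated from an assumed dissociation constant of 200 nM, and the function names are introduced here only for illustration.

```python
def fraction_bound(protein_nM, kd_nM):
    """Simple 1:1 binding isotherm, valid when RNA is far below protein."""
    return protein_nM / (kd_nM + protein_nM)


def half_max_concentration(concs, signals):
    """Interpolate the protein concentration giving half of the maximal signal.

    `concs` must be sorted in ascending order, with `signals` in matching order.
    """
    half = max(signals) / 2.0
    for i in range(len(concs) - 1):
        s0, s1 = signals[i], signals[i + 1]
        if s0 <= half <= s1 and s1 > s0:
            c0, c1 = concs[i], concs[i + 1]
            return c0 + (half - s0) * (c1 - c0) / (s1 - s0)
    raise ValueError("half-maximal signal is not bracketed by the data")


# Hypothetical titration (nM protein) simulated with an assumed value of 200 nM.
concs = [10, 50, 100, 150, 200, 250, 400, 1000, 5000, 20000]
signals = [fraction_bound(c, 200.0) for c in concs]
estimate = half_max_concentration(concs, signals)
print(f"apparent dissociation constant ~ {estimate:.0f} nM")
```

In this simulated case the estimate falls slightly below the assumed 200 nM, because the highest protein concentration does not fully saturate the probe. Real filter-binding data would also carry experimental noise, which this sketch ignores.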
First, mutant recombinant proteins having only one of the two RBDs were examined. GD1L bound to rH4 with high affinity (the Kd is about 200 nM). In contrast, GD1L bound to rH4X1 and rECGF much less efficiently (Fig. 4B (part a)). This specificity between rH4 and rH4X1 indicated that a single RBD can strictly discriminate a single base change in the oligonucleotide. A recombinant protein consisting only of glutathione S-transferase, without hnRNP D0, did not show any binding activity (data not shown). This result further confirms that the cDNA clones we isolated are for the UUAG-specific binding protein hnRNP D0.
Unexpectedly, the sequence-specific binding observed with GD1L was detected in a somewhat different manner with GD1H (Fig. 4B (part b)). GD1H bound to rH4 with a Kd of about 1.1 µM. GD1H also bound to rH4X1 with nearly the same efficiency. Binding to rECGF was more efficient than that shown by GD1L. The major difference between GD1H and GD1L is the presence or absence of the 19-amino acid sequence at the N terminus of RBD-1. This result suggests that the presence of this insertion changes the sequence preference of hnRNP D0 binding. GD2 showed binding properties intermediate between those of GD1L and GD1H. GD2 bound to rH4 with a Kd of about 320 nM. It bound to rH4X1 and rECGF to some extent, although its discrimination between rH4 and rH4X1 was better than that of GD1H (Fig. 4B (part c)).
The implication that the 19-amino acid insertion at RBD-1 may have a role in ``sequence preference'' was also suggested by the results with other recombinant proteins (Fig. 4B (parts d-g)). D12L, D12H, C4, and C7 bound to rH4 with Kd values of about 490, 880, 60, and 34 nM, respectively. No significant difference in the Kd of binding between rH4 and these proteins was observed in the presence or absence of the 19-amino acid insertion. However, recombinant proteins with the insertion, C7 and D12H, also bound to rH4X1 as tightly as to rH4. In contrast, proteins without the 19-amino acid insertion, C4 and D12L, bound to rH4X1 less efficiently. Therefore, all binding results are compatible with the idea that the 19-amino acid insertion modifies the sequence preference of the hnRNP D0 protein, resulting in the accommodation of rH4X1 as well as rH4.
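The reported affinities for rH4 can be collected and compared pairwise. The sketch below only tabulates the approximate values given in the text (1.1 µM is entered as 1100 nM) and computes the ratio for each pair of constructs with and without the 19-amino acid insertion. The dictionary and variable names are assumptions, and it should be kept in mind that C4 and C7 also differ in the 49-amino acid insertion.

```python
# Approximate dissociation constants for rH4 binding, in nM (from the text).
KD_RH4_NM = {
    "GD1L": 200, "GD1H": 1100, "GD2": 320,
    "D12L": 490, "D12H": 880, "C4": 60, "C7": 34,
}

# (construct without the 19-aa insertion, construct with it)
PAIRS = [("GD1L", "GD1H"), ("D12L", "D12H"), ("C4", "C7")]

fold_change = {}
for without, with_insert in PAIRS:
    # Ratio above 1 means weaker rH4 binding when the insertion is present.
    fold_change[with_insert] = KD_RH4_NM[with_insert] / KD_RH4_NM[without]

for name, ratio in fold_change.items():
    print(f"{name}: {ratio:.2f}-fold relative to its partner")
```

For the two-domain and full-length pairs the ratios stay within about twofold in either direction, whereas the isolated RBD-1 constructs differ by about 5.5-fold.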
Concerning proteins with several RNA-binding domains, it is of special interest to know whether one molecule of ligand binds to several domains simultaneously. In the binding experiments, rH4 bound to one RBD with a Kd of 0.2-1 µM and to two RBDs with a Kd of 0.5-0.9 µM. If both RBD-1 and -2 could bind to rH4 at the same time, the Kd for this binding should be much lower than that of a single RBD. However, the Kd values were almost the same. Thus, it was concluded that RBD-1 and -2 of the hnRNP D0 protein cannot bind to rH4 simultaneously (numerical treatment for this discussion is available on request).
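The reasoning behind this conclusion can be illustrated with a generic chelate (effective concentration) model. This is not the authors' numerical treatment, which is not given in the text. It is a hedged sketch in which two linked sites binding one ligand simultaneously have an overall dissociation constant of roughly Kd1 × Kd2 / C_eff. The effective local concentration C_eff of 1 mM is a purely hypothetical value chosen for illustration.

```python
def chelate_kd(kd1_molar, kd2_molar, c_eff_molar):
    """Overall dissociation constant if both sites engage one ligand at once.

    Generic effective-concentration model; `c_eff_molar` is the assumed
    local concentration of the second site once the first is bound.
    """
    return kd1_molar * kd2_molar / c_eff_molar


KD_RBD1 = 200e-9      # GD1L with rH4, from the text
KD_RBD2 = 320e-9      # GD2 with rH4, from the text
C_EFF = 1e-3          # hypothetical effective concentration (assumption)
OBSERVED_D12L = 490e-9  # two-domain construct D12L with rH4, from the text

predicted = chelate_kd(KD_RBD1, KD_RBD2, C_EFF)
print(f"predicted if simultaneous: {predicted:.1e} M")
print(f"observed for D12L:         {OBSERVED_D12L:.1e} M")
```

Under these assumptions, simultaneous binding would predict an affinity several orders of magnitude tighter than the observed value, which is in line with the conclusion that the two RBDs do not engage rH4 at the same time. The size of the gap depends strongly on the assumed C_eff.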
We have examined the structure of the hnRNP D0 protein cDNA and have studied the binding properties of recombinant proteins. Results showed that this protein is a member of the 2xRBD-Gly type RNA binding proteins. The notion of grouping the 2xRBD-Gly family is not based simply upon mere resemblance of the proteins but upon detailed structural and functional analysis as discussed below. A comparison of several cDNA clones revealed the presence of different isoforms of proteins, which are presumably derived from alternative splicing. One type of these different isoforms was due to a 19-amino acid insertion at the N terminus of RBD-1. Recombinant proteins having one or more combinations of modular domains were expressed. A filter binding assay of these mutant recombinant proteins with oligonucleotides clearly showed that a single RBD can bind to RNA sequence-specifically. In addition, ``sequence preference'' of the binding was found to be influenced by the presence or absence of the amino acid insertion in RBD-1.
Besides this macroscopic similarity among the hnRNP A1, A2/B1, and D0 proteins, the amino acid sequences of the RBDs are also highly conserved (Fig. 3). An RBD is made up of about 80-90 amino acids. However, only two relatively short amino acid sequences, RNP 1 and 2, are highly conserved among different RBD class RNA binding proteins (2, 32). Most of the RBD sequences other than RNP 1 and 2 are far less conserved. In this light, it is remarkable that the comparison of RBDs derived from hnRNP A1, A2/B1, and D0 has shown significant identity throughout the RBD regions (Fig. 3; see Fig. 5A for a comparison between the 2xRBD-Gly and non-2xRBD-Gly proteins). The common structural organization, the production of similar isoform proteins among hnRNP A1, A2/B1, and D0, and the highly conserved amino acid sequences throughout the RBDs strongly suggest that these genes belong to a closely related gene family having a common, ancient ancestral gene. This notion is consistent with the fact that invertebrates like Drosophila contain only 2xRBD-Gly type hnRNP proteins (22, 33), whereas vertebrates have many different types of hnRNPs.
Figure 5: Mapping of general and specific RNA binding sites on a structural model of the hnRNP D0 protein. A, a comparison of amino acid sequences of N-terminal (RBD-1) and C-terminal (RBD-2) RBDs of snRNP U1A protein (41), the hnRNP D0 protein, and the hnRNP A1 protein. Assignment of secondary structures is from (24) and (25). Positions of secondary structures are marked by underlining. The prediction of the secondary structure of the hnRNP D0 protein is based solely upon sequence similarity to the hnRNP A1 protein. RNP 2 hexamer and RNP 1 octamer are shown in boldface letters. B, the mapping of conserved amino acids on a structural model of the hnRNP D0 RBD-1. The four-stranded β-sheet model of the hnRNP D0 protein RBD-1 is deduced from NMR studies of the hnRNP A1 protein (24) and the hnRNP C protein (26), and an x-ray crystallography study of the snRNP U1A protein (25). Each circle corresponds to an amino acid residue. Amino acids of hnRNP D0 that are found identical or conserved among all of the snRNP U1A, hnRNP D0, and hnRNP A1 proteins in A are shown on the structural model by stippled circles. Amino acids that are found identical or conserved between the hnRNP D0 and hnRNP A1 proteins but not with snRNP U1A in A are shown by filled circles. Highly conserved aromatic amino acids in β1 and β3 and a basic amino acid in loop β2β3 are indicated using squares. F and R represent phenylalanine and arginine, respectively. Amino acids present in α-helices A and B are not shown for clarity. C, a model of the distribution of general (lightly shaded) and specific (heavily shaded) binding sites of RBD. The positions of the N- and C-terminal portions of RBD (N-ter and C-ter), the amino acid insertion found in isoforms of hnRNP D0 (Ins), and the RNA substrate bound with RBDs (thick line) are also schematically shown. These positions are not experimentally determined with hnRNP D0 but rather deduced from those of the U1A protein (37) (see ``Discussion'').
Recently, Oubridge et al. (37) reported a crystal structure of the RBD of the snRNP U1A protein complexed with an oligoribonucleotide of U1 snRNA hairpin II. Fig. 5A shows the comparison among RBDs of U1A protein, the hnRNP A1 protein, and the hnRNP D0 protein. U1A protein specifically recognizes 7 nucleotides at the 5′-end of the 10-nucleotide loop of U1 hairpin II. Oubridge et al. (37) showed that the 7 nucleotide bases are extensively recognized by the surface of the β-sheets, maintaining intimate contact with the highly conserved RNP 2 and RNP 1 motifs. Because the overall structure and length of the RBD and its substrate are similar among U1A, hnRNP D0, and hnRNP A1, it may be possible to use the higher order structure reported by Oubridge et al. (37) as a starting point for constructing a model of binding between hnRNP D0 and an oligonucleotide.
It has been suggested that the flat surface of the four-stranded β-sheets of RBD binds to RNA in two different modes, one specific and the other nonspecific general binding. It has also been suggested that these functionally discernible types of binding are performed by molecularly different regions of the β-sheets (3, 38). Two highly conserved short sequences, RNP 2 and 1, which are located in the central two β-sheets (β1 and β3, respectively), are candidates for regions functioning in general binding. Regions variable among different RBDs, like the β2β3 loop and the C-terminal portion of RBD, are candidate regions for specific binding. However, further mapping of these two distinct binding sites has not been proposed. HnRNP D0 and hnRNP A1 showed a similar binding specificity, which is different from that of U1A. Therefore, amino acid sequences in Fig. 5A that are conserved between hnRNP D0 and A1 but not with U1A may be responsible for specific binding. On the other hand, regions conserved among all three proteins may be candidate regions for general binding. In Fig. 5A, identical or conserved amino acid residues among the three proteins are indicated. Fig. 5B is the result of mapping the two types of amino acids of the hnRNP D0 protein onto a structural model. The result shows interesting distributions of these residues. Residues conserved in all three proteins are mostly located in β1 and β3, which form the central ``umbilicus'' of the platform (Fig. 5B, stippled circles). In contrast, amino acid residues conserved only in hnRNP A1 and D0 are distributed at the margin of the platform (filled circles). These include the start and end regions of β1; loops connecting β1 and αA, and αA and β2; β2; loops connecting β2 and β3, and αB and β4; and the entire β4. These regions, which are apparently distributed at intervals in the primary sequence, precisely trace the rim of the platform. The characteristic distributions of the ``general'' and ``specific'' amino acids (central versus marginal) predict the position of bound RNA on the RBD platform. For RNA to keep contact with both general and specific amino acid residues, it needs to be positioned by being fitted into the clefts formed by general β-sheets (β1 and β3) and specific β-sheets (β2 and β4). This is exactly what was found in the structural study of the U1A-RNA complex (37). They found that the U1 loop II binds to U1A as schematically shown in Fig. 5C. General β1 and specific β4 have contact with the ascending 5′-half of the loop. Aromatic amino acids conserved very well in RNP 1 and 2, which occupy the upper portions of general β3 and β1 in the orientation of Fig. 5, interact with the top of the loop (indicated by squares). The 3′-descending loop is recognized relatively loosely by specific β2. Finally, the highly conserved basic amino acid in the β2β3 loop (indicated by a square) interacts with the neck of the loop. It is remarkable that both β4 and the loop between αB and β4 are composed mainly of 2xRBD-Gly-specific amino acid residues. In U1A, this region was shown to have a tight interaction with RNA in a sequence-specific manner. Thus, this region may be the major determinant of sequence specificity of RBD. In contrast, in β2, we mapped relatively few specific amino acids compared with β4. This correlates with the observation with U1A that β2 has relatively few contacts with RNA. Three nucleotides of the 3′-end of the loop that are positioned around here are known not to participate in sequence-specific binding. The loop connecting β2 and β3 shows a somewhat different nature from other regions, because this region is a mixture of general and specific amino acids. In U1A, this loop penetrates and opens up the RNA loop. Therefore, it is possible that amino acids present in this loop participate in both general and specific binding.
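The classification rule used for this mapping can be written down compactly. The sketch below is a simplified illustration, not the procedure used for Fig. 5. It scores identity only (the text also counts conservative substitutions), and the three aligned strings are arbitrary toy sequences, not the real RBD sequences of hnRNP D0, hnRNP A1, or U1A.

```python
def classify_columns(d0, a1, u1a):
    """Label each aligned position as 'general', 'specific', or 'other'.

    general:  identical in all three proteins
    specific: identical in D0 and A1 but different in U1A
    other:    D0 and A1 differ
    """
    if not (len(d0) == len(a1) == len(u1a)):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for x, y, z in zip(d0, a1, u1a):
        if x == y == z:
            labels.append("general")
        elif x == y:
            labels.append("specific")
        else:
            labels.append("other")
    return labels


# Arbitrary toy alignment, for illustration only (hypothetical sequences).
TOY_D0 = "ACDEFGHIKL"
TOY_A1 = "ACDQFGHIRL"
TOY_U1A = "ACNQFSHVRM"

labels = classify_columns(TOY_D0, TOY_A1, TOY_U1A)
for position, label in enumerate(labels, start=1):
    print(position, label)
```

Applied to a real alignment with a substitution matrix in place of strict identity, labels of this kind could then be placed on the secondary structure elements to see whether they fall in the center or at the rim of the platform.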
In this context, the observations that the N-terminal end (this study) and the C-terminal end (38) of RBD are involved in sequence-specific binding can be easily understood, because these regions are presumably positioned at the platform rim (Fig. 5C). We propose a model, as shown in Fig. 5C, for the map of the general and sequence-specific RNA binding sites of RBD. The margin of the RBD platform, including the N and C termini of RBD, β4, β2, and several loops (shown by heavy shading in Fig. 5C), interacts with RNA sequence-specifically. The central part of RBD containing β1 and β3 (shown by light shading), which contains the highly conserved RNP 1 and 2 motifs, interacts with RNA in a nonspecific general manner. This model should be examined by a direct structural analysis of the hnRNP D0 protein complexed with an RNA substrate, which is currently under way.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) D55671 (human mRNA for hnRNP D protein, cDx4), D55672 (human mRNA for hnRNP D protein, cDx7), D55673 (human mRNA for hnRNP D protein, cDx8), and D55674 (human mRNA for hnRNP D protein, cDx9).