(Received for publication, February 6, 1996; and in revised form, March 11, 1996)
A cDNA species encoding a large DNA-binding protein (NP220) of 1978 amino acids was isolated from human cDNA libraries. Human NP220 binds to double-stranded DNA fragments by recognizing clusters of cytidines. Immunofluorescence microscopy with antiserum directed against NP220 revealed a punctate or ``speckled'' pattern and coiled body-like structures in the nucleoplasm of various human cell lines. These structures dispersed into the cytoplasm during mitosis. Western blot analysis showed that NP220 is enriched in the lithium 3,5-diiodosalicylate-insoluble fraction of nuclei. The domain essential for DNA binding is localized in the C-terminal half of NP220. Human NP220 shares three types of domains (MH1, MH2, and MH3) with the acidic nuclear protein matrin 3 (Belgrader, P., Dey, R., and Berezney, R. (1991) J. Biol. Chem. 266, 9893-9899). MH1 is a 48-amino acid sequence near the N terminus of both human NP220 and rat matrin 3. MH2 is a 75-amino acid sequence homologous to the RNA recognition motifs of heterogeneous nuclear RNPs I and L; it is repeated three times in NP220 and twice in matrin 3. MH3 is a 60-amino acid sequence at the C terminus of both NP220 and matrin 3. NP220 has an arginine/serine-rich domain commonly found in pre-mRNA splicing factors. Close to the domain essential for DNA binding, a sequence with the consensus LVTVDEVIEEEDL is repeated nine times. Thus, NP220 is a novel type of nucleoplasmic protein with multiple domains.
The nucleus is a highly organized structure, permeated by a
proteinaceous ``matrix'' and including several subcompartments.
Lamins are concentrated near the periphery of the nucleoplasm. By
coiled-coil self-association (1, 2, 3), lamins A, B, and C can form a
meshwork throughout the nucleoplasm. Lamin B1 interacts with chromatin
(4) at clusters of adenosine- or thymidine-rich sequences called
``matrix-attachment regions'' (MARs) (5). The DNA of chromatin is
organized into constrained loops of 60 kilobases (6, 7), and MARs are
thought to form the basis of such chromosomal loops. ARBP was first
purified from chicken oviduct cells (8) as a MAR-binding protein.
Thymus-specific SATB1 (9) and HeLa cell SAF-A (10) have been cloned and
shown to bind MARs. Matrins D, E, F, G, and 4 are another group of
DNA-binding proteins (11, 12, 13). They may be important for organizing
chromosomes, localizing genes, and regulating DNA transcription and
replication (14, 15, 16, 17).
The so-called ``interchromatin space'' corresponds at the
ultrastructural level to several subcompartments: perichromatin
fibrils, interchromatin granules, and nuclear bodies (18). U1,
U2, U4/U6, and U5 small nuclear ribonucleoproteins (snRNPs), which
sequentially interact with pre-mRNA to form spliceosomes, are major
components of these subcompartments (19, 20). In addition,
spliceosomes contain non-snRNP proteins such as Drosophila su(w) (21),
tra (22), and tra-2 (23) and mammalian SF2/ASF (24, 25), U2AF (26), and
SC35 (27). Many non-snRNP proteins have RNA recognition motifs
(RRMs) or serine/arginine-rich (SR) domains (28, 29).
The heterogeneous nuclear ribonucleoproteins (hnRNPs) condense and package growing RNA transcripts. Because of these functions, hnRNPs are also important components of the interchromatin space. The primary sequences of many hnRNPs, including hnRNPs B, C, and E from grasshopper (30), Drosophila (31), Xenopus (32), and human (33, 34), have been determined, and they are grouped into different categories based on their RRMs (35).
We describe here a novel nucleoplasmic protein (NP220) of human cells with an estimated size of 220 kDa. Human NP220 binds cytidine-rich sequences in double-stranded DNA (dsDNA). Strikingly, NP220 also has RRMs similar to those of hnRNPs I and L and an SR domain. Thus, NP220 is a novel type of nucleoplasmic protein with multiple domains, suggesting functions related to both RNA and DNA.
Figure 1: Domain organization of human NP220 and its cDNA clones. A, RS, a domain rich in arginine and serine; MH1, MH2, and MH3, domains homologous to rat matrin 3; PstI-HindIII, a domain essential for DNA binding; Acidic repeat, a domain where sequences with the consensus LVTVDEVIEEEDL repeat nine times. AA, amino acids. B, regions covered by the series of λgt11 or λZAPII cDNA clones originally isolated. RACE, 5′-terminal sequence cloned by the rapid amplification of cDNA ends method. Numbers in parentheses are nucleotide numbers in Fig. 2. C, expression plasmids obtained by subcloning the original cDNA clones in B. pK1, pK1-H, and pK1-P are in the pKK223-3 plasmid and were used for the study of DNA binding activity (Fig. 5). pK1-BD is in the pGEX-3X plasmid and was used for antibody preparation.
Figure 2: Nucleotide sequence and deduced amino acid sequence of human NP220. The presumptive initiation codon, termination codon, and polyadenylation signal are underlined.
Figure 5: Definition of the domain in human NP220 essential for DNA binding. A series of fragments of NP220 was expressed in E. coli by subcloning the inserted sequence of K1 into pKK223-3, digesting the construct with PstI or HindIII, and ligating to yield pK1, pK-H, and pK-P (Fig. 1). Extracts of E. coli expressing pK1 (lanes a), pK-H (lanes b), and pK-P (lanes c) were separated by SDS electrophoresis, and the transblotted filters were either stained for protein or probed with the mitochondrial promoter region fragment. Arrowheads indicate the migration of the specific recombinant products.
Figure 3: Internally repeated amino acid sequences in human NP220. A, amino acid sequence of the RS domain (a domain rich in arginine and serine). B, comparison of the amino acid sequences of the acidic repeats. Numbers are amino acid numbers in Fig. 2.
The second type of repeat is a 76-amino acid sequence repeated
three times, at residues 677-753, 906-981, and 1010-1084 (Fig. 4C).
Because a homologous sequence is found in rat matrin 3 (52), we refer
to it as an MH2 domain (see below). Together with the sequence in rat
matrin 3, the MH2 repeats constitute RRMs similar to those of hnRNPs I
and L (53). hnRNP I, also known as polypyrimidine tract-binding
protein (54, 55), binds to hnRNA through its RRMs.
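Residue ranges such as 677-753 are 1-based and inclusive, so the length of a range is end - start + 1. The minimal Python sketch below converts such ranges into sequence slices and reports their lengths. The helper names are hypothetical and are not taken from the original work. Applied to the three MH2 coordinates given above, it yields spans of 75 to 77 residues.

```python
def segment_length(start: int, end: int) -> int:
    """Number of residues in the 1-based, inclusive range start..end."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract_segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of seq as a string."""
    if end > len(seq):
        raise ValueError("range extends past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[start - 1:end]


# MH2 repeat coordinates as stated in the text (residue numbers from Fig. 2).
mh2_repeats = [(677, 753), (906, 981), (1010, 1084)]
lengths = [segment_length(s, e) for s, e in mh2_repeats]
print(lengths)  # [77, 76, 75]
```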
Figure 4: Domains of human NP220 homologous with rat matrin 3. A, dot matrix plot of NP220 (abscissa) against matrin 3 (ordinate) with a minimum averaged score of 1.7 over 20 amino acids (63). B-D, comparison of the three paired amino acid sequences (MH1, MH2, and MH3) of NP220 and matrin 3, with identical or similar amino acids shown in reversed color. Numbers are amino acid numbers of human NP220 in Fig. 2 or of rat matrin 3 in (52).
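The dot matrix plot in Fig. 4A marks pairs of positions where a pairwise score averaged over a 20-residue diagonal window reaches at least 1.7. The Python sketch below illustrates that windowed-average idea under stated assumptions. The scoring function is supplied by the caller, and the `identity_score` used here (+3 for identical residues, -1 otherwise) is a hypothetical stand-in, not the scoring matrix of the program cited as (63).

```python
from typing import Callable, List, Tuple

Score = Callable[[str, str], float]


def dot_plot(seq_x: str, seq_y: str, score: Score,
             window: int = 20, threshold: float = 1.7) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) start positions of diagonal windows whose
    average pairwise score is >= threshold.

    Window (i, j) compares seq_x[i + k] with seq_y[j + k] for k in range(window).
    """
    hits = []
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            total = sum(score(seq_x[i + k], seq_y[j + k]) for k in range(window))
            if total / window >= threshold:
                hits.append((i, j))
    return hits


def identity_score(a: str, b: str) -> float:
    # Hypothetical scoring scheme for illustration only.
    return 3.0 if a == b else -1.0


seq = "ACDEFGHIKLMNPQRSTVWY"  # exactly one 20-residue window
print(dot_plot(seq, seq, identity_score))  # [(0, 0)]
```

This naive version rescans every window, costing O(n·m·w) operations; a practical implementation would update a running sum along each diagonal instead.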
The third type of repeat is at the C terminus of human NP220 (Figs. 1
and 2), where a characteristic sequence repeats nine times (Fig. 3B).
Because 6 of the 13 amino acids in the consensus sequence are acidic,
we refer to it as the acidic repeat. Because the acidic repeats contain
many amino acids with side-chain oxygen atoms capable of coordinating
metal ions, as in the EF hand (56), we tested the calcium-binding
ability of this domain by expressing the inserted sequence of clone M5
(Fig. 1) in Escherichia coli, separating the product by SDS
electrophoresis, blotting it to nitrocellulose, and probing with
⁴⁵CaCl₂ (57). Although the product gave a radioactive signal (results
not shown), binding could be demonstrated only at calcium
concentrations above 0.1 mM.
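The count behind the name "acidic repeat" can be checked directly on the consensus sequence. The Python sketch below counts acidic residues (Asp, Glu) and, as an additional illustration, residues whose side chains carry oxygen (Ser, Thr, Tyr, Asp, Glu, Asn, Gln); this second grouping is an assumption made here for illustration, not a classification taken from the original work.

```python
ACIDIC = set("DE")                  # aspartate, glutamate
SIDE_CHAIN_OXYGEN = set("DESTNQY")  # hydroxyl, carboxyl, or amide oxygen in the side chain


def count_residues(peptide: str, residues: set) -> int:
    """Count positions in `peptide` whose one-letter code is in `residues`."""
    return sum(1 for aa in peptide.upper() if aa in residues)


consensus = "LVTVDEVIEEEDL"
print(count_residues(consensus, ACIDIC), "of", len(consensus), "acidic")  # 6 of 13 acidic
print(count_residues(consensus, SIDE_CHAIN_OXYGEN), "with side-chain oxygen")  # 7 with side-chain oxygen
```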
As summarized in Fig. 4, human NP220 shares three types of
domains (MH1, MH2, and MH3 domains) with matrin 3, which had been
cloned by Belgrader et al. (52) from a rat cDNA
library. In the MH1 domain, more than 70% of the amino acids are
identical or similar. In the MH2 domains, 50% of the amino acids
are similar, and both NP220 and matrin 3 retain the core sequences of
RRM found in hnRNPs I and L, suggesting that they form a subfamily
within the large superfamily of RNA-binding proteins(35) . In
MH3, 42% of the amino acids are identical or similar.
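Percentages of identical or similar amino acids, such as those quoted for MH1 to MH3, depend on how "similar" is defined. As a sketch only, the Python function below scores a pair of pre-aligned, equal-length sequences using a hypothetical set of similarity groups (the grouping actually used for Fig. 4 is not stated here) and counts gap columns in the denominator.

```python
# Hypothetical similarity groups for illustration; not the groups used for Fig. 4.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
_GROUP_OF = {aa: group for group in SIMILAR_GROUPS for aa in group}


def percent_identical_or_similar(a: str, b: str) -> float:
    """Percentage of alignment columns where residues are identical or in the
    same similarity group. '-' marks a gap; gap columns never count as matches
    but do count toward the alignment length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("empty alignment")
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        if x == y or (x in _GROUP_OF and _GROUP_OF[x] == _GROUP_OF.get(y)):
            matches += 1
    return 100.0 * matches / len(a)


print(percent_identical_or_similar("ILDK", "VLEA"))  # 75.0
```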
A modified SAAB method employing the pK1 product showed that NP220 preferentially binds to cytidine clusters in either strand of dsDNA. After six rounds of successive selection and amplification of oligonucleotides containing 20 bp of random sequence, fragments having the consensus sequence CCCCC(G/C) were selected (Fig. 6A). Because both mitochondrial promoters (HSP and LSP) contain such cytidine clusters (Fig. 6B), this specificity may explain why clone K1 was isolated as a protein binding the mitochondrial promoter region. Notably, this preferred DNA target of NP220 is distinct from the A- and T-rich sequences in MARs, so the DNA-binding specificity of NP220 differs from that of ARBP (8), SATB1 (9), and SAF-A (10).
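Because NP220 recognizes cytidine clusters on either strand, a search for candidate sites must scan both the given sequence and its reverse complement. The Python sketch below does this for the consensus CCCCC(G/C), written as the regular expression `CCCCC[GC]`; the function names and the toy input are hypothetical, and matching the consensus is only a proxy for actual binding.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
# Zero-width lookahead so that overlapping sites are all reported.
MOTIF = re.compile(r"(?=(CCCCC[GC]))")
SITE_LEN = 6


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_both_strands(seq: str) -> List[Tuple[int, str]]:
    """Return sorted (start, strand) hits; start is the 0-based position of the
    leftmost base of the 6-nt site on the top strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in MOTIF.finditer(seq)]
    for m in MOTIF.finditer(reverse_complement(seq)):
        # A site at rc[p:p + 6] covers seq[n - p - 6 : n - p] on the top strand.
        hits.append((n - m.start() - SITE_LEN, "-"))
    return sorted(hits)


print(find_motif_both_strands("AAGGGGGGAACCCCCGTT"))  # [(2, '-'), (10, '+')]
```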
Figure 6: dsDNA fragments selected by the pK1 product. A, synthesized dsDNA fragments having 20 bp of random sequence between two 18-bp cassette sequences for amplification and cloning (5′-TTGCTCACTCGAGACACC-(N)20-GCACATCTAGACGTTAGC-3′) were selected by binding to the K1 product and then amplified by polymerase chain reaction. After six rounds of successive selection and amplification, the fragments were cloned into pBluescript. Sequences of either strand of the inserted fragments in randomly selected clones are arranged to give maximal matching. Nucleotides in random and cassette regions are written with uppercase and lowercase letters, respectively. The consensus sequence is presented at the bottom. B, the sequences in human mitochondrial promoters (HSP and LSP) similar to the consensus sequence in A are indicated by shading.
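Fig. 6A derives a consensus from selected clones whose sequences, taken from either strand, were arranged for maximal matching. Assuming that arrangement has already been done, the Python sketch below builds a column-wise consensus. The function name and thresholds (at least 60% for a single-base call, at least 80% for a two-base call written like (C/G), otherwise N) are hypothetical choices for illustration, not the criteria used for Fig. 6.

```python
from collections import Counter
from typing import List


def column_consensus(aligned: List[str], single: float = 0.6, pair: float = 0.8) -> str:
    """Column-wise consensus of pre-aligned, equal-length sequences ('-' = gap)."""
    if not aligned:
        raise ValueError("no sequences")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for col in range(width):
        counts = Counter(s[col].upper() for s in aligned if s[col] != "-")
        total = sum(counts.values())
        if total == 0:
            out.append("-")
            continue
        ranked = counts.most_common()
        top_base, top_n = ranked[0]
        if top_n / total >= single:
            out.append(top_base)
        elif len(ranked) > 1 and (top_n + ranked[1][1]) / total >= pair:
            # Two bases jointly dominate the column: report both, alphabetically.
            out.append("(" + "/".join(sorted([top_base, ranked[1][0]])) + ")")
        else:
            out.append("N")
    return "".join(out)


clones = ["CCCCCG", "CCCCCC", "cccccg", "CCCCCC"]
print(column_consensus(clones))  # CCCCC(C/G)
```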
Figure 7: Western blot analysis of subcellular fractions from HeLa cells. A homogenate of HeLa cells was separated into nuclear and cytoplasmic fractions, and nuclei were further fractionated into LIS-soluble (supernatant) and LIS-insoluble (nuclear matrix) fractions. The whole cell fraction was prepared by immediate lysis of living cells with SDS sample buffer. After SDS electrophoresis, proteins on the transblotted filters were detected by protein staining or with the antibody against human NP220 (A) or a monoclonal antibody against lamin B1 (B). The lanes under HMW (high molecular weight) and LMW (low molecular weight) contained size markers of 200, 116.2, 97.4, 66.2, and 43 kDa, and 97.4, 66.2, 43, 30, 20.1, and 14.4 kDa, respectively.
Indirect immunofluorescence microscopy of interphase HeLa cells with the anti-NP220 antibody showed a diffuse nucleoplasmic signal that excluded nucleoli and was concentrated in a punctate or ``speckled'' pattern (Fig. 8A). Similar staining patterns have been observed with antibodies against snRNP proteins, antibodies against the 2,2,7-trimethylguanosine (m₃G) cap structure, and antisense probes targeted to spliceosomal snRNAs (58, 59, 60). The punctate or ``speckled'' snRNP distribution results from the association of snRNPs with perichromatin fibrils, interchromatin granules, and coiled bodies (19, 20). Many hnRNPs and the spliceosome-associated SR proteins also show a ``speckled'' pattern (29, 61). This suggests that NP220 may be important for packaging or processing nascent transcripts. Strikingly, NP220 is also enriched in two or three coiled body-like structures in each cell (Fig. 8A). Comparable staining of NP220 was observed in other human cell lines such as Hep G2, A431, and non-transformed fibroblast SFYT cells (results not shown).
Figure 8: Intranuclear localization of human NP220. A sparse culture of HeLa cells was indirectly immunostained with the antibody directed against human NP220. Most cells in panel A are in interphase. Selected cells presumably in prometaphase and anaphase are shown in panels B and C, respectively. The scale bar is 10 µm.
Fig. 8, B and C, depicts the behavior of NP220 during mitosis. A diffuse cytoplasmic signal is seen that excludes the condensed chromatin. Probably because coiled bodies are reorganized during mitosis, the bright staining observed in interphase cells (Fig. 8A) disappeared after the onset of mitosis. This behavior of NP220 during mitosis was strikingly similar to that of spliceosomal snRNPs (62).
The nucleic acid sequence and deduced amino acid sequence of human NP220 are available from the DDBJ/EMBL/GenBank DNA data bases under accession number D83032.