(Received for publication, November 28, 1995)
Proteins are directed to the nucleus by their nuclear
localization sequences (NLSs) in a multistep process. The first step,
which is to dock the NLS-containing protein to the nuclear pore, is
carried out in part by a recently identified NLS receptor named
Srp1/importin-α. Using the high mobility group (HMG) DNA binding
domain of human lymphoid enhancer factor-1 (hLEF-1) as bait in a yeast
two-hybrid screen, we have identified two different mouse Srp1 proteins
(pendulin/importin-α and mSrp1) that each bind to a 9-amino acid
sequence in hLEF-1 called the B box. We show that the B box of hLEF-1,
a region essential for high affinity DNA binding, is also necessary and
sufficient for nuclear localization, lending support to the model that
NLSs can function both in nuclear transport and DNA binding. Pendulin
and mSrp1 are the mouse homologues of hRch1/hSrp1α/importin-α
and hSrp1/karyopherin α/NPI-1, respectively, and show considerable
sequence divergence from each other. We find a surprising and
significant difference in the expression pattern of pendulin and mSrp1
mRNA, suggesting that these two Srp1 proteins are distinguishable in
function as well as sequence.
Analysis of the nuclear localization of transcription regulatory
factors has largely focused on defining the amino acid sequences within
transcription factor proteins that direct nuclear entry. A universal
consensus sequence that signals nuclear localization (NLS) has not been found; nevertheless, short regions rich in
basic residues are a feature common to most identified
NLSs (1, 2). It is also common for the NLS of
transcription factors to be in regions that overlap or are near the DNA
binding domain, prompting speculation that (a) nuclear
occupancy by a protein is due to both nuclear import and retention by
DNA binding, (b) NLS sequences contribute to DNA binding, or (c) there is coordinate regulation of the two functions due to
their proximity (3). It is known that the NLS directs transport
through nuclear pores via a receptor-mediated, multistep process (4, 5, 6). Recently, a cytosolic NLS
receptor protein in Xenopus named importin-α was found to
carry out the first step along with the Xenopus importin-β
subunit (also known as p97 or karyopherin β) by docking
NLS-containing proteins at the nuclear pore (7). Xenopus importin-α
is homologous to yeast Srp1p, the product of an essential gene
first identified as a suppressor of RNA polymerase I mutations in Saccharomyces cerevisiae (8). Immunofluorescence
studies localized ySrp1p mainly to the nuclear pore, and genetic and
biochemical evidence established a tight association of ySrp1p with
known protein components of the pore (8, 9). More
recently, work with human Srp1α (hSrp1α; also known as hRch1)
and its mouse homolog pendulin/importin-α, and hSrp1/karyopherin α/NPI-1,
a different human Srp1 protein, has confirmed in vivo and in vitro that Srp1 proteins and p97/karyopherin β
function as cytosolic NLS receptors that usher NLS-containing proteins
to the cytoplasmic side of the nuclear pore, where additional
GTP-requiring factors complete the second step of translocation through
the pore (10, 11, 12). Finally, Görlich et al. (13) have shown that nuclear pore binding
occurs mainly via Xenopus importin-β and that
importin-α translocates through the pore with its NLS substrate to
accumulate in the nucleus.
The lymphocyte transcription factor hLEF-1 (for human lymphoid enhancer factor) is a member of the high mobility group (HMG) family of DNA-binding proteins (14, 15). hLEF-1 binds DNA through a single HMG homologous region (HMG box) near the carboxyl terminus, resulting in a 130° bend in the substrate DNA (16). Previous studies to delimit the DNA binding region of hLEF-1 determined that the HMG box is necessary for specific sequence recognition and that a COOH-terminal 9-amino acid region rich in basic residues, called the B box, is necessary for high affinity binding to the DNA target (17). The HMG and B boxes together are sufficient for independent, high affinity recognition of LEF-1 binding sites, as well as DNA bending that is indistinguishable from that of full-length hLEF-1 protein (17, 18). We show here that in addition to its role in DNA binding, the B box also functions as an NLS and is bound by at least two different Srp1 nuclear transport proteins.
Figure 2: Tissue-specific expression of mouse pendulin and ubiquitous expression of mSrp1 mRNA. 15 µg of total RNA from the indicated mouse tissues was analyzed by Northern analysis using the following random prime-labeled cDNA clones as probes. Upper panel, Northern blot probed with mSrp1 clone H (see Fig. 1). Middle panel, the identical blot probed with mouse pendulin clone A. Lower panel, to control for lane-to-lane variation, the identical blot probed with the ubiquitously expressed glyceraldehyde-3-phosphate dehydrogenase gene. Size of each mRNA was determined using RNA molecular weight markers.
Figure 1:
Amino acid
sequence of mouse pendulin and mouse Srp1 and classification of Srp1
proteins into two groups. A, amino acid sequence of mouse
pendulin and mSrp1 is indicated in single-letter code with the
beginning of each yeast two-hybrid clone marked by a lettered
arrow. Multiple alignment analyses of the nine arm repeats for
each protein are shown by introduced spacing and shading of conserved residues, and a consensus arm repeat sequence is
shown below ( indicates hydrophobic residues L, V, I, and M).
Amino acids that differ from previously reported pendulin sequence are boxed. B, mammalian Srp1 NLS receptors are grouped
into two subtypes based on percentage of amino acid sequence identity
between each group. Amino acid identities within each group average 95%
± 2%. Drosophila pendulin/Oho-31 and Xenopus importin-α
average greater sequence identity with the pendulin
group (51% and 61%, respectively) than with the mSrp1 group (45% and
46%, respectively).
To identify proteins that bind directly to HMG DNA binding
domains, we performed a yeast two-hybrid screen of a mouse thymocyte
cDNA library fused with the yeast Gal4 activation domain (amino acids
768-881), using the HMG DNA binding domain of hLEF-1 (amino acids
297-399) fused to the DNA binding domain of Gal4 (amino acids
1-147) as bait. Of 1.4 10
independent clones
analyzed, 10 gave strong positive signals in yeast expressing
Gal4/hLEF-1-(297-399), but not in yeast expressing unrelated
baits such as the protein QM or a reverse transcriptase
from the yeast Ty3 retrotransposon. In addition, we
observed no interaction, as judged by growth on medium lacking
histidine and expression of β-galactosidase activity, with the DNA
binding domains of three other HMG family members: yeast Ste11, human
TCF-1, or human SRY. BLAST analysis identified 8 of the 10 clones to be
Srp1 proteins. Seven clones encoded the same protein and were nearly
identical at the nucleotide level to a GenBank entry called mouse
pendulin (accession no. U12270 (1995)), a Srp1 protein first identified
in Drosophila as pendulin or Oho-31 (24, 25).
Recently, a second BLAST search identified a new GenBank entry called
mouse importin-α that also had near sequence identity with the
seven clones (accession no. D55720 (1995); (12)). Since the
sequence variations between pendulin, importin-α, and the seven
yeast two-hybrid clones described here are minor, each is likely to
encode the same Srp1 protein; to simplify discussion, we will refer to
our clones as pendulin. The eighth clone is a distinct gene identical
at the amino acid level to mouse Srp1 (mSrp1), a variant Srp1 protein
more closely related to yeast Srp1p (26).
Srp1 protein can
be divided into an NH2-terminal hydrophilic region, a
hydrophobic central region composed of 8-10 degenerate repeats
called "arm" repeats, and a short hydrophilic COOH
terminus (27). Fig. 1 shows the amino acid sequence of
the two Srp1 variants identified in our screen and the positions at
which each isolated clone begins relative to the positions of the arm
repeats (Fig. 1, panels A and B). Arm repeats
are so named on the basis of their similarity to an amino acid repeat
structure first identified in the Drosophila segment polarity
protein Armadillo. Although the precise function of multiple,
degenerate arm repeats in Srp1 proteins is not known, it is possible
that they bind an array of different NLS signals. Indeed, hSrp1α,
the human homologue of pendulin, has been shown to directly interact
with two different types of NLS sequences (10). An exact
determination of the number and borders of arm repeats in Srp1 proteins
awaits a detailed structure/function analysis, and for that reason they
have been variously assigned by different investigators. Fig. 1 shows an alignment of arm repeats in pendulin and mSrp1
with borders similar to those described by Peifer et al. (28) and Torok et al. (25), as well as a
ninth, degenerate arm repeat that has been noted in Drosophila pendulin (24). All of the Srp1 cDNA isolates were partial
clones, yet all were able to interact with hLEF-1 in vivo,
suggesting that at least the first three arm repeats of pendulin and
the first two arm repeats of mSrp1 are dispensable for binding to the
hLEF-1 bait (Fig. 1). Multiple alignment analysis of the arm
repeats within pendulin shows a great deal of diversity in repeat length
because more spacing must be introduced for an optimal alignment (Fig. 1, panel A). This is not the case for mSrp1,
which requires minimal changes in spacing for alignment (Fig. 1, panel B).
Amino acid sequence comparisons of pendulin and
mSrp1 with other Srp1 proteins show that mouse pendulin (or
importin-α) is the homologue of human Rch1/hSrp1α
(94% sequence identity) and mSrp1 the homologue of human
NPI-1/karyopherin α/hSrp1 (97%) (24). Surprisingly, pendulin is no more similar to
mSrp1 than it is to yeast Srp1p (45% identity to each), revealing a
natural division of the mammalian Srp1 proteins into two subtypes (Fig. 1C). To date, only one type of Srp1 protein has
been identified in Drosophila (as pendulin or Oho-31) and Xenopus (importin-α), and each is more similar in amino
acid sequence and arm repeat spacing to the pendulin/hRch1/hSrp1α
class than to the mSrp1 class, which is more closely related to yeast
Srp1p (7, 24, 25). The existence of two Srp1
subtypes in mouse and human and the distinct arrangement of their
respective arm repeat structures suggest that each subtype may perform
unique functions in nuclear transport or additional functions not yet
identified. Whether these additional functions are unique to mammalian
systems is only speculative, since it is not yet established whether
yeast, Drosophila, or Xenopus have one or two Srp1
subtypes.
The reported tissue-specific and developmental
stage-specific pattern of expression for Drosophila pendulin (24, 25) prompted us to compare the expression pattern
of pendulin and mSrp1 by Northern analysis (Fig. 2). A
4.3-kilobase RNA hybridizing to a mSrp1 probe appears at a low level in
all tissues examined (lung, liver, spleen, thymus, heart, brain,
cerebellum, uterus, and kidney), an expression pattern expected of a
protein involved in an essential cellular function. In contrast,
expression of the pendulin gene is highly variable; a single
2.4-kilobase RNA species appears in all of the tissues surveyed at a
low level, but is very highly expressed in thymus, spleen, and heart.
This great difference between pendulin and mSrp1 expression patterns is
a distinguishing characteristic that supports a grouping of mammalian
Srp1 proteins into two subtypes. It also suggests that pendulin
performs an additional or unique function in tissues that express high
levels of this protein. In Drosophila, pendulin/Oho-31 was
identified as a tumor suppressor gene, as its absence causes
over-proliferation of the hematopoietic
organ (24, 25). Drosophila pendulin also
localizes to the nucleus precisely at the G2/M stage of the
cell cycle. It will be interesting to compare the nuclear localization
patterns of the pendulin and mSrp1 subtypes relative to cell cycle
progression and determine whether this might be yet another
distinguishing characteristic indicative of separate functions in the
cell.
In order to compare the binding specificities of pendulin and
mSrp1 for hLEF-1 and to map the hLEF-1 domains involved in binding, we
established an in vitro binding assay with GST fused to
pendulin (clone A) or mSrp1 (clone H). Recombinant GST fusion proteins
were bound to glutathione-Sepharose beads and incubated with in
vitro translated, 35S-labeled hLEF-1 deletion mutants
or control proteins (Fig. 3). Neither the DNA binding domain of
Gal4 (data not shown) nor a hLEF-1 deletion mutant missing the entire
HMG DNA binding domain (lane 1) was precipitated with either
GST-pendulin-bound or GST-mSrp1-bound glutathione-Sepharose. In
contrast, protein fragments containing the HMG DNA binding domain were
specifically precipitated with both GST-pendulin and GST-mSrp1 (lanes 2, 4, 5, and 8) but not with
GST alone bound to beads (data not shown). This confirmed that the DNA
binding domain of hLEF-1 was specifically associating with Srp1 protein
and not GST or glutathione-Sepharose. To precisely map the region of
the HMG DNA binding domain recognized by pendulin and mSrp1, we
examined a set of deletions surrounding the HMG and B boxes.
Precipitation of the DNA binding domain with GST-pendulin was most
efficient when all 9 residues of the B box were present (KKKKRKREK; lane 8). Although in the experiment shown there is little
apparent binding of mSrp1 to the same hLEF-1 deletion mutant in lane 8, we observed in two additional repeat experiments that
mSrp1 bound as well to this fragment as to the hLEF-1 deletion proteins
in lanes 4, 5, and 9-11. Continued
deletion of the B box to 6 (KKKKRK; lane 9) and then 4
residues (KKKK; lane 11) decreased binding to pendulin
by 5-fold in both cases, but did little to disrupt binding by mSrp1. A mutant B box
in which two lysines were replaced with arginines appeared to
bind better to pendulin than its wild type counterpart (KKRRRK; lane 10). Finally, deletion of the B box to 2 residues (KK; lane 12) dramatically reduced interaction with both pendulin and
mSrp1 (15-fold). A residual amount of binding (4%) was detectable with
hLEF-1 fragments that did not contain the B box, suggesting that
residues in adjacent regions might also contribute to binding (panel A, compare lanes 8, 13, and 14).
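The relationship between the COOH-terminal end point of each deletion construct and the number of B box residues it retains can be made explicit with a short calculation. The following is a minimal illustrative sketch, not part of the experimental work: it assumes that the B box begins at amino acid 374 (inferred from the fact that hLEF-1-(297-373) lacks the B box) and uses the construct end points listed in the legend to Fig. 3. The function and variable names are hypothetical.

```python
# Illustrative sketch only: maps the COOH-terminal end point of an hLEF-1
# fragment to the B box residues it would retain.
# Assumption: the B box starts at residue 374 (inferred from the text,
# where hLEF-1-(297-373) contains no B box residues).

B_BOX = "KKKKRKREK"   # wild type B box sequence quoted in the text
B_BOX_START = 374     # assumed position of the first B box residue


def retained_b_box(cooh_end: int) -> str:
    """Return the B box residues present in a fragment ending at cooh_end."""
    n_retained = cooh_end - B_BOX_START + 1
    n_retained = max(0, min(len(B_BOX), n_retained))
    return B_BOX[:n_retained]


# COOH-terminal end points of wild type fragments from the Fig. 3 legend
FRAGMENT_ENDS = {
    "lane 8": 384,
    "lane 9": 379,
    "lane 11": 377,
    "lane 12": 375,
    "lane 13": 373,
}

if __name__ == "__main__":
    for lane, end in FRAGMENT_ENDS.items():
        residues = retained_b_box(end)
        print(f"{lane}: ends at {end}, retains {len(residues)} residues: {residues or '(none)'}")
```

Under these assumptions the calculated residues agree with the sequences given in the text for lanes 8, 9, 11, and 12 (9, 6, 4, and 2 residues, respectively).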
Figure 3:
Mouse pendulin and mSrp1 bind to the B box
in the HMG DNA binding domain of hLEF-1. A, GST-pendulin
(clone A) bound to glutathione-Sepharose beads was incubated with in vitro translated,
[35S]methionine-labeled fragments of hLEF-1.
Specifically bound eluates were separated by SDS-PAGE and the gel
exposed to film. 10% of total protein input for each reaction is shown
in the lower panel. B, binding of hLEF-1 proteins to
GST-mSrp1 (clone H) was performed as described above in panel
A, except that 5% of total input is shown in the lower panel. C, schematic representation of the subfragments of hLEF-1 used
in the GST binding assay. Specific binding is summarized with a +
or -. Weak binding is indicated with a +/-. Precise
borders of each of the hLEF-1 fragments are as follows: lane
1, internal deletion from amino acids 237 to 396; lane 2,
DNA binding domain of yeast Gal4 (amino acids 1-147) fused to
hLEF-1-(296-399); lane 3, yeast Gal4 DNA binding domain
fused to the HMG protein hTCF-1A-(152-294); lane 4,
hLEF-1-(237-399); lane 5, hLEF-1-(273-384); lane 6, hLEF-1-(273-364); lane 7,
hLEF-1-(285-373); lane 8, hLEF-1-(297-384),
sequence of the COOH-terminal 11 amino acids is shown; lane 9, hLEF-1-(297-379); lane 10, hLEF-1-(297-379)
(this construct has amino acid changes of lysine 376, 377 to arginine); lane 11, hLEF-1-(297-377); lane 12,
hLEF-1-(297-375); lane 13, hLEF-1-(297-373); lane 14, hLEF-1-(297-364). Note that in two repeat
experiments, binding of hLEF-1-(297-384) (lane 8) to
mSrp1 was observed at levels equal to that observed for
hLEF-1-(237-399) and hLEF-1-(273-384) (lanes 4 and 5).
One particularly interesting feature of hLEF-1 is that its HMG DNA binding domain is almost identical in amino acid sequence (98% amino acid identity) to that of hTCF-1, a second HMG lymphocyte transcription factor that binds and bends identical DNA sequences in promoters and enhancers of T cell-specific genes (29, 30). As mentioned above, neither pendulin nor mSrp1 appeared to interact with the hTCF-1 control bait in yeast. A lack of interaction with hTCF-1 is surprising because the amino acid sequence of its B box differs by only 2 residues (KKKRRSREK), one of which is a conservative change from lysine to arginine. We prepared in vitro translated Gal4/hLEF-1-(297-399) and Gal4/hTCF-1-(152-294) fusion proteins and tested each for binding to pendulin and mSrp1 in vitro. Consistent with our observations in yeast, pendulin bound poorly to the HMG DNA binding domain of hTCF-1 (Fig. 3A, lanes 2 and 3), showing a 4-fold drop in Gal4/hTCF-1 recovered with GST-pendulin beads compared to Gal4/hLEF-1; in other experiments we have observed differences as great as 20-fold. In contrast, mSrp1 binding was significantly weaker and equal for Gal4/hLEF-1 and Gal4/hTCF-1, a result we do not observe in yeast (Fig. 3B, lanes 2 and 3). Whether this weaker, equal binding reflects the abundance of truncated translation products in the binding reaction, truncated mSrp1 protein, or a true difference in the recognition of hLEF-1 by pendulin and mSrp1 will be determined by further study with full-length proteins. That pendulin appears to interact with hLEF-1 and not hTCF-1 is interesting and may reveal a functional difference between these two transcription factors not previously appreciated. We conclude that Srp1 proteins interact with hLEF-1 through specific binding of amino acid residues in the B box. For pendulin, this recognition is highly specific, as a change of even 2 amino acids disrupts binding.
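The two-residue difference between the hLEF-1 and hTCF-1 B boxes can be checked by a position-by-position comparison of the sequences quoted above. The following is a minimal illustrative sketch; the function names are hypothetical, positions are numbered from 1 within the 9-residue motif rather than within the full-length proteins, and the classification of a substitution as conservative (both residues basic: K, R, or H) is a simplifying assumption made only for this example.

```python
# Illustrative sketch only: position-by-position comparison of the two
# B box sequences quoted in the text. Positions are numbered 1-9 within
# the motif (an assumption for illustration, not protein coordinates).

LEF1_B_BOX = "KKKKRKREK"   # hLEF-1 B box
TCF1_B_BOX = "KKKRRSREK"   # hTCF-1 B box

BASIC_RESIDUES = {"K", "R", "H"}   # simplifying definition of "basic"


def substitutions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, residue_a, residue_b) wherever two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


def is_conservative_basic(a: str, b: str) -> bool:
    """True when one basic residue is exchanged for another basic residue."""
    return a in BASIC_RESIDUES and b in BASIC_RESIDUES


if __name__ == "__main__":
    for pos, a, b in substitutions(LEF1_B_BOX, TCF1_B_BOX):
        kind = "conservative" if is_conservative_basic(a, b) else "non-conservative"
        print(f"position {pos}: {a} -> {b} ({kind})")
```

With the sequences as quoted, the comparison reports two differences, a lysine-to-arginine change and a lysine-to-serine change, in line with the description in the text.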
The B box of hLEF-1 (KKKKRKREK) is required for high affinity DNA binding (17), but due to its interaction with Srp1 proteins and its high degree of sequence similarity to the established NLS of SV40 T antigen (PKKKRKV), it may also function as a nuclear localization signal. In order to determine if the B box is necessary to direct nuclear localization of hLEF-1, we compared the subcellular localization of transiently expressed hLEF-1 COOH-terminal deletion mutants in COS-1 cells using affinity-purified hLEF-1 antisera for immunofluorescence (Fig. 4A). Truncation to 1 amino acid beyond the end of the B box (amino acid 383) does not alter the normal subcellular localization of hLEF-1, as it appears to be localized exclusively in the nucleus (Fig. 4A, upper panels). However, truncation to amino acid 373, which precisely deletes the B box, causes a complete shift to the cytoplasm (Fig. 4A, lower panels). Thus, the B box appears to be necessary for hLEF-1 to enter the nucleus.
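The sequence similarity between the B box and the SV40 T antigen NLS can be illustrated by finding the longest stretch of residues shared by the two peptides and counting their basic residues. The following is a minimal sketch under stated assumptions: the brute-force search and the function names are chosen here for illustration only and are not a method used in this study, and "basic" is taken to mean lysine or arginine.

```python
# Illustrative sketch only: compares the hLEF-1 B box with the SV40
# T antigen NLS, using the two peptide sequences quoted in the text.

LEF1_B_BOX = "KKKKRKREK"   # hLEF-1 B box
SV40_NLS = "PKKKRKV"       # SV40 T antigen NLS


def longest_common_substring(seq_a: str, seq_b: str) -> str:
    """Brute-force search for the longest contiguous run shared by two short peptides."""
    best = ""
    for start in range(len(seq_a)):
        for end in range(start + 1, len(seq_a) + 1):
            candidate = seq_a[start:end]
            if len(candidate) > len(best) and candidate in seq_b:
                best = candidate
    return best


def count_basic(seq: str) -> int:
    """Count lysine (K) and arginine (R) residues."""
    return sum(1 for residue in seq if residue in "KR")


if __name__ == "__main__":
    shared = longest_common_substring(LEF1_B_BOX, SV40_NLS)
    print(f"longest shared stretch: {shared} ({len(shared)} residues)")
    for name, seq in (("hLEF-1 B box", LEF1_B_BOX), ("SV40 NLS", SV40_NLS)):
        print(f"{name}: {count_basic(seq)} of {len(seq)} residues are K or R")
```

A brute-force search is adequate here because both peptides are under ten residues long; longer sequences would call for a proper alignment tool.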
Figure 4: The B box in the HMG DNA binding domain of hLEF-1 is both necessary and sufficient for nuclear localization. A, two COOH-terminal deletions of hLEF-1 (hLEF-1-(1-383) and hLEF-1-(1-373)) were transiently transfected into COS-1 cells. After 48 h, cells were processed for immunofluorescence using affinity-purified hLEF-1 antisera and fluorescein-conjugated anti-rabbit antibody. Nuclear staining is indicated in the upper panel (hLEF-1-(1-383)), and cytoplasmic staining in the lower panels (hLEF-1-(1-373)), showing that the B box (amino acids 374-383) is necessary for nuclear transport. B, hLEF-1 coding sequences were fused to the COOH-terminal end of green fluorescent protein (GFP) and transiently transfected into COS-1 cells. Intrinsic green fluorescence is not visible when GFP is localized in the cytoplasm, but can be detected using indirect immunofluorescence. The upper panels show cytoplasmic staining for GFP alone and hLEF-1-(297-373), and the lower panels show nuclear GFP fluorescence for hLEF-1-(297-384) and hLEF-1-(373-384), demonstrating that the B box is a bona fide NLS.
To determine if the B box is sufficient to direct nuclear localization, we prepared expression plasmids that fuse portions of the hLEF-1 HMG DNA binding domain to the COOH terminus of a variant green fluorescent protein (S65T-GFP) and used fluorescence microscopy to analyze the degree of nuclear localization of these fusion proteins in transiently transfected COS-1 cells (Fig. 4B) (21, 22). Western analysis confirmed high levels of intact GFP to be expressed in these cells, yet curiously, intrinsic GFP fluorescence was not detectable (data not shown). That intact GFP was present and localized exclusively in the cytoplasm was confirmed by immunofluorescence using GFP antibodies (Fig. 4B). In contrast, when green fluorescent protein was fused to the HMG DNA binding domain of hLEF-1 (GFP/hLEF-1-(297-384)), intrinsic green fluorescence was readily visible in the nucleus. One possibility for this difference in signal detection is that sequestration of GFP protein to a subcellular organelle concentrates the protein to a level that exceeds a threshold of detection. Another possibility is that GFP fluorescence is quenched in the cytoplasm but protected from such effects in the nucleus. We have observed similar patterns of detection in HeLa cells and human T cell lines. Removal of all 9 amino acids of the B box from the HMG DNA binding domain eliminates nuclear localization (GFP/hLEF-1-(297-373)). Again, although intrinsic green fluorescence is not detectable because of greatly reduced nuclear localization, GFP/hLEF-1-(297-373) was detected in the cytoplasm by immunofluorescence using either affinity-purified hLEF-1 or GFP antisera. Finally, fusion of sequences encoding only the B box (GFP/hLEF-1-(374-384)) caused exclusive localization of GFP to the nucleus. These data show that the B box, which binds pendulin and mSrp1, is necessary and sufficient for nuclear localization and is most likely the major NLS of hLEF-1.
In summary, we have shown that the B box, a 9-amino acid motif within the hLEF-1 HMG DNA binding domain, functions as an NLS to direct nuclear localization and binds to two different Srp1 NLS receptor proteins (one of which is highly expressed in thymus, heart, and spleen). It is significant that this same motif also plays an important role in DNA binding by contacting the sugar-phosphate backbone of the DNA recognition site (18). We note that Srp1 transport proteins move through nuclear pores with their NLS substrates (13), and suggest that direct binding of the B box both to NLS receptor proteins and to DNA may indicate coordinate regulation of these two functions.
The nucleotide sequences reported in this paper have been submitted to the GenBank(TM)/EMBL Data Bank with accession numbers U34228 and U34229.