From the Structural Biology Laboratory, St. Vincent's Institute of Medical
Research, 41 Victoria Parade, Fitzroy, Victoria 3065, Australia, the
Departamento de Física e Biofísica, Instituto de Biociências, Universidade
Estadual Paulista - UNESP, C. P. 510, 18618-000, Botucatu, SP, Brazil, the
||Department of Biochemistry and Molecular Biology, Institute for Molecular
Bioscience, Australian Research Council Special Research Centre for Functional
and Applied Genomics, and Cooperative Research Centre for Chronic Inflammatory
Diseases, University of Queensland, Brisbane, Queensland 4072, Australia, the
**Nuclear Signalling Laboratory, Division for Biochemistry and Molecular
Biology, John Curtin School of Medical Research, Australian National
University, Canberra, ACT 2601, Australia, and the Department for Biochemistry
and Molecular Biology, Monash University, Clayton, Victoria 3168, Australia
Received for publication, March 31, 2003
INTRODUCTION
Despite the variability, the conventional basic NLSs are recognized by the
same receptor protein, termed importin or karyopherin, a heterodimer of α and
β subunits (for recent reviews, see Refs. 2–4). Importin-α (Impα) contains the
NLS-binding site, and importin-β (Impβ) is responsible for the translocation
of the importin-substrate complex through the nuclear pore complex. Once
inside the nucleus, Ran-GTP binds to Impβ and causes the dissociation of the
import complex. Impα becomes autoinhibited, and both importin subunits return
to the cytoplasm separately without the import cargo. The directionality of
nuclear import is conferred by an asymmetric distribution of the GTP- and
GDP-bound forms of Ran between the cytoplasm and the nucleus. This
distribution is in turn controlled by various Ran-binding regulatory
proteins.
Impα consists of two structural and functional domains, a short basic
N-terminal Impβ-binding domain (5–7) and a large NLS-binding domain built of
armadillo (Arm) repeats (8). The structural basis of monopartite and bipartite
NLS recognition by Impα has been studied crystallographically in yeast and
mouse Impα proteins (9–11). The two basic clusters of the bipartite NLSs bind
to two separate binding sites on Impα, involving Arm repeats 1–4 and 4–8,
respectively. Monopartite NLSs can bind in both sites but primarily use the
binding site corresponding to the C-terminal basic cluster of the bipartite
NLSs, referred to as the major site (9, 11).
The structure of full-length Impα indicated that the major NLS-binding site is
occupied by residues 44–54 from the N-terminal region of the protein (the
Impβ-binding domain), which resembles an NLS (12); Impα is therefore
autoinhibited in the absence of Impβ. This observation is supported by
measurements of higher NLS binding affinity by Impα/β as compared with Impα
alone (13–22). The significance of the autoinhibitory mechanism has been
confirmed by in vivo studies (23).
In this study, we present the crystal structures of peptides corresponding
to the bipartite NLSs from human retinoblastoma protein (RB) and Xenopus
laevis chromatin assembly factor N1N2 bound to mouse Impα. These NLSs were
chosen to represent diverse sequences and different lengths of the linkers
between the clusters, so that some general conclusions can be drawn on NLS
binding. The basic clusters of both peptides bind in the expected binding
pockets, but the linker regions also make specific contacts with the receptor.
Comparisons with other available Impα structures allow us to explain the
specificities of monopartite and bipartite NLS binding and help us improve the
definition of the consensus sequence of a conventional basic/bipartite NLS.
The results will have general implications for recognizing the NLSs in new
gene products identified in genome sequences and therefore for functional
annotation of new proteins.
EXPERIMENTAL PROCEDURES
Protein Expression, Purification, and Crystallization. N-terminally truncated
mouse importin-α (α2 isoform (25)) lacking 69 N-terminal residues (m-Impα) was
expressed recombinantly in Escherichia coli as a fusion protein containing a
hexa-histidine tag (11). For crystallization, m-Impα was concentrated to 18.8
mg/ml (in 20 mM Tris-HCl (pH 8.0), 100 mM NaCl, and 10 mM dithiothreitol)
using a Centricon-30 (Millipore) and stored at −20 °C. Crystallization
conditions were screened by systematically altering various parameters using
the crystallization conditions successful for other peptide complexes (11) as
a starting point. The crystals of both complexes (rod-shaped, 0.5 × 0.2 × 0.1
mm for the RB peptide complex and 0.4 × 0.1 × 0.07 mm for the N1N2 peptide
complex) were obtained by co-crystallization, combining 1 µl of protein
solution, 0.7 µl of peptide solution (1.7 mg/ml with peptide/protein ratio
3.5), and 1 µl of reservoir solution, with the drop suspended over 0.5 ml of
reservoir solution containing 0.6 M sodium citrate (pH 6.0) and 10 mM
dithiothreitol.
Diffraction Data Collection. The crystals exhibit orthorhombic symmetry (space
group P2₁2₁2₁; Table I). Diffraction data were collected from single crystals
transiently soaked in a solution analogous to the reservoir solution but
supplemented with 23% glycerol and flash-cooled at 100 K in a nitrogen stream
(Oxford Cryosystems), using a MAR-Research image plate detector (plate
diameter, 345 mm) and CuKα radiation from a Rigaku RU-200 rotating anode
generator. Data were autoindexed and processed with the HKL suite (26)
(Table I).
Structure Determination and Refinement. The crystals of the peptide complexes
were highly isomorphous with the crystals of full-length Impα (12); therefore,
the structure of mouse Impα (Protein Data Bank number 1IAL) with N-terminal
residues omitted was used as a starting model for crystallographic
refinement. Electron density maps were inspected for the presence of the
peptide after rigid body refinement using the program CNS (27) (RB
peptide-m-Impα complex, Rcryst = 30.5%, Rfree = 32.4%, 6–4-Å resolution; N1N2
peptide-m-Impα complex, Rcryst = 30.8%, Rfree = 35.1%, 6–4-Å resolution; Table
I provides the explanation of R-factors). Electron density maps calculated
with coefficients 3Fobs − 2Fcalc and simulated annealing omit maps (Fig. 1)
calculated with analogous coefficients were generally used. The model was
improved, as judged by the free R-factor (28), through rounds of
crystallographic refinement (positional and restrained isotropic individual
B-factor refinement with an overall anisotropic temperature factor and bulk
solvent correction) and manual rebuilding (program O (29)). Solvent molecules
were added with the program CNS (27). Asn239 is an outlier in the Ramachandran
plot, as also observed in all other structures of mouse Impα (11, 12). Pro242
is a cis-proline. The final models comprise 427 Impα residues (residues
71–497), 20 peptide residues, and 173 water molecules for the RB peptide-Impα
complex, and 426 Impα residues (residues 72–496), 21 peptide residues, and 123
water molecules for the N1N2 peptide-Impα complex (Table I). The coordinates
have been deposited in the Protein Data Bank (Protein Data Bank numbers 1PJM
and 1PJN for the RB and N1N2 peptide complexes, respectively).
Structure Analysis. The quality of the models was assessed with the program PROCHECK (30). The contacts were analyzed with the program CONTACT, and the buried surface areas were calculated using the program CNS (27).
Bioinformatic Analysis. We used the consensus sequence KRRK to search for
NLS-containing proteins using the Quick Matrix option of Scansite (31); the
sequence KRXK was entered for the primary preference positions 0 to +3, and R
was entered into the secondary preference position +2. The bipartite consensus
was too long to use with the current version of Scansite. Testing using
proteins with known NLSs showed that there is a high likelihood of detecting a
functional NLS at Scansite scores ≤0.0408 (corresponding to 0.448% of all
yeast proteins) and a reasonable likelihood at Scansite scores ≤0.0596
(corresponding to 1.169% of all yeast proteins). To estimate the efficiency of
detecting an unknown NLS, we performed a Scansite search with a test set of 50
randomly selected yeast nuclear proteins (Munich Information Center for
Protein Sequences subcellular catalogue (32)); 9 proteins showed a sequence
match to the above motif with Scansite scores ≤0.0408, and 23 proteins (46%)
showed a match with Scansite scores ≤0.0569. For comparison, we searched for
NLSs in the same test set of proteins using PredictNLS (33); this method
detected an NLS in nine proteins, some of which did not belong to the
conventional basic/bipartite group.
RESULTS AND DISCUSSION
Residues 859–878 of the peptide RB (residue 879 had no interpretable electron density) and residues 535–555 of the peptide N1N2 (residues 533–534 and 556 had no interpretable electron density) could unambiguously be identified in the electron density maps (Fig. 1). The side chains of Glu543 and Lys549 of N1N2 were poorly ordered; therefore, these residues were modeled as alanines.
Structure of Importin-α in the Complexes. Impα forms a single elongated domain
built from 10 Arm structural repeats, each containing three α-helices (H1, H2,
and H3) connected by loops (Fig. 2). The structure of Impα in the complexes is
comparable with the crystal structure of the full-length Impα (r.m.s.
deviations of Cα atoms of Impα residues 72–496 are 0.22 Å between the RB and
N1N2 peptide-m-Impα complexes; 0.35 and 0.31 Å between full-length Impα and
the RB and N1N2 peptide-m-Impα complexes, respectively; and 0.34 and 0.30 Å
between nucleoplasmin peptide-m-Impα and the RB and N1N2 peptide-m-Impα
complexes, respectively).
Binding of the NLS Peptides to Importin-α. The
peptides bind in an extended conformation with the chain running antiparallel
to the direction of the Arm repeat superhelix
(Fig. 2). The base of the
groove that contains the binding sites is formed mainly by the H3 helices of
the Arm repeats, which carry some residues conserved among the repeats,
including the tryptophans and asparagines at the third and fourth turns in H3
helices of the Arm repeats, respectively
(10,
11).
The two basic clusters of the RB and N1N2 peptides bind to two separate,
well defined binding sites on the surface of the m-Impα molecule, referred to
as the minor and major sites (Fig. 2). The minor site specifically binds the
N-terminal basic cluster KR, and the larger, C-terminal basic cluster binds to
the major site. Electron density is present for 20 peptide residues of the RB
peptide (average B-factor, 63.3 Å²) and for 21 peptide residues of the N1N2
peptide (average B-factor, 72.1 Å²) (Fig. 1). In both complexes, the linker
sequences connecting the major and minor sites (residues 863–873 and 539–550
for RB and N1N2, respectively) have B-factors above the average for the entire
peptides (71.1 and 82.8 Å² for RB and N1N2, respectively). By contrast, the
residues bound to the major sites of both complexes have lower B-factors (40.5
and 47.6 Å² for positions P2–P5 of RB and N1N2, respectively), reflecting the
strong interaction of these residues with the protein.
Surface areas of 2573 and 2715 Å² are buried between m-Impα and the RB and
N1N2 peptides, respectively. All residues of the RB peptide, except residues
865 and 869, and of the N1N2 peptide, except residues 542 and 546, make
contacts with m-Impα at distances below 4 Å (Fig. 3).
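The contact criterion used above (interatomic distance below 4 Å) is simple to reproduce. The following Python sketch is not the CONTACT program: the function names and the tuple layout `(residue number, atom name, (x, y, z))` are assumptions made for illustration, and real coordinates would have to be read from the deposited entries.

```python
from math import dist


def contacts_within(peptide_atoms, receptor_atoms, cutoff=4.0):
    """List atom pairs closer than `cutoff` (in Å).

    Each atom is a tuple (residue_number, atom_name, (x, y, z)).
    Returns tuples (pep_residue, pep_atom, rec_residue, rec_atom, distance).
    """
    hits = []
    for p_res, p_atom, p_xyz in peptide_atoms:
        for r_res, r_atom, r_xyz in receptor_atoms:
            d = dist(p_xyz, r_xyz)
            if cutoff > d:
                hits.append((p_res, p_atom, r_res, r_atom, d))
    return hits


def residues_without_contacts(peptide_atoms, hits):
    """Peptide residues for which no atom pair passed the cutoff."""
    all_residues = {res for res, _, _ in peptide_atoms}
    contacted = {hit[0] for hit in hits}
    return sorted(all_residues - contacted)
```

Applied to a full complex, `residues_without_contacts` would be expected to return the kind of short list quoted in the text (for example, two residues per peptide). The all-pairs loop is adequate for a roughly 20-residue peptide; larger systems would normally use a spatial grid or neighbor search.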
The minor and major site portions of the RB and N1N2 peptides have very
similar structures (Fig. 4); after superposition of the equivalent Cα atoms,
the r.m.s. deviation of the residues in positions P1–P6 is 0.19 Å, and the
r.m.s. deviation of the residues in positions P1'–P3' is 0.03 Å. By contrast,
they adopt very different conformations in the linker regions between
positions P3' and P1; the path of the peptide chain is more linear in the case
of RB than N1N2. The bipartite NLS linker sequence of both peptides makes
favorable interactions with the H3 helices of armadillo repeats 4–7 of Impα.
The most important contacts are limited to residues Asn868 and Lys871 of RB
and Lys547 of N1N2.
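The r.m.s. deviations quoted here are computed after optimal superposition of equivalent Cα atoms. The Python sketch below shows one common way to do this, the Kabsch algorithm, assuming NumPy is available. The text does not state which superposition procedure was actually used, so both the algorithm choice and the function name `superposed_rmsd` are assumptions.

```python
import numpy as np


def superposed_rmsd(coords_a, coords_b):
    """R.m.s. deviation (same units as input) after optimal rigid superposition.

    coords_a, coords_b: equal-length sequences of (x, y, z) for equivalent atoms.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")

    # Remove the translation by centring both coordinate sets.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)

    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a_c.T @ b_c)
    det = np.linalg.det(vt.T @ u.T)
    sign = 1.0 if det >= 0 else -1.0  # guard against an improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    a_rot = a_c @ rotation.T
    return float(np.sqrt(((a_rot - b_c) ** 2).sum() / len(a)))
```

Because the fit is restricted to the atoms passed in, the result depends on which positions are compared. This is why the text reports separate values for the binding-site positions and for the linker regions.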
Among other interactions, the conserved residues Arg315 (Arm 6) and Tyr277
(Arm 5) of Impα that interrupt the regularity of the Trp-Asn array (11) make
extensive main chain and side chain contacts with the peptides. These residues
also interact with nucleoplasmin NLS in mammalian and yeast Impα (10, 11).
However, in that case, the interaction involves only the main chain of the
peptide. Other important contacts are made with Arg238 (main chain of RB and
N1N2), Ser276 (side chain of N1N2), and Ser234 (side chain of RB).
Comparison with Other NLS Peptide-Importin-α Complex Structures.
Significantly, the work presented here allows us for the first time to perform
a detailed comparison of the structural determinants of binding of a number of
NLSs to Impα, to align the NLSs with the binding pockets on Impα with some
confidence, and to draw some general conclusions on the specificity of NLS
binding (Table II). Despite diverse sequences, the binding of both basic
clusters (positions P1–P5 and P1'–P2') is similar in all the available
structures, and the major differences occur in the linker regions connecting
the basic clusters (Fig. 4). The exceptions are the N- and C-terminal portions
of nucleoplasmin NLS bound to m-Impα, where the side chains of Lys155
(position P1'; N terminus of the peptide) and Lys170 (position P5; C terminus
of the peptide) follow the direction of the main chain of the other peptides.
Because all the other peptides contain at least one additional residue at the
N and C termini, the conformation of the nucleoplasmin NLS peptide may be an
artifact of the short length of the peptide. The case of nucleoplasmin
highlights the importance of residues preceding position P1' and following
position P6.
The structures of nucleoplasmin NLS bound to y-Impα and m-Impα are
significantly different (r.m.s. deviation of Cα atoms of residues 155–170 is
2.44 Å). In addition to the differences caused by the different lengths of the
peptides discussed above, some differences may be explained by structural
differences between the two Impα proteins; the yeast structure is slightly
more "open" than the mouse structure (12). Most of the differences are found
in the region comprising residues 159–165 (there is high structural similarity
when only the major (r.m.s. deviation of Cα atoms for positions P2–P5 is 0.28
Å) and minor (r.m.s. deviation of Cα atoms for positions P1'–P3' is 0.24 Å)
sites are superimposed). With the basic clusters binding most tightly, the
different curvatures of the two Impα proteins appear to be compensated in the
linker region. Although some linker region residues have different
conformations, the main contacts of this region with Impα are comparable in
both nucleoplasmin structures. The most important interactions occur for the
main chain of conserved residues Arg315, Arg238, and Tyr277 of m-Impα (Arg321,
Arg244, and Tyr283 of y-Impα).
The portions of RB and N1N2 peptides bound in the major and minor sites
superimpose closely with the nucleoplasmin NLS (the r.m.s. deviations of Cα
atoms of positions P1–P5 and positions P1'–P3' are 0.34 and 0.42 Å between RB
and nucleoplasmin and 0.33 and 0.42 Å between N1N2 and nucleoplasmin,
respectively). The structure of the nucleoplasmin linker region is more
similar to that of RB than that of N1N2 (r.m.s. deviation of Cα atoms of the
linker region is 1.70 Å between RB (864–873) and nucleoplasmin (157–165) and
2.20 Å between N1N2 (541–550) and nucleoplasmin (157–165)), mainly in the
region closest to the major site (residues 162–165 for nucleoplasmin). The
region of the linker closest to the major site is structurally conserved best
among the three bipartite NLS peptides.
Binding of Bipartite NLS Linker Regions. The nucleoplasmin NLS-Impα complexes
(10, 11) showed that the main chain of the linker region binds to the
conserved Impα residues Arg315 (Arm 6) and Tyr277 (Arm 5) (Arg321 and Tyr283
for y-Impα; these residues interrupt the regularity of the Trp-Asn array
(11)). The conserved residue Arg238 (Arm 4) (Arg244 for y-Impα) also binds to
the main chain of nucleoplasmin NLS. By contrast, the structures of the RB and
N1N2-m-Impα complexes show the binding of Arg315 and Tyr277 to the side chains
of the peptides (Arg238 binds to the main chain of the peptides). These
observations suggest that residues Arg315, Tyr277, and Arg238 play crucial
roles in binding bipartite NLSs; however, these interactions are specific to
individual NLSs.
Other important contacts of the peptide linker with m-Impα involve the
residues Asn868 and Lys871 for the RB peptide and Lys547 for the N1N2 peptide;
all of them are situated close to the major site. This is the region of the
linker most structurally conserved among the three bipartite NLS peptides. The
electron density maps of RB and N1N2 have a superior quality in this region as
compared with the rest of the linker. These data suggest that the C-terminal
portion of the linker (closest to the major site) plays an important role in
binding to Impα. These data are supported by NLS binding studies (15, 17, 19,
21) and structural analyses of extended simian virus 40 (SV40) large
tumor-antigen (T-Ag) peptide complexes (Footnote 2). The electron density maps
suggest that the RB peptide is better ordered overall, consistent with a
larger number of contacts.
Not every linker residue is able to make contacts with the protein. Because
of the differences in the lengths of the linkers between RB (11 residues, if
we define the linker sequence as the sequence between sites P2' and P2) and
N1N2 (12 residues), the RB peptide binds to m-Impα in a more extended
conformation than the N1N2 peptide. The 11-residue length appears more
favorable for binding; 12 residues require some short turns and force the
chain farther from Impα, precluding some favorable interactions. Importantly,
this is consistent with the observation that incorporation of QPWL in the
linker region of nucleoplasmin NLS reduced the efficiency of nuclear import
(34).
The affinities for mouse Impα of N1N2 and RB NLSs fused to β-galactosidase
have been determined using enzyme-linked immunosorbent assays (15, 19). N1N2
binds with a higher affinity than RB to both the Impα/β complex (Kd measured
as 5.4 nM for N1N2 and 45 nM for RB) and Impα alone (Kd measured as 22 nM for
N1N2 and 180 nM for RB). Therefore, although the linker region appears to
interact more favorably in the case of RB, it is most likely the presence of
Lys in position P5 that is responsible for the higher affinity of the
recognition of N1N2 NLS by Impα (see below).
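To relate these dissociation constants to the free energy contributions discussed later, the affinity difference can be converted with ΔG° = RT ln Kd. The Python sketch below makes two assumptions that are not in the text: a temperature of 298.15 K and a 1 M standard state. The function names are illustrative.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy, RT ln(Kd), in kcal/mol (Kd in mol/l)."""
    return R_KCAL * temperature_k * log(kd_molar)


def affinity_gap(kd_tight, kd_weak, temperature_k=298.15):
    """Free energy difference between a weaker and a tighter binder, kcal/mol."""
    return binding_free_energy(kd_weak, temperature_k) - binding_free_energy(
        kd_tight, temperature_k
    )


# Kd values quoted in the text (nM converted to mol/l).
GAP_COMPLEX = affinity_gap(5.4e-9, 45e-9)   # N1N2 vs RB, Impα/β complex
GAP_ALONE = affinity_gap(22e-9, 180e-9)     # N1N2 vs RB, Impα alone
```

Under these assumptions both gaps come out at about 1.2 to 1.3 kcal/mol. That is, the roughly 8-fold tighter binding of N1N2 corresponds to a modest free energy difference, of the size that a single side chain could plausibly contribute.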
The Role of Minor Site Binding. Mutagenesis studies with nucleoplasmin (34),
RB (15), and N1N2 (19) NLSs revealed that substitutions of the P1' and P2'
residues with residues other than Arg or Lys abolished nuclear localization.
The binding of Lys-Arg at positions P1' and P2' is observed in all available
bipartite NLS peptide-Impα complex structures. Similarly, the majority of the
bipartite NLSs characterized contain a Lys-Arg sequence in the N-terminal
cluster (35). Comparison of the structures here reveals a high structural
similarity of the binding of the Arg side chain in position P2' in all cases.
The Arg side chain is situated in the groove created by the tryptophans Trp399
and Trp357 (Trp405 and Trp363 for y-Impα) located in H3 helices of Arm repeats
7 and 8 and also contacts Glu396 and Ser360 (Glu402 and Ser366 in y-Impα);
these residues are conserved among known Impα sequences. The interaction with
Glu396 appears particularly important because the distance from the Arg side
chain is nearly identical in all the structures. The binding of T-Ag at the
minor site (11) represents a model for binding of Lys (instead of Arg) at P2';
the peptide PKKKRKV (residues 126–132) places Lys128 and Lys129 at positions
P1' and P2', respectively. This likely occurs because it is more favorable to
have some amino acid binding at positions P4' and P5' than to have an Arg at
P2'. However, some evidence of staggering in different registers was observed
in the crystal structure (11). The Lys at position P2' of T-Ag contacts Glu396
and Ser360 at approximately the same distance but does not contact Trp399,
losing one favorable contact. The monopartite NLS from c-Myc binds with
Lys-Arg in the P1'–P2' positions (10).
A Lys side chain at position P1' makes less favorable
contacts than Arg at position P2', and this position appears
less important than the P2' position for bipartite NLS
binding. The P1' Lys of RB and N1N2 peptides makes
interactions with the side chains of the conserved Thr328 and
Asn361 and the main chain of Val321. Also close by is
Asp325. The P1' Lys of nucleoplasmin bound to
y-Impα also contacts the equivalent of Thr328 and
Val321 but not Asn361. The pocket prefers a Lys residue
because an Arg side chain is too long to make similar favorable contacts in
the P1' pocket.
Finally, we suggest that the positions preceding the P1'
and following P2' contribute significantly to the minor site
binding. This is consistent with the side chain of the N-terminal
Lys155 (position P1') of the nucleoplasmin peptide
preferring to follow the main chain of the other peptides instead of binding
in the regular P1' pocket. Also, the T-Ag peptide
PKKKRKV (residues 126–132) binds to Impα with two Lys residues
in P1' and P2' instead of the more favorable
Arg at P2', possibly so that it can place some amino acid in
positions P4' and P5'.
The NLS Consensus. The definition of an NLS consensus based on an analysis of
sequences alone is difficult because of the diversity of sequences that can
function as nuclear targeting signals. The structural data, the accumulated
knowledge on functional NLSs, and mutagenesis studies now allow us to better
define the conventional basic/bipartite NLS consensus that involves Impα
binding. The ability to recognize NLSs is of significance, as it could help
with the functional annotation of new proteins predicted in genomic sequences.
We used an approach to define the NLS consensus similar to that used
previously to define optimal substrates for protein kinases
(36). The structures of
Impα-NLS peptide complexes were analyzed by molecular modeling in conjunction
with other available information. This allows us to draw some conclusions
about individual positions in the NLS.
One general observation is that the binding pockets for individual side
chains are much better defined in the major site than in the minor site. This
observation is consistent with the contribution of the N-terminal basic
cluster (minor site) in a bipartite NLS of 4 kcal/mol, which is
comparable with the loss of a single Lys side chain in the major basic cluster
(21). A large portion of the
free energy of binding is contributed by the main chain of the peptide
(estimated at
6.9 kcal/mol), but a functional NLS requires an additional
contribution of the side chains of about 4.2 kcal/mol; this contribution by
the side chains accounts for the specificity of nuclear import. The best
defined pockets include P2, P3, and P5 of the
major site. P2 is defined mainly by residues Thr155 and
Asp192 in adjacent Arm repeats of Impα and is well suited for
binding a Lys side chain. Arg can be accommodated only to a lesser degree,
consistent with the loss of 2.7 kcal/mol for the Lys to Arg mutation in the
context of T-Ag (21). Side
chains such as Thr, Gln, and Pro could be accommodated but would contribute
less to binding; there is a substantial electrostatic contribution to binding
in this pocket. P3, comprising Glu266 and
Asp270 as the major binding determinant, prefers Arg over Lys based
on Ala mutagenesis; this is likely because an Arg can reach closer to
Glu266 and Asp270. The energetic contribution of the
side chains in the P3 site is about two-thirds of the contribution
in P2, with a smaller contribution of the electrostatic terms. The
modeling suggests that P5, with the primary binding residue
Gln81, is best suited for binding a Lys side chain. The
electrostatic contribution is very minor, and positive charge is not strictly
required; due to size, Gln and Glu would be favored over Arg. The energetic
contribution of this site is similar to P3
(21).
The other pockets are less specific. The P4 pocket is relatively
large and should favor Arg because this is the only side chain that can reach
to the main chain of Impα Arg106 and Glu107 and
make favorable interactions. However, hydrophobic side chains such as Pro,
Val, Ile, or Leu could also be accommodated relatively well. The
discrimination between Arg and Val is only minor based on the Ala mutagenesis
results of T-Ag and c-Myc NLSs, and the contribution of this pocket is about
four times smaller than that of P2. P1, with
Asp270 responsible for specificity, prefers Lys over Arg, whereas
side chains such as Gln, Pro, or Gly could easily be tolerated, and the side
chain contribution in this pocket is only half of that of pocket
P4. The pocket preceding P1 has Trp231 as the
major specificity determinant and also prefers Lys over residues such as Arg,
Pro, Ala, Glu, or Gly, but is also not very specific.
The minor site is much less defined than the major site. Only the sites P1' and P2' appear capable of significant discrimination between different peptide side chains, with Lys and Arg favored in those two positions, respectively, as discussed above. The N-terminal basic cluster may play a role in relieving autoinhibition (the minor site is not autoinhibited (12)).
We conclude that the optimal basic/bipartite consensus sequence for binding
Impα is KRX10–12KRRK (where X10–12 denotes a linker of 10 to 12 residues),
with Lys at position P2 (the first residue of the C-terminal KRRK cluster) the
most important specificity determinant that also forces the rest of the
sequence to bind in one register. The basic residues at positions P3 and P5
(the second and fourth residues of that cluster) are also significant (Table
II). To facilitate accurate NLS identification in novel sequences, the motif
could be represented in a matrix format with probabilities for each of the 20
amino acids defined at each position (31); this should be possible when
experimental data are available through a peptide library (37) or equivalent
experiment.
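The matrix representation mentioned above can be sketched as a position-specific probability table scored as log-odds against a background. In the Python sketch below every number in `PLACEHOLDER_MATRIX` is invented purely to make the code runnable and only loosely mirrors the qualitative preferences described in the text (Lys at P2, Arg at P3 and P4, Lys at P5). They are not measured values, and the uniform background is likewise an assumption.

```python
from math import log2

BACKGROUND = 1.0 / 20.0  # uniform amino acid background (assumption)

# Placeholder probabilities for positions P2..P5. NOT experimental data.
# Residues not listed share the remaining probability equally.
PLACEHOLDER_MATRIX = [
    {"K": 0.80, "R": 0.10},  # P2
    {"R": 0.50, "K": 0.40},  # P3
    {"R": 0.40, "K": 0.20},  # P4
    {"K": 0.50, "R": 0.30},  # P5
]


def position_probability(column, residue):
    """Probability of `residue` in one matrix column."""
    if residue in column:
        return column[residue]
    return (1.0 - sum(column.values())) / (20 - len(column))


def log_odds(window, matrix=PLACEHOLDER_MATRIX):
    """Sum of log2(probability / background) over the window positions."""
    if len(window) != len(matrix):
        raise ValueError("window length must equal the number of matrix columns")
    return sum(
        log2(position_probability(col, aa) / BACKGROUND)
        for col, aa in zip(matrix, window)
    )


def best_window(sequence, matrix=PLACEHOLDER_MATRIX):
    """Return (score, 1-based start, window) for the highest-scoring window."""
    width = len(matrix)
    if width > len(sequence):
        raise ValueError("sequence is shorter than the motif")
    return max(
        (log_odds(sequence[i:i + width], matrix), i + 1, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    )
```

With real position probabilities, for instance from a peptide library experiment as the text suggests, the same scoring function would rank candidate sites instead of merely accepting or rejecting them.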
Our discussion assumes that there is a linear correlation between Impα
binding and the rate of nuclear import, as the available data suggest (15, 16,
38). However, a functional NLS may display both lower and upper limits in
affinity (the NLS needs both to bind and to be released) (21). Therefore, the
optimal Impα-binding sequence is not necessarily an optimal NLS in terms of
the overall nuclear import process. It should be possible to define both
limits experimentally and find a correlation between the optimal Impα-binding
sequence and the optimal NLS.
An alternative approach to finding NLSs is to build an expert database of
experimentally known NLSs and extend it through "in silico mutagenesis," as
implemented in PredictNLS (33). The accuracy and coverage of this approach
depend on the number of experimentally known NLSs. This approach does not
distinguish between different nuclear import pathways, does not have
significant predictive power, and thus is complementary to our approach of
defining the conventional basic/bipartite NLS consensus. A comparison of the
efficiency of NLS detection in known nuclear proteins between PredictNLS and
our consensus-based search using Scansite (31) revealed the latter approach to
be superior, although a significant portion of the nuclear proteins does not
utilize the Impα/Impβ-dependent import pathway (see "Experimental
Procedures").
Conclusions. The structures of the complexes of mammalian Impα with the
peptides RB and N1N2, corresponding to bipartite NLSs, and their comparisons
with other available Impα structures, provide new insights into the molecular
basis of nuclear import and its regulation. Although the binding of both basic
clusters is comparable in all the structures, the linker region connecting the
two basic clusters of an NLS makes different but specific contacts with Impα
in the cases of all three common linker lengths (10, 11, and 12 residues). The
most critical region in the linker for binding is the extreme C-terminal end
of the linker sequence. The most important residue binding in the minor site
is an Arg at position P2' of the NLS, although the surrounding residues make
significant contributions. The integration of the structural information with
sequences of characterized NLSs and mutagenesis data allows us to provide an
improved consensus sequence for the conventional basic/bipartite NLS,
KRX10–12KRRK.
FOOTNOTES
The atomic coordinates and structure factors (codes 1PJM and 1PJN) have been deposited in the Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ (http://www.rcsb.org/).
* This work was supported by the Australian Research Council (to B.K.). The
costs of publication of this article were defrayed in part by the payment of
page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
¶ Supported by the Fundação de Amparo à Pesquisa do
Estado de São Paulo, Brazil.
A National Health and Medical Research Council (NHMRC) Senior Research
Fellow.
¶¶ Formerly a Wellcome Senior Research Fellow in Medical Science in Australia and currently an NHMRC Senior Research Fellow. To whom correspondence should be addressed. Tel.: 61-7-3365-2132; Fax: 61-7-3365-4699; E-mail: b.kobe{at}mailbox.uq.edu.au.
1 The abbreviations used are: NLS, nuclear localization sequence; Arm repeat,
armadillo repeat; Impα, importin-α; Impβ, importin-β; m-Impα, mouse importin-α
(residues 70–529); y-Impα, yeast importin-α (residues 88–530); N1N2, X. laevis
phosphoprotein N1N2; RB, human retinoblastoma protein; T-Ag, simian virus 40
large T-antigen.
2 M. R. M. Fontes, T. Teh, G. Toth, A. John, I. Pavo, D. A. Jans, and B.
Kobe, unpublished results.