Structural Basis for the Specificity of Bipartite Nuclear Localization Sequence Binding by Importin-{alpha}*

Marcos R. M. Fontes {ddagger} § , Trazel Teh {ddagger} ||, David Jans ** {ddagger}{ddagger} §§, Ross I. Brinkworth || and Bostjan Kobe {ddagger} || ¶¶

From the {ddagger}Structural Biology Laboratory, St. Vincent's Institute of Medical Research, 41 Victoria Parade, Fitzroy, Victoria 3065, Australia, the §Departamento de Física e Biofísica, Instituto de Biociências, Universidade Estadual Paulista - UNESP, C. P. 510, 18618-000, Botucatu, SP, Brazil, the ||Department of Biochemistry and Molecular Biology, Institute for Molecular Bioscience, Australian Research Council Special Research Centre for Functional and Applied Genomics, and Cooperative Research Centre for Chronic Inflammatory Diseases, University of Queensland, Brisbane, Queensland 4072, Australia, **Nuclear Signalling Laboratory, Division for Biochemistry and Molecular Biology, John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia, and the {ddagger}{ddagger}Department for Biochemistry and Molecular Biology, Monash University, Clayton, Victoria 3168, Australia

Received for publication, March 31, 2003


    ABSTRACT
 
Importin-{alpha} is the nuclear import receptor that recognizes cargo proteins carrying conventional basic monopartite and bipartite nuclear localization sequences (NLSs) and facilitates their transport into the nucleus. Bipartite NLSs contain two clusters of basic residues, connected by linkers of variable lengths. To determine the structural basis of the recognition of diverse bipartite NLSs by mammalian importin-{alpha}, we co-crystallized a non-autoinhibited mouse receptor protein with peptides corresponding to the NLSs from human retinoblastoma protein and Xenopus laevis phosphoprotein N1N2, containing diverse sequences and lengths of the linker. We show that the basic clusters interact analogously in both NLSs, but the linker sequences adopt different conformations, whereas both make specific contacts with the receptor. The available data allow us to draw general conclusions about the specificity of NLS binding by importin-{alpha} and facilitate an improved definition of the consensus sequence of a conventional basic/bipartite NLS (KRX10–12KRRK) that can be used to identify novel nuclear proteins.


    INTRODUCTION
 
Nucleocytoplasmic transport occurs through nuclear pore complexes, large proteinaceous structures that penetrate the double lipid layer of the nuclear envelope. Most macromolecules require an active, signal-mediated transport process that enables the passage of particles up to 25 nm in diameter (~25 MDa). The best characterized nuclear targeting signals are the conventional nuclear localization sequences (NLSs)1 that contain one or more clusters of basic amino acids (1). The NLSs fall into two distinct classes termed monopartite NLSs, containing a single cluster of basic amino acids, and bipartite NLSs, comprising two basic clusters separated by a spacer.

Despite the variability, the conventional basic NLSs are recognized by the same receptor protein termed importin or karyopherin, a heterodimer of {alpha} and {beta} subunits (for recent reviews, see Refs. 2–4). Importin-{alpha} (Imp{alpha}) contains the NLS-binding site, and importin-{beta} (Imp{beta}) is responsible for the translocation of the importin-substrate complex through the nuclear pore complex. Once inside the nucleus, Ran-GTP binds to Imp{beta} and causes the dissociation of the import complex. Imp{alpha} becomes autoinhibited, and both importin subunits return to the cytoplasm separately without the import cargo. The directionality of nuclear import is conferred by an asymmetric distribution of the GTP- and GDP-bound forms of Ran between the cytoplasm and the nucleus. This distribution is in turn controlled by various Ran-binding regulatory proteins.

Imp{alpha} consists of two structural and functional domains, a short basic N-terminal Imp{beta}-binding domain (5–7) and a large NLS-binding domain built of armadillo (Arm) repeats (8). The structural basis of monopartite and bipartite NLS recognition by Imp{alpha} has been studied crystallographically in yeast and mouse Imp{alpha} proteins (9–11). The two basic clusters of the bipartite NLSs bind to two separate binding sites on Imp{alpha}, involving Arm repeats 1–4 and 4–8, respectively. Monopartite NLSs can bind in both sites but primarily use the binding site corresponding to the C-terminal basic cluster of the bipartite NLSs, referred to as the major site (9, 11).

The structure of full-length Imp{alpha} indicated that the major NLS-binding site is occupied by residues 44–54 from the N-terminal region of the protein (Imp{beta}-binding domain) that resembles an NLS (12); Imp{alpha} is therefore autoinhibited in the absence of Imp{beta}. This observation is supported by the measurements of higher NLS binding affinity by Imp{alpha}/{beta} as compared with Imp{alpha} alone (13–22). The significance of the autoinhibitory mechanism has been confirmed by in vivo studies (23).

In this study, we present the crystal structures of peptides corresponding to the bipartite NLSs from human retinoblastoma protein (RB) and Xenopus laevis chromatin assembly factor N1N2 bound to mouse Imp{alpha}. These NLSs were chosen to represent diverse sequences and different lengths of the linkers between the clusters, so that some general conclusions can be drawn on NLS binding. The basic clusters of both peptides bind in the expected binding pockets, but the linker regions make specific contacts with the receptor also. Comparisons with other available Imp{alpha} structures allow us to explain the specificities of monopartite and bipartite NLS binding and help us improve the definition of the consensus sequence of a conventional basic/bipartite NLS. The results will have general implications for recognizing the NLSs in new gene products identified in genome sequences and therefore for functional annotation of new proteins.


    EXPERIMENTAL PROCEDURES
 
Peptide Synthesis—The peptides CGKRSAEGSNPPKPLKKLRGY (RB peptide) and CGRKKRKTEEESPLKDKAKKSKGY (N1N2 peptide) were synthesized using the Applied Biosystems 433A peptide synthesizer, purified by cation exchange chromatography followed by reverse phase chromatography, and analyzed by quantitative amino acid analysis using a Beckman 6300 amino acid analyzer and electrospray mass spectrometry (Sciex API 111, PerkinElmer Life Sciences) (24). The peptides RB and N1N2 correspond to the NLSs of human retinoblastoma protein, residues 861–877, and X. laevis N1N2 protein (N1N2), residues 535–555, respectively, with two heterologous residues added at each terminus.

Protein Expression, Purification, and Crystallization—N-terminal truncated mouse importin {alpha} ({alpha}2 isoform (25)) lacking 69 N-terminal residues (m-Imp{alpha}) was expressed recombinantly in Escherichia coli as a fusion protein containing a hexa-histidine tag (11). For crystallization, m-Imp{alpha} was concentrated to 18.8 mg/ml (in 20 mM Tris-HCl (pH 8.0), 100 mM NaCl, and 10 mM dithiothreitol) using a Centricon-30 (Millipore) and stored at –20 °C. Crystallization conditions were screened by systematically altering various parameters using the crystallization conditions successful for other peptide complexes (11) as a starting point. The crystals of both complexes (rod-shaped, 0.5 x 0.2 x 0.1 mm for the RB peptide complex and 0.4 x 0.1 x 0.07 mm for the N1N2 peptide complex) were obtained using co-crystallization by combining 1 µl of protein solution, 0.7 µl of peptide solution (1.7 mg/ml with peptide/protein ratio 3.5), and 1 µl of reservoir solution and suspended over 0.5 ml of reservoir solution containing 0.6 M sodium citrate (pH 6.0) and 10 mM dithiothreitol.

Diffraction Data Collection—The crystals exhibit orthorhombic symmetry (space group P2₁2₁2₁; Table I). Diffraction data were collected from single crystals transiently soaked in a solution analogous to the reservoir solution but supplemented with 23% glycerol and flash-cooled at 100 K in a nitrogen stream (Oxford Cryosystems), using a MAR-Research image plate detector (plate diameter, 345 mm) and CuK{alpha} radiation from a Rigaku RU-200 rotating anode generator. Data were autoindexed and processed with the HKL suite (26) (Table I).


 
TABLE I
Structure determination

 

Structure Determination and Refinement—The crystals of the peptide complexes were highly isomorphous with the crystals of full-length Imp{alpha} (12); therefore, the structure of mouse Imp{alpha} (Protein Data Bank number 1IAL) with N-terminal residues omitted was used as a starting model for crystallographic refinement. Electron density maps were inspected for the presence of the peptide after rigid body refinement using the program CNS (27) (RB peptide: m-Imp{alpha} complex, Rcryst = 30.5%, Rfree = 32.4%, 6–4-Å resolution; N1N2 peptide: m-Imp{alpha} complex, Rcryst = 30.8%, Rfree = 35.1%, 6–4-Å resolution; Table I provides the explanation of R-factors). Electron density maps calculated with coefficients 3 Fobs–2 Fcalc and simulated annealing omit maps (Fig. 1) calculated with analogous coefficients were generally used. The model was improved, as judged by the free R-factor (28), through rounds of crystallographic refinement (positional and restrained isotropic individual B-factor refinement with an overall anisotropic temperature factor and bulk solvent correction) and manual rebuilding (program O (29)). Solvent molecules were added with the program CNS (27). Asn239 is an outlier in the Ramachandran plot as also observed in all other structures of mouse Imp{alpha} (11, 12). Pro242 is a cis-proline. The final models comprise 427 Imp{alpha} residues (residues 71–497), 20 peptide residues, and 173 water molecules for RB peptide-Imp{alpha} complex, and 426 Imp{alpha} residues (residues 72–496), 21 peptide residues, and 123 water molecules for N1N2 peptide-Imp{alpha} complex (Table I). The coordinates have been deposited in the Protein Data Bank (Protein Data Bank numbers 1PJM and 1PJN for the RB and N1N2 peptide complexes, respectively).



 
FIG. 1.
Structure determination. A, stereo view of the electron density (drawn with the program BOBSCRIPT (39)) in the region of the RB peptide bound to the major binding site of m-Imp{alpha}. All peptide residues were omitted from the model, and simulated annealing was run with the starting temperature of 1000 K. The electron density map was calculated with coefficients 3 Fobs–2 Fcalc and data between 40- and 2.5-Å resolution and contoured at 1.5 standard deviations. The refined model of the peptide is superimposed. B, stereo view of the electron density (contoured at 1.5 standard deviations) in the region of the N1N2 peptide bound to the major binding site of m-Imp{alpha}, shown as in A.

 

Structure Analysis—The quality of the models was assessed with the program PROCHECK (30). The contacts were analyzed with the program CONTACT, and the buried surface areas were calculated using the program CNS (27).

Bioinformatic Analysis—We used the consensus sequence KRRK to search for NLS-containing proteins using the Quick Matrix option of Scansite (31); the sequence KRXK was entered for the primary preference positions 0 to +3, and R was entered into the secondary preference position +2. The bipartite consensus was too long to use with the current version of Scansite. Testing using proteins with known NLSs showed that there is a high likelihood of detecting a functional NLS at Scansite scores <= 0.0408 (corresponding to 0.448% of all yeast proteins) and a reasonable likelihood at Scansite scores <= 0.0596 (corresponding to 1.169% of all yeast proteins). To estimate the efficiency of detecting an unknown NLS, we performed a Scansite search with a test set of 50 randomly selected yeast nuclear proteins (Munich Information Center for Protein Sequences subcellular catalogue (32)); 9 proteins showed a sequence match to the above motif with Scansite scores <= 0.0408, and 23 proteins (46%) showed a match with Scansite scores <= 0.0569. For comparison, we searched for NLSs in the same test set of proteins using PredictNLS (33); this method detected an NLS in nine proteins, some of which did not belong to the conventional basic/bipartite group.
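The detection rates quoted above follow directly from the counts in the 50-protein test set. A minimal Python sketch of that arithmetic is given here; the function name `detection_rate` is illustrative only and is not part of Scansite or PredictNLS.

```python
def detection_rate(hits: int, total: int) -> float:
    """Return the percentage of test-set proteins with a motif match."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


TEST_SET_SIZE = 50  # randomly selected yeast nuclear proteins

# Counts reported in the text for the two Scansite score cutoffs.
stringent = detection_rate(9, TEST_SET_SIZE)
relaxed = detection_rate(23, TEST_SET_SIZE)
# PredictNLS detected an NLS in nine proteins of the same test set.
predictnls = detection_rate(9, TEST_SET_SIZE)

print(f"Scansite, stringent cutoff: {stringent:.0f}%")
print(f"Scansite, relaxed cutoff:   {relaxed:.0f}%")
print(f"PredictNLS:                 {predictnls:.0f}%")
```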


    RESULTS AND DISCUSSION
 
Structure Determination—The RB and N1N2 NLS peptides were co-crystallized with an N-terminal truncated mouse Imp{alpha} lacking residues 1–69 (m-Imp{alpha}); residues 1–69 are responsible for autoinhibition. The co-crystals with both peptides grew in similar conditions and isomorphously to other mouse Imp{alpha} crystals (11, 12). Electron density maps based on the Imp{alpha} model, following rigid body refinement, clearly showed electron density corresponding to the peptides (Fig. 1). The structures were refined at 2.5-Å resolution for both complexes (Table I).

Residues 859–878 of the peptide RB (residue 879 had no interpretable electron density) and residues 535–555 of the peptide N1N2 (residues 533–534 and 556 had no interpretable electron density) could unambiguously be identified in the electron density maps (Fig. 1). The side chains of Glu543 and Lys549 of N1N2 were poorly ordered; therefore, these residues were modeled as alanines.

Structure of Importin-{alpha} in the Complexes—Imp{alpha} forms a single elongated domain built from 10 Arm structural repeats, each containing three {alpha} helices (H1, H2, and H3) connected by loops (Fig. 2). The structure of Imp{alpha} in the complexes is comparable with the crystal structure of the full-length Imp{alpha} (r.m.s. deviations of C{alpha} atoms of Imp{alpha} residues 72–496 are 0.22 Å between the RB and N1N2 peptide-m-Imp{alpha} complexes; 0.35 and 0.31 Å between full-length Imp{alpha} and the RB and N1N2 peptide-m-Imp{alpha} complexes, respectively; and 0.34 and 0.30 Å between nucleoplasmin peptide-m-Imp{alpha} and the RB and N1N2 peptide-m-Imp{alpha} complexes, respectively).
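For readers unfamiliar with the r.m.s. deviation figures used throughout this comparison, the quantity can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sets of C{alpha} coordinates are already superimposed and paired residue by residue; it performs no superposition itself, it is not the procedure used by the programs cited in this paper, and the coordinates below are toy values, not taken from the deposited structures.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superimposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))


# Toy coordinates in Angstroms (hypothetical, for illustration only).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
# Every atom displaced by 0.3 Angstroms along y.
displaced = [(x, y + 0.3, z) for x, y, z in reference]

print(f"r.m.s. deviation: {rmsd(reference, displaced):.2f} A")
```

Because every atom pair contributes its squared distance, a few badly matched residues (for example in a flexible linker) raise the value much more than many small differences do.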



 
FIG. 2.
Structures of complexes. A, structure of RB peptide-m-Imp{alpha} complex. m-Imp{alpha} is shown as a ribbon diagram (yellow; drawn with the program RIBBONS (40)). The superhelical axis of the repetitive part of the molecule is approximately horizontal. The NLS peptide is shown in a ball-and-stick representation, colored blue. B, structure of N1N2 peptide-m-Imp{alpha} complex, shown as in A. The bound peptide is colored red.

 

Binding of the NLS Peptides to Importin-{alpha}—The peptides bind in an extended conformation with the chain running antiparallel to the direction of the Arm repeat superhelix (Fig. 2). The base of the groove that contains the binding sites is formed mainly by the H3 helices of the Arm repeats, which carry some residues conserved among the repeats, including the tryptophans and asparagines at the third and fourth turns in H3 helices of the Arm repeats, respectively (10, 11).

The two basic clusters of the RB and N1N2 peptides bind to two separate well defined binding sites on the surface of the m-Imp{alpha} molecule, referred to as the minor and major sites (Fig. 2). The minor site specifically binds to the N-terminal basic cluster KR, and the larger, C-terminal basic cluster binds to the major site. The electron density is present for 20 peptide residues in the RB peptide (average B-factor, 63.3 Å2) and for 21 peptide residues of the N1N2 peptide (average B-factor, 72.1 Å2) (Fig. 1). In both complexes, the linker sequences connecting the major and minor sites (residues 863–873 and 539–550 for RB and N1N2, respectively) have B-factors above the average for the entire peptide (71.1 and 82.8 Å2 for RB and N1N2, respectively). By contrast, the residues bound to the major sites of both complexes have lower B-factors (40.5 and 47.6 Å2 for positions P2–P5 of RB and N1N2, respectively), reflecting the strong interaction of these residues with the protein.

Surface areas of 2573 and 2715 Å2 are buried between m-Imp{alpha} and the RB and N1N2 peptides, respectively. All residues of the RB peptide, except residues 865 and 869, and of the N1N2 peptide, except residues 542 and 546, make contacts with m-Imp{alpha} at distances below 4 Å (Fig. 3).



 
FIG. 3.
Peptide-importin-{alpha} interaction. A, schematic diagram of the interactions between the RB peptide and m-Imp{alpha}. Polar contacts are shown with dashed lines, and hydrophobic contacts are indicated by arcs with radiating spokes. The NLS peptide residues are labeled with R. The water molecules are labeled with S. Carbon, nitrogen, and oxygen atoms are shown in black, white, and gray, respectively. This figure was prepared with the program LIGPLOT (41). B, schematic diagram of the interactions between the N1N2 peptide and m-Imp{alpha}, shown as in A. The NLS peptide residues are labeled with N.

 

The minor and major site portions of the RB and N1N2 peptides have very similar structures (Fig. 4); after superposition of the equivalent C{alpha} atoms, the r.m.s. deviation of the residues in positions P1–P6 is 0.19 Å, and the r.m.s. deviation of the residues in positions P1'–P3' is 0.03 Å. By contrast, they adopt very different conformations in the linker regions between positions P3' and P1; the path of the peptide chain is more linear in the case of RB than N1N2. The bipartite NLS linker sequence of both peptides makes favorable interactions with the H3 helices of armadillo repeats 4–7 of Imp{alpha}. The most important contacts are limited to residues Asn868 and Lys871 of RB and Lys547 of N1N2.



 
FIG. 4.
Superposition of the RB (green), N1N2 (red), and nucleoplasmin (cyan) (11) NLS peptides bound to m-Imp{alpha}. The C{alpha} atoms of m-Imp{alpha} in the three complex structures were used in the superposition. This figure was drawn with the program RIBBONS (40).

 

Among other interactions, the conserved residues Arg315 (Arm 6) and Tyr277 (Arm 5) of Imp{alpha} that interrupt the regularity of the Trp-Asn array (11) make extensive main chain and side chain contacts with the peptides. These residues also interact with nucleoplasmin NLS in mammalian and yeast Imp{alpha} (10, 11). However, in that case, the interaction involves only the main chain of the peptide. Other important contacts are made with Arg238 (main chain of RB and N1N2), Ser276 (side chain of N1N2), and Ser234 (side chain of RB).

Comparison with Other NLS Peptide-Importin-{alpha} Complex Structures—Significantly, the work presented here allows us for the first time to perform a detailed comparison of the structural determinants of binding of a number of NLSs to Imp{alpha}, to align the NLSs with the binding pockets on Imp{alpha} with some confidence, and to draw some general conclusions on the specificity of NLS binding (Table II). Despite diverse sequences, the binding of both basic clusters (positions P1–P5 and P1'–P2') is similar in all the available structures, and the major differences occur in the linker regions connecting the basic clusters (Fig. 4). The exceptions are the N- and C-terminal portions of nucleoplasmin NLS bound to m-Imp{alpha}, where the side chains of Lys155 (position P1'; N terminus of the peptide) and Lys170 (position P5; C terminus of the peptide) follow the direction of the main chain of the other peptides. Because all the other peptides contain at least one additional residue at the N and C termini, the conformation of the nucleoplasmin NLS peptide may be an artifact of the short length of the peptide. The case of nucleoplasmin highlights the importance of residues preceding position P1' and following position P6.


 
TABLE II
Binding of NLSs to specific binding pockets of importin-{alpha}

 

The structures of nucleoplasmin NLS bound to y-Imp{alpha} and m-Imp{alpha} are significantly different (r.m.s. deviation of C{alpha} atoms of residues 155–170 is 2.44 Å). In addition to the differences caused by the different lengths of the peptides discussed above, some differences may be explained by structural differences between the two Imp{alpha} proteins; the yeast structure is slightly more "open" than the mouse structure (12). Most of the differences are found in the region comprising residues 159–165 (there is high structural similarity when only the major (r.m.s. deviation of C{alpha} atoms for positions P2–P5 is 0.28 Å) and minor (r.m.s. deviation of C{alpha} atoms for positions P1'-P3' is 0.24 Å) sites are superimposed). With the basic clusters binding most tightly, the different curvatures of the two Imp{alpha} proteins appear to be compensated in the linker region. Although some linker region residues have different conformations, the main contacts of this region with Imp{alpha} are comparable in both nucleoplasmin structures. The most important interactions occur for the main chain of conserved residues Arg315, Arg238, and Tyr277 of m-Imp{alpha} (Arg321, Arg244, and Tyr283 of y-Imp{alpha}).

The portions of RB and N1N2 peptides bound in the major and minor sites superimpose closely with the nucleoplasmin NLS (the r.m.s. deviations of C{alpha} atoms of positions P1–P5 and positions P1'–P3' are 0.34 and 0.42 Å between RB and nucleoplasmin and 0.33 and 0.42 Å between N1N2 and nucleoplasmin, respectively). The structure of the nucleoplasmin linker region is more similar to that of RB than that of N1N2 (r.m.s. deviation of C{alpha} atoms of linker region is 1.70 Å between RB (864–873) and nucleoplasmin (157–165) and 2.20 Å between N1N2 (541–550) and nucleoplasmin (157–165)), mainly in the region closest to the major site (residues 162–165 for nucleoplasmin). The region of the linker closest to the major site is structurally conserved best among the three bipartite NLS peptides.

Binding of Bipartite NLS Linker Regions—The nucleoplasmin NLS-Imp{alpha} complexes (10, 11) showed that the main chain of the linker region binds to the conserved Imp{alpha} residues Arg315 (Arm 6) and Tyr277 (Arm 5) (Arg321 and Tyr283 for y-Imp{alpha}; these residues interrupt the regularity of Trp-Asn array (11)). The conserved residue Arg238 (Arm 4) (Arg244 for y-Imp{alpha}) also binds to the main chain of nucleoplasmin NLS. By contrast, the structures of RB and N1N2-m-Imp{alpha} complexes show the binding of Arg315 and Tyr277 to the side chains of peptides (Arg238 binds to the main chain of the peptides). These observations suggest that residues Arg315, Tyr277, and Arg238 play crucial roles in binding bipartite NLSs; however, these interactions are specific to individual NLSs.

Other important contacts of the peptide linker with m-Imp{alpha} involve the residues Asn868 and Lys871 for the RB peptide and Lys547 for the N1N2 peptide; all of them are situated close to the major site. This is the region of the linker most structurally conserved among the three bipartite NLS peptides. The electron density maps of RB and N1N2 have a superior quality in this region as compared with the rest of the linker. These data suggest that the C-terminal portion of the linker (closest to the major site) plays an important role in binding to Imp{alpha}. This conclusion is supported by NLS binding studies (15, 17, 19, 21) and structural analyses of extended simian virus 40 (SV40) large tumor-antigen (T-Ag) peptide complexes.2 The electron density maps suggest that the RB peptide is better ordered overall, consistent with a larger number of contacts.

Not every linker residue is able to make contacts with the protein. Because of the differences in the lengths of the linkers between RB (11 residues, if we define the linker sequence as the sequence between sites P2' and P2) and N1N2 (12 residues), the RB peptide binds to m-Imp{alpha} in a more extended conformation than the N1N2. The 11-residue length appears more favorable for binding; 12 residues require some short turns and force the chain farther from Imp{alpha}, precluding some favorable interactions. Importantly, this is consistent with the observation that incorporation of QPWL in the linker region of nucleoplasmin NLS reduced the efficiency of nuclear import (34).

The affinities for mouse Imp{alpha} of N1N2 and RB NLSs fused to {beta}-galactosidase have been determined using enzyme-linked immunosorbent assays (15, 19). N1N2 binds with a higher affinity than RB to both Imp{alpha}/{beta} complex (Kd measured as 5.4 nM for N1N2 and 45 nM for RB) and Imp{alpha} alone (Kd measured as 22 nM for N1N2 and 180 nM for RB). Therefore, although the linker region appears to interact more favorably in the case of RB, it is most likely the presence of Lys in position P5 that is responsible for the higher affinity of the recognition of N1N2 NLS by Imp{alpha} (see below).
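To put the affinity difference on an energy scale, the dissociation constants for the Imp{alpha}/{beta} complex can be converted to binding free energies with the standard relation ΔG = RT ln Kd. The Python sketch below is an illustration under assumptions not stated in the source: a temperature of 298.15 K and a 1 M standard state. The function name `delta_g` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from Kd in mol/l, 1 M standard state.

    The default temperature is an assumption; the assay temperature is not
    given in the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Kd values quoted in the text for binding to the Imp-alpha/beta complex.
kd_n1n2 = 5.4e-9  # 5.4 nM
kd_rb = 45e-9     # 45 nM

# Positive value: RB binds less tightly than N1N2 by this amount.
ddg = delta_g(kd_rb) - delta_g(kd_n1n2)

print(f"dG(N1N2) = {delta_g(kd_n1n2):.1f} kcal/mol")
print(f"dG(RB)   = {delta_g(kd_rb):.1f} kcal/mol")
print(f"ddG      = {ddg:.2f} kcal/mol")
```

Under these assumptions the roughly eightfold difference in Kd corresponds to a little over 1 kcal/mol, which is small relative to the contribution of a single well placed basic side chain discussed below.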

The Role of Minor Site Binding—Mutagenesis studies with nucleoplasmin (34), RB (15), and N1N2 (19) NLSs revealed that substitutions of P1' and P2' residues to other than Arg or Lys abolished nuclear localization. The binding of Lys-Arg at positions P1' and P2' is observed in all available bipartite NLS peptide-Imp{alpha} complex structures. Similarly, the majority of the bipartite NLSs characterized contain a Lys-Arg sequence in the N-terminal cluster (35). Comparison of the structures here reveals a high structural similarity of the binding of the Arg side chain in the position P2' in all cases. The Arg side chain is situated at the groove created by the tryptophans Trp399 and Trp357 (Trp405 and Trp363 for y-Imp{alpha}) located in H3 helices of Arm repeats 7 and 8 and also contacts Glu396 and Ser360 (Glu402 and Ser366 in y-Imp{alpha}); these residues are conserved among known Imp{alpha} sequences. The interaction with Glu396 appears particularly important because the distance from the Arg side chain is nearly identical in all the structures. The binding of T-Ag at the minor site (11) represents a model for binding of Lys (instead of Arg) at P2'; the peptide 126PKKKRKV132 places Lys128 and Lys129 at positions P1' and P2', respectively. This likely occurs because it is more favorable to have some amino acid binding at positions P4' and P5', rather than having an Arg at P2'. However, some evidence of staggering in different registers was observed in the crystal structure (11). The Lys at position P2' of T-Ag contacts Glu396 and Ser360 at approximately the same distance but does not contact Trp399, losing one favorable contact. The monopartite NLS from c-Myc binds with Lys-Arg in the P1'–P2' positions (10).

A Lys side chain at position P1' makes less favorable contacts than Arg at position P2', and this position appears less important than the P2' position for bipartite NLS binding. The P1' Lys of RB and N1N2 peptides makes interactions with the side chains of the conserved Thr328 and Asn361 and the main chain of Val321. Also close by is Asp325. The P1' Lys of nucleoplasmin bound to y-Imp{alpha} also contacts the equivalent of Thr328 and Val321 but not Asn361. The pocket prefers a Lys residue because an Arg side chain is too long to make similar favorable contacts in the P1' pocket.

Finally, we suggest that the positions preceding the P1' and following P2' contribute significantly to the minor site binding. This is consistent with the side chain of the N-terminal Lys155 (position P1') of the nucleoplasmin peptide preferring to follow the main chain of the other peptides instead of binding in the regular P1' pocket. Also, the T-Ag peptide 126PKKKRKV132 binds to Imp{alpha} with two Lys residues in P1' and P2' instead of the more favorable Arg at P2', possibly so that it can place some amino acid in positions P4' and P5'.

The NLS Consensus—The definition of an NLS consensus based on an analysis of sequences alone is difficult due to the diversity of sequences that can function as nuclear targeting signals. The structural data, the accumulated knowledge on functional NLSs, and mutagenesis studies now allow us to better define the conventional basic/bipartite NLS consensus that involves Imp{alpha} binding. The ability to recognize NLSs is of significance, as it could help with the functional annotation of new proteins predicted in genomic sequences.

We used an approach to define the NLS consensus similar to that used previously to define optimal substrates for protein kinases (36). The structures of Imp{alpha}-NLS peptides were analyzed by molecular modeling in conjunction with other available information. This allows us to draw some conclusions about individual positions in the NLS.

One general observation is that the binding pockets for individual side chains are much better defined in the major site than in the minor site. This observation is consistent with the contribution of the N-terminal basic cluster (minor site) in a bipartite NLS of ~4 kcal/mol, which is comparable with the loss of a single Lys side chain in the major basic cluster (21). A large portion of the free energy of binding is contributed by the main chain of the peptide (estimated at ~6.9 kcal/mol), but a functional NLS requires an additional contribution of the side chains of about 4.2 kcal/mol; this contribution by the side chains accounts for the specificity of nuclear import. The best defined pockets include P2, P3, and P5 of the major site. P2 is defined mainly by residues Thr155 and Asp192 in adjacent Arm repeats of Imp{alpha} and is well suited for binding a Lys side chain. Arg can be accommodated only to a lesser degree, consistent with the loss of 2.7 kcal/mol for the Lys to Arg mutation in the context of T-Ag (21). Side chains such as Thr, Gln, and Pro could be accommodated but would contribute less to binding; there is a substantial electrostatic contribution to binding in this pocket. P3, comprising Glu266 and Asp270 as the major binding determinant, prefers Arg over Lys based on Ala mutagenesis; this is likely because an Arg can reach closer to Glu266 and Asp270. The energetic contribution of the side chains in the P3 site is about two-thirds of the contribution in P2, with a smaller contribution of the electrostatic terms. The modeling suggests that P5, with the primary binding residue Gln81, is best suited for binding a Lys side chain. The electrostatic contribution is very minor, and positive charge is not strictly required; due to size, Gln and Glu would be favored over Arg. The energetic contribution of this site is similar to P3 (21).
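As a rough consistency check on these figures, the main chain and side chain contributions can be summed and converted to a dissociation constant. The worked estimate below is a sketch under assumptions not stated in the source: that the two contributions are additive, a temperature of 298 K (RT ≈ 0.592 kcal/mol), and a 1 M standard state.

```latex
\begin{align*}
\Delta G_{\mathrm{bind}} &\approx -(6.9 + 4.2)\ \mathrm{kcal/mol} = -11.1\ \mathrm{kcal/mol},\\
K_d &= \exp\!\left(\frac{\Delta G_{\mathrm{bind}}}{RT}\right)
     \approx \exp\!\left(\frac{-11.1}{0.592}\right)\ \mathrm{M}
     \approx 7 \times 10^{-9}\ \mathrm{M}.
\end{align*}
```

A value of this order lies in the same nanomolar range as the Kd values quoted above for the N1N2 and RB NLSs.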

The other pockets are less specific. The P4 pocket is relatively large and should favor Arg because this is the only side chain that can reach to the main chain of Imp{alpha} Arg106 and Glu107 and make favorable interactions. However, hydrophobic side chains such as Pro, Val, Ile, or Leu could also be accommodated relatively well. The discrimination between Arg and Val is only minor based on the Ala mutagenesis results of T-Ag and c-Myc NLSs, and the contribution of this pocket is about four times smaller than that of P2. P1, with Asp270 responsible for specificity, prefers Lys over Arg, whereas side chains such as Gln, Pro, or Gly could easily be tolerated, and the side chain contribution in this pocket is only half of that of pocket P4. The pocket preceding P1 has Trp231 as the major specificity determinant and also prefers Lys over residues such as Arg, Pro, Ala, Glu, or Gly, but is also not very specific.

The minor site is much less defined than the major site. Only the sites P1' and P2' appear to be capable of significant discrimination between different peptide side chains, with Lys and Arg favored in those two positions, respectively, as discussed above. The N-terminal basic cluster may play a role in relieving autoinhibition (the minor site is not autoinhibited (12)).

We conclude that the optimal basic/bipartite consensus sequence for binding Imp{alpha} is KRX10–12KRRK, in which the Lys at position P2 (the first residue of the C-terminal KRRK cluster) is the most important specificity determinant and also forces the rest of the sequence to bind in one register. The basic residues at positions P3 and P5 (the second and fourth residues of that cluster) are also significant (Table II). To facilitate accurate NLS identification in novel sequences, the motif could be represented in a matrix format with probabilities for each of the 20 amino acids defined at each position (31); this should be possible once experimental data become available from a peptide library (37) or an equivalent experiment.
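
As an illustration of how such a consensus might be used in practice, the sketch below scans a protein sequence for the strict KRX10–12KRRK pattern with a regular expression and builds a simple per-position probability matrix from a set of aligned peptides. This is not the Scansite procedure (31); the function names, the pseudocount, and the test sequences are hypothetical, and a real search would likely use a more degenerate pattern or an experimentally derived matrix.

```python
import re
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Strict consensus KR-X(10-12)-KRRK. The lookahead lets overlapping candidates
# starting at different positions all be reported.
CONSENSUS = re.compile(rf"(?=(KR[{AMINO_ACIDS}]{{10,12}}KRRK))")


def find_bipartite_nls(sequence):
    """Return (0-based start index, matched segment) for each consensus hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CONSENSUS.finditer(seq)]


def position_probability_matrix(aligned_peptides, pseudocount=1.0):
    """Return one {amino acid: probability} dict per aligned position.

    The pseudocount (an assumption of this sketch) avoids zero probabilities
    when only a few peptides are available.
    """
    if not aligned_peptides:
        raise ValueError("at least one peptide is required")
    length = len(aligned_peptides[0])
    for peptide in aligned_peptides:
        if len(peptide) != length or any(aa not in AMINO_ACIDS for aa in peptide):
            raise ValueError("peptides must be equal-length strings of the 20 standard residues")
    total = len(aligned_peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for i in range(length):
        counts = Counter(peptide[i] for peptide in aligned_peptides)
        matrix.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return matrix


# Hypothetical sequence with a 10-residue linker between the two basic clusters.
example = "MA" + "KR" + "A" * 10 + "KRRK" + "GG"
print(find_bipartite_nls(example))

# Hypothetical aligned C-terminal clusters, for illustration only.
pwm = position_probability_matrix(["KRRK", "KRRK", "KKRK"])
print(round(pwm[0]["K"], 3))
```

The regular expression captures only the optimal binder; the matrix form is what would allow graded scoring of near-consensus sequences once peptide library data are available.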

Our discussion assumes that there is a linear correlation between Imp{alpha} binding and the rate of nuclear import, as the available data suggest (15, 16, 38). However, a functional NLS may display both lower and upper limits in affinity (the NLS needs both to bind and to be released) (21). Therefore, the optimal Imp{alpha}-binding sequence is not necessarily an optimal NLS in terms of the overall nuclear import process. It should be possible to define both limits experimentally and find a correlation between the optimal Imp{alpha}-binding sequence and the optimal NLS.

An alternative approach to finding NLSs is to build an expert data base of experimentally known NLSs and extend it through "in silico mutagenesis," as implemented in PredictNLS (33). The accuracy and coverage of this approach depend on the number of experimentally known NLSs. This approach does not distinguish between different nuclear import pathways, does not have significant predictive power, and thus is complementary to our approach of defining the conventional basic/bipartite NLS consensus. A comparison of the efficiency of NLS detection in known nuclear proteins between PredictNLS and our consensus-based search using Scansite (31) revealed the latter approach to be superior, although a significant portion of the nuclear proteins does not utilize the Imp{alpha}/Imp{beta}-dependent import pathway (see "Experimental Procedures").

Conclusions—The structures of the complexes of mammalian Imp{alpha} with the peptides RB and N1N2, corresponding to bipartite NLS, and their comparisons with other available Imp{alpha} structures, provide new insights into the molecular basis of nuclear import and its regulation. Although the binding of both basic clusters is comparable in all the structures, the linker region connecting the two basic clusters of an NLS makes different but specific contacts with Imp{alpha} in the cases of all three common linker lengths (10, 11, and 12 residues). The most critical region in the linker for binding is the extreme C-terminal end of the linker sequence. The most important residue binding in the minor site is an Arg at position P2' of the NLS, although the surrounding residues make significant contributions. The integration of the structural information with sequences of characterized NLSs and mutagenesis data allows us to provide an improved consensus sequence for the conventional basic/bipartite NLS, KRX10–12KRRK.


    FOOTNOTES
 
This article is dedicated to Alec Hodel.

The atomic coordinates and structure factors (codes 1PJM and 1PJN) have been deposited in the Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ (http://www.rcsb.org/).

* This work was supported by the Australian Research Council (to B.K.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil.

§§ A National Health and Medical Research Council (NHMRC) Senior Research Fellow.

¶¶ Formerly a Wellcome Senior Research Fellow in Medical Science in Australia and currently an NHMRC Senior Research Fellow. To whom correspondence should be addressed. Tel.: 61-7-3365-2132; Fax: 61-7-3365-4699; E-mail: b.kobe{at}mailbox.uq.edu.au.

1 The abbreviations used are: NLS, nuclear localization sequence; Arm repeat, armadillo repeat; Imp{alpha}, importin-{alpha}; Imp{beta}, importin-{beta}; m-Imp{alpha}, mouse importin-{alpha} (residues 70–529); y-Imp{alpha}, yeast importin-{alpha} (residues 88–530); N1N2, X. laevis phosphoprotein N1N2; RB, human retinoblastoma protein; T-Ag, simian virus 40 large T-antigen.

2 M. R. M. Fontes, T. Teh, G. Toth, A. John, I. Pavo, D. A. Jans, and B. Kobe, unpublished results.



    REFERENCES

  1. Dingwall, C., and Laskey, R. A. (1991) Trends Biochem. Sci. 16, 478–481
  2. Conti, E. (2002) Results Probl. Cell Differ. 35, 93–113
  3. Damelin, M., Silver, P. A., and Corbett, A. H. (2002) Methods Enzymol. 351, 587–607
  4. Weis, K. (2002) Curr. Opin. Cell Biol. 14, 328–335
  5. Gorlich, D., Henklein, P., Laskey, R. A., and Hartmann, E. (1996) EMBO J. 15, 1810–1817
  6. Weis, K., Ryder, U., and Lamond, A. I. (1996) EMBO J. 15, 1818–1825
  7. Moroianu, J., Blobel, G., and Radu, A. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 6572–6576
  8. Peifer, M., Berg, S., and Reynolds, A. B. (1994) Cell 76, 789–791
  9. Conti, E., Uy, M., Leighton, L., Blobel, G., and Kuriyan, J. (1998) Cell 94, 193–204
  10. Conti, E., and Kuriyan, J. (2000) Structure 8, 329–338
  11. Fontes, M. R. M., Teh, T., and Kobe, B. (2000) J. Mol. Biol. 297, 1183–1194
  12. Kobe, B. (1999) Nat. Struct. Biol. 6, 388–397
  13. Rexach, M., and Blobel, G. (1995) Cell 83, 683–692
  14. Gorlich, D., Pante, N., Kutay, U., Aebi, U., and Bischoff, F. R. (1996) EMBO J. 15, 5584–5594
  15. Efthymiadis, A., Shao, H., Hübner, S., and Jans, D. A. (1997) J. Biol. Chem. 272, 22134–22139
  16. Hübner, S., Xiao, C. Y., and Jans, D. A. (1997) J. Biol. Chem. 272, 17191–17195
  17. Hübner, S., Smith, H. M. S., Hu, W., Chen, C. K., Rihs, H. P., Paschal, B. M., Raikhel, N. V., and Jans, D. A. (1999) J. Biol. Chem. 274, 22610–22617
  18. Briggs, L. J., Stein, D., Goltz, J., Corrigan, V. C., Efthymiadis, A., Hübner, S., and Jans, D. A. (1998) J. Biol. Chem. 273, 22745–22752
  19. Hu, W., and Jans, D. A. (1999) J. Biol. Chem. 274, 15820–15827
  20. Fanara, P., Hodel, M. R., Corbett, A. H., and Hodel, A. E. (2000) J. Biol. Chem. 275, 21218–21223
  21. Hodel, M. R., Corbett, A. H., and Hodel, A. E. (2001) J. Biol. Chem. 276, 1317–1325
  22. Catimel, B., Teh, T., Fontes, M. R., Jennings, I. G., Jans, D. A., Howlett, G. J., Nice, E. C., and Kobe, B. (2001) J. Biol. Chem. 276, 34189–34198
  23. Harreman, M. T., Hodel, M. R., Fanara, P., Hodel, A. E., and Corbett, A. H. (2003) J. Biol. Chem. 278, 5854–5863
  24. Michell, B. J., Stapleton, D., Mitchelhill, K. I., House, C. M., Katsis, F., Witters, L. A., and Kemp, B. E. (1996) J. Biol. Chem. 271, 28445–28450
  25. Kussel, P., and Frasch, M. (1995) Mol. Gen. Genet. 248, 351–363
  26. Otwinowski, Z., and Minor, W. (1997) Methods Enzymol. 276, 307–326
  27. Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Acta Crystallogr. Sect. D Biol. Crystallogr. 54, 905–921
  28. Brünger, A. T. (1992) Nature 355, 472–475
  29. Jones, T. A., Bergdoll, M., and Kjeldgaard, M. (1990) in Crystallographic and Modeling Methods in Molecular Design (Bugg, C. E., and Ealick, S. E., eds) pp. 189–195, Springer-Verlag, New York
  30. Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993) J. Appl. Cryst. 26, 283–291
  31. Yaffe, M. B., Leparc, G. G., Lai, J., Obata, T., Volinia, S., and Cantley, L. C. (2001) Nat. Biotechnol. 19, 348–353
  32. Mewes, H. W., Frishman, D., Guldener, U., Mannhaupt, G., Mayer, K., Mokrejs, M., Morgenstern, B., Munsterkotter, M., Rudd, S., and Weil, B. (2002) Nucleic Acids Res. 30, 31–34
  33. Cokol, M., Nair, R., and Rost, B. (2000) EMBO Rep. 1, 411–415
  34. Robbins, J., Dilworth, S. M., Laskey, R. A., and Dingwall, C. (1991) Cell 64, 615–623
  35. Nair, R., Carter, P., and Rost, B. (2003) Nucleic Acids Res. 31, 397–399
  36. Brinkworth, R. I., Breinl, R. A., and Kobe, B. (2003) Proc. Natl. Acad. Sci. U. S. A. 100, 74–79
  37. Songyang, Z., Shoelson, S. E., Chaudhuri, M., Gish, G., Pawson, T., Haser, W. G., King, F., Roberts, T., Ratnofsky, S., Lechleider, R. J., Neel, B. J., Birge, R. B., Fajardo, J. E., Chou, M. M., Hanafusa, H., Schaffhausen, B., and Cantley, L. C. (1993) Cell 72, 767–778
  38. Xiao, C. Y., Jans, P., and Jans, D. A. (1998) FEBS Lett. 440, 297–301
  39. Esnouf, R. M. (1997) J. Mol. Graphics 15, 133–138
  40. Carson, M. (1997) Methods Enzymol. 277, 493–505
  41. Wallace, A. C., Laskowski, R. A., and Thornton, J. M. (1995) Protein Eng. 8, 127–134