A theoretical model of restriction endonuclease NlaIV in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis and circular dichroism spectroscopy

Agnieszka A. Chmiel1, Monika Radlinska2, Sebastian D. Pawlak1, Daniel Krowarsch3, Janusz M. Bujnicki1,4 and Krzysztof J. Skowronek1

1Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. ks. Trojdena 4, 02-109 Warsaw, 2Institute of Microbiology, Warsaw University, ul. Miecznikowa 1, 02-093 Warsaw, and 3Institute of Biochemistry and Molecular Biology, University of Wroclaw, ul. Tamka 2, 50-137 Wroclaw, Poland

4 To whom correspondence should be addressed. E-mail: iamb@genesilico.pl


    Abstract
 
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein–DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with rare exceptions of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence–structure–function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the ‘FRankenstein's monster’ approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro as well as structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV–DNA complex suggests regions of the protein sequence that may interact with the ‘non-specific’ bases of the target and thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, nor its evolutionary relationships, nor its catalytic or DNA-binding residues were known. Hence, our analysis may be regarded as a paradigm for studies aiming at reducing ‘white spaces’ on the evolutionary landscape of sequence–function relationships by combining bioinformatics with simple experimental assays.

Keywords: extreme divergence/fold recognition/molecular evolution/restriction modification/structural model validation/structure prediction


    Introduction
 
Type II restriction endonucleases (REases) comprise one of the major classes of nucleases and one of the largest groups of experimentally characterized enzymes (review: Pingoud, 2004). They usually recognize a short (4–8 bp) palindromic sequence of double-stranded DNA and, in the presence of Mg2+, catalyze the hydrolysis of phosphodiester bonds at precise positions within or close to this sequence, leaving ‘blunt’ or ‘sticky’ (with a 5' or 3' overhang) ends. The enzymes that do not fit this definition or exhibit certain structural and functional peculiarities have been classified into several subtypes (review: Roberts et al., 2003). REases coupled with DNA methyltransferases (MTases) of identical specificity form restriction-modification (RM) systems, which are ubiquitous among Bacteria and Archaea (review: Wilson and Murray, 1991). Whereas cleavage at specific sequences provides an efficient means of destroying foreign DNA, methylation of these sequences inhibits the REase and thereby protects the host's own DNA from cleavage. Because the cleavage of the chromosomal DNA in unmodified sequences would be deleterious for the cell, the REases must maintain extremely high specificities, tightly coupled with that of the MTase. There is evidence that RM systems not only serve to protect the host genome from invasion of foreign DNA (for instance bacteriophages), but also act as ‘selfish’ elements and cause genomic rearrangements (review: Kobayashi, 2001).

Because of their high specificity of protein–DNA interactions, Type II REases are among the most frequently used enzymes in recombinant DNA technology and serve as model systems for analyzing protein–DNA interactions. Despite the fact that several thousand different enzymes have been isolated from various microorganisms, many specificities are still unavailable (Roberts et al., 2003). Therefore, a long-term goal has been to generate novel enzymes by randomly mutating or rationally engineering existing REases. However, it turned out that the specificity of the protein–DNA interactions of restriction enzymes is far more rigid than originally hoped for (Lanio et al., 2000). Most REases achieve a remarkable level of specificity owing to a redundancy of base-specific contacts formed by residues from several discontinuous sequence segments and coupling of recognition and catalysis. This implies that changing the specificity may require alteration of several of the contacts with the bases in addition to the phosphate backbone (Jeltsch et al., 1996). Although conversions between already known specificities have been achieved using directed evolution (Samuelson and Xu, 2002), rational engineering of REases with completely new specificities remains largely unsuccessful (Lanio et al., 2000) owing to our insufficient understanding of the temporal contexts of sequence–structure–function relationships in these proteins, on the time-scale of both the single reaction (recognition and catalysis) and the natural change of specificity in the course of evolution from one enzyme to another.

To date, crystal structures of 16 Type II REases have been solved, including enzymes comprising a single highly specific catalytic domain, in addition to those composed of a separate specific DNA-binding module and a non-specific cleavage domain (review: Pingoud, 2004). The specific and non-specific nuclease domains of REases exhibit a common core and a weakly conserved catalytic motif (P)D–Xn–(D/E)–X–K, suggesting that they are evolutionarily related. The availability of highly specific and non-specific members of the ‘PD–(D/E)XK superfamily’ suggests that comparative analysis of structures and sequences could provide insight into the evolution of specificity. Indeed, the phylogenetic history of restriction enzymes has been inferred from comparison of their crystal structures, suggesting that radiation of the evolutionary lineage leading to the non-specific nuclease domain of FokI is more ancient than radiation of highly specific enzymes EcoRI and BamHI (Bujnicki, 2000, 2001). However, attempts to infer general sequence–structure–specificity relationships from crystallographically characterized REases have been hampered by their extreme divergence. From the analyses of REases with known structures, it was found that the overall structural similarities are strongest among enzymes that share a similar cleavage pattern (Aggarwal, 1995). However, even enzymes that recognize related targets may use different mechanisms to recognize the same bases or employ a different number of metal ions to cleave the phosphodiester bond (reviews: Kovall and Matthews, 1998; Lukacs and Aggarwal, 2001). Nonetheless, even for REases with different modes of protein–DNA interactions, ‘intermediates’ can be found that illuminate the pathways of structural and evolutionary divergence (Townson et al., 2004). It seems that the current set of REase structures is not yet sufficient to reconstruct ‘evolutionary pathways’ between enzymes of different specificities and therefore new structures are in great demand.

It is obvious that despite the tremendous progress in protein structure determination, for most proteins the structure will never be solved experimentally. In the absence of experimentally determined protein structures, homology-based structure predictions may serve as working models for the investigation of sequence–structure–function relationships and especially for the identification of the molecular basis of functional differences between diverged enzymes (Sanchez et al., 2000). Even though homology-modeled structures may be of too low resolution to characterize the atomic details of protein–DNA contacts in REases, they can suggest which amino acids are likely to be important for the interactions with the DNA and which regions are responsible for differences in the substrate specificity. However, homology (comparative) modeling of REases is far from trivial, because sequences of Type II REases usually show no significant similarity to each other or to any other proteins. The high degree of sequence similarity required for automated homology modeling with standard tools for sequence alignment and structure prediction has been reported only for several subfamilies of isoschizomers, i.e. REases with identical recognition and cleavage specificity, which are of little use for consideration of specificity evolution. Moreover, it was found that REases may belong not only to the PD–(D/E)XK superfamily, but also to completely unrelated superfamilies of nucleases: Nuc/phospholipase D, HNH and GIY–YIG (Aravind et al., 2000; Sapranauskas et al., 2000; Bujnicki et al., 2001). The lack of overall sequence conservation among REases, the absence of invariable residues even in the active site and the presence of several alternative folds make the identification of suitable templates for modeling and calculation of biologically relevant sequence alignments a challenging task.

NlaIV is a classical Type II REase isolated from Neisseria lactamica, which recognizes the interrupted palindromic sequence GGNNCC (Lau et al., 1994) and cleaves it in the middle between the two unspecified bases, leaving blunt ends. REases with partially non-specific target sites, such as NlaIV, are interesting objects for evolutionary considerations and attractive targets for rational engineering of new specificities by introducing specific contacts to the unspecified bases. However, a structural model of NlaIV, which could help identify regions of the polypeptide chain in proximity to the specified and unspecified bases of the GGNNCC site, is not yet available. Recently, we used a protein fold recognition (FR) approach to successfully identify evolutionary relationships and predict the structures of several enzymes from the PD–(D/E)XK superfamily in spite of a complete lack of sequence similarity to proteins with experimentally solved structures and even without the conservation of the catalytic sites (Bujnicki and Rychlewski, 2000, 2001; Chmiel et al., 2005; Feder and Bujnicki, 2005; Pawlak et al., 2005). Thus, we carried out the fold recognition analysis to identify an appropriate modeling template for NlaIV among known protein structures. The resulting tertiary model of NlaIV in complex with the DNA was validated by mutagenesis of putative catalytic and DNA-binding residues. Our analysis can be regarded as a case study aiming at reducing ‘white spaces’ on the landscape of a protein superfamily by combining bioinformatics with simple experimental assays.
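The recognition and cleavage rule described above (GGNNCC, cut in the middle, blunt ends) can be illustrated with a short Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; the sketch treats the DNA as a linear, unmethylated top-strand string and is not part of the original analysis.

```python
import re

# Site from the text: GGNNCC, where N is any base.
# A lookahead is used so that overlapping sites are also reported.
SITE_REGEX = re.compile(r"(?=(GG[ACGT]{2}CC))")
CUT_OFFSET = 3  # cut between the two N positions, i.e. after the 3rd base


def find_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every GGNNCC site in seq."""
    return [m.start() for m in SITE_REGEX.finditer(seq.upper())]


def digest(seq: str) -> list[str]:
    """Cut a linear sequence at every site.

    Both strands are cut at the same position, so each fragment is
    blunt-ended and can be represented by its top strand alone.
    """
    cuts = [start + CUT_OFFSET for start in find_sites(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    example = "AAGGTACCTT"  # illustrative sequence with one GGTACC site
    print(find_sites(example))  # [2]
    print(digest(example))      # ['AAGGT', 'ACCTT']
```

Because the site is palindromic at its specified positions, scanning only the top strand is sufficient in this sketch.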


    Materials and methods
 
Protein sequence analysis

Sequence searches of the non-redundant (nr) database, the database of putative translations from finished and unfinished microbial genomes (Wheeler et al., 2004) and the environmental genome shotgun sequences collected from the Sargasso Sea (Venter et al., 2004) were carried out at the NCBI (http://www.ncbi.nlm.nih.gov) using PSI-BLAST (Altschul et al., 1997). Multiple sequence alignment was carried out using CLUSTALX (Thompson et al., 1997). Prediction of the secondary structure and solvent accessibility and tertiary fold recognition was carried out via the GeneSilico meta-server gateway at http://genesilico.pl/meta/ (Kurowski and Bujnicki, 2003), using the multiple sequence alignment as a query. Secondary structure prediction was carried out using PSIPRED (McGuffin et al., 2000), PROFsec (Rost et al., 2004), PROF (Ouali and King, 2000), SABLE (Adamczak et al., 2004), JNET (Cuff and Barton, 2000), JUFO (Meiler and Baker, 2003) and SAM-T02 (Karplus et al., 2003). Solvent accessibility for the individual residues was predicted with SABLE (Adamczak et al., 2004) and JPRED (Cuff et al., 1998). The fold recognition analysis (attempt to match the query sequence with known protein structures) was carried out using FFAS03 (Jaroszewski et al., 2000), SAM-T02 (Karplus et al., 2003), 3DPSSM (Kelley et al., 2000), BIOINBGU (Fischer, 2000), FUGUE (Shi et al., 2001), mGENTHREADER (Jones, 1999) and SPARKS (Zhou and Zhou, 2004). Fold recognition alignments reported by these methods were compared, evaluated and ranked by the Pcons server (Lundstrom et al., 2001).

Comparative modeling

The alignments between the sequence of NlaIV and the structures of selected templates (members of the fold identified by Pcons) were used as a starting point for modeling of the NlaIV tertiary structure using the ‘FRankenstein's monster’ approach (Kosinski et al., 2003). The intermediate models based on unrefined FR alignments were built with MODELLER (Fiser and Sali, 2003). Evaluation of the sequence–structure fit was carried out by VERIFY3D (Luthy et al., 1992) via the COLORADO3D server (Sasin and Bujnicki, 2004). A hybrid model was built from fragments conserved in >50% of intermediate models and the non-consensus regions were built from fragments with the highest VERIFY3D scores. The hybrid model (i.e. the ‘FRankenstein's monster’) exhibited numerous stereochemical problems such as steric clashes or breaks in the polypeptide chain at the junctions of fragments. Therefore, it was not directly refined, but instead it was superimposed onto the template structure, yielding a new target–template sequence alignment, which was only then used to generate a new model that satisfied criteria of stereochemical ‘protein likeness’ implemented in MODELLER. The sequence–structure fit in the new model was evaluated again with VERIFY3D and elements of secondary structure with low scores were selected for further refinement. For each poorly scored region, a set of new alignments was generated by progressively shifting the target sequence by one amino acid in the direction of either terminus, within the region of overlap between the secondary structure elements found in the template structure and those predicted for NlaIV. All alignments were used to generate a new family of intermediate models, which were again evaluated and recombined to produce a hybrid model. Model building, evaluation, realignment in poorly scored regions and merging of best scoring fragments were reiterated until all regions in the protein core obtained an acceptable VERIFY3D score (>0.3) or their score could not be improved by any manipulations. Apart from the standard optimization carried out by MODELLER to minimize the violation of distance restraints taken from the template structure and knowledge-based stereochemical constraints, no energy optimization was carried out, nor was any effort undertaken to refine the conformation of the loops. First, in our experience energy minimization cannot improve the quality of homology models based on remote templates in terms of the distance to the true structure. Second, in PD–(D/E)XK enzymes nearly all residues important for catalysis and DNA binding are located in strands and helices, whose native-like orientation can be correctly identified by the crude knowledge-based potentials (e.g. as implemented in VERIFY3D) that assess interactions mostly at the level of amino acids, regardless of the quality of atomic details (review: Bujnicki, 2003). However, we made sure that the bond lengths, bond angles and dihedrals in the model conform to the quality expected for refined structures, by applying a series of tests using the WHAT_CHECK program (Hooft et al., 1996).
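The fragment-selection step of the hybrid-model construction can be sketched in Python as follows. This is only a schematic illustration under stated assumptions, not the published implementation: the `Fragment` class, the `conformation` labels (standing for "the same local alignment in different intermediate models") and the tie-breaking rule inside a consensus region are all hypothetical, and the scores stand in for per-fragment VERIFY3D averages.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    model_id: str      # intermediate model the fragment was taken from
    conformation: str  # hypothetical label: same label = same local alignment
    score: float       # stand-in for the fragment's mean VERIFY3D score


def pick_fragment(candidates: list[Fragment]) -> Fragment:
    """Choose the fragment used for one region of the hybrid model.

    If one conformation occurs in more than 50% of the intermediate
    models, it is treated as the consensus; otherwise the fragment with
    the highest score is taken.
    """
    counts = Counter(f.conformation for f in candidates)
    label, n = counts.most_common(1)[0]
    if n > len(candidates) / 2:
        # Consensus region. Assumption: among equivalent fragments,
        # keep the best-scoring representative.
        pool = [f for f in candidates if f.conformation == label]
    else:
        # Non-consensus region: best score overall.
        pool = candidates
    return max(pool, key=lambda f: f.score)


def needs_refinement(score: float, threshold: float = 0.3) -> bool:
    """Regions scoring at or below the threshold are realigned again."""
    return score <= threshold
```

In an iterative loop, regions for which `needs_refinement` returns `True` would be realigned by one-residue shifts and rebuilt, as described in the text, until no further improvement is obtained.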

Cloning and mutagenesis of the nlaIVR gene

The restriction-modification system NlaIV was amplified from the N.lactamica ATCC 23970 genome in a polymerase chain reaction (PCR) and cloned into the pBR322 plasmid, resulting in a pNlaIVMRBR construct. Site-directed mutagenesis of the nlaIVR gene was performed by a PCR-based technique according to the QuikChange site-directed mutagenesis strategy (Stratagene) following the manufacturer's instructions. The mutant genes were sequenced and found to contain only the desired mutations. Plasmid clones of mutant and wild-type (wt) genes of NlaIV were used for in vivo tests in Escherichia coli XL1 Blue MRF' Δ(mcrA)183, Δ(mcrBC-hsdSMR-mrr)173, endA1, supE44, thi-1, recA1, gyrA96, relA1, lac, [F' proAB, lacIqZΔM15, Tn10 (tetr)] (Stratagene). Phage λvir was used to measure restriction of infecting bacteriophages by the cells harboring the wt NlaIV and mutant proteins. Expression plasmids (pNlaRET28) for protein overproduction and purification were obtained by amplifying the pNlaIVMRBR plasmid insert (wt or mutated) with oligonucleotides NlaIVRf: ATCCATGGTAAAACTTACTGCACAAC and NlaIVRr: TTACTCGAGTTTCTTTTTGTATTTATGTGC in a PCR and cloning the products as NcoI–XhoI fragments into the pET28a vector (Novagen), leading to a C-terminal fusion of a His6 tag to the recombinant NlaIVR. Strain E.coli DH10B F– mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 recA1 endA1 araD139 Δ(ara, leu)7697 galU galK λ– rpsL nupG (Invitrogen) was used as a recipient. A copy of the nlaIVM gene was provided on a plasmid pM.NlaIVAC constructed in a compatible vector pACYC184 (CmR), in order to protect the host DNA against the REase activity of the cloned nlaIVR gene.

Protein expression and purification

Proteins were expressed from the pNlaRET28 plasmid in E.coli strain ER2566 F– fhuA2 [lon] ompT lacZ::T7 gene1 gal sulA11 Δ(mcrC–mrr)114::IS10 R(mcr–73::miniTn10-TetS)2 R(zgb-210::Tn10)(TetS) endA1 [dcm] (New England Biolabs) containing the pM.NlaIVAC plasmid, after induction with 1 mM IPTG for 3–5 h at 37°C. Cell pellets were freeze–thawed and lysed in buffer A (20 mM Tris–HCl pH 8.0, 0.3 M NaCl, 10 mM β-mercaptoethanol, 10 mM imidazole, 1 mM PMSF, 10% glycerol) by a single passage through a French press at 20 000 psi. REase from the clarified lysates was batch bound to His-Bind resin (Novagen) for 1 h at 4°C. After two short washes with 10 vol. of binding buffer, the resin was applied to an empty disposable column and washed sequentially with buffer A with 2 M NaCl and then with buffer A with 20, 60 and 100 mM imidazole. Purified enzyme was eluted with buffer A containing 250 mM imidazole and was used in the in vitro tests.

For CD analysis, proteins eluted from the His-Bind resin were dialysed against buffer B (25 mM sodium phosphate buffer pH 7.6, 0.3 M NaCl, 10% glycerol, 0.25 mM DTT) and further purified on a 16/60 Superdex 75 PG gel filtration column (Amersham Pharmacia Biotech) equilibrated with buffer B. Fractions containing the dimeric form of the enzyme were pooled and concentrated with VivaSpin 6 concentrators (VivaScience). Purified proteins were aliquoted and frozen at –70°C. Protein concentration was measured spectrophotometrically at 280 nm (extinction coefficient 57 040 M–1 cm–1, calculated according to Pace et al., 1995) or by densitometry of Coomassie Brilliant Blue-stained SDS–PAGE gels.
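The spectrophotometric concentration measurement rests on the Beer–Lambert law, A = ε·c·l. A minimal Python sketch is given below; the function name and the default 1 cm pathlength are assumptions made for illustration, and the text does not state whether the quoted coefficient refers to the monomer or the dimer, so the result is simply expressed per whatever species ε was calculated for.

```python
EPSILON_280 = 57_040.0  # M^-1 cm^-1, the extinction coefficient quoted in the text


def molar_concentration(a280: float, path_cm: float = 1.0,
                        epsilon: float = EPSILON_280) -> float:
    """Solve the Beer-Lambert law A = epsilon * c * l for c (mol/l).

    path_cm defaults to 1 cm, which is an assumption; the cuvette used
    for the absorbance measurement is not specified in the text.
    """
    if a280 < 0 or path_cm <= 0 or epsilon <= 0:
        raise ValueError("absorbance must be >= 0; pathlength and epsilon must be > 0")
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical reading, not a measured value from the study.
    c = molar_concentration(0.5704)
    print(f"{c * 1e6:.1f} uM")
```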

Analysis of the in vivo REase activity of wt NlaIV and its mutants

The extent of phage restriction was determined quantitatively by plating portions of serially diluted phage stock on a lawn of bacteria (Sambrook et al., 2002). The strains were grown on LB plates containing 50 µg/ml ampicillin and 20 µg/ml chloramphenicol. The ability to restrict λvir was assessed by measuring the titer (from at least three independent measurements) of λvir phage on E.coli strain XL1 Blue MRF' expressing wt and mutant NlaIV genes in the presence of NlaIV MTase and comparing it with the titer of phage on the strain expressing only the MTase gene and containing the pBR322 vector without the REase gene.
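The titer comparison can be sketched as follows. The functions below implement the usual plaque-assay arithmetic (plaque-forming units per ml of undiluted stock, and the ratio of titers on the test and control strains, often called the efficiency of plating). They are an illustration under that common convention; the text does not specify how the reported restriction levels were normalized, and all names and numbers in the example are hypothetical.

```python
from statistics import mean


def titer_pfu_per_ml(plaques: int, dilution: float, volume_ml: float) -> float:
    """Plaque-forming units per ml of the undiluted phage stock.

    dilution is the fraction of stock in the plated sample (e.g. 1e-6),
    volume_ml is the volume of diluted phage that was plated.
    """
    if plaques < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_ml)


def efficiency_of_plating(test_titers: list[float],
                          control_titers: list[float]) -> float:
    """Mean titer on the REase-expressing strain divided by the mean
    titer on the control strain (MTase only). Lower values indicate
    stronger restriction."""
    return mean(test_titers) / mean(control_titers)


if __name__ == "__main__":
    # Hypothetical plaque counts from three independent platings.
    control = [titer_pfu_per_ml(n, 1e-6, 0.1) for n in (48, 52, 50)]
    test = [titer_pfu_per_ml(n, 1e-4, 0.1) for n in (45, 55, 50)]
    print(efficiency_of_plating(test, control))
```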

Analysis of the in vitro REase activity of wt NlaIV and mutants E87A, S176A and K179A

The in vitro cleavage assays were set up in 10 µl of buffer Y+ (MBI Fermentas) [33 mM Tris–OAc (pH 7.9), 66 mM KOAc, 10 mM Mg(OAc)2, 0.1 mg/ml BSA] with 0.3 µg of λ DNA (dam–, dcm–) (MBI Fermentas) as a substrate and serial dilutions of purified REase. Digestion was performed for 1 h at 37°C. The 611 bp PCR-generated fragment of the pBluescript II KS(+) (Stratagene) plasmid with a single NlaIV site at position 247 was amplified with primers MbspNlaF: TGCTCTTGCCCGGC and MbspNlaR: GCCATCGCCCTGA and used as a substrate to compare digestion rates of the wt NlaIV and the partially active mutants S176A and K179A. Activity assays were carried out in duplicate with a substrate concentration of 25 ng/µl and with several enzyme concentrations, ranging from 1 to 4 ng/µl. The activity of each enzyme was measured at at least two concentrations. For all concentrations there was a linear relationship between the reaction rate and the enzyme concentration.
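The expected product sizes for the single-site substrate follow directly from the position of the site and the blunt cut in its middle. The sketch below assumes that "position 247" is the 1-based coordinate of the first base of the GGNNCC site within the 611 bp fragment; the function name is hypothetical. Under that assumption the calculation reproduces the 249 and 362 bp products given in the legend to Figure 6.

```python
def blunt_cut_products(substrate_len: int, site_start: int,
                       site_len: int = 6) -> tuple[int, int]:
    """Sizes (bp) of the two products of a single blunt cut.

    site_start is the 1-based position of the first base of the site
    (an assumption about the numbering used in the text); the cut is
    placed in the middle of the site.
    """
    cut_after = site_start - 1 + site_len // 2  # bases to the left of the cut
    if not 0 < cut_after < substrate_len:
        raise ValueError("cut position lies outside the substrate")
    return cut_after, substrate_len - cut_after


if __name__ == "__main__":
    print(blunt_cut_products(611, 247))  # (249, 362)
```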

Circular dichroism (CD)

Far-UV (from 200 to 260 nm) CD spectra were recorded using a Jasco J-710 spectropolarimeter with a temperature controller in 1 cm pathlength cells at 20°C. The concentration of enzyme dimer was 4 µM. Spectra were averaged from three separate scans with a response time of 1.0 s and with 1.0 nm steps. The mean residue ellipticities for the spectra were calculated and secondary structure content was estimated using the CDPro server http://lamar.colostate.edu/~sreeram/CDPro/main.html (Sreerama and Woody, 2000).
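The conversion from measured ellipticity to mean residue ellipticity can be written as [θ]MRW = θ / (10 · c · l · n), with θ in millidegrees, c in mol/l, l in cm and n the number of residues in the species whose concentration is c. A minimal Python sketch follows; the function name is hypothetical, the residue count used in the example is a placeholder rather than the actual length of NlaIV, and some conventions use the number of peptide bonds (n – 1) instead of n.

```python
def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    If conc_molar is the dimer concentration (as in the text, 4 uM),
    n_residues must be the number of residues in the dimer.
    """
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, pathlength and residue count must be > 0")
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


if __name__ == "__main__":
    # 4 uM dimer and 1 cm cell are taken from the text; the signal and
    # the residue count (500 per dimer) are hypothetical placeholders.
    print(mean_residue_ellipticity(-20.0, 4e-6, 1.0, 500))
```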


    Results
 
In PSI-BLAST searches (see Materials and methods), the sequence of NlaIV showed no significant sequence similarity to any REase of known structure. Among REases with publicly available sequences, the only easily identifiable homologs of NlaIV were its isoschizomers from other Neisseria species, which share over 95% amino acid identity (data not shown) and are therefore useless for identification of conserved residues that may constitute a potential active site. No other protein from the non-redundant database or from the unfinished genomic projects showed any similarity to the NlaIV sequence. However, a search of putative translations of sequences from the environmental genome shotgun sequencing of the Sargasso Sea revealed a large fragment of an uncharacterized open reading frame (protein_id: EAG328571) with significant sequence similarities to NlaIV. It is known that secondary structure prediction and identification of a three-dimensional fold can be significantly improved if a sequence alignment is used instead of a single sequence (Jaroszewski et al., 2000). Therefore, we carried out the protein structure prediction analysis via the GeneSilico metaserver (Kurowski and Bujnicki, 2003), using both the sequence of NlaIV alone and aligned with its uncharacterized homolog EAG328571.

The fold recognition analysis carried out with the NlaIV sequence alone revealed only traces of similarity to known protein structures. INBGU reported a match to the structure of EcoRV, a Type II REase that belongs to the PD–(D/E)XK superfamily of nucleases (Winkler et al., 1993), with a moderate score of 17.44. However, when the sequence alignment was used, most FR algorithms reported EcoRV as the best hit, even though the scores were not always very high: SPARKS (poor score, –1.68), INBGU (good score, 30.66), mGENTHREADER (poor score, 0.389), 3DPSSM (good score, 0.013). FUGUE reported the structure of HincII, another PD–(D/E)XK REase (Horton et al., 2002), with the highest, albeit insignificant, score (2.8), while the EcoRV structure came in second position (score 2.62). Based on the FR results, the consensus server PCONS (Lundstrom et al., 2001) singled out the EcoRV structure as the only reasonable template for modeling of the NlaIV structure (score 2.11 for EcoRV compared with 0.86 for HincII and ≤0.66 for other proteins with different folds).
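The agreement among the servers can be summarized by a simple vote count over the top hits listed above. The sketch below is a deliberately naive stand-in: it is not a reimplementation of Pcons, which uses its own scoring scheme, and the raw scores are not compared because each server reports them on a different scale. The dictionary contains only the top hits stated in the text; the function name is hypothetical.

```python
from collections import Counter

# Top hit reported by each server for the alignment query, as listed
# in the text. Servers whose results are not reported are omitted.
TOP_HITS = {
    "SPARKS": "EcoRV",
    "INBGU": "EcoRV",
    "mGENTHREADER": "EcoRV",
    "3DPSSM": "EcoRV",
    "FUGUE": "HincII",
}


def majority_template(hits: dict[str, str]) -> tuple[str, int]:
    """Return the most frequently reported template and its vote count."""
    counts = Counter(hits.values())
    return counts.most_common(1)[0]


if __name__ == "__main__":
    print(majority_template(TOP_HITS))  # ('EcoRV', 4)
```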

Comparative modeling of NlaIV

A comparative model of NlaIV was constructed based on the alignments between NlaIV and EcoRV reported by FR methods, using the ‘FRankenstein's monster’ approach (Kosinski et al., 2003) (see Materials and methods for details). Both monomers of EcoRV were used as templates (i.e. the restraints derived from them were averaged). The final, refined alignment between NlaIV (and its uncharacterized homolog) and EcoRV is shown in Figure 1. The corresponding final model of NlaIV was within 1.7 Å over 223 superimposable residues of EcoRV (0.82 Å over 208 core residues) and passed the checks of bond lengths, bond angles, dihedrals and chi-1/chi-2 angles implemented in WHAT_CHECK (Hooft et al., 1996) (Z-score for bond lengths, 1.135; Z-score for bond angles, 1.451; Ramachandran Z-score, 0.595; chi-1/chi-2 correlation Z-score, 0.400; all within the expected ranges for well-refined structures).



 
Fig. 1. Sequence alignment of NlaIV, its homolog from the environmental genome sequencing project and EcoRV. Residues conserved in at least two proteins are shown on a black background and residues with similar physico-chemical character are shown on a gray background. NlaIV and EcoRV show only 13.5% sequence identity. Amino acid residues which were subjected to a mutational analysis in NlaIV and found to be essential or at least important are indicated above the alignment by ‘#’ and ‘*’ (presumably involved in catalysis and DNA binding, respectively), and the two unimportant residues are indicated by ‘–’. Secondary structure (determined experimentally for EcoRV) is shown below the alignment.

 
EcoRV is the archetypal member of the β-class of PD–(D/E)XK nucleases, whose most characteristic features that distinguish them from the α-class are antiparallel orientation of the fifth strand of the common β-sheet and recognition of the DNA by an additional β-sheet formed by extended loops between the common secondary structure elements (Huai et al., 2000; Bujnicki, 2001). Most β-class PD–(D/E)XK REases (including EcoRV) exhibit a similar mode of dimerization, which results in positioning of the two active sites so as to cut the pair of the opposite phosphodiester bonds in the middle of the recognition sequence and thereby produce blunt ends (Bujnicki, 2000, 2004). Both NlaIV and EcoRV recognize hexanucleotide sequences (GGNNCC and GATATC, respectively) and cleave them in the middle, leaving blunt ends. This phenotypic similarity, together with the FR-based prediction that among all REase structures EcoRV is the closest homolog of NlaIV, suggests that NlaIV is a member of the β-class and that it interacts with the DNA in a similar manner to EcoRV. Therefore, we modeled the structure of the NlaIV dimer in complex with the DNA based on the available crystal structure of EcoRV bound to the cognate sequence (Winkler et al., 1993). Briefly, the model of the NlaIV monomer was duplicated and each of the copies was superimposed on the corresponding monomer in the EcoRV dimer. A few minor steric clashes between the side chains of residues at the protein–protein interface were removed by choosing alternative rotamers for the respective amino acids, but the protein backbone was not modified. The DNA duplex (sequence 5'AAGATATCTT3') was copied from the EcoRV co-crystal structure (PDB code 1rvb) (Winkler et al., 1993) and ‘mutated’ to 5'AAGGTACCTT3' using HyperChem 7.1 (Hypercube), followed by geometry optimization. The model of the NlaIV dimer is shown in Figure 2 and is available for download at ftp://genesilico.pl/iamb/models/R.NlaIV/.



 
Fig. 2. Homology model of the NlaIV dimer complexed with the target DNA (5'-GGTACC-3'). Protein monomers are shown as white and gray ribbons and DNA is shown in the ‘space-filled’ representation. The active sites of both monomers are indicated by the presence of metal ions (small spheres), whose coordinates were copied from the homologous structure of EcoRV (1rva).

 
Based on the homology model of NlaIV, we predicted that the active site has the spatial configuration typical of the architecture conserved among PD–(D/E)XK nucleases. We predict that the active site of NlaIV comprises residues D73, E87 and K89, which superimpose well on the metal-binding/catalytic residues of EcoRV, D74, D90 and K92, respectively (Figure 3A). On the other hand, the DNA-recognition mode of NlaIV appears to be different from that of EcoRV, which is not surprising, given the different substrate specificities of these enzymes. Although the model of NlaIV is of too low resolution for the details of protein–DNA interactions to be proposed, it suggested that the following residues make direct interactions with the bases or phosphodiester backbone of the target and therefore may be important for the REase activity: Q69, D100, D103, S176, K179 and K180 (Figure 3B).



 
Fig. 3. Functionally important residues of NlaIV. The protein is shown as a ribbon, with strands in dark and helices in light gray. The DNA is shown in black, with specifically recognized bases in gray. (A) The active site. Oxygen and nitrogen atoms in the catalytic side chains are colored in gray. The metal ions are shown as spheres. (B) The DNA-binding residues. Side chains shown to be important for the enzyme activity are shown and labeled in black and non-important side chains are shown and labeled in gray. A color version of this image is available as Supplementary data at PEDS Online.

 
Experimental validation of the model: site-directed mutagenesis of the putative active site and DNA-binding residues

In order to validate the model of NlaIV, a series of mutants was constructed with single alanine substitutions at all the above-mentioned positions, which truncate the corresponding side chains beyond the Cβ atom and thereby make them chemically inert. The importance of the selected residues for the activity of NlaIV was analyzed using the phage plating assay, which evaluates the in vivo function of REases, i.e. the ability to restrict phage growth (Figure 4). There was no detectable restriction in the case of mutation of residues predicted to be directly involved in DNA cleavage (D73A, E87A and K89A) or DNA binding (Q69A, D100A and D103A). Three mutants predicted to have impaired DNA binding, with alanine substitutions at positions S176, K179 and K180, showed significantly reduced restriction levels (11.9, 24.7 and 14.7%, respectively) compared with wt NlaIV (100%). Moreover, alanine substitutions of amino acids adjacent to the DNA-binding residues, but whose side chains were directed away from the DNA (N178 and N181; see Figure 3B), did not change the restriction level significantly compared with the wt enzyme, further supporting the model.



 
Fig. 4. In vivo λvir phage restriction activity of NlaIV variants with alanine substitutions of residues of presumptive importance for cleavage activity and DNA binding and residues believed to be unimportant for the protein function in comparison with the wt enzyme restriction level taken as 100%.

 
Analysis of the in vitro activity of purified wt NlaIV and selected mutants

To rule out the possibility that the decreased REase activity was due to poor expression or aggregation of the mutant enzymes rather than a genuine lack of nuclease activity, we purified the wt enzyme and three representative mutants (E87A, a substitution in the predicted active site, and S176A and K179A, substitutions of the predicted DNA-binding residues) and determined their ability to cleave different types of substrate in vitro. The results obtained with λ DNA confirmed the decreased cleavage activity of the S176A and K179A mutants and the complete lack of activity of the E87A mutant (Figure 5). Moreover, in vitro cleavage assays using a 611 bp PCR-generated substrate with a single NlaIV site allowed quantitative validation of the in vivo results (Figure 6B). The K179A and S176A mutants are about four and 10 times less active than the wt enzyme, respectively, in excellent correlation with the in vivo results.
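The correspondence between the two assays can be checked with simple arithmetic by converting the in vivo restriction levels (percentage of wt) into fold reductions and setting them beside the approximate in vitro values. The following is a minimal sketch and not part of the original analysis; the variable and function names are illustrative assumptions, and the numbers are those quoted in the text.

```python
# In vivo restriction levels (% of wt NlaIV), as quoted in the text.
in_vivo_percent = {"S176A": 11.9, "K179A": 24.7, "K180A": 14.7}

# Approximate in vitro fold reductions quoted in the text
# ("about four and 10 times less active"); K180A was not assayed in vitro.
in_vitro_fold = {"S176A": 10.0, "K179A": 4.0}


def fold_reduction(percent_of_wt: float) -> float:
    """Convert an activity given as % of wt into a fold reduction vs wt."""
    if percent_of_wt <= 0:
        raise ValueError("fold reduction is undefined for zero activity")
    return 100.0 / percent_of_wt


in_vivo_fold = {m: fold_reduction(p) for m, p in in_vivo_percent.items()}

for mutant, fold in in_vivo_fold.items():
    in_vitro = in_vitro_fold.get(mutant)
    in_vitro_text = f"~{in_vitro:.0f}x" if in_vitro is not None else "not assayed"
    print(f"{mutant}: in vivo {fold:.1f}x lower, in vitro {in_vitro_text}")
```

With these values the in vivo fold reductions are roughly 8-fold for S176A and 4-fold for K179A, i.e. the same rank order and similar magnitude as the in vitro estimates.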



Fig. 5. In vitro cleavage assay. Cleavage of λ DNA with varying concentrations of purified wt NlaIV and selected mutants. The concentrations of the enzyme (in ng/µl) are indicated above the lanes. The cleavage pattern of the purified enzyme was identical to that of BspLI, the commercially available isoschizomer of NlaIV. The E87A mutant has <1% of the activity of wt NlaIV.

 


Fig. 6. DNA cleavage rate of wt NlaIV and the S176A and K179A mutants. (A) Gel assay of DNA cleavage with 4 ng/µl of enzyme. Numbers above the lanes indicate time (in min). S, 611 bp PCR-generated DNA substrate; P, products (249 and 362 bp fragments). (B) Comparison of cleavage rates of the wt NlaIV and mutants (S176A and K179A) measured in gel assays.
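The product sizes quoted in the legend follow from a single blunt cut in the middle of the GGNNCC site (GGN^NCC). A minimal sketch of this bookkeeping is given below. The substrate sequence is a hypothetical placeholder built only to have the right length and one site at the right position (the actual PCR product sequence is not given here), and the function name is an illustrative assumption.

```python
import re

# GGNNCC with N = any base; a lookahead is used so overlapping sites are found.
SITE = re.compile(r"(?=GG[ACGT]{2}CC)")
CUT_OFFSET = 3  # blunt cut in the middle of the site: GGN^NCC


def blunt_fragments(seq: str) -> list[int]:
    """Return fragment lengths after blunt cleavage at every GGNNCC site.

    The site is palindromic, so scanning one strand is sufficient.
    """
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 611 bp substrate with a single site placed so that the
# cut falls after base 249 (filler bases are arbitrary and site-free).
substrate = "A" * 246 + "GGATCC" + "A" * 359

print(len(substrate), blunt_fragments(substrate))
```

For this placeholder substrate the function returns fragments of 249 and 362 bp, which sum to the 611 bp substrate as in the legend.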

 
Structural analysis of wt NlaIV and selected mutants by circular dichroism

To provide further support for the structural model of the wt NlaIV and to confirm that the inactivity of the alanine mutants is caused by the loss of functionally important side-chain groups and not by misfolding, we measured CD spectra. The far-UV CD spectra of the wt enzyme and two mutants (E87A and S176A) were virtually identical, suggesting that these amino acid substitutions do not cause any major structural perturbations (Figure 7). The spectrum of NlaIV is characteristic of α/β proteins. CONTINLL (with protein reference set 4) gave an estimate of the secondary structure content of 18.1% α-helix, 29.7% β-sheet and 52.2% turns and coils. Our model of NlaIV has a secondary structure content of 32% α-helix, 30% β-sheet and 38% turns and coils. Whereas the amount of β structure in the model is in very good agreement with the estimate based on the CD spectrum, the apparent excess of α-helix in the model may be due to the idealized geometry of the C-terminus of NlaIV. In the native structure of NlaIV, the C-terminus may be less regular than in the template structure of EcoRV and thus contribute to the spectrum as a ‘random coil’ rather than as an α-helix.
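The comparison between the CD-derived estimate and the model can be made explicit by taking the difference in each secondary structure class. The following is a minimal sketch using only the percentages quoted above; the dictionary names are illustrative assumptions.

```python
# Secondary structure content (%), as quoted in the text.
cd_estimate = {"helix": 18.1, "sheet": 29.7, "turns_and_coils": 52.2}
model = {"helix": 32.0, "sheet": 30.0, "turns_and_coils": 38.0}

# Positive values mean the model contains more of that class than the
# CD-based estimate; values are in percentage points.
difference = {k: round(model[k] - cd_estimate[k], 1) for k in cd_estimate}

for name, delta in difference.items():
    print(f"{name}: model minus CD estimate = {delta:+.1f} percentage points")
```

The β-sheet content differs by only 0.3 percentage points, whereas the roughly 14-point excess of helix in the model is mirrored by an almost equal deficit of turns and coils, which is consistent with the suggestion that a region modeled as helical is less regular in the native protein.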



Fig. 7. Far-UV CD spectra of the wt NlaIV and the E87A and S176A mutants. (A) Plots of mean residue molar ellipticity vs wavelength in the far-UV range for wt NlaIV and its mutants are nearly superimposable. (B) Far-UV CD spectrum of wt NlaIV (solid line) and the theoretical line of best fit for secondary structure components (dashed line).

 

    Discussion
 
EcoRV, used as the template structure for modeling of NlaIV, makes multiple contacts with the first 2 bp (GA) and the last 2 bp (TC) in its 6 bp recognition sequence, whereas the recognition of the central 2 bp (TA) occurs solely via hydrophobic interaction with thymine methyl groups (Winkler et al., 1993). Nonetheless, DNA molecules with single substitutions at the non-contacted central positions are as poor substrates as those with substitutions at the contacted bases (Taylor and Halford, 1989; Newman et al., 1990). EcoRV causes a major distortion to the structure of the target (sharp bending of the DNA and unstacking of the central TA step); therefore, it was suggested that the central bases are recognized by ‘indirect readout’, in the sense that only the GATATC sequence may be able to assume the conformation that allows the cleavage to occur (Martin et al., 1999). Since NlaIV binds and cleaves substrates with any bases at the central positions, it is tempting to speculate that it represents a less stringent relative of EcoRV, in which the indirect readout does not occur. Hence, the introduction of specific contacts at the NlaIV–DNA interface to recognize the central base pairs may be an alternative way of developing a stringent specificity.

The structural model of the NlaIV–DNA complex suggests regions of the protein sequence that may interact with the ‘non-specific’ bases of the target. In particular, K179 and K180, which were found to be important, but not essential, for the enzyme activity, are predicted to be located in the major groove, next to the two central base pairs. On the other hand, our model suggests that I36 and I37 are located on the opposite side of the target and that their substitution with longer, charged or polar side chains may introduce new contacts (direct or water-mediated) on the minor groove side. Concerted mutagenesis of these two regions may lead to the engineering of new enzyme variants that are able to recognize the central 2 bp step specifically and exhibit more stringent specificity than wt NlaIV. This exercise will require, however, the development of an efficient selection method for the isolation of mutants with novel specificities and is therefore beyond the scope of the present work.

Conclusions

Using protein fold recognition, we attempted to identify a remote relationship between NlaIV and proteins of known structure. As expected, fold recognition methods that rely only on sequence similarity (PDB-BLAST and FFAS) failed, but threading methods that explicitly use the structural information from the templates were able to identify a remote homology between NlaIV and EcoRV. Using the ‘FRankenstein's monster’ approach, we constructed a homology model of NlaIV based on the crystal structure of the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was tested by site-directed mutagenesis of these residues and analysis of the activity of the mutants in vivo. For two selected mutants, a detailed in vitro characterization was carried out to validate the in vivo test and rule out potential artifacts. The results of the in vitro and in vivo analyses are in excellent agreement, suggesting that in vivo analyses may be sufficient for the validation of similar predictions in the future. Finally, CD spectroscopy was used to confirm that the secondary structure content of NlaIV calculated from the CD spectrum agrees with that predicted by bioinformatics and that the structure of the mutants is very similar to that of the wt enzyme, thereby arguing that the detrimental effect of the amino acid substitutions studied in this work is localized to the catalytic center and the DNA-binding site.

There is no doubt that the understanding of the details of protein–DNA interactions can be achieved only if a series of high-resolution crystal structures with and without the substrate are obtained. It is unrealistic, however, for such analyses to be carried out for all REases, which are among the largest groups of enzymes ever characterized at the biochemical level (Roberts et al., 2003). Our study may be regarded as an example of reducing the ‘white spaces’ on the evolutionary landscape of sequence–function relationships in a large protein superfamily, using a combination of bioinformatics and simple experimental assays. It is noteworthy that before our analysis virtually nothing was known about the three-dimensional fold and the phylogenetic origin of NlaIV or the identity of its catalytic or DNA-binding residues. Our model of the NlaIV–DNA complex provides low-resolution structural information, which nonetheless seems to be sufficient to provide insight into the evolution of sequence specificity in a particular subgroup of REases and will guide further studies aimed at engineering enzymes with novel specificities.


    Acknowledgements
 
We are grateful to Jacek Otlewski for discussions and help with the CD analysis. This analysis was funded by MNII (grant 3P04A01124 to J.M.B.). A.A.C. was supported by a mini-grant from HHMI for collaboration between the groups of J.O. and J.M.B. M.R. was supported by the Faculty of Biology, Warsaw University, Grant BW 1485/16. J.M.B. is an EMBO/HHMI Young Investigator and a fellow of the Foundation for Polish Science.


    References
 
Adamczak,R., Porollo,A. and Meller,J. (2004) Proteins, 56, 753–767.

Aggarwal,A.K. (1995) Curr. Opin. Struct. Biol., 5, 11–19.

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Aravind,L., Makarova,K.S. and Koonin,E.V. (2000) Nucleic Acids Res., 28, 3417–3432.

Bujnicki,J.M. (2000) J. Mol. Evol., 50, 39–44.

Bujnicki,J.M. (2001) Acta Biochim. Pol., 48, 935–967.

Bujnicki,J.M. (2003) Curr. Protein Pept. Sci., 4, 327–337.

Bujnicki,J.M. (2004) In Pingoud,A. (ed.), Restriction Endonucleases. Springer, Berlin, pp. 63–87.

Bujnicki,J.M. and Rychlewski,L. (2000) FEBS Lett., 486, 328–329.

Bujnicki,J.M. and Rychlewski,L. (2001) J. Mol. Microbiol. Biotechnol., 3, 69–72.

Bujnicki,J.M., Radlinska,M. and Rychlewski,L. (2001) Trends Biochem. Sci., 26, 9–11.

Chmiel,A.A., Bujnicki,J.M. and Skowronek,K.J. (2005) BMC Struct. Biol., 5, 2.

Cuff,J.A. and Barton,G.J. (2000) Proteins, 40, 502–511.

Cuff,J.A., Clamp,M.E., Siddiqui,A.S., Finlay,M. and Barton,G.J. (1998) Bioinformatics, 14, 892–893.

Feder,M. and Bujnicki,J.M. (2005) BMC Genomics, 6, 21.

Fischer,D. (2000) Pacific Symp. Biocomp., 119–130.

Fiser,A. and Sali,A. (2003) Methods Enzymol., 374, 461–491.

Hooft,R.W., Vriend,G., Sander,C. and Abola,E.E. (1996) Nature, 381, 272.

Horton,N.C., Dorner,L.F. and Perona,J.J. (2002) Nat. Struct. Biol., 9, 42–47.

Huai,Q., Colandene,J.D., Chen,Y., Luo,F., Zhao,Y., Topal,M.D. and Ke,H. (2000) EMBO J., 19, 3110–3118.

Jaroszewski,L., Rychlewski,L. and Godzik,A. (2000) Protein Sci., 9, 1487–1496.

Jeltsch,A., Wenz,C., Wende,W., Selent,U. and Pingoud,A. (1996) Trends Biotechnol., 14, 235–238.

Jones,D.T. (1999) J. Mol. Biol., 287, 797–815.

Karplus,K., Karchin,R., Draper,J., Casper,J., Mandel-Gutfreund,Y., Diekhans,M. and Hughey,R. (2003) Proteins, 53, Suppl. 6, 491–496.

Kelley,L.A., MacCallum,R.M. and Sternberg,M.J. (2000) J. Mol. Biol., 299, 499–520.

Kobayashi,I. (2001) Nucleic Acids Res., 29, 3742–3756.

Kosinski,J., Cymerman,I.A., Feder,M., Kurowski,M.A., Sasin,J.M. and Bujnicki,J.M. (2003) Proteins, 53, Suppl. 6, 369–379.

Kovall,R.A. and Matthews,B.W. (1998) Proc. Natl Acad. Sci. USA, 95, 7893–7897.

Kurowski,M.A. and Bujnicki,J.M. (2003) Nucleic Acids Res., 31, 3305–3307.

Lanio,T., Jeltsch,A. and Pingoud,A. (2000) Protein Eng., 13, 275–281.

Lau,P.C., Forghani,F., Labbe,D., Bergeron,H., Brousseau,R. and Holtke,H.J. (1994) Mol. Gen. Genet., 243, 24–31; erratum, Mol. Gen. Genet., 244, 167.

Lukacs,C.M. and Aggarwal,A.K. (2001) Curr. Opin. Struct. Biol., 11, 14–18.

Lundstrom,J., Rychlewski,L., Bujnicki,J.M. and Elofsson,A. (2001) Protein Sci., 10, 2354–2362.

Luthy,R., Bowie,J.U. and Eisenberg,D. (1992) Nature, 356, 83–85.

Martin,A.M., Sam,M.D., Reich,N.O. and Perona,J.J. (1999) Nat. Struct. Biol., 6, 269–277.

McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) Bioinformatics, 16, 404–405.

Meiler,J. and Baker,D. (2003) Proc. Natl Acad. Sci. USA, 100, 12105–12110.

Newman,P.C., Williams,D.M., Cosstick,R., Seela,F. and Connolly,B.A. (1990) Biochemistry, 29, 9902–9910.

Ouali,M. and King,R.D. (2000) Protein Sci., 9, 1162–1176.

Pace,C.N., Vajdos,F., Fee,L., Grimsley,G. and Gray,T. (1995) Protein Sci., 4, 2411–2423.

Pawlak,S.D., Radlinska,M., Chmiel,A.A., Bujnicki,J.M. and Skowronek,K.J. (2005) Nucleic Acids Res., 33, 661–671.

Pingoud,A.M. (ed.) (2004) Restriction Endonucleases. Springer, Berlin.

Roberts,R.J., Vincze,T., Posfai,J. and Macelis,D. (2003) Nucleic Acids Res., 31, 418–420.

Rost,B., Yachdav,G. and Liu,J. (2004) Nucleic Acids Res., 32, W321–326.

Sambrook,J., Russell,D.W. and Sambrook,J. (eds) (2002) Molecular Cloning. A Laboratory Manual, 3rd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Samuelson,J.C. and Xu,S.Y. (2002) J. Mol. Biol., 319, 673–683.

Sanchez,R., Pieper,U., Melo,F., Eswar,N., Marti-Renom,M.A., Madhusudhan,M.S., Mirkovic,N. and Sali,A. (2000) Nat. Struct. Biol., 7, Suppl., 986–990.

Sapranauskas,R., Sasnauskas,G., Lagunavicius,A., Vilkaitis,G., Lubys,A. and Siksnys,V. (2000) J. Biol. Chem., 275, 30878–30885.

Sasin,J.M. and Bujnicki,J.M. (2004) Nucleic Acids Res., 32, W586–589.

Shi,J., Blundell,T.L. and Mizuguchi,K. (2001) J. Mol. Biol., 310, 243–257.

Sreerama,N. and Woody,R.W. (2000) Anal. Biochem., 287, 252–260.

Taylor,J.D. and Halford,S.E. (1989) Biochemistry, 28, 6198–6207.

Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997) Nucleic Acids Res., 25, 4876–4882.

Townson,S.A., Samuelson,J.C., Vanamee,E.S., Edwards,T.A., Escalante,C.R., Xu,S.Y. and Aggarwal,A.K. (2004) J. Mol. Biol., 338, 725–733.

Venter,J.C. et al. (2004) Science, 304, 66–74.

Wheeler,D.L. et al. (2004) Nucleic Acids Res., 32, Database issue, D35–40.

Wilson,G.G. and Murray,N.E. (1991) Annu. Rev. Genet., 25, 585–627.

Winkler,F.K., Banner,D.W., Oefner,C., Tsernoglou,D., Brown,R.S., Heathman,S.P., Bryan,R.K., Martin,P.D., Petratos,K. and Wilson,K.S. (1993) EMBO J., 12, 1781–1795.

Zhou,H. and Zhou,Y. (2004) Proteins, 55, 1005–1013.

Received December 8, 2004; revised March 5, 2005; accepted March 9, 2005.

Edited by Lynne Regan