Homology modeling of the central catalytic domain of insertion sequence ISLC3 isolated from Lactobacillus casei ATCC 393

Thy-Hou Lin1, Keng-Chang Tsai and Ta-Chun Lo

Department of Life Science, National Tsing Hua University, Hsinchu 30043, Taiwan

1 To whom correspondence should be addressed. e-mail: thlin{at}life.nthu.edu.tw


    Abstract
 
The tertiary structure of the central catalytic domain of insertion sequence ISLC3 isolated from Lactobacillus casei ATCC 393 was predicted using the homology modeling approach. The novel insertion sequence was isolated by us from the temperate bacteriophage φA3 of L.casei ATCC 393. The ISLC3 central catalytic domain comprises 116 amino acid residues and was treated as the query sequence. Five Web-available threading methods were used to find primary structure templates for the query sequence. These primary templates were further screened using the SWISS-MODEL Protein Modeling Server with its default parameter settings to give six final structure templates. All of these final structure templates were integrase (IN) proteins of retroviruses. Multiple sequence alignment of these IN sequences against the query revealed the signature DDE motif. Based on the structures of these final templates, the structure of the query sequence was constructed using the InsightII/Discover/Homology programs. A metal ion, Mg2+, was inserted into the center of the putative catalytic pocket formed by the DDE residues of the predicted structure in the final rounds of refinement by molecular dynamics (MD) simulations. The structure with a metal ion included was designated withMg and that without a metal ion was designated freeMg. The average exposed surface area of some hydrophobic residues of both the predicted freeMg and withMg structures was computed and compared with that computed for the six structure templates. Whereas the predicted withMg structure was slightly more exposed than the predicted freeMg structure, the former appeared to be more stable than the latter, as revealed by the lower conformation energy recorded for the former during the structure refinement by MD simulations. To verify the predicted structures further, the coordinates of both were fed into the ERRAT Protein Verification Server. 
It was found that the quality of the predicted withMg structure was much better than that of the freeMg structure. The validation results also indicated that ~20% of the predicted withMg structure could be rejected at the 95% confidence level, compared with ~10% for the six structure templates. The predicted withMg structure was also docked into a short oligonucleotide representing the substrate of the ISLC3 transposase using the DOCK_4.0.2 program. Both the Glu104 and Asp68 residues of the DDE motif of the predicted withMg structure were able to form hydrogen bonds with the DNA substrate, similar to what was observed in a docking study using the retrovirus IN 1asu and its DNA substrate.

Keywords: insertion sequence/homology modeling/protein structure/threading


    Introduction
 
Insertion sequences (ISs) are mobile DNA elements capable of mediating various types of DNA rearrangements such as transposition, deletion, inversion and cointegration. They are usually 0.8–2.5 kb long and encode a transposase protein. To date, more than 600 IS elements have been isolated from both eubacteria and archaea. Except for highly similar variants from the same or related hosts, IS elements are considered heterogeneous at the nucleotide sequence level. Many can be grouped into families on the basis of conserved motifs in their putative transposase amino acid sequences and their terminal nucleotide sequences. The IS3 family is one of the largest families and can be divided into the subgroups IS407, IS2, IS3, IS51 and IS150 on the basis of alignment of the transposase sequences (Mahillon and Chandler, 1998). Members of the IS3 family contain two overlapping open reading frames (ORFs) and the transposase protein is generated by programmed ribosomal frameshifting between the two ORFs. The transposase C-terminal region contains the characteristic DD(35)E motif, i.e. a conserved acidic amino acid triad, accompanied by several additional conserved residues, with 35 residues between the last two conserved acidic residues, D and E. It has been suggested that the acidic amino acid triad interacts with the terminal 2 or 3 bp of the element to position the IS ends correctly in the catalytic site during transposition (Jenkins et al., 1997; Mahillon and Chandler, 1998). The DDE motif is also present in the sequences of integrase (IN) proteins of several retroviruses and retrotransposons, as has been demonstrated by a multiple sequence alignment (Khan et al., 1991). 
A helix–turn–helix motif at the N-terminus of the transposase is recognized as the possible DNA binding domain of the IS (Haren et al., 1999), whereas the N-terminal DNA binding domain of the IN is identified as the HHCC ‘Zn finger’-like motif (Cai et al., 1997; Eijkelenboom et al., 1997). The catalytic activity of both the transposase and IN requires Mg2+ or Mn2+ as the cofactor (Haren et al., 1999).
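The spacing rule behind the DD(35)E motif can be illustrated with a short search routine. The following is only a minimal sketch, not the alignment procedure used in this work: the function name `find_dd35e` and the allowed gap between the first two aspartates (40 to 80 residues) are assumptions chosen for illustration, and the toy sequence simply carries D, D and E at the positions reported here for the query sequence (Asp7, Asp68 and Glu104).

```python
def find_dd35e(seq, min_gap=40, max_gap=80, spacer=35):
    """Return 1-based (D, D, E) position triples that satisfy the spacing rule.

    min_gap/max_gap bound the residues between the first two D residues
    (illustrative assumption); spacer is the number of residues between
    the second D and the E, i.e. the "(35)" in DD(35)E.
    """
    hits = []
    for i in range(len(seq)):
        if seq[i] != "D":
            continue
        for j in range(i + 1 + min_gap, i + 2 + max_gap):
            k = j + spacer + 1  # index of the candidate E
            if k >= len(seq):
                break
            if seq[j] == "D" and seq[k] == "E":
                hits.append((i + 1, j + 1, k + 1))
    return hits


# Toy 116-residue sequence (not the real ISLC3 sequence): alanine everywhere
# except the three catalytic positions.
toy = ["A"] * 116
for pos, aa in ((7, "D"), (68, "D"), (104, "E")):
    toy[pos - 1] = aa
toy = "".join(toy)

print(find_dd35e(toy))  # [(7, 68, 104)]
```

A spacing test of this kind produces many false positives on real sequences, which is why the motif is normally confirmed by multiple sequence alignment against known family members.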

Until recently, few ISs had been isolated from lactobacilli. Two ISs, ISL2 and ISL3, were isolated as factors influencing lactose utilization of Lactobacillus helveticus (Zwahlen and Mollet, 1993) and L.bulgaricus (Germond et al., 1993), respectively. ISL2 was isolated as an insertion in a spontaneous lactose-negative mutant (Zwahlen and Mollet, 1993) and ISL3 was isolated from a deletion-prone region following the lacZ gene (Germond et al., 1993). The insertion element IS1223 was discovered in plasmid pSA3 resolution products recovered from transconjugants in L.johnsonii (Walker and Klaenhammer, 1993). Transpositional activity of IS1163 was found to abolish the lactocin S production of L.sake by integration into the lactocin S operon (Skaugen and Nes, 1994). Recently, we isolated a novel insertion element, ISLC3 (AF445084), from a temperate bacteriophage φA3 of L.casei ATCC 393. The new IS was classified as a member of the IS3 family. We found that the transposition of ISLC3 created some circles in which 25 bp from the left inverted repeat in the junction region were deleted. This unusual deletion was observed in L.casei ATCC 393 and also in an Escherichia coli model system. The deletion in the junction region also generated a promoter that was much stronger in activity than the indigenous one. Here, we used several theoretical structure prediction methods to derive a three-dimensional (3D) structure model for the catalytic domain of the newly identified insertion element ISLC3. Our aim was to gain insight into the structural features governing the catalytic activity of the newly identified transposase sequence. We used five threading methods available on the Web to find structure templates for the central catalytic domain of ISLC3. 
Further screening and refinement of the predicted 3D structures were conducted using the InsightII/Discover/Homology programs (BIOSYM Technology, San Diego, CA) or the SWISS-MODEL Protein Modeling Server (Guex and Peitsch, 1997) on the Web. To compare the predicted 3D structures with the known structures of some INs and transposases, an Mg2+ cofactor was placed in the center of the predicted catalytic domain in the final structure refinement using molecular dynamics (MD) simulations. The predicted structures with and without the cofactor Mg2+ were designated withMg and freeMg, respectively, and both were validated using several methods. A docking study in which the predicted withMg structure was docked into a short oligonucleotide representing the substrate of the ISLC3 transposase was also conducted. It was found that Glu104 and Asp68 of the DDE motif of the predicted withMg structure could form hydrogen bonds with the DNA substrate, similar to those observed on docking the retrovirus IN 1asu into its DNA substrate.


    Materials and methods
 
The insertion element ISLC3 (AF445084) was identified by us from a novel temperate bacteriophage φA3 (unpublished data). The phage was induced from L.casei ATCC 393 with 0.2 µg/ml mitomycin-C. The ISLC3 sequence was found to be inserted into the coding region of the putative phage structure genes. The complete sequence of ISLC3 was determined and is shown in Figure 1. The total length of ISLC3 was 1351 bp and both ends of the element were flanked by 37 bp inverted repeats (IRs). There were two open reading frames on the ISLC3 sequence, orfA (from 82 to 333 bp) and orfB (from 387 to 1229 bp), which were in phase 0 and –1 (Sekine and Ohtsubo, 1989), respectively (Figure 1). The two ORFs may be expressed since a σ70-like promoter (Bao et al., 1997) was located upstream of orfA but not orfB (Figure 1). The potential A7G shift window motif (Sekine et al., 1994), which is present in other members of the IS3 family, can be found near the 3' end of orfA (from 300 to 307 bp) (Figure 1). A pair of 8 bp IRs that may form a stem-and-loop structure in the mRNA can be seen downstream of the A7G motif (Figure 1). The shifty motif and the downstream IRs together could promote –1 translational frameshifting (Sekine and Ohtsubo, 1989; Bao et al., 1997) at the motif, thus generating the OrfAB fusion protein, the transposase of the IS. The central residues involved in the catalytic DDE motif are highlighted in Figure 1. The predicted central catalytic domain comprises 116 residues and was treated as the query sequence. The location of the query sequence in the transposase is shown in Figure 2.
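The reading-frame relationship quoted above can be checked directly from the coordinates. The sketch below assumes that the coordinates are 1-based and inclusive and that the codon count includes any stop codon; the helper name `orf_summary` is hypothetical.

```python
def orf_summary(start, end, ref_start):
    """Return (length in nt, number of codons or None, phase relative to ref_start).

    Coordinates are assumed to be 1-based and inclusive. The phase is the
    frame offset relative to the reference ORF, reported as 0, +1 or -1.
    """
    length = end - start + 1
    codons = length // 3 if length % 3 == 0 else None
    offset = (start - ref_start) % 3
    phase = {0: 0, 1: 1, 2: -1}[offset]
    return length, codons, phase


# orfA and orfB coordinates as given in the text; orfA defines phase 0.
print(orf_summary(82, 333, ref_start=82))    # (252, 84, 0)
print(orf_summary(387, 1229, ref_start=82))  # (843, 281, -1)
```

Both lengths are multiples of three, and the start of orfB lies one nucleotide back from the orfA frame, which agrees with the phases 0 and –1 stated in the text.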



 
Fig. 1. The complete nucleotide sequence of ISLC3. The deduced amino acid sequence for the longest ORF identified is shown below the corresponding nucleotide sequence. The first and second start codons are labeled with bold letters. The important DDE motif is also highlighted.

 


 
Fig. 2. The amino acid sequence of the ISLC3 central catalytic domain. The important DDE motif in the sequence is also marked. The position of the central catalytic domain in the entire ISLC3 transposase is also shown.

 
The five Web-available threading methods used to find the structure templates for the ISLC3 central catalytic domain were as follows: TOPITS (http://dodo.cpmc.columbia.edu/predictprotein/); HMM (http://www.cse.ucsc.edu/research/compbio/HMM-apps/HMM-applications.html); 3D-JIGSAW (http://www.bmm.icnet.uk/~3djigsaw/); 3D-PSSM (http://www.sbg.bio.ic.ac.uk/~3dpssm/); and HFR (http://www.cs.bgu.ac.il/~bioinbgu/). TOPITS (Rost, 1995) and 3D-PSSM (Kelley et al., 2000) are methods based on matching the predicted secondary structure and solvent accessibility or solvation potentials of the query sequence with those of proteins of known structure. HFR (Fischer, 2000) is a hybrid method that combines results from five threading programs to search for the most consistent fold prediction among them. HMM (Karplus et al., 1998) uses a hidden Markov model engine to compare the query sequence with sequences of known protein structures to derive a possible structure class for the query. The 3D-JIGSAW (Bates et al., 2001) Web server builds 3D models for proteins based on homologues of known structure. From the threading experiments on these Web sites, we chose several templates for the query sequence. These templates were further screened using the SWISS-MODEL Protein Modeling Server (Guex and Peitsch, 1997) on the Web with the following parameter settings: (i) BLAST search P value <0.00001, (ii) global degree of sequence identity (SIM) >25% and (iii) minimal projected model length = 25 amino acid residues. Proteins 1asu (Bujacz et al., 1995), 1bi4C (Maignan et al., 1998), 1bisB (Goldgur et al., 1998), 1c0m (Yang et al., 1999), 1bl3C (Maignan et al., 1998) and 2itg (Bujacz et al., 1996) were then selected as the structure templates for the query sequence; all of them are INs of retroviruses.
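The three screening criteria amount to a simple filter over candidate templates. The sketch below only illustrates that logic and is not the SWISS-MODEL implementation: the `Candidate` record, the function `passes_screen` and the three candidate entries with their values are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical label, not a real PDB entry
    blast_p: float       # BLAST search P value
    identity_pct: float  # global sequence identity (SIM), in percent
    model_length: int    # projected model length, in residues


def passes_screen(c, max_p=1e-5, min_identity=25.0, min_length=25):
    """Apply the three thresholds quoted in the text."""
    return (c.blast_p < max_p
            and c.identity_pct > min_identity
            and c.model_length >= min_length)


# Invented values, for illustration only.
candidates = [
    Candidate("cand_A", 1e-8, 31.0, 110),
    Candidate("cand_B", 1e-3, 40.0, 120),  # fails the P value criterion
    Candidate("cand_C", 1e-9, 22.0, 100),  # fails the identity criterion
]

kept = [c.name for c in candidates if passes_screen(c)]
print(kept)  # ['cand_A']
```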

To construct a structure model for the query sequence, we used the InsightII/Discover/Homology programs implemented on a Silicon Graphics computer. Sequences of the six INs were aligned against the query to find regions where the structures of these proteins matched best. The matched structures were taken as the structures of the corresponding regions of the query sequence. Loop searching of the Protein Data Bank (Berman et al., 2002) then yielded the missing fragments of the query sequence. Residues showing bad contacts were replaced with their rotamers and also manually adapted. One round of minimization (300 steps of steepest descent plus 500 steps of conjugate gradient) was performed while keeping the conserved residues restrained to their initial positions in order to relax the loops and bad contacts. The tertiary structure constructed for the query sequence was refined using the InsightII/Discover/Discover3 MD simulation programs with the consistent valence force field. The protein was held inside a box of water molecules and the temperature was kept at 298 K during the MD simulation runs. The cofactor Mg2+ was added in the final round of structure refinement by MD simulation. It was initially placed in the center of a triangle formed by the Cα-atoms of the three central catalytic residues, namely Asp7, Asp68 and Glu104 (Figure 2). The secondary structure of each protein was defined using the Kabsch–Sander DSSP program (Kabsch and Sander, 1983) implemented in the SYBYL 6.9 package (Tripos Associates, St. Louis, MO). The exposed molecular surfaces of all the known or predicted structures were computed using the Connolly MS program (Connolly, 1983) with a probe size of 1.4 Å and all the structures were displayed using the Kraulis MolScript v2.1 program (Kraulis et al., 1994).
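The initial placement of the metal ion can be pictured as taking the centroid of three Cα positions. The following sketch assumes that "center of a triangle" means the arithmetic mean of the three vertices; the coordinates are hypothetical and are not taken from the model.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


# Hypothetical C-alpha coordinates in angstroms, for illustration only.
ca_atoms = {
    "Asp7": (0.0, 0.0, 0.0),
    "Asp68": (6.0, 0.0, 0.0),
    "Glu104": (0.0, 6.0, 0.0),
}

mg_start = centroid(list(ca_atoms.values()))
print("initial Mg2+ position:", mg_start)
for name, xyz in ca_atoms.items():
    print(name, round(math.dist(mg_start, xyz), 2))
```

Note that a centroid is in general not equidistant from the three atoms, so a starting position of this kind is expected to shift during the subsequent MD refinement.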

The DOCK_4.0.2 program (Kuntz et al., 1994) was used for docking both the predicted withMg and retrovirus IN 1asu structures into their corresponding DNA substrates. The sequences of these two DNA substrates are displayed in Figure 9. These were rigid docking processes in which the charges on the DNAs and proteins were added using the Amber95_All parameters in the SYBYL 6.9 program. A B-form DNA was used as the starting conformation for each DNA substrate. With the conformation of the DNA fixed after the rigid docking process, the position of the DNA was further adjusted to be close to that of the DDE motif of the protein structure using the Swiss-PdbViewer_3.7 program. The docked and adjusted structure of the DNA–protein complex was further refined by MD simulation runs using the InsightII/Discover program. The structure of the DNA–protein complex was solvated in five layers of water and the Amber force field was employed. The DNA–protein complexes were briefly energy minimized and then subjected to 10^4 steps of MD simulation.


    Results and discussion
 
All the Web servers used found some structure templates with significant scores for the query sequence except TOPITS (Rost, 1995), which did not give any significant result. The structure templates found and their corresponding scores are listed in Table I. It is interesting that all of these structure templates were single catalytic domains, or catalytic plus some other domains, of retrovirus INs. The sequences of these proteins were fed into the SWISS-MODEL Protein Modeling Server (Guex and Peitsch, 1997) with fixed parameter settings given by the Server to screen the structure templates further. The Server gave predictions of 3D structures for six proteins: 1asu (Bujacz et al., 1995), 1bi4C (Maignan et al., 1998), 1bisB (Goldgur et al., 1998), 1c0m (Yang et al., 1999), 1bl3C (Maignan et al., 1998) and 2itg (Bujacz et al., 1996). Whereas 1asu (Bujacz et al., 1995) and 1c0m (Yang et al., 1999) were the structures of the catalytic domain of INs of avian sarcoma virus (ASV) and Rous sarcoma virus (RSV), respectively, 1bi4C (Maignan et al., 1998), 1bisB (Goldgur et al., 1998), 1bl3C (Maignan et al., 1998) and 2itg (Bujacz et al., 1996) were the structures of the catalytic domain of IN of human immunodeficiency virus type 1 (HIV-1). The sequences of these proteins were aligned against the query sequence and the results are presented in Figure 3a. As shown in the alignment, the three central catalytic residues of the DDE motif were identified as D at aligned position 18, D at aligned position 79 and E at aligned position 115 (Figure 3a). The sequence identities of the query sequence to the sequences of 1asu (Bujacz et al., 1995), 1c0m (Yang et al., 1999), 1bi4C (Maignan et al., 1998), 1bisB (Goldgur et al., 1998), 1bl3C (Maignan et al., 1998) and 2itg (Bujacz et al., 1996) were 31.9, 31.9, 29.2, 29.2, 29.2 and 29.2%, respectively. 
The sequence homology of the query sequence to the ASV or RSV IN sequences was thus slightly greater than that to the HIV-1 sequences. However, the overall sequence homology between the query sequence and all the structure templates found was low. The transposase sequences of several ISs were also aligned against that of ISLC3 and the results are presented in Figure 3b. The IS153 sequence was among those most homologous to the ISLC3 sequence. The DDE motif was also evident in this alignment, although sequence homology between the ISLC3 sequence and some IS sequences was low (Figure 3b).
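Percentage values of this kind are obtained by counting identical residues over the aligned columns. The sketch below is one possible definition, given as an assumption: it divides by the number of columns in which neither sequence has a gap, whereas alignment programs may use other denominators (for example the length of the shorter sequence), so it is not necessarily the calculation behind the figures quoted above. The two aligned strings are made up.

```python
def percent_identity(query_aln, template_aln, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for q, t in zip(query_aln, template_aln):
        if q == gap or t == gap:
            continue  # skip gapped columns
        compared += 1
        if q == t:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned fragments, for illustration only.
print(percent_identity("DKV-AE", "DRVSAE"))  # 80.0
```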


 
Table I. Template searching results for the ISLC3 central catalytic domain by Web-available methods
 



 
Fig. 3. (a) Multiple sequence alignment for the sequences of six searched structural templates against the sequence of the ISLC3 central catalytic domain. (b) Multiple sequence alignment for the transposase sequences of seven ISs against that of the ISLC3. The central catalytic domains of these sequences can be located with the DDE motif identified.

 
The template structures were compared by superposing the coordinates of the Cα-atoms of each structure on to each other using the SYBYL Fit module (Tripos Associates) and the results are presented in Table II. The structural differences between these templates were small, as shown by the low root-mean-square deviation (r.m.s.d.) values computed between them (Table II). The structures were further compared using MolScript (Kraulis et al., 1994) displays for the template structures of 1asu (Bujacz et al., 1995) (ASV), 1c0m (Yang et al., 1999) (RSV) and 1bisB (Goldgur et al., 1998) (HIV-1). Three helices and five β-strands were common to all the structure templates (data not shown). There was also a striking similarity in the structural environment of the three central catalytic residues since these residues, namely D64, D121, E157 of 1asu (Bujacz et al., 1995) (ASV), D64, D121, E157 of 1c0m (Yang et al., 1999) (RSV) and D64, D116, E152 of 1bisB (Goldgur et al., 1998) (HIV-1), were in each case located on a β-strand, a coil and a helix, respectively (data not shown). The difficulty in predicting a 3D structure for the query sequence lies in the fact that its sequence homology with the templates is low (Figure 3a) whereas the structures of the templates are very similar. The predicted 3D structure for the query sequence was denoted freeMg and is depicted in Figure 4. Four β-strands and two helices were predicted for the query sequence. Most of these β-strands were short and the major helix was classified as a π helix by the Kabsch and Sander DSSP program (Kabsch and Sander, 1983) implemented in the SYBYL 6.9 package. However, an apparent catalytic pocket was formed by the three central catalytic residues D7, D68 and E104 (Figure 4). 
The backbone dihedral angles φ and ψ of templates 1asu (Bujacz et al., 1995), 1c0m (Yang et al., 1999) and 1bisB (Goldgur et al., 1998) plus those of freeMg were determined using the SYBYL 6.9 package to construct a Ramachandran plot (Ramachandran and Sasisekharan, 1968), as shown in Figure 5. This analysis shows that nearly 90% of the predicted residues of freeMg were in the same favored regions as those of the known template structures.
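Each point on a Ramachandran plot is a pair of torsion angles computed from four consecutive backbone atoms: φ from C(i–1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). The sketch below shows the standard vector formula for a torsion angle in plain Python; it is an illustration under that textbook definition and not the SYBYL routine used here, and the test coordinates are artificial.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in (-180, 180]."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Artificial planar arrangements: trans gives 180 degrees, cis gives 0 degrees.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

Applying the function to the two atom quadruples of every residue gives the (φ, ψ) pairs that are plotted against the favored regions.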


 
Table II. R.m.s.d. values computed between the coordinates of Cα-atoms of each pair of templates searched
 


 
Fig. 4. The predicted structure freeMg displayed by the MolScript v2.1 program. The DDE (D7, D68 and E104) residues are shown using the ball-and-stick rendering.

 


 
Fig. 5. Ramachandran analysis of the backbone dihedral angles ψ (PSI) and φ (PHI) of templates 1asu, 1c0m, 1bisB and the predicted structure freeMg. Values of the dihedral angles of all the templates are represented with diamonds and those of the predicted structure freeMg with crosses.

 
The divalent metal ion requirements of HIV-1 IN have been investigated by Engelman and Craigie (1995), who found that the protein is capable of promoting efficient 3' processing and DNA strand transfer with either Mn2+ or Mg2+. It is known that members of the transposase protein family also carry the DDE motif. These acidic residues have been proposed to coordinate divalent ions (presumably Mg2+ in vivo, although Mn2+ also supports activity in vitro) (Rice and Baker, 2001). The Tn5 structure has one Mn2+ ion in each active site, coordinated by two DDE motif residues (D97 and E326) and the 3'-OH of the transferred DNA strand (Davies et al., 2000). This Mn2+ is attractively positioned for a role in activating this 3'-OH DNA end for nucleophilic attack on a new DNA phosphodiester (Davies et al., 2000). In transposases of other systems such as IS10 (Sakai et al., 1995), Tn7 (Sarnovsky et al., 1996) or IS911 (Polard et al., 1996), Mg2+ or Mn2+ is essential for DNA cleavage and strand transfer and cannot be substituted by Ca2+ (Haren et al., 1999). However, two metal ions rather than one are involved in the catalytic process of these latter enzymes (Haren et al., 1999). One Mg2+ ion is proposed to act as a Lewis acid coordinated by two aspartate residues, while the second Mg2+ is complexed with the non-bridging oxygen on the glutamate residue and acts as a base for deprotonation of the incoming nucleophile (Haren et al., 1999). To reflect the catalytic process more realistically, an Mg2+ ion was inserted into the predicted freeMg structure and further MD refinement steps were conducted on the resulting metal ion–protein structure. The newly predicted structure with an Mg2+ ion included was designated the withMg structure. The predicted withMg structure, which carries an Mg2+ ion in its active site, is displayed in Figure 6. 
The difference between the predicted structure of withMg (Figure 6) and the known structures of HIV-1 IN and the central catalytic domain of Tn5 transposase was evident. There was also an apparent difference between the predicted structures of freeMg (Figure 4) and withMg (Figure 6). The major helix (residues 37–55) predicted for freeMg was slightly distorted in withMg and some β-strands (residues 32–34, 62–64, 84–85 and 108–112) originally predicted for freeMg were absent in the withMg structure. To compare these structures further, the distances between the Mg2+ or Mn2+ ion and every atom of the three central catalytic residues were determined for withMg and the two known structures 1biu (Goldgur et al., 1998) and 1f3i (Davies et al., 2000), as presented in Table III. The distances between the Mg2+ ion and the two D residues in structure 1biu (Goldgur et al., 1998) were apparently shorter than those between the ion and the E residue (Table III). This was slightly different from structure 1f3i (Davies et al., 2000), in which the distances between the Mn2+ ion and the first D and E residues were longer than that between the ion and the second D residue (Table III). However, in the predicted structure withMg the distances between the Mg2+ ion and all three central catalytic residues were approximately the same (Table III). In general, the central catalytic pocket of the predicted withMg structure was more similar in size to that of structure 1biu (Goldgur et al., 1998) than to that of structure 1f3i (Davies et al., 2000) (Table III).



 
Fig. 6. The predicted structure withMg displayed by the MolScript v2.1 program. The DDE (D7, D68 and E104) residues are shown using the ball-and-stick rendering and the central ball represents the metal ion Mg2+.

 

 
Table III. Distances (Å) between each atom of each central catalytic residue and the metal ion of proteins 1biu and 1f3i and the predicted structure withMg
 
The solvent-exposed molecular surface of non-polar side chains has been used as a criterion to discriminate between native proteins and incorrectly folded models. A direct numerical measure of burial of the non-polar atoms is the non-polar/polar side chain surface area ratio, with values of ~2.0–2.2 and greater indicating incorrect folding in the case of hemerythrin and the VL domain (Novotny et al., 1988). We computed the average exposed molecular surface area for each hydrophobic residue present in all the proteins studied or predicted and the results are presented in Table IV. Except for residues Ile, Val, Leu and Cys, the average exposed molecular surface area computed for the other hydrophobic residues of both the predicted freeMg and withMg structures was nearly the same as that computed for all the known structures. The average exposed molecular surface area computed for the above four hydrophobic residues in the predicted structures was slightly larger than that computed for all the known structures, indicating that some incorrect folds might be present in the predicted structures. Further, the withMg structure appears to be slightly more exposed than the freeMg structure since the average exposed molecular surface area computed for the above four residues was slightly larger for the former than for the latter. The conformation energy of the withMg structure was lower than that of the freeMg structure during the course of structure refinement using the MD simulations, as shown in Figure 7. The number of water molecules used to solvate the withMg structure was 877 whereas that used to solvate the freeMg structure was 911 during the MD simulation runs. To verify the predicted structures further, the coordinates of both predicted structures were fed into the ERRAT Protein Verification Server (Colovos and Yeates, 1993) and the results are presented in Figure 8a and b for the freeMg and withMg structures, respectively. 
ERRAT is a program (Colovos and Yeates, 1993) for verifying protein structures determined by crystallography. Error values are plotted as a function of the position of a sliding nine-residue window. The error function is based on the statistics of non-bonded atom–atom interactions in the query structure compared with a database of reliable high-resolution structures (Colovos and Yeates, 1993). The results indicate that the percentage of the structure below the 95% confidence limit was 34.3 for the freeMg structure and 78.7 for the withMg structure. In other words, the withMg structure was significantly improved relative to the freeMg one according to these analyses. However, the quality of the predicted withMg structure was somewhat worse than that of the template structures, since the proportion of the structure that can be rejected at the 95% confidence level was ~20% for the former and ~10% for the latter.
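The percentages quoted above are, in effect, the share of sliding windows whose error value falls below the rejection limit. The sketch below illustrates only that bookkeeping: it does not reproduce the ERRAT error function, and the window error values and the limit are invented numbers.

```python
def sliding_windows(n_residues, width=9):
    """1-based (first, last) residue ranges of every window of the given width."""
    return [(i + 1, i + width) for i in range(n_residues - width + 1)]


def percent_below(values, limit):
    """Percentage of window error values that fall below the rejection limit."""
    if not values:
        raise ValueError("no window values supplied")
    return 100.0 * sum(v < limit for v in values) / len(values)


# A 116-residue domain yields 108 nine-residue windows.
print(len(sliding_windows(116)))

# Invented error values and limit, for illustration only.
errors = [3.1, 7.4, 12.9, 5.0, 15.2, 9.8, 4.4, 6.1]
print(percent_below(errors, limit=11.5))
```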


 
Table IV. Average exposed surface areas (Å²) computed for several hydrophobic residues of proteins 1asu, 1c0m, 1bis, 1biu and 1f3i and the predicted structures freeMg and withMg
 


 
Fig. 7. Conformation energy of the structures freeMg and withMg recorded during the course of structure refinement by the MD simulations. The number of water molecules used to solvate the freeMg and withMg structures was 911 and 877, respectively.

 



 
Fig. 8. Structure validation results for (a) the freeMg structure and (b) the withMg structure by the ERRAT Protein Verification Server.

 
The predicted withMg structure was further validated and refined by a biochemical approach in which the structure was docked into a short oligonucleotide (Figure 9) representing the substrate of the ISLC3 transposase. As a control, the ASV IN 1asu was also docked into its short DNA substrate (Figure 9). These were rigid docking processes in which the parameter maximum_orientations was set at 3000 in the DOCK_4.0.2 program (Kuntz et al., 1994). The docked structures of the protein–DNA complexes were solvated in water for further refinement by substantial MD simulation runs. The docked and MD refined structures of the protein–DNA complexes for 1asu and withMg are depicted in Figure 10a and b, respectively. The docking results suggest that the predicted withMg structure may act on its own DNA substrate in much the same way as the central catalytic domain of ASV IN acts on its substrate. The docking brings the DDE motifs of both the ASV IN and withMg structures into contact with their corresponding DNA substrates, as shown in Figure 11a and b, respectively. It is interesting that in both docked protein–DNA complexes, only the second D and the E residues of the DDE motif were found to interact with the DNA substrate (Figure 11a and b). The hydrogen bonds formed by the Asp68 and Glu104 residues (the second D and the E residues of the DDE motif) of the withMg–DNA structure with an AT dinucleotide and an A nucleotide at the DNA cleavage site are depicted in Figure 12. It seems that the DNA is being pulled apart at the cleavage site by two opposing forces exerted by the Asp68 and Glu104 residues of the predicted withMg structure (Figure 12). There is also no substantial change in the backbone structure of the predicted withMg domain before and after the biochemical refinement, as shown in Figure 13, where the two backbone structures are superimposed for comparison.



 
Fig. 9. The sequence of oligonucleotide substrates used in the docking studies for the ASV IN 1asu and the predicted withMg proteins. Each oligonucleotide was generated as a B-form DNA. The DNA cleavage site by each protein is marked with an arrow.

 



 
Fig. 10. The docked and further MD refined structure of (a) the ASV IN 1asu–DNA complex and (b) the predicted withMg–DNA complex. The structure of protein is represented in ribbon form and that of DNA in stick form.

 



Fig. 11. (a) The interaction of the DDE motif of the docked and further MD refined structure of the ASV IN 1asu with its DNA substrate. The first and second D residues are Asp64 and Asp121 and the E residue is Glu157. The DDE residues are represented in ball-and-stick form and the DNA in stick form. Note that there is no apparent interaction between the Asp64 residue and DNA. (b) The interaction of the DDE motif of the docked and further MD refined structure of the predicted withMg domain with its DNA substrate. The first and second D residues are Asp7 and Asp68 and the E residue is Glu104. The DDE residues are represented in ball-and-stick form and the DNA in stick form. The metal ion Mg2+ is shown as a ball. Note that there is no apparent interaction between the Asp7 residue and DNA.

 


Fig. 12. Formation of hydrogen bonds with an AT dinucleotide and an A nucleotide of the DNA substrate by residues Asp68 and Glu104 of the DDE motif in the docked and further MD refined structure of the withMg–DNA complex. The DDE motif is represented in ball-and-stick form and the DNA in stick form. The metal ion Mg2+ is represented by a ball. The hydrogen bond distances and the atoms involved are all marked.

 


Fig. 13. Superposition of the backbone structures of the predicted withMg domain before and after the biochemical refinement using the DOCK_4.0.2 program and MD simulation runs. The r.m.s.d. computed by the Magic fit module of the Swiss-PdbViewer_3.7 program is 2.92 Å. The structures before and after biochemical refinement are represented with black and gray sticks, respectively.
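For readers unfamiliar with the r.m.s.d. measure quoted in the caption above, the following minimal sketch shows how the value is obtained from two sets of equivalent atom coordinates. It assumes the two structures have already been superimposed (the fitting step that Magic fit performs is not reproduced here), and the coordinates are hypothetical values chosen so that the result is easy to check by hand.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z)
    coordinates. No superposition is performed; the inputs are assumed to be
    already fitted onto each other."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atom positions (angstroms), for illustration only.
before = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
after = [(0.0, 0.0, 2.0), (1.5, 0.0, 2.0), (3.0, 0.0, 2.0)]

value = rmsd(before, after)
print(f"r.m.s.d. = {value:.2f} A")
```

In this toy case every atom is displaced by exactly 2 Å, so the r.m.s.d. equals that displacement.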

 
Conclusion

The accuracy of comparative modeling depends strongly on the degree of homology between the query sequence and the templates on which the model is built. The structure model we present here for the central catalytic domain of ISLC3 transposase may be categorized as being of low accuracy (Pieper et al., 2002), since the model is based on a sequence identity of only 30%. However, the prediction accuracy is greatly enhanced by inserting a metal ion into the predicted structure in the final steps of structural refinement. Although both the transposase and IN protein families carry the signature DDE motif, there are substantial mechanistic and structural differences between their members. It has been noted that domains outside the catalytic core are not highly conserved among many transposases (Haren et al., 1999). While a tetramer of IN has been proposed to be required for the integration activity of the protein, the self-association properties of transposases are complex and still poorly understood (Haren et al., 1999). The crystal structure of the Inh protein of IS50, a regulatory derivative of the transposase lacking the first 55 amino acids, has recently been determined (Davies et al., 1999). Here, too, the DDE triad (D119, D188 and E326) forms a distinct catalytic pocket with a fold similar to that found in IN, although the sequence homology between them is low (Davies et al., 1999). The importance of the DDE residues has been demonstrated by site-directed mutagenesis for several IN proteins and for the transposases of bacteriophage Mu (Baker and Luo, 1994), Tn7 (Sarnovsky et al., 1996), IS10 (Junop and Haniford, 1997), Tc1/3 (Vos and Plasterk, 1994) and IS911 (Haren, 1998). Many of these results can now be understood from the known structures of the IN catalytic domains.
It is also known that the catalytic domains of the IN/transposase group and of other enzymes that promote phosphoryl transfer reactions, notably RNaseH (Grindley and Leschziner, 1995) and the RuvC resolvase (Rice et al., 1996), exhibit similar topologies. Therefore, the model presented here can serve as a guide for locating amino acid residues of importance for further investigation, or for further refinement of the models of the ISLC3 central catalytic domain.
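Because the conclusion above turns on a sequence identity figure, a minimal sketch of how such a percentage can be computed from a pairwise alignment may be useful. The convention used here (identical residues divided by the number of columns in which neither sequence has a gap) is an assumption; other denominators are also in common use and give different values. The two aligned strings are hypothetical toy sequences, not the ISLC3 or IN sequences.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percentage of identical residues over the alignment columns in which
    neither sequence has a gap (one of several conventions in use)."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for q, t in zip(aligned_query, aligned_template):
        if q == gap or t == gap:
            continue  # skip columns containing a gap
        compared += 1
        if q == t:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned sequences, for illustration only.
query = "MKT-AYDEL"
template = "MRTGA-DQL"

identity = percent_identity(query, template)
print(f"sequence identity = {identity:.1f}%")
```

Here five of the seven gap-free columns are identical, so the sketch reports an identity of about 71%.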


    Acknowledgement
 
This work was supported in part by a grant from the National Science Council, Taiwan (NSC91-2313-B007-001).


    References
 
Baker,T.A. and Luo,L. (1994) Proc. Natl Acad. Sci. USA, 91, 6654–6658.

Bao,T.H., Betermier,M., Polard,P. and Chandler,M. (1997) EMBO J., 16, 3357–3371.

Bates,P.A., Kelley,L.A., MacCallum,R.M. and Sternberg,M.J.E. (2001) Proteins: Struct. Funct. Genet., Suppl 5, 39–46.

Berman,H.M. et al. (2002) Acta Crystallogr., D58, 899–907.

Bujacz,G., Jaskolski,M., Alexandratos,J., Wlodawer,A., Merkel,G., Katz,R.A. and Skalka,A.M. (1995) J. Mol. Biol., 253, 333–346.

Bujacz,G., Alexandratos,J., Qing,Z.L., Clement-Mella,C. and Wlodawer,A. (1996) FEBS Lett., 398, 175–178.

Cai,M., Zheng,R., Caffrey,M., Craigie,R., Clore,G.M. and Gronenborn,A.M. (1997) Nat. Struct. Biol., 4, 567–577.

Colovos,C. and Yeates,T.O. (1993) Protein Sci., 2, 1511–1519.

Connolly,M.L. (1983) Science, 221, 709–713.

Davies,D.R., Braam,L.M., Reznikoff,W.S. and Rayment,I. (1999) J. Biol. Chem., 274, 11904–11913.

Davies,D.R., Goryshin,I.Y., Reznikoff,W.S. and Rayment,I. (2000) Science, 289, 77–85.

Eijkelenboom,A.P., van den Ent,F.M., Vos,A., Doreleijers,J.F., Hard,K., Tullius,T.D., Plasterk,R.H., Kaptein,R. and Boelens,R. (1997) Curr. Biol., 7, 739–746.

Engelman,A. and Craigie,R. (1995) J. Virol., 69, 5908–5911.

Fischer,D. (2000) In Maun,L. (ed.), Pacific Symposium on Biocomputing 2000, pp. 119–130.

Germond,J.E., Lapierre,L., Delley,M. and Mollet,B. (1993) FEMS Microbiol. Rev., 12, 10–27.

Goldgur,Y., Dyda,F., Hickman,A.B., Jenkins,T.M., Craigie,R. and Davies,D.R. (1998) Proc. Natl Acad. Sci. USA, 95, 9150–9154.

Grindley,N.D. and Leschziner,A.E. (1995) Cell, 83, 1063–1066.

Guex,N. and Peitsch,M.C. (1997) Electrophoresis, 18, 2714–2723.

Haren,L. (1998) PhD Thesis, Université Paul Sabatier, Toulouse.

Haren,L., Ton-Hoang,B. and Chandler,M. (1999) Annu. Rev. Microbiol., 53, 245–281.

Jenkins,T.M., Esposito,D., Engelman,A. and Craigie,R. (1997) EMBO J., 16, 6849–6859.

Junop,M.S. and Haniford,D.B. (1997) EMBO J., 16, 2646–2655.

Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637.

Karplus,K., Barrett,C. and Hughey,R. (1998) Bioinformatics, 14, 846–856.

Kelley,L.A., MacCallum,R.M. and Sternberg,M.J.E. (2000) J. Mol. Biol., 299, 501–522.

Khan,E., Mack,J.P.G., Katz,R.A., Kulkosky,J. and Skalka,A.M. (1991) Nucleic Acids Res., 19, 851–860.

Kraulis,P.J., Domaille,P.J., Campbell-Burk,S.L., Van Aken,T. and Laue,E.D. (1994) Biochemistry, 33, 3515–3531.

Kuntz,I.D., Meng,E.C. and Shoichet,B.K. (1994) Acc. Chem. Res., 27, 117–123.

Mahillon,J. and Chandler,M. (1998) Microbiol. Mol. Biol. Rev., 62, 725–774.

Maignan,S., Guilloteau,J.P., Zhou-Liu,Q., Clement-Mella,C. and Mikol,V. (1998) J. Mol. Biol., 282, 359–368.

Novotny,J., Rashin,A.A. and Bruccoleri,R.E. (1988) Proteins: Struct. Funct. Genet., 4, 19–30.

Pieper,U., Eswar,N., Stuart,A.S., Llyin,V.A. and Sali,A. (2002) Nucleic Acids Res., 30, 255–259.

Polard,P., Ton-Hoang,B., Haren,L., Betermier,M., Walczak,R. and Chandler,M. (1996) J. Mol. Biol., 264, 68–81.

Ramachandran,G.N. and Sasisekharan,V. (1968) Adv. Protein Chem., 23, 283–437.

Rice,P.A. and Baker,T.A. (2001) Nat. Struct. Biol., 8, 302–307.

Rice,P., Craigie,R. and Davies,D.R. (1996) Curr. Opin. Struct. Biol., 6, 76–83.

Rost,B. (1995) In Rawlings,C., Clark,D., Altman,R., Hunter,L., Lengauer,T. and Wodak,S. (eds), The Third International Conference on Intelligent Systems for Molecular Biology (ISMB), AAAI Press, Cambridge, pp. 314–321.

Sakai,J., Chalmers,R.M. and Kleckner,N. (1995) EMBO J., 14, 4374–4383.

Sarnovsky,R.J., May,E.W. and Craig,N.L. (1996) EMBO J., 15, 6348–6361.

Sekine,Y. and Ohtsubo,E. (1989) Proc. Natl Acad. Sci. USA, 86, 4609–4613.

Sekine,Y., Eisaki,N. and Ohtsubo,E. (1994) J. Mol. Biol., 235, 1406–1420.

Skaugen,M. and Nes,I.F. (1994) Appl. Environ. Microbiol., 60, 2818–2829.

Vos,J.C. and Plasterk,R.H. (1994) EMBO J., 13, 6125–6132.

Walker,D.C. and Klaenhammer,T.R. (1993) Abstracts of the 93rd General Meeting of the American Society for Microbiology, H-230.

Yang,Z.N., Mueser,T.C., Bushman,F.D. and Hyde,C.C. (1999) J. Mol. Biol., 296, 535–548.

Zwahlen,M.C. and Mollet,B. (1993) FEMS Microbiol. Rev., 12, 27.

Received December 9, 2002; revised August 21, 2003; accepted September 12, 2003.




