Unité de Recherche en Biologie Moléculaire, Facultés Universitaires Notre-Dame de la Paix, B-5000 Namur, Belgium
Abstract
Keywords: consensus/fold recognition/periplasmic sugar-binding proteins/protein modeling/secondary structure prediction
Introduction
However, knowledge-based modeling, also called comparative model building, can still be used when only low sequence homologies (below 25%) exist between a target sequence and protein structures (Bajorath et al., 1993; Vinals et al., 1995; Tramontano, 1998; Wouters and Baudoux, 1998). Such modeling techniques start from the premise that adequate known 3D structures can be used as templates to model proteins of unknown structure, even if structural similarities are not detectable in terms of sequence. Identification of template candidates can be achieved using a number of new methods developed in the last few years, in particular efficient secondary structure prediction techniques (King and Sternberg, 1996; Rost, 1996) and fold recognition tools (threading) (Jones et al., 1992; Sippl and Weitckus, 1992; Rice et al., 1997).
Correspondences between target and template residues, established from the results of the threading programs and using predicted secondary structure as guides, provide structural information for the construction of the target fold in a similar way to multiple sequence alignment for homology modeling (Aszodi et al., 1997).
In the present work, we propose to use a combination of sequence alignments, consensus of secondary structure predictions and fold recognition tools to identify a reasonable template and to increase the accuracy of the modeling process of P39.
It appears that complementary results are consistent with one another and that the model, although rough, allows us to make relevant hypotheses on the main structural features of the protein and to select potential target residues for site-specific mutagenesis studies. Considering the genetic context of the gene encoding P39, the model suggests a function for the protein.
Materials and methods
Sequence analysis was performed using the P39 sequence as the query with the homology search tools BLAST (Altschul et al., 1990), BLAST2 (or Gapped-BLAST) (Altschul et al., 1997) and FASTA (Pearson and Lipman, 1988; Pearson, 1990) in the following databases: non-redundant GenBank CDS translations, PDB, SwissProt, SP-update and PIR. The P39 sequence was also compared, using BLAST and FASTA, with the Brookhaven Database (PDB) only, to detect potentially weak homologies with proteins of known structure.
In every case default parameters were used. Each tool provided a measure of the statistical significance of the alignment between the query sequence and each matching sequence.
Finally, we used the BLOCKS server (Henikoff and Henikoff, 1994) to find structurally or functionally conserved stretches of residues and the SignalP V1.1 server (Nielsen et al., 1997) to detect signal peptides and cleavage sites.
Multiple alignment
Multiple alignments of P39 and sequences of interest were executed using ClustalW (Thompson et al., 1994) and MATCH-BOX (Depiereux et al., 1997).
Secondary structure prediction
The target sequence was used as input to several web servers for secondary structure prediction, including PHD (Rost and Sander, 1993, 1994; Rost et al., 1994), DSC (King and Sternberg, 1996), PREDATOR (Frishman and Argos, 1997), SSP (Solovyev and Salamov, 1994), NNPredict (McClelland and Rumelhart, 1988; Kneller et al., 1990) and the IBCP-Web server, based on the Gibrat (Gibrat et al., 1987), Levin (Levin et al., 1986), DPM (Deleage and Roux, 1987) and SOPMA (Geourjon and Deleage, 1995) methods. PHDhtm (Rost et al., 1995, 1996) was also used for the prediction of putative transmembrane helices.
Fold recognition
Fold recognition experiments were performed to detect similarities between protein 3D structures in spite of the lack of any statistically significant sequence similarity. For this we used the following programs: ProFIT 2.0 (Sippl and Weitckus, 1992), THREADER 2 (Jones et al., 1992), UCLADOE (Fischer and Eisenberg, 1996) and Topits (Rost, 1995). We used the standard libraries provided by the authors and kept all program command line options at their defaults. For each method, results are given as a list of possible fold candidates in decreasing order of probability, where expected structural matches are ranked at the top of the list (highest Z-scores and lowest energy). Similarities of the various candidate folds were analyzed using the SCOP classification (Murzin et al., 1995). The 3D coordinates of the best hits were extracted from PDB and primary sequences were retrieved from the SwissProt (Bairoch and Apweiler, 1997) or FSSP (Holm and Sander, 1998) databases.
Sequence-structure alignment
A consensus alignment was built manually to obtain the most reliable alignment between the target sequence and the template structure. It was obtained by combining (i) a structural alignment (from the FSSP database) of four homologous structures, (ii) the alignments generated by fold recognition methods and (iii) three alignments based on sequence similarity [ClustalW, MATCH-BOX and Align (Myers and Miller, 1988)]. This consensus was optimized using the consensus of predicted secondary structure and information from the 3D structure of the template.
Modeling
The target–template (1D–3D) alignment was edited with the HOMOLOGY module of MSI (San Diego, CA), then submitted to the program MODELLER4 (Sali and Blundell, 1993) to obtain the 3D model of the target. Graphical displays were generated with the INSIGHTII molecular modeling system of MSI (Molecular Simulations, 1996). The resulting model was checked with PROSAII (Sippl, 1993) and PROCHECK (Laskowski et al., 1993).
Results and discussion
Only two sequences were found to be significantly similar to P39 by each of the three homology search tools (BLAST, BLAST2 and FASTA). The first hit is a partial sequence of an excreted protein of unknown function from Leptothrix discophora, excA (Corstjens, 1993); the second is the precursor of a periplasmic multiple sugar-binding protein from Streptococcus mutans, MsmE (Russell et al., 1992). Among other less significant matches from FASTA we found three additional periplasmic binding proteins: a putative periplasmic maltose-binding protein precursor from Thermotoga maritima (Liebl et al., 1997), a glycerol-3-phosphate-binding periplasmic protein precursor from Escherichia coli (Overduin et al., 1988) and a putative maltose-binding protein from Streptomyces coelicolor (van Wezel et al., 1997). Homology search against the PDB database highlighted three maltose/maltodextrin-binding protein structures, but without any statistically significant sequence homology [P(N) > 0.99].
Multiple alignments performed on these sequences and P39 did not allow us to detect clearly conserved regions. However, searches in the BLOCKS database revealed that P39 contains two of the eight signatures described for maltose-binding proteins (Accession No. PR00181): residues 127–146 aligned with the D signature and residues 334–353 aligned with the H signature.
Finally, analysis of the predicted amino acid sequence with the SignalP web server revealed no typical N-terminal signal sequence.
Prediction of P39 2D and 3D structures
As no strong evidence was obtained by similarity searches, the logical next step was to predict secondary and tertiary structures.
The secondary structure predictions help in refining the alignment of the target sequence with the fold candidate obtained by threading. The positioning of helices, which are systematically predicted with higher accuracy, is especially useful. To improve the reliability of the predictions, a rational consensus between the different outputs obtained was calculated manually (Figure 2) according to a scoring pattern detailed hereafter. First, a score was assigned to each position of each prediction, according to the confidence level (c.l.) provided by the method, when available, and to the predicted secondary structure type, the score being positive for helices, H, and negative for strands, E (Table I).
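The signed scoring idea can be illustrated with a short sketch. It is not the procedure used here: the actual weights are those of Table I, so the unit weights, the decision threshold and the function name `consensus_ss` below are all assumptions made for illustration. Each method contributes a positive score for a predicted helix (H) and a negative score for a predicted strand (E), scaled by its confidence level when one is supplied; the summed score at each position is then mapped back to a consensus state.

```python
from typing import Optional, Sequence


def consensus_ss(
    predictions: Sequence[str],
    confidences: Optional[Sequence[Optional[Sequence[float]]]] = None,
    threshold: float = 1.0,
) -> str:
    """Combine per-residue predictions into a consensus string.

    predictions: one string per method, using 'H' (helix), 'E' (strand)
                 and any other character for coil.
    confidences: optional per-method, per-residue weights; a method
                 without confidence levels can be given as None
                 (weight 1.0 is then assumed, which is a hypothetical choice).
    threshold:   hypothetical cut-off on the absolute summed score.
    """
    if not predictions:
        return ""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    totals = [0.0] * length
    for m, pred in enumerate(predictions):
        conf = confidences[m] if confidences is not None else None
        for i, state in enumerate(pred):
            weight = conf[i] if conf is not None else 1.0
            if state == "H":
                totals[i] += weight  # helices score positively
            elif state == "E":
                totals[i] -= weight  # strands score negatively

    consensus = []
    for total in totals:
        if total >= threshold:
            consensus.append("H")
        elif total <= -threshold:
            consensus.append("E")
        else:
            consensus.append("-")  # no consensus: treated as coil
    return "".join(consensus)
```

With three toy predictions and a threshold of 2.0, a position is assigned H or E only when at least two unweighted methods agree and no method votes for the opposite state. A signed sum has the convenient property that conflicting helix and strand predictions cancel out instead of being decided by a simple majority.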
Five fold recognition programs were also used and the results were cross-correlated to extract the most reliable candidate. Four methods out of the five placed the same protein at the top of the list (Table II), with a high confidence level (except for Topits, which gave a confidence level of 33%). This protein is MalE, a maltose/maltodextrin-binding protein of E.coli [periplasmic component family of the binding protein-dependent (BPD) transporters], which was crystallized either in complex with β-cyclodextrin or unliganded (1dmb and 1omp structures, respectively; Sharff et al., 1992, 1993). Its total length (398 amino acids; 370 in its mature form) is similar to that of P39 (383 amino acids) and the two sequences share between 14 and 18% identity depending on the alignment method used. Its secondary structure content (39.7% α-helical residues, 15.4% strand residues) is within the range predicted for P39. Two other structures (1pot and 1sbp) belonging to the same family as MalE were found within the 10 best fold candidates proposed by ProFIT (data not shown).
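The cross-correlation of ranked fold lists can be sketched as a simple vote count. This is an illustration under stated assumptions and not the analysis actually performed: the function name `top_hit_consensus`, the `top_n` parameter and the idea of counting one vote per method are hypothetical, and any method or fold labels passed to it are placeholders supplied by the user.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_hit_consensus(
    rankings: Dict[str, List[str]], top_n: int = 1
) -> List[Tuple[str, int]]:
    """Count how many methods place each fold within their top_n hits.

    rankings: maps a method name to its fold candidates, best first.
    Returns (fold, number_of_methods) pairs, most supported fold first.
    """
    votes: Counter = Counter()
    for ranked_folds in rankings.values():
        # set() ensures a fold listed twice by one method counts once
        for fold in set(ranked_folds[:top_n]):
            votes[fold] += 1
    # Sort by support (descending), then by name so ties are deterministic
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))
```

Raising `top_n` (for example to 10) gives a looser criterion that also captures related folds ranked just below the best hit, at the cost of admitting more false positives.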
1D–3D alignment
A consensus alignment between 1omp and P39 was obtained (as described in Materials and methods) by combining an FSSP structure alignment of 1dmb, 1omp, 1pot and 1sbp with the sequence–structure alignments given by threading programs and the sequence alignments between P39 and MalE. Interestingly, all of the programs aligned residues 127–146 of P39 with the maltose-binding protein signature D (residues 125–144 of 1omp), as predicted by BLOCKS. This signature was thus used as an anchor point for starting the manual fitting of the consensus in order to optimize the alignment of the secondary structures of P39 and 1omp. This step minimizes gaps and confines them to loops. However, secondary structure prediction was not informative for two regions of P39 (regions 45–125 and 155–190, on either side of the residues predicted to be similar to signature D) because the predicted secondary structure did not correspond to the 1omp secondary structures. Nevertheless, we could find in these regions sequence similarities with two other maltose-binding protein signatures (Figure 3): residues 48–54 share 28.6% identity in a seven amino acid overlap with MalE signature B and residues 101–114 share 14.3% identity in a 14 amino acid overlap with MalE signature C. Using these signatures as anchor points, we moved gaps toward regions that are not in an α-helix or a β-strand, generally upstream from a proline residue.
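For readers who want to reproduce identity figures of this kind, the following minimal sketch computes percent identity over an aligned overlap. It assumes that identity is defined as the number of identical, non-gap positions divided by the overlap length; other definitions (for instance dividing by the shorter sequence length) exist. The function name `percent_identity` and the toy sequences in the usage note are hypothetical and are not the P39 or MalE sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an aligned overlap of equal length.

    Gap characters ('-') never count as identities. The denominator is
    the full overlap length, which is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("overlap must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

Under this definition, two identical residues in a seven-residue overlap give 28.6% and two in a 14-residue overlap give 14.3%, for example `round(percent_identity("ACDEFGH", "ACKLMNP"), 1)` evaluates to `28.6`. Over such short overlaps a single additional match changes the value by many percentage points, which is why these similarities serve only as anchor points and not as evidence of homology on their own.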
Building and evaluation of the initial model
The coordinates of the 1omp structure were assigned to the P39 sequence according to the consensus alignment using MODELLER4. The resulting model, shown in Figure 4, was analyzed without any additional minimization steps. In this case, molecular dynamics was not applied because it is likely to make the prediction worse rather than better.
A preliminary evaluation with PROCHECK and PROSAII shows that most of the current P39 model does not present stereochemical aberrations. With PROCHECK, 95.2% of residues are found in 'most favoured' and 'additional allowed' regions and only seven residues, generally located in loops, are scored in 'disallowed' regions. In the PROSAII energy profile analysis, only six residues (residues 322–326 and G338) have unfavorable positive energies. Residues 322–326 are located in an exposed loop constituting the third inter-domain linker. However, inter-domain linkers are known to be the most variable regions in the folding topology of the periplasmic substrate-binding protein structures.
Finally, these observations in conjunction with known structural features of the periplasmic substrate-binding proteins allow us to point out the more reliable regions of the model. These are represented in Figure 5, on the 1omp topology. Elements colored in gray on the topology plot are the most reliable in our model, as they correspond to the regions of the periplasmic substrate-binding protein structures that have the greatest structural homology. The white elements, corresponding to the extremities of the structure, the third inter-domain linker and part of the C-lobe (which was deduced from a domain unique to the maltose-binding proteins), need to be further refined. These regions are outlined in Figure 4.
There is a third functionally important region in 1omp, the hinge region, composed of the three inter-domain linkers, connecting the two domains and helping stabilize both domains precisely in the liganded closed form (Spurlino et al., 1991). However, as these segments vary from one substrate-binding protein structure to another, we could not delineate them precisely in P39.
Genetic context of P39 ORF
Previous results highlighted some structural features of the P39 protein, but not its function. To confirm the hypothesis that P39 is a periplasmic substrate-binding protein, we take advantage of the fact that the genes encoding the components of BPD transporters are almost invariably organized in operons to achieve a coordinated regulation of their expression (Boos and Lucht, 1996). Consequently, we undertook the DNA sequencing downstream of the P39 ORF (A.Tibor, unpublished data).
The analysis of these sequences reveals the presence of two ORFs that show high homology with sequences belonging to the integral inner membrane components of binding protein-dependent transporters and that contain the highly conserved sequence motif, located near the C-terminus of all proteins of this class and named the EAA loop (Dassa and Hofnung, 1985; Boos and Lucht, 1996). We also found a fourth ORF that shows homology with the ATP-binding-cassette (ABC) subunits of these transport systems.
These results are in agreement with the hypothesis that P39 protein is the periplasmic substrate-binding component of a BPD transport system and suggest that ORFs 2 and 3 encode integral membrane proteins with permease properties and that ORF 4 encodes the ABC subunit (Figure 6).
Our work constitutes the first attempt to solve structural features of the protein P39, at the limit of the so-called 'twilight zone'. To improve the accuracy of predictions, the proposed methodology is based on a combination of methods (sequence similarity searches, secondary structure prediction, fold recognition and alignments) and seeks a consensus at different steps of the modeling procedure.
The model suggests that P39 protein adopts a general periplasmic substrate-binding protein fold, with closer similarities to the maltose/maltodextrin-binding protein fold.
The genetic context suggests that the gene encoding P39 belongs to a binding protein-dependent transporter operon. The evidence is clear for the sequences (ORFs 2, 3 and 4) located downstream of the P39 ORF, which exhibit high homology with other integral inner membrane components or with the ABC subunit of BPD transporters.
Functional characterization is in progress in order to identify the substrate and localization of the peptide signal of P39, as no amino-terminal signal peptide has been predicted for P39.
Finally, the model provides a first step for designing site-directed mutants in two regions of P39: in the ligand-binding site (residues F59, D60, Q157 and W230) and in the region that should interact with the inner membrane component (residues 211–224). Functional tests still have to be developed to determine the effects of these mutations.
In conclusion, it appears that results from several prediction methods are consistent with each other and agree with the genetic context of the P39 ORF. The fact that the best template available for the modeling of P39 does not share a high homology impeded the construction of a very accurate model of this protein. The model obtained, although rough, is accurate enough to provide plausible hypotheses on the overall fold of P39, on its function and for designing site-directed mutations.
References
Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
Aszodi,A., Munro,R.E.J. and Taylor,W.R. (1997) Proteins, Suppl.1, 38–42.
Bairoch,A. and Apweiler,R. (1997) Nucleic Acids Res., 26, 38–42.
Bajorath,J., Stenkamp,R. and Aruffo,A. (1993) Protein Sci., 2, 1798–1810.
Boos,W. and Lucht,J.M. (1996) In Neidhardt, F.C. (ed.), Escherichia coli and Salmonella. ASM Press, Washington, DC, pp. 1175–1209.
Corstjens,P. (1993) Thesis. Rijksuniversiteit te Leiden.
Dassa,E. and Hofnung,M. (1985) EMBO J., 4, 2287–2293.
Deleage,G. and Roux,B. (1987) Protein Engng, 1, 289–294.
Denoël,P., Vo,T., Tibor,A., Weynants,V.E., Trunde,J.-M., Dubray,G., Limet,J.N. and Letesson,J.-J. (1997) Infect. Immun., 65, 495–502.
Depiereux,E., Baudoux,G., Briffeuil,P., Reginster,I., De Bolle,X., Vinals,C. and Feytmans,E. (1997) Comput. Appl. Biosci., 13, 249–256.
Fischer,D. and Eisenberg,D. (1996) Protein Sci., 5, 947–955.
Frishman,D. and Argos,P. (1997) Proteins, 27, 329–335.
Geourjon,C. and Deleage,G. (1995) Comput. Appl. Biosci., 11, 681–684.
Gibrat,J.F., Garnier,J. and Robson,B. (1987) J. Mol. Biol., 198, 425–443.
Guex,N. and Peitsch,M.C. (1997) Electrophoresis, 18, 2714–2723.
Henikoff,S. and Henikoff,J.G. (1994) Genomics, 19, 97–107.
Holm,L. and Sander,C. (1998) Nucleic Acids Res., 26, 316–319.
Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) Nature, 358, 86–89.
King,R.D. and Sternberg,M.J. (1996) Protein Sci., 5, 2298–2310.
Kneller,D.G., Cohen,F.E. and Langridge,R. (1990) J. Mol. Biol., 214, 171–182.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.
Laskowski,R.A., Moss,D.S. and Thornton,J.M. (1993) J. Mol. Biol., 231, 1049–1067.
Levin,J.M., Robson,B. and Garnier,J. (1986) FEBS Lett., 205, 303–308.
Liebl,W., Stemplinger,I. and Ruile,P. (1997) J. Bacteriol., 179, 941–948.
McClelland,J.L. and Rumelhart,D.E. (1988) Explanations in Parallel Distributed Processing. http://www.impharm.ucsf.edu/rnomi/mmpredict.html. MIT Press, Cambridge, MA, pp. 318–362.
Molecular Simulations (1996) Cerius2 User Guide. Molecular Simulations, San Diego.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Myers,E. and Miller,W. (1988) CABIOS, 4, 11–17.
Nielsen,H., Engelbrecht,J., Brunak,S. and von Heijne,G. (1997) Protein Engng, 10, 1–6.
Overduin,P., Boos,W. and Tommassen,J. (1988) Mol. Microbiol., 2, 767–775.
Pearson,W.R. (1990) Methods Enzymol., 183, 63–98.
Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444–2448.
Rice,D.W., Fischer,D., Weiss,R. and Eisenberg,D. (1997) Proteins, Suppl.1, 113–122.
Rost,B. (1995) In Rawling,C. (ed.), The Third International Conference on Intelligent Systems for Molecular Biology (ISMB). AAAI Press, Cambridge, pp. 314–321.
Rost,B. (1996) Methods Enzymol., 266, 525–539.
Rost,B. and Sander,C. (1993) J. Mol. Biol., 232, 584–599.
Rost,B. and Sander,C. (1994) Proteins, 19, 55–72.
Rost,B., Sander,C. and Schneider,R. (1994) CABIOS, 10, 53–60.
Rost,B., Casadio,R., Fariselli,P. and Sander,C. (1995) Protein Sci., 4, 521–533.
Rost,B., Fariselli,P. and Casadio,R. (1996) Protein Sci., 7, 1704–1718.
Russell,R.R., Aduse-Opoku,J., Sutcliffe,I.C., Tao,L. and Ferretti,J.J. (1992) J. Biol. Chem., 267, 4631–4637.
Sali,A. and Blundell,T.L. (1993) J. Mol. Biol., 234, 779–815.
Sharff,A.J., Rodseth,L.E., Spurlino,J.C. and Quiocho,F.A. (1992) Biochemistry, 31, 10657–10663.
Sharff,A.J., Rodseth,L.E. and Quiocho,F.A. (1993) Biochemistry, 32, 10553–10559.
Sippl,M.J. (1993) Proteins, 17, 355–362.
Sippl,M.J. and Weitckus,S. (1992) Proteins, 13, 258–271.
Solovyev,V.V. and Salamov,A.A. (1994) CABIOS, 10, 661–669.
Spurlino,J.C., Lu,G.-Y. and Quiocho,F.A. (1991) J. Biol. Chem., 266, 5202–5219.
Thompson,D.J., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.
Tramontano,A. (1998) METHODS: Companion Methods Enzymol., 14, 293–300.
van Wezel,G.P., White,J., Young,P., Postma,P.W. and Bibb,M.J. (1997) Mol. Microbiol., 23, 537–549.
Vinals,C., De Bolle,X., Depiereux,E. and Feytmans,E. (1995) Proteins, 21, 307–318.
Wouters,J. and Baudoux,G. (1998) Proteins, 32, 97–110.
Received July 20, 1998; revised November 26, 1998; accepted December 11, 1998.