From the aBiosciences Division, Argonne National Laboratory, Argonne, Illinois, 60439, the cBanting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5G 1L6, Canada, the eClinical Genomics Centre/Proteomics, University Health Network, Toronto, Ontario M5G 1L7, Canada, the dEuropean Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom, and the fDepartment of Biochemistry, University of Western Ontario, London, Ontario N6A 5C1, Canada
Received for publication, April 14, 2003
INTRODUCTION
Structural proteomics, the large scale determination of protein structure, is expected to provide insight into the fundamental mechanisms by which a protein sequence adopts a defined three-dimensional structure. Most of the organized efforts in structural proteomics (Ref. 1; rcsb.org/pdb/strucgen.html) specifically target protein sequences for which there is no known structural homologue in the public data bases at a level of 30% sequence identity. One aim of this effort is to more fully define the universe of protein folds. Importantly, because protein structure is often conserved in the absence of detectable sequence homology, the comparison of new protein structures with those of known proteins will likely provide clues to biochemical function.
The discovery of biochemical function from a new protein structure begins with automated searches for structural homologues of known function. The results of these comparisons are provided as lists with significance scores. The methods of comparison are now used routinely in the structural community and have proved invaluable for detecting structural conservation and for providing the basis for hypotheses (2). However, the interpretation of the results from structural comparisons often consumes a significant amount of time and is influenced by the extent to which the investigator is able to scour the literature.
In an effort to improve the process by which function is derived from structure, we have combined two methods to facilitate functional studies. First, we have employed a data base of structural templates derived from the active sites of 189 different classes of enzymes.1 This exploits the fact that the chemistry of the reaction restricts the types and the topological arrangement of the catalytic amino acids and hence results in strong conservation of their spatial arrangement, even where the protein folds are very different (3). By focusing on the catalytic moieties, functional similarities can be detected in cases where there is no similarity in sequence, fold, or secondary structure. Second, we have created and used a panel of generic biochemical assays to test the functional hypotheses raised by the structural comparisons. These assays are based on simple, often nonphysiological, substrates; the experiment is designed to reveal the chemistry of the active site and not the cellular substrate.
Here we present the results of the combined structural, bioinformatic, and enzymatic analysis of Escherichia coli BioH, a target within the Midwest Center for Structural Genomics (www.mcsg.anl.gov). By comparing the crystal structure of BioH with those of other known enzymes, we found that BioH is a member of the protein hydrolase superfamily and contains a classical Ser-His-Asp catalytic triad. A screen with different hydrolase substrates revealed that BioH has significant carboxylesterase activity, with a preference for short acyl chain substrates, and weak thioesterase activity. The strategy used for BioH might facilitate analysis of novel, uncharacterized proteins and structures arising from structural proteomics projects.
EXPERIMENTAL PROCEDURES
For the preparation of the selenomethionine enriched protein, BioH was expressed in the E. coli methionine auxotroph strain B834 (DE3) (Novagen) in supplemented M9 medium. The sample was prepared under the same conditions as the native protein except for the addition of 5 mM 2-mercaptoethanol to the purification buffers.
Crystallization. BioH was crystallized by vapor diffusion in hanging drops (ratio of 2 µl of protein to 2 µl of precipitant) equilibrated against a reservoir containing 1.2 M sodium citrate trihydrate and 0.1 M Tris-HCl (pH 8.0). X-ray quality crystals grew at 21 °C in 25 days. For diffraction studies, the crystals were stabilized with the crystallization buffer supplemented with 15% ethylene glycol as a cryoprotectant and flash frozen in liquid nitrogen.
Mass Spectrometry. All of the mass spectrometry data were acquired and analyzed using MassLynx 3.5 (Micromass, Manchester, UK). Electrospray ionization mass spectrometry (ESI-MS)2 was performed on a Micromass Q-Tof2 mass spectrometer. Positive ion mode ESI-MS of the whole protein was achieved in 50:50 acetonitrile:water with 0.1% formic acid. Exact mass MS was performed in negative ion mode regular ESI-MS using 10% aqueous methanol containing 1% ammonia as a carrier solvent. Tryptic digestions were performed overnight in 100 mM ammonium bicarbonate (pH 7.8) or in 100 mM ammonium bicarbonate buffer (pH 6.4) for 1.5 h followed by MALDI-MS analysis. MALDI-MS was performed on a Micromass MALDI-R mass spectrometer (Micromass) using an m/z range of 500-4000. ESI-MS and MS/MS analysis of the low pH tryptic digest were performed on a Micromass Q-Tof2 mass spectrometer using nano-LC with a C18 column (0.3 x 5 mm; LC Packings). Data-dependent acquisition parameters were set to select the doubly and triply charged unmodified and modified precursor ions corresponding to residues 78-100 of the protein. MS/MS spectra were processed by base-line subtraction and deconvoluted using the Max-Ent3 module of MassLynx 3.5. The peptide sequences were determined semi-automatically from the resulting singly charged, deisotoped spectra using PepSeq, version 3.3, supplied with MassLynx 3.5.
Enzyme Assays. Rapid screening for enzyme activities was performed using the following procedures: (a) fatty acid esterase activity was measured spectrophotometrically at 37 °C using p-nitrophenyl (pNP) acetate or pNP esters of other fatty acids (C3-C18) as substrates (5); (b) thioesterase activity was measured spectrophotometrically using CoA thioesters of fatty acids (acetyl-CoA, malonyl-CoA, and palmitoyl-CoA) as described earlier (6); (c) lipase activity (with sonicated olive oil as substrate) was measured spectrophotometrically by the copper soap assay after extraction of released free fatty acids with a chloroform:heptane:methanol mixture (7); (d) protease activity was measured using L-leucine p-nitroanilide (aminopeptidase activity) or N-benzoyl-L-arginine p-nitroanilide (trypsin-like endopeptidase activity) as described (8, 9); (e) phosphatase activity was determined spectrophotometrically using 5 mM p-nitrophenyl phosphate in 50 mM HEPES-K (pH 7.5) buffer at 37 °C (10); and (f) bromoperoxidase activity was measured spectrophotometrically with phenol red or monochlorodimedone as described previously (11).
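Spectrophotometric assays of this kind are usually reduced to a specific activity through the Beer-Lambert law. The following Python sketch shows that arithmetic under stated assumptions: the function name, the extinction coefficient, and all numerical inputs are hypothetical placeholders and are not taken from the cited protocols (5-11), which should be consulted for the actual wavelengths and coefficients.

```python
def specific_activity_nmol_min_mg(delta_a_per_min, extinction_m1_cm1,
                                  path_cm, assay_volume_ml, protein_mg):
    """Convert an absorbance slope into nmol product/min/mg protein.

    Beer-Lambert: A = epsilon * c * l, so the rate of concentration
    change (M/min) is (dA/dt) / (epsilon * l).
    """
    if min(extinction_m1_cm1, path_cm, assay_volume_ml, protein_mg) <= 0:
        raise ValueError("all assay parameters must be positive")
    rate_molar_per_min = delta_a_per_min / (extinction_m1_cm1 * path_cm)
    # mol/L * mL = mmol/1000; 1 M in 1 mL equals 1e6 nmol.
    nmol_per_min = rate_molar_per_min * assay_volume_ml * 1e6
    return nmol_per_min / protein_mg


# Hypothetical inputs, chosen only to illustrate the unit handling.
example = specific_activity_nmol_min_mg(
    delta_a_per_min=0.18,        # assumed absorbance slope
    extinction_m1_cm1=18000.0,   # assumed coefficient, NOT from the paper
    path_cm=1.0,
    assay_volume_ml=1.0,
    protein_mg=0.001,
)
print(f"{example:.1f} nmol/min/mg")
```

The coefficient is passed as a parameter because the appropriate value depends on the chromophore, wavelength, and pH.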
Crystallographic Data Collection. A two-wavelength multiple-wavelength anomalous dispersion experiment was carried out on the 19ID beamline of the Structural Biology Center at the Advanced Photon Source (Argonne, IL). All of the crystallographic data were collected at 110 K on one crystal containing selenomethionine-substituted protein. The crystal belongs to the tetragonal space group P43 with unit cell dimensions a = b = 75.2 Å, c = 49.3 Å, α = β = γ = 90°. The multiple-wavelength anomalous dispersion data set was collected using an inverse beam strategy at the selenium absorption peak energy (0.97947 Å) and at a remote wavelength (0.95373 Å). The absorption edge was determined from the x-ray fluorescence spectrum and the f' and f'' plots versus energy obtained with the program CHOOCH (12). High resolution data were collected from the unexposed part of the same crystal, which had been stored in liquid nitrogen. All of the data were measured with the CCD detector (13), which has a 210 x 210-mm2 sensitive area and a fast duty cycle. Control of the experiment, data collection, and visualization was done with d*TREK (14), and all of the data were integrated and scaled with the program package HKL2000 (15). Some of the basic statistics of data collection and processing are given in Table I.
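The quoted wavelengths and cell dimensions can be sanity-checked with two short calculations: converting wavelength to photon energy, and estimating the Matthews coefficient from the unit cell volume. The Python sketch below is illustrative only. It assumes one BioH molecule per asymmetric unit (not stated in this section), uses the four symmetry-equivalent positions of space group P43, and takes the 29,152-Da mass reported for the full-length protein; the function names are hypothetical.

```python
HC_KEV_ANGSTROM = 12.398419843  # h*c expressed in keV*Angstrom


def wavelength_to_kev(wavelength_angstrom):
    """Photon energy (keV) for a given x-ray wavelength (Angstrom)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def orthogonal_cell_volume(a, b, c):
    """Unit cell volume (cubic Angstrom) when all cell angles are 90 degrees."""
    return a * b * c


def matthews_coefficient(cell_volume, mass_da, sym_ops, molecules_per_asu=1):
    """Crystal volume per unit of protein mass (cubic Angstrom per Da)."""
    return cell_volume / (sym_ops * molecules_per_asu * mass_da)


peak_kev = wavelength_to_kev(0.97947)    # selenium peak wavelength
remote_kev = wavelength_to_kev(0.95373)  # remote wavelength
volume = orthogonal_cell_volume(75.2, 75.2, 49.3)
# Assumption: one molecule per asymmetric unit; P43 has 4 symmetry operators.
vm = matthews_coefficient(volume, mass_da=29152.0, sym_ops=4)

print(f"peak {peak_kev:.3f} keV, remote {remote_kev:.3f} keV")
print(f"cell volume {volume:.0f} A^3, Vm {vm:.2f} A^3/Da")
```

Under these assumptions the peak wavelength corresponds to roughly 12.66 keV, close to the selenium K edge, and the Matthews coefficient falls near 2.4 Å3/Da, a typical value for protein crystals.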
Structure Determination. Multiple-wavelength anomalous dispersion phasing of BioH data was carried out with the program CNS (16). Experimental phases were extended from 2.5 to 2.0 Å resolution with density modification, using data collected at the f'' peak wavelength. With these improved phases, the initial model was built with the program ARP/wARP (17). The high quality of the phases allowed 94% of the main chain to be built automatically and most of the side chains to be placed with a confidence level of 79%. The remainder of the model was built, and all of the side chains were corrected manually using the program O (18). This model was then refined against the 1.7 Å resolution data with several macro cycles of CNS, including simulated annealing, B-factor, and positional refinements. After each macro cycle, the model was inspected, and corrections and/or additions were made manually, with the programs O and QUANTA (Accelrys, Inc.). All subsequent refinement was carried out with REFMAC (19) within the CCP4 (20) suite of programs. The phasing and refinement parameters are shown in Table II.
Coordinates. The coordinates have been deposited in the Protein Data Bank under accession code 1M33.
RESULTS AND DISCUSSION
BioH is a two-domain protein (Fig. 1A). The α/β/α three-layer sandwich of the large domain (residues 5-109 and 188-256; see below) consists of a twisted β-sheet formed by seven mostly parallel strands, β1 (residues 5-9), β3 (residues 14-19), β2 (residues 41-46), β4 (residues 76-81), β5 (residues 101-105), β6 (residues 198-203), and β7 (residues 225-230), and is flanked on both sides by five α-helices, α1 (residues 31-39), α2 (residues 60-70), α3 (residues 83-94), α8 (residues 215-222), and α9 (residues 237-252). Ile32 and Pro242 introduce 90° kinks into the first and last helices, respectively. This domain resembles the Rossmann fold, which is commonly found in enzymes.
A small auxiliary domain is formed by the C-terminal segment of the polypeptide chain (Cys110-Asp187) and is inserted into the catalytic domain. The auxiliary domain contains four α-helices, residues 122-134 (α4), 136-145 (α5), 155-166 (α6), and 173-185 (α7), that create a bundle of two V-shaped bends (Fig. 1B). The two domains are connected by a hinge region near Cys110 and Asp187. The interface between the domains is stabilized by multiple hydrophobic interactions, including those of helices α6 and α7, which run across the surface of the catalytic domain, and by intramolecular hydrogen bonds between the carbonyl of Pro109 and the nitrogen of Leu188 and two hydrogen bonds between Asp187 and Arg189.
Automated Structural Bioinformatics Reveals a Ser-His-Asp Catalytic Triad. One of the aims of structural proteomics is to perform more comprehensive automated analysis of protein structures to reduce the level of time-intensive human intervention. To screen new structures for potential catalytic function, we have created a data base of 189 three-dimensional enzyme active site structural templates.1 The BioH structure was scanned against this data base using the TESS program (3). This automated search gave a close match of BioH to the Ser-His-Asp catalytic triad of lipases (21) (EC 3.1.1.3). The BioH residues involved (Ser82, His235, and Asp207) matched the template with a root mean square deviation of 0.28 Å for the overlaid side chains (Fig. 2). This is well within the cut-off of 1.2 Å used for discriminating true from false matches for this template. The presence of the catalytic triad suggested that BioH might possess lipase, protease, or esterase activity. Furthermore, the serine nucleophile (Ser82) is located within one of the two earlier identified Gly-Xaa-Ser-Xaa-Gly motifs (22), which is typical for acyltransferases and thioesterases.
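The acceptance criterion described above, a root mean square deviation compared against a per-template cut-off, can be illustrated in a few lines of Python. This is a simplified sketch and not the TESS algorithm: it assumes the query and template atoms have already been paired and superposed, and the function names and the coordinates used in the usage note are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired, pre-superposed
    lists of (x, y, z) coordinates, in the units of the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


def is_template_match(rmsd_value, cutoff=1.2):
    """True when the deviation is within the template cut-off (Angstrom)."""
    return rmsd_value <= cutoff


# The reported BioH deviation of 0.28 Angstrom passes the 1.2 Angstrom cut-off.
print(is_template_match(0.28))
```

For example, `rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0.3), (1, 0.4, 0)])` evaluates the deviation for two made-up atom pairs. A full template search would also need to enumerate candidate residue triplets and compute the optimal superposition, which this sketch omits.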
The structure of BioH was also compared with all other known structures using conventional methods such as the DALI algorithm (23). The results from the DALI search revealed structural homology to a large number of proteins with a broad range of enzymatic functions. The closest matches with strong structural similarities include a bromoperoxidase (EC 1.11.1.10; Z score, 22.6; Protein Data Bank code 1brt), an aminopeptidase (EC 3.4.11.5; Z score, 21.1; Protein Data Bank code 1qtr), two epoxide hydrolases (EC 3.3.2.3; Z scores, 20.5 and 18.2; Protein Data Bank codes 1ehy and 1cr6, respectively), two haloalkane dehalogenases (EC 3.8.1.5; Z scores, 20.2 and 16.2; Protein Data Bank codes 1bn6 and 1b6g, respectively), and a lyase (EC 4.2.1.39; Z score, 17.2; Protein Data Bank code qj4). A comparison of BioH with a chloroperoxidase (EC 1.11.1.10) is shown in Fig. 3. The sequence identities between BioH and these proteins range from 15 to 25% and therefore do not suggest a specific catalytic function for BioH. Further manual analysis of these enzymes and literature review would have revealed to the expert that each contains a Ser-His-Asp catalytic triad in its active site.
Ser82 Is Covalently Modified by a Hydrolase Inhibitor. The structural informatics provided initial evidence for the location of the BioH catalytic site. The experimental density maps also showed an unusual feature that extended from the side chain of Ser82 (Fig. 4). The shape of the density and its environment suggested that the corresponding compound was covalently attached to the Oγ atom of Ser82 and formed hydrogen bonds with the backbone nitrogens of Trp22 and Leu83. To investigate the properties of the Ser82 modification, we analyzed the full-length and trypsinized BioH by mass spectrometry. Under denaturing conditions, two major peaks of similar intensity were observed, with molecular masses of 29,152 Da (corresponding to the full-length protein) and 29,306 Da. Treatment of the protein with mild base caused the peak at 29,306 Da to disappear over time and the peak at 29,152 Da to increase in relative intensity. In addition, a new peak was detected with a mass of 172 Da, interpreted as a singly hydrated 154-Da molecule (see below). We also examined the mass of the tryptic fragment of BioH that contains Ser82. When the tryptic digestion was done under slightly acidic conditions and examined by both MALDI and ESI-MS, only the Ser82-containing fragment showed a 154-Da adduct. Therefore the catalytic potential of Ser82 seems responsible for the additional mass observed on Ser82.
For crystallographic experiments and initial MALDI and ESI-MS, the BioH protein was purified in the presence of the protease inhibitor phenylmethylsulfonyl fluoride (PMSF), which is known to react with the catalytic serine in hydrolases (24) and form a stable covalent adduct. Therefore it appears that BioH was modified during purification. The protein purified in the absence of PMSF did not reveal this modification. These results strongly suggest that the modification corresponds to the addition of PMSF (expected Δm = 154) at Ser82 and that the serine possesses nucleophilic properties.
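The 154-Da shift can be checked from elemental composition. The Python sketch below is an illustration, not part of the reported analysis: it assumes the standard chemistry in which sulfonylation of a serine hydroxyl by PMSF (C7H7FO2S) releases HF, and it uses rounded average atomic masses. The variable and function names are hypothetical.

```python
# Rounded average atomic masses (Da); adequate for a 1-Da level check.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06, "F": 18.998}


def formula_mass(composition):
    """Average mass (Da) of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in composition.items())


PMSF = {"C": 7, "H": 7, "F": 1, "O": 2, "S": 1}
HF = {"H": 1, "F": 1}
WATER = {"H": 2, "O": 1}

# Net mass added to the protein: PMSF minus the HF lost on sulfonylation.
adduct_shift = formula_mass(PMSF) - formula_mass(HF)
# Difference between the two intact-protein peaks reported above.
observed_shift = 29306 - 29152
# Adding one water to the 154-Da group gives the released, hydrated species.
hydrated_fragment = adduct_shift + formula_mass(WATER)

print(f"calculated shift {adduct_shift:.2f} Da, observed {observed_shift} Da")
print(f"hydrated fragment {hydrated_fragment:.2f} Da")
```

Under these assumptions the calculated shift is about 154.2 Da and the hydrated species about 172.2 Da, in line with the 154-Da adduct and the 172-Da peak described in the text.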
BioH Is a New Carboxylesterase in E. coli. BioH purified in the absence of PMSF was subjected to several enzymatic assays that focused on hydrolase function, including carboxylesterase, lipase, thioesterase, phosphatase, endopeptidase, aminopeptidase, and bromoperoxidase. BioH demonstrated significant carboxylesterase activity (EC 3.1.1.1; Table III) and hydrolyzed p-nitrophenyl esters of fatty acids. The enzyme showed a rather narrow pH optimum (8.0-8.5) and broad substrate specificity with a preference for short chain substrates (Fig. 5). The kinetic parameters of BioH were determined for several substrates (Table III). These results demonstrate that although BioH was most active with pNP-acetate, the Km for all C-2 to C-6 substrates was essentially the same. In agreement with the results of the mass spectrometry and crystallography, BioH was strongly inhibited by PMSF (10.5% of residual activity after 10 min of incubation with 2 mM PMSF). Purified BioH showed classical Michaelis-Menten kinetics, and linear double reciprocal plots were obtained for all of the pNP substrates tested (data not shown).
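A linear double reciprocal (Lineweaver-Burk) plot follows from the Michaelis-Menten equation, because 1/v = (Km/Vmax)(1/[S]) + 1/Vmax. The Python sketch below shows how Km and Vmax are recovered from such a plot by least squares. The data are synthetic, generated from arbitrarily chosen parameters, and are not the values in Table III; the function names are hypothetical.

```python
def michaelis_menten(substrate, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * substrate / (km + substrate)


def lineweaver_burk_fit(substrates, velocities):
    """Fit 1/v against 1/[S] by ordinary least squares.

    Returns (vmax, km): the intercept is 1/Vmax and the slope is Km/Vmax.
    """
    xs = [1.0 / s for s in substrates]
    ys = [1.0 / v for v in velocities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic, noise-free data from arbitrary parameters (not measured values).
concentrations = [0.1, 0.2, 0.5, 1.0, 2.0]
rates = [michaelis_menten(s, vmax=100.0, km=0.5) for s in concentrations]
fitted_vmax, fitted_km = lineweaver_burk_fit(concentrations, rates)
print(f"Vmax {fitted_vmax:.2f}, Km {fitted_km:.3f}")
```

With noisy measurements the reciprocal transform over-weights points at low substrate concentration, so direct nonlinear fitting of the Michaelis-Menten equation is often preferred; the linear form is shown here because it is the one the text refers to.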
Purified BioH showed low enzymatic activities for thioesterase (using palmitoyl-CoA as a substrate; 186.5 ± 18.6 nmol/min/mg protein), lipase (using olive oil; 18.5 ± 1.3 nmol/min/mg protein), and aminopeptidase (using leucine p-nitroanilide as a substrate; 3.8 nmol/min/mg protein) and showed no detectable enzymatic activity for phosphatase (using p-nitrophenyl phosphate as a substrate), trypsin-like endopeptidase (using benzoyl-arginine p-nitroanilide as a substrate), or bromoperoxidase (phenol red and monochlorodimedone as potential substrates).
Our data combined with results reported in the literature suggest that BioH represents a novel carboxylesterase in E. coli. E. coli is known to express at least three other proteins with carboxylesterase activity: carboxylesterase YbaC (25), thioesterase TesA (26, 27), and thioesterase TesB (28). BioH shows no significant sequence similarity with these enzymes (data not shown). BioH also possessed different enzymological properties compared with the other enzymes. Compared with BioH, YbaC and TesA exhibit higher affinities for the long chain fatty acids, pNP-octanoate (C8) and pNP-decanoate (C10). The specific activity of BioH for short C2 or C3 substrates was in the same range as that of YbaC and at least 10-30 times lower than that of TesA. Both BioH and TesA also displayed thioesterase activity with palmitoyl-CoA (however 13 times lower for BioH) but showed no activity with acetyl-CoA as a substrate. The ratio of carboxylesterase/thioesterase activities (with pNP-palmitate/palmitoyl-CoA) was 0.3 for TesA and 1.3 for BioH.
The specificity for the short chain fatty acid esters likely arises from the fact that the catalytic site of BioH is buried between two domains (Fig. 1) and is not readily accessible for bulkier compounds. Substrates with acyl chain length of up to 6 carbons (C-2 to C-6) could be accommodated within the hydrophobic crevice in the V-shaped cap domain of BioH where the invariant Phe143 (Fig. 1B) can act as a facilitator of binding. In fact, the walls of the active site are quite hydrophobic; therefore binding of acyl substrates to BioH is likely to be mediated mostly by hydrophobic interactions, and the active site is sufficiently large to accommodate short chain substrates with very similar affinities for the C-2 to C-6 range. This is consistent with the observation that BioH shows essentially the same Km for C-2 to C-6 substrates (Table III).
A Possible Role for BioH in Biotin Biosynthesis. In microorganisms and plants, biotin is synthesized from pimeloyl-CoA by the enzymes BioF, BioA, BioD, and BioB in a conserved four-step reaction (29-31). In the Gram-negative bacteria, such as E. coli, pimeloyl-CoA is produced from L-alanine and/or acetate (32) using the BioC and BioH proteins (33), whose exact biochemical roles have not been elucidated. The bioC gene is widely distributed in bacteria, whereas bioH is not found in many bioC-containing bacterial genomes; in these organisms, bioH appears to be complemented by other genes (bioG and bioK) (34). In some Gram-positive bacteria, such as Bacillus sphaericus and Bacillus subtilis, pimeloyl-CoA is produced from pimelic acid by pimeloyl-CoA synthetase (BioW) (35, 36). Efforts to identify the precursors of pimeloyl-CoA in E. coli using 13C NMR labeling studies have been inconclusive (32, 37) but preclude a pimelic acid intermediate. Most studies support a mechanism based on the condensation of acetyl-CoA or malonyl-CoA moieties into pimeloyl-CoA (38). Consistent with this model, Lemoine et al. (22) identified two Gly-Xaa-Ser-Xaa-Gly motifs in BioH that are characteristic of acyl-transferase and thioesterase proteins. BioH was suggested to transfer pimeloyl units from BioC directly to CoA, and the E. coli BioC protein may function as an acyl-carrier protein involved in pimeloyl-CoA synthesis. The discovery of a BioH-CoA complex (by liquid chromatography-mass spectrometry) (39) supports a role for BioH as a CoA donor to a pimeloyl-acyl-carrier protein (or pimeloyl-BioC), releasing pimeloyl-CoA.
Our biochemical and structural data are consistent with the current model of the BioH reaction, which proposes that BioH transfers pimeloyl units from pimeloyl-BioC to CoA (22) and therefore should possess both esterase (carboxylesterase or thioesterase) and acyltransferase activities. We demonstrated that purified BioH shows carboxylesterase and low thioesterase activities and that BioH cannot use free pimelic acid for pimeloyl-CoA synthesis. Therefore, we propose that the function of BioH is to condense CoA and pimelic acid into pimeloyl-CoA. Several surface residues, Arg138, Arg142, Arg155, Arg159, and Lys162, which are disordered in the crystal structure but are nevertheless conserved throughout many bacteria, could potentially mediate CoA binding. It is also possible that BioC, which is proposed to function as a specific pimeloyl-acyl-carrier protein in the synthesis of pimeloyl-CoA (22), may interact with BioH and facilitate the delivery of a pimeloyl unit to the BioH catalytic site.
Perspective. Three-dimensional structures are now being generated for many proteins of unknown function. In many cases, such as for BioH, the structural data combined with existing clues in the literature, or even the intuition of an experienced investigator, can point the experimentalist in the right direction to identify and confirm biochemical function. However, as structural proteomics efforts gain momentum, there will be an increase in the number of protein structures for which there is no existing body of literature. The annotation of these proteins will demand methods that do not depend on specialists who are experts in a specific area of biology. The three-dimensional structure of BioH was analyzed using several automated methods for structural comparison and also with a series of generic enzymatic assays. This approach enabled us to rapidly characterize BioH enzymatic activity and reveal a new enzyme in E. coli. The development and refinement of these combined methods will significantly increase the value of structural genomics/proteomics results in the future by assigning biochemical or enzymatic functions to complement the structural information.
FOOTNOTES
* This work was supported by the United States Department of Energy Office of
Biological and Environmental Research, the Ontario Research and Development
Challenge Fund, and National Institutes of Health Grant GM 62414. This work
has been created by the University of Chicago as Operator of Argonne National
Laboratory under Contract W-31-109-ENG-38 with the United States Department of
Energy. The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
b These authors contributed equally to this work.
g Canadian Institutes of Health Research investigators.
h To whom correspondence may be addressed: Biosciences Div., Argonne National Laboratory, S. Cass Ave., Argonne, IL 60439. Tel.: 630-252-3926; Fax: 630-252-6126; E-mail: andrzejj{at}anl.gov.
i To whom correspondence may be addressed: Banting and Best Dept. of Medical Research, 112 College St., University of Toronto, Toronto, Ontario M5G 1L6, Canada. Tel.: 416-946-3436; Fax: 416-978-8528; E-mail: aled.edwards{at}utoronto.ca.
1 C. Porter, manuscript in preparation.
2 The abbreviations used are: ESI-MS, electrospray ionization mass
spectrometry; MALDI, matrix-assisted laser desorption ionization; MS, mass
spectrometry; PMSF, phenylmethylsulfonyl fluoride; pNP,
p-nitrophenyl.
REFERENCES