Experimental and computational mapping of the binding surface of a crystalline protein

Andrew C. English1,2, Colin R. Groom3 and Roderick E. Hubbard1,4

1 Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York YO10 5DD, UK
3 Pfizer Global Research and Development, Sandwich, Kent CT13 9NJ, UK


    Abstract
 
Multiple Solvent Crystal Structures (MSCS) is a crystallographic technique to identify energetically favorable positions and orientations of small organic molecules on the surface of proteins. We determined the high-resolution crystal structures of thermolysin (TLN), generated from crystals soaked in 50–70% acetone, 50–80% acetonitrile and 50 mM phenol. The structures of the protein in the aqueous–organic mixtures are essentially the same as the native enzyme and a number of solvent interaction sites were identified. The distribution of probe molecules shows clusters in the main specificity pocket of the active site and a buried subsite. Within the active site, we compared the experimentally determined solvent positions with predictions from two computational functional group mapping techniques, GRID and Multiple Copy Simultaneous Search (MCSS). The experimentally determined small molecule positions are consistent with the structures of known protein–ligand complexes of TLN.

Keywords: binding sites/inhibitors/organic solvent/structure-based drug design/X-ray crystallography


    Introduction
 
An important contribution to modern pharmaceutical research has been the development of structure-assisted drug design. A key requirement for the optimal use of this structural information is the availability of tools for predicting the position, orientation and binding affinity of functional group fragments in an active site (Böhm and Klebe, 1996).

Two experimental techniques have recently emerged to allow the direct incorporation of structural information in inhibitor library design. Fesik's group has pioneered a technique known as SAR by NMR (Structure–Activity Relationships by Nuclear Magnetic Resonance) in which a 15N-labeled protein is screened against a library of compounds by NMR (Shuker et al., 1996; Hajduk et al., 1997). Changes in selected amide chemical shifts in the NMR spectrum of the protein are monitored during a ligand titration experiment. This identifies which ligands bind to the active site, and more detailed NMR studies can establish the positions and orientations of ligands in the binding site. There have been a number of examples where linkage of these optimally placed fragments has resulted in high-affinity ligands.

Ringe's group has developed an experimental method to map the binding surface of crystalline proteins which has been termed Multiple Solvent Crystal Structures (MSCS) (Mattos and Ringe, 1996). Protein crystals are soaked in organic solvents and subsequent X-ray analysis can ascertain where particular functional groups bind. By using various solvents that mimic different functional groups, it should, in principle, be possible to map out the specificities of the binding pockets, in a similar way to theoretical techniques such as Multiple Copy Simultaneous Search (MCSS) (Miranker and Karplus, 1991; Caflisch et al., 1993).

In a recent study (English et al., 1999), we applied MSCS to map experimentally the surface of Bacillus thermoproteolyticus thermolysin (TLN) to identify interaction sites complementary to the small molecule isopropanol. TLN is an endopeptidase specific for peptide bonds on the imino side of hydrophobic residues such as leucine, isoleucine and phenylalanine (Matthews, 1988). Running through the middle of the protein is a large, rigid, active-site cleft consisting of at least four subsites (S2, S1, S1', S2') with the main specificity pocket being the S1' subsite, which is known to prefer hydrophobic groups. Ideally, the target for studies using MSCS should crystallize in its native state without any ligands bound. However, although native TLN has a Val–Lys dipeptide bound in the active site interacting with the S1' and S2' subsites (Holland et al., 1995), it is displaced at relatively modest concentrations of isopropanol (~1–2 M) (English et al., 1999). By varying the concentration of isopropanol in the soak conditions, it was demonstrated that the interaction sites could be experimentally ranked and only two of the 12 interaction sites identified using neat solvent were occupied at low concentrations.

In this work, we continued to map experimentally the binding surface of TLN by soaking crystals in two different solvents (acetone and acetonitrile) and a solute (phenol) at various concentrations. Within the active site, the experimentally determined solvent binding sites were compared with computational predictions using Multiple Copy Simultaneous Search (MCSS) and GRID (Goodford, 1985). The binding site positions were also compared with published structures of TLN–ligand complexes.


    Materials and methods
 
Crystallization and data collection

Crystals of TLN were grown and infused with organic solvents and data collected as described previously (English et al., 1999). The data were integrated using DENZO (Otwinowski, 1990), scaled and processed using CCP4 software (CCP4, 1994) and hkl software (Otwinowski and Minor, 1997). For one crystal (TLN soaked in 70% acetone), X-ray data were collected to a resolution of 1.7 Å at room temperature at Daresbury SRS (station PX9.6), using a CCD detector. This data set was integrated using MOSFLM and scaled and processed using CCP4 software. Data statistics can be found in Table I.


 
Table I. Data and refinement statistics
 
Refinement

All the structures were refined using the maximum likelihood program REFMAC (Murshudov et al., 1997). For all the complexes the initial phases were calculated using a structure of native TLN (2tlx) deposited in the Protein Data Bank, with the water molecules, active-site dipeptide and side chains of Met120, Glu143, Leu144 and Tyr157 removed. All structures could be solved using straightforward difference Fourier techniques. During refinement, water molecules were added manually in the X-SOLVATE module of QUANTA (Molecular Simulations, Inc., 1998) using maximum likelihood weighted mFo – DFc maps contoured at 3.0σ (Oldfield, 1996). During subsequent refinement, water molecules with a temperature factor above 75 Å² were removed from the model unless electron density persistently returned on removal of the water. Manual re-building of protein side chains was carried out in the X-BUILD module of QUANTA using maximum likelihood weighted mFo – DFc maps contoured at 3.0σ (Oldfield, 1996). A bulk solvent correction was included in the model, overall anisotropic scaling was used and individual B factors were freely refined. The refinement was taken to the stage where all the solvent structure that could be unambiguously identified as water had been built into the model.

Determination of solvent binding to a protein can be open to ambiguity, so we followed the rigorous protocol detailed previously (English et al., 1999), which is essential for distinguishing between solvent molecules and water in the electron density maps. Several peaks were apparent in the various electron density maps [mFo (soak) – mFo (nat), mFo (soak) and mFo (soak) – DFc (nat)] that could be interpreted as protein-bound organic solvent. The deposited structure 2tlx (English et al., 1999) was used to provide the set of native structure factors required for calculation of mFo (soak) – mFo (nat) maps. Occupancies of organic molecules included in the model were set to 1.0 and no attempt was made to refine them. The B factors were initially set to 20 Å² and then freely refined as implemented within REFMAC.

The resulting structures are hereafter referred to as HET_X, where HET is the three-letter code assigned to each molecule: IPA (isopropanol) (English et al., 1999), ACN (acetone), IPH (phenol) and CCN (acetonitrile), and X is the percentage (v/v) of solvent. Although phenol is a solute, a concentration of 50 mM phenol approximates to 0.4% (v/v). A summary of the refinement statistics is given in Table I.
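The approximation of 50 mM phenol to 0.4% (v/v) can be checked with a short calculation. The sketch below is illustrative only: the molar mass and density of phenol are standard handbook values that are not given in the text, and any volume change on mixing is ignored.

```python
# Assumed physical constants for phenol (handbook values, not taken from the text).
PHENOL_MOLAR_MASS_G_PER_MOL = 94.11
PHENOL_DENSITY_G_PER_ML = 1.07


def molar_to_percent_v_v(conc_mol_per_l, molar_mass, density_g_per_ml):
    """Convert a molar concentration to an approximate % (v/v).

    Ignores any volume change on mixing, so the result is only a rough guide.
    """
    grams_per_litre = conc_mol_per_l * molar_mass
    ml_per_litre = grams_per_litre / density_g_per_ml
    return ml_per_litre / 1000.0 * 100.0


percent = molar_to_percent_v_v(
    0.050, PHENOL_MOLAR_MASS_G_PER_MOL, PHENOL_DENSITY_G_PER_ML
)
print(f"50 mM phenol is roughly {percent:.2f}% (v/v)")
```

Under these assumptions the result is close to 0.44% (v/v), consistent with the rounded label IPH_0.4.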

Assignment of electron density to protein-bound organic molecules

The crystal structures of TLN soaked in 2–100% isopropanol were recently reported by English et al. (1999). The assignment of electron density to solvent structure for crystals of TLN soaked in acetone, phenol and acetonitrile is now presented.

Acetone. In ACN_50 (2.0 Å), two molecules of acetone (CH3COCH3) were identified in the various electron density maps. The first (ACN 1) was located in the main specificity pocket (S1' subsite) of the active site. The dipeptide in the active site has been partially displaced, as indicated by the negative difference electron density in the mFo (acn, 50) – mFo (nat) map. In addition, a peak in the mFo (acn, 50) – DFc and mFo (acn, 50) electron density maps could be interpreted as acetone. Also, in a buried hydrophobic pocket a positive peak in the mFo (acn, 50) – mFo (nat) difference electron density map could be interpreted as a second bound molecule of acetone (ACN 2).

In ACN_60 (2.0 Å), an additional three molecules of acetone were identified (ACN 3–5). ACN 3 displaces two molecules of water on binding. The carbonyl group of ACN 3 has been oriented to accept hydrogen bonds from two different water molecules and the CH3–C–CH3 portion to interact with Tyr27, Pro214 and Lys219. ACN 4 binds where nothing was observed in the native structure. The carbonyl group of ACN 4 has been oriented to accept hydrogen bonds from a water molecule and the side chain of Thr249. In the mFo (acn, 60) map the electron density is strongest over the oxygen of ACN 4, which suggests that this acetone is tethered by these good hydrogen bonds. In addition, the CH3–C–CH3 portion of acetone interacts favorably with several residues in the pocket including Val255 and Val256. ACN 5 was the final molecule to be located at a crystal contact and displaces a molecule of water on binding. The orientation of ACN 5 was assigned by positioning the carbonyl group over the water molecule replaced, although this is somewhat arbitrary. The hydrophobic stacking of ACN 5 against Tyr46 is the principal interaction between this acetone and the protein and there are probably many orientations within the 'plane' that are equally favorable. In ACN_70 (1.7 Å), an additional acetone binding site to those located previously was identified (ACN 6).

Phenol. In IPH_0.4, two molecules of phenol (C6H5OH) were identified. The first was located in the S1' subsite of the active site (see Figure 1). The dipeptide in the active site has been displaced, as indicated by the negative difference electron density in the mFo (iph, 0.4) – mFo (nat) map. In addition, a peak in the mFo (iph, 0.4) – DFc electron density map could be readily interpreted as phenol (IPH 1) (1.9 Å). At this resolution, the orientation of phenol within the peak of electron density could be unambiguously assigned. In addition, the strongest positive peak in the mFo (iph, 0.4) – mFo (nat) difference electron density map (contoured at +0.3 e/Å³) could be interpreted as a second molecule of phenol (IPH 2) bound in the buried subsite.



 
Fig. 1. Stereo representation of the main specificity pocket (S1' subsite) of TLN showing a protein-bound molecule of phenol (IPH 1). Water molecules (gray) and a zinc ion (dark gray) are shown as cpk spheres. The figure shows the mFo (iph, 0.4) – DFc 'omit' map (contoured at +0.2 e/Å³) superimposed with the refined IPH_0.4 model (1.9 Å). In addition to IPH 1 (B factor 39 Å²), two water molecules (B factors 41 and 44 Å²) were omitted from the model, followed by two rounds of refinement on the truncated model before calculating the map. A possible hydrogen bonding interaction is shown as a dashed line (as calculated within HBPLUS) (McDonald and Thornton, 1994). The geometric criteria used to define hydrogen bonds are given in the Materials and methods section. This figure was prepared using BOBSCRIPT (Esnouf, 1997).

 
Acetonitrile. In CCN_50, the dipeptide in the active site has been partially displaced, as indicated by the negative electron density in the mFo (ccn, 50) – mFo (nat) difference electron density map (2.2 Å). However, there is positive electron density in the mFo (ccn, 50) – DFc map at the site of the dipeptide (contoured at +0.1 e/Å³) and it is therefore retained in the model. In CCN_60, the dipeptide has been displaced to a greater extent, as indicated by a stronger peak of negative electron density in the mFo (ccn, 60) – mFo (nat) difference electron density map (2.0 Å). At this concentration, there are no features in the mFo (ccn, 60) or mFo (ccn, 60) – DFc electron density maps that could be interpreted as being due to acetonitrile or the dipeptide. In the CCN_80 structure, the dipeptide is further displaced and a peak of electron density can be interpreted as a single molecule of acetonitrile (CH3CN) in the mFo (ccn, 80) – DFc map (2.0 Å). The high average B factor of CCN 1 (60 Å²) indicates that this acetonitrile is probably fairly mobile.

Solvent interaction sites

HBPLUS (McDonald and Thornton, 1994) and LIGPLOT (Wallace et al., 1995) were used to produce schematic diagrams of each of the binding sites for the solvent molecules in the ACN_70, IPH_0.4 and CCN_80 structures (see Table II). The cut-offs for the hydrogen bonding and hydrophobic contacts used in the HBPLUS program were 2.5–3.4 and 2.9–4.3 Å, respectively.
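The distance cut-offs quoted above can be illustrated with a minimal classifier. This is a sketch only, not the HBPLUS algorithm: it applies the two distance windows from the text and nothing else, whereas HBPLUS also applies angular criteria. The function name and the `is_polar_pair` flag are hypothetical.

```python
# Distance windows taken from the text (Å).
HBOND_RANGE = (2.5, 3.4)        # donor-acceptor distance for hydrogen bonds
HYDROPHOBIC_RANGE = (2.9, 4.3)  # distance for hydrophobic contacts


def classify_contact(distance, is_polar_pair):
    """Classify an atom pair by distance alone.

    is_polar_pair is a hypothetical flag: True for a potential
    donor/acceptor pair, False for a pair of non-polar atoms.
    """
    low, high = HBOND_RANGE if is_polar_pair else HYDROPHOBIC_RANGE
    if low <= distance <= high:
        return "hydrogen bond" if is_polar_pair else "hydrophobic contact"
    return "none"


# The 2.6 Å phenol-Glu143 separation reported in the Results falls in the
# hydrogen-bond window.
print(classify_contact(2.6, is_polar_pair=True))
print(classify_contact(3.9, is_polar_pair=False))
```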


 
Table II. Ligplot (Wallace et al., 1995) representations of molecules of acetone, phenol and acetonitrile bound to TLN (see Materials and methods)

 
Computational functionality maps

Multiple Copy Simultaneous Search (MCSS) and GRID were used to compute favorable interaction sites within the active site of TLN for the organic molecules used in this study. For each solvent/solute, the following crystal structures were used as the starting models: IPA_100, ACN_70, IPH_0.4 and CCN_80. Any protein-bound organic molecules, water molecules and disordered side chains were removed from the structure prior to running the calculations, although water molecule(s) coordinated to the active-site zinc ion were retained. A symmetry-related molecule (symmetry operator 1 – y, 1 – x, –1/6 – z) was generated to account for groups which bind at crystal contacts such as IPA 5 in the IPA_100 structure. No minimization was performed on any of the structures prior to the calculations. The calculations were performed with neutral histidine residues.
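The symmetry operator quoted above acts on fractional coordinates. A minimal sketch of applying it is given here; the function name and the example coordinates are hypothetical, and conversion between fractional and Cartesian coordinates (which requires the unit cell) is not covered.

```python
from fractions import Fraction


def symmetry_mate(frac_xyz):
    """Apply the operator (1 - y, 1 - x, -1/6 - z) to fractional coordinates."""
    x, y, z = frac_xyz
    return (1 - y, 1 - x, Fraction(-1, 6) - z)


# Hypothetical fractional position, written as exact fractions.
atom = (Fraction(1, 10), Fraction(1, 5), Fraction(1, 3))
mate = symmetry_mate(atom)
print(mate)

# Applying this particular operator twice returns the original position.
print(symmetry_mate(mate) == atom)
```

Exact fractions are used so that the round trip can be compared without floating-point tolerance; with real coordinates, floats and a tolerance would be the practical choice.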

Multiple Copy Simultaneous Search (MCSS)

MCSS (Multiple Copy Simultaneous Search) identifies favorable regions where chemically realistic fragments can interact with a protein. Between 500 and 3000 copies of a functional group are randomly positioned and orientated within a region of the binding site and are simultaneously minimized using an empirical potential energy function, such that they are subject only to internal forces and those due to the protein. Each minimum has a position, orientation and interaction energy computed using the CHARMM force field (Brooks et al., 1983). MCSS functional group maps for isopropanol, acetone, phenol and acetonitrile were calculated. The molecules were modeled as extended atom representations with an overall net charge of zero. Polar hydrogens were added to the protein with the hydrogen positions placed optimally within HBUILD (Brunger and Karplus, 1988), over three iterations; 500 replicas of each functional group flooded a parallelepiped (25 × 25 × 15 Å³) over the active-site cleft of the protein. The system was initially subjected to 100 steps of steepest descent minimization, followed by 300 additional steps of steepest descent and 20 repetitions of 500 steps of conjugate gradient minimization. Pairs of molecules were considered to have converged if the root mean square deviation (r.m.s.d.) between them was <0.2 Å and in such cases one of the pair was eliminated. All minimization steps had a convergence criterion of 0.001 kcal/mol·Å. All calculations were performed using version 22 of the CHARMM program with the polar hydrogen model (PARAM19) and MCSS version 2.1 (Harvard).
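The elimination of converged replicas described above can be sketched as follows. This is not the MCSS implementation: it assumes each replica is a list of atom positions in a fixed atom order, computes the r.m.s.d. without superposition, and does not treat chemically equivalent atoms (such as the two methyl groups of acetone) as interchangeable. All names and coordinates are hypothetical.

```python
import math


def rmsd(replica_a, replica_b):
    """R.m.s.d. (Å) between two replicas with identical atom ordering."""
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(replica_a, replica_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(replica_a))


def remove_converged(replicas, cutoff=0.2):
    """Keep one representative of every group of replicas closer than cutoff."""
    kept = []
    for replica in replicas:
        if all(rmsd(replica, other) >= cutoff for other in kept):
            kept.append(replica)
    return kept


# Three hypothetical two-atom replicas; the second has converged onto the first.
replicas = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.05, 0.0, 0.0), (1.05, 0.0, 0.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
unique = remove_converged(replicas)
print(len(unique))
```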

GRID

GRID locates favorable interaction sites on the surface of proteins by systematically searching points on a 3D grid with spherical probes, using an energy function of Lennard–Jones, electrostatic and hydrogen-bonding terms. An interaction energy between the protein and the probe is calculated at each grid point and the GRID map that is produced can be contoured to indicate favorable regions for each probe. GRID maps were calculated using hydroxyl, methyl, aromatic carbon, carbonyl oxygen, phenolic hydroxyl and sp nitrogen as probe groups. The program GRIN was used to add polar hydrogens, which utilizes the parameters in GRUB based on the CHARMM 'extended' atom representation. The calculations were performed by GRID (version 17) within a parallelepiped (25 × 25 × 15 Å³) over the active-site cleft of the protein, with a grid size of 0.2 Å and distance dependent dielectric, εr, with values of 2 and 80 for the protein and solvent environments, respectively.
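To give a sense of the scale of such a calculation, the sketch below counts the grid points in the box described above, assuming that the quoted grid size of 0.2 Å is the spacing between points. It also evaluates a toy probe energy at a single point. The energy function is a generic 12-6 Lennard–Jones plus constant-dielectric Coulomb term with hypothetical parameters; it is not GRID's parameterized function and omits the hydrogen-bonding term and the distance dependent dielectric.

```python
import math

COULOMB_KCAL = 332.06  # kcal·Å/(mol·e²), standard conversion constant


def grid_points_per_axis(length, spacing):
    """Number of grid points along one box edge, end points included."""
    return round(length / spacing) + 1


def toy_probe_energy(probe_xyz, probe_charge, atoms, dielectric=4.0):
    """Toy interaction energy (kcal/mol) of a spherical probe with protein atoms.

    atoms: list of (xyz, charge, epsilon, r_min) tuples; all values hypothetical.
    """
    energy = 0.0
    for xyz, charge, epsilon, r_min in atoms:
        r = math.dist(probe_xyz, xyz)
        ratio6 = (r_min / r) ** 6
        # 12-6 Lennard-Jones term with a well depth of epsilon at r = r_min.
        energy += epsilon * (ratio6 ** 2 - 2.0 * ratio6)
        # Coulomb term with a constant dielectric (a simplification).
        energy += COULOMB_KCAL * probe_charge * charge / (dielectric * r)
    return energy


nx, ny, nz = (grid_points_per_axis(edge, 0.2) for edge in (25.0, 25.0, 15.0))
print(nx, ny, nz, nx * ny * nz)

# An uncharged probe sitting exactly at r_min from one hypothetical atom.
print(toy_probe_energy((0.0, 0.0, 0.0), 0.0, [((3.5, 0.0, 0.0), 0.0, 0.1, 3.5)]))
```

Under the spacing assumption the box holds roughly 1.2 million points per probe, which is why the maps are contoured and reduced to minima rather than inspected point by point.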


    Results
 
Overall protein structure

R.m.s.d.s calculated between native TLN (hereafter referred to as TLN_0; PDB code 2tlx) and the solvent-soaked structures for main chain and side chain atoms are in the ranges 0.1–0.2 and 0.2–0.5 Å, respectively. This is comparable to the average r.m.s.d.s calculated between five representative TLN complexes (PDB codes: 5tln, 7tln, 2tmn, 1thl, 1tlp) (0.1 and 0.4 Å for main chain and side chain atoms, respectively), confirming that infusing the crystals with organic solvent does not significantly affect the protein structure.

Solvent interaction sites

Figure 2 shows the experimental functionality map of crystalline TLN. The distribution of molecules shows that there are interaction sites over a number of regions on the surface with more than one probe molecule located in several of the sites. These include the main specificity pocket (S1' subsite) and a cavity in the interior of the protein. These two subsites are generally the first to be occupied as the concentration of organic solvent is increased in the crystal soak solution. Schematics for the acetone, phenol and acetonitrile binding sites are shown in Table II. Solvent molecules bound in the S1' subsite, buried subsite and crystal contacts are now described in more detail.



 
Fig. 2. Ribbon representation showing the experimental functionality map derived from soaking crystals of TLN in organic probe molecules. The various probe molecules are colored according to the key. The active-site zinc ion and calcium ions are shown as cpk spheres and are colored gray and black, respectively. Dimethyl sulfoxide (DMSO) is present in the crystallization conditions of TLN (45%, v/v) and each crystal structure has a molecule bound. The molecules are numbered according to their observed order of binding and are taken from the IPA_100, ACN_70, IPH_0.4 and CCN_80 structures. This figure was prepared using BOBSCRIPT (Esnouf, 1997) and RASTER3D (Bacon and Anderson, 1988; Merritt and Murphy, 1994).

 
S1' subsite

Within the active site of TLN, all of the experimental probe molecules were observed to bind in the main specificity pocket (S1' subsite). Phenol, isopropanol, acetone and acetonitrile were first identified to bind at concentrations of 0.4, 5, 50 and 80% (v/v), respectively. The subsite forms a distinct cavity that is lined with hydrophobic residues (Phe130, Leu133, Val139, Ile188, Val192 and Leu202). Towards the edge of the pocket are several polar residues (Asn112, Glu143, Arg203 and His231). Figure 3A shows an overlay of the different probe molecules within the pocket.



 
Fig. 3. Stereo representation of the S1' subsite (A) and a buried subsite (B) in TLN showing a clustering of organic probe molecules. The ACN_70, IPH_0.4, IPA_90 and CCN_80 structures were overlaid within QUANTA (Molecular Simulations Inc., 1998) based on Cα atoms only. In both figures, the ACN_70 structure is shown in a ball-and-stick representation with carbon, oxygen and nitrogen atoms colored white, red and blue, respectively. Probe molecules and the solvent structure associated with them are colored as follows: isopropanol (cyan), acetone (red), phenol (yellow) and acetonitrile (green), with black spheres representing either oxygen or nitrogen atoms. In (A), the active-site zinc ion is shown as a gray cpk sphere. Possible hydrogen bonding interactions are shown as dashed lines (as calculated within HBPLUS). The geometric criteria used to define hydrogen bonds in HBPLUS are given in the Materials and methods section.

 
It is perhaps not surprising that each probe molecule has been identified to bind in the main specificity pocket, since enzyme active sites evolve to bind substrates (Schmitke et al., 1998). Each of the probe molecules used comprises a lipophilic and a hydrophilic portion and although the molecules are all water miscible, they would generally be considered to be hydrophobic. It is likely that hydrophobic interactions are driving solvent binding at this site, with the lipophilic end of each molecule directed towards the interior of the pocket (Table II).

Buried subsite

All of the oxygen-containing probe molecules (isopropanol, acetone and phenol) are identified to bind within a pocket in the interior of the protein. This cavity is lined with a number of hydrophobic residues (Tyr81, Tyr84, Tyr93, Ile100, Leu144 and Val148), but within the pocket there is little opportunity for hydrogen bonding, except for a water molecule present in all TLN crystal structures. Figure 3B shows an overlay of the different probe molecules within this subsite. In each case, the lipophilic end of the probe molecule is oriented towards the hydrophobic end of the pocket and the hydrophilic end is orientated towards the invariant water molecule (Table II).

In native TLN (2tlx), the side chain of Met120 occupies two discrete conformations and Glu143 is disordered. In contrast, in all of the acetone- and phenol-soaked structures Met120 and Glu143 each favor only a single conformer and Leu144 has changed conformation. It is suggested that when acetone and phenol bind in the buried subsite they directly interact with the side chain of Leu144, causing a change in conformation of this residue (r.m.s.d. for Leu144 between native and acetone/phenol structures 2.3–2.6 Å). This then causes a 'concerted re-packing' of the side chains of Met120 and Glu143 similar to that observed when crystals of TLN were soaked in isopropanol (English et al., 1999).

Crystal contacts

Of the six molecules of acetone bound to TLN, three are located at crystal contacts (ACN 3, 4 and 5) and the interactions they make with the protein are shown in Table II. While such sites would probably not be identified in solution, their direct observation within the crystalline environment has several important implications. First, small molecules are sometimes used as additives in the crystallization conditions of proteins (Ducruix and Giegé, 1991) and the identification of bound molecules located at crystal contacts might provide a structural explanation. Second, the fact that interaction sites for small molecules are located at crystal contacts identifies these sites as having the general features of a binding site, albeit within the confines of a crystal lattice. In general, movements of the protein side chains in the vicinity of the crystal contact were slight and the molecules of acetone can be envisaged as slotting in between neighboring molecules of TLN.

MCSS

Table III shows the number of minima found by MCSS for isopropanol, acetone, phenol and acetonitrile in the active site of TLN. For each solvent/solute, the following crystal structures were used as the starting models: IPA_100, ACN_70, IPH_0.4 and CCN_80. The calculations were repeated using the native structure (2tlx) as the starting model to check that the interaction sites identified were not sensitive to the model used. Essentially the same interaction sites were identified and these results are not described. To facilitate analysis, the minima were contoured at a suitable level and clustered with a 3.5 Å r.m.s.d. criterion. The range of interaction energies between those MCSS functional groups retained and the protein binding site is also tabulated. Figure 4 compares all the MCSS minima for each group with the crystallographic solvent positions. The minima are colored from low energy, which is associated with stronger binding (blue), to high energy, which is associated with weaker binding (red). MCSS identifies three principal interaction sites where the majority of the minima are located: the S1' subsite, the S2 subsite and the subsite where IPA 3 and ACN 6 were experimentally identified to bind. In general, the regions where the organic probe molecules are experimentally located are found by the MCSS program, but the specific details of binding are less well defined. Table IV provides a summary of the r.m.s.d.s between each experimental position and both the closest and the lowest energy MCSS minimum within the nearest cluster. For each group, the minima are numbered consecutively from low to high energy, starting at M1. A detailed comparison of the MCSS minima with the experimental solvent positions is provided below.


 
Table III. Minima found by MCSS in the active site of TLN
 


 
Fig. 4. Stereo representations to compare the MCSS functional group minima with the experimental solvent positions within the active site of TLN (S2, S1, S1' and S2' subsites). MCSS minima are colored from low energy (blue) to high energy (red). Experimental positions are colored green and numbered according to the observed order of binding as identified by X-ray crystallography. The regions where the solvent molecules are experimentally located are found by MCSS, but the specific details of binding are less well defined. The solid surface (shown in gray) was calculated within QUANTA (Molecular Simulations Inc., 1998) using a probe radius of 1.4 Å.

 

 
Table IV. Comparison of MCSS functional group minima with the experimental small molecule positions in the active site of TLN
 
Isopropanol. Figure 4A shows that the 37 isopropanol minima form five distinct clusters, consisting of a group of 15, a group of 11, a group of seven and two groups each containing two minima and these are close to four of the five isopropanol molecules shown. The largest cluster of 15 minima occupies the hydrophobic S1' subsite where IPA 1 was experimentally located. In the various crystal structures (English et al., 1999), the isopropyl group of IPA 1 is directed towards the interior of the S1' pocket. Within this cluster, the lower energy computational minima form favorable hydrogen bonds at the expense of hydrophobic interactions. This is possibly due to the lack of an explicit hydrophobic component in the CHARMM energy function and the fact that in vacuo electrostatic effects tend to be overestimated.

The cluster of 11 minima is located in the S2 subsite where IPA 5 is bound. The lower energy minima within this cluster are in the region of the B conformer of the side chain Tyr157, which was removed prior to the MCSS calculations. Of note, the B conformer of Tyr157 forms an equivalent hydrogen bond to Asp150 (2.9 Å) that the M21 minimum identifies (Table IV). By repeating the MCSS calculation but retaining the B conformer of Tyr157 instead and applying the same threshold as before, a cluster of two minima close to IPA 5 was identified. The r.m.s.d. between IPA 5 and the lower energy minimum was just 0.4 Å and the same hydrogen bond with the OH atom of a symmetry-related Tyr106 (2.9 Å) was identified.

One of the more accurately predicted isopropanol binding modes is a cluster of two minima in the S1 subsite where IPA 8 is bound. The M33 minimum donates a single hydrogen bond to the Oε1 atom of Glu143 (2.9 Å) and affords favorable electrostatic interactions with the active-site zinc ion, similar to IPA 8 (Table IV).

The lowest energy minimum (M1) is among a cluster of minima located close to the region where IPA 3 is bound and potentially forms two good hydrogen bonds with the protein.

Acetone. The six acetone minima form four clusters, consisting of a group of three and three groups each containing a single minimum (Figure 4B). The lowest energy minimum (M1, –37.8 kcal/mol, Table IV) is located in the S1' subsite and the key hydrophobic and hydrogen bonding interactions that ACN 1 forms with the protein have been correctly reproduced (Table IV). The cluster of three acetone minima is in the site where ACN 6 binds, although the details of the orientation are less well predicted. Neither of the two remaining clusters is close to any experimentally observed acetone position.

Phenol. The 10 phenol minima form three clusters, consisting of a group of seven, a group of two and one cluster containing a single minimum (Figure 4C). The cluster of seven minima is located in the S1' subsite, where phenol (IPH 1) was experimentally identified to bind. Within this cluster, MCSS has identified many putative binding modes of similar energy. Although M2 (Table IV) accepts a hydrogen bond of good geometry from both the NH1 and NH2 atoms of Arg203 (both 2.8 Å), the aromatic ring is positioned across the face of the S1' pocket and the potential to interact with the hydrophobic residues that line the subsite remains largely unsatisfied. In the crystal structure, phenol bound in the S1' subsite forms highly favorable hydrophobic interactions with the pocket and just a single hydrogen bond to Glu143 (2.6 Å). Although the actual binding mode of phenol is within this cluster of minima (M6) (Table IV), MCSS has failed to discriminate between this and the false positives. MCSS has predicted the binding of phenol to be significantly weaker than for the other groups, which further indicates that hydrophobic interactions are being underestimated (Table IV). Since only a single molecule of phenol was identified to bind in the active site, neither of the two remaining clusters is close to any experimentally observed phenol position.

Acetonitrile. The 11 acetonitrile minima form seven clusters, consisting of a group of five and six groups each containing a single minimum. Of the four clusters shown in Figure 4D, one of the clusters containing a single minimum (M2) is in the S1' subsite, with an r.m.s.d. of 2.4 Å between M2 and the experimental position of CCN 1 (Table IV). Although no explicit hydrogen bonds were identified in QUANTA (MSI, 1998), the nitrogen atom of the M2 minimum is 3.0 and 2.9 Å from the NH1 and NH2 atoms of Arg203 with donor–H–acceptor angles of 142 and 149°, respectively.

Similarly to phenol, the aliphatic portion of acetonitrile is positioned across the face of the pocket rather than directed towards the interior. In the CCN_80 structure, the nitrogen atom of acetonitrile accepts a hydrogen bond from a water molecule. Since all water molecules (except those coordinated to the zinc ion) were removed prior to the MCSS calculation, it is perhaps not surprising that MCSS fails to reproduce the binding mode of acetonitrile correctly. The lowest energy minimum is in a region where a water molecule is observed in the CCN_80 crystal structure, so although this could be a potential binding site for acetonitrile the strength of binding in this subsite is possibly insufficient to displace this water molecule.

GRID

A series of probes (hydroxyl, methyl, aromatic carbon, carbonyl oxygen, phenolic hydroxyl and nitrile) were used in the GRID program to explore interaction regions in the active site of TLN. These probes represent the main atom types present in the experimental probe molecules. Table V shows the number of favorable regions identified by GRID for the different probes below a suitable contour level. Contour levels were chosen that represent a significant attraction between the probe group and the protein and the MINIM routine within GRID was used to calculate the coordinates of minima within each map. A comparison of GRID minima with small molecule positions identified using X-ray crystallography is also presented in Table V.
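To make the idea of minima below a contour level concrete, the following sketch finds strict local minima on a small three-dimensional energy grid and keeps only those at or below a chosen cut-off. This is not the MINIM routine of GRID, whose algorithm is not described here; the function name `grid_minima`, the 26-neighbor test and the toy grid are assumptions made purely for illustration.

```python
from itertools import product

def grid_minima(energies, cutoff):
    """Return [((i, j, k), energy)] for strict local minima with energy <= cutoff.

    `energies` is a nested list indexed as energies[i][j][k]. A point counts as
    a minimum if its value is lower than all of its (up to 26) grid neighbors.
    """
    nx, ny, nz = len(energies), len(energies[0]), len(energies[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    found = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        e = energies[i][j][k]
        if e > cutoff:
            continue  # above the contour level, not of interest
        neighbors = [
            energies[i + di][j + dj][k + dk]
            for di, dj, dk in offsets
            if 0 <= i + di < nx and 0 <= j + dj < ny and 0 <= k + dk < nz
        ]
        if all(e < n for n in neighbors):
            found.append(((i, j, k), e))
    return found

# Toy 3 x 3 x 3 map (kcal/mol): flat at 0.0 with one deep well and one shallow point.
toy = [[[0.0 for _ in range(3)] for _ in range(3)] for _ in range(3)]
toy[1][1][1] = -5.0
toy[0][0][0] = -1.0
print(grid_minima(toy, cutoff=-3.8))
```

In a real map the grid indices would be converted to Cartesian coordinates using the grid origin and spacing before any comparison with crystallographic positions.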

Isopropanol. Figure 5A shows the GRID maps calculated using hydroxyl and methyl atoms as the probes, which together represent the functionality present in isopropanol. Energy contours for the hydroxyl and methyl atoms are shown in blue and orange, respectively. For the hydroxyl probe there are six main regions within the active-site cleft when the map is contoured at –10.0 kcal/mol and three of the regions are close to experimental isopropanol positions (IPA 1, 5 and 8). For these molecules the distances between the oxygen atom of the hydroxyl group and the closest minimum below the contour level are given in Table V. For the methyl probe, there are five main favorable regions using a cut-off of –3.8 kcal/mol (Figure 5A) and the distance from the center of mass of each isopropyl group (CH3CHCH3) to the closest minimum below this contour level was also measured (Table V).
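The measurement described above, from the center of an isopropyl group to the closest minimum below a contour level, can be sketched as follows. The coordinates, energies and function names are hypothetical, and the center of mass is approximated by the unweighted centroid of the three carbon atoms (equal masses, hydrogens ignored), which is an assumption of this sketch rather than a statement of how the values in Table V were obtained.

```python
import math

def centroid(points):
    """Unweighted centroid of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def nearest_minimum(query, minima, energy_cutoff):
    """Return (distance, minimum) for the closest minimum with energy <= cutoff.

    `minima` is a list of dicts with keys "xyz" and "energy"; returns None if
    no minimum lies at or below the cut-off.
    """
    candidates = [m for m in minima if m["energy"] <= energy_cutoff]
    if not candidates:
        return None
    best = min(candidates, key=lambda m: math.dist(query, m["xyz"]))
    return math.dist(query, best["xyz"]), best

# Made-up isopropyl carbon positions (Å) and methyl-probe minima (kcal/mol).
isopropyl_carbons = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
methyl_minima = [
    {"xyz": (1.0, 1.0, 2.0), "energy": -4.0},
    {"xyz": (1.0, 1.0, 0.5), "energy": -3.0},  # closer, but above the cut-off
]
center = centroid(isopropyl_carbons)
result = nearest_minimum(center, methyl_minima, energy_cutoff=-3.8)
print(center, result)
```

Filtering by energy before searching for the nearest point matters: in this toy example the geometrically closest minimum is discarded because it does not reach the –3.8 kcal/mol contour level.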



Fig. 5. Comparison of the energy contours calculated using GRID with the experimental positions of isopropanol and acetone in the active site of TLN. The protein is shown in a ball-and-stick representation with carbon, oxygen and nitrogen atoms colored white, red and blue, respectively. Experimental positions are colored green and numbered according to the observed order of binding as identified by X-ray crystallography. The active-site zinc ion is shown as a black CPK sphere. In (A), the energy contours for the hydroxyl and methyl probes are at –10.0 kcal/mol (colored blue) and –3.8 kcal/mol (colored orange), respectively. A symmetry-related molecule is colored gray. In (B), the energy contours for the carbonyl oxygen and methyl probes are at –6.0 kcal/mol (colored purple) and –3.5 kcal/mol (colored orange), respectively.

 

Table V. Comparison of GRID minima with small molecule positions identified using X-ray crystallography in the active site of TLN
 
Figure 5A shows that two of the favorable regions for the hydroxyl probe are close to the experimental position of IPA 1. One of these is spherical and clearly defined by the potential to accept two hydrogen bonds from Arg203. In fact, a water molecule is identified in all of the isopropanol structures at this site. A second region is more elongated and extends over a larger volume encompassing possible interactions with Asn112, Glu143 and the zinc ion. Also close to IPA 1 is one of the favorable regions for the methyl probe and this energy contour identifies the hydrophobic pocket and contains the lowest energy minimum in the entire map.

In the region of IPA 5, the contours of the hydroxyl probe identify the strong directionality of hydrogen bonds made by tyrosine residues (in this case Tyr106 of a symmetry-related molecule) (Figure 5A). The distance between the closest GRID map minimum and the oxygen atom of IPA 5 is only 0.4 Å (Table V). Also, a second favorable region identifies the potential hydrogen bonding between Asp150 and the B conformer of Tyr157, which was removed prior to the GRID calculation.

Close to IPA 8, two favorable regions for the hydroxyl probe were identified. The larger of these was described above, but there is also a satellite peak which identifies the potential of the hydroxyl probe to donate a hydrogen bond to Glu143 and participate in favorable electrostatic interactions with the zinc ion. The distance between the closest GRID map minimum and the oxygen atom of IPA 8 is 0.9 Å (Table V). Also close to IPA 8 is a favorable region for the methyl probe and this energy contour identifies a hydrophobic region near Trp115, His146 and the A conformer of Tyr157. The GRID energy contours calculated suggest that most of the active-site molecules of isopropanol are located at boundaries of favorable regions for the methyl and hydroxyl probe atoms. This might be anticipated since isopropanol is a small amphipathic molecule.

Acetone. For acetone, the carbonyl oxygen and methyl atoms were used as the probe groups. There are three main regions when the carbonyl oxygen map is contoured at –6.0 kcal/mol and one of these is close to the crystallographic position of ACN 1 (Table V; Figure 5B). The agreement is significant since this minimum is also the most negative in the entire map, with an interaction energy of –9.3 kcal/mol. There are six regions in the methyl map when contoured at –3.5 kcal/mol and two of the regions are close to ACN 1 (Table V). Similarly to isopropanol, the position of ACN 1 is located at a boundary of two regions, one region favoring non-polar groups and the other favoring groups capable of hydrogen bonding to the protein.

Phenol. For phenol, the phenolic oxygen and aromatic carbon atoms were used as the probe groups. The same principal regions as those located using the hydroxyl probe were identified using the phenolic oxygen as the probe group. The closest region to the oxygen atom of IPH 1 is the spherical region defined by the potential to accept two hydrogen bonds from Arg203. The largest region of attraction identified by the aromatic probe is the main specificity pocket.

Acetonitrile. For acetonitrile, the sp nitrogen and methyl atoms were used as the probes. The sp nitrogen probe identifies approximately the same regions as located using the hydroxyl probe. At the chosen contour level, there are six regions defined (not shown). These include the spherical region defined by the potential to accept a hydrogen bond from Arg203 and the location of the B side chain conformer of Tyr157 (which was removed prior to the calculation). Compared with the GRID map for the hydroxyl probe, the elongated region defined by possible electrostatic and hydrogen bonding interactions with Asn112, Glu143 and the zinc ion is reduced in size. The closest minimum to the nitrogen atom of CCN 1 is in this region and is 2.6 Å away. The relatively poor agreement probably reflects the fact that the hydrogen bonding of the sp nitrogen is not the primary determinant of acetonitrile binding. The same principal regions identified in the other structures were identified by the methyl probe.

Comparison with known inhibitors

ACN 1, IPH 1 and CCN 1 were identified to bind in the main specificity pocket of the active site of TLN and it is possible to compare their positions and orientations with similar functionality in the published TLN–ligand complexes. Some differences might be anticipated since the position of individual functional groups present in larger inhibitors may be compromised by other functionality, whereas the position and orientation of the bound solvent/solute molecules in the crystal structures should be more optimal. A similar analysis was presented for molecules of isopropanol bound in the active site of TLN (English et al., 1999).

Acetone (ACN 1) binds in the S1' subsite. The superposition of ACN_70 (1.7 Å) with the inhibitors carbobenzoxy–GlyP–(S)-Leu–(S)-Leu (ZGPLL) (5tmn) (Holden et al., 1987), N-{1-[2-(R,S)-carboxy-4-phenylbutyl]cyclopentylcarbonyl}-(S)-tryptophan (CCT) (1thl) (Holland et al., 1994) and (S)-benzylsuccinic acid (BZSA) (1hyt) (Bolognesi and Matthews, 1979) is shown in Figure 6A. The hydrogen bonding pattern of ZGPLL and CCT with TLN contains a characteristic feature of many TLN–inhibitor complexes: hydrogen bonds between the P1' substituent carbonyl oxygen and the two guanidinium nitrogens of Arg203 (Monzingo and Matthews, 1984). The same hydrogen bonding pattern is observed for the carboxylate functionality in the TLN–BZSA complex. In each case, the carbonyl or carboxylate oxygen is equidistant from both the guanidinium nitrogens, with distances for ZGPLL, CCT and BZSA of 3.0, 2.9 and 3.0 Å, respectively. Therefore, it might be anticipated that the position of acetone (ACN 1) would overlay well with this functionality within these inhibitors. The oxygen atom of ACN 1 is fairly close to the corresponding oxygen atoms in ZGPLL, CCT and BZSA, with distances of 1.4, 1.4 and 1.2 Å, respectively. However, the orientation of ACN 1 is such that it accepts only a single hydrogen bond from Arg203 (NH2 atom, 3.1 Å), with the distance to the NH1 atom ~3.5 Å and of unsuitable geometry to form a second hydrogen bond. The reason for this is clear: by compromising the hydrogen bonding, the aliphatic portion of this acetone can form favorable hydrophobic interactions with Val139 and Leu202. Indeed, the aliphatic portion of ACN 1 overlays well with the P1' substituents of ZGPLL, CCT and BZSA. Optimal hydrogen bonding interactions between ACN 1 and Arg203 would require the CH3CCH3 portion of acetone to be positioned across the face of the pocket, allowing for only limited hydrophobic interactions with the protein. Similarly, the extent of the hydrophobic interactions could be increased if acetone were to lie in the same plane as groups like the phenyl ring of BZSA, although this would exclude accepting any hydrogen bonds from Arg203.



Fig. 6. Stereo representation of TLN to compare the position and orientation of acetone (ACN 1) and phenol (IPH 1) with similar functionality present in published TLN–ligand complexes. The protein is shown in a ball-and-stick representation with carbon, oxygen and nitrogen atoms colored white, red and blue, respectively. Experimental small molecule positions are colored green and the active-site zinc ion is shown as a gray CPK sphere. In (A), the final ACN_70 model has been overlaid with the inhibitors ZGPLL (5tmn), CCT (1thl) and BZSA (1hyt). Carbon atoms of the TLN ligands are color coded as follows: ZGPLL (yellow), CCT (cyan) and BZSA (orange). The phosphorus atom in ZGPLL is colored purple. In (B), the final IPH_0.4 model has been overlaid with the inhibitors HONH-BAGN (5tln) and BZSA (1hyt). Carbon atoms of the ligands are color coded as follows: water molecules of IPH_0.4 (green), HONH-BAGN (cyan) and BZSA (orange).

 
Phenol (IPH 1) binds in the S1' subsite. The superposition of IPH_0.4 (1.9 Å) with the inhibitors HONH-(benzylmalonyl)-(S)-Ala-Gly-p-nitroanilide (HONH-BAGN) (5tln) (Holmes and Matthews, 1981) and BZSA, both of which contain phenyl rings as P1' substituents, is shown in Figure 6B. The phenyl ring of IPH 1 occupies a similar position to the phenyl rings in both HONH-BAGN and BZSA, with r.m.s.d.s of 1.0 and 0.8 Å, respectively. The slightly different orientation of IPH 1 from the phenyl side chains in both HONH-BAGN and BZSA is probably due to several factors. These include the hydrogen bonding between IPH 1 and Glu143 and the electrostatic interactions between the hydroxyl group of IPH 1 and the zinc ion. In this structure, a water molecule can be envisaged as mimicking the oxygen atom of the peptide bond in many of the TLN–inhibitor complexes, including that of HONH-BAGN and the carboxylate functionality in BZSA. The oxygen atom of IPH 1 is only 0.5 Å from the corresponding oxygen in BZSA.
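The r.m.s.d. values quoted for the phenyl rings compare matched atoms in two structures that already share a common frame after superposition of the protein. A minimal sketch of such a calculation is given below; the function name and the coordinates are hypothetical, and no fitting of the ligand atoms themselves is performed, which is an assumption about the procedure rather than something stated in the text.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equally ordered atom lists.

    The structures are assumed to be in the same coordinate frame already;
    no additional superposition is carried out here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Made-up example: two atoms displaced by exactly 1 Å along z.
ring_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ring_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(ring_a, ring_b))
```

For a symmetric group such as a phenyl ring the atom correspondence is not unique, so in practice the pairing that gives the lowest value would normally be the one reported.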


    Discussion
 
The crystallographic results described in this paper demonstrate that locating and characterizing interaction sites for small molecules using MSCS can be straightforward and rapid. In contrast to studies on elastase (Allen et al., 1996; Ringe and Mattos, 1999) and subtilisin (Fitzpatrick et al., 1993), crystals of TLN were found to be stable in a range of organic solvents and did not require cross-linking.

For this study, we determined the crystal structure of TLN from crystals soaked in various concentrations of acetone, acetonitrile and phenol. An increasing number of solvent interaction sites could be identified as the solvent concentration was increased. The fact that only two binding sites were identified for phenol (despite its containing a hydroxyl moiety and an aromatic ring) probably stems from the relatively low concentrations achieved in the soaking experiment. The lack of binding sites for acetonitrile can be attributed to its weak potential for hydrogen bonding and hydrophobic interactions. In addition, its small size means that attributing electron density to acetonitrile is considerably more difficult than for the other fragments.

Combining these results with those from the isopropanol series for TLN (English et al., 1999) has allowed an experimental functionality map to be constructed. The main features of the experimental functionality map of TLN are strikingly similar to those of porcine pancreatic elastase (Mattos and Ringe, 1996), where clusters of probe molecules were identified to bind principally in the S1 site and a second site 20 Å away from the active site. Both these studies are consistent with the view that on the surface of a protein there exist a few regions or `hot spots' that provide most of the binding affinity (Clackson and Wells, 1995).

The computational methods MCSS and GRID identified essentially the same interaction sites in the active site of TLN; however, of these only a handful were identified experimentally. Comparison of the computational interaction sites close to experimental solvent positions showed that in some cases the observed binding modes are fairly accurately predicted (e.g. IPA 5 and 8, ACN 1), whereas in other cases there is a large discrepancy (e.g. IPA 1, IPH 1, CCN 1). Such a disparity in the predictions might be anticipated since entropic and solvation effects were not explicitly included in the calculations (Caflisch, 1996). In general, electrostatic interactions dominate the computational predictions as they tend to be overestimated in vacuo. This comparison serves to highlight the amphipathic nature of these probe molecules (particularly isopropanol and acetone), with the observed binding mode representing a compromise between hydrophobic and hydrogen bonding interactions.

There are several limitations of the MSCS methodology. The original concept of soaking protein crystals in high concentrations of organic solvents is limited owing to the fragile nature of crystals. Clearly, moderately high-resolution data (~2.0 Å) are a prerequisite for this approach since protein-bound organic solvent must be reliably distinguished from water molecules. MSCS will readily identify molecules of organic solvents that are well ordered, although peaks of electron density can be attributed to solvent binding in a disordered manner. It is also likely that there are other binding sites, where solvent molecules are highly disordered or very mobile, that MSCS will fail to identify. Despite these limitations, MSCS could have an impact in several areas of structural biology. Applied to therapeutically relevant systems, MSCS could be used to identify key interaction sites on the protein, optimize known inhibitors or provide a basis for new leads. For example, functional groups within protein-bound solvent molecules could be linked together computationally to design novel ligands, which could be synthesized and their binding modes determined. Clearly, a requirement for this is the position of solvent molecules in several different specificity pockets within the active site. There are also possible applications in the use of the structural information derived from MSCS experiments to search databases for lead molecules. The experimental functionality map identifies the spatial relationship of groups that can participate in non-covalent interactions (i.e. the pharmacophore), which could be used to search databases of 3D structures for active compounds.

MSCS could aid the design of more focused combinatorial libraries, by screening a target protein with potential fragments prior to their incorporation into a library. This is something that the SAR by NMR technique has achieved effectively (Shuker et al., 1996; Hajduk et al., 1997). Where the NMR-based approach succeeds is in its ability to screen compounds rapidly for biological activity and the direct determination of binding constants. Where the target protein can be successfully 15N-labeled and the system is suitable for NMR studies, this technique represents the obvious choice for experimentally screening libraries of compounds. In comparison, MSCS is slow and is limited in that only structural information can be derived. However, where MSCS is successful is in providing very detailed information about protein–ligand interactions. With the continuing advances in X-ray sources, detectors and computing power, the acquisition and processing of data will become increasingly rapid and a situation can be envisaged whereby organic molecules could be screened against target proteins within a matter of hours.


    Accession numbers
 
Atomic coordinates for the refined models have been deposited in the Protein Data Bank with the accession codes 1fj3, 1fjo, 1fjq, 1fjt, 1fju, 1fjv and 1fjw.


    Notes
 
2 Present address: Molecular Simulations Inc., 200 Wheeler Road, South Tower, 2nd Floor, Burlington, MA 01803, USA

4 To whom correspondence should be addressed. E-mail: rod{at}ysbl.york.ac.uk


    Acknowledgments
 
This research was supported by a grant from the BBSRC and a CASE award from Pfizer Central Research. We thank Johan Turkenburg for help during data collection and processing of the 1.7 Å ACN_70 data set and gratefully acknowledge the use of Daresbury SRS.


    References
 
Allen,K.N., Bellamacina,C.R., Ding,X.C., Jeffery,C.J., Mattos,C., Petsko,G.A. and Ringe,D. (1996) J. Phys. Chem., 100, 2605–2611.

Bacon,D. and Anderson,W.F. (1988) J. Mol. Graphics, 6, 219–220.

Böhm,H.J. and Klebe,G. (1996) Angew. Chem., Int. Ed. Engl., 35, 2589–2614.

Bolognesi,M.C. and Matthews,B.W. (1979) J. Biol. Chem., 254, 634–639.

Brooks,B.R., Bruccoleri,R.E., Olafson,B.D., States,D.J., Swaminathan,S. and Karplus,M. (1983) J. Comput. Chem., 4, 187–217.

Brunger,A.T. and Karplus,M. (1988) Proteins: Struct. Funct. Genet., 4, 148–156.

Caflisch,A. (1996) J. Comput.-Aided Mol. Des., 10, 372–396.

Caflisch,A., Miranker,A. and Karplus,M. (1993) J. Med. Chem., 36, 2142–2167.

CCP4. Collaborative Computational Project, No. 4. (1994) Acta Crystallogr., D50, 760–763.

Clackson,T. and Wells,J.A. (1995) Science, 267, 383–386.

Ducruix,A. and Giegé,R. (1991) Crystallisation of Nucleic Acids and Proteins: a Practical Approach. IRL Press, Oxford.

English,A.C., Done,S.H., Caves,L.S.D., Groom,C.R. and Hubbard,R.E. (1999) Proteins: Struct. Funct. Genet., 37, 628–640.

Esnouf,R.M. (1997) J. Mol. Graphics, 15, 132–134.

Fitzpatrick,P.A., Steinmetz,A.C.U., Ringe,D. and Klibanov,A.M. (1993) Proc. Natl Acad. Sci. USA, 90, 8653–8657.

Goodford,P.J. (1985) J. Med. Chem., 28, 849–857.

Hajduk,P.J., et al. (1997) J. Am. Chem. Soc., 119, 5818–5827.

Holden,H.M., Tronrud,D.E., Monzingo,A.F., Weaver,L.H. and Matthews,B.W. (1987) Biochemistry, 26, 8542–8553.

Holland,D.R., Barclay,P.L., Danilewicz,J.C., Matthews,B.W. and James,K. (1994) Biochemistry, 33, 51–56.

Holland,D.R., Hausrath,A.C., Juers,D. and Matthews,B.W. (1995) Protein Sci., 4, 1955–1965.

Holmes,M.A. and Matthews,B.W. (1981) Biochemistry, 20, 6912–6920.

Leslie,A.G.W., Brick,P. and Wonacott,A.J. (1986) CCP4, 18, 33–39.

Mattos,C. and Ringe,D. (1996) Nature Biotechnol., 14, 595–599.

Matthews,B.W. (1988) Acc. Chem. Res., 21, 333–340.

McDonald,I.K. and Thornton,J.M. (1994) J. Mol. Biol., 161, 269–288.

Merritt,E.A. and Murphy,M.E.P. (1994) Acta Crystallogr., D50, 869–873.

Miranker,A. and Karplus,M. (1991) Proteins: Struct. Funct. Genet., 11, 29–34.

Molecular Simulations Inc. (1998) QUANTA. Molecular Simulations Inc., San Diego, CA.

Monzingo,A.F. and Matthews,B.W. (1984) Biochemistry, 23, 5724–5729.

Murshudov,G.N., Vagin,A.A. and Dodson,E.J. (1997) Acta Crystallogr., D53, 240–255.

Oldfield,T.J. (1996) In Proceedings of the CCP4 Study Weekend, SRS Daresbury Laboratory, Warrington, UK. SRS Daresbury Laboratory, Warrington, pp. 67–74.

Otwinowski,Z. (1990) DENZO Data Processing Package. Yale University, New Haven, CT.

Otwinowski,Z. and Minor,W. (1997) Methods Enzymol., 276, 307–326.

Ringe,D. and Mattos,C. (1999) Med. Res. Rev., 19, 321–331.

Schmitke,J.L., Stern,L.J. and Klibanov,A.M. (1998) Biochem. Biophys. Res. Commun., 248, 273–277.

Shuker,S.B., Hajduk,P.J., Meadows,R.P. and Fesik,S.W. (1996) Science, 274, 1531–1534.

Wallace,A.C., Laskowski,R.A. and Thornton,J.M. (1995) Protein Eng., 8, 127–134.

Received August 23, 2000; revised October 31, 2000; accepted November 9, 2000.