* U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Mid-Continent Ecology Division, 6201 Congdon Boulevard, Duluth, Minnesota 55804; and
Bourgas University "Prof. As. Zlatarov," Laboratory of Mathematical Chemistry, Department of Physical Chemistry, 118010 Bourgas, Bulgaria
Received April 17, 2000; accepted July 31, 2000
![]() |
ABSTRACT |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: structure activity relationships; expert systems; human estrogen relative binding affinity; estrogen receptor ligands.
![]() |
INTRODUCTION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Typical approaches to quantifying 3-D similarity in the context of ligand-receptor interactions encompass pharmacophore (or toxicophore) search methods and receptor site mapping. However, selecting appropriate molecular conformations and obtaining structural alignments can be quite challenging with these methods. The use of the lowest-energy conformers to assess similarity in pharmacophore search and receptor-mapping algorithms is common, but inappropriate, because in complex systems such as biological tissues and fluids, chemicals are likely to exist in a variety of conformational states. In fact, the lowest-energy gas-phase conformations might be the least likely to interact with macromolecules (Eliel, 1993), and solvation and binding interactions could more than compensate for energy differences among the conformers of a chemical (Bradbury et al., 1996
, 1998
; Mekenyan et al., 1994a
, 1996a
, Mekenyan et al., b
; Wiese and Brooks, 1994
). In terms of appropriate chemical alignment, most modeling algorithms explore hundreds of alignments to reach an optimum outcome which, if not carefully evaluated in the context of a presumed mechanism of interaction with the receptor, may be susceptible to violation of the criteria of Topliss and Edwards (1979) for causality in SAR models. Alignment errors also can lead to models that are incorrect or are poorly predictive.
To address these issues, we recently described a technique to generalize the use of multiple conformers in an active analogue approach (Mekenyan et al., 1997, 1999
). The common reactivity pattern (COREPA) approach circumvents the problems of conformer alignment and selection, and initial assumptions concerning specific atoms/fragments in a pharmacophore are not obligatory. In this respect, the method implicitly defines the common reactivity pattern across global and local reactivity descriptor(s) potentially associated with the specific biological endpoint under study. As described by Mekenyan et al. (1997, 1999), the 3 principal steps of the algorithm are: (1) definition of a training set of chemicals; (2) evaluation of stereoelectronic descriptors hypothesized to be associated with compounds exerting similar biological activity; and (3) recognition of the common reactivity pattern for those compounds. In the most recent version of the technique (COREPA-C), the common reactivity patterns are described in terms of probabilistic functions (Mekenyan et al., 1999
; Schmieder et al., 2000
). This feature improves the means of quantifying chemical similarity and expressing prediction uncertainties based on relative differences in the measured biological activity, in this case relative binding affinity.
To initially develop the COREPA algorithms, stereoelectronic requirements associated with the binding of 28 steroidal and nonsteroidal ligands to the androgen receptor (AR) were defined (Mekenyan et al., 1997, 1999
). In the present study, the COREPA algorithm was further evaluated by assessing stereoelectronic requirements for the binding of 45 diverse structures to the human estrogen receptor
(hER
). Specifically, the algorithm was employed to establish reactivity patterns for ligand subsets derived from a training set of chemicals with relative binding affinities (RBAs) of >150%, 10010%, 101%, and 10.1%. Based on the results of this analysis, an exploratory expert system was developed for use in assigning potential hER
RBA to chemicals for ranking and prioritizing large chemical data sets for subsequent testing.
![]() |
MATERIALS AND METHODS |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
ER ligand conformations.
Conformer generation was based on a combinatorial procedure that initiates from molecular topology and generates all conformers consistent with steric constraints and expert rules (Ivanov et al., 1994). In generating conformers, the torsion resolution around "saturated" (sp3-sp3) bonds was 120°, using an initial torsion angle of 60° with respect to the plane of the preceding 3 atoms (Bradbury et al., 1996
; Mekenyan et al., 1997
, 1999
). Distance between nonbonded atoms was set at 1.5 Å, while a range of 1.2 to 1.8 Å was imposed for ring closure. Due to the rigidity of the natural steroids and their derivatives, less restrictive geometric constraints for ring closures (1.02.5 Å) were imposed to generate a sufficiently large number of conformations with the same stereospecificity as the natural enantiomers (i.e., B/C trans and C/D trans ring fusion). Combinatorial problems were encountered for chemicals 4, 7, 10, 12, 13, 36, and 43, due to the high degree of flexibility in their acyclic fragments. The number of conformers initially generated for those chemicals was reduced by not permitting rotation around the 2 most peripheral CC bonds. Up to 500 of the sterically most distinct points from the conformational space for each chemical were selected. Geometric dissimilarity was assessed, based on Euclidean distances between the sums of interatomic distances for the conformers.
Each of the generated conformations was submitted to a strain minimization technique (pseudo-molecular mechanics, PMM) based on a simple energy-like function, where only the electrostatic terms are omitted (Ivanov et al., 1994). Subsequently, conformational degeneracy, due to molecular symmetry and geometry convergence was detected within a 30° range of torsion angle differences. Next, geometry optimization was achieved by employing MOPAC 93 (Stewart, 1990
, 1993
), using the AM1 Hamiltonian with the key words "PRECISE" and "NOMM". As a result of the optimizations, some of the conformations quenched into the same energy minima, further reducing the number of conformers. Finally, conformers were screened to eliminate those whose
Hf° was
20 kcal/mol more than the conformer with the absolute energy minimum. This 20 kcal/mol threshold was based on experimental evidence that the free energy of binding for steroid hormones is in the range of 10 to 20 kcal/mol (Anstead et al., 1989
, 1997
; Wiese and Brooks, 1994
), which can provide the necessary energy to elevate conformers from the low(est) energy state during binding. As reported previously, conformers selected within this range of
Hf° are energetically reasonable from a thermodynamic and kinetic perspective (Bradbury et al., 1998
; Ivanov et al., 1998
; Mekenyan et al., 1997
, 1999
).
For a given compound, conformers within the specified 20 kcal/mol range of Hf° often exhibited significant variation in potentially relevant electronic descriptors (Table 1
). For example, conformers of ß-zearalanol (chemical 13) had a range 0.449 eV for the energy of lowest unoccupied molecular orbital (ELUMO), 0.189eV for energy of highest occupied molecular orbital (EHOMO), 0.425 eV for EHOMO-LUMO, and 3.89 D for dipole moment (µ). Similar variations were observed for other nonsteroidal compounds. Descriptor ranges for the steroids, while smaller, are also noteworthy. For example, conformers of moxestrol (compound 11) had a range of 0.460 eV for ELUMO, 0.470 eV for EHOMO, 0.648 eV for EHOMO-LUMO, and 0.92 D for µ. The observation that relatively small energy differences between conformers can result in significant variations in electronic structure highlights the necessity of including all energetically reasonable conformers when defining common reactivity patterns.
To aid in interpretation of conformational flexibility for this data set, Table 1 provides the range of root mean square (RMS) differences between atoms of the conformers for each compound, based on comparisons of each conformer with the lowest-energy structure. Consistent with their greater rigidity, smaller RMS ranges were associated with steroid derivatives than with the nonsteroids. For example, RMS ranges of 2.423 to 8.641, 0.152 to 0.564, and 0.318 to 0.577 were derived for hexestrol (1), estrone (8), and estriol (14), respectively.
Molecular descriptors.
The global and local electronic descriptor pool used in this study was restricted to those hypothesized to be associated with ER binding affinity, based on previous studies with a variety of model receptors (e.g., Bradbury et al., 1996; Mekenyan et al., 1997, 1999; VanderKuur et al., 1993; Waller et al., 1995, 1996a, Waller et al., b; Wiese and Brooks, 1994; Wurtz et al., 1996). Stereoelectronic descriptors were calculated with MOPAC 93, augmented by a computing module that provides additional reactivity descriptors (Mekenyan et al., 1994b), using the AM1 all-valence electron, semi-empirical Hamiltonian. Electronegativity, dipole moment, energy of frontier orbitals, and the electronic gap were used as global electronic descriptors; whereas atomic charges, donor and acceptor superdelocalizability indices, and atomic self-polarizabilities were calculated as local electronic indices. In the present study, atomic reactivity indices were not restricted to specific rings in steroidal or nonsteroidal derivatives when searching common patterns based upon local descriptor distributions.
Conformer structures were also assessed based on steric descriptors, including the sum of geometric distances (Mekenyan et al., 1986), the greatest interatomic distance, steric distance between atoms, and planarity (the normalized sum of torsion angles in a molecule; Mekenyan et al., 1996b). These descriptors were selected because hydrophobicity, steric bulk, and size constraints also have been reported as important in predicting and interpreting ligand binding for nuclear receptors (e.g., Bradbury et al., 1998; Goldstein et al., 1993; Lewis et al., 1995; Waller et al., 1995, 1996a; Wurtz et al., 1996). Finally, volume polarizability, defined as a sum of atomic self-polarizabilities, and thus, the averaged ability of a compound to change electron density at its atoms during chemical interactions (Lewis, 1989; Schüürman, 1990), was also employed. The use of volume polarizability was based on previous observations suggesting that more polarizable conformers generally had greater ER binding affinities (Bradbury et al., 1996
).
The COREPA-C method.
A brief summary of the COREPA method is provided below. The conceptual basis and detailed mathematical formulations, algorithm, and illustrations of the method are reported elsewhere (Mekenyan et al., 1997, 1999
; Schmieder et al., 2000
).
To employ the COREPA method, an exhaustive conformer generation routine (described previously) is used to establish conformers of each chemical within a certain energy range of the lowest energy structure. In the present study, it was assumed that conformers of each chemical could be considered as a statistical ensemble, based on the Boltzman's statistics. As described previously, the electronic and steric attributes of the conformers then are assessed using a predetermined set of global and local molecular descriptors. All conformers of a given chemical are plotted across a molecular descriptor axis, thus forming a discrete distribution for the chemical relative to the selected descriptor. For the global molecular descriptors, each conformer is represented by single point value, whereas for atomic descriptors, several values for each conformer are allocated across the descriptor axis. These values are associated with various local sites (atoms) of the conformer. These descriptor point estimates are considered to be midpoints of a continuous (gamma) distribution that approximates the discrete distribution. Subsequently, a conformer distribution of a chemical is represented by a continuous function obtained by summing all gamma distributions corresponding to conformers of the specified chemical. Conformer distributions of a chemical are normalized by dividing the total distribution area by the number of conformers (i.e., normalizing the distribution area to unity for each chemical), thus providing a probabilistic characterization of the distributions.
The COREPA algorithm consists of 3 steps: First, two subsets of chemicals are selected as training sets (Step 1). The first subset consists of chemicals having activity (here, in terms of RBA) above a user-defined high activity threshold (HAT). The second subset includes chemicals having activity below a predetermined non-active threshold (NAT). Next, a set of descriptors associated with the biological activity of interest are established by evaluating the degree of overlap (in %) between the distributions associated with the HAT and NAT patterns (denoted as HAP [high activity pattern] and NAP [non-active pattern, i.e., the reactivity pattern based on the learning set of non-active chemicals], respectively) (Step 2). The descriptors are evaluated based on the normalized sum of dynamic similarity indices, S(AP/NAP), the portion of the non-active pattern exceeded by the maximum of the active pattern (3-D similarity measure between reactivity patterns of active between each pair of molecules in the training set (Mekenyan et al., 1999). The cutoffs, i.e., the part of the non-active area in common with the active pattern maximum (Cutoff(AP/NAP)), can also be used as a measure of similarity. We assume the stereoelectronic descriptors that provide the maximal measure of similarity among chemicals in the training sets, and have least overlap between HAP and NAP (i.e., most distinct HAP and NAP), to be related to biological activity. Finally, common reactivity patterns for biologically similar molecules (e.g., chemicals within a defined range of RBA values) are obtained as products of the probabilistic distributions for specific stereoelectronic descriptors associated with chemicals in training sets of the active and non-active chemicals (Step 3). Well defined, or distinct, patterns for descriptors are observed when the conformer distributions for the chemicals from the same training set are in phase.
The dissimilarity between overall patterns of active and non-active chemicals, as well as between overall patterns and chemical-specific distributions, can be evaluated by Euclidean distance based on the squared differences between distribution densities over the entire range of the descriptor variation. The Euclidean distance metric can also be used to compare distributions derived from different chemical training sets or for training sets derived from different weighting schemes (e.g., different Hf° thresholds for conformer selection). The Euclidean metric distance can be used to ascertain the extent to which an overall conformer distribution of active or non-active chemicals is influenced by a specific chemical(s). In this respect, the "stability" of a pattern can be assessed by a "leave-one-out" procedure. The metric is used to iteratively assess differences between patterns derived for n vs. n-1 chemicals in the training subsets. Variation of similarity indices, cutoffs between HAP and NAP, and descriptor ranges can also be quantified. More stable patterns are associated with smaller Euclidean distances, reduced variations in similarity indices, and equivalent smaller descriptor ranges.
The common reactivity patterns are described in terms of molecular descriptor ranges around the probability maxima of the distributions. The width of these ranges depends on values of , which is related to the half-width of the gamma function, and confidence limits chosen around the probability maxima. Based on our previous study (Mekenyan et al., 1999
), default
values of 0.1 to 0.125 of the variation range for global descriptors, and 0.01 or 0.05 of the variation for local indices were used. Ultimately, a common reactivity pattern is a collection of the specified ranges of each molecular descriptor determined to be associated with the biological activity of concern.
In the present study, atomic sites with differing levels of generality were used in establishing reactivity patterns. Wild-card heteroatoms are denoted by R, where R stands for all (or a specified subset) of the heteroatoms in the molecules. In subsequent analyses, R was assumed to represent the following 3 groups of atoms: O; O, N; and O, N, Cl, F.
Rule interpreter.
To facilitate screening of large chemical data sets, common reactivity patterns were coded into a decision tree (Schmieder et al., 2000). The decision tree consists of multiple hierarchically ordered rules that capture specific stereoelectronic descriptors that comprise common reactivity patterns. Each energetically reasonable conformer of a chemical is processed through the decision tree by making use of an interpreter that is based on an extended SMILES notation that permits the use of stereoelectronic structure-based rules. Boolean logic operators are used to establish "rules" in the decision tree. In the context of this study, a rule statement results in the assignment of the likelihood of binding to the hER
which is the probability of having a RBA value equal to or greater than a predetermined threshold. If the value of a parameter calculated for a conformer falls in a range of the molecular descriptor defined by a confidence limit with a probability of P%, around the pattern maximum, then it is assumed that the conformer meets the specific requirement with a probability (100-P)%. If a chemical has to meet 2 successive stereoelectronic requirements to equal or exceed an activity threshold (with probabilities nA and nB, respectively), the total probability of meeting both requirements (PA and B) is obtained as a product of the probabilities of meeting the two requirements separately, i.e., PA and B = nA.nB. The total probability (PA or B) of meeting either one of the requirements, in case of "or" binary logic, is obtained as PA or B = nA + nB nA.nB. If the value of the descriptor calculated for a conformer falls outside of at least one of the parameter ranges, then the overall probability of having an RBA above that threshold is 0. As seen, the approach offers flexibility in establishing hazard ranking protocols for unknown compounds based on choices of RBA thresholds and confidence limits around pattern maxima. It must be stressed that the probability outcomes from the decision tree should not be viewed in absolute terms. Rather, the output from the algorithm permits a relative ranking of unknown chemicals in terms of their likelihood to have an RBA above a user-defined threshold.
To simplify presentation of the rule interpreter, a binary version of the decision tree was employed in the present study. In the binary version, a value of 100% is assigned to a chemical if at least one conformer falls within the range of the molecular descriptor defined by a confidence limit with a probability of P% around the pattern maximum. This simplified version provides a discrimination of chemicals as being active or non-active. Thus, chemicals with similar hER affinity (within an RBA range) have at least one conformer that meets all the specified parameter ranges, whereas those from the other RBA ranges should have no conformers that meet all the multiparameter requirements simultaneously.
![]() |
RESULTS AND DISCUSSION |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Although user-defined, the ability of different RBA ranges to discriminate active and non-active chemicals can be assessed by estimating similarity between the respective reactivity patterns defined in Steps 2 and 3. Lower similarity between the patterns associated with HAP and NAP indicates a low probability of incorrectly assigning a conformer to an RBA range.
Step 2. Identification of stereoelectronic descriptors associated with biologically similar compounds.
As discussed previously, Step 2 of the algorithm consists of evaluating normalized pairwise similarity (S(AP/NAP)) between chemicals in the different RBA ranges across the steric and electronic indices. The calculated indices are listed in Table 2. Average similarity (Syz(x)) among the chemicals within the training sets (within-group similarity) are also presented in Table 2
.
|
Among the local electronic descriptors, charges (Q(R)) and donor-delocalizabilities (SE(R)) of heteroatoms had lowest between-group similarities for a wide range of HAR/NAR comparisons. Consistency between RBA and pattern similarities was also observed for local electronic indices. Again, the lowest between-group similarities for Q(R) and SE(R) were obtained for the training sets derived from the largest activity threshold separations (e.g., S(HAP1/NAP2) = 3.2 and 4.2% for Q(R: O, N, Cl, F, S) and SE(R: O, N, Cl, F, S), respectively). Alternatively, the largest between-group similarity was obtained for training sets with greatest biological similarity (e.g., S(HAP4/NAP2) = 30.5 and 30.4% for Q(R: O, N, Cl, F, S) and SE(R: O, N, Cl, F, S), respectively). No significant differences were found in similarity assessments based on Q(R) and SE(R) for the different types of R.
Among the steric descriptors, distances between heteroatoms, d(R_R), were found to provide the lowest between-group similarity, which was observed for the training sets derived from the largest RBA separations (i.e., S(HAP1/NAP1) = 20.4% (data not shown) and S(HAP1/NAP2) = 31.8%). Moreover, between-group similarity assessments were found to depend on the types of heteroatoms included in d(R_R). Thus, the highest discrimination ability between HAPs and NAPs was for R: O, N, Cl, F, S, with the lowest discrimination when R was restricted to O only.
Based on these results, EHOMO was employed as the global descriptor in the COREPA analyses. In addition, the local electronic descriptors Q(R) or SE(R) where used, as well as the distances between heteroatoms, as a local steric descriptor. COREPA analysis of similarity within and between a variety of pairings of HAPs and NAPs, with respect to EHOMO, Q( R), SE(R), and d(R_R) suggested that the greatest discrimination between patterns was associated with largest RBA range separation, i.e., between HAR1 or HAR2, and NAR2 (Table 2).
A leave-one-out analysis based on the above parameters also indicated that patterns derived from training sets with the highest RBA ranges (HAR1 or HAR2) were more "stable" than those obtained for the lower RBA ranges of HAR3 and HAR4. For example, with HAR1, the 10% confidence limit for d(R_R) was 11.74 to 11.91 Å ( = 2.47). The mean distance range from the leave-one-out analysis was 11.75 (11.5911.91) Å to 11.91 (11.8112.10) Å. The 10% confidence limit d(R_R) with HAR4 was 10.46 to 10.72 Å, while the associated mean-distance range from the leave-one-out analysis was 10.13 (2.8610.70) to 10.40 (3.2710.90) Å. The stability of patterns also can be assessed in terms of Euclidean distance. The mean Euclidean-distance values for active patterns derived from leave-one-out analyses for HAR1, compared to the pattern for the intact training set, were 0.019 and 0.0027, respectively, for EHOMO (
= 0.55) and d(R_R) (
= 2.47). The same analysis, based on chemicals in HAR4, resulted in mean Euclidean distance values of 0.032 and 0.0074 for EHOMO and d(R_R), respectively. Thus, the COREPA analysis supports the biologically reasonable hypothesis that ligands with higher RBA values are associated with more specific receptor interactions. Consequently, these ligands are associated with more stable reactivity patterns and are more reliable in determining hER
ligand binding affinity for this data set.
Step 3. Recognition of the common reactivity pattern based on relevant molecular descriptors.
The variation of parameter ranges of the relevant molecular descriptors are given in Table 3, parts ad, for the 4 different HARs, respectively. A collection of these ranges forms the reactivity patterns that were used to define rules in the hER
ligand decision tree. Patterns based on the global electronic descriptors were derived for different RBA ranges at a constant
value of 1 eV or 3 eV. To increase the specificity of d(R_R)-based patterns, the latter were analyzed within smaller descriptor distances from 9 to 10 Å, 9 to 10.5 Å, 10.0 to 11.5 Å, 10.3 to 10.7 Å, or 11.5 to 13 Å. Following is a detailed discussion of the reactivity patterns associated with each molecular descriptor.
|
|
Reactivity Pattern Based on Donor Delocalizabilities and Charges
The comparison between activity patterns based on donor delocalizability, SE(R: O, N, Cl, F), for different HARs and NAR2 is illustrated in Figures 3ad, for
= 0.05 (a.u.)2/eV (also see Table 3
, parts ac). The discrimination ability of the SE patterns decreases with a decrease in RBA values (e.g., S(HAR1/NAR2) = 4.2%, whereas S(HAP4/NAP2) = 30.4%). An analysis of the HAPs indicate this pattern is predominantly due to R attached to a phenyl moiety. If a lower
(0.01 (a.u.)2/eV) is employed on a restricted SE(R) range, additional probability maxima were detected (Figs. 4
, Table 3d
). This analysis indicates that the probabilistic distribution maximum with the lowest SE(R) value (around 0.247 (a.u.)2/eV) is associated with an electron-withdrawing R attached to aromatic fragments. The maximum with an SE(R) value of about 0.26 (a.u.)2/eV is associated with an R attached to non-aromatic rings, while the highest SE(R) maximum of about 0.30 (a.u.)2/eV (identified in the SE(R) range of 0.3 to 0.4 (a.u.)2/eV; data not shown) is associated with halogen heteroatoms (ligands included in HAR4).
|
|
|
|
Derivation of hER Ligand Reactivity Patterns
Mekenyan et al. (1999) reported previously that distributions based on larger values, or larger confidence limits, will lead to wider ranges in EHOMO, d(R_R) and Q(O) (or SE(R)) maximum probability values. In general, larger descriptor ranges are associated with a greater likelihood that ligands will be incorrectly identified as having an RBA value within the specified HAR (i.e., an increasing rate of false positive identifications). Conversely, smaller descriptor ranges increase the rate of false negative identifications. It is important to note that due to the probabilistic nature of the algorithm, it is possible that the choice of an active pattern based on a small
or confidence limit can lead to incorrect classifications of conformers from an active ligand. For example, when using an EHOMO profile derived for the most potent hER
ligands in this data set (HAR1) in a single descriptor screen of 8.72 to 8.60 eV (
= 3.0 eV; 10% confidence limit; see Table 3a
), all of the conformers of ligands 1 and 3 were incorrectly identified as having an RBA < 100%. Consequently, larger confidence limits were required to establish patterns that would include conformers for all of the ligands included in HAR1. Thus, if the 50% confidence limit of 8.99 to 8.37 eV (at
= 3.0 eV) is used, all conformers of the 4 ligands in HAR1 were screened correctly, except for one conformer of ligand 3. The similarity between a ligand distribution for all conformers of one chemical and the EHOMO reactivity pattern for all chemicals in HAR1 (S(k/AP) ranged from 70.2% for ligand 1 to 76.2% for ligand 4. This single descriptor screen, however, does lead to false positive classifications (i.e., prediction of RBA > 150%) for less active chemicals: HAR 2 ligands 5, 712, 14, and 15; HAR 3 ligands 16, 18, and 21; HAR 4 ligands 27 and 28; and NAR1 ligands 33 and 37, respectively. For these compounds, the similarity in ligand distributions for EHOMO with the corresponding activity pattern ranged from 69.1% for ligand 11 to 78.8% for ligand 12.
Similarly, if a d(R_R) range of 11.98 to 12.01 Å ( = 0.5 Å; 10% CL; see Table 3
, part a) derived for HAR1 is used as a single descriptor screen, all of the conformers of ligands 13 were incorrectly identified as having an RBA < 100%. However, after using the 90% confidence limit of 11.77 to 12.22 Å (at
= 0.5 Å), conformers of chemicals 14 were correctly identified. The similarity between ligand distributions and this reactivity pattern ranged from 47.1% for ligand 4 to 68.0% for ligand 1. The larger descriptor screen, however, resulted in false positive classifications of ligands 7, 10, 19, 23, 45, and 46, respectively. The similarity between individual ligand distributions and the d(R_R)-based activity pattern ranged from 41.5% for ligand 10 to 67.0% for ligand 19.
The selection of specific and/or confidence limit values for deriving HAR1 screening rules was based on a strategy to first minimize the probability of false negative identifications, while secondarily minimizing the number of false positive identifications. Ultimately, a 3-descriptor screen based on EHOMO, d(R_R) and Q(R) (or SE(R)) was used to minimize the rate of false negative identifications, while maintaining a low rate of false positive identifications. For a ligand to be classified as active within a specified RBA range, at least one conformer was required to fall within all 3 of the descriptor ranges. For HAR1, a screening pattern with EHOMO of a 8.99 to 8.37 eV(
= 3.0 eV; 50% confidence limit, Table 3
, part a) combined with the least conservative d(R_R) range of 11.77 to 12.22 Å (confidence limit = 90%) and a Q(R) range of 0.272 to 0.233 a.u. (
= 0.05 a.u; 90% confidence limit, Table 3
, part a), imposed on both electronegative sites forming the d(R_R), correctly discriminated chemicals 14 from chemicals 546.
The above classification of an "unknown" ligand was based on whether or not one or more conformers for the compound fell within any of the EHOMO, d(R_R), and Q(R) activity patterns. Alternatively, values of Euclidean distance or S(k/AP) for comparisons of distributions of unknown compounds to an HAR could be used to screen ligands. For the EHOMO-based pattern, Euclidean distance and S(k/AP) values for ligands included in HAR1 varied from 0.18 to 0.21 and from 72.5 to 77.2%, respectively. For d(R_R), Euclidean distance and S(k/AP) values varied from 0.58 to 0.91 and 47.1 to 68.0%, respectively, whereas for Q(R), the Euclidean distance and S(k/AP) values ranged from 3.04 to 3.56 and from 45.9 to 54.2%. Based on S(k/AP), thresholds of 72.5%, 47.1%, and 45.9% for EHOMO, d(R_R), and Q(R), respectively, could be used in a 3-parameter screen. Employing this as a screen, chemicals 546 were also correctly classified as having RBA values < 150%. Using Euclidean distance or S(k/AP) to assess unknown ligands facilitates a more quantifiable approach for determining similarity in reactivity patterns. The use of these similarity metrics also permits the sensitivity of different thresholds to be readily evaluated in terms of potential rates of false negative and positive classifications of unknown ligands.
A 3-descriptor screen also was developed for HAR2 based on EHOMO, d(R_R), and Q(R). An EHOMO pattern of 9.44 to 8.32 eV ( = 3.0 eV; 90% confidence limit, Table 3
, part b) combined with d(R_R) ranges of 10.62 to 10.95 Å (
= 0.5 Å, confidence limit = 90%), and the requirement that at least one of these heteroatoms meets the least conservative Q(R) screen of 0.273 to 0.236 a.u. (
= 0.05 a.u; 95% confidence limit [not shown]), successfully discriminated nine out of 11 ligands from the training set (chemicals 515) as having RBA values between 10 and 100%. The threshold values for Euclidean distance and S(k/AP) associated with EHOMO, d(R_R), and Q(R) reactivity patterns in the HAR2, were: 0.34 and 58.0%, 1.51 and 22.7%, and 5.56 and 18.4%, respectively. The conformers of chemical 10 (nafoxidine) had higher EHOMO values (from 8.37 eV to 8.08 eV), whereas conformers of chemical 9 (17
-estradiol) had d(R_R) distances shorter than 10.62 Å (the lower boundary of the least restrictive distance screen in the range of 1011.5 Å). If the distance screens of 10.38 to 10.51 Å or 11.50 to 11.80 Å, obtained with a more precise continuous approximation (
= 0.1 Å, 90% CL) using windows of 10.3 to 10.7 Å and 11.5 and 13 Å, respectively, were additionally included in the reactivity pattern, then 17
-estradiol (9) was correctly identified as having an RBA between 10 and 100%.
Conformers of ligands 14 were not identified by this HAR2 reactivity pattern. For ligands in the lower activity ranges, the HAR reactivity pattern (even after inclusion of additional distance screens) identified 16 (2-hydroxy-estradiol), which lies on the boundary between training sets HAR2 and HAR3 (RBA = 7%) and 21 (estrone), which was also included in HAR2 as chemical 8, due to the discrepancy in reported RBA values. No ligands with RBA < 1% were identified as having an RBA value between 10 and 100%.
The COREPA associated with HAR3 was based on a EHOMO range of 9.87 to 8.32 eV ( = 3.0 eV; 90% confidence limit, Table 3
, part c), combined with distance screens of 9.38 to 9.93 Å or 10.56 to 11.28 Å, observed within windows of 9.0 to 10.0 Å and 10.0 to 11.5 Å, respectively (Table 3
, part c, for confidence limit = 90%). The EHOMO and d(R_R) screens were combined with a SE(R) pattern of 0.237 to 0.273 (a.u.)2/eV (95% confidence limit, data not shown), imposed on at least one of the electronegative sites. This reactivity pattern correctly identified 5 out of 7 chemicals from the training set. It should be noted that a d(R_R) of 9.75 to 10.44 Å in the range of 9.5 to 10.5 Å could also be included in one of the other distance ranges to attain 5 out of 7 correct predictions. Chemical 18 (tamoxifen) was not identified as active in this RBA range because d(R_R) for the 2 electronegative sites was not larger than 5 Å, while chemical 20 (3ß-androstanediol) had an extremely low global electron donor ability with a EHOMO range of 10.35 to 10.34 eV. The HAR3 pattern also identified 11 more active chemicals (48, 1015) and 5 less active chemicals (2325, 34, 37). As mentioned previously, as RBA values decreased, the specificity and stability of reactivity patterns also decreased, which is consistent with an increasing rate of false positive and negative assignments. Use of patterns based on more precise approximations of conformer distributions did not change the rate of false positive or negative identifications (data not shown).
The reactivity pattern associated with HAR4 had the lowest specificity. The EHOMO range of 9.95 to 8.73 eV( = 1.0 eV; 99% confidence limit, data not shown) was combined with SE(R) patterns of 0.239 to 0.269 (a.u.)2/eV or 0.248 to 0.279 (a.u.)2/eV, defined within the window of 0.2 to 0.3 (a.u.)2/eV (for
= 0.05 a.u. and confidence limit = 90%; Table 3
, part d). No distance screen was included in the HAR4 pattern, due to its limited ability to discriminate ligands. The combined EHOMO/SE(R) pattern correctly identified 5 out of 7 chemicals in the training set. Ligand 29 (kepone; having a single conformer) was not identified as active due to a EHOMO value of 10.92 eV and 26(o,p`-DDT) was not identified due to an SE(R) range of 0.320 to 0.330 (a.u.)2/eV. The specified pattern does, however, capture basic electronic requirements for eliciting hER
binding affinity, as it successfully identified almost all of the chemicals in HAR1HAR3. Of the compounds with measured RBAs between 0.1 and 0.01%, 4 of 8 chemicals were selected, while 2 of the 9 compounds between 0.01 and 0.00% were selected.
Decision Tree for Identification of hER Ligands
The stereoelectronic requirements of the reactivity pattern associated with each RBA range were organized in a hierarchical decision tree, whose output was an estimated probability that a conformer would bind to the hER within a given RBA range. The initial part of the tree consists of absolute screens, i.e., the necessary structural requirements for eliciting minimal ER binding affinity, i.e., RBA
0.1%. For example, enantiomers of steroids were required to have trans-trans (B/C trans and C/D trans) ring fusion as an absolute steriochemistry screen. Global nucleophilicity was also assumed an absolute electronic requirement, and an EHOMO of 9.95 eV was selected as the necessary nucleophilicity threshold. This value is equivalent to the left side boundary of the EHOMO range for the 99% confidence limit of the pattern associated with the least active training subset (HAR4). The presence of negatively charged (i.e., potential electron donors) atomic sites was also employed as a basic requirement for a ligand to have an RBA
0.1%. This requirement was specified as any hetero-atomic site (R = O, N, Cl, F, S, etc.) with a donor-delocalizability (i.e., atomic nucleophilicity) in the range of 0.239 to 0.279 (a.u.)2/eV. This range was based on a delocalizability screen derived from the 90% confidence distribution for the active ligands in HAR4 across
values of 0.05 and 0.01 (a.u.)2/eV. Alternatively, a charge requirement defined as 0.298 to 0.233 a.u. (based on the 90% confidence limit ranges in Table 3
) could be used as an absolute electronic site requirement. In our previous study with androgen ligands, an atomic charge requirement of 0.322 to 0.300 a.u. was determined (Mekenyan et al., 1997
, 1999
), which indicates that active ER ligands in this training set have significantly less negative atomic sites than androgen receptor ligands.
Conformers which had EHOMO values of less than 9.95 eV, electronegative sites not meeting the specified donor delocalizability, or steroids not conforming to stereochemical requirements of the natural enantiomer, were assigned a 0% probability to bind to hER with a RBA > 0.1% (Fig. 7
). Conformers that passed these absolute requirements were then compared to the EHOMO, interatomic distance and charge or delocalizability screens associated with HAR1. Using the simplified binary screening approach described in the Methods, the identification of a ligand with a binding affinity within a RBA range requires that at least one conformer meets all three specified parameter ranges. If a compound was not identified as having an RBA > 150%, it was then screened to determine if it had an RBA between 10 and 100% (HAR2) and so on (Fig. 7
). Thus, the decision reflected a sequential ordering of the reactivity patterns derived from HAR1, HAR2, HAR3, and HAR4.
|
The distance patterns require that at least one of the heteroatoms meet a previously specified charge or donor-delocalizability requirement. The distance and delocalizability/charge requirements of the pattern are described in terms of the least restrictive screens based on confidence limits 90% around the most probable values determined in the respective distributions.
Application of the decision tree to the data set used in this study is summarized in Table 4. In general, most predictions were within an order of magnitude of observed RBA values. Consistent with the conservative bias in selection of reactivity patterns, the majority of predictions that were not within an order of magnitude of the observed RBA values over-predicted binding potential.
|
For compounds with observed RBA values between 1 and 10%, the false negative predictions were for tamoxifen (18; RBA = 5.1%) predicted to have an RBA between 0.1 and 1% and 3ß-androstanediol (20; RBA = 3%) predicted to have RBA values less than 0.1%. False positive identifications for the RBA range between 1 and 10% included estrone-3-sulfate (23), norethynodrel (24), 4-androstenediol (25), dehydroepiandrosterone (34), and methoxychlor (37), with measured RBA values of 1, 0.7, 0.5, 0.04, and 0.012%, respectively.
Kepone (29; RBA = 0.2%) was the only false negative identification in the range of 0.1 to 1%, whereas 4 compounds were false positive identifications for binding affinity, in this range, having measured RBA values of 0.045 (33), 0.015 (36), 0.003 (43), and < 0.001% (44).
As depicted in Table 4, the accuracy of predictions was greatest for RBA values exceeding 10%. This observation is consistent with the specificity of the associated reactivity patterns and the high degree of biological similarity, in terms of RBA values, for the compounds in this portion of the data set. Thus, the exploratory prioritization scheme, based on the current knowledge base, appears to provide a reasonably robust means to identify hER
ligands whose binding affinities are at least 10% of E2. Within the current data set, the specificity of reactivity patterns decreases as RBA values fall below 10% of E2, which is consistent with expectations; i.e., as compounds become less similar to E2 in terms of binding affinity, one would expect a decreased basis for establishing chemical similarity to the natural ligand. Several chemicals with low binding affinity are of concern, however, due to their ubiquity. To accurately predict activity for these types of chemicals requires development of models based on a restricted congeneric series. The COREPA approach, in fact, has been used successfully to predict the estrogenicity of alkylphenolic chemicals, a group of relatively weak, but important, environmental estrogens (Schmieder et al., 2000
).
Prospectus
Development of quantitative chemical similarity and structure-activity models requires well defined biological and/or toxicological effect data from chemicals representative of the diversity of structures for which predictions are to be made. The data used in the current study are all based on RBA values of hER derived from 2 laboratories using similar experimental techniques (Bolger et al., 1998
; Kuiper et al., 1997
). As noted previously, only estrone RBA values differed significantly between the 2 studies. While these RBA values represent a broad range of structures, additional data from a more diverse set of structures would improve the basis for evaluating the reactivity patterns and the prioritization scheme for the ranges of RBA values modeled in the present investigation. Of particular concern is whether the reactivity patterns, and associated decision tree, have been over-specified to the training set. To the extent this may be the case, the results of this study need to be interpreted with caution in terms of immediate use in hazard identification.
In our companion paper (Mekenyan et al., 2000), the accuracy of the reactivity patterns for ER binding affinity are explored by comparing predicted values to observed RBAs using receptors obtained from rodents and MCF7 cells. Use of data from outside the hER
training set allows a more complete evaluation of the reactivity patterns and decision tree in terms of the its applicability to a more diverse chemical structure space. Inclusion of this additional data "enriches" the chemical structure space for defining chemical similarity in specified RBA ranges, while seemingly providing minimal cross-species variability that could confound the interpretation. The companion paper also discusses how a "mammalian" ER binding affinity chemical prioritization scheme, based on the COREPAC algorithm, could be applied to chemical data sets reflecting existing inventories in the United States and Europe.
![]() |
APPENDIX |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
(gamma): Corresponds to the half-width of a gamma function
HAP: High activity pattern, i.e., the reactivity pattern based on the learning set of active chemicals
HAR: High activity range
HAT: High activity threshold, used to define the "learning" set of active chemicals
MaxPkA(x): Value of parameter x with the maximum probability of occurrence based on the distribution of the training set of active chemicals
MaxPkNA(x): Value of parameter x with the maximum probability of occurrence based on the distribution of the training set of non-active chemicals
NAP: Non-active pattern, i.e., the reactivity pattern based on the learning set of non-active chemicals
NAR: Non-active range
NAT: Non-active threshold, used to define the "learning" set of non-active chemicals
Pi(x): Gamma function (probabilistic) distribution of I-th conformer across the axis of molecular descriptor x (single or multiple descriptor values are associated with the conformer when x is a global or local molecular parameter, respectively)
Pk(x): Gamma function distribution for all conformers of chemical k across the axis of the molecular descriptor x
PkA(x): Probabilistic distribution for the training set of active chemicals across the axis of the molecular descriptor x
PkNA(x): Overall probabilistic distribution for the training set of non-active chemicals across the axis of the molecular descriptor x
S(AP/NAP): 3-D similarity between reactivity patterns of active and non-active chemicals with respect to the molecular descriptor x (overlap of distributions of chemicals within each training set)
S(k/AP): 3-D similarity between a chemical k and the active pattern with respect to the molecular descriptor x (overlap between conformer distribution of the chemical and the distribution of the chemicals from the active training set)
S(k/NAP): 3-D similarity between a chemical k and the non-active pattern with respect to the molecular descriptor x (overlap between conformer distribution of the chemical and the distribution of the chemicals from the non-active training set)
Syz(x): 3-D similarity between two chemicals, y and z, with respect to the molecular descriptor x (i.e., the overlap between conformer distributions of two chemicals across x)
QSAR Descriptors
d(O_O): Interatomic distances between oxygen atoms (Å)
d(R_R): Interatomic distances between all heteroatoms (Å)
Egap: Electronic gap (EHOMO-ELUMO) (eV)
EHOMO: Energy of highest occupied molecular orbital (eV)
ELUMO: Energy of lowest unoccupied molecular orbital (eV)
EN: Electronegativity
GW: Sum of geometric distances
µ: Dipole moment (D)
Max distance: The greatest interatomic distance
Planarity: The normalized sum of torsion angles in a molecule
Q(O): Charges of oxygen atoms (a.u.)
Q(R): Charges of all heteroatoms (a.u.)
RBA: Relative binding affinity to human ER (hER
) expressed as percent relative to 17ß-estradiol = 100%
RMS: Root mean square
SE(O): Donor delocalizabilities of oxygen atoms ((a.u.)2/eV)
SE(R): Donor delocalizabilities of all heteroatoms ((a.u.)2/eV)
SN(O): Acceptor delocalizabilities of oxygen atoms
SN(R): Acceptor delocalizabilities of all heteroatoms
![]() |
ACKNOWLEDGMENTS |
---|
![]() |
NOTES |
---|
1 To whom correspondence should be addressed. Fax: (218) 529-5015. E-mail: bradbury.steven{at}epa.gov.
![]() |
REFERENCES |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Anstead, G. M., Carlson, K. E., and Katzenellenbogen, J. A. (1997). The estradiol pharmacophore: Ligand structure-estrogen receptor binding affinity relationships and a model for the receptor binding site. Steroids 62, 268303.[ISI][Medline]
Anstead, G. M., Wilson, S. R., and Katzenellenbogen, J. A. (1989). 2-Arylindenes and 2-arilindenones: Molecular structures and considerations in the binding orientation of unsymmetrical non-steroidal ligands to the estrogen receptor. J. Med. Chem. 32, 21632171.[ISI][Medline]
Bolger, R., Wiese, T. E., Ervin, K., Nestich, S., and Checovich, W. (1998). Rapid screening of environmental chemicals for estrogen receptor binding capacity. Environ. Health Perspect. 106, 551557.[ISI][Medline]
Bradbury, S. P. (1994). Predicting modes of toxic action from chemical structure: An overview. SAR QSAR Environ. Res. 2, 89104.[Medline]
Bradbury, S. P., Mekenyan, O. G., and Ankley, G. T. (1996). Quantitative structure-activity relationships for polychlorinated hydroxybiphenyl estrogen receptor-binding affinity: An assessment of conformational flexibility. Environ. Chem. Toxicol. 15, 19451954.
Bradbury, S. P., Mekenyan, O. G. and Ankley, G. T. (1998). The role of ligand flexibility in predicting biological activity: Structure-activity relationships for aryl hydrocarbon, estrogen, and androgen receptor-binding affinity. Environ. Toxicol. Chem. 17, 1525.[ISI]
Eliel, E. L. (1993). Chemistry in three dimensions. In Chemical Structures (W. A. Warr, Ed.), Vol. 1, pp. 18. Springer, Berlin.
Goldstein, R. A., Katzenellenbogen, J. A., Luthey-Schulten, Z. A., Seielstad, D. A., and Wolynes, P. G. (1993). Three-dimensional model for the hormone binding domains of steroid receptors. Proc. Natl. Acad. Sci. U.S.A. 90, 99499953.[Abstract]
Ivanov, J. M., Karabunarliev, S. H., and Mekenyan, O. G. (1994). 3DGEN: A system for an exhaustive 3D molecular design. J. Chem. Inf. Comput. Sci. 34, 234243.[ISI]
Ivanov, J. M., Mekenyan, O. G., Bradbury, S. P., and Schüürmann, G. (1998). A kinetic analysis of the conformational flexibility of steroids. Quant. Struct.-Act. Relat. 17, 437449.
Kavlock, R. J., Daston, G. P., DeRosa, C., Fenner-Crisp, P., Earl Gray, L., Kaattari, S., Lucier, G., Luster, M., Mac, M. J., Maczka, C., Miller, R., Moore, J., Rolland, R., Scott, G., Sheehan, D. M., Sinks, T., and Tilson, H. A. (1996). Research needs for the risk assessment of health and environmental effects of endocrine disruptors: A report of the U.S. EPA-sponsored workshop. Environ. Health Perspect. 104, 715740.[ISI][Medline]
Kuiper, G. G. Carlsson, B., Grandien, K., Enmark, E., Haggblad, J., Nilsson, S., and Gustafsson, J.-K. (1997). Comparison of the ligand binding specificity and transcript tissue distribution of estrogen receptor and ß. Endocrinology 138, 853870.
Lewis, D. F. Parker, M. G., and King, R. J. (1995). Molecular modeling of the human estrogen receptor ligand interactions based on site-directed mutagenesis and amino acid sequence homology. J Steroid Biochem. Mol. Biol. 52, 5565.[ISI][Medline]
Mekenyan, O. G., Bradbury, S. P., Kamenska, V. B., Schmieder, P. K., Marafante, E., and Ankley, G. T. (2000). A computationally-based identification algorithm for estrogen receptor ligands: II. Evaluation of a hER binding affinity model. Toxicol. Sci. 58, 270281.
Mekenyan, O. G., Ivanov, J. M., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Karcher, W. (1997). A computationally based hazard identification algorithm that incorporates ligand flexibility: I. Identification of potential androgen receptor ligands. Environ. Sci. Technol. 31, 37023711.[ISI]
Mekenyan, O. G., Ivanov, J. M., Veith, G. D., and Bradbury, S. P. (1994a). DYNAMIC QSAR: A new search for active conformations and significant stereoelectronic indices. Quant. Struct. Act. Relat. 13, 302307.[ISI]
Mekenyan, O. G., Karabunarliev, S. H., Ivanov, J. M., and Dimitrov, D. N. (1994b). A new development of the OASIS computer system. Comput. Chem. 18, 173187.[ISI]
Mekenyan, O. G., Nikolova, N., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Hansen, B. (1999). New developments in a hazard-identification algorithm for hormone receptor ligands. Quant. Struct. Act. Relat. 18, 139153.[ISI]
Mekenyan, O. G., Peitchev, D., Bonchev, D., Trinajstic, N., and Bangov, I. (1986). Modeling the interaction of small molecules with biomacromolecules: I. Interaction of substituted pyridines with anti-3-azopyridine antibody. Arzneimittelforschung 36, 176183.[Medline]
Mekenyan, O. G., Schultz, T. W., Veith, G. D., and Kamenska, V. B. (1996a). "Dynamic" QSAR for semicarbazide-induced mortality in frog embryos. J. Appl. Toxicol. 16, 355363.[ISI][Medline]
Mekenyan, O. G., Veith, G. D., Call, D. J., and Ankley, G. T. (1996b). A QSAR evaluation of Ah-receptor binding of halogenated aromatic xenobiotics. Environ. Health Perspect. 104, 13021309.[ISI][Medline]
Schmieder, P. K., Aptula, A. O., Routledge, E. J., Sumpter, J. P., and Mekenyan, O. G. (2000). Estrogenicity of alkylphenolic compounds: A 3-D-structure activity evaluation of gene activation. Environ. Toxicol. Chem. 19, 17271740.[ISI]
Schüürmann, G. (1990). Quantitative structure-property relationships for the polarization, solvatochromic parameters and lipophilicity. Quant. Struct.-Act. Relat. 59, 326333.
Stewart, J. J. (1990). MOPAC: A semiempirical molecular orbital program. J. Comput.-Aided Mol. Des. 4, 1105.
Stewart, J. J. (1993). MOPAC 93. Fujitso Limited, 93, Nakese 1-Chome, Mihama-Ku, Chiba-City, Chiba 2al, Japan, and Stewart Computational Chemistry, Colorado Springs, CO.
Topliss, J. G., and Edwards, R. P. (1979). Chance factor in studies of quantitative structure-activity relationships. J. Med. Chem. 22, 12381244.[ISI][Medline]
VanderKuur, J. A., Wiese, T., and Brooks, S. C. (1993). Influence of estrogen structure on nuclear binding and progesterone receptor induction by the receptor complex. Biochemistry 32, 70027008.[ISI][Medline]
Waller, C. L., Minor, D. L., and McKinney, J. D. (1995). Using three-dimensional quantitative structure-activity relationships to examine estrogen receptor binding affinities of polychlorinated hydroxybiphenyls. Environ. Health Perspect. 103, 702707.[ISI][Medline]
Waller, C. L., Oprea, T. I., Chae, K., Park, H.-K., Korach, K. S., Laws, S. C., Wiese, T. E, Kelce, W. R, and Gray, L. E., Jr. (1996a). Ligand-based identification of environmental estrogens. Chem. Res. Toxicol. 9, 12401248.[ISI][Medline]
Waller, C. L., Juma, B. W., Gray, L. E., Jr., and Kelce, W. R. (1996b). Three-dimensional quantitative structure-activity relationships for androgen receptor ligands. Toxicol. Appl. Pharmacol. 137, 219227.[ISI][Medline]
Wiese, T., and Brooks, S. C. (1994). Molecular modeling of steroidal estrogens: Novel conformations and their role in biological activity. J. Steroid Biochem. Mol. Biol. 50, 6173.[ISI][Medline]
Wurtz, J. M., Bourguet, W., Renaud, J. P., Vivat, V., Chambon, P., Moras, D., and Gronemeyer. H. (1996). A canonical structure for the ligand-binding domain of nuclear receptors. Nat. Struct. Biol. 3, 206.[ISI][Medline]