A Computationally Based Identification Algorithm for Estrogen Receptor Ligands: Part 2. Evaluation of a hERα Binding Affinity Model

O. G. Mekenyan*, V. Kamenska*, P. K. Schmieder†, G. T. Ankley† and S. P. Bradbury†,1

* Bourgas University "Prof. As. Zlatarov," Laboratory of Mathematical Chemistry, Department of Physical Chemistry, 118010 Bourgas, Bulgaria; and † U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Mid-Continent Ecology Division, 6201 Congdon Boulevard, Duluth, Minnesota 55804

Received April 17, 2000; accepted July 31, 2000


    ABSTRACT
The objective of this study was to evaluate the capability of an expert system described in the previous paper (S. Bradbury et al., Toxicol. Sci. 58, 253–269) to identify the potential for chemicals to act as ligands of mammalian estrogen receptors (ERs). The basis of the expert system was a structure activity relationship (SAR) model based on relative binding affinity (RBA) values for steroidal and nonsteroidal chemicals derived from human ERα (hERα) competitive binding assays. The expert system enables categorization of chemicals into RBA ranges of <0.1, 0.1 to 1, 1 to 10, 10 to 100, and >150% relative to 17β-estradiol. In the current analysis, the algorithm was evaluated with respect to predicting RBAs of chemicals assayed with ERs from MCF7 cells and from mouse and rat uterine preparations. The best correspondence between predicted and observed RBA ranges was obtained with MCF7 cells. Agreement between predictions from the expert system and data from binding assays with mouse and rat ERs was weaker, especially for chemicals with RBAs less than 10%. Prediction errors often were false positives, i.e., predicted RBA values greater than those observed. While discrepancies were likely due, in part, to species-specific variations in ER structure and ligand binding affinity, a systematic bias in the structural characteristics of chemicals in the hERα training set, compared to the rodent evaluation data sets, also contributed to prediction errors. False-positive predictions were typically associated with ligands that had shielded electronegative sites. Ligands with these structural characteristics were not well represented in the training set used to derive the expert system. Inclusion of a shielding criterion in the original expert system significantly increased the accuracy of RBA predictions. With this additional structural requirement, 38 of 46 compounds with measured RBA values greater than 10% in hERα, MCF7, and rodent uterine preparations were correctly categorized. Of the remaining 129 compounds in the combined data sets, RBA values for 65 compounds were correctly predicted, with 47 of the incorrect predictions being false positives. Based upon this exploratory analysis, the modeling approach, combined with a high-quality training set of RBA values derived from a diverse set of chemical structures, could provide a credible tool for prioritizing chemicals with moderate to high ER binding affinity for subsequent in vitro or in vivo assessments.

Key Words: structure activity relationships; expert systems; mammalian estrogen receptors; binding affinity; estrogen receptor ligands.


    INTRODUCTION
Structure activity relationships (SARs) for predicting ligand-hormone receptor binding affinity have been proposed as screening tools to help prioritize untested compounds for more intensive investigations of potential effects on steroid signaling pathways (Ankley et al., 1997; Bradbury et al., 1998). Mekenyan et al. (1997, 1999) recently described the COmmon REactivity PAttern (COREPA) algorithm, which was developed specifically for this purpose. The algorithm is a 3-dimensional (3-D) SAR technique that accounts for the conformational flexibility of ligands. It permits identification and quantification of specific global and local stereoelectronic characteristics associated with the biological activity of a chemical, without the need to specify a predetermined toxicophore or to align conformers to a lead compound.

In the companion paper, Bradbury et al. (2000) described a prototypical expert system for predicting human estrogen receptor alpha (hERα) binding affinity based on the COREPA algorithm. In that study, they defined stereoelectronic requirements associated with the binding affinity of 45 steroidal and nonsteroidal ligands to the receptor. Reactivity patterns for hERα relative binding affinity (RBA; 17β-estradiol = 100%) were established based on global nucleophilicity, interatomic distances between electronegative atoms, charge on heteroatoms, and electron donor capability of heteroatoms. These reactivity patterns were used to establish descriptor profiles, within the context of an expert system, to identify ligands with RBAs of >150%, 100 to 10%, 10 to 1%, and 1 to 0.1%. Using a "leave-one-out" evaluation, the reactivity patterns were determined to be stable, and the resulting expert system correctly classified 30 of 45 compounds in this training set.

To more completely evaluate the prototypical expert system, hERα ligand binding affinity data for compounds not used in the original training set are required. Unfortunately, such data are not available in the open literature. However, a variety of ER binding affinity data sets for other human and rodent experimental models are available. While the use of RBA values from different experimental systems and species adds uncertainty to the evaluation of endpoint- and species-specific SARs, such data can provide insights into the reliability of a model if the variability between experimental systems is within the desired precision of the predictions (i.e., variability across species and experimental systems is less than the variability across chemicals). In the current study, the 3-D SAR-based expert system derived from the hERα data set was assessed against RBA values for ER binding affinity obtained from MCF7 cells and from mouse and rat uterine preparations.


    MATERIALS AND METHODS
ER Ligands and Receptor Binding Affinity
Relative binding affinity values of steroidal and nonsteroidal compounds for hER from MCF7 cells and for rodent uterine ERs were used to evaluate the model described by Bradbury et al. (2000), which was based on RBA values derived from the hERα. Specifically, we assessed 3 data sets of 36, 35, and 58 chemicals, respectively, which were evaluated for their affinity to hER from MCF7 cells (Bolger et al., 1998; Brooks et al., 1987; Palomino et al., 1994; VanderKuur et al., 1993; Table 1), mouse uterine ER (mER; Bolger et al., 1998; Connor et al., 1997; Korach et al., 1988; Table 2), and rat uterine ER (rER; Anstead et al., 1989; Bolger et al., 1998; Connor et al., 1997; Gabbard and Segaloff, 1983; Qian and Abul-Haji, 1990; Table 3). The RBA of each ligand was calculated as the concentration of unlabeled 17β-estradiol (E2) required to reduce the specific binding of radiolabeled E2 by 50%, divided by the concentration of test compound required to achieve the same reduction, and multiplied by 100 (E2 = 100%). Species-specific RBA values obtained across studies were typically within an order of magnitude of one another. With the exception of mouse uterine RBA values for 4-nonylphenol, average values are listed in Tables 1–3. In the case of 4-nonylphenol, two RBA values (0.313 and 0.01%) were employed in the analyses (note "compounds" 9 and 30 in Table 2).
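The RBA arithmetic and the ranges used by the expert system can be sketched in a few lines of Python. This is an illustration only: the function names, the IC50 inputs, and the convention that a value falling exactly on a boundary goes to the higher range are assumptions, not details given in the text. The 100 to 150% range is listed separately because the hERα training set contained no compounds in it.

```python
from bisect import bisect_right

# Boundaries (RBA, %) of the ranges discussed in the text, low to high.
# Assumed convention: a value exactly on a boundary goes to the higher range.
EDGES = [0.1, 1.0, 10.0, 100.0, 150.0]
LABELS = ["<0.1%", "0.1-1%", "1-10%", "10-100%", "100-150%", ">150%"]


def rba(ic50_e2: float, ic50_test: float) -> float:
    """Relative binding affinity (%): IC50 of unlabeled E2 / IC50 of test compound x 100."""
    if ic50_e2 <= 0 or ic50_test <= 0:
        raise ValueError("IC50 values must be positive")
    return 100.0 * ic50_e2 / ic50_test


def rba_range(value: float) -> str:
    """Map an RBA value (%) onto the ranges used in this study."""
    return LABELS[bisect_right(EDGES, value)]


# A compound needing 40-fold more competitor than E2 to displace half the label:
print(rba(1.0, 40.0), rba_range(rba(1.0, 40.0)))  # 2.5 1-10%
```

Because RBA is a ratio with E2 in the numerator, weaker competitors (larger IC50) receive smaller RBA values, which is what places most test chemicals below 100%.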


TABLE 1 Ligands, Observed Relative Binding Affinities (RBA) to hER from MCF7 Cells, Source of Data (Ref), Number of Conformers Generated (N), and Associated Ranges of Heat of Formation, and Root Mean Square (RMS) Differences
 

TABLE 2 Ligands, Observed Relative Binding Affinities (RBA) to Mouse Uterine ER (mER), Source of Data (Ref), Number of Conformers Generated (N), and Associated Ranges of Heat of Formation and Root Mean Square (RMS) Differences
 

TABLE 3 Ligands, Observed Relative Binding Affinities (RBA) to Rat Uterine ER (rER), Source of Data (Ref), Number of Conformers Generated (N), and Associated Ranges of Heat of Formation, and Ranges of Root Mean Square (RMS) Differences
 
The overlap of compounds between these data sets and those in the hERα knowledge base (Bradbury et al., 2000) is summarized in Table 4. The hERα and mouse uterine data sets were most similar, with 14 compounds in common, whereas the hERα and rat uterine data sets had 4 compounds in common. For RBA values >1%, hERα values and those derived from the other biological models agreed within an order of magnitude; however, at lower RBA values, differences sometimes exceeded an order of magnitude.


TABLE 4 Summary of RBA Values (from Tables 1–3) of Chemicals Tested for ER Binding Affinity in More than One Biological Model
 
ER Ligand Conformations and Molecular Descriptors
The 3-D structures of ligand conformers were generated based on the method of Ivanov et al. (1994), using torsion resolution, distance between nonbonded atoms and ring closure, and related parameters, as described in our companion study (see Bradbury et al., 2000, and abbreviations given therein). Conformer geometry optimization was performed with MOPAC 93 (Stewart, 1990, 1993), using the AM1 Hamiltonian with the keywords PRECISE and NOMM. For a given ligand, only conformers with a ΔHf° within 20 kcal/mol of the ΔHf° of the absolute energy minimum conformer were used (Tables 1–3). The conformers within this range of ΔHf° were assumed to be energetically reasonable from a thermodynamic and kinetic perspective (Bradbury et al., 1998, 2000; Ivanov et al., 1998; Mekenyan et al., 1997, 1999). As in our previous study (Bradbury et al., 2000), it was assumed that the conformers of each chemical could be considered a statistical ensemble based on Boltzmann statistics. Tables 1–3 also include ligand-specific ranges of root mean square (RMS) differences between the atoms of each conformer and the corresponding atoms in the lowest-energy conformer. As in our preceding study (Bradbury et al., 2000), conformers of a given chemical within the specified 20 kcal/mol range of ΔHf° often exhibited significant variation in potentially relevant electronic descriptors (data not shown). This observation is consistent with previous studies highlighting the necessity of including all energetically reasonable conformers when defining common reactivity patterns (Bradbury et al., 1998, 2000; Mekenyan et al., 1997, 1999).
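The 20 kcal/mol conformer window can be expressed directly. The sketch below also computes Boltzmann populations for the retained conformers. The text states only that conformers were treated as a Boltzmann ensemble, so using relative ΔHf° in place of a free energy, the 298.15 K temperature, and the conformer names and energies are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def reasonable_conformers(dhf: dict[str, float], window: float = 20.0) -> dict[str, float]:
    """Keep conformers whose heat of formation is within `window` kcal/mol of the minimum."""
    e_min = min(dhf.values())
    return {name: e for name, e in dhf.items() if e - e_min <= window}


def boltzmann_populations(dhf: dict[str, float], temp: float = 298.15) -> dict[str, float]:
    """Normalized Boltzmann weights, using relative ΔHf° as an (assumed) energy proxy."""
    e_min = min(dhf.values())
    raw = {name: math.exp(-(e - e_min) / (R_KCAL * temp)) for name, e in dhf.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Hypothetical AM1 heats of formation (kcal/mol) for three conformers of one ligand
dhf = {"conf1": -100.0, "conf2": -99.0, "conf3": -75.0}
kept = reasonable_conformers(dhf)  # conf3 lies 25 kcal/mol above the minimum
print(sorted(kept))                # ['conf1', 'conf2']
print(boltzmann_populations(kept))
```

Note that the expert system itself does not weight conformers by population when screening: a ligand qualifies for a range if any retained conformer matches, so the population estimate is shown only to make the "statistical ensemble" wording concrete.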

To generate common reactivity patterns, the same set of global and local molecular descriptors used in our previous study (Bradbury et al., 2000) was employed. These descriptors were associated with global nucleophilicity, heteroatom electronegativity and charge, and interatomic distances between heteroatoms.

Evaluation of the COREPA-Based hERα Ligand Reactivity Patterns
A summary of the COREPA method for assessing hERα binding affinity was provided by Bradbury et al. (2000), while the conceptual basis and mathematical derivations for the method are reported elsewhere (Mekenyan et al., 1997, 1999). Using this technique, a decision tree was developed to predict RBA ranges for chemical binding to hERα. The decision tree, based on the energy of the highest occupied molecular orbital (EHOMO), interatomic distances between heteroatoms (d(R_R)), charge of a heteroatom (Q(R)), and donor delocalizabilities of heteroatoms (SE(R)), was optimized first to minimize the probability of false negative identifications (i.e., underpredicted RBA values) and secondarily to minimize the number of false positive identifications. In the current study, the decision tree was modified, as summarized below, using additional screens described by Bradbury et al. (2000), to further minimize the probability of false negative identifications. These modifications were applied with the understanding that an increase in false positive predictions would likely result.
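The optimization priority described above (fewest false negatives first, then fewest false positives) amounts to a lexicographic comparison. The toy example below chooses a single EHOMO cutoff from a few candidates. The data, the candidate cutoffs, and the one-descriptor rule are hypothetical; this is not the COREPA optimization procedure itself, which is described by Mekenyan et al. (1997, 1999).

```python
def error_counts(cutoff: float, data: list[tuple[float, bool]]) -> tuple[int, int]:
    """(false negatives, false positives) for the rule 'E_HOMO > cutoff -> active'."""
    fn = sum(1 for ehomo, active in data if active and not ehomo > cutoff)
    fp = sum(1 for ehomo, active in data if not active and ehomo > cutoff)
    return fn, fp


def best_cutoff(candidates: list[float], data: list[tuple[float, bool]]) -> float:
    # Tuples compare element by element, so the false-negative count decides
    # first and the false-positive count only breaks ties.
    return min(candidates, key=lambda c: error_counts(c, data))


# Hypothetical (E_HOMO in eV, binds within the target RBA range?) pairs
toy = [(-9.1, True), (-9.3, True), (-9.5, False), (-9.6, True), (-9.8, False)]
for c in (-9.4, -9.7, -9.9):
    print(c, error_counts(c, toy))
print("chosen:", best_cutoff([-9.4, -9.7, -9.9], toy))  # chosen: -9.7
```

A cutoff of –9.4 eV produces no false positives but misses one active compound, so under this priority it loses to –9.7 eV, which catches every active at the cost of one false positive.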

A "prescreen" reactivity pattern was used to eliminate those compounds whose RBA values were not likely to exceed 0.1%. Thus, conformers that had EHOMO values less than –9.95 eV or electronegative sites not meeting an SE(R) range of 0.239 to 0.277 (a.u.)²/eV, as well as steroids not conforming to the stereochemical requirements of the natural enantiomer, were assigned a 0% probability of having an RBA value >0.1%. A reactivity pattern with EHOMO > –8.99 eV combined with a d(R_R) range of 11.77 to 12.22 Å between heteroatoms and a Q(R) range of –0.272 to –0.233 a.u. (imposed on both electronegative sites forming the d(R_R)) was employed to identify chemicals with an RBA value >150%. The reactivity pattern for the binding activity range 100 > RBA > 10% was based on EHOMO > –9.44 eV, combined with d(R_R) ranges of 10.62 to 10.95, 10.38 to 10.51, or 11.50 to 11.80 Å, and the requirement that at least one of the heteroatoms in the distance range meet the Q(R) screen of –0.273 to –0.236 a.u. For the activity range 10 > RBA > 1%, a pattern was derived based on EHOMO > –9.87 eV, combined with distance screens of 9.38 to 9.93, 9.75 to 10.44, or 10.56 to 11.28 Å and an SE(R) range of 0.237 to 0.273 (a.u.)²/eV imposed on at least one electronegative site. Finally, the reactivity pattern based on EHOMO > –9.93 eV and SE(R) ranges of 0.239 to 0.269, 0.248 to 0.279, or 0.300 to 0.330 (a.u.)²/eV was associated with the low binding activity range 1 > RBA > 0.1%. These reactivity patterns were organized in a hierarchical decision tree that sequentially assessed the energetically reasonable conformers of a ligand in decreasing order of RBA ranges. Once identified as meeting a reactivity pattern for a particular binding activity range, a compound was assigned to that RBA range and was not evaluated using patterns associated with lower RBA ranges.
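The screens above can be collected into a per-conformer classifier. The sketch below is one reading of the published thresholds, not the authors' software. The `Site` and `Conformer` containers are hypothetical; descriptors are assumed to be precomputed (for example, from AM1 output), with d(R_R) taken as the Euclidean distance between heteroatom coordinates; range limits are treated as closed intervals; the prescreen SE(R) test is assumed to require at least one qualifying site; and the SE(R) screens for the two lowest ranges are assumed to apply to any electronegative site in the conformer.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record for one electronegative heteroatom in a conformer."""
    xyz: tuple[float, float, float]  # coordinates, Å
    charge: float                    # Q(R), a.u.
    se: float                        # donor delocalizability SE(R), (a.u.)^2/eV


@dataclass
class Conformer:
    ehomo: float                     # E_HOMO, eV
    sites: list[Site]
    natural_steroid: bool = True     # False: steroid lacking natural-enantiomer stereochemistry


def within(x: float, *ranges: tuple[float, float]) -> bool:
    """True if x lies in any closed interval (lo, hi)."""
    return any(lo <= x <= hi for lo, hi in ranges)


def site_pairs(c: Conformer):
    """Yield (site_a, site_b, distance in Å) for each pair of electronegative sites."""
    for i, a in enumerate(c.sites):
        for b in c.sites[i + 1:]:
            yield a, b, math.dist(a.xyz, b.xyz)


def prescreen(c: Conformer) -> bool:
    return (c.ehomo >= -9.95 and c.natural_steroid
            and any(within(s.se, (0.239, 0.277)) for s in c.sites))


def over_150(c: Conformer) -> bool:
    q = (-0.272, -0.233)  # imposed on both sites forming the distance
    return c.ehomo > -8.99 and any(
        within(d, (11.77, 12.22)) and within(a.charge, q) and within(b.charge, q)
        for a, b, d in site_pairs(c))


def from_10_to_100(c: Conformer) -> bool:
    q = (-0.273, -0.236)  # imposed on at least one site forming the distance
    return c.ehomo > -9.44 and any(
        within(d, (10.62, 10.95), (10.38, 10.51), (11.50, 11.80))
        and (within(a.charge, q) or within(b.charge, q))
        for a, b, d in site_pairs(c))


def from_1_to_10(c: Conformer) -> bool:
    return (c.ehomo > -9.87
            and any(within(d, (9.38, 9.93), (9.75, 10.44), (10.56, 11.28))
                    for _, _, d in site_pairs(c))
            and any(within(s.se, (0.237, 0.273)) for s in c.sites))


def from_01_to_1(c: Conformer) -> bool:
    return c.ehomo > -9.93 and any(
        within(s.se, (0.239, 0.269), (0.248, 0.279), (0.300, 0.330)) for s in c.sites)


# Patterns in decreasing order of RBA range, as in the hierarchical tree
PATTERNS = [(">150%", over_150), ("10-100%", from_10_to_100),
            ("1-10%", from_1_to_10), ("0.1-1%", from_01_to_1)]


def conformer_range(c: Conformer) -> str:
    """Highest RBA range whose pattern this conformer meets, else '<0.1%'."""
    if not prescreen(c):
        return "<0.1%"
    for label, matches in PATTERNS:
        if matches(c):
            return label
    return "<0.1%"


# Hypothetical two-site (diol-like) and one-site conformers
two_sites = Conformer(-9.1, [Site((0.0, 0.0, 0.0), -0.25, 0.25), Site((10.8, 0.0, 0.0), -0.25, 0.25)])
one_site = Conformer(-9.1, [Site((0.0, 0.0, 0.0), -0.25, 0.25)])
print(conformer_range(two_sites), conformer_range(one_site))  # 10-100% 0.1-1%
```

The one-site example shows why a ligand lacking a second electronegative site can only reach the lowest pattern: every higher pattern needs a heteroatom pair to define d(R_R).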

The ER ligands in Tables 1–3 were used to evaluate the ability of this decision tree, and the associated reactivity patterns, to predict RBA ranges for chemicals not included in the original hERα training set (see Bradbury et al., 2000). Each energetically reasonable conformer of a chemical was processed through the decision tree using an interpreter based on the SMILES algorithm, which permits the use of stereoelectronic structure-based rules. The decision tree provided a binary discrimination (i.e., a "yes" or "no" determination) of whether a chemical was within a specified RBA range. Thus, a chemical was predicted to have an hERα affinity within a specified RBA range if at least one of its conformers met the associated reactivity pattern.
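The chemical-level rule (a chemical is placed in the highest range for which any energetically reasonable conformer matches, and lower ranges are then skipped) can be written independently of the descriptor details. In this sketch the patterns are passed in as predicates; the toy EHOMO and distance thresholds in the usage lines are hypothetical placeholders, not the published reactivity patterns.

```python
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def predict_chemical(conformers: Sequence[C],
                     patterns: Sequence[tuple[str, Callable[[C], bool]]],
                     default: str = "<0.1%") -> str:
    """Return the first (highest) RBA range for which at least one conformer
    satisfies the pattern; `patterns` must be ordered from high to low RBA."""
    for label, matches in patterns:
        if any(matches(c) for c in conformers):  # binary yes/no for this range
            return label                          # lower ranges are not evaluated
    return default


# Toy conformers (E_HOMO in eV, longest heteroatom distance in Å) for one ligand
conformers = [{"ehomo": -9.6, "d": 9.5}, {"ehomo": -9.2, "d": 10.7}]
toy_patterns = [
    ("10-100%", lambda c: c["ehomo"] > -9.44 and 10.6 <= c["d"] <= 11.0),
    ("1-10%", lambda c: c["ehomo"] > -9.87 and 9.4 <= c["d"] <= 9.9),
]
print(predict_chemical(conformers, toy_patterns))  # 10-100%
```

The first conformer alone would place this ligand in the 1 to 10% range; because the second conformer meets the higher-range pattern, the lower pattern is never consulted.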


    RESULTS AND DISCUSSION
MCF7 Cell Data Set (Table 5)
Nine of 12 chemicals with measured ER binding affinity greater than 10% were correctly categorized. Five of these 9 compounds were not in the original hERα data set. Estratrien-3-ol (3; RBA = 107%) was incorrectly predicted to have 1 > RBA > 0.1%, while 2-hydroxyestratrien-17β-ol (7; RBA = 79%) and 4-nitroestratriene-3,17β-diol (11; RBA = 52%) were incorrectly predicted to have 10 > RBA > 1%. Explanations for these false-negative identifications include the lack of a second electronegative site (3) or, for compounds 7 and 11, slightly lower global electron donor ability (–9.57 < EHOMO < –9.51 eV) and shorter than required distances between electronegative atoms (9.1–9.9 Å). 11β-Hydroxy-estradiol (18; RBA = 1.7%), 11α-hydroxy-estradiol (23; RBA = 0.31%), and 11-keto-estradiol (26; RBA = 0.1%) were incorrectly predicted to have RBAs greater than 10% (i.e., false positive identifications).


TABLE 5 The Predicted RBA Ranges Using the Decision Tree Based on the hERα Reactivity Pattern for Ligands with Measured Binding Affinity to hER from MCF7 Cells
 
For compounds with observed RBA values between 1 and 10%, estratrien-17β-ol (14; RBA = 8%) was incorrectly predicted to have an RBA between 0.1 and 1% because it lacks a second electronegative site. False positive identifications using the reactivity pattern for the RBA range between 1 and 10% included 2-nitroestratriene-3,17β-diol (20), 5-androstene-3β,17β-diol (21), 4-aminoestratrien-17β-ol (24), 2-nitroestratrien-3-ol,17-one (25), and 4-hydroxyestratrien-17β-ol (27), with measured RBA values of 1, 0.7, 0.17, 0.1, and 0.08%, respectively. One of the false positives, 5-androstene-3β,17β-diol (21), was also in the hERα training set; however, in that biological model, the observed RBA was 6%. Thus, the discrepancy for this compound may reflect variability across receptor systems and/or laboratories.

For compounds with observed RBA values between 0.1 and 1%, 5α-androstane-3β,17β-diol (22; RBA = 0.5%) was the only false negative identification, due to very low global nucleophilicity (–10.35 < EHOMO < –10.31 eV). Four compounds were false positive identifications for binding affinity in this range: 29 (measured RBA < 0.05%), 31 (RBA < 0.05%), 35 (RBA = 0.021%), and 36 (RBA = 0.0003%). It should be noted that compounds 35 and 36 were evaluated using the hERα, with measured RBA values of 0.3 and 0.4%, respectively (Bradbury et al., 2000); hence, discrepancies between observed and predicted values for these compounds are likely due to variability across receptor systems.

In summary, of the 36 compounds in the MCF7 cell data set, RBA ranges were correctly predicted for 19, with RBA ranges for 5 compounds underpredicted (false negative identifications) and 12 overpredicted (false positives). Based on several compounds common to the MCF7 cell data set and the hERα training set, it appears that the false positive identifications may be due, in part, to inherently higher binding affinity in the hERα system. Among the false negative identifications for compounds with observed RBA values greater than 10%, the most notable discrepancy was observed for compound 3, which suggests that in some cases only one electronegative site, rather than two, may be required for ER binding in MCF7 cells.
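Summaries like this one reduce to comparing the position of the predicted range with that of the observed range: the same range is correct, a higher predicted range is a false positive, and a lower one is a false negative. The helper below is a hypothetical sketch of that bookkeeping; the three example compounds and their observed and predicted ranges are taken from the MCF7 discussion above.

```python
from collections import Counter

# RBA ranges ordered from lowest to highest affinity
RANGES = ["<0.1%", "0.1-1%", "1-10%", "10-100%", "100-150%", ">150%"]
RANK = {label: i for i, label in enumerate(RANGES)}


def tally(pairs: list[tuple[str, str]]) -> Counter:
    """Count correct, false-positive (overpredicted) and false-negative
    (underpredicted) classifications from (observed, predicted) range pairs."""
    counts = Counter()
    for observed, predicted in pairs:
        diff = RANK[predicted] - RANK[observed]
        if diff == 0:
            counts["correct"] += 1
        elif diff > 0:
            counts["false_positive"] += 1
        else:
            counts["false_negative"] += 1
    return counts


mcf7_examples = [
    ("10-100%", "1-10%"),   # compound 7, RBA = 79%
    ("0.1-1%", "1-10%"),    # compound 21, RBA = 0.7%
    ("1-10%", "0.1-1%"),    # compound 14, RBA = 8%
]
print(tally(mcf7_examples))
```

Because only the ordering of ranges matters, the same function serves all three data sets regardless of how many ranges each one populates.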

Mouse Uterine Data Set (Table 6)
The common reactivity patterns for the RBA ranges greater than 150% and 10 to 100% did not result in any false positive identifications. Compounds with measured RBA values greater than 10% that were also in the hERα training set were correctly discriminated.


TABLE 6 The Predicted RBA Ranges Using the Decision Tree Based on the hERα Reactivity Pattern for Ligands with Measured Binding Affinity to Mouse Uterine Estrogen Receptors (mER)
 
For compounds with measured RBA values between 1 and 10%, 3 of 4 compounds were correctly categorized. Tamoxifen (4; RBA = 6%) was incorrectly predicted to have an RBA between 0.1 and 1%. The same misclassification of tamoxifen was observed with the hERα data set (Bradbury et al., 2000); it arises because the maximum interatomic distance between electronegative atoms in tamoxifen is much less than that specified in the common reactivity pattern. Of the 3 compounds whose RBA values were correctly predicted, two (6, 4-hydroxy-2',4',6'-trichlorobiphenyl, RBA = 2.4%; and 7, 4,4'-dihydroxy-2'-chlorobiphenyl, RBA = 1.1%) were not in the original hERα training set.

In terms of false positive classifications, the reactivity pattern for 10 > RBA > 1% incorrectly identified chemical 8 (4-hydroxy-2',3',4',5'-tetrachlorobiphenyl; RBA = 1.0%), several compounds with measured RBA values in the range of 0.01 to 0.1% (compounds 14–17, 19, 22, 24, 25, and 27), and compounds with measured RBA values less than 0.01% (compounds 31, 34, and 35). An analysis of this set of structures suggests that RBA values can be significantly reduced if the electronegative site (i.e., a site with a charge greater than –0.3 a.u.) is shielded. When an additional rule was added to the expert system, under which ligands with atoms or fragments either directly bonded to an electronegative heteroatom or in an ortho position to the heteroatom were considered incapable of binding with an RBA >0.1%, the number of false positive identifications decreased from 13 to 5. The remaining false positive identifications were compounds 17, 22, 24, 27, and 34. Compound 8, with a measured RBA of 1.0%, became a false negative with a predicted RBA of <0.1%. This steric requirement was also observed in a recent application of the COREPA approach to the data set analyzed by Waller et al. (1996), which consisted of 9 steroidal and 49 nonsteroidal ligands (unpublished data).

The reactivity pattern for the range of 0.1 to 1% correctly predicted 5 of the 6 compounds with observed RBA values within this range, with 2 of the chemicals (10, 12) not in the original hERα training set. The pattern also produced 8 false positive identifications, i.e., ligands with measured RBA values below 0.1% that were predicted to have 1 > RBA > 0.1% (compounds 18, 20, 23, 26, 29, 30, 32, and 33). In this case, use of the shielding rule did not reduce the number of false positives.

In summary, for the 35 compounds in the mER data set, 13 compounds were predicted to bind within the correct RBA ranges. Of the 22 incorrect classifications, one was a false negative and 21 were false positive predictions. Inclusion of the shielding rule in the expert system resulted in 20 correct classifications, with the number of false positive predictions reduced from 21 to 13 compounds and one additional false negative prediction. For the 7 compounds with measured RBA values between 1 and 150%, the one false negative prediction was associated with tamoxifen. With the inclusion of the shielding rule, the 13 false positive predictions included 9 biphenyl compounds, with measured RBA values typically between 0.01 and 0.1%. These low-affinity biphenyl compounds were not represented in the hERα training set, which may have biased the reactivity patterns for RBA values between 1 and 10% and between 0.1 and 1%. The other remaining false positives were 30 (4-nonylphenol; RBA = 0.01%), which was also listed with an mER binding affinity of 0.313% (compound 9), similar to that measured for hERα (0.3%), and 26 (bisphenol A), 32 (BBP), and 33 (p,p′-DDT), all with lower observed affinity to mouse receptors than previously measured for hERα (Table 4).

Rat Uterine Data Set (Table 7)
An evaluation of the RBA > 150% and 100 > RBA > 10% screening patterns against the rat RBA data set resulted in 5 false negative and 5 false positive predictions. The reactivity pattern derived for RBA > 150% correctly identified 1 (diethylstilbestrol; RBA = 470%), with no false positive identifications. The rER data set contained 3 compounds with observed RBA values between 100 and 150%, a range not represented in the hERα training set. Although technically classified as false negatives, 3 (Δ14-estradiol-17β; RBA = 107%) and 4 (7α-methyl estradiol-17β; RBA = 104%) were identified by the hERα-based pattern for 100 > RBA > 10%. The third chemical in this range (2; 11β-methyl estradiol-17β; RBA = 124%) was identified as having an RBA between 1 and 10%. For the 12 chemicals with observed RBA values between 10 and 100%, the reactivity pattern for 100 > RBA > 10% resulted in false negative predictions for 10 (6-hydroxy-2,3-diphenylindenone-1; RBA = 59%) and 16 (1-methyl-6-hydroxy-2,3-diphenylindene; RBA = 12%), which were predicted to have RBAs between 0.1 and 1%. These incorrect predictions were due to short distances between electronegative sites and the lack of a second electronegative site, respectively. False positive identifications obtained with the 100 > RBA > 10% pattern included compounds 18 (RBA = 9%), 21 (RBA = 6%), 24 (RBA = 5%), 27 (RBA = 2.6%), and 30 (RBA = 1.6%). Using the additional filter for nonshielded electronegative sites (see Mouse Uterine Data Set), compounds 27 and 30 would not be predicted to have RBA values between 10 and 100% but would instead be predicted to have RBA < 0.1%, and would therefore be identified as false negatives.


TABLE 7 The RBA Ranges Predicted Using the Decision Tree Based on the hERα Reactivity Pattern for Ligands with Measured Affinity to Rat Uterine Estrogen Receptors (rER)
 
Application of the reactivity pattern for 10 > RBA > 1% to compounds with observed RBA values between 1 and 10% resulted in false negative predictions for 17 (1,3-diethyl-4-hydroxy-2-phenylindene; RBA = 9.3%), 19 (3-phenyl-6-hydroxy-2-phenylindene; RBA = 8.9%), 20 (tamoxifen; RBA = 6%), 25 (3-ethyl-4'-hydroxy-2-phenylindenone-1; RBA = 4.6%), 28 (3-ethyl-4'-hydroxy-2-phenylindene; RBA = 2.3%), 29 (1,3-diethyl-6-hydroxy-2-phenylindene; RBA = 2.2%), and 32 (3-ethyl-6-hydroxy-2-phenylindenone-1; RBA = 1.2%). For all of these compounds, the predicted RBA ranges were between 0.1 and 1%. Reasons for these incorrect classifications included the lack of a second electronegative site (17, 19, 25, 28, 29, and 32) or a short distance between electronegative sites (20; 4 Å). As noted in Table 7, the 10 > RBA > 1% screen resulted in 17 false positive identifications, including 2 chemicals in the RBA range of 0.1 to 1% (33 and 38), 4 chemicals in the RBA range of 0.01 to 0.1% (39, 40, 42, and 43), and 11 chemicals with RBA values of less than 0.01% (47 and 49–58). When the shielding screen for electronegative heteroatoms was employed, the number of false positive identifications decreased to 5 chemicals (33, 39, 47, 49, and 58), with one additional false negative, 38 (2,2',3',4',6'-penta CB-4-ol; RBA = 0.12%), predicted to have RBA < 0.1%.

Using the screening rule for RBA values between 0.1 and 1%, false positive identifications were noted for 41 (RBA = 0.068%), 45 (RBA = 0.01%), and 48 (RBA = 0.0005%). Of these, 45 and 48 had greater measured hERα affinities (Table 4), again suggesting interspecies or interlaboratory differences as a basis for the discrepancy.

In summary, of the 58 compounds in the rat uterine data set, RBA ranges were correctly predicted for 21 ligands, with 12 false negative and 25 false positive classifications. With inclusion of the shielding rule, thirty-two compounds were correctly classified; the rule decreased the number of false-positive identifications to 11 ligands while increasing the number of false negatives to 15. For the 16 compounds with measured RBA values greater than 10%, there were 5 false negative identifications. For the 16 compounds with measured RBA values between 1 and 10%, 3 false positive identifications and two additional false negatives were noted (after inclusion of the shielding rule). Of the 7 ligands in this range incorrectly predicted to have an RBA value between 0.1 and 1%, the 4 most notable false negative predictions were associated with compounds whose measured RBA values ranged from 4.6 to 9.3%. The remaining 3 ligands had measured RBA values that ranged from 1.2 to 2.3%. Finally, for the 26 ligands with measured RBA values less than 1%, inclusion of the shielding rule generated one false negative prediction, and there were 8 false positive predictions.

Summary and Conclusions
Development and evaluation of similarity relationships to predict a specific type of biological activity based upon chemical structure require the establishment of a knowledge base that contains training and evaluation sets of chemicals whose modes of action and potency are well defined. Minimizing the introduction of biological variability in the model development and evaluation process is critical to assessing the performance of SARs. It is also important that the training set of chemicals represent a range of structures, and associated properties, representative of the "chemical universe" of interest. Structure similarity relationships based on well-defined endpoints, and developed across a diverse set of chemicals, provide transparent models whose uncertainties can be better defined. It is also essential to define the required precision of a model in order to determine the data quality needed in the training and evaluation knowledge bases. In the present study, the model predicts RBA values in 10-fold ranges spanning 6 orders of magnitude. Consequently, variability of less than 10-fold in the training and evaluation data sets is not problematic. Of course, other applications of these training and evaluation data sets may require greater levels of precision and accuracy.
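The tolerance argument can be made concrete. If two measurements of the same compound differ by less than a factor of 10, their base-10 logarithms differ by less than 1, so the two values fall either in the same 10-fold range or in adjacent ones, never two ranges apart. The sketch uses simple decade bins (ignoring the separate >150% category) and hypothetical values.

```python
import math


def decade_bin(rba_percent: float) -> int:
    """Index of the 10-fold range holding an RBA value: -1 for 0.1-1%, 0 for 1-10%, 1 for 10-100%."""
    return math.floor(math.log10(rba_percent))


# Hypothetical pairs of measurements differing by less than 10-fold
pairs = [(2.0, 9.0), (0.9, 1.1), (0.05, 0.4)]
for a, b in pairs:
    shift = abs(decade_bin(a) - decade_bin(b))
    print(f"{a:g}% vs {b:g}%: ranges apart = {shift}")
```

The (0.9, 1.1) pair shows that even a small difference can cross a boundary when a value sits near it, which is where variability between receptor systems is most likely to change a classification.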

To develop a model to predict potential ER binding affinity from chemical structure, Bradbury et al. (2000) used a data set of 45 compounds that had been assayed with the hERα. Although a "leave-one-out" statistical approach was used to evaluate common reactivity patterns for 4 RBA ranges spanning >150% to 0.1%, it was not possible to independently assess the model with compounds not used in the training set. To evaluate the model more completely, the optimum approach would be to compare hERα RBA predictions to measured hERα values for chemicals not used in the training set. Unfortunately, such a data set did not exist in the open literature. While risking the addition of interspecies and test-system variability to the evaluation process, the current study employed a data set of 99 structures assessed in MCF7 cells (hER) and rodent models (mER and rER) as a means to determine the ability of the model to predict the activity of compounds not used in the original training set. For the 17 compounds that overlapped with the original hERα data set, measured RBA values were within an order of magnitude.

The largest percentage of correct predictions was obtained for the MCF7 data set, most likely because of the similarity in chemical structures between the MCF7 and hERα data sets, as well as the fact that the two systems are based on a similar, perhaps the same, receptor. The increase in incorrect predictions with the mouse and rat data sets was largely due to false positive identifications. The rate of false positive identifications was markedly reduced when an additional rule was included that required at least one electronegative atom to be unshielded. This occurred at the expense of a slight increase in the number of false negatives. The requirement that an electronegative atom be unshielded could reflect differences between the hERα and rodent receptors, or a bias in the original training set, in which shielded electronegative atoms were rare. An analysis of the data set reported by Waller et al. (1996), in which binding affinity was expressed in terms of pKi rather than RBA values, also indicated the need for an unshielded electronegative atom, which suggests that the original hERα training set may not have had sufficient diversity in chemical structure. Assuming that the false positive error rates for the rodent data sets are due not primarily to interspecies differences but to a bias in the original training set, a modification to the expert system rules requiring at least one electronegative atom to be unshielded appears warranted.

Figure 1A summarizes the results of using the hERα-based reactivity patterns to predict RBA ranges for the combined hERα, MCF7 cell, and rodent data sets using the expert system. Figure 1B presents predictions obtained with inclusion of the shielding rule. Eight false negatives were identified among the 46 ligands (31 of them in the MCF7 cell, mER, and rER data sets) whose measured RBA values were greater than 10%. No false positive identifications were observed for ligands with measured RBA values >10%. Thus, the interspecies comparison, undertaken assuming similarity between ER binding domains in the human and rodent assays, provides a reasonably robust screening result for ER ligands whose binding affinities are at least 10% of that of E2, independent of the mammalian receptor system. Of the remaining 129 compounds with RBA values <10%, 13 were incorrectly classified as false negatives without the shielding rule (Fig. 1A) and 17 with the shielding rule (Fig. 1B). The number of false positives obtained for chemicals with RBA <10% was 69 with the original expert system and 47 upon incorporation of the shielding rule. The increased occurrence of false positive identifications for chemicals with lower binding affinities is consistent with a lower level of biological similarity to E2 and, therefore, a lower level of chemical similarity and specificity of reactivity patterns (Bradbury et al., 2000).
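The headline numbers in the Abstract follow from the counts reported in this paragraph. A short arithmetic check, using only those counts; the dictionary layout is an illustrative choice, and the Fig. 1A correct count for the low-affinity group is derived here rather than stated in the text.

```python
# Counts reported for the combined data sets
high = {"total": 46, "false_negative": 8, "false_positive": 0}    # RBA > 10%, Fig. 1B
low = {"total": 129, "false_negative": 17, "false_positive": 47}  # RBA < 10%, Fig. 1B
original_low = {"total": 129, "false_negative": 13, "false_positive": 69}  # RBA < 10%, Fig. 1A


def correct(group: dict[str, int]) -> int:
    """Compounds neither under- nor overpredicted."""
    return group["total"] - group["false_negative"] - group["false_positive"]


print(correct(high), correct(low))             # 38 65
print(f"{correct(high) / high['total']:.0%}")  # 83%
print(correct(original_low))                   # 47
```

The comparison makes the trade-off explicit: for the low-affinity group, the shielding rule converts 22 false positives into 18 additional correct classifications and 4 additional false negatives.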



FIG. 1. Relationship of observed hERα, MCF7 cell, mouse uterine, and rat uterine RBAs to RBAs predicted from the expert system reported by Bradbury et al. (2000): (A) original model; (B) model with the additional requirement that at least one electronegative atom with an atomic charge of -0.3 or greater must be unshielded.

 
As reported in a recent summary from a workshop sponsored by the U.S. Environmental Protection Agency (EPA), ranking and prioritization schemes for screening industrial chemicals for "endocrine-disrupting potential" rely exclusively on existing chemical-specific human health and wildlife exposure and effects data. Consequently, when chemicals are prioritized for testing, there is a tendency to focus on those compounds for which data exist. One approach to help overcome this bias is to use 3-D SARs to expand the knowledge base for prioritization (Meridian Institute, 1999). In a related workshop co-sponsored by the Society of Environmental Toxicology and Chemistry-Europe (SETAC-Europe), the Organization for Economic Cooperation and Development, and the European Commission, SARs for receptor binding and gene expression were also endorsed as promising approaches to enhance prioritization efforts (Ankley et al., 1997). The exploratory COREPA 3-D SAR technique described here was developed, in part, to meet this need for rapid and mechanistically credible evaluations of large chemical data sets, including the Toxic Substances Control Act (TSCA) chemical inventory, which contains more than 75,000 chemicals. Specifically, COREPA-based expert systems and complementary training data sets for predicting ligand-receptor binding affinity are intended to support ranking paradigms for the EPA Endocrine Disruptor Priority Setting Database (Meridian Institute, 1999) and similar international programs, where decisions are needed on which of the thousands of chemicals in commerce to test for endocrine disruption.


    ACKNOWLEDGMENTS
 
This research was supported, in part, by a cooperative agreement between Bourgas University "Prof. As. Zlatarov" and U.S. EPA (CR822306-01-0); and by an agreement between the European Union and Bourgas University "Prof. As. Zlatarov" (EU Project IC20-CT98-0114, EDAEP). The authors thank Dr. Stoyan Karabunarliev for fruitful discussions. Comments on an earlier draft of the manuscript were provided by Chris Russom and Mike Hornung. Diane Spehar and Roger LePage assisted in manuscript preparation.


    NOTES
 
This article has been reviewed according to EPA guidelines. Mention of modeling or modeling approaches does not indicate endorsement by the EPA.

1 To whom correspondence should be addressed. Fax: (218) 529-5015. E-mail: bradbury.steven@epa.gov.


    REFERENCES
Ankley, G., Bradbury, S. P., Hermens, J., Mekenyan, O. G., and Tollefsen, K.-E. (1997). Current approaches to the use of structure-activity relationships (SARs) in identifying the hazards of endocrine-disrupting chemicals to wildlife. Proceedings of the EMWAT Workshop, April 1997, pp. 19–40. SETAC-Europe, The Netherlands.

Anstead, G. M., Wilson, S. R., and Katzenellenbogen, J. A. (1989). 2-Arylindenes and 2-arylindenones: Molecular structures and considerations in the binding orientation of unsymmetrical nonsteroidal ligands to the estrogen receptor. J. Med. Chem. 32, 2163–2171.

Bolger, R., Wiese, T. E., Ervin, K., Nestich, S., and Checovich, W. (1998). Rapid screening of environmental chemicals for estrogen receptor. Environ. Health Perspect. 106, 551–557.

Bradbury, S. P., Mekenyan, O. G., and Ankley, G. T. (1998). The role of ligand flexibility in predicting biological activity: Structure-activity relationships for aryl hydrocarbon, estrogen, and androgen receptor-binding affinity. Environ. Toxicol. Chem. 17, 15–25.

Bradbury, S. P., Kamenska, V., Schmieder, P. K., Ankley, G. T., and Mekenyan, O. G. (2000). A computationally based identification algorithm for estrogen receptor ligands. Part I. Predicting hERα binding affinity. Toxicol. Sci. 58, 253–269.

Brooks, S. C., Wappler, N. L., Corombos, J. D., and Doherty, L. M. (1987). Estrogen structure-receptor function relationships. In Recent Advances in Steroid Hormone Action (V. K. Moudgil, Ed.), pp. 443–466. Walter de Gruyter, Berlin.

Connor, K., Ramamoorthy, K., Moore, M., Mustain, M., Chen, I., Safe, S., Zacharewski, T., Gillesby, B., Joyeux, A., and Balaguer, P. (1997). Hydroxylated polychlorinated biphenyls as estrogens and antiestrogens: Structure-activity relationships. Toxicol. Appl. Pharmacol. 145, 111–123.

Gabbard, R. B., and Segaloff, A. (1983). Structure-activity relationships of estrogens. Effects of 14-dehydrogenation and axial methyl groups at C-7, C-9, C-11. Steroids 41, 791–805.

Ivanov, J. M., Karabunarliev, S. H., and Mekenyan, O. G. (1994). 3DGEN: A system for an exhaustive 3D molecular design. J. Chem. Inf. Comput. Sci. 34, 234–243.

Ivanov, J. M., Mekenyan, O. G., Bradbury, S. P., and Schüürmann, G. (1998). A kinetic analysis of the conformational flexibility of steroids. Quant. Struct. Act. Relat. 17, 437–449.

Korach, K. S., Sarver, P., Chae, K., McLachlan, J. A., and McKinney, J. D. (1988). Estrogen receptor-binding activity of polychlorinated hydroxybiphenyls: Conformationally restricted structural probes. Mol. Pharmacol. 33, 120–126.

Kuiper, G. G. J. M., Carlsson, B., Grandien, K., Enmark, E., Haggblad, J., Nilsson, S., and Gustafsson, J.-Å. (1997). Comparison of the ligand binding specificity and transcript tissue distribution of estrogen receptors α and β. Endocrinology 138, 853–870.

Mekenyan, O. G., Ivanov, J. M., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Karcher, W. (1997). A computationally-based hazard identification algorithm that incorporates ligand flexibility. 1. Identification of potential androgen receptor ligands. Environ. Sci. Technol. 31, 3702–3711.

Mekenyan, O. G., Nikolova, N., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Hansen, B. (1999). New developments in a hazard identification algorithm for hormone receptor ligands. Quant. Struct. Act. Relat. 18, 139–153.

Meridian Institute. (1999). EPA priority-setting workshop for the endocrine disruptor screening program. January 20–21, 1999. A meeting summary. Dillon, CO.

Palomino, E., Heeg, M. J., Horwitz, J. P., Polin, L., and Brooks, S. C. (1994). Skeletal conformations and receptor binding in some 9,11-modified estradiols. J. Steroid Biochem. Mol. Biol. 50, 75–85.

Qian, X., and Abul-Hajj, Y. J. (1990). Synthesis and biological activities of 11β-substituted estradiol as potential antiestrogens. Steroids 55, 238–241.

Stewart, J. J. (1990). MOPAC: A semiempirical molecular orbital program. J. Comput.-Aided Mol. Des. 4, 1–105.

Stewart, J. J. (1993). MOPAC 93. Fujitsu Limited, Chiba-city, Chiba 261, Japan, and Stewart Computational Chemistry. Colorado Springs, CO.

VanderKuur, J. A., Wiese, T., and Brooks, S. C. (1993). Influence of estrogen structure on nuclear binding and progesterone receptor induction by the receptor complex. Biochemistry 32, 7002–7008.

Waller, C. L., Oprea, T. I., Chae, K., Park, H.-K., Korach, K. S., Laws, S. C., Wiese, T. E., Kelce, W. R., and Gray, L. E., Jr. (1996). Ligand-based identification of environmental estrogens. Chem. Res. Toxicol. 9, 1240–1248.