* Bourgas University "Prof. As. Zlatarov," Laboratory of Mathematical Chemistry, Department of Physical Chemistry, 118010 Bourgas, Bulgaria; and
U.S. Environmental Protection Agency, National Health and Environmental Effects Research Laboratory, Mid-Continent Ecology Division, 6201 Congdon Boulevard, Duluth, Minnesota 55804
Received April 17, 2000; accepted July 31, 2000
ABSTRACT
Key Words: structure activity relationships; expert systems; mammalian estrogen receptors; binding affinity; estrogen receptor ligands.
INTRODUCTION
In the companion paper, Bradbury et al. (2000) described a prototypical expert system, based on the COREPA algorithm, for predicting human estrogen receptor alpha (hERα) binding affinity. In that study, they defined stereoelectronic requirements associated with the binding affinity of 45 steroidal and nonsteroidal ligands to the receptor. Reactivity patterns for hERα relative binding affinity (RBA; 17β-estradiol = 100%) were established, based on global nucleophilicity, interatomic distances between electronegative atoms, charge on heteroatoms, and electron donor capability of heteroatoms. These reactivity patterns were used to establish descriptor profiles, within the context of an expert system, to identify ligands with RBAs of >150%, 100 to 10%, 10 to 1%, and 1 to 0.1%. Using a "leave-one-out" evaluation, the reactivity patterns were determined to be stable, and the resulting expert system correctly classified 30 of 45 compounds in this training set.
To evaluate the prototypical expert system more completely, hERα ligand binding affinity data for compounds not used in the original training set are required. Unfortunately, such data are not available in the open literature. However, a variety of ER binding affinity data sets for other experimental human and rodent models are available. The use of RBA values from different experimental systems and species adds uncertainty to the evaluation of endpoint- and species-specific SARs. Nevertheless, if the variability between experimental systems is within the desired precision of the predictions (i.e., variability across species and experimental systems is less than the variability across chemicals), such data can provide insights on the reliability of a model. In the current study, the 3-D SAR-based expert system derived from the hERα data set was assessed against RBA values for ER binding affinity obtained from MCF7 cells and from mouse and rat uterine preparations.
MATERIALS AND METHODS
To generate common reactivity patterns, the same set of global and local molecular descriptors used in our previous study (Bradbury et al., 2000) was employed. These descriptors were associated with global nucleophilicity, heteroatom electronegativity and charge, and interatomic distances between heteroatoms.
Evaluation of the COREPA-Based hER Ligand Reactivity Patterns
A summary of the COREPA method to assess hERα binding affinity was provided by Bradbury et al. (2000), while the conceptual basis and mathematical derivations for the method are reported elsewhere (Mekenyan et al., 1997, 1999). Using this technique, a decision tree was developed to predict RBA ranges for chemical binding to hERα. The decision tree, based on the energy of the highest occupied molecular orbital (EHOMO), interatomic distances between heteroatoms (d(R_R)), charge of a heteroatom (Q(R)), and donor delocalizabilities of heteroatoms (SE(R)), was optimized first to minimize the probability of false negative identifications (i.e., underpredicted RBA values) and secondarily to minimize the number of false positive identifications. In the current study the decision tree was modified, as summarized below, using additional screens described by Bradbury et al. (2000), to further minimize the probability of false negative identifications. These modifications were applied with the realization that an increase in false positive predictions likely would result.
A "prescreen" reactivity pattern was used to eliminate those compounds whose RBA values were unlikely to exceed 0.1%. Thus, conformers with EHOMO values of less than -9.95 eV, electronegative sites not meeting an SE(R) range of 0.239 to 0.277 (a.u.)²/eV, or steroids not conforming to the stereochemical requirements of the natural enantiomer were assigned a 0% probability of having an RBA value >0.1%. A reactivity pattern with EHOMO > -8.99 eV, combined with a d(R_R) range of 11.77 to 12.22 Å between heteroatoms and a Q(R) range of -0.272 to -0.233 a.u. (imposed on both electronegative sites forming the d(R_R)), was employed to identify chemicals with an RBA value >150%. The reactivity pattern for the binding activity range 100 > RBA > 10% was based on EHOMO > -9.44 eV, combined with d(R_R) ranges of 10.62 to 10.95, 10.38 to 10.51, or 11.50 to 11.80 Å, and the requirement that at least one of the heteroatoms in the distance range meet the Q(R) screen of -0.273 to -0.236 a.u. For the activity range of 10 > RBA > 1%, a pattern was derived based on EHOMO > -9.87 eV, combined with distance screens of 9.38 to 9.93, 9.75 to 10.44, or 10.56 to 11.28 Å and an SE(R) pattern of 0.237 to 0.273 (a.u.)²/eV, imposed on at least one electronegative site. Finally, the reactivity pattern based on EHOMO > -9.93 eV and SE(R) of 0.239 to 0.269, 0.248 to 0.279, or 0.300 to 0.330 (a.u.)²/eV was associated with the low binding activity range of 1 > RBA > 0.1%. These reactivity patterns were organized in a hierarchical decision tree that sequentially assessed the energetically reasonable conformers of a ligand in decreasing order of RBA ranges. Once identified as meeting a reactivity pattern for a particular binding activity range, a compound was assigned to that RBA range and not evaluated using patterns associated with lower RBA ranges.
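The screens described above can be sketched in code for a single conformer. This is a minimal illustration under stated assumptions, not the implementation used in the study: the `Site` and `Conformer` records and all function names are hypothetical, EHOMO and Q(R) are assumed to be negative quantities (so the thresholds carry minus signs), the SE(R) screens are assumed to be satisfied when any one site meets them, and the stereochemical prescreen for steroids is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[float, float]


@dataclass
class Site:
    """Hypothetical record for one electronegative heteroatom."""
    q: float   # Q(R), charge in a.u. (assumed negative)
    se: float  # SE(R), donor delocalizability in (a.u.)^2/eV


@dataclass
class Conformer:
    """Hypothetical record for one energetically reasonable conformer."""
    e_homo: float  # EHOMO in eV (assumed negative)
    sites: List[Site]
    # d(R_R) in angstroms, keyed by the indices of the two sites
    dist: Dict[Tuple[int, int], float] = field(default_factory=dict)


def within(x: float, ranges: List[Range]) -> bool:
    """True if x falls inside any of the closed intervals."""
    return any(lo <= x <= hi for lo, hi in ranges)


def pairs_within(c: Conformer, ranges: List[Range]) -> List[Tuple[int, int]]:
    """Site pairs whose separation meets one of the distance screens."""
    return [pair for pair, d in c.dist.items() if within(d, ranges)]


def any_site_se(c: Conformer, ranges: List[Range]) -> bool:
    """True if at least one site meets one of the SE(R) screens."""
    return any(within(s.se, ranges) for s in c.sites)


def classify_conformer(c: Conformer) -> str:
    """Return the highest RBA range whose pattern the conformer meets."""
    # Prescreen (stereochemical check omitted; "any site" is an assumption).
    if c.e_homo < -9.95 or not any_site_se(c, [(0.239, 0.277)]):
        return "<0.1%"

    # RBA > 150%: both sites forming the distance must meet the charge screen.
    q_hi = [(-0.272, -0.233)]
    if c.e_homo > -8.99 and any(
        within(c.sites[i].q, q_hi) and within(c.sites[j].q, q_hi)
        for i, j in pairs_within(c, [(11.77, 12.22)])
    ):
        return ">150%"

    # 100 > RBA > 10%: at least one site of the pair meets the charge screen.
    q_mid = [(-0.273, -0.236)]
    d_mid = [(10.62, 10.95), (10.38, 10.51), (11.50, 11.80)]
    if c.e_homo > -9.44 and any(
        within(c.sites[i].q, q_mid) or within(c.sites[j].q, q_mid)
        for i, j in pairs_within(c, d_mid)
    ):
        return "100-10%"

    # 10 > RBA > 1%: a distance screen plus SE(R) on at least one site.
    d_low = [(9.38, 9.93), (9.75, 10.44), (10.56, 11.28)]
    if (c.e_homo > -9.87 and pairs_within(c, d_low)
            and any_site_se(c, [(0.237, 0.273)])):
        return "10-1%"

    # 1 > RBA > 0.1%: EHOMO plus one of three SE(R) windows.
    se_min = [(0.239, 0.269), (0.248, 0.279), (0.300, 0.330)]
    if c.e_homo > -9.93 and any_site_se(c, se_min):
        return "1-0.1%"

    return "<0.1%"


# Invented descriptor values, for illustration only.
strong = Conformer(-8.90, [Site(-0.25, 0.25), Site(-0.25, 0.25)], {(0, 1): 12.0})
weak = Conformer(-9.90, [Site(-0.25, 0.26)])
print(classify_conformer(strong))  # >150%
print(classify_conformer(weak))    # 1-0.1%
```

The early `return` statements mirror the hierarchy in the text: once a higher-affinity pattern is met, the lower-affinity patterns are not consulted.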
The ER ligands in Tables 1–3 were used to evaluate the ability of this decision tree, and associated reactivity patterns, to predict RBA ranges for chemicals not included in the original hERα training set (see Bradbury et al., 2000). Each energetically reasonable conformer of a chemical was processed through the decision tree by making use of an interpreter based on the SMILES algorithm, which permits the use of stereoelectronic structure-based rules. The decision tree provided a binary discrimination (i.e., a "yes" or "no" determination) of chemicals being within a specified RBA range. Thus, a chemical would be predicted to have an hERα affinity within a specified RBA range if at least one of its conformers met the associated reactivity pattern.
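The compound-level rule, in which a chemical is assigned to the highest RBA range for which at least one conformer gives a "yes", can be sketched as follows. The function name and the toy patterns are hypothetical: the two predicates test EHOMO only (assumed negative, in eV) and stand in for the full reactivity patterns, which also involve distances, charges, and delocalizabilities.

```python
from typing import Callable, Sequence, Tuple, TypeVar

C = TypeVar("C")


def assign_rba_range(
    conformers: Sequence[C],
    patterns: Sequence[Tuple[str, Callable[[C], bool]]],
    default: str = "<0.1%",
) -> str:
    """Assign the first (highest) range met by at least one conformer.

    `patterns` must be ordered from the highest RBA range to the lowest.
    """
    for label, matches in patterns:
        if any(matches(c) for c in conformers):
            return label  # lower ranges are not evaluated
    return default


# Toy stand-ins for the real patterns (EHOMO thresholds only).
toy_patterns = [
    (">150%", lambda c: c["e_homo"] > -8.99),
    ("100-10%", lambda c: c["e_homo"] > -9.44),
]

# Two invented conformers of one chemical.
conformers = [{"e_homo": -9.60}, {"e_homo": -9.20}]
result = assign_rba_range(conformers, toy_patterns)
print(result)  # 100-10%
```

Only the second conformer meets a pattern, which is sufficient for the chemical as a whole to be assigned to that range.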
RESULTS AND DISCUSSION
For compounds with observed RBA values between 0.1 and 1%, 5α-androstane-3β,17β-diol (22; RBA = 0.5%) was the only false negative identification, due to a very low global nucleophilicity (-10.35 < EHOMO < -10.31 eV). Four compounds were false positive identifications for binding affinity in this range: 29 (measured RBA < 0.05%), 31 (RBA < 0.05%), 35 (RBA = 0.021%), and 36 (RBA = 0.0003%). It should be noted that compounds 35 and 36 were evaluated using hERα, with measured RBA values of 0.3 and 0.4%, respectively (Bradbury et al., 2000); hence, discrepancies between observed and predicted values for these compounds are likely due to variability across receptor systems.
In summary, of the 36 compounds in the MCF7 cell data set, 19 RBA ranges were correctly predicted, with RBA ranges for 5 compounds underpredicted (false negative identifications) and 12 overpredicted (false positives). Based on several compounds in common between the MCF7 cell data set and the hER training set, it appears that the false positive identifications may be due, in part, to inherently higher binding affinity in the hERα system. For the false negative identifications with compounds having observed RBA values between 10 and 100%, the most notable discrepancy was observed for compound 3 and suggests that in some cases one, rather than two, electronegative sites may be required for ER binding in MCF7 cells.
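The bookkeeping behind these tallies can be sketched as a comparison of the observed RBA bin with the predicted one: an underprediction is a false negative and an overprediction is a false positive. The function names, the label strings, and the treatment of bin boundaries (upper bound inclusive) are assumptions made for illustration. The 100 to 150% band has no reactivity pattern of its own, so it is given a separate rank.

```python
def observed_rank(rba_percent: float) -> int:
    """Map a measured RBA (%) to an ordinal bin; upper bounds are inclusive."""
    if rba_percent > 150:
        return 5
    if rba_percent > 100:
        return 4  # band without its own reactivity pattern
    if rba_percent > 10:
        return 3
    if rba_percent > 1:
        return 2
    if rba_percent > 0.1:
        return 1
    return 0


# Ranks of the ranges the expert system can predict (hypothetical labels).
PREDICTED_RANK = {">150%": 5, "100-10%": 3, "10-1%": 2, "1-0.1%": 1, "<0.1%": 0}


def outcome(observed_rba: float, predicted_label: str) -> str:
    """Classify one prediction as correct, false negative, or false positive."""
    obs = observed_rank(observed_rba)
    pred = PREDICTED_RANK[predicted_label]
    if pred == obs:
        return "correct"
    return "false negative" if pred < obs else "false positive"


# Compound 22 (RBA = 0.5%), screened out as <0.1%.
print(outcome(0.5, "<0.1%"))     # false negative
# Compound 35 (RBA = 0.021%), predicted in the 1 to 0.1% range.
print(outcome(0.021, "1-0.1%"))  # false positive
```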
Mouse Uterine Data Set (Table 6)
The common reactivity patterns for RBA ranges greater than 150% and for 10 to 100%, did not result in any false positive identifications. Compounds with measured RBA values greater than 10%, which were also in the hER training set, were correctly discriminated.
In terms of false positive classifications, the reactivity pattern for 10 > RBA > 1% incorrectly identified chemical 8 (4-hydroxy-2',3',4',5'-tetrachlorobiphenyl; RBA = 1.0%), several compounds with measured RBA values in the range of 0.01 to 0.1% (compounds 14–17, 19, 22, 24, 25, and 27), and compounds with measured RBA values less than 0.01% (compounds 31, 34, and 35). An analysis of this set of structures suggests that RBA values can be significantly reduced if the electronegative site (i.e., a charge greater than 0.3 a.u.) is shielded. When a rule was added to the expert system under which ligands with atoms or fragments either directly bonded to an electronegative heteroatom or in an ortho position to the heteroatom were considered incapable of binding with an RBA >0.1%, the number of false positive identifications decreased from 13 to 5. The remaining false positive identifications were compounds 17, 22, 24, 27, and 34. Under this rule, compound 8, with a measured RBA of 1.0%, became a false negative with a predicted RBA of <0.1%. This steric requirement was also observed in a recent application of the COREPA approach to the data set analyzed by Waller et al. (1996), which consisted of 9 steroidal and 49 nonsteroidal ligands (unpublished data).
The reactivity pattern for the range of 0.1 to 1% correctly predicted 5 of the 6 compounds with observed RBA values within this range, with 2 of the chemicals (10, 12) not in the original hER training set. The pattern also identified 8 false positive ligands whose measured RBA values fell below this range (compounds 18, 20, 23, 26, 29, 30, 32, and 33). In this case, use of the shielding rule did not reduce the number of false positives.
In summary, for the 35 compounds in the mER data set, 13 compounds were predicted to bind within the correct RBA ranges. Of the 22 incorrect classifications, one was a false negative and 21 were false positive predictions. Inclusion of the shielding rule in the expert system resulted in 20 correct classifications, with the number of false positive predictions reduced from 21 to 13 compounds, and with one additional false negative prediction. For the 7 compounds with measured RBA values between 150 and 1%, the one false negative prediction was associated with tamoxifen. With the inclusion of the shielding rule, the 13 false positive predictions included 9 biphenyl compounds, with measured RBA values typically between 0.1 and 0.01%. These low-affinity biphenyl compounds were not represented in the hER training set, which may have led to a bias in the reactivity patterns for RBA values between 10 and 1% and 1 and 0.1%. Other remaining false positives were 30 (4-nonylphenol; RBA = 0.01), which was measured to have a mER binding affinity of 0.313 (9), similar to that measured for hERα (0.3), and 26 (bisphenol A), 32 (BBP), and 33 (p,p'-DDT), all with lower observed affinity to mouse receptors than previously measured for hERα (Table 4).
Rat Uterine Data Set (Table 7)
An evaluation of the RBA > 150% and 100 > RBA > 10% screening patterns against the rat RBA data set resulted in 5 false negative and 5 false positive predictions. The reactivity pattern derived for RBA > 150% correctly identified 1 (diethylstilbestrol; RBA = 470%), with no false positive identifications. The rER data set contained 3 compounds with observed RBA values between 100 and 150%, a range not available in the hER training set. Although these predictions are technically classified as false negatives, the hERα-based pattern for 100 > RBA > 10% identified 3 (Δ14 estradiol-17β; RBA = 107%) and 4 (7α-methyl estradiol-17β; RBA = 104%). The third chemical in this range (2; 11β-methyl estradiol-17β; RBA = 124%) was identified as having an RBA between 10 and 1%. For the 12 chemicals with observed RBA values between 10 and 100%, the reactivity pattern for 100 > RBA > 10% resulted in false negative predictions for 10 (6-hydroxy-2,3-diphenylindenone-1; RBA = 59%) and 16 (1-methyl-6-hydroxy-2,3-diphenylindene; RBA = 12%), which were predicted to have RBAs between 0.1 and 1%. These incorrect predictions were due to short distances between electronegative sites and the lack of a second electronegative site, respectively. False positive identifications obtained with the 100 > RBA > 10% pattern included compounds 18 (RBA = 9%), 21 (RBA = 6%), 24 (RBA = 5%), 27 (RBA = 2.6%), and 30 (RBA = 1.6%). Using the additional filter for nonshielded electronegative sites (see Mouse Uterine Data Set), compounds 27 and 30 would not be predicted to have RBA values between 10 and 100%, but would be predicted to have RBA < 0.1%, and would therefore be identified as false negatives.
Using the screening rule for RBA values between 1 and 0.1%, false positive identifications were noted for 41 (RBA = 0.068%), 45 (RBA = 0.01%), and 48 (RBA = 0.0005%). Of these, 45 and 48 had greater hER measured affinities (Table 4), again suggesting interspecies or interlaboratory differences as a basis for the discrepancy.
In summary, of the 58 compounds in the rat uterine data set, RBA ranges were correctly predicted for 21 ligands, with 12 false negative and 25 false positive classifications. Thirty-four compounds were, however, correctly classified with inclusion of the shielding rule, which decreased the number of false-positive identifications to 11 ligands, while increasing the number of false negatives to 15. For the 16 compounds with measured RBA values greater than 10%, there were 5 false negative identifications. For the 16 compounds with measured RBA values between 10 and 1%, 3 false positive identifications and two additional false negatives were noted (after inclusion of the shielding rule). Of the 7 ligands in this range incorrectly predicted to have an RBA value between 1 and 0.1%, the 4 most notable false negative predictions were associated with compounds whose measured RBA values ranged from 9.3 to 4.6%. The remaining 3 ligands had measured RBA values that ranged from 2.3 to 1.2%. Finally, for the 26 ligands with measured RBA values less than 1%, there was one false negative prediction generated upon inclusion of the shielding rule, and 8 false positive predictions.
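The counts reported in the three data set summaries can be converted into fractions of correct predictions with a few lines of arithmetic. The sketch below uses only tallies stated in the text (correct, false negative, false positive); the dictionary and function names are hypothetical, and the mouse tally with the shielding rule assumes the one additional false negative brings that total to two.

```python
# (correct, false negatives, false positives), as reported in the text.
tallies = {
    "MCF7": (19, 5, 12),
    "mouse": (13, 1, 21),
    "mouse + shielding rule": (20, 2, 13),
    "rat": (21, 12, 25),
}


def percent_correct(correct: int, false_neg: int, false_pos: int) -> float:
    """Share of compounds assigned to the correct RBA range, in percent."""
    total = correct + false_neg + false_pos
    return round(100.0 * correct / total, 1)


summary = {name: percent_correct(*counts) for name, counts in tallies.items()}
for name, pct in summary.items():
    print(f"{name}: {pct}% correct")
```

Expressed this way, the original expert system is correct for roughly half of the MCF7 compounds and a little over a third of the mouse and rat compounds.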
Summary and Conclusions
Development and evaluation of similarity relationships to predict a specific type of biological activity based upon chemical structure requires the establishment of a knowledge base that contains training and evaluation sets of chemicals whose modes of action and potency are well defined. Minimizing the introduction of biological variability in the model development and evaluation process is critical to assessing the performance of SARs. It is also important that the training set of chemicals represents a range of structures, and associated properties, representative of the "chemical universe" of interest. Structure similarity relationships based on well-defined endpoints, and developed across a diverse set of chemicals, provide transparent models whose uncertainties can be better defined. It is also essential to define the required precision of a model to determine the data quality in the training and evaluation knowledge bases. In the present study, the model predicts RBA values within a factor of 10 ranges across 6 orders of magnitude. Consequently, variability of less than 10-fold in the training and evaluation data sets is not problematic. Of course, other applications of these training and evaluation data sets may require greater levels of precision and accuracy.
To develop a model to predict potential ER binding affinity from chemical structure, Bradbury et al. (2000) used a data set of 45 compounds that had been assayed with the hER. Although a "leave-one-out" statistical approach was used to evaluate common reactivity patterns for 4 RBA ranges between 150 and 0.1%, it was not possible to independently assess the model with compounds not used in the training set. To evaluate the model more completely, an optimum approach would be to compare hERα RBA predictions to measured hERα values for chemicals not used in the training set. Unfortunately, such a data set did not exist in the open literature. While risking the addition of interspecies/test system variability in the evaluation process, the current study employed a data set of 99 structures that were assessed in MCF7 cells (hER) and rodent models (mER and rER) as a means to determine the ability of the model to assess the activity of compounds not used in the original training set. For those 17 compounds that overlapped with the original hERα data set, measured RBA values were within an order of magnitude.
The largest percentage of correct predictions was obtained for the MCF7 data set, which was most likely due to similarity in chemical structures between the two data sets, as well as the fact that the two systems are based on a similar, perhaps the same, receptor. The increase in incorrect predictions with the mouse and rat data sets was largely due to false positive identifications. The rate of false positive identifications was markedly reduced when an additional rule was included that required at least one electronegative atom to be unshielded. This occurred at the expense of a slight increase in the number of false negatives. The observation that the electronegative atom must be unshielded could reflect differences in the hER and rodent receptors, or reflect a bias in the original training set where the occurrence of shielded electronegative atoms was rare. An analysis of the data set reported by Waller et al. (1996), where binding affinity was expressed in terms of pKi rather than RBA values, also indicated the need for an unshielded electronegative atom, which suggests the original hERα training set may not have had sufficient diversity in chemical structure. Assuming the false positive error rates for the rodent data sets are not primarily due to interspecies differences, but instead due to a bias in the original training set, a modification to the expert system rules that requires at least one electronegative atom be unshielded appears warranted.
Figure 1A summarizes the results of using the hERα-based reactivity patterns to predict RBA ranges from the combined hERα, MCF7 cell, and rodent data sets using the expert system. Figure 1B represents predictions obtained with inclusion of the shielding rule. Eight false negatives were identified among the 46 ligands (31 compounds in the MCF7 cell, mER and rER data sets) whose measured RBA values were greater than 10%. No false positive identifications were observed for ligands with measured RBA values >10%. Thus, the interspecies comparison, undertaken assuming similarity between ER binding domains in the human and rodent assays, provides a reasonably robust screening result for ER ligands whose binding affinities are at least 10% of E2, independent of the mammalian receptor system. Thirteen of the remaining 129 compounds with RBA values < 10% were false negatives without the shielding rule (Fig. 1A), and 17 with the shielding rule (Fig. 1B). The number of false positives obtained for chemicals with RBA < 10% was 69 with the original expert system and 47 upon incorporation of the shielding rule. The increased occurrence of false positive identifications for chemicals with lower binding affinities is consistent with a lower level of biological similarity to E2 and therefore a lower level of chemical similarity and specificity of reactivity patterns (Bradbury et al., 2000).
ACKNOWLEDGMENTS
NOTES
1 To whom correspondence should be addressed. Fax: (218) 529-5015. E-mail: bradbury.steven{at}epa.gov.
REFERENCES
Anstead, G. M., Wilson, S. R., and Katzenellenbogen, J. A. (1989). 2-Arylindenes and 2-arylindenones: Molecular structures and considerations in the binding orientation of unsymmetrical non-steroidal ligands to the estrogen receptor. J. Med. Chem. 32, 2163–2171.
Bolger, R., Wiese, T. E., Ervin, K., Nestich, S., and Checovich, W. (1998). Rapid screening of environmental chemicals for estrogen receptor. Environ. Health Perspect. 106, 551–557.
Bradbury, S. P., Mekenyan, O. G., and Ankley, G. T. (1998). The role of ligand flexibility in predicting biological activity: Structure-activity relationships for aryl hydrocarbon, estrogen, and androgen receptor-binding affinity. Environ. Toxicol. Chem. 17, 15–25.
Bradbury, S. P., Kamenska, V., Schmieder, P. K., Ankley, G. T., and Mekenyan, O. G. (2000). A computationally based identification algorithm for estrogen receptor-ligands. Part I. Predicting hER Binding Affinity. Toxicol. Sci. 58, 253–269.
Brooks, S. C., Wappler, N. L., Corombos, J. D., and Doherty, L. M. (1987). Estrogen structure-receptor function relationships. In Recent Advances in Steroid Hormone Action (V. K. Moudgil, Ed.), pp. 443–466. Walter de Gruyter, Berlin.
Connor, K., Ramamoorthy, K., Moore, M., Mustain, M., Chen, I., Safe, S., Zacharewski, T., Gillesby, B., Joyeux, A., and Balaguer, P. (1997). Hydroxylated polychlorinated biphenyls as estrogens and antiestrogens: Structure-activity relationships. Toxicol. Appl. Pharmacol. 145, 111–123.
Gabbard, R. B., and Segaloff, A. (1983). Structure-activity relationships of estrogens. Effects of 14-dehydrogenation and axial methyl groups at C-7, C-9, C-11. Steroids 41, 791–805.
Ivanov, J. M., Karabunarliev, S. H., and Mekenyan, O. G. (1994). 3DGEN: A system for an exhaustive 3D molecular design. J. Chem. Inf. Comput. Sci. 34, 234–243.
Ivanov, J. M., Mekenyan, O. G., Bradbury, S. P., and Schuurmann, G. (1998). A kinetic analysis of the conformational flexibility of steroids. Quant. Struct. Act. Relat. 17, 437–449.
Korach, K. S., Sarver, P., Chae, K., McLachlan, J. A., and McKinney, J. D. (1988). Estrogen receptor-binding activity of polychlorinated hydroxybiphenyls: Conformationally restricted structural probes. Mol. Pharmacol. 33, 120–126.
Kuiper, G. G. J. M., Carlsson, B., Grandien, K., Enmark, E., Haggblad, J., Nilsson, S., and Gustafsson, J.-K. (1997). Comparison of the ligand binding specificity and transcript tissue distribution of estrogen receptor α and β. Endocrinology 138, 853–870.
Mekenyan, O. G., Ivanov, J. M., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Karcher, W. (1997). A computationally-based hazard identification algorithm that incorporates ligand flexibility. 1. Identification of potential androgen receptor ligands. Environ. Sci. Technol. 31, 3702–3711.
Mekenyan, O. G., Nikolova, N., Karabunarliev, S. H., Bradbury, S. P., Ankley, G. T., and Hansen, B. (1999). New developments in a hazard identification algorithm for hormone receptor ligands. Quant. Struct. Act. Relat. 18, 139–153.
Meridian Institute. (1999). EPA priority-setting workshop for the endocrine disruptor screening program. January 20–21, 1999. A meeting summary. Dillon, CO.
Palomino, E., Heeg, M. J., Horwitz, J. P., Polin, L., and Brooks, S. C. (1994). Skeletal conformations and receptor binding in some 9,11-modified estradiols. J. Steroid Biochem. Mol. Biol. 50, 75–85.
Qian, X., and Abul-Hajj, Y. J. (1990). Synthesis and biological activities of 11β-substituted estradiol as potential antiestrogens. Steroids 55, 238–241.
Stewart, J. J. (1990). MOPAC: A semiempirical molecular orbital program. J. Comput. Aid. Mol. Des. 4, 1–105.
Stewart, J. J. (1993). MOPAC 93. Fujitsu Limited, Chiba-city, Chiba 261, Japan, and Stewart Computational Chemistry, Colorado Springs, CO.
VanderKuur, J. A., Wiese, T., and Brooks, S. C. (1993). Influence of estrogen structure on nuclear binding and progesterone receptor induction by the receptor complex. Biochemistry 32, 7002–7008.
Waller, C. L., Oprea, T. I., Chae, K., Park, H.-K., Korach, K. S., Laws, S. C., Wiese, T. E., Kelce, W. R., and Gray, L. E., Jr. (1996). Ligand-based identification of environmental estrogens. Chem. Res. Toxicol. 9, 1240–1248.