Toward Construction of a Transcript Profile Database Predictive of Chemical Toxicity

J. Christopher Corton¹ and Anja J. Stauber

Chemical Industry Institute of Toxicology, 6 Davis Drive, PO Box 12137, Research Triangle Park, North Carolina 27709

ABSTRACT

The article highlighted in this issue is "Toxicogenomics-Based Discrimination of Toxic Mechanism in HepG2 Human Hepatoma Cells" by Michael E. Burczynski, Michael McMillian, Joe Ciervo, Li Li, James B. Parker, Robert Y. Dunn II, Sam Hicken, Spencer Farr, and Mark D. Johnson (pp. 399–415).

Large-scale DNA sequencing efforts have generated an enormous amount of genomic information. The sequence of the ~3 × 10⁹ base pairs of the human genome is now officially complete, and sequencing of the genomes of mammals of interest to toxicologists, such as the mouse and rat, is likely to begin in the near future. This sequence information will ultimately give us a complete inventory of the estimated 100,000 genes encoded by mammalian genomes. In the post-genome era, the next major challenge is to determine the molecular wiring of the cell. This daunting task will require the systematic and comprehensive analysis of levels of mRNAs, proteins, and metabolites within a discipline called functional genomics. An understanding of the relationships between chemical exposure and perturbations in this molecular circuitry will allow toxicologists to more completely understand the effects of chemicals on mammalian physiology.

There is intense interest in the toxicological community in applying functional genomics approaches to better understand chemical-induced toxicity. New techniques now allow the simultaneous analysis of mRNA levels of hundreds or thousands of genes (sometimes called transcriptomics). Genome-wide mRNA levels are determined using DNA arrays composed of silicon, glass, or nylon onto which fragments of the cDNAs of sequenced genes are spotted at high density. The mRNA from expressed genes in a tissue or cell line is converted to labeled cDNA, hybridized to the array, and quantitated using various imaging techniques. The information is analyzed using bioinformatics tools that allow biologically meaningful data to be "mined" from large data sets (Duggan et al., 1999) with the goal of increasing focus on key events associated with chemical response. Bioinformatics tools will also allow for the eventual linkage of information from efforts in transcriptomics, proteomics (global analysis of protein levels), and metabonomics (comprehensive analysis of metabolites) for use in creating comprehensive biologically-based models of chemical action.

Toxicologists working in the area of functional genomics have created an exciting new scientific discipline many have called "toxicogenomics". Toxicogenomics refers to the study of perturbations in cellular components on a genome-wide scale, used to understand the toxicological relevance of chemical exposure. However, this term potentially misleads the public into believing that all genomic changes after chemical exposure have toxicological significance. We are just beginning to understand the relationships between chemical exposure and alterations in the expression of large batteries of genes. It is likely that many of these genomic changes will not be mechanistically linked to toxicity. Indeed, functional genomics will be used to predict beneficial outcomes as well as the likelihood of adverse responses after chemical exposure.

One of the most intriguing prospects for the use of genomic information in toxicology is the creation of databases useful for categorizing chemicals according to mode of action. The term mode of action refers to the key obligatory process governing the action of chemicals without the level of detail necessary to determine mechanism of action (Butterworth et al., 1995). Comparison of gene expression profiles of individual chemicals from many mode-of-action classes (e.g., cytotoxic chemicals, peroxisome proliferators, estrogenic chemicals) will allow the identification of common sets of genes whose expression is consistently linked to particular disease outcomes. The examination of large numbers of chemicals in each mode-of-action class might eventually lead to definition of subcategories based on subtle but distinctly different patterns of expression. When the database becomes sufficiently comprehensive, the transcript profiles of new chemicals with unknown properties could be compared to profiles in databases, allowing new chemicals to be provisionally placed into one or more mode-of-action classes. Additional, more directed studies could be undertaken to confirm or refute the predicted mode of action and toxic outcome for the new chemical. This type of analysis could potentially simplify the battery of tests needed to characterize toxicity and could reduce time and resources needed to determine the potential toxicity of each new chemical. A substantial matrix of data on many chemicals with known exposure-disease outcomes will need to be obtained to maximize the likelihood of detecting true positives and minimize false negatives. This will require the evaluation of gene expression profiles of structurally-related chemicals not causing disease as well as those known to cause disease with varying potency. This approach to predicting toxicity has been recently discussed (Nuwaysir et al., 1999).
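The database lookup described above can be illustrated with a minimal sketch. It assumes that each profile is a list of expression changes over the same ordered set of genes and that similarity is measured by Pearson correlation averaged within each class; the class names, the values, and the function `rank_classes` are hypothetical and are not taken from any published database.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_classes(query, database):
    """Return (class, mean correlation) pairs, best-matching class first."""
    scores = []
    for mode_of_action, profiles in database.items():
        mean_r = sum(pearson(query, p) for p in profiles) / len(profiles)
        scores.append((mode_of_action, mean_r))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical reference profiles (illustrative values only, four genes each).
database = {
    "dna_damaging": [[2.0, 1.5, -0.5, 0.1], [1.8, 1.7, -0.3, 0.0]],
    "anti_inflammatory": [[-0.4, 0.2, 1.9, 1.6], [-0.2, 0.0, 2.1, 1.4]],
}

# Hypothetical profile of a new chemical with unknown properties.
new_chemical = [1.9, 1.4, -0.4, 0.2]

ranking = rank_classes(new_chemical, database)
best_class = ranking[0][0]  # provisional assignment, to be confirmed experimentally
print(ranking)
```

The ranking is only a provisional placement; as the text notes, directed studies would still be needed to confirm or refute the predicted mode of action.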

In this issue of Toxicological Sciences, Burczynski et al. (pp. 399–415) are the first to describe the construction and use of a transcript profile database that begins to distinguish chemicals from different mode-of-action classes. Through their research, the authors systematically learned the rules for how best to use information in a database to distinguish between two mode-of-action classes. The steps taken highlight the problems and the promise in the use of a transcript profile database to predict toxicity.

As the first step in constructing a database, the biological model from which transcript profiles will be derived needs to be carefully selected. The authors determined gene expression profiles in the human hepatoblastoma cell line (HepG2) 24 h after chemical treatment at concentrations predetermined to induce 30% cell killing in 72 h. Rigid standardization of exposure conditions is important because rather minor changes in treatment conditions can have significant effects on gene expression and contribute to the inherent "noise" of the system. Although the conditions used here were a logical first step in allowing for direct comparisons between transcript profiles, two points should be raised. First, the dynamic changes in gene expression over time and dose cannot be taken into account in this study. This more complete analysis is important not only for identifying sequential events that lead to toxicity, but also for distinguishing between gene expression changes directly linked to toxicity versus those that may be involved in responses independent of toxicity. The authors suggest that expanding the database to include data sets derived from multiple time points may improve predictability. Second, the toxic effects of many chemicals are not necessarily associated with generalized cell killing. Many chemicals that bind and activate nuclear receptors, for example, induce alterations in homeostasis in the absence of overt toxicity. The exposure conditions defined here run the risk of shifting the mode-of-action category from that commonly associated with chemical exposure at lower doses to another mode-of-action category at high doses.

Using the HepG2 model, the authors generated an impressive data set consisting of 2.5 million data points from transcript profiles after exposure to 100 chemicals. In contrast to the thousands of genes screened in many published studies, the profiles were generated using a glass slide-based array containing a modest number (~250) of toxicologically relevant genes. To compare profiles between different chemicals, the authors derived a Pearson's correlation coefficient summarizing the agreement in expression changes across all genes on the array. Using this analysis, the first attempt at distinguishing between chemicals in each group met with failure; there were no consistent correlations between chemicals within a mode-of-action group or between groups. The authors then went on to systematically improve the correlations, especially for the groups under the greatest scrutiny, i.e., anti-inflammatory versus DNA damaging chemicals.
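For readers unfamiliar with the statistic, the following minimal sketch shows the standard Pearson correlation coefficient computed between two transcript profiles. The profiles and their values are hypothetical, and the sketch does not reproduce the authors' data processing, which is not described here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))   # co-variation of the two profiles
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical expression changes over five genes (illustrative values only).
chem_a = [1.0, 2.0, 3.0, 4.0, 5.0]
chem_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # same pattern as chem_a, larger magnitude
chem_c = [5.0, 4.0, 3.0, 2.0, 1.0]    # opposite pattern

r_ab = pearson(chem_a, chem_b)
r_ac = pearson(chem_a, chem_c)
print(r_ab, r_ac)
```

Because the coefficient depends on the pattern of changes rather than their magnitude, `chem_a` and `chem_b` correlate perfectly even though one responds twice as strongly, which is one reason the choice of genes entering the calculation matters so much.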

One of the most important findings of this study is that the success of the correlations is dependent on the group of genes used to make the comparisons. There is evidence that certain classes of genes appear to be more sensitive to slight variations in experimental conditions independent of chemical exposure (Hughes et al., 2000). Exclusion of those genes from the correlation calculations might improve the ability to distinguish between different categories. With this in mind the authors identified a small subset of the original ~250 genes that resulted in consistent clustering of chemicals within each mode of action group and between DNA damaging agents and anti-inflammatory agents. This approach of identifying an appropriate subset of genes to make the comparisons holds promise for application to additional mode-of-action classes. On the other hand, identification of appropriate genes for making comparisons may better come from unsupervised analyses of the transcript profile data. Clustering algorithms are now routinely used in an unbiased analysis of gene expression data (Young, 2000). These techniques applied to chemical-induced patterns of gene expression might assist in identifying a larger set of genes that could be used to compare all of the mode-of-action categories as well as to more appropriately cluster chemicals based on their true mode of action.
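One simple way to pick a discriminating gene subset is sketched below. It is an assumption for illustration, not the selection procedure used by Burczynski et al.: each gene is scored by the difference between class means divided by the summed within-class standard deviations, so that noisy genes score low. The function names and all values are hypothetical.

```python
from statistics import mean, stdev

def separation_scores(class_a, class_b):
    """Score each gene by |mean_a - mean_b| / (sd_a + sd_b)."""
    n_genes = len(class_a[0])
    scores = []
    for g in range(n_genes):
        a = [profile[g] for profile in class_a]
        b = [profile[g] for profile in class_b]
        diff = abs(mean(a) - mean(b))
        spread = stdev(a) + stdev(b)
        if spread > 0:
            scores.append(diff / spread)
        else:
            # No within-class variation: perfect separation if the means differ.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores

def top_genes(class_a, class_b, k):
    """Return the indices of the k best-separating genes, in array order."""
    scores = separation_scores(class_a, class_b)
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return sorted(order[:k])

# Hypothetical profiles: three chemicals per class, four genes per profile.
dna_damaging = [[2.0, 0.5, -1.0, 0.3], [2.2, -0.4, -1.2, 0.9], [1.8, 0.1, -0.9, -0.6]]
anti_inflammatory = [[-1.0, 0.3, 1.5, 0.2], [-1.1, -0.5, 1.4, 0.8], [-0.9, 0.4, 1.6, -0.7]]

selected = top_genes(dna_damaging, anti_inflammatory, k=2)
print(selected)
```

A supervised score of this kind needs class labels in advance; the unsupervised clustering mentioned above would instead group chemicals without that prior knowledge.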

Beyond their use to predict chemical toxicity, databases of genomic information might someday be used to identify specific targets of chemicals allowing for precise determination of mechanism of action. Indications that this is a feasible goal came from recent work carried out by Friend and colleagues who have constructed a compendium of expression profiles in budding yeast from ~300 different mutant or chemically-treated strains (Hughes et al., 2000). An important lesson from these studies was that the expression profile obtained from chemical- or drug-treated cells is very similar to a profile from a strain in which the gene encoding the target of the chemical is mutated. Once databases are sufficiently robust, it may be possible for toxicologists to compare an expression profile of a chemical under scrutiny to hundreds or thousands of profiles in web-based databases to identify likely targets that can be confirmed experimentally. Although many years off, this type of analysis may revolutionize toxicology by shifting research from a mode-of-action orientation to that in which mechanism of action can be precisely defined.

To realize the full potential of genomic databases, significant challenges must be overcome. These include determining how to compare data sets that are derived from different array platforms found in databases of different structures. There is an urgent need to create genomic databases from the ground up that are compatible with one another, composed of data of known quality and robustness and freely accessible to all toxicologists. With this in mind, it should be noted that not all of the data used in the paper by Burczynski et al. will be made publicly available. However, a number of academic and government labs including NIEHS (Nuwaysir et al., 1999; http://www.niehs.nih.gov/oc/news/toxgen.htm) have plans for constructing publicly accessible web-based databases. A database structure useful for toxicologists from academia as well as the pharmaceutical industry is also being built as part of the efforts of the International Life Sciences Institute Subcommittee on Application of Genomics and Proteomics to Mechanism-Based Risk Assessment. Achievement of the goal of predicting chemical toxicity from genomic information may ultimately depend on the ease with which investigators can interface with multiple databases as well as the ability to interpret the effects of chemicals on changes in transcripts, proteins and metabolites.

NOTES

¹ To whom correspondence should be addressed. Fax: (919) 558-1300. E-mail: corton@ciit.org.

REFERENCES

Butterworth, B. E., Conolly, R. B., and Morgan, K. T. (1995). A strategy for establishing mode of action of chemical carcinogens as a guide for approaches to risk assessments. Cancer Lett. 93, 129–146.

Duggan, D. J., Bittner, M., Chen, Y., Meltzer, P., and Trent, J. M. (1999). Expression profiling using cDNA microarrays. Nat. Genet. 21(Supp.), 10–14.

Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D., Kidd, M. J., King, A. M., Meyer, M. R., Slade, D., Lum, P. Y., Stepaniants, S. B., Shoemaker, D. D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and Friend, S. H. (2000). Functional discovery via a compendium of expression profiles. Cell 102, 109–126.

Nuwaysir, E. F., Bittner, M., Trent, J., Barrett, J. C., and Afshari, C. A. (1999). Microarrays and toxicology: the advent of toxicogenomics. Mol. Carcinog. 24, 153–159.

Young, R. A. (2000). Biomedical discovery with DNA arrays. Cell 102, 9–15.




