Discrimination of genotoxic from non-genotoxic carcinogens by gene expression profiling

J. H. M. van Delft1, E. van Agen, S. G. J. van Breda, M. H. Herwijnen, Y. C. M. Staal and J. C. S. Kleinjans

Department of Health Risk Analysis and Toxicology, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands

1 To whom correspondence should be addressed. Email: j.vandelft{at}grat.unimaas.nl


    Abstract
 
Two general mechanisms are implicated in chemical carcinogenesis. The first involves direct damage to DNA, referred to as genotoxic (GTX), to which the cell responds by repairing the damage, arresting the cell cycle or inducing apoptosis. The second is non-DNA damaging, referred to as non-genotoxic (NGTX), in which a wide variety of cellular processes may be involved. Therefore, it can be hypothesized that modulation of the underlying gene expression patterns is profoundly distinct between GTX and NGTX carcinogens, and thus that expression profiling is applicable for classification of chemical carcinogens as GTX or NGTX. We investigated this hypothesis by analysing modulation of gene expression profiles induced by 20 chemical carcinogens in HepG2 cells, using cDNA microarrays that contain 597 toxicologically relevant genes. In total, 22 treatments were included, divided into two sets. The training set consisted of 16 treatments (nine genotoxins and seven non-genotoxins) and the validation set of six treatments (three genotoxins and three non-genotoxins). Class discrimination models based on Pearson correlation analyses for the 20 most discriminating genes were developed with data from the training set, after which the models were tested with all data. Using all data, classification of the carcinogens from the training set was clearly more accurate than that for the validation set, namely 81 and 33% correct, respectively. Exclusion of the treatments that had only marginal effects on the expression profiles improved the discrimination for the training and validation sets to 92 and 100% correctness, respectively. Exclusion of the gene expression signals that were hardly altered also improved classification, namely to 94 and 80%. Therefore, our study proves the principle that gene expression profiling can discriminate carcinogens with major differences in their modes of action, namely genotoxins versus non-genotoxins.

Abbreviations: GTX, genotoxic; N-GTX, not similar to GTX; N-NGTX, not similar to NGTX; NGTX, non-genotoxic


    Introduction
 
The screening of chemical compounds for hazardous properties relies on a few in vitro assays and on acute to chronic studies with animal models. Despite their frequent use, the reliability, relevance and effectiveness of these methodologies are continuously questioned. Powerful technologies for functional genomics may provide new, mechanism-based assays with a high predictive value for toxic risks in humans. It is envisioned that toxicology will benefit enormously from the application of DNA microarray technologies to analyse chemically induced alteration of gene expression (1–4). Ultimately, this may lead to fast screening systems with a high throughput, and to animal tests with reduced inconvenience for the animals and with a higher predictability of human safety. Possible topics include, for instance, genotoxicity screening and identification of modes of action of carcinogens. Here, we investigated whether gene expression profiling can be used for mechanism-based classification of chemical carcinogens.

Over the past years, DNA microarray technologies have been developed that enable the expression of >10 000 genes to be examined simultaneously, thus providing insight into the complexity of gene–function relationships by gene expression analyses (5,6). Application of these technologies in toxicological research is currently expanding rapidly. A major focus is on unravelling the modes of action of toxic compounds and on identifying gene expression profiles that can be applied as biomarkers to predict specific toxic end-points. Other topics include dose–response relationships, especially at low dosages, interspecies extrapolation in order to improve human risk assessment, and interactions of mixtures of chemicals (4,7–10).

The power of gene expression patterns to predict the properties of a biological sample has been demonstrated numerous times. A first demonstration that chemicals with different mechanisms of action can be discriminated was with antitumour drugs in cell lines (11). Application of similar approaches in toxicology was initiated by Burczynski and colleagues (12), who demonstrated that selecting a sub-set of genes drastically improves the discrimination between toxicant classes. Thereafter, discrimination of toxic compounds based on expression patterns has been shown frequently, both by in vitro studies using cell lines or primary hepatocytes (13) and by in vivo studies with rats (14–16). Recently, Hamadeh et al. (17) showed, in a study using coded samples, that gene expression profiling led to successful prediction of whether these samples were derived from livers of rats exposed to enzyme inducers or to peroxisome proliferators. This supports the assumption that a toxicogenomic approach is valid for classifying the mode of action of a compound.

Based on mode of action, carcinogenic compounds can be roughly divided into two classes, namely genotoxic (GTX) and non-genotoxic (NGTX) carcinogens (18,19). GTX carcinogens damage DNA by covalently binding to it, either directly or after activation by metabolizing enzymes, or intercalate into the DNA-helix. In response to a variety of types of DNA damage, the p53 tumour suppressor gene product is activated and regulates a number of downstream cellular processes such as cell cycle arrest, apoptosis and DNA repair (20–22). These responses involve the coordinated action of many genes. Upon induction of DNA damage, p53 can cause a temporary cell cycle arrest by raising the transcription of CIP1, GADD45 and MDM2. At high damage levels, p53 induces apoptosis by modulating the expression of genes like BAX and IGF-BP (20,23).

Modulation of gene expression profiles by NGTX carcinogens is much more complicated. The modes of action are numerous and very diverse, including the modulation of metabolizing enzymes, the induction of peroxisome proliferation, the stimulation of oxidative stress, the alteration of intercellular communication, the suppression of apoptosis and the stimulation of regenerative cell growth following cytotoxic effects (18,19,24–26). A further complication is that many NGTX carcinogens affect several of these pathways.

Based on the above, it can be hypothesized that modulation of gene expression profiles by GTX carcinogens might be profoundly distinct from that by NGTX carcinogens, and thus that gene expression profiling is applicable for mechanism-based classification of chemical carcinogens as genotoxin or non-genotoxin. Here, we investigated this hypothesis by analysing modulation of gene expression profiles induced by model carcinogens in HepG2 human hepatoma cells. HepG2 cells are metabolically competent with respect to biotransformation of mutagens and carcinogens, are frequently applied in toxicology and gene expression studies, and carry no p53 mutations (12,27–31). Following a training phase with 16 carcinogens, the classification was tested with the training set and a validation set of six additional treatments. Specific aims of the study were: (i) the discrimination of GTX from NGTX carcinogens based on mRNA profiles and (ii) the identification of discriminating biomarker genes.


    Materials and methods
 
Cell culture and treatment
Human HepG2 cells were cultured in Minimal Essential Medium supplemented with non-essential amino acids, pyruvate, penicillin/streptomycin (35 U/ml) and 10% fetal bovine serum at 37°C and 5% CO2. All medium components were from Gibco BRL (Breda, The Netherlands).

Cytotoxicity of the chemicals was investigated with the dimethylthiazol-diphenyltetrazolium (MTT) test in 96-well plates following a 24 h treatment period (32). This period was chosen as it is routinely used in all current in vitro genetic toxicity tests on mammalian cells, such as for the induction of micronuclei, chromosomal aberrations and gene mutations. Furthermore, it has been shown that 24 h exposure results in more gene expression changes than 4 h (33). The concentrations selected for the gene expression studies were based on limited cytotoxicity (<25% reduction of signal) or, where no cytotoxic effects were measured, on solubility, with 2 mM as the highest dose. A single dose was applied per carcinogen. Exceptions are TCDD, BaP and TPA; their dose levels are the highest levels used by others in HepG2 cells (29,34,35). See Table I for information about the chemical carcinogens, including suppliers and the applied dose level. All chemicals were of the highest purity that could be obtained. Two days before treatment, 10⁷ cells were seeded in 75 cm² culture flasks. Treatment started by adding the chemicals, which were dissolved in the appropriate vehicle, and was terminated 24 h later by removing the culture medium and immediately adding Trizol. Negative controls were treated with the vehicle. Twenty-two different treatments were conducted, which are subdivided into a training set, consisting of treatments with 16 chemicals, and a validation set, consisting of treatments with six chemicals, of which two were also in the training set.


 
Table I. Overview of the chemicals used for treating HepG2 cells

 
Total RNA isolation and cDNA probe synthesis
Cells were lysed in 3 ml TRIZOL Reagent (Gibco BRL). Total RNA was extracted according to the manufacturer's instructions. The RNeasy® Mini Kit (Qiagen, Westburg bv, The Netherlands) was used to purify total RNA from salts and residual DNA. Quantity of each RNA sample was measured by a spectrophotometer, and integrity was determined with the Bioanalyzer (Agilent Technologies, Amstelveen, The Netherlands).

Most of the procedures for labelling of RNA with fluorophores were as described by others with minor modifications (36) (protocol ‘Aminoallyl labelling of RNA for microarrays’ by Jeremy Hasseman, Inst. Genomic Research, USA; http://www.tigr.org/tdb/microarray/protocolsTIGR.shtml). Cyanine 3 (Cy3)- and Cyanine 5 (Cy5)-labelled cDNA probes were prepared from each treated culture and the concomitant control culture. First strand cDNA synthesis was performed with aminoallyl labelled nucleotides, followed by coupling of either Cy3 or Cy5 fluorescent molecules to the aminoallyl groups. Ten micrograms of total RNA and 6 µg random hexamer primers (Invitrogen, Breda, The Netherlands) were incubated at 70°C for 10 min and snap-frozen for 30 s, after which a 32 µl reaction mixture was made consisting of first strand buffer, DTT (9.5 mM), 0.5 mM dATP, dCTP and dGTP, 0.3 mM dTTP, 0.2 mM 5-(3-aminoallyl)-2'-deoxyuridine-5'-triphosphate (AA-dUTP) and 400 U Superscript II reverse transcriptase (Invitrogen, Life Technologies), which was incubated overnight at 42°C. RNA was hydrolysed by adding 10 µl 1 M NaOH and 10 µl 0.5 M EDTA (Merck) and incubating for 15 min at 65°C, followed by neutralization with 10 µl 1 M HCl. cDNA probes were purified using a QIAquick PCR Purification Kit (Qiagen) as described by the manufacturer, except that the buffers were substituted with phosphate buffers (phosphate wash buffer: 5 mM phosphate buffer, pH 8.0, 80% ethanol; phosphate elution buffer: 4 mM phosphate buffer, pH 8.5). After elution and drying in vacuo, the aminoallyl labelled cDNA was resuspended in 4.5 µl 0.1 M Na2CO3 pH 9.0 and 4.5 µl Cy™5 or Cy™3 Monofunctional Reactive Dye esters (Amersham Pharmacia Biotech) and incubated in the dark at ambient temperature for 1 h, followed by a clean-up step with the QIAquick PCR Purification Kit (Qiagen). Fluorescent-labelled cDNAs for the treatment and the concomitant control were mixed and dried in vacuo.

Microarray analysis
Gene expression analysis was carried out using the PHASE-1 Microarray Human-600 (PHASE-1 Molecular Toxicology, Santa Fe, USA). These arrays contain 597 sequence verified human genes, representing a number of toxicologically relevant, as well as control, genes. The toxicologically relevant genes include pathways that are important for chemical carcinogenesis, such as for apoptosis (caspases, BAK, Bax, Fas, cyclins, TNFs), cell cycle control (cyclins, DNA binding proteins, Waf1), cell proliferation (kinases, transcription factors, growth factors and receptors, connexins), DNA damage/repair (DNA repair enzymes, ERCCs, GADDs, helicases, topoisomerases), inflammation (serum amyloids, interleukins, adhesion molecules, chemokines), metabolism (CYP450s, glucuronidation enzymes, glutathione enzymes, methyltransferases, redox enzymes), oxidative stress (O2 response enzymes, superoxide dismutase, redox enzymes), peroxisome proliferation (peroxisomal enzymes), transport (multi-drug resistance proteins, organic anion and cation transporters) and cell–environment interaction (connexins, integrins, selectins, cadherins). Target genes are single stranded, ~500 nt in length and spotted on glass microscope slides. Each gene was printed in quadruplicate, thus reducing the intra-assay variation.

For each carcinogen treatment, two hybridizations were done, the second with the dyes switched, in order to reduce variation and possible fluorophore-related effects. Each hybridization consisted of a treated sample versus the concomitant control sample. The procedure for hybridization and washing was according to the instructions from PHASE-1 Molecular Toxicology. Labelled cDNA probe was resuspended in 30 µl hybridization buffer (50% formamide, 5x SSC, 0.1% SDS, 0.1 mg/µl salmon sperm DNA) and incubated for 15 min in the dark, denatured by heating for 5 min at 95°C, and then centrifuged for 3 min at 12 000 g. Denatured probes were placed in a heat block at 70°C until use. Twenty-eight microlitres of the probe mixture was used for hybridization of the microarray slide under an 18 x 18 mm cover slip. Slides were hybridized overnight in a humidified slide hybridization chamber (Corning, Life Sciences, The Netherlands) submerged in a 42°C water bath. Thereafter slides were washed with 2x SSC at 34°C with gentle shaking for 2 min, 2x SSC/0.1% SDS, 0.1x SSC/0.1% SDS and twice with 0.1x SSC for 5 min, and finally with water for 1 min, all at room temperature. To dry the slides, they were centrifuged for 1 min at 200 g. Slides were scanned on a GMS 418 Array Scanner (Affymetrix, Santa Clara, USA). Both Cy3 (532 nm) and Cy5 (635 nm) channels were scanned at a photomultiplier setting of 65%. Laser power was adjusted until there were no saturated spots. The images obtained (resolution 10 µm; 16 bit TIFF image) were processed with ImaGene 5.0 software (BioDiscovery, El Segundo, USA) to measure mean signal intensities for spots and local backgrounds. Poor spots were manually flagged (these are spots that are not well processed by ImaGene, in general due to severe blurring or contamination with a dust particle).

Data analysis
Data were transferred from ImaGene into GeneSight 4.0 (BioDiscovery) for further analysis. Flagged spots were not included. For each spot, mean local background intensity was subtracted from mean signal intensity, and spots with a mean net signal of <5 were omitted from analysis. Background corrected mean intensities were log transformed (base 2). Next, the expression difference for each spot was calculated by subtracting the log transformed mean intensity of the control culture from the log transformed mean intensity of the treated culture. Expression differences were normalized using Lowess with all spots and the replicates of each gene were combined to a mean expression difference with exclusion of outliers (beyond 2 SD). The background corrected mean spot intensities will be submitted to ArrayExpress, the public repository at the European Bioinformatics Institute (http://www.ebi.ac.uk/microarray/).
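The per-spot arithmetic described above can be sketched as follows. This is a minimal illustration in Python, not the GeneSight implementation: the function names and the example intensities are hypothetical, the Lowess normalization step is omitted, and the use of the sample SD for the outlier rule is an assumption.

```python
import math
import statistics

MIN_NET_SIGNAL = 5  # spots with a mean net signal below this are omitted


def expression_difference(treated, treated_bg, control, control_bg):
    """Return log2(treated net) - log2(control net), or None if the spot is omitted."""
    treated_net = treated - treated_bg
    control_net = control - control_bg
    if treated_net < MIN_NET_SIGNAL or control_net < MIN_NET_SIGNAL:
        return None
    return math.log2(treated_net) - math.log2(control_net)


def combine_replicates(differences, n_sd=2.0):
    """Mean of replicate differences, excluding values beyond n_sd SD of their mean."""
    values = [d for d in differences if d is not None]
    if len(values) < 3:
        # Too few points to judge outliers; average whatever is there.
        return statistics.mean(values) if values else None
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption)
    kept = [v for v in values if abs(v - mean) <= n_sd * sd]
    return statistics.mean(kept)


# Hypothetical spot: net signal 400 (treated) versus 100 (control).
spot = expression_difference(450, 50, 150, 50)
# Hypothetical set of eight replicate values with one outlier.
combined = combine_replicates([1.0] * 7 + [5.0])
```

In this sketch a fourfold higher net signal in the treated culture gives an expression difference of 2, and the single outlying replicate is dropped before averaging.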

To estimate the number of differentially expressed genes following a treatment, two approaches were followed, using the confidence analysis and significance analysis tools from GeneSight. For confidence analyses, data of the two replicate arrays per treatment were combined, and up- and down-regulated genes were identified at 99% confidence intervals with up- and down-regulation levels set at 0.6 (implying a 1.5-fold up- or down-regulation). For significance analyses, data of the two replicate arrays per treatment were compared with data from four self–self hybridizations by t-tests with or without Holm's P-value adjustment. The adjustment compensates for the possibility that some genes that are not differentially expressed will, by chance, appear to be so.
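As an illustration of the two quantities mentioned above, the following Python sketch converts a log2 expression difference into a fold change and applies the standard Holm step-down adjustment to a list of P-values. It is not the GeneSight code, whose internals are not described here; the function names and the example P-values are hypothetical.

```python
def fold_change(log2_difference):
    """Convert a log2 expression difference into a fold change."""
    return 2.0 ** log2_difference


def holm_adjust(p_values):
    """Return Holm step-down adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]      # hypothetical per-gene P-values
adjusted = holm_adjust(raw)   # approximately [0.03, 0.06, 0.06]
cutoff_fold = fold_change(0.6)  # approximately 1.52, i.e. the 1.5-fold level
```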

Class discrimination
Three different approaches were used for discrimination of GTX from NGTX carcinogens based on gene expression profiling. In all approaches, data of the two replicate arrays per treatment were combined. In the first approach, all the data points for all treatments were included. In the second approach, all data points were used, but only from the treatments that resulted in >6 differentially expressed genes according to confidence analyses (see Table II). This cut-off limit was chosen because, at 99% confidence limits, about six genes might by chance show differential expression when analysing 597 genes. This resulted in the exclusion of seven treatments, with 15 remaining (12 in the training set and three in the validation set). In the third approach, for all treatments, only the data points beyond 0.5 SD for all data points were included. This resulted in exclusion of mean expression differences between –0.267 and 0.267 (implying a 1.2-fold up- and down-regulation), with 44.4% of the data points remaining. Exclusion of genes that are not differentially expressed has been used by others (17,37).
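The two cut-offs used above follow from simple arithmetic, which can be sketched in Python. The function name, the data layout and the toy values are hypothetical; the sketch also assumes that the SD is the population SD over all data points and that the excluded band is symmetric around zero, as the reported limits (–0.267 to 0.267) suggest.

```python
import statistics

N_GENES = 597
ALPHA = 0.01  # 99% confidence limits

# Expected number of genes flagged by chance alone: 597 * 0.01 = 5.97, about six.
expected_by_chance = N_GENES * ALPHA

# A log2 difference of 0.267 corresponds to a fold change of about 1.2.
fold_at_threshold = 2.0 ** 0.267


def filter_small_changes(differences, sd_fraction=0.5):
    """Keep only data points whose absolute value exceeds sd_fraction * SD.

    `differences` maps (treatment, gene) -> mean expression difference (log2).
    """
    values = list(differences.values())
    threshold = sd_fraction * statistics.pstdev(values)  # population SD (assumption)
    kept = {key: v for key, v in differences.items() if abs(v) > threshold}
    return kept, threshold


# Toy data for one hypothetical treatment "A" and four hypothetical genes.
toy = {("A", "g1"): 1.0, ("A", "g2"): -1.0, ("A", "g3"): 0.1, ("A", "g4"): -0.1}
kept, threshold = filter_small_changes(toy)
```

With the toy data, the two weakly modulated data points fall inside the excluded band and only `g1` and `g2` remain.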


 
Table II. Number of responding genes for the chemical treatments according to various analyses

 
Next, all three approaches followed the same procedure. First, the mean expression differences per gene were calculated for the treatments with GTX and for the treatments with NGTX carcinogens of the training set, followed by calculating the differences between these means. The latter thus represents the average difference in modulation of expression by GTX versus NGTX carcinogens. Secondly, t-tests were performed per gene on the expression differences for the treatments with GTX carcinogens versus those for the NGTX carcinogens of the training set (two-tailed, two-sample with unequal variance). Selection of the genes for the class discrimination consisted of two steps: first, the 60 genes (the top 10%) with the largest difference between the mean expression difference of GTX and the mean expression difference of NGTX carcinogens were selected; then, among these 60 genes, the 20 genes with the lowest P-value in the t-tests were selected.
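The two-step gene selection can be sketched in Python as below. This is an illustrative reconstruction under stated assumptions, not the procedure as run by the authors: the function names and toy data are hypothetical, the "largest difference" is taken as the absolute difference between the class means, and the two-sided P-value of the unequal-variance (Welch) t-test is obtained by numerically integrating the Student's t density rather than by a statistics package.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value for Student's t via Simpson integration of the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Two-tailed, two-sample t-test with unequal variance; returns the P-value."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    if va + vb == 0:
        return 1.0 if statistics.mean(a) == statistics.mean(b) else 0.0
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_two_sided_p(t, df)


def select_genes(gtx, ngtx, n_first=60, n_final=20):
    """gtx and ngtx map gene -> list of expression differences, one per treatment."""
    by_difference = sorted(
        gtx,
        key=lambda g: abs(statistics.mean(gtx[g]) - statistics.mean(ngtx[g])),
        reverse=True,
    )[:n_first]
    return sorted(by_difference, key=lambda g: welch_t_test(gtx[g], ngtx[g]))[:n_final]


# Toy data: three hypothetical genes, three treatments per class.
gtx = {"G1": [1.0, 1.2, 0.8], "G2": [2.0, -1.0, 3.8], "G3": [0.1, 0.0, -0.1]}
ngtx = {"G1": [-1.0, -0.8, -1.2], "G2": [-0.5, 0.5, -1.5], "G3": [0.0, 0.1, -0.1]}
selected = select_genes(gtx, ngtx, n_first=2, n_final=1)
```

In the toy data, `G2` has the largest mean difference but is highly variable, so the second step prefers `G1`, which differs consistently between the classes.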

Pearson correlation analysis with the 20 selected genes was used for the classification (applying SPSS version 10.1 for Windows), similar to approaches used by others (12,17). Each treatment, both from the training set and from the validation set, was correlated with the mean expression differences of the GTX carcinogens from the training set. A carcinogen was classified as GTX if r > 0.4; otherwise it was considered to be not similar to GTX (N-GTX). Similarly, each treatment was correlated with the mean expression differences of the NGTX carcinogens from the training set. A carcinogen was then classified as NGTX if r > 0.4; otherwise it was considered to be not similar to NGTX (N-NGTX). This limit for r was chosen as it was on the border of significance (none of the correlations with r < 0.4 had a P-value < 0.05).
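The classification rule can be sketched in Python as follows. The function names and the four-gene toy profiles are hypothetical (the models in this study use 20 genes), and the handling of an undefined correlation is an assumption made only to keep the sketch robust.

```python
import math

R_CUTOFF = 0.4  # classification limit for the correlation coefficient


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation undefined; treated as "no similarity" (assumption)
    return cov / math.sqrt(var_x * var_y)


def classify(profile, mean_gtx, mean_ngtx, cutoff=R_CUTOFF):
    """Return the two calls for one treatment over the selected genes."""
    r_gtx = pearson_r(profile, mean_gtx)
    r_ngtx = pearson_r(profile, mean_ngtx)
    return ("GTX" if r_gtx > cutoff else "N-GTX",
            "NGTX" if r_ngtx > cutoff else "N-NGTX")


# Toy class means and one toy treatment profile (expression differences per gene).
mean_gtx = [1.0, 0.8, -0.5, -0.9]
mean_ngtx = [-0.7, -0.6, 0.9, 0.4]
calls = classify([0.9, 0.6, -0.4, -1.0], mean_gtx, mean_ngtx)
```

Here the toy profile correlates strongly with the GTX class mean and negatively with the NGTX class mean, so it is called GTX and N-NGTX.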

GeneSight 4.0 (Biodiscovery, USA) was also used for principal component analyses and clustering analyses, with average cluster linkage and Pearson correlation for distance metric, and generating the presented figures.


    Results
 
Treatment effects
In order to generate a database on modulation of expression patterns by chemical carcinogens for predicting their mode of action, namely GTX or NGTX, 20 carcinogens were included, roughly equally divided between both classes. As the interest was in carcinogen-specific effects rather than cytotoxicity-specific effects, no overtly toxic doses were investigated. This is confirmed by cytotoxicity analyses by the MTT method, which are presented in Figure 1. The selected dose levels for gene expression studies are shown in Table I. In total, 22 different treatments were conducted, divided into a training set of 16 treatments (for developing the class discrimination models) and a validation set of six treatments (for independent testing of the class discrimination models).



 
Fig. 1. Survival plots of HepG2 cells exposed to the test compounds for 24 h expressed as % of that for the negative (vehicle) controls as determined by the MTT method. Data for the GTX and NGTX compounds are in (A) and (B). Some compounds have been tested in two assays.

 
For each treatment, two microarray hybridizations were performed and the numbers of responding genes were determined by several methods (see Table II). The numbers of responding genes differ widely between the various carcinogens and also between the different methods. In general, however, the carcinogens with the largest effect according to one method also returned the highest numbers with the others. It is also noteworthy that some carcinogens have only modest effects on gene expression, among which are all the methylating agents. This should be kept in mind, as it may hamper the classifications.

Class discrimination
Selection of the genes for the class discrimination consisted of two steps: first, the 60 genes (the top 10%) with the largest difference between the mean expression difference for GTX and the mean expression difference for NGTX carcinogens in the training set were selected; then, among these 60 genes, the 20 genes with the lowest P-value in the t-tests were selected. Many other selection criteria were tested, based on expression differences, t-tests (including repeated tests with exclusion of a treatment in each test) or combinations thereof. Eventually, the approach that in general gave the best classification was chosen.

Pearson correlation analysis on data of the 20 selected genes was used for classifying the carcinogens as GTX or NGTX. The correctness for discrimination of the carcinogens from the training set was clearly better than that for the validation set, namely 81% (13/16) and 33% (2/6), respectively, for the combined analyses on GTX and NGTX data (see Table III, last column). This is not unexpected, as the model was built on the data from the training set. When the correlations with GTX and NGTX data are considered separately, the NGTX data resulted in the better discrimination [86% (19/22) versus 73% (16/22) correctness for the GTX data].


 
Table III. Class discrimination for the carcinogens using selected genes and based on Pearson correlation analyses for all chemical treatments and all data points

 
As mentioned above, several compounds induced only modest effects on gene expression, which might obscure the classifications. Therefore, the classification was also conducted with exclusion of the treatments with ≤6 differentially expressed genes according to confidence analyses (see Table II), which resulted in the exclusion of seven treatments, with 15 remaining (12 in the training set and three in the validation set). The class discriminations for these are given in Table IV. For this reduced data set, the discrimination for the training and validation sets was 92% (11/12) and 100% (3/3) correct, respectively; the discrimination based on the GTX data was 100% (15/15) correct, and that based on the NGTX data or both combined was 93% (14/15) correct. Thus, exclusion of treatments with limited gross effects on gene expression indeed improves class discrimination.


 
Table IV. Class discrimination for the carcinogens using selected genes and based on Pearson correlation analyses for only chemical treatments with more than six responding genes and including all data points

 
However, as even limited effects on gene expression profiles might already contain the information that is required for discriminating GTX and NGTX carcinogens, a third approach was performed. In this case, data points (mean expression differences per gene and per treatment) were filtered by excluding those with small expression modulations. Because of this exclusion, insufficient data were left for one treatment, namely with DMN. This is in line with the fact that this treatment hardly affected the expression profiles (Table II). This procedure also improved class discrimination (Table V), leading to 94% (15/16) and 80% (4/5) correct discrimination for the training and validation sets, respectively, 95% (20/21) correct discrimination using the GTX data, and 90% (19/21) correctness for the NGTX data and for both combined. An expression matrix with hierarchical clustering of both genes and treatments is presented in Figure 2, together with the mean expression values for the GTX and NGTX treatments of the training set, their absolute difference, and the significance of this difference. For some genes, expression is in general induced by NGTX carcinogens and reduced by GTX carcinogens, like AHR and SERPINB2, or vice versa, such as BAX and CALB. Others are only down-regulated, e.g. PCNA and TYMS by NGTX, or only up-regulated, such as MT2A and ACTG1 by NGTX. In general, the effects of NGTX carcinogens appear more profound. Clustering analysis of the treatments reveals two main groups, which almost perfectly separate GTX from NGTX compounds. The two compounds that were placed in the wrong clusters are the same as those that were incorrectly classified, namely reserpine and tetrachloroethylene (TCE).


 
Table V. Class discrimination for the carcinogens using selected genes and based on Pearson correlation analyses for all chemical treatments with exclusion of data points that are below a threshold

 


 
Fig. 2. Clustering analyses of the selected genes used for class discrimination for all carcinogen treatments with exclusion of the data points that are below a threshold (see Table V). Expression data are given as expression differences, meaning log2(treated culture) – log2(control culture). Mean expression differences per gene are presented for the treatments of the training set with GTX carcinogens (mean GTX) and for the treatments with NGTX carcinogens (mean NGTX), together with the absolute differences between these means (difference GTX-NGTX), and the P-values for t-tests per gene of the expression differences of the training set for the treatments with GTX carcinogens versus those for the NGTX carcinogens (t-test). Colour coding of the experimental conditions: red, GTX carcinogens and blue, NGTX carcinogens from the training set; green and yellow, those from the validation set.

 
Principal component analyses to visualize the discrimination of gene expression profiles are shown in Figure 3. When using all genes, all data and all treatments, the GTX and NGTX carcinogens are completely mixed (Figure 3A). Using the 20 genes, treatments and data points applied in the three different classification models, the GTX and NGTX carcinogens are mostly well separated in the principal component plots (Figure 3B, C and D, respectively). Strikingly, one NGTX compound (a blue spot), namely TCE, always lies in between the GTX compounds. This agrees with the three classification models, as they classified this agent as GTX.



 
Fig. 3. Principal component plots of GTX and NGTX carcinogens for: (A) all genes, all carcinogens and all data points; (B) 20 selected genes for all carcinogens and all data points; (C) 20 selected genes for only the chemical treatments with more than six responding genes and all data points; (D) 20 selected genes for all carcinogens with exclusion of data points that are below a threshold. In red are the GTX carcinogens and in blue the NGTX carcinogens from the training set, and in green and yellow those from the validation set.

 

    Discussion
 
This study demonstrates that expression profiling for 20 selected genes in cells following exposure to chemical carcinogens can discriminate between classes of carcinogenic agents with different modes of action, namely genotoxins and non-genotoxins. Using the complete set of data, the discrimination is already correct in the majority of cases for the training set, but clearly less accurate for the additional treatments of the validation set. Next, the study shows that the discrimination model can be drastically improved by exclusion of weak data that possibly only increase noise and thus hide the available information. Both omitting the treatments that seem to affect gene expression patterns only marginally and excluding the expression signals that were hardly altered improved correct class discrimination from ~70 to 90% or more for all the treatments. Especially for the validation set, this improvement is substantial (from 33 to 80% or more).

To our knowledge, predicting the class of a chemical toxicant based on modulation of gene expression patterns in cultured cells has not been shown before. Only in the case of animal studies on liver toxicants has it been shown that expression profiling on coded liver RNA samples led to successful prediction by Pearson correlation analysis of whether these samples were derived from livers of rats exposed to enzyme inducers or to peroxisome proliferators (17). Recently, Newton et al. referred to unpublished data, which show that gene expression fingerprints can discriminate between direct and indirect acting genotoxins (33). Clustering of compounds with similar toxic mechanisms by gene expression profiles was demonstrated in in vitro studies with HepG2 cells or primary rat hepatocytes, but in these cases prediction was not validated with coded samples as done in the current study (12,13). In clinical studies, such as on cancer diagnostics and therapy, many different methods for classification of disease state or prediction of therapy efficacy have been used (38–42). Besides the straightforward approach used by us and others, more sophisticated prediction methods exist, such as k-nearest neighbour analysis, neural networks, support vector machines and decision tree classifiers, but to date none is clearly preferred (43).

The genes selected for the class discrimination model for all carcinogen treatments with exclusion of data points that are below a threshold (Table V) are presented in Table VI. Eleven of these were also selected for one or both of the other two discrimination models, and six were used in all models. Annotations by Gene Ontology (http://www.geneontology.org/) show that many of the selected genes are involved in apoptosis (AHR, BAX, CASP8 and SERPINB2) or cell cycle control (AHR, CDKN1A and PCNA). Indeed, it is known that DNA damage induced by GTX carcinogens leads to stimulation of apoptosis and suppression of the cell cycle (20,23). The reverse, suppression of apoptosis and stimulation of cell division, is considered to be among the many modes of action of NGTX carcinogens (19,24). To our surprise, none of the modulated genes is involved in the repair of DNA damage, except PCNA. Beforehand, we hypothesized that DNA repair genes were probable candidates to discriminate between GTX and NGTX compounds, but classification based on modulation of the expression of these genes alone proved to be poor (data not shown). This agrees with their absence from the selections. Several DNA damage response genes that are regulated by the tumour suppressor gene p53, namely BAX and CDKN1A, were found to be induced in all three discrimination models. This suggests that a DNA damage response is induced by a majority of the GTX compounds. Also, in a recent study on the effects of four GTX compounds in rat liver, no effects were observed on DNA repair genes but only on the p53 target DNA damage response genes (44). The only exception seems to be O6-methylguanine-DNA methyltransferase (MGMT), which is up-regulated in the rat model but not in our HepG2 cell line.


Table VI. Presentation of the selected genes used for class discrimination for all carcinogen treatments with exclusion of data points that are below a threshold (see Table V)
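
The exclusion of data points below a threshold can be illustrated with a correlation that is computed only over genes whose expression is altered beyond a cut-off. This is a sketch of one possible reading and not necessarily the procedure behind Table V: the cut-off of 0.2 (log2 ratio), the minimum of three remaining genes and all profile values are assumptions made for illustration.

```python
import numpy as np

def masked_pearson(profile, reference, threshold, min_genes=3):
    """Pearson correlation restricted to genes with |profile| >= threshold.

    Returns None when fewer than `min_genes` genes pass the cut-off, i.e. when
    the treatment had only marginal effects (min_genes is an assumed safeguard).
    """
    x = np.asarray(profile, dtype=float)
    y = np.asarray(reference, dtype=float)
    keep = np.abs(x) >= threshold
    if keep.sum() < min_genes:
        return None
    return float(np.corrcoef(x[keep], y[keep])[0, 1])

# Hypothetical class reference profile and test profile (log2 ratios).
reference = [1.0, 0.8, -0.6, 0.9, -0.7]
profile = [0.9, 0.05, -0.5, 0.7, -0.02]   # genes 2 and 5 are hardly altered

r_full = masked_pearson(profile, reference, threshold=0.0)
r_masked = masked_pearson(profile, reference, threshold=0.2)
print(r_full, r_masked)
```

In this synthetic example the hardly altered signals lower the correlation with the reference profile, and removing them raises it; whether this holds for real data depends on the noise level of the array measurements.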

 
In all three classification models, TCE was predicted to be GTX, which is in disagreement with the evaluations by NTP and IARC (see Table I). Also, the hierarchical clustering and principal component analyses grouped TCE amid the genotoxins (Figures 2 and 3). Metabolism of TCE occurs by cytochrome P450-dependent oxidation and glutathione (GSH) conjugation in the liver (45). The GSH conjugates are processed via the cysteine conjugate β-lyase pathway in the kidneys to reactive mutagenic species (46,47). Whether this can also occur in human HepG2 cells and thereby cause DNA damage is unknown, but if so it may explain our findings. TCE, however, does not activate the p53-related DNA damage response that is induced by BaP, DBA, carboPt and MMC (data not shown).

All methylating agents in this study caused only marginal alterations in gene expression profiles compared with the PAH and cross-linking carcinogens (Table I), even though some (namely MMS and MNU) were also given at the threshold for cytotoxicity. An explanation for this might be that the DNA lesions caused by the methylating agents are much smaller and hardly distort the DNA helix, and thereby trigger fewer genetic pathways. Indeed, the p53 target genes for DNA damage response BAX, CDKN1A, GADD45 and PCNA are all up-regulated by BaP, DBA, carboPt and MMC, but not by any methylating agent (data not shown). Literature data indicate that the p53 pathway is well activated by PAH and cross-linking damage, but also by small lesions such as methyl groups (21,48–50). Another explanation could be that at 24 h after the start of the treatment, most methyl-DNA lesions have been repaired and thus the levels are too low to alter gene expression patterns. Although no studies on the repair kinetics of the many methyl-DNA lesions in HepG2 cells are known, this explanation seems unlikely, as in rat liver the half-lives are in general >1 day (51).

Our study proves the principle that gene expression profiling can discriminate carcinogens with major differences in their modes of action, namely genotoxins versus non-genotoxins. Although in regulatory toxicology this may not be of key importance, as GTX properties of chemical compounds are routinely investigated in a battery of tests, it can help to understand unexpected carcinogenicity in rodent bio-assays. Nevertheless, it emphasizes again the power of newly emerging technologies like toxicogenomics for mechanism-based hazard identification and risk assessment, and sheds light on possible future applications. Adaptation of the current assay may lead to the development of a new test for screening chemicals for their GTX properties, including discrimination between important classes such as inducers of small adducts (e.g. methyl or oxidative damage), bulky adducts (such as PAH or heterocyclic aromatic amines) and cross-linkers (e.g. mycotoxins like MMC). Especially if a subset of genes can be defined of which the combined modulation of expression acts as a robust biomarker, this will open the door for implementation in regulatory toxicology. Before we reach that point, not only do many more chemicals, including non-carcinogens, need to be examined, but information must also be gathered on time-effect and dose-effect relationships, reproducibility and the optimal cell system (e.g. cell lines or primary cells such as hepatocytes).


    Acknowledgments
 
We are grateful to our colleagues of the Genome Centre Maastricht, i.e. B.Vlietinck, B.Smeets, E.Timmer and R.Jansen, for assisting us in starting up the microarray technologies, to G.Schoeters from the Flemish Institute for Technological Research, Belgium, for providing TCDD, and to T.Burzykowski from the Centre for Statistics of the Limburgs Universitair Centrum, Belgium, for his critical comments regarding the paper. The Nutrition and Toxicology Research Institute Maastricht (NUTRIM) supported parts of this study.


    References
 

  1. Lovett,R.A. (2000) Toxicogenomics—toxicologists brace for genomics revolution. Science, 289, 536–537.[Free Full Text]
  2. Tennant,R.W. (2002) The National Center for Toxicogenomics: using new technologies to inform mechanistic toxicology. Environ. Health Perspect., 110, A8–10.[ISI][Medline]
  3. Medlin,J.F. (1999) Timely toxicology. Environ. Health Perspect., 107, A256–A258.[ISI][Medline]
  4. Waring,J.F. and Halbert,D.N. (2002) The promise of toxicogenomics. Curr Opin. Mol. Ther., 4, 229–235.[ISI][Medline]
  5. Brown,P.O. and Botstein,D. (1999) Exploring the new world of the genome with DNA microarrays. Nature Genet., 21, 33–37.[CrossRef][ISI][Medline]
  6. Lipshutz,R.J., Fodor,S.P., Gingeras,T.R. and Lockhart,D.J. (1999) High density synthetic oligonucleotide arrays. Nature Genet., 21, 20–24.[CrossRef][ISI][Medline]
  7. Afshari,C.A., Nuwaysir,E.F. and Barrett,J.C. (1999) Application of complementary DNA microarray technology to carcinogen identification, toxicology and drug safety evaluation. Cancer Res., 59, 4759–4760.[Abstract/Free Full Text]
  8. Hamadeh,H.K., Amin,R.P., Paules,R.S. and Afshari,C.A. (2002) An overview of toxicogenomics. Curr. Issues Mol. Biol., 4, 45–56.[Medline]
  9. Pennie,W.D., Woodyatt,N.J., Aldridge,T.C. and Orphanides,G. (2001) Application of genomics to the definition of the molecular basis for toxicity. Toxicol. Lett., 120, 353–358.[CrossRef][ISI][Medline]
  10. Olden,K. and Guthrie,J. (2001) Genomics: implications for toxicology. Mutat. Res., 473, 3–10.[ISI][Medline]
  11. Scherf,U., Ross,D.T., Waltham,M. et al. (2000) A gene expression database for the molecular pharmacology of cancer. Nature Genet., 24, 236–244.[CrossRef][ISI][Medline]
  12. Burczynski,M.E., McMillian,M., Ciervo,J., Li,L., Parker,J.B., Dunn,R.T.,2nd, Hicken,S., Farr,S. and Johnson,M.D. (2000) Toxicogenomics-based discrimination of toxic mechanism in HepG2 human hepatoma cells. Toxicol. Sci., 58, 399–415.[Abstract/Free Full Text]
  13. Waring,J.F., Ciurlionis,R., Jolly,R.A., Heindel,M. and Ulrich,R.G. (2001) Microarray analysis of hepatotoxins in vitro reveals a correlation between gene expression profiles and mechanisms of toxicity. Toxicol. Lett., 120, 359–368.[CrossRef][ISI][Medline]
  14. Waring,J.F., Jolly,R.A., Ciurlionis,R., Lum,P.Y., Praestgaard,J.T., Morfitt,D.C., Buratto,B., Roberts,C., Schadt,E. and Ulrich,R.G. (2001) Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles. Toxicol. Appl. Pharmacol., 175, 28–42.[CrossRef][ISI][Medline]
  15. Waring,J.F., Cavet,G., Jolly,R.A. et al. (2003) Development of a DNA microarray for toxicology based on hepatotoxin-regulated sequences. EHP Toxicogenom., 111, 53–60.
  16. Bulera,S.J., Eddy,S.M., Ferguson,E., Jatkoe,T.A., Reindel,J.F., Bleavins,M.R. and De La Iglesia,F.A. (2001) RNA expression in the early characterization of hepatotoxicants in Wistar rats by high-density DNA microarrays. Hepatology, 33, 1239–1258.[CrossRef][ISI][Medline]
  17. Hamadeh,H.K., Bushel,P.R., Jayadev,S. et al. (2002) Prediction of compound signature using high density gene expression profiling. Toxicol. Sci., 67, 232–240.[Abstract/Free Full Text]
  18. Ashby,J. (1992) Use of Short-Term Tests in Determining the Genotoxicity or Nongenotoxicity of Chemicals. IARC Scientific Publications, IARC, Lyon, pp. 135–164.
  19. Silva Lima,B. and Van der Laan,J.W. (2000) Mechanisms of nongenotoxic carcinogenesis and assessment of the human hazard. Regul. Toxicol. Pharmacol., 32, 135–143.[CrossRef][ISI][Medline]
  20. Levine,A.J. (1997) p53, the cellular gatekeeper for growth and division. Cell, 88, 323–331.[ISI][Medline]
  21. Lakin,N.D. and Jackson,S.P. (1999) Regulation of p53 in response to DNA damage. Oncogene, 18, 7644–7655.[CrossRef][ISI][Medline]
  22. Adimoolam,S. and Ford,J.M. (2003) p53 and regulation of DNA damage recognition during nucleotide excision repair. DNA Repair, 2, 947–954.[CrossRef][ISI][Medline]
  23. Wahl,G.M. and Carr,A.M. (2001) The evolution of diverse biological responses to DNA damage: insights from yeast and p53. Nature Cell Biol., 3, E277–286.[CrossRef][ISI][Medline]
  24. Nguyen Ba,G. and Vasseur,P. (1999) Epigenetic events during the process of cell transformation induced by carcinogens (review). Oncol. Rep., 6, 925–932.[ISI][Medline]
  25. Williams,G.M., Iatropoulos,M.J. and Weisburger,J.H. (1996) Chemical carcinogen mechanisms of action and implications for testing methodology. Exp. Toxicol. Pathol., 48, 101–111.[ISI][Medline]
  26. Butterworth,B.E. and Bogdanffy,M.S. (1999) A comprehensive approach for integration of toxicity and cancer risk assessments. Regul. Toxicol. Pharmacol., 29, 23–36.[CrossRef][ISI][Medline]
  27. Knasmuller,S., Parzefall,W., Sanyal,R. et al. (1998) Use of metabolically competent human hepatoma cells for the detection of mutagens and antimutagens. Mutat. Res., 402, 185–202.[ISI][Medline]
  28. Wilkening,S., Stahl,F. and Bader,A. (2003) Comparison of primary human hepatocytes and hepatoma cell line Hepg2 with regard to their biotransformation properties. Drug Metab. Dispos., 31, 1035–1042.[Abstract/Free Full Text]
  29. Puga,A., Maier,A. and Medvedovic,M. (2000) The transcriptional signature of dioxin in human hepatoma HepG2 cells. Biochem. Pharmacol., 60, 1129–1142.[CrossRef][ISI][Medline]
  30. Gore,M.A., Morshedi,M.M. and Reidhaar-Olson,J.F. (2000) Gene expression changes associated with cytotoxicity identified using cDNA arrays. Funct. Integr. Genom., 1, 114–126.[CrossRef]
  31. Hsu,I.C., Tokiwa,T., Bennett,W., Metcalf,R.A., Welsh,J.A., Sun,T. and Harris,C.C. (1993) p53 gene mutation and integrated hepatitis B viral DNA sequences in human liver cancer cell lines. Carcinogenesis, 14, 987–992.[Abstract]
  32. Mosmann,T. (1983) Rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays. J. Immunol. Methods, 65, 55–63.[CrossRef][ISI][Medline]
  33. Newton,R.K., Aardema,M.J. and Aubrecht,J. (2004) The utility of DNA microarrays for characterizing genotoxicity. EHP Toxicogenomics, in press.
  34. Todd,M.D., Lee,M.J., Williams,J.L., Nalezny,J.M., Gee,P., Benjamin,M.B. and Farr,S.B. (1995) The CAT-Tox (L) assay: a sensitive and specific measure of stress-induced transcription in transformed human liver cells. Fundam. Appl. Toxicol., 28, 118–128.[CrossRef][ISI][Medline]
  35. Eickelmann,P., Morel,F., Schulz,W.A. and Sies,H. (1995) Turnover of glutathione S-transferase alpha mRNAs is accelerated by 12-O-tetradecanoyl phorbol-13-acetate in human hepatoma and colon carcinoma cell lines. Eur. J. Biochem., 229, 21–26.[Abstract]
  36. Hegde,P., Qi,R., Abernathy,K., Gay,C., Dharap,S., Gaspard,R., Hughes,J.E., Snesrud,E., Lee,N. and Quackenbush,J. (2000) A concise guide to cDNA microarray analysis. Biotechniques, 29, 548–562.[ISI][Medline]
  37. Bushel,P.R., Hamadeh,H.K., Bennett,L., Green,J., Ableson,A., Misener,S., Afshari,C.A. and Paules,R.S. (2002) Computational selection of distinct class- and subclass-specific gene expression signatures. J. Biomed. Inform., 35, 160–170.[CrossRef][ISI][Medline]
  38. Golub,T.R., Slonim,D.K., Tamayo,P. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537.[Abstract/Free Full Text]
  39. Ross,D.T., Scherf,U., Eisen,M.B. et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nature Genet., 24, 227–235.[CrossRef][ISI][Medline]
  40. Alizadeh,A.A., Eisen,M.B., Davis,R.E. et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511.[CrossRef][ISI][Medline]
  41. van't Veer,L.J., Dai,H., van de Vijver,M.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536.[CrossRef][ISI][Medline]
  42. Chung,C.H., Bernard,P.S. and Perou,C.M. (2002) Molecular portraits and the family tree of cancer. Nature Genet., 32 (suppl.), 533–540.[CrossRef][ISI][Medline]
  43. Slonim,D.K. (2002) From patterns to pathways: gene expression data analysis comes of age. Nature Genet., 32 (suppl.), 502–508.[CrossRef][ISI][Medline]
  44. Ellinger-Ziegelbauer,H., Stuart,B., Wahle,B., Bomann,W. and Ahr,H.J. (2004) Characteristic expression profiles induced by genotoxic carcinogens in rat liver. Toxicol. Sci., 77, 19–34.[Abstract/Free Full Text]
  45. Lash,L.H. and Parker,J.C. (2001) Hepatic and renal toxicities associated with perchloroethylene. Pharmacol. Rev., 53, 177–208.[Abstract/Free Full Text]
  46. Anders,M.W. and Dekant,W. (1998) Glutathione-dependent bioactivation of haloalkenes. Annu. Rev. Pharmacol. Toxicol., 38, 501–537.[CrossRef][ISI][Medline]
  47. Dreessen,B., Westphal,G., Bunger,J., Hallier,E. and Muller,M. (2003) Mutagenicity of the glutathione and cysteine S-conjugates of the haloalkenes 1,1,2-trichloro-3,3,3-trifluoro-1-propene and trichlorofluoroethene in the Ames test in comparison with the tetrachloroethene-analogues. Mutat. Res., 539, 157–166.[ISI][Medline]
  48. Binkova,B., Giguere,Y., Rossner,P.,Jr, Dostal,M. and Sram,R.J. (2000) The effect of dibenzo[a,l]pyrene and benzo[a]pyrene on human diploid lung fibroblasts: the induction of DNA adducts, expression of p53 and p21 (WAF1) proteins and cell cycle distribution. Mutat. Res., 471, 57–70.[ISI][Medline]
  49. Jordan,P. and Carmo Fonseca,M. (2000) Molecular mechanisms involved in cisplatin cytotoxicity. Cell Mol. Life Sci., 57, 1229–1235.[ISI][Medline]
  50. Ellinger-Ziegelbauer,H., Stuart,B., Wahle,B., Bomann,W. and Ahr,H.J. (2004) Characteristic expression profiles induced by genotoxic carcinogens in rat liver. Toxicol. Sci., 77, 19–34.[Abstract/Free Full Text]
  51. Den Engelse,L., Menkveld,G.J., De Brij,R.J. and Tates,A.D. (1986) Formation and stability of alkylated pyrimidines and purines (including imidazole ring-opened 7-alkylguanine) and alkylphosphotriesters in liver DNA of adult rats treated with ethylnitrosourea or dimethylnitrosamine. Carcinogenesis, 7, 393–403.[Abstract]
Received October 30, 2003; revised January 28, 2004; accepted February 1, 2004.